Anyone had this issue with arp requests on CentOS?
This is an interesting issue, and I'm wondering if any other providers might have gone through a similar problem. This system is currently inaccessible, so I can't really take advice and run tests right now, but I'd like to see if sharing this generates some ideas.
So we get a new server set up, rented IP range and connected to our provider's network. The primary IP operates fine, the node is fully accessible from the internet. So I install OpenVZ, set all of the configuration correctly (knowing the configuration is correct, I still compared to a working system that is almost identical and connected to identical network equipment), create OpenVZ container, container has no internet. Also cannot ping container from the outside.
Now, I know exactly what you're thinking. I messed up a setting in sysctl.conf or vz.conf. Unfortunately, not so. IP forwarding is enabled, verified as enabled. Neighbor_devs is set to all. Tried setting neighbor_devs to detect. Tried enabling arp proxy.
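For anyone who wants to double-check the same settings on their own node, here is a rough sketch in Python that reads them back from the running system. It assumes the usual locations (`/proc/sys/net/ipv4/...` for the sysctls and `/etc/vz/vz.conf` for `NEIGHBOUR_DEVS`) and an interface named `eth0`. The function name and the `root` parameter are made up for illustration.

```python
from pathlib import Path


def read_host_settings(root="/", iface="eth0"):
    """Return the forwarding/ARP-related settings as a dict of strings.

    `root` is a hypothetical parameter so the function can be pointed at a
    copy of the filesystem; missing files yield None instead of an error.
    """
    root = Path(root)

    def first_line(rel_path):
        path = root / rel_path
        return path.read_text().strip() if path.exists() else None

    settings = {
        "ip_forward": first_line("proc/sys/net/ipv4/ip_forward"),
        "proxy_arp": first_line(f"proc/sys/net/ipv4/conf/{iface}/proxy_arp"),
        "neighbour_devs": None,
    }

    # Assumed location of the OpenVZ global config file.
    vz_conf = root / "etc/vz/vz.conf"
    if vz_conf.exists():
        for line in vz_conf.read_text().splitlines():
            line = line.strip()
            if line.startswith("NEIGHBOUR_DEVS="):
                value = line.split("=", 1)[1]
                settings["neighbour_devs"] = value.strip().strip('"')
    return settings
```

Calling `read_host_settings()` on the node and printing the result gives a quick side-by-side to compare against the working server.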
The only way I can get an IP beyond the primary one to work on this system is to manually initiate an arp request OR to have the provider remove the subnet from the vlan and then add it back. Of course, only a matter of time before the arp entry expires.
According to my provider, and from everything he has shown me (been quite open about it), there is 0 difference between two nearly identical servers and their setup. Yet, on server1, I can do "ifconfig eth0:0 whateveriphere up" and immediately ping it from the outside internet. On server2, if I do the same, I have to manually initiate an arp request or it will never be pingable.
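To make "manually initiate an arp request" concrete, below is a minimal Python sketch of one way to do it: build a gratuitous ARP request for the new alias and send it out as a broadcast. The post does not say which tool was actually used, so treat this as an illustration only. The function names, interface name, MAC and IP in the usage note are placeholders, and sending needs root on Linux.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a 42-byte Ethernet frame carrying a gratuitous ARP request.

    "Gratuitous" here means the sender IP and target IP are the same
    address, so neighbours learn (or refresh) the IP-to-MAC mapping.
    """
    hw_addr = bytes.fromhex(mac.replace(":", ""))
    ip_addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + hw_addr + struct.pack("!H", 0x0806)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw_addr + ip_addr      # sender MAC, sender IP
    arp += b"\x00" * 6 + ip_addr  # target MAC unknown, target IP = own IP
    return ethernet + arp


def send_frame(iface: str, frame: bytes) -> None:
    """Send a raw frame on `iface` (Linux only, requires root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((iface, 0))
        sock.send(frame)
```

Usage would look like `send_frame("eth0", build_gratuitous_arp("00:11:22:33:44:55", "192.0.2.10"))`, with the real MAC of the NIC and the alias IP substituted in.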
Both systems running CentOS 6.3, issue persists with or without OpenVZ kernel.
So at the risk of sounding like I can't run my business, because I know that's how some people will take it here, I'm being humble enough to ask for ideas. My real hope is that I find someone else who has had the same problem.
Comments
Yes, I've had this same problem (as we've discussed) :P
I can't recall if I fixed it on my own or if it ever went away with the provider you're using.
I'm told that ip stealing protection on SolusVM was related to your issue, not sure on that. I'm down to malfunctioning NIC, gateway not acting as it should, or bug in CentOS. But...I've never had to manually initiate an arp request on any server with any operating system prior to this. I've always added IPs as an alias and then used them (aside from OpenVZ usage obviously). I'd be thrilled to find out I'm a complete idiot so this could be over haha
Jarland,
Are you having this problem on a fresh install, or just after installing the kernel and/or SolusVM / other sugary treats...
I wouldn't necessarily say it's a faulty NIC if you're able to manually initiate the ARP request, but who knows? Are both nodes set up exactly the same?
Fresh install of CentOS or Scientific 6.3, with or without OpenVZ installed by hand or with SolusVM (done for kicks, when options start running out). Every time we tried something different yesterday we did a fresh install, just to make sure it was a clean slate.
Network and mostly hardware are exactly the same. At least same model, not the exact same gateway just identical.
Looks like you've whittled through all the things I would have. Good luck fixing it, and keep us updated, for SEO keepsakes.
Thanks brother. Definitely let you guys know what happens so this can be documented for anyone else. I've run into pieces that look similar everywhere, a couple people that may or may not have the same issue, never quite exactly this and a solid solution that doesn't involve a returned arp error.
I'll say this, Gordon has gone above and beyond. Given the circumstances I don't blame him for thinking it's my config first and foremost. Don't blame him one bit. But he's messing with it now.
@Jarland
Did you mention you were getting another node setup? I'd be interested to see if the new node has the same problem.
Have you performed a yum upgrade?
Compare the package versions on NODE 2 to those on NODE 1. It's a long shot, but it's a process of elimination.
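If you want to do that comparison without eyeballing two long lists, here is a small Python sketch. It assumes you have saved the output of `rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n'` from each node as text. The function names are hypothetical.

```python
def parse_rpm_list(text):
    """Parse lines of the form 'name version-release' into a dict.

    Note: if a package is installed in several versions (kernels, for
    example), only the last one listed is kept in this simple sketch.
    """
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            packages[parts[0]] = parts[1]
    return packages


def diff_packages(node1_text, node2_text):
    """Return {name: (node1_version, node2_version)} for every mismatch.

    A version of None means the package is missing on that node.
    """
    node1 = parse_rpm_list(node1_text)
    node2 = parse_rpm_list(node2_text)
    names = sorted(set(node1) | set(node2))
    return {
        name: (node1.get(name), node2.get(name))
        for name in names
        if node1.get(name) != node2.get(name)
    }
```

An empty result means both nodes report identical package versions, which at least rules that out.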
Yeah this is our second Dallas server. The first one works great and works exactly as I would expect. Now, it was configured with SolusVM when we set it up, but I tried that on the new one too. I keep the first node up to date because I haven't seen a risky package update since first install, so both are running the same packages. I had to install the previous openvz kernel on the new one to completely match the first, then started mirroring all relevant configs aside from hardware addresses and IPs. Didn't seem to have any effect.
At least I'm not the only one who thinks it's strange. I'd love to hear "Hey stupid, do this, can't believe you're a provider!" But I'll settle for knowing it makes others scratch their heads.
Is it an Intel-based Ethernet card? I've heard of some of their cards having issues. Sometimes upgrading to the latest version of the drivers helps. I think there was a thread about it on WHT a while back.
Interesting. It is intel. Worth looking into, thanks
Yeah. I'm really not sure what to suggest on this. I've never encountered this before, but I've had some flaky .32 servers that required a lot of tweaking to perfect. (Perhaps try this with the .18 kernel?)
what @qps said is also a very good point, but Intel NICs are often quite reliable, and I've never had a NIC fail on me, or any faults at all. touch wood
On a couple of Supermicro boards, the onboard NICs have driver problems (driver updates can fix this). But without a doubt Gordon would already know this.
While I doubt this is a related issue, E3 boards have horrible NIC issues with CentOS (or other RHEL-based distros). Updating to the latest BIOS and using kmod-e1000e from ELRepo fixes it on everything except OpenVZ kernels. The telltale sign is a NIC reset message on the console, and the only way to recover is a reboot. But again, that doesn't seem related at all to what @jarland is describing.
@jarland use 'ip route add' to add your other ranges; don't waste an IP in ifcfg eth0:0. Make sure iptables is turned off.
I would most likely rofl if this issue is iptables related.
service network restart and then ping and wait
Try manually installing the e1000 drivers like others have suggested. I use the following on my CentOS 6 servers:
Then add the following to your kernel line in grub.conf
Main reason I started doing it that way was that it was a good way to show what I believed the problem to be while removing OpenVZ from the equation. Opens up my options for troubleshooting.
Oh man I wish it was
Thanks! Gave it a shot and no luck, but certainly didn't hurt anything.
This is why I love LET.
Handing it back to Gordon for him to toy with it some more. I'm about to lose my mind. This thing is such a beast too. Dual E5, 8x 500GB drives.
I'm having this issue with just one IP in the middle of a range. It's so weird. I get my ips so cheap it's not worth fixing it.
That is quite strange. Maybe what we figure out helps you gain an IP
Sorry I couldn't be of help... those are the issues I usually run into with openvz.
Same issue here strangely.
Appreciate any attempts. It's really all any of us can do, throw random ideas at it. It seems I've hit a pretty legit issue.
Broadcast limit changed on gateway, problem appears solved. Watched Gordon repeat all of our steps on the KVM and was quite generous in dealing with the issue once it was caught.
Broadcast limit on gateway? So on the provider's switch there was a limit on your gateway? Or do you have your own layer 3 device?
All Gordon's equipment. Gateway, switch, words are getting lumped together in my head after all this
@jarland I am having the same problem with my server at Incero. Can you explain how you solved it?
@ftpit
Did you read the thread? The resolution is in it, 2 posts up.
This is the problem we faced too
Are you with Incero too?
It is entirely possible that there could be another problem, perhaps even software. However, if you have ruled this out, you can ask Gordon if he can see any reason why you would be having the same issue that Jarland & Ryan from Catalyst had. The situation should still be pretty fresh on his mind.