LXC host network freezing
Monday, September 19th, 2011 by Gary Richards - Categories: Linux, Operating Systems, Virtualisation
I’ve been performing some testing for a client who wanted to set up a few Linux Containers on some of their systems.
Everything was working fine fairly quickly, but sometimes starting or stopping a container would cause the host’s networking to freeze (or hang, whatever you’d like to call it) for a short period (generally 10-30 seconds).
Investigating further: we’re using bridged networking, just as I’ve seen with a whole bunch of other virtualisation setups before:
root@host:~# ifconfig br0
br0 Link encap:Ethernet HWaddr b4:99:ba:XX:XX:XX
inet addr:192.168.121.61 Bcast:192.168.121.255 Mask:255.255.255.0
inet6 addr: fe80::b699:baff:feXX:XXXX/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:934628 errors:0 dropped:0 overruns:0 frame:0
TX packets:777998 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:257326709 (257.3 MB) TX bytes:4874452976 (4.8 GB)
root@host:~# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.b499baXXXXXX no bond0
Which all looks fine. Having the bonded interface added to the bridge should, in theory, be OK. At least, I’m pretty sure I’ve done it before…
Start up a container and look again:
root@host:~# ifconfig
br0 Link encap:Ethernet HWaddr b2:c8:45:94:e4:0a
inet addr:192.168.121.61 Bcast:192.168.121.255 Mask:255.255.255.0
inet6 addr: fe80::b699:baff:feXX:XXXX/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:910208 errors:0 dropped:0 overruns:0 frame:0
TX packets:757874 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:254991889 (254.9 MB) TX bytes:4769801361 (4.7 GB)
vethYldGtj Link encap:Ethernet HWaddr b2:c8:45:94:e4:0a
inet6 addr: fe80::b0c8:45ff:fe94:e40a/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:7 errors:0 dropped:0 overruns:0 frame:0
TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:648 (648.0 B) TX bytes:3048 (3.0 KB)
root@host:~# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.b2c84594e40a no bond0
vethYldGtj
Everything looked OK (I hadn’t at this point spotted the MAC address change!), yet starting/stopping this container had caused the host networking to freeze. Now, I know for a fact that this doesn’t happen every single time a container is started, so I tried again:
root@host:~# ifconfig
br0 Link encap:Ethernet HWaddr b4:99:ba:XX:XX:XX
inet addr:192.168.121.61 Bcast:192.168.121.255 Mask:255.255.255.0
inet6 addr: fe80::b699:baff:feXX:XXXX/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:910208 errors:0 dropped:0 overruns:0 frame:0
TX packets:757874 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:254991889 (254.9 MB) TX bytes:4769801361 (4.7 GB)
vethLxwrPl Link encap:Ethernet HWaddr c6:bf:b2:18:db:4e
inet6 addr: fe80::24bf:b2ff:fe18:db4e/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:46 errors:0 dropped:0 overruns:0 frame:0
TX packets:871 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:4013 (4.0 KB) TX bytes:189734 (189.7 KB)
root@host:~# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.b499baXXXXXX no bond0
vethLxwrPl
Wait.. why did nothing go wrong this time?!
After much pondering (and various tcpdumping, LXC configuration changes, etc.) I was still no closer to a solution. It seemed that sometimes my containers would cause this to happen and other times they wouldn’t. At no point was the containers’ networking affected (even though the host machine’s was).
I even looked at the networking inside one of the containers:
root@lxc1:~# ifconfig
eth0 Link encap:Ethernet HWaddr 54:52:00:3d:40:87
inet addr:192.168.121.62 Bcast:192.168.121.255 Mask:255.255.255.0
inet6 addr: fe80::5652:ff:fe3d:4087/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:859641 errors:0 dropped:0 overruns:0 frame:0
TX packets:125441 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2129954 (203 MiB) TX bytes:2323974545 (2.0 GiB)
Nothing obviously out of the ordinary. The MAC address is the same as we’ve told LXC to configure (it’s correct when it breaks too).
So what’s the problem? Stopping lxc from adding/removing the veth device from the bridge allowed me to start/stop my container without problems. So the point at which the veth device was added to the bridge was definitely the problem. But why?
We set up a loop that continually started/stopped containers, to see whether we could find anything useful by watching various things while containers came and went. One of the things I had open was watch on ifconfig. It didn’t take long before I realised that sometimes the bridge’s MAC address was changing. Sometimes a container would start, the MAC address of the bridge would stay the same, and everything was fine. Other times, the container would start, the MAC address of the bridge would change, and then the host’s networking would break.
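A loop along these lines would do the job. This is a minimal sketch rather than the script we actually ran: the function names, the 5 second pauses and the SYSFS_ROOT override are all made up for illustration, and it assumes lxc-start and lxc-stop are on the PATH.

```shell
# Print the current MAC of an interface as reported by sysfs.
# SYSFS_ROOT is a hypothetical override so this can be tried without a real bridge.
bridge_mac() {
    cat "${SYSFS_ROOT:-/sys}/class/net/$1/address"
}

# Print a message if the MAC differs from the previous reading; stay quiet otherwise.
report_change() {
    old=$1
    new=$2
    if [ "$old" != "$new" ]; then
        echo "bridge MAC changed: $old -> $new"
    fi
    return 0
}

# Start and stop a container COUNT times, checking the bridge MAC after each step.
cycle_container() {
    bridge=$1
    container=$2
    count=$3
    last=$(bridge_mac "$bridge")
    i=0
    while [ "$i" -lt "$count" ]; do
        lxc-start -n "$container" -d      # -d: run the container in the background
        sleep 5
        now=$(bridge_mac "$bridge")
        report_change "$last" "$now"
        last=$now
        lxc-stop -n "$container"
        sleep 5
        now=$(bridge_mac "$bridge")
        report_change "$last" "$now"
        last=$now
        i=$((i + 1))
    done
}
```

Something like cycle_container br0 lxc1 20, run as root, should print a line each time the bridge picks up or drops a different MAC.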
OK, now we’re on to something… what’s causing the MAC address to change (sometimes), and why is it changing to totally random values that seem so unrelated to any MAC address that we’ve configured anywhere?
After this it was fairly simple. The MAC address given to our bridge was b4:99:ba:XX:XX:XX (the MAC of bond0, which happens to be the MAC of eth0 too, although that’s less important). Whenever the MAC of the bridge changed (and we saw problems with the host networking) it always changed to a MAC address ‘lower’ in value than the original MAC of the bridge.
So why were random MAC addresses lower than our bridge’s MAC address even being created? We’ve configured our containers to have MAC addresses in the range that KVM seems to use, which start 52:54:…, so that wasn’t it. I then went back to my original comment about not noticing the MAC address of the bridge changing, and realised that the bridge’s new MAC was the same as that of the veth device that was created for my container. OK, one step closer, so why is this? LXC’s source reveals (so a colleague tells me!) that the veth devices are created in the simplest way possible, with no extra configuration, and therefore the MAC address of the new veth device is totally random.
Adding this veth device with its random MAC to our bridge causes the bridge to change its MAC to that of the veth device whenever the veth device’s MAC is lower than the bridge’s current MAC. I’m unsure why this happens, but it seems that a bridge assumes the MAC address of whichever attached interface has the lowest-valued MAC. I have no idea whether this is a standard or a choice a kernel developer has made. But it’s causing me problems!
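To make the rule concrete, here is a small sketch of it as I observed it, not the kernel’s actual code. lowest_mac is a name invented for this post; it relies on the fact that fixed-width, lower-case, colon-separated MACs sort in numeric order when compared as plain strings.

```shell
# Print the lowest of the MAC addresses given as arguments.
# Hex letters are lower-cased first so that "B4" and "b4" compare the same;
# LC_ALL=C forces a plain byte-wise sort.
lowest_mac() {
    printf '%s\n' "$@" | tr 'A-F' 'a-f' | LC_ALL=C sort | head -n 1
}
```

Feeding it the two cases from the output above gives the results we saw: lowest_mac b4:99:ba:XX:XX:XX b2:c8:45:94:e4:0a picks the veth address (the bridge changes and the host freezes), while lowest_mac b4:99:ba:XX:XX:XX c6:bf:b2:18:db:4e picks bond0’s address (nothing changes). The masked XX digits don’t matter here because the first octet already decides both comparisons.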
Why haven’t any of the KVM or Xen boxes I’ve built/used in the past suffered from this problem? It seems that KVM (at least KVM VMs created with libvirt) has its host-side device (a tap device rather than a veth pair) created with a specified MAC address, rather than a random one as with LXC. The MAC used is the one the virtual machine is configured with, but with the first octet replaced with fe (i.e. 01:23:45:67:89:ab would become fe:23:45:67:89:ab). Presumably they do this because they know most users are probably using bridged networking, and when these devices are added to the bridge they’re going to have a higher MAC than any real device and therefore probably aren’t going to break your host’s networking!
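The rewrite itself is simple string surgery. A sketch, where host_side_mac is a name made up for illustration and not anything taken from libvirt:

```shell
# Replace the first octet of a guest MAC with "fe", keeping the other five octets.
# ${1#*:} strips everything up to and including the first colon.
host_side_mac() {
    printf 'fe:%s\n' "${1#*:}"
}
```

With the example above, host_side_mac 01:23:45:67:89:ab prints fe:23:45:67:89:ab, which sorts above any address starting with a lower first octet, b4 included.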
What can we do to solve this problem? Hopefully the people behind LXC will adopt a similar approach in the future. Until then, the only solution I can find is to tell the bridge what its MAC address is. It then doesn’t seem to get changed when devices with lower MAC addresses are added to the bridge.
It seems that on Debian style systems you can do this with this option in /etc/network/interfaces:
bridge_hw <MAC addr>
You could do it manually with ip:
ip link set br0 address <MAC addr>
I’d imagine that you can do it with ifconfig too. I’m sure Red Hat-style systems have a way as well.
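If the bridge should simply keep the MAC of one of its physical ports, the manual step could be wrapped up like this. It’s a sketch under assumptions: pin_bridge_mac, DRY_RUN and SYSFS_ROOT are names I’ve invented here, it needs root and the ip tool from iproute2, and I’ve only reasoned it through for a setup like ours (br0 on top of bond0).

```shell
# Pin a bridge's MAC to the current MAC of one of its ports.
# DRY_RUN (print the command instead of running it) and SYSFS_ROOT
# (alternative sysfs location) are hypothetical knobs for this sketch only.
pin_bridge_mac() {
    bridge=$1
    port=$2
    # Read the port's MAC from sysfs; give up if the port does not exist.
    mac=$(cat "${SYSFS_ROOT:-/sys}/class/net/$port/address") || return 1
    if [ -n "${DRY_RUN:-}" ]; then
        echo "ip link set dev $bridge address $mac"
    else
        ip link set dev "$bridge" address "$mac"
    fi
}
```

Running DRY_RUN=1 pin_bridge_mac br0 bond0 first shows the command without touching anything. A MAC set this way lasts only until the bridge is recreated or the host reboots, so the /etc/network/interfaces option is still the place for a permanent fix.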
I filed a bug on the LXC bug tracker at SourceForge.