<div dir="ltr">We set up a machine the other day and hit a very weird problem when trying to aggregate a couple of NICs. The machine has four 10GigE ports (two dual-port cards), but we only connected two of them to a Force10 switch. We enabled LACP and the port channel was created on the switch, but we then spent the next 4 hours trying to get ARP requests to work. <div> <br></div><div>It turns out that, with our config, broadcast frames can be sent over interfaces that are down. We have not seen any unicast traffic do this, and we're not doing any multicast. I'm not 100% sure whether an L4 policy looks at the IP requested in the ARP frame, but we've tested with ~30 different addresses and they all went out over the same interface, even with L4. It also happens if an interface goes down after the aggr interface was created: we shut down the two operational interfaces one by one, and ARP requests still tried to go out over the one that was down.</div> <div><br></div><div>Of course, our setup is not just that: on top of the aggr1 interface we have two VLANs (and the problem manifests itself on both). I cannot test right now, as the machine is in production, but we'll be getting an identical setup on Monday to play with for a bit. 
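<div><br></div><div>For reference, this is roughly what I'd expect the transmit side to do, written as a small Python model rather than the real code. Everything in it (the Port and Frame types, frame_hash, pick_port and the CRC-32 hash) is hypothetical and is not how aggr_send.c or the MAC layer actually compute the hash; it only assumes that each frame is hashed according to the L2/L3/L4 policy and that the result selects among the ports that are currently attached. Since an ARP request is not an IPv4 packet, the L3/L4 parts of the policy have nothing to work with in this model, so every request hashes the same no matter which IP is being asked for (which would fit what we saw with ~30 addresses), but a port that is down should never be a candidate.</div><div><br></div>

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Set

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Port:
    name: str
    attached: bool  # hypothetical flag: True when the port shows "attached"


@dataclass
class Frame:
    dst_mac: str
    ethertype: int
    dst_ip: str = ""   # for ARP, models the target IP inside the ARP payload
    dst_port: int = 0  # TCP/UDP destination port, IPv4 frames only


def frame_hash(frame: Frame, policy: Set[str]) -> int:
    """Hypothetical policy hash; NOT the illumos implementation."""
    key = ""
    if "L2" in policy:
        key += frame.dst_mac
    # Only IPv4 frames expose L3/L4 fields to the hash in this model;
    # the target IP inside an ARP payload is ignored.
    if frame.ethertype == ETHERTYPE_IPV4:
        if "L3" in policy:
            key += "|" + frame.dst_ip
        if "L4" in policy:
            key += "|" + str(frame.dst_port)
    return zlib.crc32(key.encode())


def pick_port(ports: List[Port], frame: Frame, policy: Set[str]) -> Optional[Port]:
    """Choose an outgoing port among the attached ports only."""
    candidates = [p for p in ports if p.attached]
    if not candidates:
        return None  # no usable port: drop instead of sending on a down link
    return candidates[frame_hash(frame, policy) % len(candidates)]


if __name__ == "__main__":
    # Port states as reported by "dladm show-aggr -x" in this message.
    ports = [Port("ixgbe0", False), Port("ixgbe1", True),
             Port("ixgbe2", False), Port("ixgbe3", True)]
    for i in range(1, 31):
        arp = Frame(BROADCAST, ETHERTYPE_ARP, dst_ip=f"10.0.64.{i}")
        print(arp.dst_ip, pick_port(ports, arp, {"L4"}).name)
```

<div>Run as a script, the model prints the same attached port (ixgbe1 or ixgbe3) for all 30 ARP requests and never picks ixgbe0 or ixgbe2, whereas on the real box the requests were showing up on ixgbe0.</div><div><br></div>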
Anyway, here is some of the stuff I looked at (I was discussing this with esproul in the IRC channel):</div> <div><br></div><div style>Output of dladm show-aggr -si while trying to get the MAC of an IP, with all four interfaces in the aggregation (I removed the irrelevant ones so the headers match :) ):</div><div style><br></div><div style><div><font face="courier new, monospace">LINK PORT IPACKETS RBYTES OPACKETS OBYTES IPKTDIST OPKTDIST</font></div> <div><font face="courier new, monospace">aggr1 -- 6521 2804 2 294 -- --</font></div><div><font face="courier new, monospace">-- ixgbe0 0 0 0 46 0.0 0.0 </font></div> <div><font face="courier new, monospace">-- ixgbe1 5616 252 1 124 86.1 50.0 </font></div><div><font face="courier new, monospace">-- ixgbe2 0 0 0 0 0.0 0.0 </font></div> <div><font face="courier new, monospace">-- ixgbe3 905 2552 1 124 13.9 50.0 </font></div><div><br></div></div><div><div style>Output of dladm show-aggr -x:</div><div style><br></div><div style><div> <font face="courier new, monospace">LINK PORT SPEED DUPLEX STATE ADDRESS PORTSTATE</font></div><div><font face="courier new, monospace">aggr1 -- 10000Mb full up 90:e2:ba:3f:d2:38 --</font></div> <div><font face="courier new, monospace"> ixgbe0 0Mb unknown down 90:e2:ba:3f:d2:38 standby</font></div><div><font face="courier new, monospace"> ixgbe1 10000Mb full up 90:e2:ba:3f:d2:39 attached</font></div> <div><font face="courier new, monospace"> ixgbe2 0Mb unknown down 90:e2:ba:3f:d0:50 standby</font></div><div><font face="courier new, monospace"> ixgbe3 10000Mb full up 90:e2:ba:3f:d0:51 attached</font></div> <div><br></div><div><br></div><div style>A tcpdump running on ixgbe0 at the same time:</div><div style><br></div><div style><div><font face="courier new, monospace"># /opt/omni/sbin/tcpdump -ni ixgbe0 ether host 90:e2:ba:3f:d2:38</font></div> <div><font face="courier new, monospace">tcpdump: WARNING: SIOCGIFADDR: ixgbe0: No such device or address</font></div><div><font face="courier new, monospace">tcpdump: verbose output suppressed, use -v or -vv for 
full protocol decode</font></div> <div><font face="courier new, monospace">listening on ixgbe0, link-type EN10MB (Ethernet), capture size 65535 bytes</font></div><div><font face="courier new, monospace">14:42:30.025630 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28</font></div> <div><font face="courier new, monospace">14:42:30.520063 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28</font></div><div><font face="courier new, monospace">14:42:31.020049 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28</font></div> <div><font face="courier new, monospace">14:42:32.020122 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28</font></div><div><font face="courier new, monospace">14:42:32.520041 ARP, Request who-has 10.0.64.105 (ff:ff:ff:ff:ff:ff) tell 10.0.64.131, length 28</font></div> <div><br></div><div><br></div><div style>Since the machine will mostly be accessed from other hosts rather than the other way around, this won't really be an issue for now: if an external machine sends an ARP request, the reply is unicast and goes out over the correct interface (i.e. one that is up), and we can just add static ARP entries (i.e. for the log host and default gateway). Going forward, though, this might be quite a problem, so I wanted to know whether:</div> <div style><br></div><div style>a) I did something stupid (quite likely: I have absolutely no experience with OmniOS or Solaris beyond the two weeks I've spent playing around with it)</div><div style>b) it's a bug (I took a look at aggr_send.c and couldn't see anything obviously wrong, and I cannot see why broadcast packets would be treated differently)</div> <div style>c) anyone has seen this behaviour before</div></div></div><div><br></div>-- <br>George-Cristian Bîrzan </div></div>