[OmniOS-discuss] ixgbe: breaking aggr on 10GbE X540-T2

Stephan Budach stephan.budach at JVM.DE
Wed May 11 12:50:58 UTC 2016


On 11.05.16 at 13:36, Stephan Budach wrote:
> On 09.05.16 at 20:43, Dale Ghent wrote:
>>> On May 9, 2016, at 2:04 PM, Stephan Budach <stephan.budach at JVM.DE> 
>>> wrote:
>>>
>>> On 09.05.16 at 16:33, Dale Ghent wrote:
>>>>> On May 9, 2016, at 8:24 AM, Stephan Budach <stephan.budach at JVM.DE> 
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have a strange behaviour where OmniOS omnios-r151018-ae3141d 
>>>>> will break the LACP aggr-link on different boxes, when Intel 
>>>>> X540-T2s are involved. It first starts with a couple of link 
>>>>> downs/ups on one port and finally the link on that port 
>>>>> negotiates to 1GbE instead of 10GbE, which then breaks the LACP 
>>>>> channel on my Cisco Nexus for this connection.
>>>>>
>>>>> I have tried swapping and interchanging cables and thus 
>>>>> switchports, but to no avail.
>>>>>
>>>>> Anyone else noticed this and even better… knows a solution to this?
>>>> Was this an issue noticed only with r151018 and not with previous 
>>>> versions, or have you only tried this with 018?
>>>>
>>>> By your description, I presume that the two ixgbe physical links 
>>>> will stay at 10Gb and not bounce down to 1Gb if not LACP'd together?
>>>>
>>>> /dale
>>> I have noticed that on prior versions of OmniOS as well, but we only 
>>> recently started deploying 10GbE LACP bonds, when we introduced our 
>>> Nexus gear to our network. I will have to check if both links stay 
>>> at 10GbE, when not being configured as a LACP bond. Let me check 
>>> that tomorrow and report back. As we're heading for a stretched DC, 
>>> we are mainly configuring 2-way LACP bonds over our Nexus gear, so 
>>> we don't actually have any single 10GbE connection, as they will all 
>>> have to be connected to both DCs. This is achieved by using VPCs on 
>>> our Nexus switches.
>> Provide as much detail as you can - if you're using hw flow control, 
>> whether both links act this way at the same time or independently, 
>> and so-on. Problems like this often boil down to a very small and 
>> seemingly insignificant detail.
>>
>> I currently have ixgbe on the operating table for adding X550 
>> support, so I can take a look at this; however I don't have your type 
>> of switches available to me so LACP-specific testing is something I 
>> can't do for you.
>>
>> /dale
> I checked the ixgbe.conf files on each host and they all are still at 
> the standard setting, which includes flow_control = 3;
> So they all have flow control enabled. As for the Nexus config, all of 
> those ports are still on standard ethernet ports and modifications 
> have only been made globally to the switch.
> I will now have to yank the one port on one of the hosts from the aggr 
> and configure it as a standalone port. Then we will see, if it still 
> receives the disconnects/reconnects and finally the negotiation to 
> 1GbE instead of 10GbE. This only seems to happen to the same port; I 
> have never seen other ports of the affected aggrs acting up. I also 
> seem to have noticed that these were always the "same" physical ports, 
> that is, the first port on the card (ixgbe0), but that might of course 
> be a coincidence.
>
> Thanks,
> Stephan

Ok, so we can likely rule out LACP as a generic reason for this issue… 
After removing ixgbe0 from the aggr1, I plugged it into an unused port 
of my Nexus FEX and lo and behold, here we go:

root@tr1206902:/root# tail -f /var/adm/messages
May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 1000 Mbps, full duplex
May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down
May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex

May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down
May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex

So, after less than an hour, we had the first link-cycle on ixgbe0, 
albeit on another switch port, which has no LACP config whatsoever. I 
will monitor this for a while and see if we get more of those.
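
For the monitoring, something along these lines could count the link 
cycles and the negotiations below 10GbE instead of eyeballing tail -f. 
This is only a rough sketch: it assumes the mac NOTICE lines keep the 
format shown above, and the names LINK_RE, parse_link_events, summarize 
and the year argument are made up for illustration (the syslog 
timestamps carry no year).

```python
import re
from datetime import datetime

# Assumed line format, taken from the /var/adm/messages excerpt above:
# "May 11 14:38:48 host mac: [ID ...] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex"
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+mac:.*NOTICE:\s+"
    r"(?P<nic>\S+)\s+link\s+(?P<state>up|down)(?:,\s+(?P<mbps>\d+)\s+Mbps)?"
)


def parse_link_events(lines, year=2016):
    """Return (timestamp, nic, state, mbps) tuples for every link NOTICE line."""
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if not m:
            continue  # not a link up/down message
        # syslog timestamps have no year, so the caller has to supply one
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        mbps = int(m["mbps"]) if m["mbps"] else None
        events.append((ts, m["nic"], m["state"], mbps))
    return events


def summarize(events, nic, expected_mbps=10000):
    """Count link-down events and link-ups below the expected speed for one NIC."""
    downs = 0
    slow_ups = 0
    for _ts, name, state, mbps in events:
        if name != nic:
            continue
        if state == "down":
            downs += 1
        elif mbps is not None and mbps < expected_mbps:
            slow_ups += 1
    return downs, slow_ups


# The five messages from the excerpt above, one per line.
SAMPLE = [
    "May 11 14:37:17 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 1000 Mbps, full duplex",
    "May 11 14:38:35 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down",
    "May 11 14:38:48 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex",
    "May 11 15:24:55 tr1206902 mac: [ID 486395 kern.info] NOTICE: ixgbe0 link down",
    "May 11 15:25:10 tr1206902 mac: [ID 435574 kern.info] NOTICE: ixgbe0 link up, 10000 Mbps, full duplex",
]

if __name__ == "__main__":
    downs, slow_ups = summarize(parse_link_events(SAMPLE), "ixgbe0")
    print(f"ixgbe0: {downs} link-down events, {slow_ups} link-ups below 10000 Mbps")
```

To run it against the real log, the lines of /var/adm/messages could be 
passed to parse_link_events in place of SAMPLE.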

Thanks,
Stephan

