[OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

Stephan Budach stephan.budach at JVM.DE
Thu Feb 18 10:57:20 UTC 2016


On 18.02.16 at 09:29, Andrew Gabriel wrote:
> On 18/02/2016 06:13, Stephan Budach wrote:
>> Hi,
>>
>> I have been test driving RSF-1 for the last week to accomplish the 
>> following:
>>
>> - cluster a zpool that is made up of 8 mirrored vdevs, backed by 
>> 8 x 2 SSDs served via iSCSI from another OmniOS box
>> - export an NFS share from the above zpool via a VIP
>> - have RSF-1 provide the failover and the VIP move
>> - use the NFS share as a repository for my Oracle VM guests and vdisks
>>
>> The setup seems to work fine, but I do have one issue that I can't 
>> seem to get solved. Whenever I fail over the zpool, any in-flight NFS 
>> data will be stalled for some unpredictable time. Sometimes it takes 
>> not much longer than the "move" time of the resources, but sometimes 
>> it takes up to 5 minutes until the NFS client on my VM server becomes 
>> alive again.
>>
>> So, when I issue a simple ls -l on the folder of the vdisks while 
>> the switchover is happening, the command sometimes completes in 18 to 
>> 20 seconds, but sometimes ls will just sit there for minutes.
>>
>> I wonder if there's anything I could do about that. I have already 
>> played with several timeouts, NFS-wise and TCP-wise, but nothing seems 
>> to have any effect on this issue. Does anyone know some tricks to 
>> speed up the in-flight data?
>
> I would capture a snoop trace on both sides of the cluster, and see 
> what's happening. In this case, I would run snoop in non-promiscuous 
> mode at least initially, to avoid picking up any frames which the IP 
> stack is going to discard.
Yes, I will do that and see what traffic is happening, but I have a gut 
feeling that this happens while the VIP has been taken down: the next 
transmission to that IP, which no longer exists but is still present in 
the client's ARP cache, stalls somewhere.
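
For the trace, I'd probably run something along these lines on both 
heads during a failover (just a sketch; e1000g0 and 192.0.2.10 are 
placeholders for my actual interface and the VIP):

  # capture non-promiscuously, limited to traffic to/from the VIP
  snoop -P -d e1000g0 -o /var/tmp/failover.cap host 192.0.2.10

  # afterwards, walk through the capture with verbose summaries
  snoop -i /var/tmp/failover.cap -V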
>
> Can you look at the ARP cache on client during the stall?
I will, but the ARP cache must be getting updated quite soon, as pings 
start working again after 15 to 20 seconds, and they need a proper ARP 
entry as well.
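
During the next failover I'll watch the client's ARP entry for the VIP 
while the stall is going on, something like this on the VM server 
(192.0.2.10 again standing in for the VIP):

  # print a timestamped view of the VIP's ARP entry once a second
  while true; do date; arp -an | grep 192.0.2.10; sleep 1; done

That should show whether the entry goes stale or keeps pointing at the 
old head's MAC.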
>
> BTW, if you have 2 clustered heads both relying on another single 
> system providing the iSCSI, that's a strange setup which may be giving 
> you less availability (and less performance) than serving NFS directly 
> from the SSD system without clustering.
>
This is actually not the way it's going to be implemented. The missing 
storage node is already on its way. Once the new head is in place, one 
of the iSCSI targets will be moved over to the new host, so that all 
components are redundant.
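
The idea is that each mirror will then have one half on each storage 
box, roughly like this (a sketch only; the pool name tank and the 
cXtYdZ names are placeholders for the real iSCSI LUN device names, 
with A/B denoting the two boxes):

  zpool create tank \
      mirror c0tA1d0 c0tB1d0 \
      mirror c0tA2d0 c0tB2d0 \
      mirror c0tA3d0 c0tB3d0 \
      mirror c0tA4d0 c0tB4d0 \
      mirror c0tA5d0 c0tB5d0 \
      mirror c0tA6d0 c0tB6d0 \
      mirror c0tA7d0 c0tB7d0 \
      mirror c0tA8d0 c0tB8d0

That way, losing either storage box only degrades the mirrors instead 
of taking the pool down.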

Thanks,
Stephan
