[OmniOS-discuss] Testing RSF-1 with zpool/nfs HA

Richard Elling richard.elling at richardelling.com
Thu Feb 18 21:56:31 UTC 2016


comments below...

> On Feb 18, 2016, at 12:57 PM, Schweiss, Chip <chip at innovates.com> wrote:
> 
> 
> 
> On Thu, Feb 18, 2016 at 5:14 AM, Michael Rasmussen <mir at miras.org <mailto:mir at miras.org>> wrote:
> On Thu, 18 Feb 2016 07:13:36 +0100
> Stephan Budach <stephan.budach at JVM.DE <mailto:stephan.budach at JVM.DE>> wrote:
> 
> >
> > So, when I issue a simple ls -l on the folder of the vdisks while the switchover is happening, the command sometimes concludes in 18 to 20 seconds, but sometimes ls will just sit there for minutes.
> >
> This is a known limitation in NFS. NFS was never intended to be
> clustered, so what you experience is that the NFS client process
> keeps kernel locks against the now-unavailable NFS server, and any
> request to that process hangs waiting for those locks to be resolved.
> This is comparable to hot-swapping a drive in a pool without
> notifying the pool.
> 
> The only way to resolve this is to forcefully kill all NFS client
> processes and then restart the NFS client.

ugh. No, something else is wrong. I've been running such clusters for almost 20 years,
it isn't a problem with the NFS server code.

> 
> 
> I've been running RSF-1 on OmniOS since about r151008.  All my clients have always been NFSv3 and NFSv4.   
> 
> My memory is a bit fuzzy, but when I first started testing RSF-1, OmniOS still had the Sun lock manager, which was later replaced with the BSD lock manager.   That replacement has had many difficulties.
> 
> I do remember that failovers never had these stalls when I first started with RSF-1; I believe this was because the lock state was stored in the pool, so the server taking over the pool would inherit that state too.   That state is now lost when a pool is imported with the BSD lock manager.
> 
> When I did testing, I would read from and write to the pool at full speed while forcing failovers, both from the command line and by killing power on the active server.    Never did a failover take more than about 30 seconds for NFS to fully resume data flow.
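
For anyone wanting to reproduce that kind of measurement, here is a minimal sketch of timing NFS stalls from the client side during a forced failover. The mount point /mnt/tank is a hypothetical example, and a hard NFS mount is assumed so writes block rather than fail:

    #!/usr/bin/env python
    # Sketch: continuously write to an NFS-mounted file and report any
    # write that stalls longer than a threshold, e.g. during a failover.
    import os
    import time

    MOUNT = "/mnt/tank"     # hypothetical NFS mount point
    THRESHOLD = 5.0         # seconds; report stalls longer than this
    path = os.path.join(MOUNT, "failover-probe.dat")

    with open(path, "wb", buffering=0) as f:
        while True:
            start = time.time()
            f.write(b"x" * 4096)   # blocks while the server is failing over
            os.fsync(f.fileno())   # push it over the wire, not just into cache
            elapsed = time.time() - start
            if elapsed > THRESHOLD:
                print("write stalled %.1f s at %s" % (elapsed, time.ctime(start)))
            time.sleep(0.1)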

Clients will back off, but the back-off algorithm is not universal, so we do expect to
see different retry intervals from different clients. For example, the retries can
exceed 30 seconds for Solaris clients after a minute or two (alas, I don't have the
detailed data at my fingertips anymore :-( ). Hence we work hard to make sure failovers
occur as fast as feasible.
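
To put rough numbers on that, here is a sketch of the doubling back-off schedule NFS clients commonly use; the initial timeout and cap are illustrative values, not any particular client's. Note the interval between retries passes 30 seconds within the first minute or so:

    # Sketch of an exponential back-off retry schedule of the kind NFS
    # clients commonly use: each retry timeout doubles up to a cap.
    # The initial timeout and cap are illustrative, not taken from any
    # specific client implementation.
    def backoff_schedule(initial=1.1, cap=60.0, total=180.0):
        t, timeout = 0.0, initial
        while t < total:
            yield t, timeout                  # retry fires at time t, then waits
            t += timeout
            timeout = min(timeout * 2.0, cap) # double, capped

    for t, timeout in backoff_schedule():
        print("retry at %6.1f s, next wait %5.1f s" % (t, timeout))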

> 
> Others who know more about the BSD lock manager vs the old Sun lock manager may be able to tell us more.  I'd also be curious if Nexenta has addressed this.

The lock manager itself is an issue, and though we're currently testing the BSD lock
manager in anger, we haven't seen this behaviour.

Related to the lock manager is name lookup. If you use network name services, you add
a latency dependency on name lookups to every failover, which is why, as a best
practice, we often disable DNS and other network name services on high-availability
servers.
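
For example, on illumos-style systems one way to take DNS out of the failover path is to resolve the cluster's own names from local files only. This is a sketch; the names and addresses are placeholders:

    # /etc/nsswitch.conf -- look up hosts in local files only, so a
    # failover never waits on a DNS server:
    hosts: files

    # /etc/hosts -- static entries for the cluster's own addresses:
    192.168.10.11   nfs-node-a
    192.168.10.12   nfs-node-b
    192.168.10.20   nfs-vip
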
 -- richard


