[OmniOS-discuss] r151012 nlockmgr fails to start

Richard Elling richard.elling at richardelling.com
Fri Oct 10 14:26:46 UTC 2014


> On Oct 10, 2014, at 6:15 AM, "Schweiss, Chip" <chip at innovates.com> wrote:
> 
> 
>> On Thu, Oct 9, 2014 at 9:54 PM, Dan McDonald <danmcd at omniti.com> wrote:
>> 
>> On Oct 9, 2014, at 10:23 PM, Schweiss, Chip <chip at innovates.com> wrote:
>> 
>> > Just tried my 2nd system. On r151010, nlockmgr starts after clearing maintenance mode. On r151012, it will not start at all. nfs/status was enabled and online.
>> >
>> > The commonality I see on the two systems I have tried is that they are both part of an HA cluster. So they don't import the pool at boot; RSF-1 imports it with the cache file mapped to a different location.
>> 
>> That could be something HA Inc. needs to further test.  We don't directly support RSF-1, after all.
> 
> There really isn't anything different from an auto-imported pool. I suspect that using an alternate cache location may be triggering something else to go wrong in nlockmgr.

No, these are totally separate subsystems. RSF-1 imports the pool. NFS sharing is started in userland by the zpool command, which shares each dataset after it is mounted. You can do the same procedure manually... no magic pixie dust needed.
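
As a rough sketch of that manual procedure (not RSF-1's actual script), the import can be split into separate steps so each one can be checked on its own. The pool name and cache paths are taken from the import command quoted further down; the use of -N to defer mounting, the function name manual_import_cmds, and the RUN switch are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch only: pool name and cache paths come from the RSF-1 command in this
# thread; -N (import without mounting) is an assumption used to separate steps.
POOL=nrgpool
CACHE=/opt/HAC/RSF-1/etc/volume-cache/nrgpool.cache

# Print the manual steps, one command per line.
manual_import_cmds() {
    echo "zpool import -N -c $CACHE -o cachefile=$CACHE-live -o failmode=panic $POOL"
    echo "zfs mount -a"                   # mount the datasets
    echo "zfs share -a"                   # share them; this is what pulls in NFS
    echo "svcs -l network/nfs/nlockmgr"   # check the lock manager's state
}

# Dry run by default; set RUN=1 to execute on the actual host.
if [ "${RUN:-0}" = "1" ]; then
    manual_import_cmds | sh -e
else
    manual_import_cmds
fi
```

Running it without RUN=1 only prints the commands, which makes it easy to paste them one at a time and watch where nlockmgr stalls.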

> 
> Here's the command RSF-1 uses to import the pool:
> zpool import -c /opt/HAC/RSF-1/etc/volume-cache/nrgpool.cache -o cachefile=/opt/HAC/RSF-1/etc/volume-cache/nrgpool.cache-live -o failmode=panic nrgpool
> 
> After the pool import, it puts the IP addresses back and is done. That happens in less than 1 second.
> 
> In the meantime, the NFS services auto-start and nlockmgr starts spinning.

Perhaps share doesn't properly start all of the services? Does it work ok if you manually "svcadm enable" all of the NFS services?
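
A minimal sketch of that test, assuming the usual illumos NFS server FMRIs (check `svcs -a | grep nfs` for the real list on your system). The function name nfs_enable_cmds and the RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the service list is an assumption about which NFS-related
# services matter here; adjust it to match `svcs -a | grep nfs`.
NFS_SERVICES="network/rpc/bind network/nfs/status network/nfs/nlockmgr network/nfs/server"

# Print one svcadm command per service, in the order listed above.
nfs_enable_cmds() {
    for svc in $NFS_SERVICES; do
        echo "svcadm enable $svc"
    done
}

# Dry run by default; set RUN=1 to execute on the actual host.
if [ "${RUN:-0}" = "1" ]; then
    nfs_enable_cmds | sh -e
    svcs -x   # explain anything still offline or in maintenance
else
    nfs_enable_cmds
fi
```

If nlockmgr comes online this way but not after a failover, that would point at the order in which share starts things rather than at the lock manager itself.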


  -- richard

> 
>  
>> > nlockmgr is becoming a real show stopper.
>> 
>> svcadm disable nlockmgr nfs/status
>> svcadm enable nfs/status
>> svcadm enable nlockmgr
>> 
>> You may wish to discuss this on the illumos list as well. I'm not sure who else is seeing this, apart from me one time and you, seemingly, many times.
> 
> I did that this time, no joy.   Today I'm working on a virtual setup with HA to see if I can get this reproduced on r151012.   
> 
> I thought this nlockmgr problem was related to lots of NFS exports until I ran into it on my SSD pool. It used to be able to fail over in about 3-5 seconds. Now nlockmgr sits in a spinning state for a few minutes and fails every time. Clearing maintenance mode brings it back nearly instantly. This is on r151010. On r151012 it fails every time.
> 
> Hopefully I can reproduce it, and then I'll start a new thread copying illumos too.
> 
> -Chip