[OmniOS-discuss] r151012 nlockmgr fails to start

Schweiss, Chip chip at innovates.com
Fri Oct 10 13:15:24 UTC 2014


On Thu, Oct 9, 2014 at 9:54 PM, Dan McDonald <danmcd at omniti.com> wrote:

>
> On Oct 9, 2014, at 10:23 PM, Schweiss, Chip <chip at innovates.com> wrote:
>
> > Just tried my 2nd system.   r151010 nlockmgr starts after clearing
> maintenance mode.   r151012 it will not start at all.  nfs/status was
> enabled and online.
> >
> > The commonality I see on the two systems I have tried is they are both
> part of an HA cluster.   So they don't import the pool at boot, but RSF-1
> imports it with cache mapped to a different location.
>
> That could be something HA Inc. needs to further test.  We don't directly
> support RSF-1, after all.
>
>
There really isn't anything different from an auto-imported pool.  I
suspect that using an alternate cache location may be triggering something
else to go wrong in nlockmgr.

Here's the command RSF-1 uses to import the pool:
zpool import -c /opt/HAC/RSF-1/etc/volume-cache/nrgpool.cache -o cachefile=/opt/HAC/RSF-1/etc/volume-cache/nrgpool.cache-live -o failmode=panic nrgpool

After the pool import, it puts the IP addresses back and is done.  That
happens in less than 1 second.

In the meantime, NFS services auto-start and nlockmgr starts spinning.



> > nlockmgr is becoming a real show stopper.
>
> svcadm disable nlockmgr nfs/status
> svcadm enable nfs/status
> svcadm enable nlockmgr
>
> You may wish to discuss this on illumos as well, I'm not sure who all else
> is seeing this save me one time, and you seemingly a lot of times.
>

I did that this time, no joy.   Today I'm working on a virtual setup with
HA to see if I can get this reproduced on r151012.

I thought this nlockmgr problem was related to having lots of NFS exports
until I ran into it on my SSD pool.  That pool used to fail over in about
3-5 seconds.  Now nlockmgr sits in a spinning state for a few minutes and
fails every time.  Clearing the maintenance state brings it back nearly
instantly.  This is on r151010.  On r151012 it fails every time, even
after clearing.
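
Because clearing the maintenance state recovers nlockmgr on r151010, a
failover hook could automate that step instead of waiting for someone to
run it by hand.  The following is a minimal sketch under stated
assumptions, not something RSF-1 or OmniOS ships: the function name
`wait_for_nlockmgr`, the 120-second default timeout, and the `SVCS`,
`SVCADM` and `FMRI` override variables are hypothetical.  It polls the
service state with `svcs -H -o state` and issues `svcadm clear` once if
the service lands in maintenance.  On r151012, where clearing does not
help, it would report failure rather than fix anything.

```shell
#!/bin/sh
# Hypothetical failover helper; not part of RSF-1 or OmniOS.
# SVCS, SVCADM and FMRI may be overridden (e.g. to test without SMF).
SVCS=${SVCS:-svcs}
SVCADM=${SVCADM:-svcadm}
FMRI=${FMRI:-svc:/network/nfs/nlockmgr:default}

# wait_for_nlockmgr [TIMEOUT_SECONDS]
# Polls once per second.  Returns 0 as soon as the service is online,
# 1 if the timeout expires first, and 2 if the service is still (or
# again) in maintenance after one "svcadm clear" attempt.
wait_for_nlockmgr() {
    timeout=${1:-120}
    waited=0
    cleared=no
    while [ "$waited" -lt "$timeout" ]; do
        state=$($SVCS -H -o state "$FMRI" 2>/dev/null)
        case $state in
            online)
                return 0 ;;
            maintenance)
                if [ "$cleared" = yes ]; then
                    return 2
                fi
                # The manual step that recovers nlockmgr on r151010.
                $SVCADM clear "$FMRI"
                cleared=yes ;;
        esac
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

A failover script might call it right after the `zpool import` step, for
example `wait_for_nlockmgr 120 || echo "nlockmgr did not come online" >&2`.
Clearing only once is deliberate: if the service drops straight back into
maintenance, retrying in a loop would just hide the underlying fault.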

Hopefully I can reproduce it, and I'll start a new thread copying illumos too.

-Chip