[OmniOS-discuss] Ang: Continuing zone shutdown issues with r151016

Jim Klimov jimklimov at cos.ru
Thu Mar 31 05:13:50 UTC 2016


On 31 March 2016 at 7:06:23 CEST, Jim Klimov <jimklimov at cos.ru> wrote:
>On 30 March 2016 at 21:16:15 CEST, Dan McDonald <danmcd at omniti.com> wrote:
>>
>>> On Mar 30, 2016, at 3:07 PM, Bob Friesenhahn
>><bfriesen at simple.dallas.tx.us> wrote:
>>> 
>>> Normally I would use 'zoneadm' to do the shutdown.  However, I am
>>pretty sure that the former (based on 'zlogin') encountered the same
>>problem.  If so, that is very interesting.
>>
>>On a zone lock, we need to see a couple of things (from root at global):
>>
>>- pgrep -z <zonename>
>>
>>- pgrep -f <zonename>  (Should only produce the zoneadmd process.)
>>
>>IF AND ONLY IF THOSE TWO ABOVE consistently produce process lists (you
>>cannot run the following w/o a nonzero process list, otherwise it
>>defaults to ALL the processes):
>>
>>- ptree `pgrep -z <zonename>; pgrep -f <zonename>`
>>
>>- pstack `pgrep -z <zonename>; pgrep -f <zonename>`  (NOTE:  This may
>>spew a lot if you have a lot of processes.)
>>
>>Something's stopping zoneadmd from halting, and the process stacks may
>>help us figure it out.
>>
>>Dan
>>
>
>This is reminiscent of what I saw on other Solarish systems, maybe even
>sol10 and SXCE, although quite rarely (once in hundreds of times). There,
>zoneadmd would lock up and not die until killed, blocking zone
>reboots from outside or within.
>
>On newer illumos setups that manage a complex ZFS zoneroot hierarchy,
>this often seemed linked to some process running from the global zone
>(and so unkillable by the lz shutdown), e.g. a shell, Midnight Commander
>or an updatedb scan (nowadays maybe pkg integration between gz and lz
>can play a role too?), whose current directory was inside the local zone
>namespace, thus blocking its unmount. Killing such a process, or moving
>it to another directory, allowed the zoneadmd teardown to proceed.
>
>Alas, this description and method do not quite apply to sol10/sxce that
>have simpler gz-provided zoneroots, so there may be more sinister
>beasts of the old lurking in the deep ;)
>
>Hope these memories help,
>Jim Klimov 
>--
>Typos courtesy of K-9 Mail on my Samsung Android
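
For reference, the guard in Dan's checklist quoted above (never hand ptree/pstack an empty PID list) could be wrapped roughly like this. It is only a sketch: zone_pids and inspect_zone are names I made up for illustration, and it assumes pgrep, ptree and pstack behave as they do on illumos.

```shell
#!/bin/sh
# Sketch only: zone_pids and inspect_zone are hypothetical helper names.

zone_pids() {
    # $1 = zone name. Processes inside the zone, plus anything whose
    # command line mentions the zone name (normally just zoneadmd).
    # Duplicates are dropped so each PID is reported once.
    { pgrep -z "$1"; pgrep -f "$1"; } 2>/dev/null | sort -un
}

inspect_zone() {
    pids=$(zone_pids "$1")
    if [ -z "$pids" ]; then
        # With no PID arguments the tools would report on ALL
        # processes, so stop here instead.
        echo "no processes found for zone $1; not running ptree/pstack" >&2
        return 1
    fi
    # Word splitting of $pids is intentional: one argument per PID.
    ptree $pids
    pstack $pids
}
```

Run as root in the global zone, e.g. `inspect_zone myzone > /var/tmp/myzone.stacks 2>&1`, where myzone is a placeholder for the stuck zone's name.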

Forgot to add: cases similar to what I described can be inspected with fuser (-c, -m IIRC) to see whether any processes are blocking the zoneroot filesystems. Blockage can also be due to mounts underneath the zoneroot and other kernel-provided blockers, which userspace fuser won't show.
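
A rough sketch of that fuser check follows. The helper name blockers_on is made up, and it assumes the illumos fuser behaviour of printing only the PIDs on stdout (file names and usage letters go to stderr) and a ps that accepts a space-separated list after -p and knows the "zone" output column.

```shell
#!/bin/sh
# Sketch only: blockers_on is a hypothetical helper name.

blockers_on() {
    # $1 = mount point to check, e.g. the zone's root filesystem.
    # -c reports on the mount point and on files within that filesystem.
    pids=$(fuser -c "$1" 2>/dev/null)
    # Collapse the whitespace-padded PID list to single spaces.
    pids=$(echo $pids)
    if [ -z "$pids" ]; then
        echo "no userspace holders on $1"
        return 0
    fi
    # Show which zone each holder runs in; holders running in the
    # global zone are the ones a local-zone shutdown cannot kill.
    ps -o pid,zone,user,args -p "$pids"
}
```

For example `blockers_on /zones/myzone/root`, where that path is a placeholder for wherever the zone's root is actually mounted. An empty result does not rule out the kernel-side blockers mentioned above.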

In particular, look out for an NFS+autofs client in a zone (something the sol10 docs described as a no-no, especially using NFS from the gz of the same server, but which just worked and was cool for e.g. sharing homedirs or distribs, and so was widely used). Perhaps autofs failed to unmount something from a remote host and was then killed by the lz shutdown, and the remaining sub-filesystem now blocks the zoneroot teardown.
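
To spot such leftovers from the global zone, one could list whatever is still mounted beneath the zoneroot. This is a sketch with a made-up helper name (submounts); it assumes the usual tab-separated /etc/mnttab layout of special, mount point, fstype, options, time.

```shell
#!/bin/sh
# Sketch only: submounts is a hypothetical helper name.

submounts() {
    # $1 = zoneroot path, $2 = mount table (defaults to /etc/mnttab).
    # Prints "mountpoint<TAB>fstype<TAB>special" for every mount below
    # the zoneroot, deepest paths first (roughly the unmount order).
    awk -F '\t' -v root="$1" '
        index($2, root "/") == 1 { print $2 "\t" $3 "\t" $1 }
    ' "${2:-/etc/mnttab}" | LC_ALL=C sort -r
}
```

For example `submounts /zones/myzone/root` (placeholder path). Any nfs or autofs lines that remain after the zone's processes are gone are candidates for a manual umount before retrying the halt.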

Jim
--
Typos courtesy of K-9 Mail on my Samsung Android

