[OmniOS-discuss] Reproducible r151008j kernel crash with ZFS pools on iSCSI

Sat Mar 8 17:11:36 UTC 2014

On 2014-03-07 20:34, Chris Siebenmann wrote:
>   I have a reproducible kernel crash with OmniOS r151008j. The situation:
>
>   The basic setup is a ZFS pool on mirrored pairs of iSCSI disks. The
> iSCSI disks come from two different iSCSI targets, and all
> targets are multipathed over two 10G networks. The pool is set to
> 'failmode=continue'.  If I start a large streaming write to the pool and
> then take down both iSCSI interfaces on both targets (making all disks
> in the pool completely unavailable), OmniOS panics after a couple of
> minutes. Fortunately this doesn't happen if only a single target becomes
> inaccessible.

Pointing my finger into the sky, I would guess that since your streaming
writes keep coming, some buffer space becomes exhausted (perhaps by the
hanging ZIOs waiting for the storage backends to come back). I would
expect the write() calls not to return and thus to throttle the clients
from pushing more data, but perhaps there are enough client threads
trying to write that their maximum buffer space combined overwhelms
this particular server.

In short: when reproducing the bug, try running "vmstat 1" in a separate
SSH session to see whether your free memory plummets when you disconnect
the devices and/or the "sr" column (page scan rate, a sign of memory
pressure and impending paging) increases substantially.
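
If watching the numbers scroll by is awkward while pulling cables, a small
filter can do the watching. This is only a rough sketch: it assumes the usual
vmstat column header containing "free" and "sr", and the function names and
thresholds (free memory below half of the observed peak, scan rate above 200)
are arbitrary illustrative choices, not tuned values.

```python
def parse_vmstat(lines):
    """Yield (free_kb, scan_rate) tuples from `vmstat 1` output lines."""
    cols = None
    for line in lines:
        fields = line.split()
        if "free" in fields and "sr" in fields:
            # Column-name header; vmstat repeats it periodically.
            cols = (fields.index("free"), fields.index("sr"))
            continue
        if cols is None or len(fields) <= max(cols):
            # Nothing to parse yet, or a short group-header line.
            continue
        try:
            yield int(fields[cols[0]]), int(fields[cols[1]])
        except ValueError:
            continue  # not a numeric data row


def flag_pressure(samples, drop_ratio=0.5, sr_limit=200):
    """Yield (free_kb, scan_rate, alarm) for each sample.

    alarm is True when free memory falls below drop_ratio times the
    highest value seen so far, or when the scan rate exceeds sr_limit.
    Both thresholds are assumptions made for illustration.
    """
    peak = 0
    for free_kb, sr in samples:
        peak = max(peak, free_kb)
        alarm = free_kb < peak * drop_ratio or sr > sr_limit
        yield free_kb, sr, alarm
```

To use it, pipe "vmstat 1" into a script that loops over
flag_pressure(parse_vmstat(sys.stdin)) and prints the flagged samples. Keep
in mind that the first line vmstat prints is an average since boot rather
than a live sample.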

HTH,
//Jim Klimov