[OmniOS-discuss] What should we look at in a memory exhaustion situation?

Chris Siebenmann cks at cs.toronto.edu
Tue Apr 28 15:06:45 UTC 2015


 We now have a reproducible situation where high NFS load can cause our
specific fileserver configuration to lock up with what looks like a
memory exhaustion deadlock. So far, attempts to get a crash dump haven't
worked (although they may someday). Without a crash dump, what system
stats and so on should we be looking at that would help people identify
the kernel bugs involved here, either in the run-up to the lockup or in
kmdb once we break into it after the machine locks up?

(Our specific fileserver is NFS to ZFS pools with mirrored vdevs where
each side of the mirror is an iSCSI disk accessed over two backend
iSCSI networks. We suspect that iSCSI is a contributing factor. Our
fileservers have 64 GB of RAM.)

 So far we have vmstat, mpstat, network volume, arcstat, '::memstat'
from mdb -k, and some homegrown NFS and ZFS DTrace activity monitoring
scripts. These say that just before the lockup happens, free memory and
ARC usage collapse abruptly and catastrophically, 'Kernel' memory usage
goes to almost or totally 100%, all CPUs spike to over 90% system time,
we see over a thousand runnable processes[*], and we have over a GB
of in-flight NFS writes but there is very little actual ZFS (and NFS)
IO either in flight or being completed.

(The NFS service pool also reports a massively increasing and
jaw-dropping number of 'Pending requests'; the last snapshot of
'::svc_pool -v nfs' a few seconds before the crash has 376862 of them.)
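
For illustration, a minimal sketch of the kind of periodic collection
described above, assuming Python 3 is available on the fileserver. The
helper names (snapshot, collect), the 10 second interval, and the
'1 2' interval/count arguments to vmstat and mpstat are assumptions
made for the example, not our actual scripts. Each round is flushed
and fsync'd so that the last snapshots have a chance of surviving the
lockup.

```python
import os
import subprocess
import time

# Commands named in the post; the intervals and counts are assumptions.
COMMANDS = [
    "vmstat 1 2",
    "mpstat 1 2",
    "echo ::memstat | mdb -k",
    "echo '::svc_pool -v nfs' | mdb -k",
]


def snapshot(commands, log, timeout=30):
    """Run each command once and append its timestamped output to log."""
    for cmd in commands:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        try:
            result = subprocess.run(
                cmd, shell=True, capture_output=True, text=True, timeout=timeout
            )
            output = result.stdout + result.stderr
        except subprocess.TimeoutExpired:
            # A hung command is itself a data point close to the lockup.
            output = "(timed out)\n"
        log.write(f"=== {stamp} {cmd}\n{output}")
    # Push the data to disk now; the machine may not get another chance.
    log.flush()
    os.fsync(log.fileno())


def collect(path, commands=COMMANDS, interval=10, rounds=None):
    """Take snapshots every `interval` seconds; rounds=None runs until killed."""
    done = 0
    with open(path, "a") as log:
        while rounds is None or done < rounds:
            snapshot(commands, log)
            done += 1
            if rounds is None or done < rounds:
                time.sleep(interval)
```

Calling collect() with a log path (for example a hypothetical
/var/tmp/memwatch.log) would run until killed. Logging to a local file
rather than to anything NFS- or iSCSI-backed is a deliberate choice
here, since those are the paths that stall.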

 This seems to happen more often with lower values for the number of
NFS server threads, but even very large values are not immune to it
(our most recent lockup happened at 4096 threads). Slowdowns in NFS server
IO responsiveness seem to make this more likely to happen; past slowdowns
have come from both disk IO problems and from full or nearly full pools.

 Thanks in advance.

	- cks
[*: this was with quite a lot of NFS server threads configured.]

