<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 16, 2015, at 9:48 AM, Chris Siebenmann &lt;<a href="mailto:cks@cs.toronto.edu" class="">cks@cs.toronto.edu</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">I wrote:<br class=""><blockquote type="cite" class=""> We have one ZFS-based NFS fileserver that persistently runs at a very<br class="">high level of non-ARC kernel memory usage that never seems to shrink.<br class="">On a 128 GB machine, mdb's ::memstat reports 95% memory usage by just<br class="">'Kernel' while the ZFS ARC is only at about 21 GB (as reported by<br class="">'kstat -m') although c_max should allow it to grow much bigger.<br class=""><br class=""> According to ::kmastat, a *huge* amount of this memory appears to be<br class="">vanishing into allocated but not used kmem_alloc_131072 slab buffers:<br class=""><br class=""><blockquote type="cite" class="">::kmastat<br class=""></blockquote>cache buf buf buf memory alloc alloc<br class="">name size in use total in use succeed fail<br class="">------------------------------ ----- --------- --------- ------ ---------- -----<br class="">[...]<br class="">kmem_alloc_131072 128K 6 613033 74.8G 196862991 0<br class=""></blockquote><br class=""> It turns out that the explanation for this is relatively simple, as<br class="">is the workaround. Put simply: the OmniOS kernel does not actually<br class="">free up these deallocated cache objects until the system is put under<br class="">relatively strong memory pressure. 
Crucially, *the ZFS ARC does not<br class="">create this memory pressure*; I think that you pretty much need a user<br class="">level program allocating enough memory in order to trigger it, and I<br class="">think the memory growth needs to happen relatively rapidly so that<br class="">the kernel doesn't reclaim enough memory through lesser means (such as<br class="">shrinking the ZFS ARC).<br class=""></div></blockquote><div><br class=""></div><div>I don't think we will get much traction for ZFS pushing applications out of RAM.</div><div>There is a nuance here that can be difficult to resolve.</div><br class=""><blockquote type="cite" class=""><div class=""><br class="">(Specifically, you need to force kmem_reap() to be called. The primary<br class="">path for this is if 'freemem' drops under 'lotsfree', which is only a few<br class="">hundred MB on many systems. See usr/src/uts/common/os/vm_pageout.c in<br class="">the OmniOS source repo.)<br class=""><br class=""> Since our fileservers are purely NFS fileservers and have a basically<br class="">static level of user memory usage, they rarely or never rapidly use up<br class="">enough memory to trigger this 'allocated but unused' reclaim[*].<br class=""><br class=""> The good news is that it's easy enough these days to eat memory at the<br class="">user level (you can do it with modern 64-bit scripting languages like<br class="">Python, even at an interactive prompt). The bad news is that when we did<br class="">this on the server in question we provoked a significant system stall at<br class="">both the NFS server level and even the level of ssh logins and shells;<br class="">this is clearly not something that we'd want to automate.<br class=""><br class=""> It's my personal opinion that there should be something in the kernel<br class="">that automatically reaps drastically outsized kmem caches after a<br class="">while. 
It's absurd that we've run for weeks with more than 70 GB of RAM<br class="">sitting unused and an undersized ZFS ARC because of this.<br class=""></div></blockquote><div><br class=""></div><div>kmem reaps can be very painful</div><br class=""><blockquote type="cite" class=""><div class=""><br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>- cks<br class="">[*: interested parties can see how often cache reaping has been triggered<br class=""> with the following 'mdb -k' command:<br class=""><span class="Apple-tab-span" style="white-space:pre"> </span>::walk kmem_cache | ::printf "%4d %s\n" kmem_cache_t cache_reap cache_name<br class=""></div></blockquote><div><br class=""></div><div>ugh. How about:</div><div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">kstat -p :::reap</div></div><div><br class=""></div> -- richard</div><div><br class=""><blockquote type="cite" class=""><div class=""><br class=""> Even on this heavily used fileserver, up for 45 days, the reap count<br class=""> was *8*. Many of our other fileservers, with less usage, have reap<br class=""> counts of 0.<br class="">]<br class="">_______________________________________________<br class="">OmniOS-discuss mailing list<br class=""><a href="mailto:OmniOS-discuss@lists.omniti.com" class="">OmniOS-discuss@lists.omniti.com</a><br class="">http://lists.omniti.com/mailman/listinfo/omnios-discuss<br class=""></div></blockquote></div><br class=""></body></html>