[OmniOS-discuss] zpool Write Bottlenecks

Michael Talbott mtalbott at lji.org
Fri Sep 30 16:46:23 UTC 2016


Ah. That explains it. At first I figured a single stream couldn't be the problem, since a dd from /dev/zero to /dev/null ran in excess of 24 GB/s. But what you said about TX groups explains it all. I ran several dd's in parallel and now see all of the disks getting fully saturated.
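The parallel runs were along these lines (pool name, file count, and block size below are only illustrative, not my exact invocation):

  # kick off several sequential writers at once, each to its own file on the pool
  for i in 1 2 3 4; do
    dd if=/dev/zero of=/tank/ddtest.$i bs=1M count=65536 &
  done
  wait    # let all background dd's finish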

Thanks


> On Sep 30, 2016, at 6:36 AM, Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:
> 
> On Thu, 29 Sep 2016, Michael Talbott wrote:
> 
>> I'm very aware that dd is not the best tool for measuring disk performance in most cases. And I know the read throughput number is way off because of zfs caching mechanisms. However, it does work in my case to illustrate the point of a write throttle somewhere in the system. If anyone needs me to test with some other
> 
> I think that Dale is correct that your benchmark may only be benchmarking 'dd'.  The performance of single-threaded 'dd' will be driven entirely by the read performance of /dev/zero and by the per-operation write latency.  In your case, the issue is almost certainly that the write latency is a fixed value (regardless of the number of disks), so there is a fixed maximum write rate regardless of the performance of the underlying store.
> 
> If you were to run two 'dd's in parallel, you may see an increase in the write rate.  This is the first thing to try.
> 
> With zfs, a "write" goes into a buffer associated with a transaction group (TXG), which is written to disk when either a maximum elapsed time passes or the maximum size of the TXG is reached.  The maximum size of the TXG is estimated from the total available RAM and zfs's guess at the time needed to write the TXG out (based on observed throughput), as well as tunables.
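> On an illumos-based system such as OmniOS, the knobs involved can be inspected with mdb; for example (the tunable names below are the stock illumos ones, so check your release):
>
>   # TXG sync interval in seconds (32-bit int)
>   echo "zfs_txg_timeout/D" | mdb -k
>
>   # cap on outstanding dirty write data, in bytes (64-bit)
>   echo "zfs_dirty_data_max/E" | mdb -k
>
> A persistent override can go in /etc/system, e.g. "set zfs:zfs_txg_timeout = 5".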
> 
> For a brief time, the writer process will not be able to write any data at all while the current TXG is finished (resulting in writes to disk) and a new one is started.
> 
> At the lowest level, the TXG cannot be completed until all involved disks have written their data and completed a cache sync operation.  The latency of the cache sync is driven by the latency of the slowest disk drive (+HBA) involved.  Getting rid of any disks which exhibit abnormally high latency would help with the transaction times.
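> A quick way to spot such a disk is to watch per-device service times during a sustained write, e.g. with something like:
>
>   iostat -xn 5
>
> and look for a device whose asvc_t (average service time) sits well above that of its peers while the pool is syncing.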
> 
> Bob
> -- 
> Bob Friesenhahn
> bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


