[OmniOS-discuss] Slow scrub on SSD-only pool

Dale Ghent daleg at omniti.com
Sun Apr 17 18:42:21 UTC 2016


> On Apr 17, 2016, at 9:07 AM, Stephan Budach <stephan.budach at JVM.DE> wrote:
> 
> Well… searching the net somewhat more thoroughly, I came across an archived discussion that deals with a similar issue. Further down the conversation, this parameter was suggested:
> 
> echo "zfs_scrub_delay/W0" | mdb -kw
> 
> I just tried that as well, and although the calculated speed climbs rather slowly, iostat now shows approx. 380 MB/s read from the devices, which works out to 24 MB/s per single device * 8 * 2.
> 
> Being curious, I issued an echo "zfs_scrub_delay/W1" | mdb -kw to see what would happen, and that command immediately dropped the rate on each device to 1.4 MB/s…
> 
> What is the rationale behind that? Who wants to wait weeks for a scrub to finish? Usually, I have znapzend running as well, creating snapshots on a regular basis. Wouldn't that hurt scrub performance even more?

zfs_scrub_delay is described here:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/dsl_scan.c#63

How busy are your disks if you subtract the IO caused by a scrub? Are you doing these scrubs with your VMs causing normal IO as well?

Scrubbing, overall, is treated as a background maintenance process. As such, it is designed not to interfere with "production IO" requests. It used to be that scrubs ran as fast as disk IO and bus bandwidth would allow, which severely impacted the IO performance of running applications and in some cases caused problems for production or user services. The scrub delay setting which you've discovered is the main governor of the scrub throttle code[1]. By setting it to 0, you are effectively removing the delay that the scrub imposes on itself to allow non-scrub, non-resilver IO requests to finish.
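
To get a feel for why going from 0 to 1 makes such a dramatic difference, the arithmetic can be sketched roughly as below. This is only a back-of-the-envelope model, not the actual kernel logic: it assumes the scan thread sleeps zfs_scrub_delay clock ticks before each scrub IO whenever the pool has seen recent non-scrub IO. The 100 Hz tick rate and the 128 KiB block size are illustrative assumptions, and scrub_ceiling_kib is a made-up helper name.

```shell
#!/bin/sh
# Rough ceiling on scrub throughput while the pool counts as "busy".
# ASSUMPTION: one sleep of <delay_ticks> clock ticks is taken per scrub IO.
# Usage: scrub_ceiling_kib <delay_ticks> <hz> <block_kib>   (hypothetical helper)
scrub_ceiling_kib() {
    delay=$1
    hz=$2
    block_kib=$3
    if [ "$delay" -eq 0 ]; then
        # No sleep at all: the throttle no longer limits the scrub.
        echo "unthrottled"
        return 0
    fi
    # IOs per second = hz / delay; KiB/s = IOs per second * block size.
    echo $(( hz * block_kib / delay ))
}

scrub_ceiling_kib 1 100 128   # prints 12800 (KiB/s, about 12.5 MiB/s)
scrub_ceiling_kib 4 100 128   # prints 3200
scrub_ceiling_kib 0 100 128   # prints unthrottled
```

Under these assumed numbers, even a single tick of delay caps the scrub at a small fraction of what SSDs can deliver, and only for as long as the pool keeps seeing other IO.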

The solution in your case is specific to yourself and how you operate your servers and services. Can you accept degraded application IO while a scrub or resilver is running? Can you not? Maybe only during certain times?
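
If the answer is "only during certain times", one possible approach is to switch the delay by time of day, for example from cron. The sketch below is only an illustration: the off-hours window (22:00 to 05:59), the helper name pick_scrub_delay, and the daytime value of 4 (which is the commonly cited default; check yours with echo "zfs_scrub_delay/D" | mdb -k) are all assumptions. It prints the mdb command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# pick_scrub_delay <hour 00-23>: hypothetical policy helper.
# ASSUMPTIONS: the off-hours window and both delay values are illustrative.
pick_scrub_delay() {
    case $1 in
        0?) hour=${1#0} ;;   # strip a leading zero so 08/09 are not read as octal
        *)  hour=$1 ;;
    esac
    if [ "$hour" -ge 22 ] || [ "$hour" -lt 6 ]; then
        echo 0   # off-hours: let the scrub run without the delay
    else
        echo 4   # daytime: keep the scrub throttled
    fi
}

delay=$(pick_scrub_delay "$(date +%H)")
# Print the command rather than executing it.
echo "echo \"zfs_scrub_delay/W0t${delay}\" | mdb -kw"
```

Note that a value written with mdb -kw does not survive a reboot, so a scheme like this only adjusts the running kernel.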

/dale

[1] http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/dsl_scan.c#1841