[OmniOS-discuss] zpool Write Bottlenecks

Linda Kateley lkateley at kateley.com
Fri Sep 30 05:25:41 UTC 2016


One of the things I do is turn caching off on a dataset: # zfs set 
primarycache=none dataset. Gives me better disk performance.
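
For reference, a minimal sketch of that (the pool/dataset name here is just a placeholder):

    # zfs set primarycache=none tank/dataset    # stop caching this dataset's data in the ARC
    # zfs get primarycache tank/dataset         # confirm the new value took effect

Keep in mind this also disables read caching for that dataset, so it's mostly 
useful when you want to see what the raw disks are doing rather than the ARC.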

lk


On 9/30/16 12:19 AM, Michael Talbott wrote:
> I'm very aware that dd is not the best tool for measuring disk 
> performance in most cases, and I know the read throughput number is 
> way off because of zfs caching mechanisms. However, it does work here 
> to illustrate the point: there is a write throttle somewhere in the 
> system. If anyone needs me to test with some other tool for 
> illustrative purposes, I can do that too. It's just so odd that 1 card 
> with a given set of disks attached in a pool provides roughly the same 
> net throughput as 2 cards with 2 sets of disks in said pool.
>
> But for the naysayers of dd testing, I'll provide you with this: 
> here's an example using 2 x 11-disk raidz3 vdevs where all 22 
> drives live on one backplane attached to 1 card with an 8x phy SAS 
> connection, and then adding 2 more 11-disk raidz3 vdevs that are 
> connected to the system on a separate card (also using an 8x phy SAS 
> link). No compression. I ran bonnie++ to saturate the disks while 
> pulling the numbers from iotop.
>
> The following is output from iotop (which uses DTrace to measure 
> disk I/O):
> http://www.brendangregg.com/DTrace/iotop
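>
> For reference, the workload and measurement were roughly along these lines 
> (the directory, size, and interval are illustrative, not the exact invocation):
>
>     # bonnie++ -d /datastore/bench -u root -s 262144 &    # write well past the 144GB of RAM
>     # ./iotop -C 5                                        # -C = rolling output instead of clearing the screen
>
> Each sample below shows bytes written per device during one interval.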
>
> Here's the 22 drive pool (all attached to 1 card):
> ------------------------------------------------------------------------------------------------------
> 2016 Sep 30 00:02:42,  load: 3.17,  disk_r:      0 KB,  disk_w: 3394084 KB
>
>   UID    PID   PPID CMD              DEVICE  MAJ MIN D    BYTES
>     0   7630      0 zpool-datastore  sd124   194 7936 W  161812480
>     0   7630      0 zpool-datastore  sd118   194 7552 W  161820672
>     0   7630      0 zpool-datastore  sd127   194 8128 W  161845248
>     0   7630      0 zpool-datastore  sd128   194 8192 W  161845248
>     0   7630      0 zpool-datastore  sd122   194 7808 W  161849344
>     0   7630      0 zpool-datastore  sd119   194 7616 W  161857536
>     0   7630      0 zpool-datastore  sd121   194 7744 W  161857536
>     0   7630      0 zpool-datastore  sd125   194 8000 W  161865728
>     0   7630      0 zpool-datastore  sd123   194 7872 W  161869824
>     0   7630      0 zpool-datastore  sd126   194 8064 W  161873920
>     0   7630      0 zpool-datastore  sd120   194 7680 W  161906688
>     0   7630      0 zpool-datastore  sd136   194 8704 W  165916672
>     0   7630      0 zpool-datastore  sd137   194 8768 W  165916672
>     0   7630      0 zpool-datastore  sd138   194 8832 W  165933056
>     0   7630      0 zpool-datastore  sd135   194 8640 W  165937152
>     0   7630      0 zpool-datastore  sd139   194 8896 W  165941248
>     0   7630      0 zpool-datastore  sd134   194 8576 W  165945344
>     0   7630      0 zpool-datastore  sd130   194 8320 W  165974016
>     0   7630      0 zpool-datastore  sd129   194 8256 W  165978112
>     0   7630      0 zpool-datastore  sd132   194 8448 W  165994496
>     0   7630      0 zpool-datastore  sd133   194 8512 W  165994496
>     0   7630      0 zpool-datastore  sd131   194 8384 W  166006784
>
> ------------------------------------------------------------------------------------------------------
>
>
> And here's the same pool extended with 2 more 11-disk raidz3 vdevs on the 
> second card. Notice the per-drive write throughput is almost LITERALLY cut 
> in HALF (see the quick tally after this output):
>
>
> ------------------------------------------------------------------------------------------------------
>
>
> 2016 Sep 30 00:01:07,  load: 4.59,  disk_r:      8 KB,  disk_w: 3609852 KB
>
>   UID    PID   PPID CMD              DEVICE  MAJ MIN D    BYTES
>     0   4550      0 zpool-datastore  sd133   194 8512 W   76558336
>     0   4550      0 zpool-datastore  sd132   194 8448 W   76566528
>     0   4550      0 zpool-datastore  sd135   194 8640 W   76570624
>     0   4550      0 zpool-datastore  sd134   194 8576 W   76574720
>     0   4550      0 zpool-datastore  sd136   194 8704 W   76582912
>     0   4550      0 zpool-datastore  sd131   194 8384 W   76611584
>     0   4550      0 zpool-datastore  sd130   194 8320 W   76644352
>     0   4550      0 zpool-datastore  sd137   194 8768 W   76648448
>     0   4550      0 zpool-datastore  sd129   194 8256 W   76660736
>     0   4550      0 zpool-datastore  sd138   194 8832 W   76713984
>     0   4550      0 zpool-datastore  sd139   194 8896 W   77369344
>     0   4550      0 zpool-datastore  sd113   194 7232 W   77770752
>     0   4550      0 zpool-datastore  sd115   194 7360 W   77832192
>     0   4550      0 zpool-datastore  sd114   194 7296 W   77836288
>     0   4550      0 zpool-datastore  sd111   194 7104 W   77840384
>     0   4550      0 zpool-datastore  sd112   194 7168 W   77840384
>     0   4550      0 zpool-datastore  sd108   194 6912 W   77844480
>     0   4550      0 zpool-datastore  sd110   194 7040 W   77864960
>     0   4550      0 zpool-datastore  sd116   194 7424 W   77873152
>     0   4550      0 zpool-datastore  sd107   194 6848 W   77914112
>     0   4550      0 zpool-datastore  sd106   194 6784 W   77918208
>     0   4550      0 zpool-datastore  sd109   194 6976 W   77926400
>     0   4550      0 zpool-datastore  sd128   194 8192 W   78938112
>     0   4550      0 zpool-datastore  sd118   194 7552 W   78979072
>     0   4550      0 zpool-datastore  sd125   194 8000 W   78991360
>     0   4550      0 zpool-datastore  sd120   194 7680 W   78999552
>     0   4550      0 zpool-datastore  sd127   194 8128 W   79007744
>     0   4550      0 zpool-datastore  sd126   194 8064 W   79011840
>     0   4550      0 zpool-datastore  sd119   194 7616 W   79020032
>     0   4550      0 zpool-datastore  sd123   194 7872 W   79020032
>     0   4550      0 zpool-datastore  sd122   194 7808 W   79048704
>     0   4550      0 zpool-datastore  sd124   194 7936 W   79056896
>     0   4550      0 zpool-datastore  sd121   194 7744 W   79065088
>     0   4550      0 zpool-datastore  sd105   194 6720 W   82460672
>     0   4550      0 zpool-datastore  sd102   194 6528 W   82468864
>     0   4550      0 zpool-datastore  sd101   194 6464 W   82477056
>     0   4550      0 zpool-datastore  sd104   194 6656 W   82477056
>     0   4550      0 zpool-datastore  sd103   194 6592 W   82481152
>     0   4550      0 zpool-datastore  sd141   194 9024 W   82489344
>     0   4550      0 zpool-datastore  sd99    194 6336 W   82493440
>     0   4550      0 zpool-datastore  sd98    194 6272 W   82501632
>     0   4550      0 zpool-datastore  sd140   194 8960 W   82513920
>     0   4550      0 zpool-datastore  sd97    194 6208 W   82538496
>     0   4550      0 zpool-datastore  sd100   194 6400 W   82542592
>
> ------------------------------------------------------------------------------------------------------
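>
> A quick tally of the two samples above shows the ceiling:
>
>     pool on 1 card:   22 drives x ~160-166 MB each  ->  roughly 3.5 GB for the sample
>     pool on 2 cards:  44 drives x ~77-82 MB each    ->  roughly 3.5 GB for the sample
>
> Doubling the drive count (and adding the second HBA) leaves the aggregate 
> essentially flat; only each drive's share of the writes drops.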
>
>
>
>
> Any thoughts on how to discover and/or overcome the true bottleneck are 
> much appreciated.
>
> Thanks
>
> Michael
>
>
>
>> On Sep 29, 2016, at 7:46 PM, Dale Ghent <daleg at omniti.com> wrote:
>>
>>
>> Awesome that you're using LX Zones in a way with BeeGFS.
>>
>> A note on your testing methodology, however:
>> http://lethargy.org/~jesus/writes/disk-benchmarking-with-dd-dont/#.V-3RUqOZPOY
>>
>>> On Sep 29, 2016, at 10:21 PM, Michael Talbott <mtalbott at lji.org> wrote:
>>>
>>> Hi, I'm trying to find a way to achieve massive write speeds with 
>>> some decent hardware which will be used for some parallel computing 
>>> needs (bioinformatics). Eventually, if all goes well and my testing 
>>> succeeds, I'll be duplicating this setup and running BeeGFS in a few LX 
>>> zones (THANK YOU LX ZONES!!!) for some truly massive parallel 
>>> computing storage happiness, but I'd like to tune this box as much 
>>> as possible first.
>>>
>>> For this setup I'm using an Intel S2600GZ board with 2 x E5-2640 (six 
>>> cores each) @ 2.5GHz and 144GB of ECC RAM installed. I have 3 x 
>>> SAS2008-based LSI cards in that box. 1 of those is connected to 8 
>>> internal SSDs and the other 2 cards (4 ports) are connected to a 45-bay 
>>> drive enclosure. And then there are 2 Intel 2-port 10GbE cards 
>>> for connectivity.
>>>
>>> I've created so many different zpools in different configurations: 
>>> straight-up striped with no redundancy, raidz2, raidz3, 
>>> multipathed, non-multipathed with 8x phy links instead of 4x 
>>> multipath links, etc., etc., trying to find the magic combination for 
>>> maximum performance, but there's something somewhere capping raw 
>>> throughput and I can't seem to find it.
>>>
>>> Now the crazy part: I can, for instance, create zpoolA with ~20 
>>> drives (via cardA and attached only to backplaneA) and create zpoolB 
>>> with another ~20 drives (via cardB and attached only to backplaneB), 
>>> and each of them gets the same performance individually (~1GB/s 
>>> write and 2.5GB/s read). So my thought was, if I destroy zpoolB and 
>>> attach all those drives to zpoolA as additional vdevs, it should 
>>> double the performance or make some sort of significant improvement. 
>>> But nope, roughly the same speed. Then I thought, ok, well maybe 
>>> it's a slowest-vdev sort of thing. So then I created vdevs such that 
>>> each vdev used half its drives from backplaneA and the other half 
>>> from backplaneB (see the sketch below). That should force data 
>>> distribution between the cards for each vdev, double the speed, and 
>>> get me to 2GB/s write. But nope, same deal: 1GB/s write and 2.5GB/s 
>>> read :(
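>>>
>>> For illustration, the split layout was along these lines (the device names 
>>> here are made up; the point is just that each raidz3 takes roughly half its 
>>> members from each backplane/HBA):
>>>
>>>     # c1t*d0 sits behind cardA/backplaneA, c2t*d0 behind cardB/backplaneB
>>>     zpool create datastore \
>>>         raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
>>>         raidz3 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0
>>>     zpool status datastore    # confirm each raidz3 straddles both HBAs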
>>>
>>> When I create the pool from scratch, for each vdev I add I see a 
>>> linear increase in performance until I hit about 4-5 vdevs. That's 
>>> where the performance flatlines, and no matter what I do beyond that 
>>> it just won't go any faster :(
>>>
>>> Also, if I create a pure SSD pool with cardC, the linear read/write 
>>> performance of those drives hits the exact same numbers as the 
>>> others :( Bottom line: no matter what pool configuration I use, no 
>>> matter what recordsize is set in zfs, I'm always getting capped at 
>>> roughly 1GB/s write and 2.5GB/s read.
>>>
>>> I thought maybe there weren't enough PCIe lanes to run all of those 
>>> cards at 8x, but that's not the case; this board can run 6 x 8-lane 
>>> PCIe 3.0 cards at full speed. I booted it up in Linux to use lspci 
>>> -vv to make sure of it (since I'm not sure how to view PCIe speeds 
>>> in OmniOS), and sure enough, everything is running at 8x width, so 
>>> that's not it.
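>>>
>>> In case anyone wants to repeat that check, it was essentially this under 
>>> Linux (the PCI address is a placeholder for whatever lspci reports for 
>>> each LSI HBA on the box):
>>>
>>>     lspci | grep -i LSI                               # find the HBAs' PCI addresses
>>>     lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'    # LnkSta shows the negotiated speed/width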
>>>
>>> Oh, and just fyi, this is my super simple throughput testing script 
>>> that I run with compression disabled on the tested pool.
>>>
>>> START=$(date +%s)
>>> /usr/gnu/bin/dd if=/dev/zero of=/datastore/testdd bs=1M count=10k
>>> sync
>>> echo $(($(date +%s)-START))
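>>>
>>> (If it's useful, the same test with the MB/s math folded in could look 
>>> like this sketch; same assumptions, compression off on the pool being tested.)
>>>
>>> START=$(date +%s)
>>> /usr/gnu/bin/dd if=/dev/zero of=/datastore/testdd bs=1M count=10k
>>> sync                                      # flush so the 10GB actually hits the disks before stopping the clock
>>> ELAPSED=$(( $(date +%s) - START ))
>>> echo "${ELAPSED}s elapsed, ~$(( 10240 / ELAPSED )) MB/s"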
>>>
>>> My goal is to find a way to achieve at least 2GB/s write and 4GB/s 
>>> read, which I think is theoretically possible with this hardware.
>>>
>>> Anyone have any ideas about what could be limiting this or how to 
>>> remedy it? Could it be the mpt_sas driver itself somehow throttling 
>>> access to all these devices? Or maybe I need to do some sort of 
>>> IRQ-to-CPU pinning magic?
>>>
>>>
>>> Thanks,
>>>
>>> Michael
>>>
>>> _______________________________________________
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss at lists.omniti.com
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>
>
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
