<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>One of the things I do is turn caching off on a dataset: <code># zfs set
primarycache=none dataset</code>. It gives me better disk performance numbers.</p>
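<p>A minimal sketch of that, with an illustrative dataset name:</p>
<pre>
# Stop the ARC from caching file data on the dataset under test,
# so benchmark reads hit the disks instead of RAM:
zfs set primarycache=none tank/bench

# Confirm the setting:
zfs get primarycache tank/bench

# Restore the default once done benchmarking:
zfs set primarycache=all tank/bench
</pre>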
<p>lk<br>
</p>
<br>
<div class="moz-cite-prefix">On 9/30/16 12:19 AM, Michael Talbott
wrote:<br>
</div>
<blockquote cite="mid:27B73CAE-7A40-42F6-AD37-1556015D6D4E@lji.org"
type="cite">
I'm very aware that dd is not the best tool for measuring disk
performance in most cases, and I know the read throughput number is
way off because of ZFS caching mechanisms. However, it does serve to
illustrate that there is a write throttle somewhere in this system.
If anyone needs me to test with some other tool for illustrative
purposes, I can do that too. It's just so odd that one card with a
given set of disks attached in a pool provides roughly the same net
throughput as two cards with two sets of disks in said pool.
<div class="">
<div class=""><br class="">
</div>
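<div class="">If the ZFS write throttle itself is the suspect, one way to start is
to inspect the dirty-data limits with mdb; a minimal sketch, assuming
the usual illumos tunable names (verify them against your kernel
before relying on them):</div>
<pre>
# Read the current write-throttle limits from the live kernel
# (zfs_dirty_data_max is a 64-bit value, hence the /E format):
echo 'zfs_dirty_data_max/E' | mdb -k
echo 'zfs_vdev_async_write_max_active/D' | mdb -k

# To experiment, raise them in /etc/system and reboot, e.g.:
#   set zfs:zfs_dirty_data_max = 0x400000000
#   set zfs:zfs_vdev_async_write_max_active = 32
</pre>
<div class=""><br class="">
</div>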
<div class="">But, for the nay-sayers of dd testing, I'll
provide you with this.. Here's an example of using 2 x 11 disk
raidz3 vdevs where all 22 drives live on one backplane
attached to 1 card with an 8x phy sas connection. And then
adding 2 more 11 disk raidz3 vdevs that are connected to that
system on a separate card (also using an 8x phy sas link). No
compression. Running bonnie++ to saturate the disks while I
pull the numbers from iotop.</div>
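<div class=""><br class="">
</div>
<div class="">A typical bonnie++ invocation for a load like this, assuming the pool
is mounted at /datastore (the -s size is twice the 144 GB of installed
RAM so the ARC can't absorb the working set):</div>
<pre>
# Sequential-throughput run, executed as root via -u;
# 288g = 2 x installed RAM.
bonnie++ -d /datastore -s 288g -u root
</pre>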
<div class=""><br class="">
</div>
<div class="">The following is an output of iotop (which uses
dtrace for measuring disk io)</div>
<div class=""><a moz-do-not-send="true"
href="http://www.brendangregg.com/DTrace/iotop" class="">http://www.brendangregg.com/DTrace/iotop</a></div>
<div class=""><br class="">
</div>
<div class="">Here's the 22 drive pool (all attached to 1 card):</div>
<div class="">------------------------------------------------------------------------------------------------------</div>
<div class="">2016 Sep 30 00:02:42, load: 3.17, disk_r: 0
KB, disk_w: 3394084 KB<br class="">
<br class="">
UID PID PPID CMD DEVICE MAJ MIN D
BYTES<br class="">
0 7630 0 zpool-datastore sd124 194 7936 W
161812480<br class="">
0 7630 0 zpool-datastore sd118 194 7552 W
161820672<br class="">
0 7630 0 zpool-datastore sd127 194 8128 W
161845248<br class="">
0 7630 0 zpool-datastore sd128 194 8192 W
161845248<br class="">
0 7630 0 zpool-datastore sd122 194 7808 W
161849344<br class="">
0 7630 0 zpool-datastore sd119 194 7616 W
161857536<br class="">
0 7630 0 zpool-datastore sd121 194 7744 W
161857536<br class="">
0 7630 0 zpool-datastore sd125 194 8000 W
161865728<br class="">
0 7630 0 zpool-datastore sd123 194 7872 W
161869824<br class="">
0 7630 0 zpool-datastore sd126 194 8064 W
161873920<br class="">
0 7630 0 zpool-datastore sd120 194 7680 W
161906688<br class="">
0 7630 0 zpool-datastore sd136 194 8704 W
165916672<br class="">
0 7630 0 zpool-datastore sd137 194 8768 W
165916672<br class="">
0 7630 0 zpool-datastore sd138 194 8832 W
165933056<br class="">
0 7630 0 zpool-datastore sd135 194 8640 W
165937152<br class="">
0 7630 0 zpool-datastore sd139 194 8896 W
165941248<br class="">
0 7630 0 zpool-datastore sd134 194 8576 W
165945344<br class="">
0 7630 0 zpool-datastore sd130 194 8320 W
165974016<br class="">
0 7630 0 zpool-datastore sd129 194 8256 W
165978112<br class="">
0 7630 0 zpool-datastore sd132 194 8448 W
165994496<br class="">
0 7630 0 zpool-datastore sd133 194 8512 W
165994496<br class="">
0 7630 0 zpool-datastore sd131 194 8384 W
166006784<br class="">
<br class="">
<div class="">------------------------------------------------------------------------------------------------------</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">And here's the pool extended with 2 more raidz3s
with 2 cards</div>
<div class="">notice it's almost LITERALLY cut in HALF per
drive!</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class="">------------------------------------------------------------------------------------------------------</div>
</div>
<div class=""><br class="">
</div>
<br class="">
2016 Sep 30 00:01:07, load: 4.59, disk_r: 8 KB,
disk_w: 3609852 KB<br class="">
<br class="">
UID PID PPID CMD DEVICE MAJ MIN D
BYTES<br class="">
0 4550 0 zpool-datastore sd133 194 8512 W
76558336<br class="">
0 4550 0 zpool-datastore sd132 194 8448 W
76566528<br class="">
0 4550 0 zpool-datastore sd135 194 8640 W
76570624<br class="">
0 4550 0 zpool-datastore sd134 194 8576 W
76574720<br class="">
0 4550 0 zpool-datastore sd136 194 8704 W
76582912<br class="">
0 4550 0 zpool-datastore sd131 194 8384 W
76611584<br class="">
0 4550 0 zpool-datastore sd130 194 8320 W
76644352<br class="">
0 4550 0 zpool-datastore sd137 194 8768 W
76648448<br class="">
0 4550 0 zpool-datastore sd129 194 8256 W
76660736<br class="">
0 4550 0 zpool-datastore sd138 194 8832 W
76713984<br class="">
0 4550 0 zpool-datastore sd139 194 8896 W
77369344<br class="">
0 4550 0 zpool-datastore sd113 194 7232 W
77770752<br class="">
0 4550 0 zpool-datastore sd115 194 7360 W
77832192<br class="">
0 4550 0 zpool-datastore sd114 194 7296 W
77836288<br class="">
0 4550 0 zpool-datastore sd111 194 7104 W
77840384<br class="">
0 4550 0 zpool-datastore sd112 194 7168 W
77840384<br class="">
0 4550 0 zpool-datastore sd108 194 6912 W
77844480<br class="">
0 4550 0 zpool-datastore sd110 194 7040 W
77864960<br class="">
0 4550 0 zpool-datastore sd116 194 7424 W
77873152<br class="">
0 4550 0 zpool-datastore sd107 194 6848 W
77914112<br class="">
0 4550 0 zpool-datastore sd106 194 6784 W
77918208<br class="">
0 4550 0 zpool-datastore sd109 194 6976 W
77926400<br class="">
0 4550 0 zpool-datastore sd128 194 8192 W
78938112<br class="">
0 4550 0 zpool-datastore sd118 194 7552 W
78979072<br class="">
0 4550 0 zpool-datastore sd125 194 8000 W
78991360<br class="">
0 4550 0 zpool-datastore sd120 194 7680 W
78999552<br class="">
0 4550 0 zpool-datastore sd127 194 8128 W
79007744<br class="">
0 4550 0 zpool-datastore sd126 194 8064 W
79011840<br class="">
0 4550 0 zpool-datastore sd119 194 7616 W
79020032<br class="">
0 4550 0 zpool-datastore sd123 194 7872 W
79020032<br class="">
0 4550 0 zpool-datastore sd122 194 7808 W
79048704<br class="">
0 4550 0 zpool-datastore sd124 194 7936 W
79056896<br class="">
0 4550 0 zpool-datastore sd121 194 7744 W
79065088<br class="">
0 4550 0 zpool-datastore sd105 194 6720 W
82460672<br class="">
0 4550 0 zpool-datastore sd102 194 6528 W
82468864<br class="">
0 4550 0 zpool-datastore sd101 194 6464 W
82477056<br class="">
0 4550 0 zpool-datastore sd104 194 6656 W
82477056<br class="">
0 4550 0 zpool-datastore sd103 194 6592 W
82481152<br class="">
0 4550 0 zpool-datastore sd141 194 9024 W
82489344<br class="">
0 4550 0 zpool-datastore sd99 194 6336 W
82493440<br class="">
0 4550 0 zpool-datastore sd98 194 6272 W
82501632<br class="">
0 4550 0 zpool-datastore sd140 194 8960 W
82513920<br class="">
0 4550 0 zpool-datastore sd97 194 6208 W
82538496<br class="">
0 4550 0 zpool-datastore sd100 194 6400 W
82542592<br class="">
<br class="">
<div class="">------------------------------------------------------------------------------------------------------</div>
<div class=""><br class="">
</div>
<br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">Any thoughts of how to discover and/or overcome
the true bottleneck is much appreciated.</div>
<div class=""><br class="">
</div>
<div class="">Thanks</div>
<div class=""><br class="">
</div>
<div class="">Michael<br class="">
<div class="">
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Sep 29, 2016, at 7:46 PM, Dale Ghent
<<a moz-do-not-send="true"
href="mailto:daleg@omniti.com" class="">daleg@omniti.com</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class=""><br class="">
Awesome that you're using LX Zones in this way with BeeGFS.<br class="">
<br class="">
A note on your testing methodology, however:<br
class="">
<a moz-do-not-send="true"
href="http://lethargy.org/%7Ejesus/writes/disk-benchmarking-with-dd-dont/#.V-3RUqOZPOY"
class="">http://lethargy.org/~jesus/writes/disk-benchmarking-with-dd-dont/#.V-3RUqOZPOY</a><br
class="">
<br class="">
<blockquote type="cite" class="">On Sep 29, 2016,
at 10:21 PM, Michael Talbott
<a class="moz-txt-link-rfc2396E" href="mailto:mtalbott@lji.org"><mtalbott@lji.org></a> wrote:<br class="">
<br class="">
Hi, I'm trying to find a way to achieve massive write speeds with
some decent hardware which will be used for some parallel computing
needs (bioinformatics). Eventually, if all goes well and my testing
succeeds, I'll be duplicating this setup and running BeeGFS in a few
LX zones (THANK YOU, LX ZONES!!!) for some truly massive parallel
computing storage happiness, but I'd like to tune this box as much
as possible first.<br class="">
<br class="">
For this setup I'm using an Intel S2600GZ board with 2 x E5-2640
CPUs (six cores each) @ 2.5 GHz and 144 GB of ECC RAM installed. I
have 3 x SAS2008-based LSI cards in that box: one is connected to 8
internal SSDs, and the other two cards (4 ports) are connected to a
45-bay drive enclosure. There are also two dual-port Intel 10GbE
cards for connectivity.<br class="">
<br class="">
I've created so many different zpools in different configurations,
from straight-up striped with no redundancy to raidz2 and raidz3,
multipathed and non-multipathed with 8x phy links instead of 4x
multipath links, etc., trying to find the magic combination for
maximum performance, but something somewhere is capping raw
throughput and I can't seem to find it.<br class="">
<br class="">
Now the crazy part: I can, for instance, create zpoolA with ~20
drives (via cardA, attached only to backplaneA) and zpoolB with
another ~20 drives (via cardB, attached only to backplaneB), and
each of them gets the same performance individually (~1 GB/s write
and 2.5 GB/s read). So my thought was that if I destroy zpoolB and
attach all those drives to zpoolA as additional vdevs, it should
double the performance or at least make some significant
improvement. But nope, roughly the same speed. Then I thought, OK,
maybe it's a slowest-vdev sort of thing, so I created vdevs such
that each vdev used half its drives from backplaneA and the other
half from backplaneB. That would force data distribution between the
cards for each vdev, double the speed, and get me to 2 GB/s write.
But nope, same deal: 1 GB/s write and 2.5 GB/s read :(<br class="">
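<br class="">
A minimal sketch of that split layout, with hypothetical device names
(c1t* on backplaneA/cardA, c2t* on backplaneB/cardB):<br class="">
<pre>
# Each 11-disk raidz3 vdev takes drives from both backplanes, forcing
# every vdev's I/O across both HBAs. Device names are illustrative.
zpool create datastore \
  raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
         c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
  raidz3 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 \
         c2t5d0 c2t6d0 c2t7d0 c2t8d0 c2t9d0
</pre>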
<br class="">
When I create the pool from scratch, I see a linear increase in
performance for each vdev I add until I hit about 4-5 vdevs. That's
where performance flatlines, and no matter what I do beyond that it
just won't go any faster :(<br class="">
<br class="">
Also, if I create a pure-SSD pool on cardC, its linear read/write
performance hits exactly the same numbers as the others :( Bottom
line: no matter what pool configuration I use and no matter what
recordsize is set in ZFS, I'm always capped at roughly 1 GB/s write
and 2.5 GB/s read.<br class="">
<br class="">
I thought maybe there weren't enough PCIe lanes to run all of those
cards at 8x, but that's not the case; this board can run 6 x 8-lane
PCIe 3.0 cards at full speed. I booted into Linux and used lspci -vv
to make sure (since I'm not sure how to view PCIe link speeds in
OmniOS), and sure enough, everything is running at 8x width, so
that's not it.<br class="">
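<br class="">
For reference, the Linux check looks something like this (the bus
address is hypothetical):<br class="">
<pre>
# LnkCap is the advertised link; LnkSta is what was actually negotiated.
lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'
</pre>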
<br class="">
Oh, and just FYI, this is my super simple throughput-testing script,
which I run with compression disabled on the tested pool:<br class="">
<br class="">
<pre>
START=$(date +%s)
# Write 10 GiB of zeros in 1 MiB records; with compression off this
# is a plain sequential-write load.
/usr/gnu/bin/dd if=/dev/zero of=/datastore/testdd bs=1M count=10k
# Flush outstanding writes so the timing covers all 10 GiB on disk.
sync
# Elapsed seconds; 10240 MB divided by this gives MB/s.
echo $(($(date +%s)-START))
</pre>
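<br class="">
Given the dd caveats above, a rough fio equivalent, if fio is
available (the mount point is the same test pool), would be:<br class="">
<pre>
# Four concurrent 1 MiB sequential-write streams, 4 GiB each, with an
# fsync at the end of each file so the flush is included in the timing.
fio --name=seqwrite --directory=/datastore --rw=write \
    --bs=1m --size=4g --numjobs=4 --ioengine=psync \
    --end_fsync=1 --group_reporting
</pre>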
<br class="">
My goal is to find a way to achieve at least 2 GB/s write and
4 GB/s read, which I think is theoretically possible with this
hardware.<br class="">
<br class="">
Anyone have any ideas about what could be limiting this, or how to
remedy it? Could the mpt_sas driver itself somehow be throttling
access to all these devices? Or do I need to do some sort of
IRQ-to-CPU pinning magic?<br class="">
<br class="">
<br class="">
Thanks,<br class="">
<br class="">
Michael<br class="">
<br class="">
_______________________________________________<br
class="">
OmniOS-discuss mailing list<br class="">
<a class="moz-txt-link-abbreviated" href="mailto:OmniOS-discuss@lists.omniti.com">OmniOS-discuss@lists.omniti.com</a><br class="">
<a class="moz-txt-link-freetext" href="http://lists.omniti.com/mailman/listinfo/omnios-discuss">http://lists.omniti.com/mailman/listinfo/omnios-discuss</a><br class="">
</blockquote>
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
OmniOS-discuss mailing list
<a class="moz-txt-link-abbreviated" href="mailto:OmniOS-discuss@lists.omniti.com">OmniOS-discuss@lists.omniti.com</a>
<a class="moz-txt-link-freetext" href="http://lists.omniti.com/mailman/listinfo/omnios-discuss">http://lists.omniti.com/mailman/listinfo/omnios-discuss</a>
</pre>
</blockquote>
<br>
</body>
</html>