<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 7, 2015, at 1:21 PM, Stephan Budach &lt;<a href="mailto:stephan.budach@jvm.de" class="">stephan.budach@jvm.de</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class="">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" class="">
<div text="#000000" bgcolor="#FFFFFF" class="">
<div class="moz-cite-prefix">Am 07.01.15 um 21:48 schrieb Richard
Elling:<br class="">
</div>
<blockquote cite="mid:3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com" type="cite" class="">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" class="">
<br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Jan 7, 2015, at 12:11 PM, Stephan Budach <<a moz-do-not-send="true" href="mailto:stephan.budach@jvm.de" class="">stephan.budach@jvm.de</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class="">
<div class="moz-cite-prefix">Am 07.01.15 um 18:00 schrieb
Richard Elling:<br class="">
</div>
<blockquote cite="mid:ACE15BA6-97B4-4C94-B758-654AF147FE27@richardelling.com" type="cite" class=""> <br class="">
<div class="">
<blockquote type="cite" class="">
<div class="">On Jan 7, 2015, at 2:28 AM, Stephan
Budach <<a moz-do-not-send="true" href="mailto:stephan.budach@JVM.DE" class="">stephan.budach@JVM.DE</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class=""> <font class="" face="Helvetica, Arial, sans-serif">Hello
everyone,<br class="">
<br class="">
I am sharing my zfs via NFS to a couple of OVM
nodes. I noticed really bad NFS read
performance, when rsize goes beyond 128k,
whereas the performance is just fine at 32k.
The issue is, that the ovs-agent, which is
performing the actual mount, doesn't accept or
pass any NFS mount options to the NFS server.
</font></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">The other issue is that illumos/Solaris
on x86 tuning of server-side size settings does</div>
<div class="">not work because the compiler optimizes
away the tunables. There is a trivial fix, but it</div>
<div class="">requires a rebuild.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class=""><font class="" face="Helvetica, Arial, sans-serif">To
give some numbers, a rsize of 1mb results in a
read throughput of approx. 2Mb/s, whereas a
rsize of 32k gives me 110Mb/s. Mounting a NFS
export from a OEL 6u4 box has no issues with
this, as the read speeds from this export are
108+MB/s regardles of the rsize of the NFS
mount.<br class="">
</font></div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Brendan wrote about a similar issue in
the Dtrace book as a case study. See chapter 5</div>
<div class="">case study on ZFS 8KB mirror reads.</div>
<br class="">
<blockquote type="cite" class="">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class=""><font class="" face="Helvetica, Arial, sans-serif">
<br class="">
The OmniOS box is currently connected to a
10GbE port at our core 6509, but the NFS
client is connected through a 1GbE port only.
MTU is at 1500 and can currently not be upped.<br class="">
Anyone having a tip, why a rsize of 64k+ will
result in such a performance drop?<br class="">
</font></div>
</div>
</blockquote>
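<div class="">To put the fixed 1500-byte MTU in numbers, here is a rough count of the full-size TCP segments needed to carry one READ reply at each rsize. A minimal Python sketch, assuming IPv4 with no TCP options (so MSS = MTU - 40 bytes) and ignoring RPC/NFS headers; segments_per_read is a hypothetical helper:</div>
<pre class="">
```python
# Rough number of full-size TCP segments needed to carry one READ reply.
# Assumptions: IPv4 with no TCP options, so MSS = MTU - 40 bytes;
# RPC and NFS headers are ignored.


def segments_per_read(rsize, mtu=1500):
    mss = mtu - 40  # 20-byte IPv4 header + 20-byte TCP header
    return (rsize + mss - 1) // mss  # ceiling division


for rsize in (32 * 1024, 64 * 1024, 128 * 1024, 1024 * 1024):
    print(f"rsize {rsize // 1024}k: {segments_per_read(rsize)} segments")
```
</pre>
<div class="">Under these assumptions a 32k reply spans 23 segments and a 1MB reply 719.</div>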
<div class=""><br class="">
</div>
<div class="">It is entirely due to optimizations for
small I/O going way back to the 1980s.</div>
<div class=""> -- richard</div>
</div>
</blockquote>
But doesn't that mean that Oracle Solaris will have the
same issue, or has Oracle addressed that in recent Solaris
versions? Not that I intend to switch over, but
that would be something I'd like to give my SR engineer to
chew on…<br class="">
</div>
</div>
</blockquote>
<div class=""><br class="">
</div>
<div class="">Look for yourself :-)</div>
<div class="">In "broken" systems, such as this Solaris 11.1 system:</div>
<div class="">
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class=""># echo nfs3_tsize::dis | mdb -k</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize: pushq %rbp</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+1: movq
%rsp,%rbp</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+4: subq
$0x8,%rsp</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+8: movq
%rdi,-0x8(%rbp)</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0xc: movl
(%rdi),%eax</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0xe: leal
-0x2(%rax),%ecx</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x11: cmpl
$0x1,%ecx</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x14: jbe +0x12
<nfs3_tsize+0x28></div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x16: cmpl
$0x5,%eax</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x19: movl
$0x100000,%eax</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x1e: movl
$0x8000,%ecx</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x23: cmovl.ne
%ecx,%eax</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x26: jmp +0x5
<nfs3_tsize+0x2d></div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x28: movl
$0x100000,%eax</div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x2d: leave </div>
<div style="margin: 0px; font-size: 11px; font-family: Menlo;
color: rgb(215, 201, 167); background-color: rgb(142, 53,
40);" class="">nfs3_tsize+0x2e: ret </div>
<div class=""><br class="">
</div>
<div class="">at +0x19 you'll notice hardwired 1MB</div>
</div>
</div>
</blockquote>
Ouch! Is that from an NFS client or server? </div></div></blockquote><div><br class=""></div><div>server</div><br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class="">Or rather, I know that
the NFS server negotiates the options with the client, and if no
options are passed from the client to the server, the server sets up
the connection with its defaults. </div></div></blockquote><div><br class=""></div><div>the server and client negotiate, so both can have defaults</div><br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class="">So, is this S11.1 output
from the NFS server? If so, it would mean that the NFS server would
go with the 1MB rsize/wsize, since the OracleVM Server has not
provided any options to it.<br class=""></div></div></blockquote><div><br class=""></div><div>You are not mistaken. AFAIK, this has been broken in Solaris x86 for more than 10 years.</div><div>Fortunately, most people can adjust on the client side, unless you're running ESX or something</div><div>that is difficult to adjust... like you seem to be.</div><br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class="">
<blockquote cite="mid:3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com" type="cite" class="">
<div class="">
<div class="">
<div class=""><br class="">
</div>
<div class="">by contrast, on a proper system</div>
<div class="">
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">#
echo nfs3_tsize::dis | mdb -k</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize:
pushq %rbp</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+1:
movq %rsp,%rbp</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+4:
subq $0x10,%rsp</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+8:
movq %rdi,-0x8(%rbp)</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0xc:
movl (%rdi),%edx</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0xe:
leal -0x2(%rdx),%eax</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x11:
cmpl $0x1,%eax</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x14:
jbe +0x12 &lt;nfs3_tsize+0x28&gt;</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x16:
</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">movl
-0x37f8ea60(%rip),%eax
<nfs3_max_transfer_size_rdma></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x1c:
cmpl $0x5,%edx</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x1f:
</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">cmovl.ne
-0x37f8ea72(%rip),%eax <nfs3_max_transfer_size_clts></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x26:
leave </div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x27:
ret </div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x28:
</div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">movl
-0x37f8ea76(%rip),%eax
<nfs3_max_transfer_size_cots></div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x2e:
leave </div>
<div style="margin: 0px; font-family: Menlo; color: rgb(76,
47, 45); background-color: rgb(223, 219, 196);" class="">nfs3_tsize+0x2f:
ret </div>
</div>
<div class=""><br class="">
</div>
<div class="">where you can actually tune it according to the
Solaris Tunable Parameters guide.</div>
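<div class="">For comparison, a hypothetical Python model of this listing, again reconstructed from the disassembly only: each branch now reads a global (nfs3_max_transfer_size_cots, _clts or _rdma), so a changed setting is picked up on the next call. The dictionary stands in for those kernel globals and is initialised to the constants hardwired in the broken build:</div>
<pre class="">
```python
# Hypothetical model of the "proper" nfs3_tsize() listing, reconstructed
# from the disassembly only (not the illumos source).

NC_TPI_RDMA = 5  # the semantics value compared at +0x1c

# Stand-ins for the kernel globals named in the listing, initialised to
# the constants that the broken build hardwires.
tunables = {
    "nfs3_max_transfer_size_cots": 0x100000,
    "nfs3_max_transfer_size_clts": 0x8000,
    "nfs3_max_transfer_size_rdma": 0x100000,
}


def nfs3_tsize(semantics):
    if semantics in (2, 3):  # jbe to +0x28: connection-oriented (TCP)
        return tunables["nfs3_max_transfer_size_cots"]
    if semantics == NC_TPI_RDMA:  # cmpl $0x5 keeps the _rdma value
        return tunables["nfs3_max_transfer_size_rdma"]
    return tunables["nfs3_max_transfer_size_clts"]  # cmovl.ne picks _clts


# Tuning the connection-oriented limit now changes the result.
tunables["nfs3_max_transfer_size_cots"] = 128 * 1024
print(nfs3_tsize(3))  # 131072
```
</pre>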
<div class=""><br class="">
</div>
<div class="">NB, we fixed this years ago at Nexenta and I'm
certain it has not been upstreamed. There are</div>
<div class="">a number of other related fixes, all of the same
nature. If someone is inclined to upstream </div>
<div class="">contact me directly.</div>
<div class=""><br class="">
</div>
<div class="">Once, fixed, you'll be able to change the
server's settings for negotiating the rsize/wsize with</div>
<div class="">the clients. Many NAS vendors use smaller
limits, and IMHO it is a good idea anyway. For </div>
<div class="">example, see <a moz-do-not-send="true" href="http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html" class="">http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html</a></div>
<div class=""> -- richard</div>
</div>
<div class=""><br class="">
</div>
</div>
</blockquote>
I am mostly satisfied with a transfer size of 32k. This NFS export is
used as the storage repository for the vdisk images, and approx. 80 guests
are accessing them, so the I/O is random anyway and smaller I/Os
are preferred. However, the NFS export from the OEL box just
doesn't have this massive performance hit, even with an rsize/wsize
of 1MB.<br class=""></div></div></blockquote><div><br class=""></div><div>Yes, this is not the only issue you're facing. Even with modest hardware and OOB settings, it is</div><div>easy to soak 1GbE. For ZFS backends, we use 128k as the max rsize/wsize, since that is a</div><div>practical upper limit (even though you can have larger block sizes in ZFS).</div><div><br class=""></div><div>Here are the OOB TCP parameters we use:</div><div><div style="margin: 0px; font-family: Menlo; color: rgb(76, 47, 45); background-color: rgb(223, 219, 196);" class="">PROTO PROPERTY PERM CURRENT PERSISTENT DEFAULT POSSIBLE</div><div style="margin: 0px; font-family: Menlo; color: rgb(76, 47, 45); background-color: rgb(223, 219, 196);" class="">tcp max_buf rw 16777216 16777216 1048576 8192-1073741824</div><div style="margin: 0px; font-family: Menlo; color: rgb(76, 47, 45); background-color: rgb(223, 219, 196);" class="">tcp recv_buf rw 1250000 1250000 1048576 2048-16777216</div><div style="margin: 0px; font-family: Menlo; color: rgb(76, 47, 45); background-color: rgb(223, 219, 196);" class="">tcp sack rw active -- active never,passive,active</div><div style="margin: 0px; font-family: Menlo; color: rgb(76, 47, 45); background-color: rgb(223, 219, 196);" class="">tcp send_buf rw 1250000 1250000 128000 4096-16777216</div><div class=""><br class=""></div><div class="">No real magic here, but if you measure your network closely and it doesn't change much, then</div><div class="">you can pre-set the values from your bandwidth-delay product (BDP).</div><div class=""><br class=""></div><div class="">And, of course, following the USE (utilization, saturation, errors) methodology, check for errors... I can't count the number of</div><div class="">times bad transceivers, cabling, or switch settings tripped people up.</div><div class=""> -- richard</div></div><br class=""><blockquote type="cite" class=""><div class=""><div text="#000000" bgcolor="#FFFFFF" class="">
<blockquote cite="mid:3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com" type="cite" class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">
<div text="#000000" bgcolor="#FFFFFF" class=""> <br class="">
In any case, the first bummer is that Oracle chose not to
make its ovs-agent capable of accepting and passing
NFS mount options…<br class="">
<br class="">
Cheers,<br class="">
budy<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</blockquote>
Thanks,<br class="">
budy<br class="">
</div>
</div></blockquote></div><br class=""></body></html>