[OmniOS-discuss] Slow NFS speeds at rsize > 128k

Wed Jan 7 21:21:03 UTC 2015

On 07.01.15 at 21:48, Richard Elling wrote:
>
>> On Jan 7, 2015, at 12:11 PM, Stephan Budach <stephan.budach at jvm.de 
>> <mailto:stephan.budach at jvm.de>> wrote:
>>
>> On 07.01.15 at 18:00, Richard Elling wrote:
>>>
>>>> On Jan 7, 2015, at 2:28 AM, Stephan Budach <stephan.budach at JVM.DE 
>>>> <mailto:stephan.budach at JVM.DE>> wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed 
>>>> really bad NFS read performance when rsize goes beyond 128k, 
>>>> whereas the performance is just fine at 32k. The issue is that the 
>>>> ovs-agent, which performs the actual mount, doesn't accept or 
>>>> pass any NFS mount options to the NFS server.
>>>
>>> The other issue is that, on illumos/Solaris x86, tuning of the
>>> server-side size settings does not work because the compiler
>>> optimizes away the tunables. There is a trivial fix, but it
>>> requires a rebuild.
>>>
>>>> To give some numbers, an rsize of 1 MB results in a read throughput 
>>>> of approx. 2 MB/s, whereas an rsize of 32k gives me 110 MB/s. Mounting 
>>>> an NFS export from an OEL 6u4 box has no issues with this, as the 
>>>> read speeds from this export are 108+ MB/s regardless of the rsize of 
>>>> the NFS mount.
>>>
>>> Brendan wrote about a similar issue in the DTrace book as a case 
>>> study. See the chapter 5 case study on ZFS 8KB mirror reads.
>>>
>>>>
>>>> The OmniOS box is currently connected to a 10GbE port at our core 
>>>> 6509, but the NFS client is connected through a 1GbE port only. MTU 
>>>> is at 1500 and can currently not be upped.
>>>> Does anyone have a tip as to why an rsize of 64k+ results in such a 
>>>> performance drop?
>>>
>>> It is entirely due to optimizations for small I/O going way back to 
>>> the 1980s.
>>>  -- richard
>> But doesn't that mean that Oracle Solaris will have the same issue, 
>> or has Oracle addressed that in recent Solaris versions? Not that I 
>> am intending to switch over, but that would be something I'd like to 
>> give my SR engineer to chew on…
>
> Look for yourself :-)
> In "broken" systems, such as this Solaris 11.1 system:
> # echo nfs3_tsize::dis | mdb -k
> nfs3_tsize:                     pushq  %rbp
> nfs3_tsize+1:                   movq %rsp,%rbp
> nfs3_tsize+4:                   subq $0x8,%rsp
> nfs3_tsize+8:                   movq %rdi,-0x8(%rbp)
> nfs3_tsize+0xc:                 movl (%rdi),%eax
> nfs3_tsize+0xe:                 leal -0x2(%rax),%ecx
> nfs3_tsize+0x11:                cmpl $0x1,%ecx
> nfs3_tsize+0x14:                jbe    +0x12   <nfs3_tsize+0x28>
> nfs3_tsize+0x16:                cmpl $0x5,%eax
> nfs3_tsize+0x19:                movl $0x100000,%eax
> nfs3_tsize+0x1e:                movl $0x8000,%ecx
> nfs3_tsize+0x23:                cmovl.ne %ecx,%eax
> nfs3_tsize+0x26:                jmp    +0x5   <nfs3_tsize+0x2d>
> nfs3_tsize+0x28:                movl $0x100000,%eax
> nfs3_tsize+0x2d:                leave
> nfs3_tsize+0x2e:                ret
>
> at +0x19 you'll notice hardwired 1MB
Ouch! Is that from an NFS client or server? Or rather, I know that the 
NFS server negotiates the options with the client, and if no options are 
passed from the client to the server, the server sets up the connection 
with its defaults. So, this S11.1 output: is that from the NFS server? 
If yes, it would mean that the NFS server would go with the 1 MB 
rsize/wsize, since the OracleVM Server has not provided any options to it.
>
> by contrast, on a proper system
> # echo nfs3_tsize::dis | mdb -k
> nfs3_tsize:                     pushq  %rbp
> nfs3_tsize+1:                   movq   %rsp,%rbp
> nfs3_tsize+4:                   subq   $0x10,%rsp
> nfs3_tsize+8:                   movq   %rdi,-0x8(%rbp)
> nfs3_tsize+0xc:                 movl   (%rdi),%edx
> nfs3_tsize+0xe:                 leal   -0x2(%rdx),%eax
> nfs3_tsize+0x11:                cmpl   $0x1,%eax
> nfs3_tsize+0x14:                jbe    +0x12    <nfs3_tsize+0x28>
> nfs3_tsize+0x16:                movl   -0x37f8ea60(%rip),%eax <nfs3_max_transfer_size_rdma>
> nfs3_tsize+0x1c:                cmpl   $0x5,%edx
> nfs3_tsize+0x1f:                cmovl.ne -0x37f8ea72(%rip),%eax <nfs3_max_transfer_size_clts>
> nfs3_tsize+0x26:                leave
> nfs3_tsize+0x27:                ret
> nfs3_tsize+0x28:                movl   -0x37f8ea76(%rip),%eax <nfs3_max_transfer_size_cots>
> nfs3_tsize+0x2e:                leave
> nfs3_tsize+0x2f:                ret
>
> where you can actually tune it according to the Solaris Tunable 
> Parameters guide.
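
Read side by side, the two listings above appear to differ only in where the limits come from: immediates (0x100000, 0x8000) in the first, loads from the nfs3_max_transfer_size_* variables in the second. A minimal C sketch of that difference follows. It is reconstructed from the disassembly, not taken from the kernel source. The function names tsize_hardwired and tsize_tunable are made up for illustration, the NC_TPI_* values are assumed to match <sys/netconfig.h>, and volatile is used here only to stop the compiler from folding the values in this sketch; it is not necessarily the fix referred to above.

```c
/* Transport semantics values; assumed to match illumos <sys/netconfig.h>. */
enum {
    NC_TPI_CLTS = 1,      /* connectionless, e.g. UDP */
    NC_TPI_COTS = 2,      /* connection-oriented, e.g. TCP */
    NC_TPI_COTS_ORD = 3,  /* connection-oriented with orderly release */
    NC_TPI_RDMA = 5
};

/*
 * User-space stand-ins for the kernel tunables named in the second listing.
 * The initial values mirror the immediates seen in the first listing.
 */
static volatile unsigned int nfs3_max_transfer_size_clts = 0x8000;
static volatile unsigned int nfs3_max_transfer_size_cots = 0x100000;
static volatile unsigned int nfs3_max_transfer_size_rdma = 0x100000;

/* First listing: the limits are constants baked into the code. */
static unsigned int tsize_hardwired(int semantics)
{
    if (semantics == NC_TPI_COTS || semantics == NC_TPI_COTS_ORD)
        return 0x100000;
    return (semantics == NC_TPI_RDMA) ? 0x100000 : 0x8000;
}

/* Second listing: every call loads the current value of the tunable. */
static unsigned int tsize_tunable(int semantics)
{
    if (semantics == NC_TPI_COTS || semantics == NC_TPI_COTS_ORD)
        return nfs3_max_transfer_size_cots;
    if (semantics == NC_TPI_RDMA)
        return nfs3_max_transfer_size_rdma;
    return nfs3_max_transfer_size_clts;
}
```

In this sketch, setting nfs3_max_transfer_size_cots to 32768 changes what tsize_tunable(NC_TPI_COTS) returns, while tsize_hardwired(NC_TPI_COTS) keeps returning 1 MB no matter what the variable holds.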
>
> NB, we fixed this years ago at Nexenta and I'm certain it has not been 
> upstreamed. There are a number of other related fixes, all of the same
> nature. If someone is inclined to upstream them, contact me directly.
>
> Once fixed, you'll be able to change the server's settings for 
> negotiating the rsize/wsize with the clients. Many NAS vendors use
> smaller limits, and IMHO it is a good idea anyway. For example, see 
> http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html
>  -- richard
>
I am mostly satisfied with a transfer size of 32k. This NFS export is 
used as the storage repository for the vdisk images, and approx. 80 guests 
are accessing those, so the I/O is random and smaller I/Os are 
preferred anyway. However, the NFS export from the OEL box just doesn't 
have this massive performance hit, even with an rsize/wsize of 1 MB.
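
For a sense of scale, here is a rough C sketch of what one read of each size means on the 1GbE client link at MTU 1500. It only does arithmetic and does not by itself explain the slowdown. The helper names are made up, the MSS of 1460 bytes assumes plain IPv4 and TCP headers without options, and RPC/NFS headers and Ethernet framing are ignored.

```c
#include <stdint.h>

/* Number of TCP segments needed to carry rsize bytes of read data. */
static uint32_t segments_per_read(uint32_t rsize, uint32_t mss)
{
    return (rsize + mss - 1) / mss;  /* round up to whole segments */
}

/* Whole microseconds needed to put rsize bytes on a link of link_bps bits/s. */
static uint64_t wire_time_us(uint32_t rsize, uint64_t link_bps)
{
    return ((uint64_t)rsize * 8u * 1000000u) / link_bps;
}
```

With these assumptions, a 32k read is 23 segments and about 262 microseconds on the wire, while a 1 MB read is 719 segments and about 8.4 milliseconds, during which nothing else can use the client's link.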
>
>>
>> In any case, the first bummer is that Oracle chose not to have its 
>> ovs-agent be capable of accepting and passing the NFS mount options…
>>
>> Cheers,
>> budy
>
Thanks,
budy