[OmniOS-discuss] Slow NFS speeds at rsize > 128k

Richard Elling richard.elling at richardelling.com
Wed Jan 7 20:48:17 UTC 2015


> On Jan 7, 2015, at 12:11 PM, Stephan Budach <stephan.budach at jvm.de> wrote:
> 
> On 07.01.15 at 18:00, Richard Elling wrote:
>> 
>>> On Jan 7, 2015, at 2:28 AM, Stephan Budach <stephan.budach at JVM.DE> wrote:
>>> 
>>> Hello everyone,
>>> 
>>> I am sharing my ZFS datasets via NFS with a couple of OVM nodes. I noticed really bad NFS read performance when rsize goes beyond 128k, whereas performance is just fine at 32k. The issue is that the ovs-agent, which performs the actual mount, doesn't accept or pass any NFS mount options.
>> 
>> The other issue is that on x86 illumos/Solaris, tuning the server-side size settings does
>> not work, because the compiler optimizes the tunables away. There is a trivial fix, but it
>> requires a rebuild.
>> 
>>> To give some numbers, an rsize of 1MB results in a read throughput of approx. 2MB/s, whereas an rsize of 32k gives me 110MB/s. Mounting an NFS export from an OEL 6u4 box has no issues with this, as the read speeds from that export are 108+MB/s regardless of the rsize of the NFS mount.
>> 
>> Brendan wrote about a similar issue as a case study in the DTrace book. See the chapter 5
>> case study on ZFS 8KB mirror reads.
>> 
>>> 
>>> The OmniOS box is currently connected to a 10GbE port on our core 6509, but the NFS client is connected through a 1GbE port only. MTU is 1500 and currently cannot be raised.
>>> Does anyone have a tip as to why an rsize of 64k+ results in such a performance drop?
>> 
>> It is entirely due to optimizations for small I/O going way back to the 1980s.
>>  -- richard
> But doesn't that mean that Oracle Solaris has the same issue, or has Oracle addressed that in recent Solaris versions? Not that I am intending to switch over, but it would be something I'd like to give my SR engineer to chew on…

Look for yourself :-)
In "broken" systems, such as this Solaris 11.1 system:
# echo nfs3_tsize::dis | mdb -k
nfs3_tsize:                     pushq  %rbp
nfs3_tsize+1:                   movq   %rsp,%rbp
nfs3_tsize+4:                   subq   $0x8,%rsp
nfs3_tsize+8:                   movq   %rdi,-0x8(%rbp)
nfs3_tsize+0xc:                 movl   (%rdi),%eax
nfs3_tsize+0xe:                 leal   -0x2(%rax),%ecx
nfs3_tsize+0x11:                cmpl   $0x1,%ecx
nfs3_tsize+0x14:                jbe    +0x12    <nfs3_tsize+0x28>
nfs3_tsize+0x16:                cmpl   $0x5,%eax
nfs3_tsize+0x19:                movl   $0x100000,%eax
nfs3_tsize+0x1e:                movl   $0x8000,%ecx
nfs3_tsize+0x23:                cmovl.ne %ecx,%eax
nfs3_tsize+0x26:                jmp    +0x5     <nfs3_tsize+0x2d>
nfs3_tsize+0x28:                movl   $0x100000,%eax
nfs3_tsize+0x2d:                leave  
nfs3_tsize+0x2e:                ret    

At +0x19 you'll notice the hardwired 1MB (0x100000), with the hardwired 32KB (0x8000) fallback right after it at +0x1e.
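
In C, the pattern looks roughly like this -- a sketch reconstructed from the
listings, not the verbatim illumos source; nfs3_tsize_sketch and the
transport-type constants are illustrative names:

/*
 * Transport-type values inferred from the compares at +0xe and +0x16.
 */
#define T_COTS     2   /* connection-oriented (TCP) */
#define T_COTS_ORD 3
#define T_RDMA     5

/*
 * Without volatile, the optimizer can conclude that nothing ever writes
 * these tunables and fold the initializers straight into the function --
 * the hardwired movl $0x100000,%eax above.  The trivial fix is volatile:
 * it forces a real load from the symbol on every call, which is exactly
 * what the second listing below shows.
 */
volatile unsigned int nfs3_max_transfer_size_cots = 1024 * 1024;
volatile unsigned int nfs3_max_transfer_size_rdma = 1024 * 1024;
volatile unsigned int nfs3_max_transfer_size_clts = 32 * 1024;

unsigned int
nfs3_tsize_sketch(int xprt_type)
{
        if (xprt_type == T_COTS || xprt_type == T_COTS_ORD)
                return (nfs3_max_transfer_size_cots);  /* TCP: 1MB default */
        if (xprt_type == T_RDMA)
                return (nfs3_max_transfer_size_rdma);  /* RDMA: 1MB default */
        return (nfs3_max_transfer_size_clts);          /* UDP: 32KB default */
}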

By contrast, on a proper system:
# echo nfs3_tsize::dis | mdb -k
nfs3_tsize:                     pushq  %rbp
nfs3_tsize+1:                   movq   %rsp,%rbp
nfs3_tsize+4:                   subq   $0x10,%rsp
nfs3_tsize+8:                   movq   %rdi,-0x8(%rbp)
nfs3_tsize+0xc:                 movl   (%rdi),%edx
nfs3_tsize+0xe:                 leal   -0x2(%rdx),%eax
nfs3_tsize+0x11:                cmpl   $0x1,%eax
nfs3_tsize+0x14:                jbe    +0x12    <nfs3_tsize+0x28>
nfs3_tsize+0x16:                movl   -0x37f8ea60(%rip),%eax   <nfs3_max_transfer_size_rdma>
nfs3_tsize+0x1c:                cmpl   $0x5,%edx
nfs3_tsize+0x1f:                cmovl.ne -0x37f8ea72(%rip),%eax <nfs3_max_transfer_size_clts>
nfs3_tsize+0x26:                leave  
nfs3_tsize+0x27:                ret    
nfs3_tsize+0x28:                movl   -0x37f8ea76(%rip),%eax   <nfs3_max_transfer_size_cots>
nfs3_tsize+0x2e:                leave  
nfs3_tsize+0x2f:                ret    

where you can actually tune it according to the Solaris Tunable Parameters guide.
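
For example, to cap the server's TCP (COTS) transfer size at 32KB on a live
system -- a sketch only, with the symbol name taken from the listing above:

# echo 'nfs3_max_transfer_size_cots/W 0x8000' | mdb -kw

or persistently, via /etc/system and a reboot:

set nfs:nfs3_max_transfer_size_cots = 0x8000

The client side can then confirm what was actually negotiated with nfsstat -m.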

NB, we fixed this years ago at Nexenta and I'm certain it has not been upstreamed. There are
a number of other related fixes, all of the same nature. If someone is inclined to upstream
them, contact me directly.

Once fixed, you'll be able to change the server's settings for negotiating the rsize/wsize with
the clients. Many NAS vendors use smaller limits, and IMHO it is a good idea anyway. For
example, see http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html
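
And if you want to see the read sizes the server is actually being asked to
serve, a quick sketch using the nfsv3 DTrace provider:

# dtrace -n 'nfsv3:::op-read-start { @ = quantize(args[2]->count); }'

With a 1MB rsize you'd expect the distribution to pile up in the 1MB bucket.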
 -- richard


> 
> In any case, the first bummer is that Oracle chose not to make its ovs-agent capable of accepting and passing NFS mount options…
> 
> Cheers,
> budy
