From nomad at ee.washington.edu  Thu Aug 23 15:37:01 2018
From: nomad at ee.washington.edu (Lee Damon)
Date: Thu, 23 Aug 2018 08:37:01 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
Message-ID:

I recently installed a new host. So new I couldn't install LTS on it, so I've installed 151026. This host is strictly for serving ZFS-based NFS & CIFS. Everything else is just default.

Over time it has become fairly obvious to me that NFS writes are ... well, abysmal.

This example is copying a 36GB directory of mixed size/type files. The first copy is strictly on a filesystem on the new server. The second is reading from the new server to an existing one. The third is doing the same read/write activity as test one but on an existing server running 151022.

on new fileserver:

: || nomad at omics1 fs2test ; time cp -rp 004test omics1/004test-1
real    22m27.225s
user    0m0.188s
sys     0m29.880s

reading from new fileserver, writing to existing fileserver:

: || nomad at omics1 hvfs2test ; time cp -rp /misc/fs2test/004test .
real    2m9.770s
user    0m0.180s
sys     0m28.694s

existing fileserver:

: || nomad at omics1 hvfs2test ; time cp -rp 004test omics1/004test-1
real    2m14.158s
user    0m0.242s
sys     0m30.313s

While the user and system times are consistent across all tests, the wall-clock time of the first test is 10x that of the others. I've seen wall-clock time on these tests take as long as 50 minutes. All tests were done on the same CentOS 7 host.

Watching snoop collect packets I see multiple-minutes-long pauses while writing to the new server.

If I'm reading the heat maps right - https://drive.google.com/open?id=1zcX9ryXjrPMH0_uUbfywiTTnJDau4WW0 - it seems to be spending about 81% of its time in _t_cancel, waiting on a thread to cancel. I'm not a dev and haven't looked at the code, so it's quite possible I'm misunderstanding what the map is saying.

The client spends so much time stuck in diskwait that it can take several minutes to respond after a SIGINT, SIGHUP, or SIGKILL to the cp process.

Is anyone else seeing similar problems?

nomad

From bfriesen at simple.dallas.tx.us  Thu Aug 23 16:08:57 2018
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 23 Aug 2018 11:08:57 -0500 (CDT)
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

What does 'zpool status poolname' (replace poolname with the name of the pool which is NFS exported) say?

What is the output of 'iostat -xnE' on your new server?

What is the native block size for the disks you used, and what is the nature of the disks (SATA, SAS, nearline storage, exceptionally large size, etc.)?

Do you have dedicated ZIL SSDs in your pool?

Have you done a continual ping from the NFS client to the server to see if there are packet drops?

If you use some other TCP-based protocol to transfer a file from the client to the server, do you see any strange hangs during the transfer?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From nomad at ee.washington.edu  Thu Aug 23 16:38:38 2018
From: nomad at ee.washington.edu (Lee Damon)
Date: Thu, 23 Aug 2018 09:38:38 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

These are 12TB SAS drives (Seagate ST12000NM0027) for data & hot spare. ZIL & L2ARC are 480GB INTEL SSDSC2KG48 SSDs. Everything is left at default for sector size, etc.
They were basically prepared for the pool with a simple fdisk -B /dev/rdsk/drive.

Ping never shows loss of connectivity. I ran this for about 5 minutes during a test:

303 packets transmitted, 303 received, 0% packet loss, time 302021ms
rtt min/avg/max/mdev = 0.109/0.281/2.881/0.227 ms

CIFS, scp, and rsync do not exhibit the problem. I forgot to mention that local copies on the file server are also as fast as I would expect (~2 min).

  pool: pool0
 state: ONLINE
  scan: none requested
config:

        NAME                         STATE     READ WRITE CKSUM
        pool0                        ONLINE       0     0     0
          raidz2-0                   ONLINE       0     0     0
            c0t5000C500A612DA93d0    ONLINE       0     0     0
            c0t5000C500957D4A93d0    ONLINE       0     0     0
            c0t5000C500957D4C1Bd0    ONLINE       0     0     0
            c0t5000C500957D25B3d0    ONLINE       0     0     0
            c0t5000C500957D27F3d0    ONLINE       0     0     0
            c0t5000C500957D2553d0    ONLINE       0     0     0
        logs
          mirror-1                   ONLINE       0     0     0
            c0t55CD2E414EC0FF43d0s0  ONLINE       0     0     0
            c3t0d0s0                 ONLINE       0     0     0
        cache
          c0t55CD2E414EC0FF43d0s1    ONLINE       0     0     0
          c3t0d0s1                   ONLINE       0     0     0
        spares
          c0t5000C50095722E27d0      AVAIL

iostat:
                    extended device statistics
    r/s    w/s   kr/s   kw/s  wait actv wsvc_t asvc_t  %w  %b device
   15.8   57.5 1302.5 5372.3 823.0  0.6 11233.2   8.4   9   9 pool0
    0.1   19.9    1.7  163.0   0.0  0.0    1.0    0.1   0   0 rpool
    0.1   10.2    0.9   81.5   0.0  0.0    0.0    0.0   0   0 c1t4d0
    0.0   10.1    0.8   81.5   0.0  0.0    0.0    0.1   0   0 c1t5d0
    2.7   19.5  337.8 1359.4   0.4  0.0   16.1    0.5   8   1 c3t0d0
    1.9    6.0  114.5  442.2   0.0  0.0    0.0    5.0   0   1 c0t5000C500957D27F3d0
    0.0    0.0    0.0    0.0   0.0  0.0    0.0    0.1   0   0 c0t5000C50095722E27d0
    1.5    6.0   86.2  442.9   0.0  0.0    0.0    4.7   0   1 c0t5000C500957D25B3d0
    1.5    6.0   78.2  442.5   0.0  0.0    0.0    4.2   0   1 c0t5000C500957D4C1Bd0
    1.7    6.1  102.4  442.8   0.0  0.0    0.0    4.7   0   1 c0t5000C500A612DA93d0
    1.6    6.0   86.6  442.5   0.0  0.0    0.0    4.3   0   1 c0t5000C500957D2553d0
    2.0    5.9  122.2  442.2   0.0  0.0    0.0    5.0   0   1 c0t5000C500957D4A93d0
    2.9   19.5  374.7 1357.9   0.0  0.0    0.0    1.2   0   1 c0t55CD2E414EC0FF43d0

c1t4d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: INTEL SSDSC2KB24 Revision: 0121 Serial No: BTYS817407RE240
Size: 240.06GB <240057409536 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c1t5d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: INTEL SSDSC2KB24 Revision: 0121 Serial No: BTYS817409YS240
Size: 240.06GB <240057409536 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c3t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: INTEL SSDSC2KG48 Revision: 0121 Serial No: BTYM7405027L480
Size: 480.10GB <480103981056 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C500957D27F3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST12000NM0027    Revision: E001 Serial No: ZJV0VFGX0000J74
Size: 12000.14GB <12000138625024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C50095722E27d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST12000NM0027    Revision: E001 Serial No: ZJV0S42H0000J75
Size: 12000.14GB <12000138625024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C500957D25B3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST12000NM0027    Revision: E001 Serial No: ZJV0VFJV0000J74
Size: 12000.14GB <12000138625024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C500957D4C1Bd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST12000NM0027    Revision: E001 Serial No: ZJV0P6050000J83
Size: 12000.14GB <12000138625024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C500A612DA93d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST12000NM0027    Revision: E001 Serial No: ZJV0WCCN0000J80
Size: 12000.14GB <12000138625024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C500957D2553d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST12000NM0027    Revision: E001 Serial No: ZJV0VFK30000J74
Size: 12000.14GB <12000138625024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t5000C500957D4A93d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST12000NM0027    Revision: E001 Serial No: ZJV0VBQ80000R81
Size: 12000.14GB <12000138625024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c0t55CD2E414EC0FF43d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: INTEL SSDSC2KG48 Revision: 0121 Serial No: BTYM740600ZT480
Size: 480.10GB <480103981056 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

nomad

From vab at bb-c.de  Thu Aug 23 16:51:02 2018
From: vab at bb-c.de (Volker A. Brandt)
Date: Thu, 23 Aug 2018 18:51:02 +0200
Subject: [OmniOS-discuss] pkg update broken on r151026 for lipkg branded NGZs
Message-ID: <23422.58870.730172.385552@shelob.bb-c.de>

Hello all!

I have a very strange problem doing a pkg update on an r151026 system. This machine has 11 NGZs, all are lipkg brand. The GZ is running

SunOS radbug 5.11 omnios-r151026-b6848f4455 i86pc i386 i86pc

(before the update).

When I run pkg update with the "-r" flag, it shows some packages it wants to update, then does its thing, and ... stops. No new BE is created:

# pkg update -v -rC0 --be-name=ooce-026-20180823
[...]
Planning linked: 9/11 done; 2 working: zone:kayak zone:omnit3
Linked image 'zone:omnit3' output:
|  Packages to update:  11
|  Services to change:   2
|  Estimated space available: 426.13 GB
|  Estimated space to be consumed: 173.45 MB
|  Rebuild boot archive: No
|
| Changed packages:
| omnios
|   SUNWcs
|     0.5.11-0.151026:20180622T094606Z -> 0.5.11-0.151026:20180814T181134Z
|   developer/debug/mdb
|     0.5.11-0.151026:20180621T235844Z -> 0.5.11-0.151026:20180814T181141Z
|   library/security/openssl
|     1.0.2.15-0.151026 -> 1.0.2.16-0.151026
|   network/dns/bind
|     9.11.3-0.151026 -> 9.11.4-0.151026
|   network/openssh
|     7.6.1-0.151026:20180420T101453Z -> 7.6.1-0.151026:20180818T202827Z
|   network/openssh-server
|     7.6.1-0.151026:20180420T101522Z -> 7.6.1-0.151026:20180818T202943Z
|   release/name
|     0.5.11-0.151026:20180622T100612Z -> 0.5.11-0.151026:20180820T120713Z
|   service/network/ntp
|     4.2.8.11-0.151026 -> 4.2.8.12-0.151026
|   system/kernel
|     0.5.11-0.151026:20180621T235958Z -> 0.5.11-0.151026:20180814T181345Z
|   system/kernel/platform
|     0.5.11-0.151026:20180621T235956Z -> 0.5.11-0.151026:20180814T181344Z
|   web/curl
|     7.60.0-0.151026 -> 7.61.0-0.151026
|
| Services:
|   restart_fmri:
|     svc:/network/ntp:default
|     svc:/network/ssh:default
|
| Editable files to change:
|   Update:
|     etc/motd
[...]
Planning linked: 11/11 done
DOWNLOAD        PKGS         FILES    XFER (MB)   SPEED
Completed      11/11     2263/2263    46.3/46.3    0B/s
Downloading linked: 0/11 done; 11 working: zone:kayak zone:omnib0 zone:omnib1 zone:omnib2 zone:omnib3 zone:omnib4 zone:omnit0 zone:omnit1 zone:omnit2 zone:omnit3 zone:omnit4
Downloading linked: 1/11 done; 10 working: zone:kayak zone:omnib1 zone:omnib2 zone:omnib3 zone:omnib4 zone:omnit0 zone:omnit1 zone:omnit2 zone:omnit3 zone:omnit4
Downloading linked: 2/11 done; 9 working: zone:omnib1 zone:omnib2 zone:omnib3 zone:omnib4 zone:omnit0 zone:omnit1 zone:omnit2 zone:omnit3 zone:omnit4
Linked progress: \||||||-|98.540u 11.950s 0:51.57 214.2% 0+0k 0+0io 0pf+0w
Exit 1

Note that it just returned exit code 1 right in the middle of the "Linked progress" display.

When I omit the "-r", things change:

# zonename
omnib0
# pkg update -v -C0 --be-name=ooce-026-20180823
[...]
Planning linked: 10/11 done; 1 working: zone:omnit4
Linked image 'zone:omnit4' output:
|  Packages to update:  1
|  Estimated space available: 426.01 GB
|  Estimated space to be consumed: 35.03 MB
|  Rebuild boot archive: No
|
| Changed packages:
| omnios
|   SUNWcs
|     0.5.11-0.151026:20180622T094606Z -> 0.5.11-0.151026:20180814T181134Z
|
| Editable files to change:
|   Update:
|     etc/motd

A new BE is created. However, it just updates the SUNWcs package containing the new motd file. When I boot into the new BE and retry "pkg update -rC0", I get the same result: it just stops without a new BE. The GZ is now on:

SunOS radbug 5.11 omnios-r151026-51c7d6fd75 i86pc i386 i86pc

Logging into any one zone, I can update that zone individually. The update will try to apply all 11 packages that are newer in the repository. However, that produces an error because bootadm update-archive is run and subsequently fails:

# pkg update -v --be-name=deleteme
Packages to update: 11
[...]
system/kernel/platform
  0.5.11-0.151026:20180621T235956Z -> 0.5.11-0.151026:20180814T181344Z
web/curl
  7.60.0-0.151026 -> 7.61.0-0.151026

DOWNLOAD        PKGS         FILES    XFER (MB)   SPEED
Completed      11/11     1519/1519    30.0/30.0   3.9M/s

PHASE                                          ITEMS
Removing old actions                           28/28
Installing new actions                         78/78
Updating modified actions                  1520/1520
Updating package state database                 Done
Updating package cache                         11/11
Updating image state                            Done
Creating fast lookup database                   Done
pkg: '/sbin/bootadm update-archive -R /tmp/tmp36Jtli' failed.
with a return code of 1.
Updating package cache                           3/3
pkg: unable to activate deleteme
Updating package cache                           3/3
[...]

# beadm list
BE       Active Mountpoint     Space Policy Created
zbe      xb     -              2.45M static 2018-07-11 23:16
zbe-1    xb     -               204K static 2018-08-23 17:33
zbe-2    NR     /               238K static 2018-08-23 17:59
deleteme -      /tmp/tmp36Jtli 1.05G static 2018-08-23 18:28
# beadm unmount deleteme
Unmounted successfully
# beadm activate deleteme
Unable to activate deleteme.
BE promotion failed.

Before all that, I had to update pkg, which worked fine using -r -C0. I am now running pkg://omnios/package/pkg@0.5.11-0.151026:20180725T094123Z, which is the current version in the repo.

Effectively I cannot pkg update my system including the zones any more. I have previously updated this system without any problems.

Any ideas?


Thanks -- Volker
--
------------------------------------------------------------------------
Volker A. Brandt            Consulting and Support for Solaris-based Systems
Brandt & Brandt Computer GmbH                      WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY         Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513       Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From doug at will.to  Thu Aug 23 16:56:24 2018
From: doug at will.to (Doug Hughes)
Date: Thu, 23 Aug 2018 12:56:24 -0400
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

NFS writes (especially for lots of small files) to OmniOS *really* benefit from having the ZIL on those SSDs.

You could remove the cache from the pool, carve off an 8GB chunk for ZIL on each and the rest for L2ARC if you want that. Then add a mirrored ZIL using the 8GB chunks and the other partition for L2ARC.

An SSD ZIL helps with metadata update absorption and small-file writes that are synchronous over NFS. A lot. (That's my experience.)
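In concrete terms that recipe would look something like the sketch below. This is illustrative only: it assumes the 8GB/remainder split has already been made on each SSD (with format or fdisk), and it borrows the device names from Lee's zpool status.

    # cache (and log) vdevs can be removed from a live pool
    zpool remove pool0 c0t55CD2E414EC0FF43d0s1 c3t0d0s1
    # after repartitioning, add the small slices back as a mirrored log (ZIL)
    zpool add pool0 log mirror c0t55CD2E414EC0FF43d0s0 c3t0d0s0
    # and the large slices as cache (L2ARC); cache vdevs are never mirrored
    zpool add pool0 cache c0t55CD2E414EC0FF43d0s1 c3t0d0s1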
On 8/23/2018 12:38 PM, Lee Damon wrote:
> These are 12TB SAS drives (Seagate ST12000NM0027) for data & hot
> spare. ZIL & L2ARC are 480GB INTEL SSDSC2KG48 SSDs. Everything is left
> at default for sector size, etc. They were basically prepared for the
> pool with a simple fdisk -B /dev/rdsk/drive.
> [...]
> nomad

From vab at bb-c.de  Thu Aug 23 17:00:18 2018
From: vab at bb-c.de (Volker A. Brandt)
Date: Thu, 23 Aug 2018 19:00:18 +0200
Subject: [OmniOS-discuss] pkg update broken on r151026 for lipkg branded NGZs
In-Reply-To: <23422.58870.730172.385552@shelob.bb-c.de>
References: <23422.58870.730172.385552@shelob.bb-c.de>
Message-ID: <23422.59426.339070.444420@shelob.bb-c.de>

> When I omit the "-r", things change:
>
> # zonename
> omnib0

Wrong cut&paste, the problem is in the GZ.


Thanks -- Volker
--
------------------------------------------------------------------------
Volker A. Brandt            Consulting and Support for Solaris-based Systems
Brandt & Brandt Computer GmbH                      WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY         Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513       Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From bfriesen at simple.dallas.tx.us  Thu Aug 23 17:22:32 2018
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 23 Aug 2018 12:22:32 -0500 (CDT)
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

On Thu, 23 Aug 2018, Lee Damon wrote:

> These are 12TB SAS drives (Seagate ST12000NM0027) for data & hot spare. ZIL
> & L2ARC are 480GB INTEL SSDSC2KG48 SSDs. Everything is left at default for
> sector size, etc. They were basically prepared for the pool with a
> simple fdisk -B /dev/rdsk/drive.
The device c3t0d0 looks like it is overloaded or experiencing issues due to a high read/write load, and wsvc_t is very high.

> logs
>   mirror-1                   ONLINE       0     0     0
>     c0t55CD2E414EC0FF43d0s0  ONLINE       0     0     0
>     c3t0d0s0                 ONLINE       0     0     0
> cache
>   c0t55CD2E414EC0FF43d0s1    ONLINE       0     0     0
>   c3t0d0s1                   ONLINE       0     0     0

I am confused by the above. Does the trailing 's0' and 's1' indicate that partitions were used rather than whole disks for logs and cache, and so each SSD is providing both log and cache via partitions?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From bfriesen at simple.dallas.tx.us  Thu Aug 23 17:27:43 2018
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 23 Aug 2018 12:27:43 -0500 (CDT)
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

On Thu, 23 Aug 2018, Doug Hughes wrote:

> NFS writes (especially for lots of small files) to OmniOS *really* benefit
> from having the ZIL on those SSDs.
>
> You could remove the cache from the pool, carve off an 8GB chunk for ZIL on
> each and the rest for L2ARC if you want that. Then add a mirrored ZIL using
> the 8GB chunks and the other partition for L2ARC.
>
> An SSD ZIL helps with metadata update absorption and small-file writes that
> are synchronous over NFS. A lot. (That's my experience.)

It looks like that is what he did, but it looks like an error was made in that a spinning disk may have been added as a log drive (c0t55CD2E414EC0FF43d0s0) rather than an SSD as was intended.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From nomad at ee.washington.edu  Thu Aug 23 17:32:32 2018
From: nomad at ee.washington.edu (Lee Damon)
Date: Thu, 23 Aug 2018 10:32:32 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID: <27021750-ae20-bb20-2636-d40c563c1e1f@ee.washington.edu>

On 8/23/18 10:22 , Bob Friesenhahn wrote:
>> logs
>>   mirror-1                   ONLINE       0     0     0
>>     c0t55CD2E414EC0FF43d0s0  ONLINE       0     0     0
>>     c3t0d0s0                 ONLINE       0     0     0
>> cache
>>   c0t55CD2E414EC0FF43d0s1    ONLINE       0     0     0
>>   c3t0d0s1                   ONLINE       0     0     0
>
> I am confused by the above. Does the trailing 's0' and 's1' indicate
> that partitions were used rather than whole disks for logs and cache, and
> so each SSD is providing both log and cache via partitions?

Correct. Two 480GB SSDs split into two partitions. I have other pools configured the same way (on 151022) with no problems. I don't do that with spinning rust, mind you, just with SSDs.

nomad

From bfriesen at simple.dallas.tx.us  Thu Aug 23 17:34:18 2018
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 23 Aug 2018 12:34:18 -0500 (CDT)
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

Lee,

Just in case you did not see my follow-up post, it looks like there is an error in your pool configuration: a large spinning disk was added as a log device rather than an SSD as was intended. Luckily it should be possible to fix this without restarting the pool from scratch.
Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From bfriesen at simple.dallas.tx.us  Thu Aug 23 17:37:48 2018
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Thu, 23 Aug 2018 12:37:48 -0500 (CDT)
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

On Thu, 23 Aug 2018, Bob Friesenhahn wrote:

> Just in case you did not see my follow-up post, it looks like there is an
> error in your pool configuration: a large spinning disk was added as a
> log device rather than an SSD as was intended. Luckily it should be possible
> to fix this without restarting the pool from scratch.

Alas, it looks like I was wrong about this. It seems that the two SSDs are presented with much different device names.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From nomad at ee.washington.edu  Thu Aug 23 17:39:32 2018
From: nomad at ee.washington.edu (Lee Damon)
Date: Thu, 23 Aug 2018 10:39:32 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

Do you mean c0t55CD2E414EC0FF43d0?

It's an SSD. It just has a long name because it's in a hotswap sled instead of being inside the chassis.

    Hardware properties:
        name='devid' type=string items=1
            value='id1,sd@n55cd2e414ec0ff43'
        name='class' type=string items=1
            value='scsi'
        name='inquiry-revision-id' type=string items=1
            value='0121'
        name='inquiry-product-id' type=string items=1
            value='INTEL SSDSC2KG48'
        name='inquiry-vendor-id' type=string items=1
            value='ATA'
        name='inquiry-device-type' type=int items=1
            value=00000000
        name='pm-capable' type=int items=1
            value=00000001
        name='compatible' type=string items=4
            value='scsiclass,00.vATA.pINTEL_SSDSC2KG48.r0121' + 'scsiclass,00.vATA.pINTEL_SSDSC2KG48' + 'scsiclass,00' + 'scsiclass'
        name='client-guid' type=string items=1
            value='55cd2e414ec0ff43'

nomad

On Thu, Aug 23, 2018 at 10:35 AM Bob Friesenhahn <bfriesen at simple.dallas.tx.us> wrote:

> Lee,
>
> Just in case you did not see my follow-up post, it looks like there is
> an error in your pool configuration: a large spinning disk was
> added as a log device rather than an SSD as was intended. Luckily it
> should be possible to fix this without restarting the pool from
> scratch.
>
> Bob
> [...]

From vab at bb-c.de  Thu Aug 23 17:43:30 2018
From: vab at bb-c.de (Volker A. Brandt)
Date: Thu, 23 Aug 2018 19:43:30 +0200
Subject: [OmniOS-discuss] pkg update broken on r151026 for lipkg branded NGZs
In-Reply-To: <23422.58870.730172.385552@shelob.bb-c.de>
References: <23422.58870.730172.385552@shelob.bb-c.de>
Message-ID: <23422.62018.786614.396822@shelob.bb-c.de>

Hello all!

After some hours of frustration, I wrote:

> I have a very strange problem doing a pkg update on an r151026 system.
> This machine has 11 NGZs, all are lipkg brand.
[...]
> Effectively I cannot pkg update my system including the zones any more.
> I have previously updated this system without any problems.

After the mail, *another* reboot, and *another* test, and it works. With no changes whatsoever.
*sigh*


Regards -- Volker
--
------------------------------------------------------------------------
Volker A. Brandt            Consulting and Support for Solaris-based Systems
Brandt & Brandt Computer GmbH                      WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY         Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513       Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From doug at will.to  Thu Aug 23 19:19:44 2018
From: doug at will.to (Doug Hughes)
Date: Thu, 23 Aug 2018 15:19:44 -0400
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

Out of curiosity, if you disable the ZIL through the evil ZFS tuning wiki mechanisms (diagnostic purposes only), does it dramatically help? If not, there's something else going on. If yes, it could be that the L2ARC and ZIL are interfering with each other. (I could imagine that the L2ARC is causing a lot of need for erasing of blocks on the SSD, which could be dramatically slowing things down. I haven't been following the implementation of TRIM support and any outstanding issues.)
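On current illumos the per-dataset 'sync' property has replaced the old zil_disable tunable, so the experiment would be roughly the sketch below. Diagnostic use only: while sync=disabled is set, synchronous NFS writes are acknowledged before they reach stable storage, so a server crash during the test can silently lose client data.

    zfs set sync=disabled pool0
    # ... rerun the NFS copy test from the client ...
    zfs set sync=standard pool0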
On 8/23/2018 1:39 PM, Lee Damon wrote:
> Do you mean c0t55CD2E414EC0FF43d0?
>
> It's an SSD. It just has a long name because it's in a hotswap sled
> instead of being inside the chassis.
> [...]

From nomad at ee.washington.edu  Thu Aug 23 23:43:50 2018
From: nomad at ee.washington.edu (Lee Damon)
Date: Thu, 23 Aug 2018 16:43:50 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

(I've just changed from digest to regular subscription as I see there are messages relevant to this that I haven't received yet...)

Doug, I'm not familiar with the evil ZFS tuning wiki mechanism. I'll have to see if Google can help me find it.

As for the ZIL + L2ARC on the same SSD potentially being the problem, clearly I can't say with 100% certainty that it is not a problem; however, I have a second host (running 151022) with _exactly_ the same configuration of hard drives + split SSDs, and NFS writes to that pool are fine.

hvfs2 is ~18 months old but the chrup0 pool is a few months old.

time cp -rp /misc/fs1test/004test /misc/hvfs2chru/omics1
real    3m11.431s
user    0m0.177s
sys     0m28.030s

time cp -rp /misc/fs1test/004test /misc/fs2test/omics1
real    21m13.412s
user    0m0.188s
sys     0m28.678s

nomad

From nomad at ee.washington.edu  Fri Aug 24 00:23:25 2018
From: nomad at ee.washington.edu (Lee Damon)
Date: Thu, 23 Aug 2018 17:23:25 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID: <17f1d58d-4a5d-e4d3-979e-3ef7014d9396@ee.washington.edu>

(This doesn't appear to have gone out so I'm re-sending. Apologies if it's a duplicate.)

On 8/23/18 16:43 , Lee Damon wrote:
> (I've just changed from digest to regular subscription as I see there
> are messages relevant to this that I haven't received yet...)
> [...]

From doug at will.to  Fri Aug 24 00:33:43 2018
From: doug at will.to (Doug Hughes)
Date: Thu, 23 Aug 2018 20:33:43 -0400
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

Evil tuning here: https://www.solaris-cookbook.eu/solaris/solaris-10-zfs-evil-tuning-guide/

It's at the bottom, where it says "Disabling the ZIL (Don't)".

I could see a lack of TRIM/erase support in the background as a strong possibility, caused by continuous use of blocks from the L2ARC over time. Are you getting a high hit rate on your L2ARC? http://blog.harschsystems.com/2010/09/08/arcstat-pl-updated-for-l2arc-statistics/

If not, you might think about just dropping it altogether. This, as old as it is, may not be accurate, but it doesn't give me high confidence that TRIM support was added to illumos. Maybe it was and somebody else can chime in: http://open-zfs.org/wiki/Features

zpool iostat -v may also be interesting for the l2arc/zil devices.
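If arcstat.pl isn't to hand, the same information is available from the raw ARC kstats; hits/(hits+misses) gives the L2ARC hit rate (counter names are from arcstats):

    kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses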
On 8/23/2018 7:43 PM, Lee Damon wrote:
> (I've just changed from digest to regular subscription as I see there
> are messages relevant to this that I haven't received yet...)
> [...]

From richard.elling at richardelling.com  Fri Aug 24 03:17:13 2018
From: richard.elling at richardelling.com (Richard Elling)
Date: Thu, 23 Aug 2018 20:17:13 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To: <17f1d58d-4a5d-e4d3-979e-3ef7014d9396@ee.washington.edu>
References: <17f1d58d-4a5d-e4d3-979e-3ef7014d9396@ee.washington.edu>
Message-ID: <8F575E13-0CC8-46BB-8FF7-0E5DEF87210D@richardelling.com>

fwiw, nfssvrstat breaks down the NFS writes by sync, async, and commits: explicitly for determining how the workload will impact the ZIL. For writing many files, the (compound) operations can also include creates and sync-on-close, which also impact performance.
 -- richard

> On Aug 23, 2018, at 5:23 PM, Lee Damon wrote:
>
> (This doesn't appear to have gone out so I'm re-sending. Apologies if it's a duplicate.)
> [...]

From feigin at iis.ee.ethz.ch  Fri Aug 24 08:07:06 2018
From: feigin at iis.ee.ethz.ch (Adam Feigin)
Date: Fri, 24 Aug 2018 10:07:06 +0200
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

Hi Lee:

I've been experiencing something very similar. I recently (several months ago) moved a ~30T pool from an "old" OpenIndiana 151a9 system, where it had been working flawlessly for several years, to a "new" OmniOSce 151022 installation (zpool export on old, zpool import on new).

Now, I have extremely poor NFS write speeds on the new system. I've even swapped the cards (LSI SAS, 10G Ethernet) from the OI system to the OmniOS system, to eliminate some hardware discrepancies, but this had no effect whatsoever. It's not a network problem; I can happily get near line-rate on the 10G network between the server and various 10G-connected hosts. It's not a ZIL/L2ARC problem either; removing them (they're on SSDs, as yours) had minimal effect.

The new hardware is significantly more performant, with nearly 10x more memory (240G vs 32G), more cores and faster CPUs; I never expected performance to get worse.
I'm not convinced it's a "pure" NFS problem either, as I've noticed some other strange performance degradation on the new system. The pool used to take somewhere between 40 - 60 hours to run a scrub on the OI system. Recent scrubs were taking 400+ hours. After a recent pkg update and reboot, the last scrub took ~159 hours. During the scrub, I noticed that the scanning speed, while starting out relatively fast, pretty much monotonically decreased as time went on, going from 50 M/s near the beginning to 17 M/s at the end. I have to see what happens at the next monthly scrub of the pool.

Have you looked at your scrub performance?

What else is different between the 2 machines?

From tobi at oetiker.ch  Fri Aug 24 08:54:29 2018
From: tobi at oetiker.ch (Tobias Oetiker)
Date: Fri, 24 Aug 2018 10:54:29 +0200 (CEST)
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID: <1305271323.35866.1535100869773.JavaMail.zimbra@oetiker.ch>

Hi All,

Lee has opened an issue here: https://github.com/omniosorg/illumos-omnios/issues/256 -- it might be a good place to discuss this.

I have also posted a very simple test script there (not sure if it is enough to reproduce the problem, but it would at least give a common baseline as to what we are talking about).

cheers
tobi

----- On Aug 24, 2018, at 10:07 AM, Adam Feigin feigin at iis.ee.ethz.ch wrote:

> Hi Lee:
>
> I've been experiencing something very similar. I recently (several
> months ago) moved a ~30T pool from an "old" OpenIndiana 151a9 system,
> where it had been working flawlessly for several years, to a "new"
> OmniOSce 151022 installation (zpool export on old, zpool import on new).
> [...]
>
> Have you looked at your scrub performance?
>
> What else is different between the 2 machines?
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
www.oetiker.ch tobi at oetiker.ch +41 62 775 9902

From nomad at ee.washington.edu  Fri Aug 24 15:11:17 2018
From: nomad at ee.washington.edu (Lee Damon)
Date: Fri, 24 Aug 2018 08:11:17 -0700
Subject: [OmniOS-discuss] Slow NFS writes in 151026
In-Reply-To:
References:
Message-ID:

Adam, I'm having no problems at all with my 151022 hosts. They're all doing well for NFS reads & writes. I only see the degradation in write speed on the 151026 host I recently installed.

> Have you looked at your scrub performance?

I had bad scrub performance on a host that had a bad drive causing bus contention. That host hasn't scrubbed again since then, so I can't say if the problem is still there.

> What else is different between the 2 machines?

Age of hardware. The 151022 host is ~18 months old while the 151026 host is ~2 months old. The '26 host has never had anything but 151026 installed on it because I couldn't get the '22 installer to boot on it (I don't remember the details now, that was 2 months ago). The '26 host has 98GB RAM while the '22 host has 128GB. Other than that, the pools in question are the same in terms of drives, ZIL, and L2ARC type/config.

nomad

From Ergi.Thanasko at avsquad.com  Fri Aug 24 16:55:01 2018
From: Ergi.Thanasko at avsquad.com (Ergi Thanasko)
Date: Fri, 24 Aug 2018 16:55:01 +0000
Subject: [OmniOS-discuss] ARC or memory performance benchmarks
Message-ID: <0557B620-77A2-4373-A1A8-1888D7AC73A3@avsquad.com>

We are building a new box with a Skylake CPU at 3.6GHz and DDR4 RDIMMs at 2666MHz. We have been using iozone for multithreaded random-IO zpool testing and getting some awesome speed tests. What I really want to test is the RAM speed. Supermicro gave me some benchmarks for RAM at around 200GB/sec sustained bandwidth for 768G of RAM. I want to see how it compares with my other DDR3 boxes. I am having trouble finding utilities that will test the RAM speed on Solaris, OmniOS or OpenIndiana. Any help is appreciated.

From bfriesen at simple.dallas.tx.us  Fri Aug 24 20:40:34 2018
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Fri, 24 Aug 2018 15:40:34 -0500 (CDT)
Subject: [OmniOS-discuss] ARC or memory performance benchmarks
In-Reply-To: <0557B620-77A2-4373-A1A8-1888D7AC73A3@avsquad.com>
References: <0557B620-77A2-4373-A1A8-1888D7AC73A3@avsquad.com>
Message-ID:

On Fri, 24 Aug 2018, Ergi Thanasko wrote:

> We are building a new box with a Skylake CPU at 3.6GHz and DDR4 RDIMMs
> at 2666MHz. [...] I am having trouble finding utilities that will test
> the RAM speed on Solaris, OmniOS or OpenIndiana. Any help is appreciated.

The classic RAM speed benchmark is the 'stream' benchmark, which you can obtain from https://www.cs.virginia.edu/stream/
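Building it is a one-liner; something like the sketch below, assuming gcc is installed (on OmniOS, e.g. 'pkg install developer/gcc7'; the exact package name is release-dependent). STREAM_ARRAY_SIZE just needs to be several times larger than the combined caches; 80M doubles puts the three working arrays at about 1.9GB total.

    curl -O https://www.cs.virginia.edu/stream/FTP/Code/stream.c
    gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 stream.c -o stream
    # run with one thread per core; the Triad figure is the usual headline number
    OMP_NUM_THREADS=16 ./stream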
Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From Ergi.Thanasko at avsquad.com  Fri Aug 24 20:52:21 2018
From: Ergi.Thanasko at avsquad.com (Ergi Thanasko)
Date: Fri, 24 Aug 2018 20:52:21 +0000
Subject: [OmniOS-discuss] ARC or memory performance benchmarks
In-Reply-To:
References: <0557B620-77A2-4373-A1A8-1888D7AC73A3@avsquad.com>
Message-ID:

Thnx Bob. That is what SM used on Red Hat 7.3. Does anyone have a compiled version around that they feel like sharing?

On 8/24/18, 1:40 PM, "Bob Friesenhahn" wrote:

> The classic RAM speed benchmark is the 'stream' benchmark, which you
> can obtain from https://www.cs.virginia.edu/stream/
> [...]

From pkam at bloom.pl  Sat Aug 25 21:37:30 2018
From: pkam at bloom.pl (Piotr Kaminski)
Date: Sat, 25 Aug 2018 23:37:30 +0200
Subject: [OmniOS-discuss] CIFS access denied to some users from AD - again
Message-ID: <5759dce4-7fec-227a-2fb4-177503d7673a@bloom.pl>

Hi Everybody,

I would like to refresh my post sent around 3 months ago. The issue still persists...

What I've got is

* Ubuntu 16.04 with Samba 4 as AD DC
* OmniOSce CIFS server joined to the AD domain
* Windows 10 Pro joined to the AD domain
* and some more client computers joined

I do AD administration from Win10 with RSAT. I've created a lot of accounts for employees.

PROBLEM: Some users are denied access to OmniOSce shares while other users can connect without problems. I would like to stress: the issue is present only with OmniOS shares. Users ARE authorised thru the AD DC.

* There is an ACL rule for an "employees" AD group allowing access for the members,
* there are about 20 members and only a few of them have the problem,
* problematic accounts CAN connect to another Windows machine via RDP and are authorized by the AD DC (I even changed passwords to check and can still connect with the new passwords),
* problematic accounts cannot access the CIFS share from the OmniOSce server.

When I try to access the server from the Ubuntu machine I get the following with "good_user":

$ smbclient -U test26 -L //omnios
Enter test26's password:
Domain=[DOMAIN_NAME] OS=[SunOS 5.11 omnios-r151026-51c7d] Server=[Native SMB service]

        Sharename       Type      Comment
        ---------       ----      -------
        public          Disk
        c$              Disk      Default Share
        test1           Disk
        test2           Disk
        ipc$            IPC       Remote IPC
        test            Disk
Domain=[DOMAIN_NAME] OS=[SunOS 5.11 omnios-r151026-51c7d] Server=[Native SMB service]

        Server               Comment
        ---------            -------

        Workgroup            Master
        ---------            -------

and with "bad_user" I get

# smbclient -U bad_user -L //omnios
Enter bad_user's password:
session setup failed: NT_STATUS_ACCESS_DENIED

The same results are obtained from a Windows machine with the
"net view \\omnios" ? command * When I log in to Windows machine with "bad user" I can log in properly but "net view" command produces error 53. * When I log in to the same Windows machine with "good user", I can list shares with "net view" command. I cannot see any difference between the users. They are members of the same AD groups. They were created one by one. As a workaround I can disable problematic accounts, create new accounts and they work as a charm. But that is just a temporary? workaround. Can the issue be related to SID numbers? Maybe OmniOS does not like some of them? I have the following ID mappings on OmniOS: # idmap list add???? winuser:administrator at local.domain_name.net? unixuser:root add???? wingroup:administrators at local.domain_name.net??????? unixgroup:root add -d? winuser:*@local.domain_name.net????? unixuser:domain_name The issue drives me crazy. Any help or thoughts appreciated. Regards, -- Piotr -------------- next part -------------- An HTML attachment was scrubbed... URL: From nomad at ee.washington.edu Wed Aug 29 21:12:50 2018 From: nomad at ee.washington.edu (Lee Damon) Date: Wed, 29 Aug 2018 14:12:50 -0700 Subject: [OmniOS-discuss] Question about ndpd.conf in 151026 Message-ID: <12aed56e-4a66-787d-ccb5-3fed658b7ce1@ee.washington.edu> I have an /etc/inet/ndpd.conf file that has exactly two lines: ifdefault StatelessAddrConf false ifdefault StatefulAddrConf false On my test host running 151022 when I sudo ipadm create-addr -T addrconf aggr0/v6 ipadm show-if shows the interface with an fe80:: address and nothing else. However, when I do it on my 151026 host it gives both the fe80:: address and a fully routeable address based on the host's MAC address. This is not what I expect to see. I've tried with 'if aggr0' instead of 'ifdefault', same result. I've tried with duplicated lines for both ifdefault and if aggr0 and that just breaks things (so I know it's reading the file). I've also tried with just a StatelessAddrConf or StatefullAddrConf line, no change. I don't see any references to ndpd in the release notes for 151024 or 151026 so I'm presuming no changes were made that should have impacted this. Any suggestions of what I'm missing? thanks, nomad From chip at innovates.com Thu Aug 30 14:29:53 2018 From: chip at innovates.com (Schweiss, Chip) Date: Thu, 30 Aug 2018 09:29:53 -0500 Subject: [OmniOS-discuss] Panic on OmniOS CE r151022ay Message-ID: I've seen this panic twice now in the past couple weeks. Does anyone know if there is a patch already that fixes this? Looks like another xattr problem. 
thanks,
nomad

From chip at innovates.com  Thu Aug 30 14:29:53 2018
From: chip at innovates.com (Schweiss, Chip)
Date: Thu, 30 Aug 2018 09:29:53 -0500
Subject: [OmniOS-discuss] Panic on OmniOS CE r151022ay
Message-ID:

I've seen this panic twice now in the past couple of weeks. Does anyone know if there is a patch already that fixes this? It looks like another xattr problem.

# fmdump -Vp -u b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
TIME                           UUID                                 SUNW-MSG-ID
Aug 30 2018 08:29:32.089419000 b7c9840b-8bb1-cbbc-e165-a5b6fa34078b SUNOS-8000-KL

  TIME                 CLASS                                         ENA
  Aug 30 08:27:50.8299 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
        code = SUNOS-8000-KL
        diag-time = 1535635766 223254
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
                resource = sw:///:path=/var/crash//.b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
                savecore-succcess = 0
                os-instance-uuid = b7c9840b-8bb1-cbbc-e165-a5b6fa34078b
                panicstr = BAD TRAP: type=d (#gp General protection) rp=ffffd001e9855360 addr=ffffd063784ee8d0
                panicstack = unix:real_mode_stop_cpu_stage2_end+b203 () | unix:trap+a70 () | unix:cmntrap+e6 () | zfs:zfs_getattr+1a0 () | genunix:fop_getattr+a8 () | genunix:xattr_dir_getattr+16c () | genunix:fop_getattr+a8 () | nfssrv:rfs4_delegated_getattr+20 () | nfssrv:acl3_getxattrdir+102 () | nfssrv:common_dispatch+5ab () | nfssrv:acl_dispatch+2d () | rpcmod:svc_getreq+1c1 () | rpcmod:svc_run+e0 () | rpcmod:svc_do_run+8e () | nfs:nfssys+111 () | unix:brand_sys_sysenter+1d3 () |
                crashtime = 1535633923
                panic-time = Thu Aug 30 07:58:43 2018 CDT
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x5b87f13c 0x5546cf8

Let me know what other information I can provide here.

-Chip

From chip at innovates.com  Thu Aug 30 14:42:15 2018
From: chip at innovates.com (Schweiss, Chip)
Date: Thu, 30 Aug 2018 09:42:15 -0500
Subject: [OmniOS-discuss] Panic on OmniOS CE r151022ay
In-Reply-To:
References:
Message-ID:

Here's the dump from the panic: ftp://ftp.nrg.wustl.edu/pub/zfs/mirpool03-xattr-20180830-vmdump.1

On Thu, Aug 30, 2018 at 9:29 AM, Schweiss, Chip wrote:

> I've seen this panic twice now in the past couple of weeks. Does anyone
> know if there is a patch already that fixes this? It looks like another
> xattr problem.
> [...]
From omnios at citrus-it.net  Thu Aug 30 22:08:56 2018
From: omnios at citrus-it.net (Andy Fiddaman)
Date: Thu, 30 Aug 2018 22:08:56 +0000 (UTC)
Subject: [OmniOS-discuss] Panic on OmniOS CE r151022ay
In-Reply-To:
References:
Message-ID:

On Thu, 30 Aug 2018, Schweiss, Chip wrote:

; > panicstack = unix:real_mode_stop_cpu_stage2_end+b203 () |
; > unix:trap+a70 () | unix:cmntrap+e6 () | zfs:zfs_getattr+1a0 () |
; > genunix:fop_getattr+a8 () | genunix:xattr_dir_getattr+16c () |
; > genunix:fop_getattr+a8 () | nfssrv:rfs4_delegated_getattr+20 () |
; > nfssrv:acl3_getxattrdir+102 () | nfssrv:common_dispatch+5ab () |
; > nfssrv:acl_dispatch+2d () | rpcmod:svc_getreq+1c1 () | rpcmod:svc_run+e0 ()
; > | rpcmod:svc_do_run+8e () | nfs:nfssys+111 () | unix:brand_sys_sysenter+1d3

That does look quite similar to issue 8806, which was fixed earlier in the year. Can you check that the fix is in place on your box, since you're running a version of OmniOS from May?

If this produces any output, then the fix is missing; otherwise it's something else:

mdb -ke xattr_dir_inactive::dis | grep mutex

Please can you open an issue for this at https://github.com/omniosorg/illumos-omnios/issues/new in the first instance, as it may be OmniOS-specific?
Andy

--
Citrus IT Limited | +44 (0)333 0124 007 | enquiries at citrus-it.co.uk
Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ
Registered in England and Wales | Company number 4899123

From chip at innovates.com  Fri Aug 31 13:09:09 2018
From: chip at innovates.com (Schweiss, Chip)
Date: Fri, 31 Aug 2018 08:09:09 -0500
Subject: [OmniOS-discuss] Panic on OmniOS CE r151022ay
In-Reply-To:
References:
Message-ID:

Looks like the fix is missing:

# mdb -ke xattr_dir_inactive::dis | grep mutex
xattr_dir_inactive+0x1f:        call   -0x304cf4
xattr_dir_inactive+0x3c:        call   -0x304bf1
xattr_dir_inactive+0x73:        call   -0x304c28

Looking closer: I thought I had updated this system after the first crash, but did not. However, I had explicitly put that patch in place back in January, and it may not have made it into later OmniOS CE releases that the system was upgraded to.

I just ran the test on an r151022bk system and it passes.

I'll get this system updated ASAP.

Thanks!
-Chip

On Thu, Aug 30, 2018 at 5:08 PM, Andy Fiddaman wrote:

> On Thu, 30 Aug 2018, Schweiss, Chip wrote:
>
> ; > panicstack = unix:real_mode_stop_cpu_stage2_end+b203 () |
> [...]
>
> That does look quite similar to issue 8806, which was fixed earlier in
> the year. Can you check that the fix is in place on your box, since
> you're running a version of OmniOS from May?
> [...]