From mir at miras.org  Sat Aug  1 19:01:24 2015
From: mir at miras.org (Michael Rasmussen)
Date: Sat, 1 Aug 2015 21:01:24 +0200
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
Message-ID: <20150801210124.64866f09@sleipner.datanom.net>

Hi all,

Anyone tried OmniOS on this board?

There is also an 8-core version, the A1SRM-2758F; would the extra 4
cores be worth the extra money?

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
Q: How many Martians does it take to screw in a light bulb?
A: One and a half.

From bfriesen at simple.dallas.tx.us  Sat Aug  1 19:45:50 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Sat, 1 Aug 2015 14:45:50 -0500 (CDT)
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
In-Reply-To: <20150801210124.64866f09@sleipner.datanom.net>
References: <20150801210124.64866f09@sleipner.datanom.net>
Message-ID:

On Sat, 1 Aug 2015, Michael Rasmussen wrote:

> Hi all,
>
> Anyone tried OmniOS on this board?
>
> There is also an 8-core version, the A1SRM-2758F; would the extra 4
> cores be worth the extra money?

While there may be some rough edges and disabled features for a while,
Xeon D seems like a much better investment than an Atom CPU with an
extra 4 cores. This is also a low-power SoC, but with quite a lot
better performance than Atom and more/better integrated interfaces.

See: http://www.supermicro.com/products/motherboard/Xeon3000/#1667

I have a system with one of these motherboards on order but have not
yet heard about specific issues with OmniOS, except that the 10Gb-E
interfaces will not be supported right away, and there was mention of
"weirdness with asy port" with early hardware on an illumos list.
Hopefully I will have news within the next two weeks.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From mir at miras.org  Sat Aug  1 20:13:04 2015
From: mir at miras.org (Michael Rasmussen)
Date: Sat, 1 Aug 2015 22:13:04 +0200
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
In-Reply-To:
References: <20150801210124.64866f09@sleipner.datanom.net>
Message-ID: <20150801221304.3abc4e62@sleipner.datanom.net>

On Sat, 1 Aug 2015 14:45:50 -0500 (CDT)
Bob Friesenhahn wrote:

> While there may be some rough edges and disabled features for a while,
> Xeon D seems like a much better investment than an Atom CPU with an
> extra 4 cores. This is also a low-power SoC, but with quite a lot
> better performance than Atom and more/better integrated interfaces.
>
Looks nice, but only mini-ITX.
I need at least a PCIe x8 (HBA) and a PCIe x4 (dual InfiniBand NIC).

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
There are worse things than Perl....ASP comes to mind

From mir at miras.org  Sat Aug  1 20:16:16 2015
From: mir at miras.org (Michael Rasmussen)
Date: Sat, 1 Aug 2015 22:16:16 +0200
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
In-Reply-To: <974257D3-8EDF-45CD-98C7-7A9394E792E7@countermail.com>
References: <20150801210124.64866f09@sleipner.datanom.net>
 <974257D3-8EDF-45CD-98C7-7A9394E792E7@countermail.com>
Message-ID: <20150801221616.0f866b93@sleipner.datanom.net>

On Sat, 01 Aug 2015 15:54:59 -0400
"Ottmar Klaas" wrote:

> I have been running OmniOS on the A1SRM-2758F-O since February. No
> complaints, runs nicely.
>
But is 90€ more worth the money for the extra 4 cores?

I wonder how much more load is required to see a difference between 4
and 8 cores?

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
There are worse things than Perl....ASP comes to mind

From jimklimov at cos.ru  Sun Aug  2 11:32:39 2015
From: jimklimov at cos.ru (Jim Klimov)
Date: Sun, 02 Aug 2015 13:32:39 +0200
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
In-Reply-To: <20150801221616.0f866b93@sleipner.datanom.net>
References: <20150801210124.64866f09@sleipner.datanom.net>
 <974257D3-8EDF-45CD-98C7-7A9394E792E7@countermail.com>
 <20150801221616.0f866b93@sleipner.datanom.net>
Message-ID: <5BC40ED3-BADE-4870-9679-BA0088408377@cos.ru>

On 1 August 2015 22:16:16 CEST, Michael Rasmussen wrote:
> On Sat, 01 Aug 2015 15:54:59 -0400
> "Ottmar Klaas" wrote:
>
> > I have been running OmniOS on the A1SRM-2758F-O since February. No
> > complaints, runs nicely.
> >
> But is 90€ more worth the money for the extra 4 cores?
>
> I wonder how much more load is required to see a difference between 4
> and 8 cores?

Umm, roughly twice as many independent tasks, assuming you can saturate
your 4 cores? Think compile farms, mail relays with antispam,
webservers, databases with parallelisable queries, ZFS with
compression, VMs, etc. Simply having many background processes (dormant
zones and services) can require a bit of overhead in context switching
(and at a few thousand processes per core this can become a full-time
job of its own); the more cores you have, the smaller the hit you get
on each.

Even if you do not have such loads running full throttle all the time,
it is possible that the moment you do, more cores can help reach your
goals faster in wallclock time.
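E.g., before paying for 8 cores it is easy to check whether the
existing 4 ever saturate. A rough sketch with the stock illumos tools
(the 5-second interval is arbitrary):

    # per-CPU utilization; watch the idl (idle) column under peak load
    mpstat 5
    # kthr:r is the run queue; sustained values above the core count
    # mean threads are sitting around waiting for a CPU
    vmstat 5

If idl rarely hits 0 and the run queue stays short, the extra cores
would mostly sit idle.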
--
Typos courtesy of K-9 Mail on my Samsung Android

From mir at miras.org  Sun Aug  2 13:46:54 2015
From: mir at miras.org (Michael Rasmussen)
Date: Sun, 2 Aug 2015 15:46:54 +0200
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
In-Reply-To: <5BC40ED3-BADE-4870-9679-BA0088408377@cos.ru>
References: <20150801210124.64866f09@sleipner.datanom.net>
 <974257D3-8EDF-45CD-98C7-7A9394E792E7@countermail.com>
 <20150801221616.0f866b93@sleipner.datanom.net>
 <5BC40ED3-BADE-4870-9679-BA0088408377@cos.ru>
Message-ID: <20150802154654.0447080f@sleipner.datanom.net>

On Sun, 02 Aug 2015 13:32:39 +0200
Jim Klimov wrote:

> Umm, roughly twice as many independent tasks, assuming you can
> saturate your 4 cores? Think compile farms, mail relays with antispam,
> webservers, databases with parallelisable queries, ZFS with
> compression, VMs, etc. Simply having many background processes
> (dormant zones and services) can require a bit of overhead in context
> switching (and at a few thousand processes per core this can become a
> full-time job of its own); the more cores you have, the smaller the
> hit you get on each.
>
> Even if you do not have such loads running full throttle all the
> time, it is possible that the moment you do, more cores can help
> reach your goals faster in wallclock time.

This server's only job will primarily be exposing zvols as iSCSI LUNs
and, to a lesser extent, shared storage via NFS for virtual servers. I
have one now doing the same job running on a 4-core Opteron (3350 HE),
and at no time has it come even close to saturating all 4 cores. I
think 4 cores should suffice, and at only 15W compared to 45W.

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
Q: How do you save a drowning lawyer?
A: Throw him a rock.

From lists at marzocchi.net  Sun Aug  2 15:39:27 2015
From: lists at marzocchi.net (Olaf Marzocchi)
Date: Sun, 02 Aug 2015 17:39:27 +0200
Subject: [OmniOS-discuss] User/group with CIFS
Message-ID: <55BE39AF.9070305@marzocchi.net>

Hello,
in my server (latest stable release) I use netatalk to share files with
OS X and I use CIFS (kernel) for Windows.
I never noticed until now that the files made by the two operating
systems have mismatching and incompatible permissions. This prevents me
from modifying or deleting, under one operating system, the files
created with the other one.

For example:

-rw-r--r--   1 olaf  olaf  469 Aug  2 17:12 scaletta.txt
     owner@:rw-p--aARWcCos:-------:allow
     group@:r-----a-R-c--s:-------:allow
  everyone@:r-----a-R-c--s:-------:allow
-rwx------+  1 olaf  olaf  469 Aug  2 17:12 scaletta2.txt
  user:olaf:rwxpdDaARWcCos:-------:allow
  group:2147483648:rwxpdDaARWcCos:-------:allow

The first one was generated by netatalk and correctly shows the
permissions according to the user/group I used to log in; it also
inherited extended ACLs according to the parent folder.
The second file was created by Windows 8.1, when connected to my server
using SERVER at username as login (to be sure I am logging in with a
user local to the server), and it added strange ACLs:
user:olaf / group:2147483648

The server is not connected to any AD and I am using a normal workgroup
setup.

Where can I find some info to understand the issue? Is there something
obvious I missed in my configuration?

Thanks!
Olaf Marzocchi

From alka at hfg-gmuend.de  Sun Aug  2 17:44:27 2015
From: alka at hfg-gmuend.de (Guenther Alka)
Date: Sun, 02 Aug 2015 19:44:27 +0200
Subject: [OmniOS-discuss] User/group with CIFS
In-Reply-To: <55BE39AF.9070305@marzocchi.net>
References: <55BE39AF.9070305@marzocchi.net>
Message-ID: <55BE56FB.1080203@hfg-gmuend.de>

Netatalk and Solaris CIFS are incompatible regarding permissions in an
AD environment or regarding groups.

One problem hides in the + of

-rwx------+  1 olaf  olaf  469 Aug  2 17:12 scaletta2.txt

This means that there are ACLs defined. While netatalk uses classic
Unix permissions (owner/group/everyone), Solaris CIFS works like
Windows, which means: it uses ACLs only, uses Windows SIDs in AD
environments, and uses Windows SMB groups instead of Unix groups. The
group:2147483648 is an SMB group that is unknown to netatalk, while the
group olaf is a Unix group that is unknown to Solaris CIFS.

Your options are:

1. Avoid netatalk, as it is dead and always a source of problems.
Apple switched to SMB, so this is the future. (This is what I did.)
Currently you have the problem that illumos lacks SMB2 and OS X is
slow with SMB1. Hope that this will come to OmniOS in the near future,
as it is in NexentaStor and Solaris 11.3. First tests on Solaris show
that SMB there is as fast as AFP.

2. Avoid AD and groups, set permissions on Windows only for users, or
use ID mapping. This is a workaround.

3. Use Samba, as it relies on Unix permissions as well. (For what I use
SMB for, Solaris CIFS is superior, so this is an option that I would
avoid.)

My tip is 1.

Gea
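For option 2, the ID-mapping part could look roughly like this (a
sketch only; 'SERVER' is a placeholder for your host name, and it
assumes the local Unix account is also named olaf):

    # show the full ACL that ls -l only hints at with the trailing '+'
    /bin/ls -V scaletta2.txt
    # map the Windows identity used for the CIFS login to the Unix user
    idmap add 'winuser:olaf@SERVER' 'unixuser:olaf'
    # verify the mapping rule
    idmap list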
On 02.08.2015 17:39, Olaf Marzocchi wrote:
> Hello,
> in my server (latest stable release) I use netatalk to share files
> with OS X and I use CIFS (kernel) for Windows.
> I never noticed until now that the files made by the two operating
> systems have mismatching and incompatible permissions. This prevents
> me from modifying or deleting, under one operating system, the files
> created with the other one.
>
> For example:
>
> -rw-r--r--   1 olaf  olaf  469 Aug  2 17:12 scaletta.txt
>      owner@:rw-p--aARWcCos:-------:allow
>      group@:r-----a-R-c--s:-------:allow
>   everyone@:r-----a-R-c--s:-------:allow
> -rwx------+  1 olaf  olaf  469 Aug  2 17:12 scaletta2.txt
>   user:olaf:rwxpdDaARWcCos:-------:allow
>   group:2147483648:rwxpdDaARWcCos:-------:allow
>
> The first one was generated by netatalk and correctly shows the
> permissions according to the user/group I used to log in; it also
> inherited extended ACLs according to the parent folder.
> The second file was created by Windows 8.1, when connected to my
> server using SERVER at username as login (to be sure I am logging in
> with a user local to the server), and it added strange ACLs:
> user:olaf / group:2147483648
>
> The server is not connected to any AD and I am using a normal
> workgroup setup.
>
> Where can I find some info to understand the issue? Is there
> something obvious I missed in my configuration?
>
> Thanks!
> Olaf Marzocchi
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From jimklimov at cos.ru  Sun Aug  2 18:23:46 2015
From: jimklimov at cos.ru (Jim Klimov)
Date: Sun, 02 Aug 2015 20:23:46 +0200
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
In-Reply-To: <20150802154654.0447080f@sleipner.datanom.net>
References: <20150801210124.64866f09@sleipner.datanom.net>
 <974257D3-8EDF-45CD-98C7-7A9394E792E7@countermail.com>
 <20150801221616.0f866b93@sleipner.datanom.net>
 <5BC40ED3-BADE-4870-9679-BA0088408377@cos.ru>
 <20150802154654.0447080f@sleipner.datanom.net>
Message-ID: <60866E25-ABEE-478C-8292-F8E51ECBFD1A@cos.ru>

On 2 August 2015 15:46:54 CEST, Michael Rasmussen wrote:
> On Sun, 02 Aug 2015 13:32:39 +0200
> Jim Klimov wrote:
>
> > Umm, roughly twice as many independent tasks, assuming you can
> > saturate your 4 cores? Think compile farms, mail relays with
> > antispam, webservers, databases with parallelisable queries, ZFS
> > with compression, VMs, etc. Simply having many background processes
> > (dormant zones and services) can require a bit of overhead in
> > context switching (and at a few thousand processes per core this
> > can become a full-time job of its own); the more cores you have,
> > the smaller the hit you get on each.
> >
> > Even if you do not have such loads running full throttle all the
> > time, it is possible that the moment you do, more cores can help
> > reach your goals faster in wallclock time.
>
> This server's only job will primarily be exposing zvols as iSCSI LUNs
> and, to a lesser extent, shared storage via NFS for virtual servers.
> I have one now doing the same job running on a 4-core Opteron (3350
> HE), and at no time has it come even close to saturating all 4 cores.
> I think 4 cores should suffice, and at only 15W compared to 45W.

FWIW, Oracle boasted that Fishworks storage had a lot of cores that
were quite used in compression, perhaps dedup and encryption (not the
illumos case); btw, you have been able to plug VFS filters like
antivirus into ZFS since almost forever, etc. The cores also carry the
processing work of all the server services. At high loads even
interrupt processing has a noticeable cost.

So as usual YMMV - for your loads 4 cores may suffice ;)

HTH, Jim
--
Typos courtesy of K-9 Mail on my Samsung Android

From mir at miras.org  Sun Aug  2 18:46:41 2015
From: mir at miras.org (Michael Rasmussen)
Date: Sun, 2 Aug 2015 20:46:41 +0200
Subject: [OmniOS-discuss] supermicro A1SRM-2558F
In-Reply-To: <60866E25-ABEE-478C-8292-F8E51ECBFD1A@cos.ru>
References: <20150801210124.64866f09@sleipner.datanom.net>
 <974257D3-8EDF-45CD-98C7-7A9394E792E7@countermail.com>
 <20150801221616.0f866b93@sleipner.datanom.net>
 <5BC40ED3-BADE-4870-9679-BA0088408377@cos.ru>
 <20150802154654.0447080f@sleipner.datanom.net>
 <60866E25-ABEE-478C-8292-F8E51ECBFD1A@cos.ru>
Message-ID: <20150802204641.3e621569@sleipner.datanom.net>

On Sun, 02 Aug 2015 20:23:46 +0200
Jim Klimov wrote:

> FWIW, Oracle boasted that Fishworks storage had a lot of cores that
> were quite used in compression, perhaps dedup and encryption (not the
> illumos case); btw, you have been able to plug VFS filters like
> antivirus into ZFS since almost forever, etc. The cores also carry
> the processing work of all the server services. At high loads even
> interrupt processing has a noticeable cost.
>
No Windows® here, so antivirus is not an issue ;-) No dedup either.
For encryption the CPU comes with Intel®
QuickAssist Technology, which by the time encryption is in illumos will
be supported too (my best guess).

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
I'd give my right arm to be ambidextrous.

From stephan.budach at JVM.DE  Mon Aug  3 06:06:10 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Mon, 3 Aug 2015 08:06:10 +0200
Subject: [OmniOS-discuss] User/group with CIFS
In-Reply-To: <55BE56FB.1080203@hfg-gmuend.de>
References: <55BE39AF.9070305@marzocchi.net>
 <55BE56FB.1080203@hfg-gmuend.de>
Message-ID: <55BF04D2.8010701@jvm.de>

If you want to, you can go with Samba and vfs:fruit. vfs:fruit has been
developed by one of the main Netatalk developers and successfully
overcomes
a) the permissions issue
b) the locking issue between Samba and Netatalk

I actually haven't had the time to test that out, but I will do that in
the near future.

Cheers,
budy

From cj.keist at colostate.edu  Tue Aug  4 18:38:05 2015
From: cj.keist at colostate.edu (CJ Keist)
Date: Tue, 4 Aug 2015 12:38:05 -0600
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
Message-ID: <55C1068D.1050608@colostate.edu>

All,
    Running OmniOS (SunOS projects1 5.11 omnios-170cea2) and running
into an issue where an NFS share mounted from another server is not
letting root create/modify anything.

On the OmniOS server, auto_master:

/net        -hosts      -nosuid,nobrowse
/home       auto_home   -nobrowse

On the NFS server I have the OmniOS server name set with root access,
as well as another server (CentOS).

The CentOS server has no problems going to /net/nfsserver/mount and
creating/deleting folders.
But the OmniOS system going through /net/nfsserver/mount gets
permission denied when trying to create or delete anything. But if I
manually mount the nfsserver share:

mount nfsserver:/mount /mnt

And then go into /mnt, the OmniOS server can create/delete folders, no
issues. It's only through autofs that I get permission denied.

Any ideas???

root@projects1:/etc# sharectl get nfs
servers=16
lockd_listen_backlog=32
lockd_servers=20
lockd_retransmit_timeout=5
grace_period=90
server_versmin=2
server_versmax=3
client_versmin=2
client_versmax=3
server_delegation=on
nfsmapid_domain=
max_connections=-1
protocol=ALL
listen_backlog=32
device=
mountd_listen_backlog=64
mountd_max_threads=16

-- 
C. J. Keist                     Email: cj.keist at colostate.edu
Systems Group Manager           Solaris 10 OS (SAI)
Engineering Network Services    Phone: 970-491-0630
College of Engineering, CSU     Fax: 970-491-5569
Ft. Collins, CO 80523-1301

All I want is a chance to prove 'Money can't buy happiness'
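[A possible cross-check, as a sketch with placeholder paths: take the
/net map out of the picture with a direct map that pins the options
explicitly, e.g. vers=3 to match the sharectl settings above:

    # /etc/auto_master
    /-      auto_direct
    # /etc/auto_direct
    /mnt/projects   -rw,hard,vers=3   nfsserver:/mount

then run "automount -v" to pick up the map change and compare behavior
against the /net path.]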
From vab at bb-c.de  Tue Aug  4 20:57:31 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Tue, 4 Aug 2015 22:57:31 +0200
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <55C1068D.1050608@colostate.edu>
References: <55C1068D.1050608@colostate.edu>
Message-ID: <21953.10043.818307.256439@glaurung.bb-c.de>

[...]
> But the OmniOS system going through /net/nfsserver/mount gets
> permission denied when trying to create or delete anything. But if
> I manually mount the nfsserver share:
>
> mount nfsserver:/mount /mnt
>
> And then go into /mnt, the OmniOS server can create/delete folders,
> no issues. It's only through autofs that I get permission denied.
>
> Any ideas???
>
> root@projects1:/etc# sharectl get nfs
> servers=16
> lockd_listen_backlog=32
> lockd_servers=20
> lockd_retransmit_timeout=5
> grace_period=90
> server_versmin=2
> server_versmax=3
> client_versmin=2
> client_versmax=3
> server_delegation=on
> nfsmapid_domain=
> max_connections=-1
> protocol=ALL
> listen_backlog=32
> device=
> mountd_listen_backlog=64
> mountd_max_threads=16

You have explicitly disabled NFSv4, so you cannot take advantage of the
username mapping features. Hence your userids must match. Which userid
are you using to create/delete folders? What are the mount options in
your automount configuration? How is the share exported on the CentOS
side?

Regards -- Volker
-- 
------------------------------------------------------------------------
Volker A. Brandt              Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                  WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY           Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513             Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From lists at mcintyreweb.com  Thu Aug  6 08:03:37 2015
From: lists at mcintyreweb.com (Hugh McIntyre)
Date: Thu, 6 Aug 2015 01:03:37 -0700
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <21953.10043.818307.256439@glaurung.bb-c.de>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
Message-ID: <55C314D9.4040907@mcintyreweb.com>

As well as Volker's suggestion, maybe type "mount" when the filesystem
is mounted under /net or /mnt, and compare the reported mount options?

One other debug trick, if the filesystem is mounted but gives you
permission errors creating files, is to go to a world-writeable
directory and type "touch abc". Then check that the user/group matches
what you expect, mainly on the server but also on the client.

Is the NFS server "nfsserver" on OmniOS or CentOS or ...?

Hugh.

On 8/4/15 1:57 PM, Volker A. Brandt wrote:
> [...]
>> But the OmniOS system going through /net/nfsserver/mount gets
>> permission denied when trying to create or delete anything. But if
>> I manually mount the nfsserver share:
>>
>> mount nfsserver:/mount /mnt
>>
>> And then go into /mnt, the OmniOS server can create/delete folders,
>> no issues. It's only through autofs that I get permission denied.
>>
>> Any ideas???
>>
>> root@projects1:/etc# sharectl get nfs
>> servers=16
>> lockd_listen_backlog=32
>> lockd_servers=20
>> lockd_retransmit_timeout=5
>> grace_period=90
>> server_versmin=2
>> server_versmax=3
>> client_versmin=2
>> client_versmax=3
>> server_delegation=on
>> nfsmapid_domain=
>> max_connections=-1
>> protocol=ALL
>> listen_backlog=32
>> device=
>> mountd_listen_backlog=64
>> mountd_max_threads=16
>
> You have explicitly disabled NFSv4, so you cannot take advantage of
> the username mapping features. Hence your userids must match. Which
> userid are you using to create/delete folders? What are the mount
> options in your automount configuration? How is the share exported
> on the CentOS side?
>
> Regards -- Volker
>
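[Hugh's touch test, spelled out; the directory path is a placeholder
for any mode-777 directory on the share:

    cd /net/nfsserver/mount/tmp
    touch abc
    ls -ln abc    # -n prints numeric uid/gid, so a root squash shows up directly
]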
From cj.keist at colostate.edu  Thu Aug  6 15:28:55 2015
From: cj.keist at colostate.edu (CJ Keist)
Date: Thu, 6 Aug 2015 09:28:55 -0600
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <21953.10043.818307.256439@glaurung.bb-c.de>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
Message-ID: <55C37D37.2090703@colostate.edu>

Thank you. Yes, I do specifically disable v4 for NFS; it is not
something we need in our environment, and we have unified user IDs.

I want the root user on the OmniOS system to be able to
create/delete/modify files on the NFS share. In this case the NFS
server is a Dell NAS storage device running FluidFS (version 4). On the
Dell NAS I have set up the NFS share to allow the OmniOS server
root-level access to the NFS mount point, as well as the CentOS server.

The OmniOS server as root gets the permission denied error going
through the autofs mount /net/nfsserver/mount, but doesn't have any
issues if the nfsserver share is manually mounted. This doesn't make
any sense to me.

On 8/4/15 2:57 PM, Volker A. Brandt wrote:
> [...]
>> But the OmniOS system going through /net/nfsserver/mount gets
>> permission denied when trying to create or delete anything. But if
>> I manually mount the nfsserver share:
>>
>> mount nfsserver:/mount /mnt
>>
>> And then go into /mnt, the OmniOS server can create/delete folders,
>> no issues. It's only through autofs that I get permission denied.
>>
>> Any ideas???
>>
>> root@projects1:/etc# sharectl get nfs
>> servers=16
>> lockd_listen_backlog=32
>> lockd_servers=20
>> lockd_retransmit_timeout=5
>> grace_period=90
>> server_versmin=2
>> server_versmax=3
>> client_versmin=2
>> client_versmax=3
>> server_delegation=on
>> nfsmapid_domain=
>> max_connections=-1
>> protocol=ALL
>> listen_backlog=32
>> device=
>> mountd_listen_backlog=64
>> mountd_max_threads=16
> You have explicitly disabled NFSv4, so you cannot take advantage of
> the username mapping features. Hence your userids must match. Which
> userid are you using to create/delete folders? What are the mount
> options in your automount configuration? How is the share exported
> on the CentOS side?
>
> Regards -- Volker

-- 
C. J. Keist                     Email: cj.keist at colostate.edu
Systems Group Manager           Solaris 10 OS (SAI)
Engineering Network Services    Phone: 970-491-0630
College of Engineering, CSU     Fax: 970-491-5569
Ft. Collins, CO 80523-1301

All I want is a chance to prove 'Money can't buy happiness'

From cj.keist at colostate.edu  Thu Aug  6 15:35:35 2015
From: cj.keist at colostate.edu (CJ Keist)
Date: Thu, 6 Aug 2015 09:35:35 -0600
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <55C314D9.4040907@mcintyreweb.com>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
 <55C314D9.4040907@mcintyreweb.com>
Message-ID: <55C37EC7.90302@colostate.edu>

Here is the mount output for both the autofs and the manually mounted
one. They both look the same:

root@projects2:/net/nasstore2/projects# mount | egrep nasstore2
/mnt on nasstore2:/projects remote/read/write/setuid/devices/xattr/dev=8780014 on Wed Aug  5 15:10:13 2015
/net/nasstore2/projects on nasstore2:/projects remote/read/write/setuid/devices/xattr/dev=8780015 on Thu Aug  6 09:24:16 2015

Thank you for the suggestion of creating a world-writable folder. I did
this, and here is what I found.
Going through the manual mount point /mnt and creating a file, it does
get root:root ownership.
But going through the autofs /net/nasstore2/projects and creating a
file, it is getting 99:99 ownership on the file???

Is autofs running as a different user than root on OmniOS?

On 8/6/15 2:03 AM, Hugh McIntyre wrote:
> As well as Volker's suggestion, maybe type "mount" when the
> filesystem is mounted under /net or /mnt, and compare the reported
> mount options?
>
> One other debug trick, if the filesystem is mounted but gives you
> permission errors creating files, is to go to a world-writeable
> directory and type "touch abc". Then check that the user/group
> matches what you expect, mainly on the server but also on the client.
>
> Is the NFS server "nfsserver" on OmniOS or CentOS or ...?
>
> Hugh.
>
> On 8/4/15 1:57 PM, Volker A. Brandt wrote:
>> [...]
>>> But the OmniOS system going through /net/nfsserver/mount gets
>>> permission denied when trying to create or delete anything. But if
>>> I manually mount the nfsserver share:
>>>
>>> mount nfsserver:/mount /mnt
>>>
>>> And then go into /mnt, the OmniOS server can create/delete folders,
>>> no issues. It's only through autofs that I get permission denied.
>>>
>>> Any ideas???
>>>
>>> root@projects1:/etc# sharectl get nfs
>>> servers=16
>>> lockd_listen_backlog=32
>>> lockd_servers=20
>>> lockd_retransmit_timeout=5
>>> grace_period=90
>>> server_versmin=2
>>> server_versmax=3
>>> client_versmin=2
>>> client_versmax=3
>>> server_delegation=on
>>> nfsmapid_domain=
>>> max_connections=-1
>>> protocol=ALL
>>> listen_backlog=32
>>> device=
>>> mountd_listen_backlog=64
>>> mountd_max_threads=16
>>
>> You have explicitly disabled NFSv4, so you cannot take advantage of
>> the username mapping features. Hence your userids must match. Which
>> userid are you using to create/delete folders? What are the mount
>> options in your automount configuration? How is the share exported
>> on the CentOS side?
>>
>> Regards -- Volker
>>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

-- 
C. J. Keist                     Email: cj.keist at colostate.edu
Systems Group Manager           Solaris 10 OS (SAI)
Engineering Network Services    Phone: 970-491-0630
College of Engineering, CSU     Fax: 970-491-5569
Ft. Collins, CO 80523-1301

All I want is a chance to prove 'Money can't buy happiness'

From vab at bb-c.de  Thu Aug  6 15:53:26 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Thu, 6 Aug 2015 17:53:26 +0200
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <55C37EC7.90302@colostate.edu>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
 <55C314D9.4040907@mcintyreweb.com>
 <55C37EC7.90302@colostate.edu>
Message-ID: <21955.33526.152018.938763@glaurung.bb-c.de>

CJ Keist writes:
> Going through the manual mount point /mnt and creating a file, it
> does get root:root ownership. But going through the autofs
> /net/nasstore2/projects and creating a file, it is getting 99:99
> ownership on the file???
>
> Is autofs running as a different user than root on OmniOS?

No, it is not, at least not on my OmniOS systems. :-)

My guess is that this is something on the NFS server side, but I don't
really know. What are your export options for "projects" on
"nasstore2"?

Regards -- Volker
-- 
------------------------------------------------------------------------
Volker A. Brandt              Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                  WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY           Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513             Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From cj.keist at colostate.edu  Thu Aug  6 16:16:56 2015
From: cj.keist at colostate.edu (CJ Keist)
Date: Thu, 6 Aug 2015 10:16:56 -0600
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <21955.33526.152018.938763@glaurung.bb-c.de>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
 <55C314D9.4040907@mcintyreweb.com>
 <55C37EC7.90302@colostate.edu>
 <21955.33526.152018.938763@glaurung.bb-c.de>
Message-ID: <55C38878.10905@colostate.edu>

Attached is a picture of the NFS export options on the Dell NAS device.
The server goku is the CentOS system, which doesn't have any issues
creating/deleting/modifying files/folders as root, both through autofs
and manually mounted. The OmniOS system is projects2, and the IP
address is the same OmniOS system; I was just testing things out there.
The FluidFS system does show that both the CentOS and the OmniOS
systems are connecting using NFS v3.

And automountd is running as root:

root@projects2:/mnt/test# ps -ef | egrep autofs
    root 19684 19682   0   Aug 04 ?           0:00 /usr/lib/autofs/automountd
    root 25158 25039   0 10:08:49 pts/1       0:00 egrep autofs
    root 19682     1   0   Aug 04 ?           0:00 /usr/lib/autofs/automountd

Not sure where the user ID of 99 and group ID of 99 are coming from;
they are not defined in /etc/passwd or /etc/group.

On 8/6/15 9:53 AM, Volker A. Brandt wrote:
> CJ Keist writes:
>> Going through the manual mount point /mnt and creating a file, it
>> does get root:root ownership. But going through the autofs
>> /net/nasstore2/projects and creating a file, it is getting 99:99
>> ownership on the file???
>>
>> Is autofs running as a different user than root on OmniOS?
> No, it is not, at least not on my OmniOS systems. :-)
>
> My guess is that this is something on the NFS server side, but I
> don't really know. What are your export options for "projects" on
> "nasstore2"?
>
> Regards -- Volker

-- 
C. J. Keist                     Email: cj.keist at colostate.edu
Systems Group Manager           Solaris 10 OS (SAI)
Engineering Network Services    Phone: 970-491-0630
College of Engineering, CSU     Fax: 970-491-5569
Ft. Collins, CO 80523-1301

All I want is a chance to prove 'Money can't buy happiness'

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2015-08-06 at 10.06.08 AM.png
Type: image/png
Size: 43823 bytes
Desc: not available
URL:

From vab at bb-c.de  Thu Aug  6 19:27:07 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Thu, 6 Aug 2015 21:27:07 +0200
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <55C38878.10905@colostate.edu>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
 <55C314D9.4040907@mcintyreweb.com>
 <55C37EC7.90302@colostate.edu>
 <21955.33526.152018.938763@glaurung.bb-c.de>
 <55C38878.10905@colostate.edu>
Message-ID: <21955.46347.277713.655434@glaurung.bb-c.de>

> Attached is a picture of the NFS export options on the Dell NAS
> device.
[...]
> Not sure where the user ID of 99 and group ID of 99 are coming from;
> they are not defined in /etc/passwd or /etc/group.

That is really strange.

The next thing I would try is to do network traces of the traffic
between client and server while the mount is done, and again when you
create the folder. Then look at the captured NFS packets and compare
the calls.
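With snoop from the base system that could be as simple as (the NIC
name is a placeholder):

    # capture everything to/from the filer while reproducing both cases
    snoop -d e1000g0 -o /tmp/nfs.cap host nasstore2
    # then read back only the NFS RPCs, verbosely, and compare the
    # credentials on the failing CREATE calls
    snoop -i /tmp/nfs.cap -v rpc nfs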
Regards -- Volker
-- 
------------------------------------------------------------------------
Volker A. Brandt              Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                  WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY           Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513             Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From lists at mcintyreweb.com  Fri Aug  7 06:30:48 2015
From: lists at mcintyreweb.com (Hugh McIntyre)
Date: Thu, 6 Aug 2015 23:30:48 -0700
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <21955.46347.277713.655434@glaurung.bb-c.de>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
 <55C314D9.4040907@mcintyreweb.com>
 <55C37EC7.90302@colostate.edu>
 <21955.33526.152018.938763@glaurung.bb-c.de>
 <55C38878.10905@colostate.edu>
 <21955.46347.277713.655434@glaurung.bb-c.de>
Message-ID: <55C45098.9060800@mcintyreweb.com>

I would assume 99/99 are the "nobody"-style user/group that root
accesses get mapped to when root is not trusted. This would be defined
in the Dell server's passwd file, not on the client.

I would agree with Volker that it's strange, and that the next step
would be network tracing (wireshark or snoop, etc.), or any logs on the
Dell server.

The one other strangeness is your "mount" results:

/mnt on nasstore2:/projects remote/read/write/setuid/devices/xattr/dev=8780014 on Wed Aug  5 15:10:13 2015
/net/nasstore2/projects on nasstore2:/projects remote/read/write/setuid/devices/xattr/dev=8780015 on Thu Aug  6 09:24:16 2015

Given that your auto_master says the following, I would have expected
"nosetuid" in the /net case (and possibly nodevices too):

/net        -hosts      -nosuid,nobrowse

"nosuid" should be a client-side option, though.

Finally, the one other thing you could check: in the past, people have
had strange permission issues because the folder being mounted on
already existed or had permissions other than 777. You might want to
check /net with the automounter stopped. Or try the /mnt case with
"mount -o nosuid nasstore2:/projects /mnt", since this is what the
automounter should be doing. This, also, is a long shot, though, since
it should not trigger username mapping.

Hugh.

On 8/6/15 12:27 PM, Volker A. Brandt wrote:
>> Attached is a picture of the NFS export options on the Dell NAS
>> device.
> [...]
>> Not sure where the user ID of 99 and group ID of 99 are coming from;
>> they are not defined in /etc/passwd or /etc/group.
>
> That is really strange.
>
> The next thing I would try is to do network traces of the traffic
> between client and server while the mount is done, and again when
> you create the folder. Then look at the captured NFS packets and
> compare the calls.
>
> Regards -- Volker
>

From henson at acm.org  Fri Aug  7 20:10:06 2015
From: henson at acm.org (Paul B. Henson)
Date: Fri, 07 Aug 2015 13:10:06 -0700
Subject: [OmniOS-discuss] upgrade to 151014
Message-ID: <20150807201005.GC3405@bender.unx.cpp.edu>

So I finally got around to upgrading my home storage server to 014;
it went super smoothly (thanks Dan!).
I was looking at the new mailwrapper stuff and noticed a typo in the
mailer.conf man page: under the FILES section it says
/etc/mail/mailer.conf instead of /etc/mailer.conf. Looks like that
comes from upstream; I opened an issue for it. Not really worth fixing
by itself; maybe I'll grab a few more man page bugs and submit a
bundle.

So, has anybody enabled the large_blocks zpool feature? My storage
server is the backend for my mythtv DVR, and has a filesystem full of
large files:

6.3G  1111_20150505040000.mpg
 13G  1111_20150508030000.mpg
3.0G  1111_20150511030000.mpg
 13G  1111_20150512030000.mpg
6.3G  1111_20150515030000.mpg
6.3G  1111_20150515040000.mpg
3.1G  1111_20150518030000.mpg
 13G  1111_20150519030000.mpg

along with small files (screenshots):

113K  1111_20150605040000.mpg.png
 97K  1111_20150609030000.mpg.png
 81K  1111_20150612030000.mpg.png
 81K  1111_20150612040000.mpg.png
 97K  1111_20150616030000.mpg.png
 97K  1111_20150623030000.mpg.png
 97K  1111_20150626040000.mpg.png
 97K  1111_20150630030000.mpg.png

Would enabling large blocks and bumping the record size to 1M improve
the efficiency or performance of a filesystem with large files like
this? Would there be any negative consequences for the small files it
also contains? Based on the originating bug 5027, it would only really
be an improvement when multiple videos are simultaneously streamed.
I've only got two TVs, so I don't really stream more than two videos,
although I've got four tuners, so there might be up to four streams
being written and two being read, plus real-time commercial flagging of
shows currently being recorded; so overall a max of 4 write streams and
6 read streams...

Just wondering if anybody has played with it in production and might
have some feedback. Thanks...

From bfriesen at simple.dallas.tx.us  Fri Aug  7 21:54:53 2015
From: bfriesen at simple.dallas.tx.us (Bob Friesenhahn)
Date: Fri, 7 Aug 2015 16:54:53 -0500 (CDT)
Subject: [OmniOS-discuss] upgrade to 151014
In-Reply-To: <20150807201005.GC3405@bender.unx.cpp.edu>
References: <20150807201005.GC3405@bender.unx.cpp.edu>
Message-ID:

On Fri, 7 Aug 2015, Paul B. Henson wrote:
>
> Would enabling large blocks and bumping the record size to 1M improve
> the efficiency or performance of a filesystem with large files like
> this? Would there be any negative consequences for the small files it
> also contains? Based on the originating bug 5027 it would only really be

With the large blocks, it is necessary to wait for the whole large
block to be read each time a fresh block is read. This adds latency.
The ARC would do caching in units of the large blocks. The ZFS
read-ahead algorithm is based on reading ZFS blocks, and the large
blocks will slow the acceleration rate (and tuning/resolution) of the
read-ahead algorithm (but it would have an initial head start).

If the small files are smaller than 1MB, or compression is enabled,
then it seems like there should not be much impact for small files.

Be aware that copy-on-write is in units of ZFS blocks, and so 1MB leads
to very large copy-on-write operations. This could harm performance if
files are updated in place.

The 1MB blocks are interesting to experiment with, but I would not use
them without observing positive impact in real usage.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
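[For anyone who does want to experiment, the mechanics are small; a
sketch with placeholder pool/dataset names, and note that the feature
flag cannot be disabled again once it is active:

    zpool set feature@large_blocks=enabled tank
    zfs create -o recordsize=1M tank/dvr-test
    zfs get recordsize tank/dvr-test

Existing files keep the record size they were written with; only newly
written files use the 1M blocks.]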
From henson at acm.org  Fri Aug  7 22:02:34 2015
From: henson at acm.org (Paul B. Henson)
Date: Fri, 07 Aug 2015 15:02:34 -0700
Subject: [OmniOS-discuss] upgrade to 151014
In-Reply-To:
References: <20150807201005.GC3405@bender.unx.cpp.edu>
Message-ID: <20150807220234.GD3405@bender.unx.cpp.edu>

On Fri, Aug 07, 2015 at 04:54:53PM -0500, Bob Friesenhahn wrote:
> The 1MB blocks are interesting to experiment with, but I would not use
> them without observing positive impact in real usage.

Thanks for the info; everything is working fine right now, so I guess I
won't experiment. Don't want the family to show up with pitchforks when
they can't watch their shows :).

From gmason at msu.edu  Fri Aug  7 22:10:16 2015
From: gmason at msu.edu (Greg Mason)
Date: Fri, 7 Aug 2015 18:10:16 -0400
Subject: [OmniOS-discuss] upgrade to 151014
In-Reply-To:
References: <20150807201005.GC3405@bender.unx.cpp.edu>
Message-ID: <94EEB5BD-F811-4223-A7D8-8FC15FE65E98@msu.edu>

>
> The 1MB blocks are interesting to experiment with, but I would not use
> them without observing positive impact in real usage.
>
If you have an application that does IO in large blocks, the large
block support in ZFS should make this faster. If your application only
does small IOs (regardless of file size), then my recommendation would
be to stick with the 128k record size.

This was really added for things like Lustre on ZFS, which *really*
likes 1MB IOs. Speaking as such a user, it is *awesome* and allows us
to get much closer to the performance that the hardware is capable of.
Large-block sequential writes are massively improved when compared to
ZFS with a 128k record size.

-Greg

From henson at acm.org  Fri Aug  7 22:59:32 2015
From: henson at acm.org (Paul B. Henson)
Date: Fri, 07 Aug 2015 15:59:32 -0700
Subject: [OmniOS-discuss] upgrade to 151014
In-Reply-To: <94EEB5BD-F811-4223-A7D8-8FC15FE65E98@msu.edu>
References: <20150807201005.GC3405@bender.unx.cpp.edu>
 <94EEB5BD-F811-4223-A7D8-8FC15FE65E98@msu.edu>
Message-ID: <20150807225931.GE3405@bender.unx.cpp.edu>

On Fri, Aug 07, 2015 at 06:10:16PM -0400, Greg Mason wrote:
> If you have an application that does IO in large blocks, the large
> block support in ZFS should make this faster. If your application only
> does small IOs (regardless of file size), then my recommendation would
> be to stick with the 128k record size.

Hmm, the filesystem is shared over NFS to a Linux box with the NFS
read/write block size set at 1M. I'm not really sure what block size
mythtv uses for I/O; I'll have to check.

From cj.keist at colostate.edu  Mon Aug 10 13:49:31 2015
From: cj.keist at colostate.edu (CJ Keist)
Date: Mon, 10 Aug 2015 07:49:31 -0600
Subject: [OmniOS-discuss] auto_master /net not allowing root permission
In-Reply-To: <21955.46347.277713.655434@glaurung.bb-c.de>
References: <55C1068D.1050608@colostate.edu>
 <21953.10043.818307.256439@glaurung.bb-c.de>
 <55C314D9.4040907@mcintyreweb.com>
 <55C37EC7.90302@colostate.edu>
 <21955.33526.152018.938763@glaurung.bb-c.de>
 <55C38878.10905@colostate.edu>
 <21955.46347.277713.655434@glaurung.bb-c.de>
Message-ID: <55C8ABEB.9070000@colostate.edu>

Thanks for the tips. Right now I don't have time to troubleshoot this
further. Since the manual mounting works, it is all I need to start
migrating data off of this server to the Dell NAS unit. Retiring this
old OmniOS file server.

On 8/6/15 1:27 PM, Volker A. Brandt wrote:
>> Attached is a picture of the NFS export options on the Dell NAS
>> device.
> [...]
>> Not sure where the user ID of 99 and group ID of 99 are coming from;
>> they are not defined in /etc/passwd or /etc/group.
> That is really strange.
>
> The next thing I would try is to do network traces of the traffic
> between client and server while the mount is done, and again when
> you create the folder. Then look at the captured NFS packets and
> compare the calls.
>
> Regards -- Volker

-- 
C. J. Keist                     Email: cj.keist at colostate.edu
Systems Group Manager           Solaris 10 OS (SAI)
Engineering Network Services    Phone: 970-491-0630
College of Engineering, CSU     Fax: 970-491-5569
Ft. Collins, CO 80523-1301

All I want is a chance to prove 'Money can't buy happiness'

From henson at acm.org  Tue Aug 11 02:49:23 2015
From: henson at acm.org (Paul B. Henson)
Date: Mon, 10 Aug 2015 19:49:23 -0700
Subject: [OmniOS-discuss] openssh on omnios
Message-ID: <20150811024922.GM3405@bender.unx.cpp.edu>

So I ran into the sunssh compatibility issue again on my OmniOS box
recently updated to 014:

no common kex alg: client
'diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1', server
'curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1'

I thought I'd try switching to openssh again, but unfortunately the
packaging system still says that's a no-go:

# pkg install network/openssh
Creating Plan (Checking for conflicting actions): /
pkg install: The following packages all deliver file actions to
usr/bin/ssh-add:

  pkg://omnios/network/openssh@6.7.1,5.11-0.151014:20150402T175112Z
  pkg://omnios/network/ssh@0.5.11,5.11-0.151014:20150727T054652Z

These packages may not be installed together.
> > The next thing I would try is to do network traces of the traffic > between client and server while the mount is done, and again when > you create the folder. Then look at the captured NFS packets and > compare the calls. > > > Regards -- Volker -- C. J. Keist Email: cj.keist at colostate.edu Systems Group Manager Solaris 10 OS (SAI) Engineering Network Services Phone: 970-491-0630 College of Engineering, CSU Fax: 970-491-5569 Ft. Collins, CO 80523-1301 All I want is a chance to prove 'Money can't buy happiness' From henson at acm.org Tue Aug 11 02:49:23 2015 From: henson at acm.org (Paul B. Henson) Date: Mon, 10 Aug 2015 19:49:23 -0700 Subject: [OmniOS-discuss] openssh on omnios Message-ID: <20150811024922.GM3405@bender.unx.cpp.edu> So I ran into the sunssh compatibility issue again on my recently updated to 014 omnios box: no common kex alg: client 'diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1', server 'curve25519-sha256 at libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1' I thought I'd try switching to openssh again, but unfortunately the packaging system still says that's a no-go: # pkg install network/openssh Creating Plan (Checking for conflicting actions): / pkg install: The following packages all deliver file actions to usr/bin/ssh-add: pkg://omnios/network/openssh at 6.7.1,5.11-0.151014:20150402T175112Z pkg://omnios/network/ssh at 0.5.11,5.11-0.151014:20150727T054652Z These packages may not be installed together. Any non-conflicting set may be, or the packages must be corrected before they can be installed. # pkg uninstall pkg://omnios/network/ssh Creating Plan (Solver setup): /pkg uninstall: Cannot remove 'pkg://omnios/network/ssh at 0.5.11,5.11-0.151014:20150727T054652Z' due to the following packages that depend on it: pkg://omnios/entire at 11,5.11-0.151014:20150727T183612Z So you can't install openssh without uninstalling ssh, and you can't uninstall ssh (at least without breaking the 'entire' incorporation). Same for openssh-server. Dan, what's the point of having openssh available as an OS package if we can't use it :)? Any suggestions? Thanks... From danmcd at omniti.com Tue Aug 11 03:11:24 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 10 Aug 2015 23:11:24 -0400 Subject: [OmniOS-discuss] openssh on omnios In-Reply-To: <20150811024922.GM3405@bender.unx.cpp.edu> References: <20150811024922.GM3405@bender.unx.cpp.edu> Message-ID: There may be a fix in bloody that needs to get backported. Lauri -- you around? Dan Sent from my iPhone (typos, autocorrect, and all) > On Aug 10, 2015, at 10:49 PM, Paul B. Henson wrote: > > So I ran into the sunssh compatibility issue again on my recently > updated to 014 omnios box: > > no common kex alg: client > 'diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1', server > 'curve25519-sha256 at libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1' > > I thought I'd try switching to openssh again, but unfortunately the > packaging system still says that's a no-go: > > # pkg install network/openssh > Creating Plan (Checking for conflicting actions): / > pkg install: The following packages all deliver file actions to > usr/bin/ssh-add: > > pkg://omnios/network/openssh at 6.7.1,5.11-0.151014:20150402T175112Z > pkg://omnios/network/ssh at 0.5.11,5.11-0.151014:20150727T054652Z > > These packages may not be installed together. 
From danmcd at omniti.com  Tue Aug 11 03:11:24 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 10 Aug 2015 23:11:24 -0400
Subject: Re: [OmniOS-discuss] openssh on omnios
In-Reply-To: <20150811024922.GM3405@bender.unx.cpp.edu>
References: <20150811024922.GM3405@bender.unx.cpp.edu>
Message-ID:

There may be a fix in bloody that needs to get backported.

Lauri -- you around?

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Aug 10, 2015, at 10:49 PM, Paul B. Henson wrote:
>
> So I ran into the sunssh compatibility issue again on my OmniOS box
> recently updated to 014:
>
> no common kex alg: client
> 'diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1', server
> 'curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1'
>
> I thought I'd try switching to openssh again, but unfortunately the
> packaging system still says that's a no-go:
>
> # pkg install network/openssh
> Creating Plan (Checking for conflicting actions): /
> pkg install: The following packages all deliver file actions to
> usr/bin/ssh-add:
>
>  pkg://omnios/network/openssh@6.7.1,5.11-0.151014:20150402T175112Z
>  pkg://omnios/network/ssh@0.5.11,5.11-0.151014:20150727T054652Z
>
> These packages may not be installed together. Any non-conflicting
> set may be, or the packages must be corrected before they can be
> installed.
>
> # pkg uninstall pkg://omnios/network/ssh
> Creating Plan (Solver setup): /
> pkg uninstall: Cannot remove
> 'pkg://omnios/network/ssh@0.5.11,5.11-0.151014:20150727T054652Z' due to
> the following packages that depend on it:
>  pkg://omnios/entire@11,5.11-0.151014:20150727T183612Z
>
> So you can't install openssh without uninstalling ssh, and you can't
> uninstall ssh (at least without breaking the 'entire' incorporation).
> Same for openssh-server.
>
> Dan, what's the point of having openssh available as an OS package if
> we can't use it :)? Any suggestions?
>
> Thanks...
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From lotheac at iki.fi  Tue Aug 11 06:33:35 2015
From: lotheac at iki.fi (Lauri Tirkkonen)
Date: Tue, 11 Aug 2015 09:33:35 +0300
Subject: Re: [OmniOS-discuss] openssh on omnios
In-Reply-To:
References: <20150811024922.GM3405@bender.unx.cpp.edu>
Message-ID: <20150811063335.GD7722@gutsman.lotheac.fi>

On Mon, Aug 10 2015 23:11:24 -0400, Dan McDonald wrote:
> There may be a fix in bloody that needs to get backported. Lauri --
> you around?

Not very much this week, but what do you need? I'd also like to see
this backported :)

-- 
Lauri Tirkkonen | lotheac @ IRCnet

From vab at bb-c.de  Tue Aug 11 08:06:24 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Tue, 11 Aug 2015 10:06:24 +0200
Subject: Re: [OmniOS-discuss] openssh on omnios
In-Reply-To: <20150811024922.GM3405@bender.unx.cpp.edu>
References: <20150811024922.GM3405@bender.unx.cpp.edu>
Message-ID: <21961.44288.284142.611149@glaurung.bb-c.de>

Paul B. Henson writes:
> Dan, what's the point of having openssh available as an OS package
> if we can't use it :)? Any suggestions?

ISP mediator?

Regards -- Volker
-- 
------------------------------------------------------------------------
Volker A. Brandt              Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                  WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY           Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513             Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"

From vab at bb-c.de  Tue Aug 11 08:20:29 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Tue, 11 Aug 2015 10:20:29 +0200
Subject: Re: [OmniOS-discuss] openssh on omnios
In-Reply-To: <21961.44288.284142.611149@glaurung.bb-c.de>
References: <20150811024922.GM3405@bender.unx.cpp.edu>
 <21961.44288.284142.611149@glaurung.bb-c.de>
Message-ID: <21961.45133.867624.467448@glaurung.bb-c.de>

Volker A. Brandt writes:
> Paul B. Henson writes:
> > Dan, what's the point of having openssh available as an OS package
> > if we can't use it :)? Any suggestions?
>
> ISP mediator?

Ack! The thing is still called IPS. :-)

Regards -- Volker
-- 
------------------------------------------------------------------------
Volker A. Brandt              Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                  WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY           Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513             Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt
Brandt "When logic and proportion have fallen sloppy dead" From danmcd at omniti.com Tue Aug 11 13:57:55 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 11 Aug 2015 09:57:55 -0400 Subject: [OmniOS-discuss] openssh on omnios In-Reply-To: <20150811063335.GD7722@gutsman.lotheac.fi> References: <20150811024922.GM3405@bender.unx.cpp.edu> <20150811063335.GD7722@gutsman.lotheac.fi> Message-ID: > On Aug 11, 2015, at 2:33 AM, Lauri Tirkkonen wrote: > > > Not very much this week, but what do you need? I'd also like to see this > backported :) I was curious if you knew of any barriers to simply cherry-picking these: commit a683605fd87f9ceed41cf5b872d3ec3ceaf7d5c0 Author: Lauri Tirkkonen Date: Wed Jul 15 00:54:44 2015 +0300 use ssh-keygen -A for openssh host key generation commit a50eefa08235676001af9a019a178b7f21f22198 Author: Lauri Tirkkonen Date: Wed Apr 8 23:15:21 2015 +0300 openssh smf method script should use /bin/sh It's probably not a good idea to source smf_include from a different shell than what it was written for. commit 9426a4089028d6f84954d7ca9fe2c14d4e58fb4c Author: Lauri Tirkkonen Date: Wed Apr 8 23:00:12 2015 +0300 don't depend on pidfiles in openssh refresh commit 861832d4b8007048dbdfe5b8c6381b70b5b52fbb Author: Lauri Tirkkonen Date: Wed Apr 8 22:58:20 2015 +0300 generate ed25519 host keys for openssh commit 715f828ffc979f214bb1e0c49364836e35873f6c Author: Lauri Tirkkonen Date: Wed Apr 8 22:16:35 2015 +0300 fix openssh/sunssh exclude, allow openssh in entire commit c5c1222d2a2d8ea4cd70bd61534161b0f0cd9cc9 Author: Lauri Tirkkonen Date: Wed Apr 8 20:23:28 2015 +0300 fix openssh ssh.xml reference to sshd manual section ... for r151014? Dan From lotheac at iki.fi Tue Aug 11 14:10:51 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Tue, 11 Aug 2015 17:10:51 +0300 Subject: [OmniOS-discuss] openssh on omnios In-Reply-To: References: <20150811024922.GM3405@bender.unx.cpp.edu> <20150811063335.GD7722@gutsman.lotheac.fi> Message-ID: <20150811141051.GA9505@gutsman.lotheac.fi> On Tue, Aug 11 2015 09:57:55 -0400, Dan McDonald wrote: > > On Aug 11, 2015, at 2:33 AM, Lauri Tirkkonen wrote: > > > > > > Not very much this week, but what do you need? I'd also like to see this > > backported :) > > I was curious if you knew of any barriers to simply cherry-picking these: [snip] > > ... for r151014? Not off the top of my head, no. Releasing does require publishing the new entire package as well as openssh, but no modifications to sunssh packages required, if memory serves (can't check until next week, phone-only for the time being). -- Lauri Tirkkonen | lotheac @ IRCnet From danmcd at omniti.com Tue Aug 11 14:16:28 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 11 Aug 2015 10:16:28 -0400 Subject: [OmniOS-discuss] openssh on omnios In-Reply-To: <20150811141051.GA9505@gutsman.lotheac.fi> References: <20150811024922.GM3405@bender.unx.cpp.edu> <20150811063335.GD7722@gutsman.lotheac.fi> <20150811141051.GA9505@gutsman.lotheac.fi> Message-ID: <54AAAD81-AEC9-4620-B837-5E0D1382ED18@omniti.com> > On Aug 11, 2015, at 10:10 AM, Lauri Tirkkonen wrote: > >> >> I was curious if you knew of any barriers to simply cherry-picking these: > [snip] >> >> ... for r151014? > > Not off the top of my head, no. I was hoping you'd say that. > Releasing does require publishing the > new entire package as well as openssh, but no modifications to sunssh > packages required, if memory serves (can't check until next week, > phone-only for the time being). 
From alka at hfg-gmuend.de  Tue Aug 11 15:28:03 2015
From: alka at hfg-gmuend.de (Günther Alka)
Date: Tue, 11 Aug 2015 17:28:03 +0200
Subject: [OmniOS-discuss] stable combinations of OmniOS 151014 and NFS
 to ESXi and state of open-vm tools (vmxnet3)
In-Reply-To: <1A3D1B20-9C33-4514-9956-5303F6BD320A@omniti.com>
References: <1A3D1B20-9C33-4514-9956-5303F6BD320A@omniti.com>
Message-ID:

In my own setups, I use OmniOS 151014 (initial release) with NFS to
ESXi 5.5U2, which is very stable with vmxnet3 and e1000 (original
VMware tools). Lately I have received several mails about problems with
some configurations.

Stable configurations:
5.5U2 + OmniOS 151014 (April)
6.00b + OmniOS 151014 (April)

Problems with:
5.5U2 + OmniOS 151014 (July)
6.0 initial release + NFS
6.00b + OmniOS 151014 (July)

Example:
http://hardforum.com/showpost.php?p=1041787564&postcount=6873

Are others experiencing problems, or stability, with these
combinations?

Besides that: is there any news about the open VMware tools in OmniOS,
especially regarding the missing vmxnet3 driver?

From doug at will.to  Tue Aug 11 17:31:20 2015
From: doug at will.to (Doug Hughes)
Date: Tue, 11 Aug 2015 13:31:20 -0400
Subject: [OmniOS-discuss] stable combinations of OmniOS 151014 and NFS
 to ESXi and state of open-vm tools (vmxnet3)
In-Reply-To:
References: <1A3D1B20-9C33-4514-9956-5303F6BD320A@omniti.com>
Message-ID:

Günther, are you using VMware ESX as an NFS client to OmniOS, or are
you using OmniOS as an NFS client to ESX?

On Tue, Aug 11, 2015 at 11:28 AM, Günther Alka wrote:
> In my own setups, I use OmniOS 151014 (initial release) with NFS to
> ESXi 5.5U2, which is very stable with vmxnet3 and e1000 (original
> VMware tools). Lately I have received several mails about problems
> with some configurations.
>
> Stable configurations:
> 5.5U2 + OmniOS 151014 (April)
> 6.00b + OmniOS 151014 (April)
>
> Problems with:
> 5.5U2 + OmniOS 151014 (July)
> 6.0 initial release + NFS
> 6.00b + OmniOS 151014 (July)
>
> Example:
> http://hardforum.com/showpost.php?p=1041787564&postcount=6873
>
> Are others experiencing problems, or stability, with these
> combinations?
>
> Besides that: is there any news about the open VMware tools in OmniOS,
> especially regarding the missing vmxnet3 driver?
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

From stephan.budach at JVM.DE  Wed Aug 12 14:04:25 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Wed, 12 Aug 2015 16:04:25 +0200
Subject: [OmniOS-discuss] ZFS/COMSTAR - zpool reports errors
Message-ID: <55CB5269.5070302@jvm.de>

Hi everyone,

yesterday I was alerted that one of my zpools was reporting an
uncorrectable error. When I checked, I was presented with some sort of
generic error on one of my iSCSI zvols:

root@nfsvmpool08:/root# zpool status -v sataTank
  pool: sataTank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A scan: scrub repaired 0 in 6h2m with 0 errors on Wed Aug 12 05:08:51 2015 config: NAME STATE READ WRITE CKSUM sataTank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t5000CCA22BC4ACEDd0 ONLINE 0 0 0 c1t5000CCA22BC51C04d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c1t5000CCA22BC4896Dd0 ONLINE 0 0 0 c1t5000CCA22BC4B18Ed0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c1t5000CCA22BC4AFFBd0 ONLINE 0 0 0 c1t5000CCA22BC5135Ed0 ONLINE 0 0 0 logs mirror-3 ONLINE 0 0 0 c1t50015179596C598Ed0p2 ONLINE 0 0 0 c1t50015179596B0A1Fd0p2 ONLINE 0 0 0 cache c1t5001517959680E33d0 ONLINE 0 0 0 c1t50015179596B0A1Fd0p3 ONLINE 0 0 0 errors: Permanent errors have been detected in the following files: sataTank/nfsvmpool08-sata-03:<0x1> root at nfsvmpool08:/root# As you can see, I ran a scrub, but that one didn't find any issue with any of the data in the pool. Checking fmdump also revealed nothing, so I wonder what I am to do about that? I recall from somewhere in my head, that I had seen a topic like this had been discussed before, but I seem to cannot find it anywhere. This must have been either on this list or the OpenIndiana list. This zvol is actually part of a mirror RAC volume group, so I went to the RAC nodes, but neither of them notices anythinf strange as well? So, my main question is: how can I diagnose this further, if possible= Thanks, Stephan From stephan.budach at JVM.DE Wed Aug 12 14:50:49 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 12 Aug 2015 16:50:49 +0200 Subject: [OmniOS-discuss] ZFS/COMSTAR - zpool reports errors In-Reply-To: <55CB5269.5070302@jvm.de> References: <55CB5269.5070302@jvm.de> Message-ID: <55CB5D49.7050705@jvm.de> Am 12.08.15 um 16:04 schrieb Stephan Budach: > Hi everyone, > > yesterday I was alerted about one of my zpools reporting an > uncorrectable error. When I checked that, I was presented with some > sort of generic error at one of my iSCSI zvols: > > root at nfsvmpool08:/root# zpool status -v sataTank > pool: sataTank > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://illumos.org/msg/ZFS-8000-8A > scan: scrub repaired 0 in 6h2m with 0 errors on Wed Aug 12 05:08:51 > 2015 > config: > > NAME STATE READ WRITE CKSUM > sataTank ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c1t5000CCA22BC4ACEDd0 ONLINE 0 0 0 > c1t5000CCA22BC51C04d0 ONLINE 0 0 0 > mirror-1 ONLINE 0 0 0 > c1t5000CCA22BC4896Dd0 ONLINE 0 0 0 > c1t5000CCA22BC4B18Ed0 ONLINE 0 0 0 > mirror-2 ONLINE 0 0 0 > c1t5000CCA22BC4AFFBd0 ONLINE 0 0 0 > c1t5000CCA22BC5135Ed0 ONLINE 0 0 0 > logs > mirror-3 ONLINE 0 0 0 > c1t50015179596C598Ed0p2 ONLINE 0 0 0 > c1t50015179596B0A1Fd0p2 ONLINE 0 0 0 > cache > c1t5001517959680E33d0 ONLINE 0 0 0 > c1t50015179596B0A1Fd0p3 ONLINE 0 0 0 > > errors: Permanent errors have been detected in the following files: > > sataTank/nfsvmpool08-sata-03:<0x1> > root at nfsvmpool08:/root# > > As you can see, I ran a scrub, but that one didn't find any issue with > any of the data in the pool. Checking fmdump also revealed nothing, so > I wonder what I am to do about that? I recall from somewhere in my > head, that I had seen a topic like this had been discussed before, but > I seem to cannot find it anywhere. This must have been either on this > list or the OpenIndiana list. 
> > This zvol is actually part of a mirror RAC volume group, so I went to > the RAC nodes, but neither of them notices anythinf strange as well? > > So, my main question is: how can I diagnose this further, if possible= > > Thanks, > Stephan > > _______________________________________________ Ahh? that was too soon? ;) Actually one of the RAC nodes noticed an error at or rather a couple of mintues before this issue, when it reported this: Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Unhandled sense code Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Result: hostbyte=invalid driverbyte=DRIVER_SENSE Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Sense Key : Medium Error [current] Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Add. Sense: Unrecovered read error Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] CDB: Read(10): 28 00 60 b2 54 28 00 00 80 00 Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, dev sdx, sector 1622299688 Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, dev dm-25, sector 1622299688 Aug 11 20:25:04 btierasm01 kernel: ADVMK-0020: A read error was reported to the ASM instance for volume vg_nfs07fwd-16 in diskgroup DG_NFS07SA This vg_nfs07fwd-16 is a RAC volume, which is presented via NFS from the RAC cluster nodes to some Oracle VM hosts, but neither of those hosts had any issues with that volume at any time, so I assmume the request came from the RAC node itself and I will dig into the logs to see, what it actually treid to do with the volume. I am still wondering if this issue is somewhat related to COMSTAR or the zpool itself. Thanks, Stephan From mir at miras.org Wed Aug 12 15:19:38 2015 From: mir at miras.org (Michael Rasmussen) Date: Wed, 12 Aug 2015 17:19:38 +0200 Subject: [OmniOS-discuss] ZFS/COMSTAR - zpool reports errors In-Reply-To: <55CB5D49.7050705@jvm.de> References: <55CB5269.5070302@jvm.de> <55CB5D49.7050705@jvm.de> Message-ID: <20150812171938.1f414444@sleipner.datanom.net> On Wed, 12 Aug 2015 16:50:49 +0200 Stephan Budach wrote: > Ahh? that was too soon? ;) Actually one of the RAC nodes noticed an > error at or rather a couple of mintues before this issue, when it > reported this: > > Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Unhandled sense code > Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Result: > hostbyte=invalid driverbyte=DRIVER_SENSE > Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Sense Key : Medium > Error [current] > Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Add. Sense: > Unrecovered read error > Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] CDB: Read(10): 28 > 00 60 b2 54 28 00 00 80 00 > Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, > dev sdx, sector 1622299688 > Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, > dev dm-25, sector 1622299688 > Aug 11 20:25:04 btierasm01 kernel: ADVMK-0020: A read error was reported > to the ASM instance for volume vg_nfs07fwd-16 in diskgroup DG_NFS07SA > > This vg_nfs07fwd-16 is a RAC volume, which is presented via NFS from the > RAC cluster nodes to some Oracle VM hosts, but neither of those hosts > had any issues with that volume at any time, so I assmume the request > came from the RAC node itself and I will dig into the logs to see, what > it actually treid to do with the volume. > > I am still wondering if this issue is somewhat related to COMSTAR or the > zpool itself. 
> I wonder whether this is a hardware issue (eg driver firmware). What if the firmware has marked a sector bad and have moved it elsewhere. Could one imagine that this move have taking place unnoticed to ZFS? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Terminate input by end-of-file or marker, not by count. - The Elements of Programming Style (Kernighan & Plaugher) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From stephan.budach at JVM.DE Wed Aug 12 17:13:12 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 12 Aug 2015 19:13:12 +0200 Subject: [OmniOS-discuss] ZFS/COMSTAR - zpool reports errors In-Reply-To: <20150812171938.1f414444@sleipner.datanom.net> References: <55CB5269.5070302@jvm.de> <55CB5D49.7050705@jvm.de> <20150812171938.1f414444@sleipner.datanom.net> Message-ID: <55CB7EA8.50806@jvm.de> Am 12.08.15 um 17:19 schrieb Michael Rasmussen: > On Wed, 12 Aug 2015 16:50:49 +0200 > Stephan Budach wrote: > >> Ahh? that was too soon? ;) Actually one of the RAC nodes noticed an >> error at or rather a couple of mintues before this issue, when it >> reported this: >> >> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Unhandled sense code >> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Result: >> hostbyte=invalid driverbyte=DRIVER_SENSE >> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Sense Key : Medium >> Error [current] >> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] Add. Sense: >> Unrecovered read error >> Aug 11 20:25:04 btierasm01 kernel: sd 75:0:0:1: [sdx] CDB: Read(10): 28 >> 00 60 b2 54 28 00 00 80 00 >> Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, >> dev sdx, sector 1622299688 >> Aug 11 20:25:04 btierasm01 kernel: end_request: critical target error, >> dev dm-25, sector 1622299688 >> Aug 11 20:25:04 btierasm01 kernel: ADVMK-0020: A read error was reported >> to the ASM instance for volume vg_nfs07fwd-16 in diskgroup DG_NFS07SA >> >> This vg_nfs07fwd-16 is a RAC volume, which is presented via NFS from the >> RAC cluster nodes to some Oracle VM hosts, but neither of those hosts >> had any issues with that volume at any time, so I assmume the request >> came from the RAC node itself and I will dig into the logs to see, what >> it actually treid to do with the volume. >> >> I am still wondering if this issue is somewhat related to COMSTAR or the >> zpool itself. >> > I wonder whether this is a hardware issue (eg driver firmware). What if > the firmware has marked a sector bad and have moved it elsewhere. Could > one imagine that this move have taking place unnoticed to ZFS? > I think, if the drive's firmware moves a sector, then it would go unnoticed to ZFS, but then it shouldn't have any effect on the zpool itself, since in order to remap a sector, the drive had to be able to finally read it succesfully, no? Otherwise the zpool would have to encounter some read errors which should be in fmdump? but then, maybe not. The drives are all HGST HUS72404-A3B0, connected to a LSI 9207-8i with FW 19. 
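If the firmware really had remapped something, the drives should at least admit to it in their own counters. Something along these lines ought to show it (just a sketch -- smartctl comes with smartmontools, which is not part of core OmniOS, and the device name is simply one disk from this pool):

  # soft/hard/transport error counters as the sd driver sees them
  iostat -En c1t5000CCA22BC4ACEDd0
  # reallocated/pending sector counts from the drive itself,
  # if smartmontools happens to be installed
  smartctl -a /dev/rdsk/c1t5000CCA22BC4ACEDd0s0 | egrep -i 'realloc|pending'
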
The other question then is: how can this error be cleared, when a scrub doesn't turn up any faulty data?

From sim.ple at live.nl  Thu Aug 13 15:47:50 2015
From: sim.ple at live.nl (Randy S)
Date: Thu, 13 Aug 2015 17:47:50 +0200
Subject: [OmniOS-discuss] dell 730xd with md3060e jbod
Message-ID: 

Hi,

A while ago I had a moment to test a 730XD with a md3060e jbod.
I have read the other threads regarding the 730 usability.
I had no problems with it using omnios R12. However, the use of the JBOD did raise an issue regarding blinking disk leds.

I noticed that the signals sent with sas2ircu were not doing their job (nothing blinks). After some calls to dell technicians, I heard that dell has disabled these signals in their firmware and only allows signalling through their own tool, which only works with windows and some linux flavours.

At that time I heard about santools, which "might probably" be used for this blinking functionality (and more), but you have to buy it to test it. Bit expensive for a test.

After this long intro, my question is:
Does anybody know of another way (script, tools etc) to get this blinking functionality going in this hardware combination (of course NOT using dd)?

As I understood, this same combination is used, as a standard, by another (big) illumos kernel user and their systems also do not blink disks. I, however, would like to be able to find a disk easily, e.g. to replace it, and not depend only on the internal disk tests which the JBOD seems to do by itself regardless of the OS used.

(Dell told me the JBOD detects defective disks by itself by performing some periodic tests. How it does this I do not know.)

Regards,

R
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From richard.elling at richardelling.com  Thu Aug 13 19:11:29 2015
From: richard.elling at richardelling.com (Richard Elling)
Date: Thu, 13 Aug 2015 12:11:29 -0700
Subject: [OmniOS-discuss] dell 730xd with md3060e jbod
In-Reply-To: 
References: 
Message-ID: <30D0EAF8-E73C-4EF0-B145-F029ED060002@richardelling.com>

> On Aug 13, 2015, at 8:47 AM, Randy S  wrote:
>
> Hi,
>
> A while ago I had a moment to test a 730XD with a md3060e jbod.
> I have read the other threads regarding the 730 usability.
> I had no problems with it using omnios R12. However, the use of the JBOD did raise an issue regarding blinking disk leds.
>
> I noticed that the signals sent with sas2ircu were not doing their job (nothing blinks). After some calls to dell technicians, I heard that dell has disabled these signals in their firmware and only allows signalling through their own tool, which only works with windows and some linux flavours.
>
> At that time I heard about santools, which "might probably" be used for this blinking functionality (and more), but you have to buy it to test it. Bit expensive for a test.
>
> After this long intro, my question is:
> Does anybody know of another way (script, tools etc) to get this blinking functionality going in this hardware combination (of course NOT using dd)?

Try fmtopo first. There are 3 indicators defined: fail, ident, ok2rm. Many SES vendors only implement fail and ident. Here an example:

/usr/lib/fm/fmd/fmtopo -P facility.mode=uint32:1 hc://:chassis-mfg=XYZ:chassis-name=XYZ-ABC:chassis-part=unknown:chassis-serial=500093d0016cc76/ses-enclosure=0/bay=1?indicator=fail

to de-assert the indicator, facility.mode=uint32:0

to find the FMRI string, use fmtopo to observe what your hardware reports.
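For example, listing just the bays is usually enough to find the FMRI you need (assuming the enclosure shows up in the FM topology at all):

  # dump the topology and keep only the bay nodes
  /usr/lib/fm/fmd/fmtopo | grep bay

then append ?indicator=ident (or fail) to the bay's FMRI as in the example above.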
-- richard > > As I understood, this same combination is used, as a standard, by a another (big) illumos kernel user and their systems also do not blink disks. I however, > would like to be able to find a disk easilly to e.g. replace and not only depend on the internal disk tests which the JBOD seems to do by itself regardless of the OS used. > > (Dell told me theJBOD detects defect disks by itself by performing some periodic tests. How it does this I do not know). > > Regards, > > R > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From sim.ple at live.nl Fri Aug 14 07:38:56 2015 From: sim.ple at live.nl (Randy S) Date: Fri, 14 Aug 2015 09:38:56 +0200 Subject: [OmniOS-discuss] dell 730xd with md3060e jbod In-Reply-To: <30D0EAF8-E73C-4EF0-B145-F029ED060002@richardelling.com> References: , <30D0EAF8-E73C-4EF0-B145-F029ED060002@richardelling.com> Message-ID: Thanks Richard! Will try this Subject: Re: [OmniOS-discuss] dell 730xd with md3060e jbod From: richard.elling at richardelling.com Date: Thu, 13 Aug 2015 12:11:29 -0700 CC: omnios-discuss at lists.omniti.com To: sim.ple at live.nl On Aug 13, 2015, at 8:47 AM, Randy S wrote: Hi, A while ago I had a moment to test a 730XD with a md3060e jbod. I have read the other threads regarding the 730 usability. I had no problems with it using omnios R12. However the use of the JBOD did raise an issue regarding blinking disk leds. I noticed that the signals send with sas2ircu were not doing their job (nothing blinks). After some calls to dell technicians, I heard that dell has disabled these signals in their firmware and only allows signalling through their own tool, which only works with windows and some linux flavours. At that time I hear about santools which "might propably" be used for this blinking functionality (and more), but you have to buy it to test it. Bit expensive for a test. After this long intro, my question is: Does anybody know of a another way (script, tools etc) to get this blinking functionality going in this hardware combination (ofcourse NOT using dd) ? Try fmtopo first. There are 3 indicators defined: fail, ident, ok2rm. Many SES vendorsonly implement fail and ident. Here an example: /usr/lib/fm/fmd/fmtopo -P facility.mode=uint32:1 hc://:chassis-mfg=XYZ:chassis-name=XYZ-ABC:chassis-part=unknown:chassis-serial=500093d0016cc76/ses-enclosure=0/bay=1?indicator=fail to de-assert the indicator, facility.mode=uint32:0 to find the FMRI string, use fmtopo to observe what your hardware reports. -- richard As I understood, this same combination is used, as a standard, by a another (big) illumos kernel user and their systems also do not blink disks. I however, would like to be able to find a disk easilly to e.g. replace and not only depend on the internal disk tests which the JBOD seems to do by itself regardless of the OS used. (Dell told me theJBOD detects defect disks by itself by performing some periodic tests. How it does this I do not know). Regards, R _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alexandre.lecuyer at gmail.com Fri Aug 14 10:08:02 2015 From: alexandre.lecuyer at gmail.com (Alexandre Lecuyer) Date: Fri, 14 Aug 2015 12:08:02 +0200 Subject: [OmniOS-discuss] Anyone try the new bloody ISO on OpenStack/KVM or Linux/KVM In-Reply-To: <6FA64693-C9BD-4680-9922-48823EA34DA1@omniti.com> References: <6FA64693-C9BD-4680-9922-48823EA34DA1@omniti.com> Message-ID: Hello Dan, I tried to install bloody on openstack. The installer boots correctly from the ISO image If I start a shell, I can see my target install disk, and create a zpool. The network works fine, although I had to configure it manually However, the installer crashes with a python exception right after the "Welcome to OmniOS" screen. /usr/lib/python2.6/vendor-packages/terminalui/inner_window.py, line 318, in activate_object raise IndexError(err_msg) I can provide more details if that's interesting Regards, Alex 2015-07-27 21:16 GMT+02:00 Dan McDonald : > Subject says it all. I'm looking for additional testing datapoints if any > are available above/beyond our own. ESPECIALLY because of the vioif > support now in bloody. > > Thanks, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Fri Aug 14 14:26:14 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 14 Aug 2015 10:26:14 -0400 Subject: [OmniOS-discuss] Anyone try the new bloody ISO on OpenStack/KVM or Linux/KVM In-Reply-To: References: <6FA64693-C9BD-4680-9922-48823EA34DA1@omniti.com> Message-ID: <185E9FEE-CA81-4765-89D0-0309C72CE046@omniti.com> > On Aug 14, 2015, at 6:08 AM, Alexandre Lecuyer wrote: > > Hello Dan, > > I tried to install bloody on openstack. > The installer boots correctly from the ISO image > If I start a shell, I can see my target install disk, and create a zpool. > The network works fine, although I had to configure it manually > > However, the installer crashes with a python exception right after the "Welcome to OmniOS" screen. > > /usr/lib/python2.6/vendor-packages/terminalui/inner_window.py, line 318, in activate_object > raise IndexError(err_msg) > > I can provide more details if that's interesting That would be interesting. I'm particularly interested in whether or not this python error is related to readline or not. Until very recently, readline was accidentally not being linked with the python packages. That has been very recently fixed, after the most recent bloody install media shipped, even. So knowing that it's not the recently-re-added readline, I'm curious for more. Thanks, Dan From alexandre.lecuyer at gmail.com Fri Aug 14 15:13:10 2015 From: alexandre.lecuyer at gmail.com (Alexandre Lecuyer) Date: Fri, 14 Aug 2015 17:13:10 +0200 Subject: [OmniOS-discuss] Anyone try the new bloody ISO on OpenStack/KVM or Linux/KVM In-Reply-To: <185E9FEE-CA81-4765-89D0-0309C72CE046@omniti.com> References: <6FA64693-C9BD-4680-9922-48823EA34DA1@omniti.com> <185E9FEE-CA81-4765-89D0-0309C72CE046@omniti.com> Message-ID: 2015-08-14 16:26 GMT+02:00 Dan McDonald : > > > On Aug 14, 2015, at 6:08 AM, Alexandre Lecuyer < > alexandre.lecuyer at gmail.com> wrote: > > > > Hello Dan, > > > > I tried to install bloody on openstack. > > The installer boots correctly from the ISO image > > If I start a shell, I can see my target install disk, and create a zpool. 
> > The network works fine, although I had to configure it manually > > > > However, the installer crashes with a python exception right after the > "Welcome to OmniOS" screen. > > > > /usr/lib/python2.6/vendor-packages/terminalui/inner_window.py, line 318, > in activate_object > > raise IndexError(err_msg) > > > > I can provide more details if that's interesting > > That would be interesting. I'm particularly interested in whether or not > this python error is related to readline or not. Until very recently, > readline was accidentally not being linked with the python packages. That > has been very recently fixed, after the most recent bloody install media > shipped, even. > > So knowing that it's not the recently-re-added readline, I'm curious for > more. > > Thanks, > Dan > > I managed to get it installed, the problem occured in the installer, in disk_selection.py, function _show() in the for loop that starts line 345, we hit this : if disk.disk_prop is None or disk.disk_prop.dev_type is None: continue So we never increment disk_index, which causes the exception I mentionned in my previous email, when we call self.disk_win.activate_object(self.selected_disk_index) Looking at disk.disk_prop (DiskProp object), dev_size is properly set, but dev_vendor and dev_type are set to None. As far as I can tell, these two fields are displayed by the installer but not used beyond that. Setting them to any random field prevents the crash. I was then able to install OmniOS. I will have to run another install to try to figure out why they're not set in the first place. Here's the result after rebooting in the new install : root at unknown:/root# uname -a SunOS unknown 5.11 omnios-3d8d739 i86pc i386 i86pc root at unknown:/root# format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c2t0d0 /pci at 0,0/pci1af4,2 at 4/blkdev at 0,0 Specify disk (enter its number): Happy to run further tests if needed, I'll report other issues I can find with the installed system Cheers, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Fri Aug 14 15:16:29 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 14 Aug 2015 11:16:29 -0400 Subject: [OmniOS-discuss] Anyone try the new bloody ISO on OpenStack/KVM or Linux/KVM In-Reply-To: References: <6FA64693-C9BD-4680-9922-48823EA34DA1@omniti.com> <185E9FEE-CA81-4765-89D0-0309C72CE046@omniti.com> Message-ID: <2F3F3E2E-6279-41D6-BED6-961F47B3633D@omniti.com> Are you using vioblk devices? Yes.... you are: > /pci at 0,0/pci1af4,2 at 4/blkdev at 0,0 The installer, alas, doesn't like or recognize block devices at all. You have to do the horrible hack of installing on a virtual scsi disk, then mirroring the blkdev after booting for real, then removing the virtual scsi disk. Others on this list can go over the procedure better than I. It's a bug in the installer, which is unfortunately in serious need of rework or replacement (some have suggested a kayak-based ISO/USB installer...). Sorry I can't be of more useful assistance, Dan From moo at wuffers.net Fri Aug 14 16:08:05 2015 From: moo at wuffers.net (wuffers) Date: Fri, 14 Aug 2015 12:08:05 -0400 Subject: [OmniOS-discuss] ZFS data corruption Message-ID: A few weeks ago (while I was away on vacation), both of my VMware hosts PSOD within a day of each other. The first time DR kicked in and VMs restarted smoothly, but my backup didn't notice that the rebooted host didn't reconnect to the SAN, so when the second host PSODed everything went down. 
He rebooted the SAN and the hosts and everything seemed okay. I came back and saw this on my pool: pool: tank state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://illumos.org/msg/ZFS-8000-8A scan: scrub in progress since Tue Jul 28 14:10:23 2015 19.6T scanned out of 46.3T at 80.0M/s, 97h9m to go 0 repaired, 42.30% done [snip config] errors: Permanent errors have been detected in the following files: tank/vmware-64k-5tb-7:<0x1> I moved all the VMs off that datastore, and had to repair an Exchange database that was reporting some issues. I then started a scrub (as seen above). My plan was to delete this block device, and recreate a new datastore but the scrub completed and now it shows: errors: No known data errors Should I trust this? I suppose that now that I've moved all the data on it there can be no corruption at ZFS level (since I didn't find any hardware issues in iostat or fmdump logs). Or would the consensus be to delete this, recreate it and present it to VMware again? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mir at miras.org Fri Aug 14 16:21:27 2015 From: mir at miras.org (Michael Rasmussen) Date: Fri, 14 Aug 2015 18:21:27 +0200 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: References: Message-ID: <20150814182127.13a8a2a3@sleipner.datanom.net> On Fri, 14 Aug 2015 12:08:05 -0400 wuffers wrote: > > Should I trust this? I suppose that now that I've moved all the data on it > there can be no corruption at ZFS level (since I didn't find any hardware > issues in iostat or fmdump logs). Or would the consensus be to delete this, > recreate it and present it to VMware again? Are there any resemblance to this thread with subject: [OmniOS-discuss] ZFS/COMSTAR - zpool reports errors -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Worlds are conquered, galaxies destroyed -- but a woman is always a woman. -- Kirk, "The Conscience of the King", stardate 2818.9 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From cal-s at blue-bolt.com Fri Aug 14 16:29:22 2015 From: cal-s at blue-bolt.com (Cal Sawyer) Date: Fri, 14 Aug 2015 17:29:22 +0100 Subject: [OmniOS-discuss] dell 730xd with md3060e jbod In-Reply-To: References: Message-ID: <55CE1762.7040400@blue-bolt.com> If the (big) illumos kernel user begins in "N", the do have disk blink via the UI: Settings -> disks According to a posting on a (big) illumos kernel user's Community forum (thread 1424): sesctl list - to get sesctl list - to get and sesctl blink : - to blink slot sesctl blink -I : - to stop the blink The fun lies in cross-referring enclosure locations to devices Cal Sawyer | Systems Engineer | BlueBolt Ltd 15-16 Margaret Street | London W1W 8RW +44 (0)20 7637 5575 | www.blue-bolt.com Subject: Re: [OmniOS-discuss] dell 730xd with md3060e jbod From: richard.elling at richardelling.com Date: Thu, 13 Aug 2015 12:11:29 -0700 CC: omnios-discuss at lists.omniti.com To: sim.ple at live.nl On Aug 13, 2015, at 8:47 AM, Randy S wrote: Hi, A while ago I had a moment to test a 730XD with a md3060e jbod. I have read the other threads regarding the 730 usability. I had no problems with it using omnios R12. However the use of the JBOD did raise an issue regarding blinking disk leds. I noticed that the signals send with sas2ircu were not doing their job (nothing blinks). After some calls to dell technicians, I heard that dell has disabled these signals in their firmware and only allows signalling through their own tool, which only works with windows and some linux flavours. At that time I hear about santools which "might propably" be used for this blinking functionality (and more), but you have to buy it to test it. Bit expensive for a test. After this long intro, my question is: Does anybody know of a another way (script, tools etc) to get this blinking functionality going in this hardware combination (ofcourse NOT using dd) ? Try fmtopo first. There are 3 indicators defined: fail, ident, ok2rm. Many SES vendorsonly implement fail and ident. Here an example: /usr/lib/fm/fmd/fmtopo -P facility.mode=uint32:1 hc://:chassis-mfg=XYZ:chassis-name=XYZ-ABC:chassis-part=unknown:chassis-serial=500093d0016cc76/ses-enclosure=0/bay=1?indicator=fail to de-assert the indicator, facility.mode=uint32:0 to find the FMRI string, use fmtopo to observe what your hardware reports. -- richard As I understood, this same combination is used, as a standard, by a another (big) illumos kernel user and their systems also do not blink disks. I however, would like to be able to find a disk easilly to e.g. replace and not only depend on the internal disk tests which the JBOD seems to do by itself regardless of the OS used. (Dell told me theJBOD detects defect disks by itself by performing some periodic tests. How it does this I do not know). Regards, R -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.truhlar at archcon.cz Fri Aug 14 23:38:00 2015 From: martin.truhlar at archcon.cz (=?iso-8859-2?Q?Martin_Truhl=E1=F8?=) Date: Sat, 15 Aug 2015 01:38:00 +0200 Subject: [OmniOS-discuss] data gone ...? Message-ID: Hallo everyone, I have a little problem here. I'm using OmniOS v11 r151014 with nappit 0.9f5 and 3 pools (2 data pool and a system) There is a problem with epool that I'm sharing by iSCSI to Windows 2008 SBS server. This pool is few days old, but used disks are about 5 years old. 
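(A note on notation: the S/H/T values I quote below are, as far as I know, the soft/hard/transport error counters that napp-it reads out of iostat; the same numbers per disk come from e.g.:

  iostat -En c1t50014EE1578B1091d0

so they can be cross-checked outside of napp-it.)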
Obviously something happen with one 500GB disk (S:0 H:106 T:12), but data on epool seems to be in a good condition. But. I had a problem with accessing some data on that pool and today most of them (roughly 2/3) have disappeared. But ZFS seems to be ok and available space epool indicates is the same as day before. I welcome any advice. Martin Truhlar pool: dpool state: ONLINE scan: scrub repaired 0 in 14h11m with 0 errors on Thu Aug 13 14:34:21 2015 config: NAME STATE READ WRITE CKSUM CAP Product /napp-it IOstat mess dpool ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t50014EE00400FA16d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE2B40F14DBd0 ONLINE 0 0 0 1 TB WDC WD1003FBYX-0 S:0 H:0 T:0 mirror-1 ONLINE 0 0 0 c1t50014EE05950B131d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE2B5E5A6B8d0 ONLINE 0 0 0 1 TB WDC WD1003FBYZ-0 S:0 H:0 T:0 mirror-2 ONLINE 0 0 0 c1t50014EE05958C51Bd0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE0595617ACd0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 mirror-3 ONLINE 0 0 0 c1t50014EE0AEAE7540d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 c1t50014EE0AEAE9B65d0 ONLINE 0 0 0 1 TB WDC WD1002F9YZ-0 S:0 H:0 T:0 logs mirror-4 ONLINE 0 0 0 c1t55CD2E404B88ABE1d0 ONLINE 0 0 0 120 GB INTEL SSDSC2BW12 S:0 H:0 T:0 c1t55CD2E404B88E4CFd0 ONLINE 0 0 0 120 GB INTEL SSDSC2BW12 S:0 H:0 T:0 cache c1t55CD2E4000339A59d0 ONLINE 0 0 0 180 GB INTEL SSDSC2BW18 S:0 H:0 T:0 errors: No known data errors pool: epool state: ONLINE scan: scrub repaired 0 in 6h26m with 0 errors on Fri Aug 14 07:17:03 2015 config: NAME STATE READ WRITE CKSUM CAP Product /napp-it IOstat mess epool ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 c1t50014EE1578AC0B5d0 ONLINE 0 0 0 500.1 GB WDC WD5002ABYS-0 S:0 H:0 T:0 c1t50014EE1578B1091d0 ONLINE 0 0 0 500.1 GB WDC WD5002ABYS-0 S:0 H:106 T:12 c1t50014EE1ACD9A82Bd0 ONLINE 0 0 0 500.1 GB WDC WD5002ABYS-0 S:0 H:1 T:0 c1t50014EE1ACD9AC4Ed0 ONLINE 0 0 0 500.1 GB WDC WD5002ABYS-0 S:0 H:1 T:0 errors: No known data errors From moo at wuffers.net Sat Aug 15 02:23:16 2015 From: moo at wuffers.net (wuffers) Date: Fri, 14 Aug 2015 22:23:16 -0400 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: <20150814182127.13a8a2a3@sleipner.datanom.net> References: <20150814182127.13a8a2a3@sleipner.datanom.net> Message-ID: My scrub actually cleared the error, so I don't think it's similar. So my question remains.. is this block storage compromised or now marked safe to use? On Fri, Aug 14, 2015 at 12:21 PM, Michael Rasmussen wrote: > On Fri, 14 Aug 2015 12:08:05 -0400 > wuffers wrote: > > > > > Should I trust this? I suppose that now that I've moved all the data on > it > > there can be no corruption at ZFS level (since I didn't find any hardware > > issues in iostat or fmdump logs). Or would the consensus be to delete > this, > > recreate it and present it to VMware again? > Are there any resemblance to this thread with subject: [OmniOS-discuss] > ZFS/COMSTAR - zpool reports errors > > -- > Hilsen/Regards > Michael Rasmussen > > Get my public GnuPG keys: > michael rasmussen cc > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E > mir datanom net > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C > mir miras org > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 > -------------------------------------------------------------- > /usr/games/fortune -es says: > Worlds are conquered, galaxies destroyed -- but a woman is always a > woman. 
-- Kirk, "The Conscience of the King", stardate 2818.9 > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Sat Aug 15 07:04:27 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sat, 15 Aug 2015 03:04:27 -0400 Subject: [OmniOS-discuss] OmniOS bloody update Message-ID: A little late this week, but I've pushed out updated packages for OmniOS bloody. Install media is not being updated this time, due to me being neck-deep in i40e bringup. omnios-bloody is at master revision edda13f. New with this is: - net-snmp to 5.7.3 - Python update to correct the lack of readline support. illumos-omnios is at master revision fe21bd5, so uname -v will say omnios-fe21bd5. New here is: - Official interface stability for libavl. AVL trees are useful structures, now use 'em with highly stable interfaces. - cpqary3 supports newer HP Gen9 HBAs. Sorry this isn't available immediately for install media. This may be backported to '014. - sunmdi fix that improves stability when devices (esp. disks) are retired. This is already backported to '014. Happy updating! Dan From danmcd at omniti.com Sat Aug 15 15:19:34 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sat, 15 Aug 2015 11:19:34 -0400 Subject: [OmniOS-discuss] Fwd: data gone ...? References: Message-ID: Pardon the headers, folks. While I sort out the list, this message can be read. Dan Sent from my iPhone (typos, autocorrect, and all) Begin forwarded message: > From: Ben Kitching > Date: August 15, 2015 at 7:38:27 AM EDT > To: omnios-discuss-owner at lists.omniti.com > Subject: Fwd: [OmniOS-discuss] data gone ...? > > HI There, > > I?m a bit of a lurker and it?s been a while since I posted. > > I do have something useful to say on this thread though and it seems the list is rejecting my replies. > > I?ve checked that I?m sending from my registered address (narratorben at icloud.com) and I?m pretty sure it?s the right address. > > Could you look into it please. > > Thanks > > Ben > > Begin forwarded message: > > From: omnios-discuss-owner at lists.omniti.com > Date: 15 August 2015 at 11:36:54 BST > To: narratorben at icloud.com > Subject: Re: [OmniOS-discuss] data gone ...? > > This list only allows members to post, and your message has been > automatically rejected. If you think that your messages are being > rejected in error, contact the mailing list owner at > omnios-discuss-owner at lists.omniti.com > > > > You say that you are exporting a volume over iSCSI to your windows = > server. I assume that means you have an NTFS (or other windows = > filesystem) sitting on top of the iSCSI volume? It might be worth using = > windows tools to check the integrity of that filesystem as it may be = > that rather than ZFS that is causing problems. > > Are you using the built in Windows iSCSI initiator? I=E2=80=99ve had = > problems with this in past on versions of windows older than windows 8 / = > server 2012 due to it not supporting iSCSI unmap commands and therefore = > being unable to tell ZFS to free blocks when files are deleted. You can = > see if you are having this problem by comparing the free space reported = > by both windows and ZFS. 
If there is a disparity then you are likely = > experiencing this problem and could ultimately end up in a situation = > where ZFS will stop allowing writes because it thinks the volume is full = > no matter how many files you delete from the windows end. I saw this = > manifest as errors with the NTFS filesystem on the windows end as from = > Windows point of view it has free space and can=E2=80=99t understand why = > it isn=E2=80=99t allowed to write, it sees it as an error. > > On 15 Aug 2015, at 00:38, Martin Truhl=C3=A1=C5=99 = > wrote: > > Hallo everyone, > > I have a little problem here. I'm using OmniOS v11 r151014 with nappit = > 0.9f5 and 3 pools (2 data pool and a system) > There is a problem with epool that I'm sharing by iSCSI to Windows 2008 = > SBS server. This pool is few days old, but used disks are about 5 years = > old. Obviously something happen with one 500GB disk (S:0 H:106 T:12), = > but data on epool seems to be in a good condition. But. I had a problem = > with accessing some data on that pool and today most of them (roughly = > 2/3) have disappeared. But ZFS seems to be ok and available space epool = > indicates is the same as day before. > > I welcome any advice. > Martin Truhlar > > > > > > > pool: dpool > state: ONLINE > scan: scrub repaired 0 in 14h11m with 0 errors on Thu Aug 13 14:34:21 = > 2015 > config: > > NAME STATE READ WRITE CKSUM CAP = > Product /napp-it IOstat mess > dpool ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > c1t50014EE00400FA16d0 ONLINE 0 0 0 1 TB = > WDC WD1002F9YZ-0 S:0 H:0 T:0 > c1t50014EE2B40F14DBd0 ONLINE 0 0 0 1 TB = > WDC WD1003FBYX-0 S:0 H:0 T:0 > mirror-1 ONLINE 0 0 0 > c1t50014EE05950B131d0 ONLINE 0 0 0 1 TB = > WDC WD1002F9YZ-0 S:0 H:0 T:0 > c1t50014EE2B5E5A6B8d0 ONLINE 0 0 0 1 TB = > WDC WD1003FBYZ-0 S:0 H:0 T:0 > mirror-2 ONLINE 0 0 0 > c1t50014EE05958C51Bd0 ONLINE 0 0 0 1 TB = > WDC WD1002F9YZ-0 S:0 H:0 T:0 > c1t50014EE0595617ACd0 ONLINE 0 0 0 1 TB = > WDC WD1002F9YZ-0 S:0 H:0 T:0 > mirror-3 ONLINE 0 0 0 > c1t50014EE0AEAE7540d0 ONLINE 0 0 0 1 TB = > WDC WD1002F9YZ-0 S:0 H:0 T:0 > c1t50014EE0AEAE9B65d0 ONLINE 0 0 0 1 TB = > WDC WD1002F9YZ-0 S:0 H:0 T:0 > logs > mirror-4 ONLINE 0 0 0 > c1t55CD2E404B88ABE1d0 ONLINE 0 0 0 120 = > GB INTEL SSDSC2BW12 S:0 H:0 T:0 > c1t55CD2E404B88E4CFd0 ONLINE 0 0 0 120 = > GB INTEL SSDSC2BW12 S:0 H:0 T:0 > cache > c1t55CD2E4000339A59d0 ONLINE 0 0 0 180 = > GB INTEL SSDSC2BW18 S:0 H:0 T:0 > > errors: No known data errors > > pool: epool > state: ONLINE > scan: scrub repaired 0 in 6h26m with 0 errors on Fri Aug 14 07:17:03 = > 2015 > config: > > NAME STATE READ WRITE CKSUM CAP = > Product /napp-it IOstat mess > epool ONLINE 0 0 0 > raidz1-0 ONLINE 0 0 0 > c1t50014EE1578AC0B5d0 ONLINE 0 0 0 500.1 = > GB WDC WD5002ABYS-0 S:0 H:0 T:0 > c1t50014EE1578B1091d0 ONLINE 0 0 0 500.1 = > GB WDC WD5002ABYS-0 S:0 H:106 T:12 > c1t50014EE1ACD9A82Bd0 ONLINE 0 0 0 500.1 = > GB WDC WD5002ABYS-0 S:0 H:1 T:0 > c1t50014EE1ACD9AC4Ed0 ONLINE 0 0 0 500.1 = > GB WDC WD5002ABYS-0 S:0 H:1 T:0 > > errors: No known data errors > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From lists at marzocchi.net  Sat Aug 15 18:57:18 2015
From: lists at marzocchi.net (Olaf Marzocchi)
Date: Sat, 15 Aug 2015 20:57:18 +0200
Subject: [OmniOS-discuss] Swap and dump
Message-ID: <55CF8B8E.9010503@marzocchi.net>

Hello,
over two years ago I installed my OmniOS server, and after the install I found a 16 GB dump dataset and a 4 GB swap dataset. I have 32 GB RAM. My root disk is 30 GB, you can feel my pain :)

I don't remember ever changing them, but now I was investigating the matter after the automatic updates (which I set up) filled the root disk for the second time in two years, and I found:

 Dump content: kernel pages
  Dump device: none (dumps disabled)
Savecore directory: /var/crash/OmniOS-Xeon
  Savecore enabled: yes
   Save compressed: on

Dumps are disabled. Do I need that dataset anymore?
I don't understand (from the man page) how savecores can be working, when they are taken at the reboot after a crash and when they rely on a dump that is not being taken.

I reduced the dump dataset to 8 GB, which should be enough for the kernel pages, but if I really don't need one to get the savecores (not that I ever had crashes in the last 18 months anyway) I will remove it altogether.

Could you clarify for me how this works, and at what point the dumps got disabled automatically?

Thanks
Olaf

From danmcd at omniti.com  Sat Aug 15 19:05:57 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Sat, 15 Aug 2015 15:05:57 -0400
Subject: [OmniOS-discuss] Swap and dump
In-Reply-To: <55CF8B8E.9010503@marzocchi.net>
References: <55CF8B8E.9010503@marzocchi.net>
Message-ID: <0D2C991A-7A9D-4F8D-A5AA-E5BEC8E4A986@omniti.com>

> On Aug 15, 2015, at 2:57 PM, Olaf Marzocchi  wrote:
>
> Hello,
> over two years ago I installed my OmniOS server, and after the install I found a 16 GB dump dataset and a 4 GB swap dataset. I have 32 GB RAM. My root disk is 30 GB, you can feel my pain :)
>
> I don't remember ever changing them, but now I was investigating the matter after the automatic updates (which I set up) filled the root disk for the second time in two years, and I found:
>
> Dump content: kernel pages
> Dump device: none (dumps disabled)
> Savecore directory: /var/crash/OmniOS-Xeon
> Savecore enabled: yes
> Save compressed: on
>
> Dumps are disabled. Do I need that dataset anymore?

If you don't want your kernel panics to dump core, you may certainly destroy your 16GB dump dataset.

If it disappeared, perhaps you renamed your rpool, or for some obscure reason the system couldn't find the dump device originally configured. If the system cannot find your dump device, even once, your dumps are disabled.

> I don't understand (from the man page) how savecores can be working, when they are taken at the reboot after a crash and when they rely on a dump that is not being taken.

Some people use swap for their dump device, but I don't see that enabled in your output either.

> I reduced the dump dataset to 8 GB, which should be enough for the kernel pages, but if I really don't need one to get the savecores (not that I ever had crashes in the last 18 months anyway) I will remove it altogether.
>
> Could you clarify for me how this works, and at what point the dumps got disabled automatically?

I'm guessing you modified something, even temporarily, such that the system couldn't find the configured dump device and it therefore disappeared.

You may remove the dump dataset if you wish. You understand the risk (kernel panic means you won't get a kernel dump).
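(And if you keep the dataset, re-pointing the system at it later is a one-liner, assuming the usual dataset name:

  dumpadm -d /dev/zvol/dsk/rpool/dump

Plain "dumpadm" with no arguments prints the current configuration -- that's where the output you pasted above comes from.)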
Dan From lists at marzocchi.net Sat Aug 15 19:23:34 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Sat, 15 Aug 2015 21:23:34 +0200 Subject: [OmniOS-discuss] Swap and dump In-Reply-To: <0D2C991A-7A9D-4F8D-A5AA-E5BEC8E4A986@omniti.com> References: <55CF8B8E.9010503@marzocchi.net> <0D2C991A-7A9D-4F8D-A5AA-E5BEC8E4A986@omniti.com> Message-ID: <55CF91B6.5040201@marzocchi.net> Thanks, clear enough. Savecores work as long as there is a place where they can find the dump. Understandable. I will set the dump device to the original dataset. Olaf On 15/08/2015 21:05, Dan McDonald wrote: > >> On Aug 15, 2015, at 2:57 PM, Olaf Marzocchi wrote: >> >> Hello, >> over two years ago I installed my OmniOS server and after the install I found a 16 GB dump dataset and a 4 GB swap dataset. I have 32 GB RAM. My root disk is 30 GB, you can feel my pain :) >> >> I don't remember ever changing them, but now I was investigating the matter after for the second time in two years the automatic updates (that I set) filled the root disk, and I found: >> >> Dump content: kernel pages >> Dump device: none (dumps disabled) >> Savecore directory: /var/crash/OmniOS-Xeon >> Savecore enabled: yes >> Save compressed: on >> >> Dumps are disabled. Do I need that dataset anymore? > > If you don't want your kernel panics to dump core, you may certainly destroy your 16GB dump dataset. > > If it disappeared, perhaps you renamed your rpool, or for some obscure reason the system could find the dump device originally configured. If the system cannot find your dump device, even once, your dumps are disabled. > >> I don't understand (from the man page) how savecores can be working, when they are taken at the reboot after a crash and when they rely on a dump, that is not being taken. > > Some people use swap for their dump device, but I don't see that enabled in your output either. > >> I reduced the dump dataset to 8 GB that should be enough for the kernel pages, but if I really don't need one to get the savecores (not that I ever had crashes in the last 18 months anyway) I will remove it altogether. >> >> Could you clarify me how this works and at what time the dumps got disabled automatically? > > I'm guessing you modified something, even temporarily, such that the system couldn't find the configured dump device and it therefore disappeared. > > You may remove the dump dataset if you wish. You understand the risk (kernel panic means you won't get a kernel dump). > > Dan > From stephan.budach at JVM.DE Sat Aug 15 19:58:17 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Sat, 15 Aug 2015 21:58:17 +0200 Subject: [OmniOS-discuss] ZFS/COMSTAR - zpool reports errors In-Reply-To: <55CB7EA8.50806@jvm.de> References: <55CB5269.5070302@jvm.de> <55CB5D49.7050705@jvm.de> <20150812171938.1f414444@sleipner.datanom.net> <55CB7EA8.50806@jvm.de> Message-ID: <55CF99D9.6050905@jvm.de> Today I have experienced the same issue on another OmniOS box, which is also part of that RAC storage. I had a similar setup running, where I also had these two RAC nodes connected to one OmniOS R006 box, which didn't exhibit this error. The only differences being these: a) OmniOS R006 insread of R014 b) the RAC volume was one of external redundancy, where as the current RAC volume is a mirrored one, hence both OmniOS R014 boxes are involved In both cases had OmniOS been accessed by two RAC nodes simultaneously, that is, both RAC nodes where connecting to the same targets, but I never experienced such errors. 
From stephan.budach at JVM.DE Sun Aug 16 17:09:42 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Sun, 16 Aug 2015 19:09:42 +0200 Subject: [OmniOS-discuss] ZFS/COMSTAR - zpool reports errors In-Reply-To: <55CF99D9.6050905@jvm.de> References: <55CB5269.5070302@jvm.de> <55CB5D49.7050705@jvm.de> <20150812171938.1f414444@sleipner.datanom.net> <55CB7EA8.50806@jvm.de> <55CF99D9.6050905@jvm.de> Message-ID: <55D0C3D6.5010604@jvm.de> So, to remedy these errors, I had to do the following: zpool clear zpool scrub Afterwards the errors on both of my volumes were gone. Performing just a zpool scrub didn't help, but it didn't find any error either. Only after a zpool clear/zpool scrub did the error finally vanish. Seems at least strange to me. From stephan.budach at JVM.DE Sun Aug 16 17:11:47 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Sun, 16 Aug 2015 19:11:47 +0200 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: References: <20150814182127.13a8a2a3@sleipner.datanom.net> Message-ID: <55D0C453.60703@jvm.de> Am 15.08.15 um 04:23 schrieb wuffers: > My scrub actually cleared the error, so I don't think it's similar. > > So my question remains.. is this block storage compromised or now > marked safe to use? > > On Fri, Aug 14, 2015 at 12:21 PM, Michael Rasmussen > wrote: > > On Fri, 14 Aug 2015 12:08:05 -0400 > wuffers > wrote: > > > > > Should I trust this? I suppose that now that I've moved all the > data on it > > there can be no corruption at ZFS level (since I didn't find any > hardware > > issues in iostat or fmdump logs). Or would the consensus be to > delete this, > > recreate it and present it to VMware again? > Are there any resemblance to this thread with subject: > [OmniOS-discuss] > ZFS/COMSTAR - zpool reports errors > > -- > Hilsen/Regards > Michael Rasmussen > So, did your first scrub reveal any error at all? Mine didn't and I suspect, that you issued a zpool clear prior to scrubbing, which made the errors go away on both of my two zpools? I'd say, that you had excatly the same error as me. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jg at osn.de Mon Aug 17 12:04:05 2015 From: jg at osn.de (Joerg Goltermann) Date: Mon, 17 Aug 2015 14:04:05 +0200 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: <55D0C453.60703@jvm.de> References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> Message-ID: <55D1CDB5.1040309@osn.de> Hi, we have the same problems. First time it occurs about 6 month ago, I wrote several mails on the zfs list but I was not able to solve the problem. The last mail was http://permalink.gmane.org/gmane.os.illumos.zfs/4883 I tried to debug the issue, but my zfs knowledge is not deep enough. Hopefully we can solve this nasty thing now .... In my case I am quite sure this is not a real corruption, it's a retry with "strange" flags which caused my "errors". Maybe this IO is very slow, which can cause problems on the hosts, but i have never seen any real problems.... Kind regards, Joerg Goltermann On 16.08.2015 19:11, Stephan Budach wrote: > Am 15.08.15 um 04:23 schrieb wuffers: >> My scrub actually cleared the error, so I don't think it's similar. >> >> So my question remains.. is this block storage compromised or now >> marked safe to use? >> >> On Fri, Aug 14, 2015 at 12:21 PM, Michael Rasmussen > > wrote: >> >> On Fri, 14 Aug 2015 12:08:05 -0400 >> wuffers > wrote: >> >> > >> > Should I trust this? 
I suppose that now that I've moved all the >> data on it >> > there can be no corruption at ZFS level (since I didn't find any >> hardware >> > issues in iostat or fmdump logs). Or would the consensus be to >> delete this, >> > recreate it and present it to VMware again? >> Are there any resemblance to this thread with subject: >> [OmniOS-discuss] >> ZFS/COMSTAR - zpool reports errors >> >> -- >> Hilsen/Regards >> Michael Rasmussen >> > So, did your first scrub reveal any error at all? Mine didn't and I > suspect, that you issued a zpool clear prior to scrubbing, which made > the errors go away on both of my two zpools? > > I'd say, that you had excatly the same error as me. > > Cheers, > Stephan > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- OSN Online Service Nuernberg GmbH, Bucher Str. 78, 90408 Nuernberg Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann From moo at wuffers.net Mon Aug 17 13:48:18 2015 From: moo at wuffers.net (wuffers) Date: Mon, 17 Aug 2015 09:48:18 -0400 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: <55D1CDB5.1040309@osn.de> References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> Message-ID: On Mon, Aug 17, 2015 at 8:04 AM, Joerg Goltermann wrote: > Hi, > > we have the same problems. First time it occurs about 6 month > ago, I wrote several mails on the zfs list but I was not able > to solve the problem. > > The last mail was http://permalink.gmane.org/gmane.os.illumos.zfs/4883 > I tried to debug the issue, but my zfs knowledge is not deep enough. > > Hopefully we can solve this nasty thing now .... > > > In my case I am quite sure this is not a real corruption, it's a retry > with "strange" flags which caused my "errors". Maybe this IO is very > slow, which can cause problems on the hosts, but i have never seen > any real problems.... > > One of the VMs on that datastore was Exchange, and it definitely had issues. I had to evacuate and move several mailboxes to another database, and repair some of them (users were reporting strange issues like not being able to move emails to existing folders). I don't think it's a coincidence that a VM that was on that block device suddenly had weird issues (and the Exchange VM was consuming the largest amount of space in that datastore). > > On 16.08.2015 19:11, Stephan Budach wrote: > So, did your first scrub reveal any error at all? Mine didn't and I >> suspect, that you issued a zpool clear prior to scrubbing, which made >> the errors go away on both of my two zpools? >> >> I'd say, that you had excatly the same error as me. >> > I am 100% certain I did not issue a zpool clear. I ran the scrub only once (as it takes ~8 days for it to go through in my case). pool: tank state: ONLINE scan: scrub repaired 0 in 184h28m with 0 errors on Wed Aug 5 06:38:32 2015 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mmurphy at omniti.com Mon Aug 17 21:39:29 2015 From: mmurphy at omniti.com (Marissa Murphy) Date: Mon, 17 Aug 2015 17:39:29 -0400 Subject: [OmniOS-discuss] omnios-build omniti-ms branch to be replaced Message-ID: Hi everyone, While we don?t officially support the omniti-ms repo, we are aware that a number of people use it for reference purposes, so we wanted to give people a heads up that we will be getting rid of the omniti-ms branch of omniti-labs/omnios-build ( https://github.com/omniti-labs/omnios-build/tree/omniti-ms) at 6pm tomorrow, Tuesday 8/18. The branch is moving to its own repo located at https://github.com/omniti-labs/omniti-ms. Please fix any repositories that you have checked out to point to the new repository. Thanks! Marissa Murphy OmniTI SRE -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimklimov at cos.ru Tue Aug 18 07:01:44 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Tue, 18 Aug 2015 09:01:44 +0200 Subject: [OmniOS-discuss] omnios-build omniti-ms branch to be replaced In-Reply-To: References: Message-ID: 17 ??????? 2015??. 23:39:29 CEST, Marissa Murphy ?????: >Hi everyone, > >While we don?t officially support the omniti-ms repo, we are aware that >a >number of people use it for reference purposes, so we wanted to give >people >a heads up that we will be getting rid of the omniti-ms branch of >omniti-labs/omnios-build ( >https://github.com/omniti-labs/omnios-build/tree/omniti-ms) at 6pm >tomorrow, Tuesday 8/18. The branch is moving to its own repo located at >https://github.com/omniti-labs/omniti-ms. Please fix any repositories >that >you have checked out to point to the new repository. > >Thanks! >Marissa Murphy >OmniTI SRE > > >------------------------------------------------------------------------ > >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss Note that git supports nested repos - perhaps you can leave a 'symlink' in the nameplace? We did it in a project at work, worked well to migrate a split-off subproject (requires recursive pulls though). HTH, Jim -- Typos courtesy of K-9 Mail on my Samsung Android From danmcd at omniti.com Tue Aug 18 19:48:20 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 18 Aug 2015 15:48:20 -0400 Subject: [OmniOS-discuss] r151014 update - needs machine boot Message-ID: <0B8CC7C0-1F8F-4B5E-A277-B864B287EE4A@omniti.com> Release media is not yet updated, but will be, for reasons shown below. I will announce here when release media is updated. The new uname -v for r151014 will be omnios-d08e0e5. Changes for this update are: - Support for more HP Gen9 HBAs with cpqary3 (release media will be updated). - Illumos bugs 6093 & 6096 (for SMB servers, fixes a free-NULL problem). - Illumos bug 4051 - helps disk-pull reliability, upstreamed to illumos-gate from Nexenta. - Python 2.6 now has readline back in its libraries (broke in '012, now fixed). Because of the first three, this requires a reboot after upgrade. Also, make sure your number of BEs isn't over 40 (grub limitation). Most of you won't have this problem, but with all the updates, BEs do accumulate, so make sure you "beadm destroy" the older ones you know you won't need. 
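A quick way to see where you stand and to prune (the BE name below is just an example):

  beadm list
  beadm destroy -F omnios-backup-1   # repeat for each BE you no longer need

-F skips the confirmation prompt, so look twice before you paste.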
Thanks, Dan From mmurphy at omniti.com Tue Aug 18 20:45:01 2015 From: mmurphy at omniti.com (Marissa Murphy) Date: Tue, 18 Aug 2015 16:45:01 -0400 Subject: [OmniOS-discuss] OmniOS r151014 EC2 AMI Message-ID: Hi everyone, I would like to announce that an OmniOS r151014 AMI has now been made public in AWS: AMI ID: ami-df2293b4 AMI Name: OmniOS r151014 stable/LTS Thanks! Marissa Murphy -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Tue Aug 18 21:32:44 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 18 Aug 2015 17:32:44 -0400 Subject: [OmniOS-discuss] r151014 update - needs machine boot In-Reply-To: References: <0B8CC7C0-1F8F-4B5E-A277-B864B287EE4A@omniti.com> Message-ID: <76236D38-D2E2-4FF3-B5F9-5CDE021421BA@omniti.com> > On Aug 18, 2015, at 5:20 PM, Andy Fiddaman wrote: > > Is this r151014t then, or is that numbering/naming scheme no longer in > use? I've been using date and/or kernel bits. Sometimes an update is so small (like an openssl patch) it doesn't make sense to call it a whole release. Dan From jg at osn.de Wed Aug 19 12:59:21 2015 From: jg at osn.de (Joerg Goltermann) Date: Wed, 19 Aug 2015 14:59:21 +0200 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> Message-ID: <55D47DA9.5030907@osn.de> Hi, the PSOD you got can cause the problems on your exchange database. Can you check the ESXi logs for the root cause of the PSOD? I never got a PSOD on such a "corruption". I still think this is a "cosmetic" bug, but this should be verified by one of the ZFS developers ... - Joerg On 17.08.2015 15:48, wuffers wrote: > > On Mon, Aug 17, 2015 at 8:04 AM, Joerg Goltermann > wrote: > > Hi, > > we have the same problems. First time it occurs about 6 month > ago, I wrote several mails on the zfs list but I was not able > to solve the problem. > > The last mail was http://permalink.gmane.org/gmane.os.illumos.zfs/4883 > I tried to debug the issue, but my zfs knowledge is not deep enough. > > Hopefully we can solve this nasty thing now .... > > > In my case I am quite sure this is not a real corruption, it's a retry > with "strange" flags which caused my "errors". Maybe this IO is very > slow, which can cause problems on the hosts, but i have never seen > any real problems.... > > > One of the VMs on that datastore was Exchange, and it definitely had > issues. I had to evacuate and move several mailboxes to another > database, and repair some of them (users were reporting strange issues > like not being able to move emails to existing folders). > > I don't think it's a coincidence that a VM that was on that block device > suddenly had weird issues (and the Exchange VM was consuming the largest > amount of space in that datastore). > > > On 16.08.2015 19:11, Stephan Budach wrote: > > So, did your first scrub reveal any error at all? Mine didn't and I > suspect, that you issued a zpool clear prior to scrubbing, which > made > the errors go away on both of my two zpools? > > I'd say, that you had excatly the same error as me. > > > I am 100% certain I did not issue a zpool clear. I ran the scrub only > once (as it takes ~8 days for it to go through in my case). > pool: tank > state: ONLINE > scan: scrub repaired 0 in 184h28m with 0 errors on Wed Aug 5 > 06:38:32 2015 > -- OSN Online Service Nuernberg GmbH, Bucher Str. 
78, 90408 Nuernberg
Tel: +49 911 39905-0 - Fax: +49 911 39905-55 - http://www.osn.de
HRB 15022 Nuernberg, USt-Id: DE189301263, GF: Joerg Goltermann

From stephan.budach at JVM.DE Wed Aug 19 16:49:05 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Wed, 19 Aug 2015 18:49:05 +0200
Subject: [OmniOS-discuss] ZFS data corruption
In-Reply-To: <55D47DA9.5030907@osn.de>
References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> <55D47DA9.5030907@osn.de>
Message-ID: <55D4B381.30504@jvm.de>

Hi Joerg,

On 19.08.15 at 14:59, Joerg Goltermann wrote:
> Hi,
>
> the PSOD you got can cause the problems on your exchange database.
>
> Can you check the ESXi logs for the root cause of the PSOD?
>
> I never got a PSOD on such a "corruption". I still think this is
> a "cosmetic" bug, but this should be verified by one of the ZFS
> developers ...
>
> - Joerg
>
> On 17.08.2015 15:48, wuffers wrote:
>>
>> On Mon, Aug 17, 2015 at 8:04 AM, Joerg Goltermann wrote:
>>
>> Hi,
>>
>> we have the same problems. The first time it occurred was about six
>> months ago; I wrote several mails on the zfs list but I was not able
>> to solve the problem.
>>
>> The last mail was
>> http://permalink.gmane.org/gmane.os.illumos.zfs/4883
>> I tried to debug the issue, but my zfs knowledge is not deep enough.
>>
>> Hopefully we can solve this nasty thing now ....
>>
>> In my case I am quite sure this is not a real corruption, it's a
>> retry with "strange" flags which caused my "errors". Maybe this IO
>> is very slow, which can cause problems on the hosts, but I have
>> never seen any real problems....
>>
>>
>> One of the VMs on that datastore was Exchange, and it definitely had
>> issues. I had to evacuate and move several mailboxes to another
>> database, and repair some of them (users were reporting strange issues
>> like not being able to move emails to existing folders).
>>
>> I don't think it's a coincidence that a VM that was on that block device
>> suddenly had weird issues (and the Exchange VM was consuming the largest
>> amount of space in that datastore).
>>
>>
>> On 16.08.2015 19:11, Stephan Budach wrote:
>>
>> So, did your first scrub reveal any error at all? Mine didn't
>> and I suspect, that you issued a zpool clear prior to scrubbing,
>> which made the errors go away on both of my two zpools?
>>
>> I'd say, that you had exactly the same error as me.
>>
>>
>> I am 100% certain I did not issue a zpool clear. I ran the scrub only
>> once (as it takes ~8 days for it to go through in my case).
>> pool: tank
>> state: ONLINE
>> scan: scrub repaired 0 in 184h28m with 0 errors on Wed Aug 5
>> 06:38:32 2015
>>
I don't think that this is entirely true, though. Just today, I got another of these "bogus" ZFS errors. Just as the other two, this one got removed by a zpool clear/zpool scrub. However, this error occurred on the zvol which hosts one of the RAC CSS votings, so nothing was upstreamed to my consumer DGs/VGs. The first two occurrences were noticed by ASM, but as I am running mirrored disk groups, this error didn't punch through to my consumers. I guess that if I didn't have those mirrored DGs, the read error (since this had been reported on the RAC nodes) might very well have affected my VMs running off that NFS store.

Looking at what we've got here, I don't think that we're actually dealing with real disk errors, as those should have been reported as read errors, or maybe as checksum errors otherwise.
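(For reference, those per-device counters are the READ/WRITE/CKSUM columns that zpool status prints, roughly along these lines - pool and disk names here are only illustrative:

  NAME        STATE     READ WRITE CKSUM
  tank        ONLINE       0     0     0
    c1t0d0    ONLINE       0     0     0

A genuine media problem should bump READ or CKSUM above zero on some device.)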
This must be something else, as it only seems to affect zvols and iSCSI targets. Maybe I will create a LUN using a file and hook that up to the same RAC cluster; if I don't get any of these errors with that, it has to be something in accessing the zvol. Maybe it's all COMSTAR's fault entirely?

Cheers,
Stephan

From alka at hfg-gmuend.de Wed Aug 19 16:49:29 2015
From: alka at hfg-gmuend.de (Guenther Alka)
Date: Wed, 19 Aug 2015 18:49:29 +0200
Subject: [OmniOS-discuss] Problem with Midnight Commander on OmniOS 151014
In-Reply-To: 
References: 
Message-ID: <55D4B399.3020503@hfg-gmuend.de>

I install Midnight Commander on OmniOS via

  pkg set-publisher -g http://pkg.cs.umd.edu cs.umd.edu
  pkg install file/mc

On OmniOS 151012 this worked fine; on OmniOS 151014:

- I must call it with the full path /opt/csd/bin/i386/mc (earlier a plain mc was enough)
- when starting via console on ESXi and entering a folder, it freezes; a clear command cancels this

I get a console message: select (FD_SETSIZE, &read_set...): Bad file number (9)

When I start mc via PuTTY, all is OK. Any hints?

From mir at miras.org Wed Aug 19 17:37:23 2015
From: mir at miras.org (Michael Rasmussen)
Date: Wed, 19 Aug 2015 19:37:23 +0200
Subject: [OmniOS-discuss] ZFS data corruption
In-Reply-To: <55D4B381.30504@jvm.de>
References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> <55D47DA9.5030907@osn.de> <55D4B381.30504@jvm.de>
Message-ID: <20150819193723.64a93e7d@sleipner.datanom.net>

On Wed, 19 Aug 2015 18:49:05 +0200 Stephan Budach wrote:
>
> Looking at what we've got here, I don't think that we're actually dealing with real disk errors, as those should have been reported as read errors, or maybe as checksum errors otherwise. This must be something else, as it only seems to affect zvols and iSCSI targets. Maybe I will create a LUN using a file and hook that up to the same RAC cluster; if I don't get any of these errors with that, it has to be something in accessing the zvol. Maybe it's all COMSTAR's fault entirely?
>
I have a storage box entirely distributing zvols as LUN's through Comstar for VM's. I have never experienced such problems, so maybe it has to do with the usage and some of the layers in between. Are there any similarities between the used hypervisor and/or OS in VM's?

--
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael rasmussen cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir datanom net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir miras org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
He hated being thought of as one of those people that wore stupid ornamental armour. It was gilt by association.
-- Terry Pratchett, "Night Watch"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Wed Aug 19 17:44:24 2015 From: mir at miras.org (Michael Rasmussen) Date: Wed, 19 Aug 2015 19:44:24 +0200 Subject: [OmniOS-discuss] Problem with Midnight Commander on OmniOS 151014 In-Reply-To: <55D4B399.3020503@hfg-gmuend.de> References: <55D4B399.3020503@hfg-gmuend.de> Message-ID: <20150819194424.578200db@sleipner.datanom.net> On Wed, 19 Aug 2015 18:49:29 +0200 Guenther Alka wrote: > I install Midnight Commender on OmniOS via > > pkg set-publisher -g http://pkg.cs.umd.edu cs.umd.edu > pkg install file/mc > > On OmniOS 151012 this worked fine, on OmniOS 151014: > > - I must call it with full path /opt/csd/bin/i386/mc (earlier a mc was enough) > - when starting via console on ESXi and entering a folder, it freezes, a clear command cancels this > I get a console message: select (FD_SETSIZE, &read_set...): Bad file number (9) > This one is working here: mc --version GNU Midnight Commander 4.8.13 Built with GLib 2.34.1 Using the S-Lang library with terminfo database With builtin Editor With subshell support as default With support for background operations With mouse support on xterm With internationalization support With multiple codepages support Virtual File Systems: cpiofs, tarfs, sfs, extfs, ftpfs, fish Data types: char: 8; int: 32; long: 64; void *: 64; size_t: 64; off_t: 64; $ which mc /usr/bin/mc If I remember correctly it was installed from pkg.cs.umd.edu under 151012 but I disabled the repo as part of upgrading to 151014 and have forgotten to activate it again;-) Empty response from: $ sudo pkgchk -l -p /usr/bin/mc -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: HOW YOU CAN TELL THAT IT'S GOING TO BE A ROTTEN DAY: #15 Your pet rock snaps at you. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From stephan.budach at JVM.DE Wed Aug 19 19:21:41 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 19 Aug 2015 21:21:41 +0200 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: <20150819193723.64a93e7d@sleipner.datanom.net> References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> <55D47DA9.5030907@osn.de> <55D4B381.30504@jvm.de> <20150819193723.64a93e7d@sleipner.datanom.net> Message-ID: <55D4D745.2080403@jvm.de> Am 19.08.15 um 19:37 schrieb Michael Rasmussen: > On Wed, 19 Aug 2015 18:49:05 +0200 > Stephan Budach wrote: > >> Looking at what we've got here, I don't think, that we're actually dealing with real disk errors, as those should have been reported as read errors, or mayby as checksum errors otherwise. This must be something else, as it only seems to affect zvols and iSCSI targets. Maybe I willl create a LUN using a file and hook that up to the same RAC cluster, if I don't get any of these errors with that, it has to be something in accessing the zvol. Maybe it's all COMSTAR's fault entirely? >> > I have a storage box entirely distributing zvols as LUN's through > Comstar for VM's. 
> I have never experienced such problems, so maybe it has to do with the
> usage and some of the layers in between. Are there any similarities
> between the used hypervisor and/or OS in VM's?
>
I do have some other OmniOS boxes that also provide LUNs via zvols through COMSTAR as iSCSI targets, and those did not show these errors. Both of them are 012s, and both of them provide iSCSI targets for the very same RAC cluster. The only difference being that the LUNs from those 012s are not mirrored on the RAC.

Cheers,
Stephan

From mtalbott at lji.org Thu Aug 20 22:10:51 2015
From: mtalbott at lji.org (Michael Talbott)
Date: Thu, 20 Aug 2015 15:10:51 -0700
Subject: [OmniOS-discuss] Multiple root mappings of a NFS share
Message-ID: <725FBCB4-7F46-4992-B7E7-1A9DDC05805E@lji.org>

Is it possible to share one directory with different root mappings to different clients? I know on my previous Linux box it was easy to do, but I haven't found the equivalent on OmniOS.

If there were a way to have the system append to a share via the share command, I could get all the functionality I want by using separate commands like so:

#read/write. prevent root access from clients
share -F nfs -p -o root_mapping=nobody,rw=@10.0.100.0/24,root=@10.0.100.0/24 /path2share

#read-only and map all to root
share -F nfs -p -o root_mapping=root,ro=@10.0.3.180:@10.0.3.45,root=@10.0.3.180:@10.0.3.45 /path2share

But.. The second share command wipes out the first one :( So I try to combine them like so:

#combined that gives me proper rw/ro and root mapped to nobody.. but, I want the ro client root users to map to root instead of nobody
share -F nfs -p -o \
root_mapping=nobody\
,root=@10.0.100.0/24:@10.0.3.180:@10.0.3.45\
,rw=@10.0.100.0/24\
,ro=@10.0.3.180:@10.0.3.45\
/path2share

It seems I can only have one root_mapping option, and I get an error if I stick another one in there hoping it'll parse in order of appearance. Is there somehow a way to have multiple root_mapping options based on the host or network for any single NFS export? Or does anyone have an idea of a workaround for this sort of situation?

________________________
Michael Talbott
Systems Administrator
La Jolla Institute
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mtalbott at lji.org Fri Aug 21 00:25:40 2015
From: mtalbott at lji.org (Michael Talbott)
Date: Thu, 20 Aug 2015 17:25:40 -0700
Subject: [OmniOS-discuss] Multiple root mappings of a NFS share
In-Reply-To: <725FBCB4-7F46-4992-B7E7-1A9DDC05805E@lji.org>
References: <725FBCB4-7F46-4992-B7E7-1A9DDC05805E@lji.org>
Message-ID: <906C1627-DB3B-4331-B3AD-89CC25A065E9@lji.org>

Found the answer to my own question. I had to avoid the root_mapping option altogether. Instead, I used the uidmap parameter in combination with ro/rw lists to make everything work the way I need. It's ugly, but it seems to fit my needs.

share -F nfs -p -o \
uidmap=\
0:0:@10.0.3.17\
~0:0:@10.0.3.45\
~0:0:@10.0.3.180\
~0:nobody:@10.0.100.0/24\
,rw=@10.0.100.0/24:@10.0.3.45:@10.0.3.17\
,ro=@10.0.3.180:@10.0.3.45\
/path2share

________________________
Michael Talbott
Systems Administrator
La Jolla Institute

> On Aug 20, 2015, at 3:10 PM, Michael Talbott wrote:
>
> Is it possible to share one directory with different root mappings to different clients? I know on my previous Linux box it was easy to do, but I haven't found the equivalent on OmniOS.
>
> If there were a way to have the system append to a share via the share command, I could get all the functionality I want by using separate commands like so:
>
> #read/write. prevent root access from clients
>
> share -F nfs -p -o root_mapping=nobody,rw=@10.0.100.0/24,root=@10.0.100.0/24 /path2share
>
> #read-only and map all to root
>
> share -F nfs -p -o root_mapping=root,ro=@10.0.3.180:@10.0.3.45,root=@10.0.3.180:@10.0.3.45 /path2share
>
> But.. The second share command wipes out the first one :( So I try to combine them like so:
>
> #combined that gives me proper rw/ro and root mapped to nobody.. but, I want the ro client root users to map to root instead of nobody
>
> share -F nfs -p -o \
> root_mapping=nobody\
> ,root=@10.0.100.0/24:@10.0.3.180:@10.0.3.45\
> ,rw=@10.0.100.0/24\
> ,ro=@10.0.3.180:@10.0.3.45\
> /path2share
>
> It seems I can only have one root_mapping option, and I get an error if I stick another one in there hoping it'll parse in order of appearance. Is there somehow a way to have multiple root_mapping options based on the host or network for any single NFS export? Or does anyone have an idea of a workaround for this sort of situation?
>
> ________________________
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From moo at wuffers.net Fri Aug 21 06:06:43 2015
From: moo at wuffers.net (wuffers)
Date: Fri, 21 Aug 2015 02:06:43 -0400
Subject: [OmniOS-discuss] ZFS data corruption
In-Reply-To: <55D4B381.30504@jvm.de>
References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> <55D47DA9.5030907@osn.de> <55D4B381.30504@jvm.de>
Message-ID: 

Oh, the PSOD is not caused by the corruption in ZFS - I suspect it was the other way around (VMware host PSOD -> ZFS corruption). I've experienced the PSOD before; it may be related to IO issues which I outlined in another post here:
http://lists.omniti.com/pipermail/omnios-discuss/2015-June/005222.html

Nobody chimed in, but it's an ongoing issue. I need to dedicate more time to troubleshoot, but other projects are taking my attention right now (coupled with a personal house move, time is at a premium!).

Also, I've had many improper shutdowns of the hosts and VMs, and this was the first time I've seen a ZFS corruption.

I know I'm repeating myself, but my question is still:
- Can I safely use this block device again now that it reports no errors? Again, I've moved all data off of it.. and there are no other signs of hardware issues. Recreate it?

On Wed, Aug 19, 2015 at 12:49 PM, Stephan Budach wrote:

> Hi Joerg,
>
> On 19.08.15 at 14:59, Joerg Goltermann wrote:
>
>> Hi,
>>
>> the PSOD you got can cause the problems on your exchange database.
>>
>> Can you check the ESXi logs for the root cause of the PSOD?
>>
>> I never got a PSOD on such a "corruption". I still think this is
>> a "cosmetic" bug, but this should be verified by one of the ZFS
>> developers ...
>>
>> - Joerg
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From joeveliscos at gmail.com Fri Aug 21 18:38:28 2015
From: joeveliscos at gmail.com (Joe Veliscos)
Date: Fri, 21 Aug 2015 20:38:28 +0200
Subject: [OmniOS-discuss] zfs / disk problem
Message-ID: 

I have somewhat of a puzzle. I have a rather large zpool which has been running for quite a while now. This pool contains 8 vdevs in RAIDZ1. Each vdev contains 5 disks. There were also 5 spares in the pool.

Some time ago we had electricity problems, so we had to take the whole box down as a precaution. It was on a UPS, so we took it down orderly.
When I started it up again later on, I saw that during the startup 5 disks were replaced by the spares and that the pool was resilvering. The thing is that these disks all belonged to the same vdev. The pool resilvered but is still degraded. I have the strong impression that nothing is really wrong with these disks, but that at boot time these disks spun up too late or something like that (might be because of the electricity problems) and so were replaced, as zfs thought they were not there. Now I cannot access the data in the pool.

Is there a way to fool zfs into accepting this vdev with the original disks back into the pool again in its original state?

Situation now is that:
- 2 of the old disks in the vdev are registered as removed
- 3 of the old disks in the vdev are registered as degraded
- and 5 spares are registered as online

As far as I know, zfs registers some info on each disk itself which tells what pool/vdev it belongs to. If during resilvering this info is not removed by zfs, is there a chance to:

situation 1:
1 shutdown the system
2 remove the spares
3 leave the originals
4 startup the system and hope for the best

situation 2:
do something low level on these disks (e.g. in case they do contain info which keeps them out of the pool/vdev.)

I know this all is far-fetched, but I want to try anything to save the files on this pool if possible. Maybe someone has a good idea. I will maybe also post to the developers list.

Thank you,
Joe
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stephan.budach at JVM.DE Fri Aug 21 20:26:59 2015
From: stephan.budach at JVM.DE (Stephan Budach)
Date: Fri, 21 Aug 2015 22:26:59 +0200
Subject: [OmniOS-discuss] ZFS data corruption
In-Reply-To: 
References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> <55D47DA9.5030907@osn.de> <55D4B381.30504@jvm.de>
Message-ID: <55D78993.3090305@jvm.de>

Hi,

On 21.08.15 at 08:06, wuffers wrote:
> Oh, the PSOD is not caused by the corruption in ZFS - I suspect it was
> the other way around (VMware host PSOD -> ZFS corruption). I've
> experienced the PSOD before, it may be related to IO issues which I
> outlined in another post here:
> http://lists.omniti.com/pipermail/omnios-discuss/2015-June/005222.html
>
> Nobody chimed in, but it's an ongoing issue. I need to dedicate more
> time to troubleshoot but other projects are taking my attention right
> now (coupled with a personal house move, time is at a premium!).
>
> Also, I've had many improper shutdowns of the hosts and VMs, and this
> was the first time I've seen a ZFS corruption.
>
> I know I'm repeating myself, but my question is still:
> - Can I safely use this block device again now that it reports no
> errors? Again, I've moved all data off of it.. and there are no other
> signs of hardware issues. Recreate it?

I remember that post, but as I don't use Veeam, I didn't have anything to contribute. From my understanding though, your issue could only be caused by a write error, since why should you get a PSOD otherwise? The issues I had were caused by "read errors", at least that was what the Linux kernel reported. In both cases the LUN queue depth could be an issue, and I will check with my initiator whether I should lower it.

As far as your question goes, I'd say: as long as a scrub doesn't reveal any issue and the zpool's state is clean, I'd use it.
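Roughly (the pool name is illustrative):

  zpool scrub tank
  zpool status -v tank

Once the scrub completes, zpool status -v lists any datasets or files with permanent errors; an empty list and all-zero counters is what I would want to see before trusting the device again.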
> > On Wed, Aug 19, 2015 at 12:49 PM, Stephan Budach > > wrote: > > Hi Joerg, > > Am 19.08.15 um 14:59 schrieb Joerg Goltermann: > > Hi, > > the PSOD you got can cause the problems on your exchange database. > > Can you check the ESXi logs for the root cause of the PSOD? > > I never got a PSOD on such a "corruption". I still think this is > a "cosmetic" bug, but this should be verified by one of the ZFS > developers ... > > - Joerg > Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard at netbsd.org Fri Aug 21 20:37:56 2015 From: richard at netbsd.org (Richard PALO) Date: Fri, 21 Aug 2015 22:37:56 +0200 Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9 Message-ID: There seems to be a recent regression somewhere as when I ssh in from an OI machine to my bloody dev machine running recent vanilla bits, my session hangs relatively soon. I'm using bash as my login shell > richard at omnis:/home/richard$ bash --version > bash --version > GNU bash, version 4.3.33(1)-release (i386-pc-solaris2.11) > Copyright (C) 2013 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later > > This is free software; you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. > /usr/bin/hostname > > [[ "${LOGNAME}" == "root" ]] && printf "%s" "${PWD/${HOME}/~}# " || > printf "%s" "${PWD/${HOME}/~}\$ " always seems to hang in the same place: > richard at omnis:/home/richard$ mdb /usr/bin/bash core.100940 > Loading modules: [ libc.so.1 ld.so.1 ] >> $C > 08047338 libc.so.1`__read+7(0, 804735b, 1, 80473b8) > 08047368 rl_getc+0x3f(813bec0, 80473b8, 8047398, 810a2de) > 08047398 rl_read_key+0xe4(0, 0, 0, 0, 8047df0, 8047e70) > 080473b8 readline_internal_char+0x98(f00bf, 8a31, 157f1c03, 1, 1a1311, 170f12) > 080473d8 0x80f32df(1, 1, 8047418, 80b1d51, 1, 80b1f43) > 080473f8 0x80f3300(0, 0, 5c5, 0, 811c845, 8143c08) > 08047418 readline+0x56(8149188, 80b1f43, 5c5, 808a19f) > 08047448 0x8070751(10, 81185a0, 1e, 0, 8047df0, 0) > 08047468 0x8070854(8047df0, 8047e70, 80474a8, 80713bc, 8149008, 8117da4) > 08047478 0x80706af(8149008, 8117da4, 143b, 1, 813e808, 80474a4) > 080474a8 0x80713bc(1, 14, 80474e8, 8077bcb) > 080474d8 0x80724ab(0, 14, 80474f8, 8071c2f) > 080474f8 0x8071c57(8146c08, df, 123, 8115492, 8146c08, 123) > 08047d58 yyparse+0x1ef(0, 80b1f43, 8047d98, 0, 81166d1, 0) > 08047d78 parse_command+0x72(8142bc0, 0, 0, 0, 2, 80b1f43) > 08047d98 read_command+0xd4(811b29e, 0, 0, 1, 8047e34, 1) > 08047db8 reader_loop+0x15e(0, 1, 8056230, 8047e10, 1, 1) > 08047df8 main+0x897(fee60a07, feed86e8, 8047e28, 806af93, 1, 8047e34) > 08047e28 _start+0x83(1, 8047ee0, 0, 8047ee6, 8047ef3, 8047f03) Tried running sshd debug, but that doesn't really turn up anything... funny thing is, even from omnios to OI it hangs after a bit: >> $C > 08046068 libc.so.1`__pollsys+7(8046080, 2, 0, 0, 4, 40040) > 08046138 libc.so.1`pselect+0x1bf(9, 810d008, 810cfe8, fea77360, 0, 0) > 08046178 libc.so.1`select+0x8e(9, 810d008, 810cfe8, 0, 0, 0) > 08046268 client_loop+0x480(1, 7e, 0, 8) > 08047b28 main+0x19b7(fea00a07, fea786e8, 8047b54, 8063703, 2, 8047b60) > 08047b54 _start+0x83(2, 8047c58, 8047c5c, 0, 8047c69, 8047c90) I tried changing my login shell to ksh93, but I still get the hangs. Is it possible that something in the gate is causing this? 
My current HEAD is pointing to

> richard at omnis:/home/richard/src/illumos-gate$ git log
> commit 359db861fd14071f8a25831efe3bf3790980d071
> Author: Richard Lowe
> Date: Wed Aug 5 11:01:58 2015 -0400
>
> 6098 ld(1) should not require symbols which identify group sections be global
> Reviewed by: Igor Kozhukhov
> Reviewed by: Dan McDonald
> Reviewed by: Gordon Ross
> Approved by: Robert Mustacchi

I tried to see if I could get a hang during a program so tried 'git diff' and got:

> 101310: less -ins
> fee8eec7 read (3, 8047c1b, 1)
> 08067d79 iread (3, 8047c1b, 1, 19950) + 71
> 0806c23c getchr (3a, 8047cf0, 8047c48, 805d776, d, 8047cf0) + 1a
> 0805d80d getcc (fef47442, 808de40, 0, 17611, fef47442, 8047cf0) + 6b
> 0805dbab commands (1, feffb0a8, 8047ded, 0, 8047ce0, 80821b0) + af
> 08057533 main (fee90a07, fef086e8, 8047d00, 8056ec3, ffffffff, 8047d14) + 48f
> 08056ec3 _start (2, 8047de8, 8047ded, 0, 8047df2, 8047e02) + 83

Perhaps something is up with sockets??? Anybody else seen something like this recently? Any ideas?

--
Richard PALO

From danmcd at omniti.com Fri Aug 21 21:12:15 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 21 Aug 2015 17:12:15 -0400
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: 
References: 
Message-ID: 

> On Aug 21, 2015, at 4:37 PM, Richard PALO wrote:
>
> There seems to be a recent regression somewhere as when I ssh in from
> an OI machine to my bloody dev machine running recent vanilla bits, my session hangs relatively soon.
> I'm using bash as my login shell

I used tcsh as mine, and my bloody box hasn't been updated to the very latest OmniOS bits (which include that ld fix). Did you have this problem with the current bloody repo?

Dan

From richard at netbsd.org Sat Aug 22 06:27:40 2015
From: richard at netbsd.org (Richard PALO)
Date: Sat, 22 Aug 2015 08:27:40 +0200
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: 
References: 
Message-ID: <55D8165C.9070803@netbsd.org>

On 21/08/15 at 23:12, Dan McDonald wrote:
>
>> On Aug 21, 2015, at 4:37 PM, Richard PALO wrote:
>>
>> There seems to be a recent regression somewhere as when I ssh in from
>> an OI machine to my bloody dev machine running recent vanilla bits, my session hangs relatively soon.
>> I'm using bash as my login shell
>
> I used tcsh as mine, and my bloody box hasn't been updated to the very latest OmniOS bits (which include that ld fix). Did you have this problem with the current bloody repo?
>
> Dan
>
Well, unfortunately I don't keep around all my boot environments, but I was able to determine that a build from 20150725 works fine while the builds I have from 20150818 and later don't. Given I didn't necessarily update from upstream for each build, I'd say something between 15/07 and 18/08 busted something.

What made it easier for me to test was relatively simple:

1. boot the test BE
2. in a virtual terminal (keeping the console to check things out) ssh into OI and then back to omnios
3. cd src/illumos-gate and do a git diff with something needing a few pages of output
4. switch back to the console for a few moments and, for example with ptree, find the pid of the git-invoked less and pstack it
5. switch back to the vt, which should now be hung (if on a broken BE).

Perhaps someone has a means to reproduce this and identify with more precision when/why things went awry.
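(For step 4, assuming the hung pager is less, something along these lines is what produced the stack above:

  ptree | grep less
  pstack <pid>

with <pid> being whatever ptree reports for the less spawned by git.)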
--
Richard PALO

From doug at will.to Sat Aug 22 17:02:12 2015
From: doug at will.to (Doug Hughes)
Date: Sat, 22 Aug 2015 13:02:12 -0400
Subject: [OmniOS-discuss] ZFS data corruption
In-Reply-To: 
References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> <55D4B381.30504@jvm.de>
Message-ID: <55D8AB14.3010705@will.to>

I've been experiencing spontaneous checksum failure/corruption on read at the zvol level recently on a box running r12 as well. None of the disks show any errors. All of the errors show up at the zvol level until all the disks in the vol get marked as degraded and then a reboot clears it up. repeated scrubs find files to delete, but then after additional heavy read I/O activity, more checksum on read errors occur, and more files need to be removed. So far on r14 I haven't seen this, but I'm keeping an eye on it.

The write activity on this server is very low. I'm currently trying to evacuate it with zfs send | mbuffer to another host over 10g, so the read activity is very high and consistent over a long period of time since I have to move about 10TB.

On 8/21/2015 2:06 AM, wuffers wrote:
> Oh, the PSOD is not caused by the corruption in ZFS - I suspect it was
> the other way around (VMware host PSOD -> ZFS corruption). I've
> experienced the PSOD before, it may be related to IO issues which I
> outlined in another post here:
> http://lists.omniti.com/pipermail/omnios-discuss/2015-June/005222.html
>
> Nobody chimed in, but it's an ongoing issue. I need to dedicate more
> time to troubleshoot but other projects are taking my attention right
> now (coupled with a personal house move time is at a premium!).
>
> Also, I've had many improper shutdowns of the hosts and VMs, and this
> was the first time I've seen a ZFS corruption.
> > I know I'm repeating myself, but my question is still: > - Can I safely use this block device again now that it reports no > errors? Again, I've moved all data off of it.. and there are no other > signs of hardware issues. Recreate it? > > On Wed, Aug 19, 2015 at 12:49 PM, Stephan Budach > > wrote: > > Hi Joerg, > > Am 19.08.15 um 14:59 schrieb Joerg Goltermann: > > Hi, > > the PSOD you got can cause the problems on your exchange database. > > Can you check the ESXi logs for the root cause of the PSOD? > > I never got a PSOD on such a "corruption". I still think this is > a "cosmetic" bug, but this should be verified by one of the ZFS > developers ... > > - Joerg > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Sat Aug 22 21:49:09 2015 From: doug at will.to (Doug Hughes) Date: Sat, 22 Aug 2015 17:49:09 -0400 Subject: [OmniOS-discuss] trouble with ashift and 4k blocks Message-ID: <55D8EE55.7090708@will.to> I'm following this page: http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks but I just can't get my HGST 4K disks to get the 4TB capacity in the zpool. I've cross referenced and verified multiple times and this should work for sd_config_list= "ATA HGST HDS724040AL", "physical-block-size:4096", Here's some sample output to confirm: # iostat -Er | grep -i vendor | sort | uniq | grep HGST | head -3 Vendor: ATA ,Product: HGST HDS724040AL ,Revision: A580 ,Serial No: PK1331P1GMPSSY Vendor: ATA ,Product: HGST HDS724040AL ,Revision: A580 ,Serial No: PK1331PAGKW5RV Vendor: ATA ,Product: HGST HDS724040AL ,Revision: A580 ,Serial No: PK1331PAGP029V The number of spaces is definitely correct. # echo ::sd_state | mdb -k | egrep '(^un|_blocksize)' un 0: ffffff134e4a7300 un_sys_blocksize = 0x200 un_tgt_blocksize = 0x200 un_phy_blocksize = 0x200 un_f_tgt_blocksize_is_valid = 0x1 un 1: ffffff137abaf340 un_sys_blocksize = 0x200 un_tgt_blocksize = 0x200 un_phy_blocksize = 0x200 un_f_tgt_blocksize_is_valid = 0x1 un 2: ffffff137b2990c0 un_sys_blocksize = 0x200 un_tgt_blocksize = 0x200 un_phy_blocksize = 0x200 un_f_tgt_blocksize_is_valid = 0x1 un 3: ffffff138d713900 un_sys_blocksize = 0x200 un_tgt_blocksize = 0x200 un_phy_blocksize = 0x200 un_f_tgt_blocksize_is_valid = 0x1 should be 0x1000, right? sample format output: 3. c0t3d0 /pci at 0,0/pci8086,340c at 5/pci1000,3150 at 0/sd at 3,0 4. c0t4d0 /pci at 0,0/pci8086,340c at 5/pci1000,3150 at 0/sd at 4,0 5. c0t5d0 format> inq Vendor: ATA Product: HGST HDS724040AL Revision: A580 update_drv -vf sd doesn't seem to help, neither does reboot. zpool create doesn't have the argument to specify ashift, either. Any advice? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mtalbott at lji.org Sun Aug 23 00:38:55 2015 From: mtalbott at lji.org (Michael Talbott) Date: Sat, 22 Aug 2015 17:38:55 -0700 Subject: [OmniOS-discuss] trouble with ashift and 4k blocks In-Reply-To: <55D8EE55.7090708@will.to> References: <55D8EE55.7090708@will.to> Message-ID: <0FF1777F-E057-4F67-B46D-9F48E07A70E7@lji.org> The output from format you provided shows that the kernel is only seeing 2TB, so I would think it's not an issue of the zfs at all, rather, it's the hardware not communicating the full capacity to the server. 
You're likely running into a limitation of the SAS/SATA controller it's connected to, or the drive may have a jumper set to limit the capacity to 2TB. There might be a firmware update for your SATA/SAS controller to overcome that limit. But definitely check the drive to see if there's a jumper set on it first.

I would test whether it's a hardware or OS issue by temporarily booting into a Linux distro and checking what it has to say about the drive's capacity. If Linux says the same thing, it's almost certainly a hardware/firmware limitation.

> On Aug 22, 2015, at 2:49 PM, Doug Hughes wrote:
>
> 3. c0t3d0
> /pci at 0,0/pci8086,340c at 5/pci1000,3150 at 0/sd at 3,0
> 4. c0t4d0
> /pci at 0,0/pci8086,340c at 5/pci1000,3150 at 0/sd at 4,0
> 5. c0t5d0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From groups at tierarzt-mueller.de Sun Aug 23 10:20:37 2015
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Sun, 23 Aug 2015 12:20:37 +0200
Subject: [OmniOS-discuss] trouble with ashift and 4k blocks
In-Reply-To: <55D8EE55.7090708@will.to>
References: <55D8EE55.7090708@will.to>
Message-ID: <1353232233.20150823122037@tierarzt-mueller.de>

Hello Doug Hughes and List,

I use the same HGST drives and they all show the full 4 TB capacity in my pools.

,-----[ ]-----
|
| AVAILABLE DISK SELECTIONS:
| 0. c2t0d0
| /pci at 0,0/pci15ad,1976 at 10/sd at 0,0
| 1. c2t1d0
| /pci at 0,0/pci15ad,1976 at 10/sd at 1,0
| 2. c4t5000CCA23DCCC6BCd0
| /scsi_vhci/disk at g5000cca23dccc6bc
| 3. c4t5000CCA23DCD21A4d0
| /scsi_vhci/disk at g5000cca23dcd21a4
| 4. c4t5000CCA23DCD25A1d0
| /scsi_vhci/disk at g5000cca23dcd25a1
| Specify disk (enter its number):
|
`-------------------

When you write only one item in sd.conf, you must end it with a ; rather than a ,. But you do not need to list this HDD in sd.conf at all; it already presents ashift=12.

,-----[ ]-----
|
| root at aio:/root# zdb -C
| pool_aio:
| version: 5000
| name: 'pool_aio'
| state: 0
| txg: 5222458
| pool_guid: 11088269185580178933
| hostid: 720590413
| hostname: 'aio'
| vdev_children: 1
| vdev_tree:
| type: 'root'
| id: 0
| guid: 11088269185580178933
| children[0]:
| type: 'mirror'
| id: 0
| guid: 1388041250297859353
| metaslab_array: 33
| metaslab_shift: 35
| ashift: 12
| asize: 4000773570560
| is_log: 0
| create_txg: 4
| children[0]:
| type: 'disk'
| id: 0
| guid: 18178429901005250887
| path: '/dev/dsk/c4t5000CCA23DCCC6BCd0s0'
| devid: 'id1,sd at n5000cca23dccc6bc/a'
| phys_path: '/scsi_vhci/disk at g5000cca23dccc6bc:a'
| whole_disk: 1
| DTL: 45
| create_txg: 4
| children[1]:
| type: 'disk'
| id: 1
| guid: 12635800974590752762
| path: '/dev/dsk/c4t5000CCA23DCD21A4d0s0'
| devid: 'id1,sd at n5000cca23dcd21a4/a'
| phys_path: '/scsi_vhci/disk at g5000cca23dcd21a4:a'
| whole_disk: 1
| DTL: 43
| create_txg: 4
| children[2]:
| type: 'disk'
| id: 2
| guid: 15588560262687738746
| path: '/dev/dsk/c4t5000CCA23DCD25A1d0s0'
| devid: 'id1,sd at n5000cca23dcd25a1/a'
| phys_path: '/scsi_vhci/disk at g5000cca23dcd25a1:a'
| whole_disk: 1
| DTL: 41
| create_txg: 4
| features_for_read:
| com.delphix:hole_birth
| com.delphix:embedded_data
|
`-------------------

,-----[ ]-----
|
| root at aio:/root# zdb | egrep 'ashift| name'
| name: 'pool_aio'
| ashift: 12
| name: 'rpool'
| ashift: 9
|
`-------------------

I think it's a hardware issue. Test what Michael Talbott wrote.
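If you do keep an override in /kernel/drv/sd.conf anyway, a single-entry list should look roughly like this (note the trailing semicolon and the vendor field padded with spaces to 8 characters):

  sd-config-list =
      "ATA     HGST HDS724040AL", "physical-block-size:4096";

followed by update_drv -vf sd (or a reboot) so sd re-reads the file.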
On August, 22 2015, 23:49 wrote in [1]: > I'm following this page: > http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks > but I just can't get my HGST 4K disks to get the 4TB capacity in the zpool. > I've cross referenced and verified multiple times and this should work > for sd_config_list= > "ATA HGST HDS724040AL", "physical-block-size:4096", -- Best Regards Alexander August, 23 2015 ........ [1] mid:55D8EE55.7090708 at will.to ........ From doug at will.to Sun Aug 23 21:36:59 2015 From: doug at will.to (Doug Hughes) Date: Sun, 23 Aug 2015 17:36:59 -0400 Subject: [OmniOS-discuss] trouble with ashift and 4k blocks In-Reply-To: <1353232233.20150823122037@tierarzt-mueller.de> References: <55D8EE55.7090708@will.to> <1353232233.20150823122037@tierarzt-mueller.de> Message-ID: <55DA3CFB.1020201@will.to> I do suspect that there is a problem with the card and the 2TB/4TB. Still, shouldn't my sd.conf entries result in zdb showing ashift=12 and mdb showing block size 0x1000? (the comma issue, in answer to other person, is ok. This is not the last entry in sd.conf) On 8/23/2015 6:20 AM, Alexander Lesle wrote: > Hello Doug Hughes and List, > > I use the same HGST drive and they have all 4 TB capacity in my pools. > > ,-----[ ]----- > | > | AVAILABLE DISK SELECTIONS: > | 0. c2t0d0 > | /pci at 0,0/pci15ad,1976 at 10/sd at 0,0 > | 1. c2t1d0 > | /pci at 0,0/pci15ad,1976 at 10/sd at 1,0 > | 2. c4t5000CCA23DCCC6BCd0 > | /scsi_vhci/disk at g5000cca23dccc6bc > | 3. c4t5000CCA23DCD21A4d0 > | /scsi_vhci/disk at g5000cca23dcd21a4 > | 4. c4t5000CCA23DCD25A1d0 > | /scsi_vhci/disk at g5000cca23dcd25a1 > | Specify disk (enter its number): > | > `------------------- > > When you wrote only one item in sd.conf you must write a ; and not , > at last letter. > But this HDD you must not list in sd.conf it will present ashift=12. > > ,-----[ ]----- > | > | root at aio:/root# zdb -C > | pool_aio: > | version: 5000 > | name: 'pool_aio' > | state: 0 > | txg: 5222458 > | pool_guid: 11088269185580178933 > | hostid: 720590413 > | hostname: 'aio' > | vdev_children: 1 > | vdev_tree: > | type: 'root' > | id: 0 > | guid: 11088269185580178933 > | children[0]: > | type: 'mirror' > | id: 0 > | guid: 1388041250297859353 > | metaslab_array: 33 > | metaslab_shift: 35 > | ashift: 12 > | asize: 4000773570560 > | is_log: 0 > | create_txg: 4 > | children[0]: > | type: 'disk' > | id: 0 > | guid: 18178429901005250887 > | path: '/dev/dsk/c4t5000CCA23DCCC6BCd0s0' > | devid: 'id1,sd at n5000cca23dccc6bc/a' > | phys_path: '/scsi_vhci/disk at g5000cca23dccc6bc:a' > | whole_disk: 1 > | DTL: 45 > | create_txg: 4 > | children[1]: > | type: 'disk' > | id: 1 > | guid: 12635800974590752762 > | path: '/dev/dsk/c4t5000CCA23DCD21A4d0s0' > | devid: 'id1,sd at n5000cca23dcd21a4/a' > | phys_path: '/scsi_vhci/disk at g5000cca23dcd21a4:a' > | whole_disk: 1 > | DTL: 43 > | create_txg: 4 > | children[2]: > | type: 'disk' > | id: 2 > | guid: 15588560262687738746 > | path: '/dev/dsk/c4t5000CCA23DCD25A1d0s0' > | devid: 'id1,sd at n5000cca23dcd25a1/a' > | phys_path: '/scsi_vhci/disk at g5000cca23dcd25a1:a' > | whole_disk: 1 > | DTL: 41 > | create_txg: 4 > | features_for_read: > | com.delphix:hole_birth > | com.delphix:embedded_data > | > `------------------- > ,-----[ ]----- > | > | root at aio:/root# zdb | egrep 'ashift| name' > | name: 'pool_aio' > | ashift: 12 > | name: 'rpool' > | ashift: 9 > | > `------------------- > > I think its a hardware issue. > Test this what Michael Talbott wrote. 
> > On August, 22 2015, 23:49 wrote in [1]: > >> I'm following this page: >> http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks >> but I just can't get my HGST 4K disks to get the 4TB capacity in the zpool. >> I've cross referenced and verified multiple times and this should work >> for sd_config_list= >> "ATA HGST HDS724040AL", "physical-block-size:4096", From danmcd at omniti.com Mon Aug 24 01:19:50 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 23 Aug 2015 21:19:50 -0400 Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9 In-Reply-To: <55D81839.50301@NetBSD.org> References: <55D81839.50301@NetBSD.org> Message-ID: <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com> I just tried this reproduction on my OmniOS box: 1.) ssh to OI 151a9. 2.) ssh from 151a9 to bloody 3.) cat $illumos-gate/usr/src/uts/common/inet/ip/ip.c (a large file) No breakage. 4.) "git log -p ip.c" -- it uses less -M by default, so that stopped waiting for input. 5.) "git log -p ip.c | cat" -- it spewed output. 6.) exec bash (I use tcsh) 7.) repeat #5 8.) Login to bloody with a user account with SHELL=bash. 9.) repeat #5 I'm not seeing it. Do you maybe have PathMTU issues, or are these same-subnet machines? Dan From stephan.budach at JVM.DE Mon Aug 24 09:54:29 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Mon, 24 Aug 2015 11:54:29 +0200 Subject: [OmniOS-discuss] ZFS data corruption In-Reply-To: <55D8AB14.3010705@will.to> References: <20150814182127.13a8a2a3@sleipner.datanom.net> <55D0C453.60703@jvm.de> <55D1CDB5.1040309@osn.de> <55D47DA9.5030907@osn.de> <55D4B381.30504@jvm.de> <55D8AB14.3010705@will.to> Message-ID: <55DAE9D5.2020908@jvm.de> Am 22.08.15 um 19:02 schrieb Doug Hughes: > I've been experiencing spontaneous checksum failure/corruption on read > at the zvol level recently on a box running r12 as well. None of the > disks show any errors. All of the errors show up at the zvol level > until all the disks in the vol get marked as degraded and then a > reboot clears it up. repeated scrubs find files to delete, but then > after additional heavy read I/O activity, more checksum on read errors > occur, and more files need to be removed. So far on r14 I haven't seen > this, but I'm keeping an eye on it. > > The write activity on this server is very low. I'm currently trying to > evacuate it with zfs send | mbuffer to another host over 10g, so the > read activity is very high and consistent over a long period of time > since I have to move about 10TB. > This morning, I received another of these zvol errors, which was also reported up to my RAC cluster. I haven't fully checked that yet, but I think the ASM/ADVM simply issued a re-read and was happy with the result. Otherwise ASM would have issued a read against the mirror side and probably have taken the "faulty" failure group offline, which it didn't. However, I was wondering how to get some more information from the STMF framework and found a post, how to read from the STMF trace buffer? 
root at nfsvmpool07:/root# echo '*stmf_trace_buf/s' | mdb -k | more
0xffffff090f828000: :0002579: Imported the LU 600144f090860e6b0000550c3a290001
:0002580: Imported the LU 600144f090860e6b0000550c3e240002
:0002581: Imported the LU 600144f090860e6b0000550c3e270003
:0002603: Imported the LU 600144f090860e6b000055925a120001
:0002604: Imported the LU 600144f090860e6b000055a50ebf0002
:0002604: Imported the LU 600144f090860e6b000055a8f7d70003
:0002605: Imported the LU 600144f090860e6b000055a8f7e30004
:150815416: UIO_READ failed, ret = 5, resid = 131072
:224314824: UIO_READ failed, ret = 5, resid = 131072

So, this basically shows two read errors, which is consistent with the incidents I had on this system. Unfortunately, this doesn't buy me much more, since I don't know how to track it further down, but it seems that COMSTAR had issues reading from the zvol. Is it possible to debug this further?

> On 8/21/2015 2:06 AM, wuffers wrote:
>> Oh, the PSOD is not caused by the corruption in ZFS - I suspect it
>> was the other way around (VMware host PSOD -> ZFS corruption). I've
>> experienced the PSOD before, it may be related to IO issues which I
>> outlined in another post here:
>> http://lists.omniti.com/pipermail/omnios-discuss/2015-June/005222.html
>>
>> Nobody chimed in, but it's an ongoing issue. I need to dedicate more
>> time to troubleshoot but other projects are taking my attention right
>> now (coupled with a personal house move, time is at a premium!).
>>
>> Also, I've had many improper shutdowns of the hosts and VMs, and this
>> was the first time I've seen a ZFS corruption.
>>
>> I know I'm repeating myself, but my question is still:
>> - Can I safely use this block device again now that it reports no
>> errors? Again, I've moved all data off of it.. and there are no other
>> signs of hardware issues. Recreate it?
>>
>> On Wed, Aug 19, 2015 at 12:49 PM, Stephan Budach
>> wrote:
>>
>> Hi Joerg,
>>
>> On 19.08.15 at 14:59, Joerg Goltermann wrote:
>>
>> Hi,
>>
>> the PSOD you got can cause the problems on your exchange
>> database.
>>
>> Can you check the ESXi logs for the root cause of the PSOD?
>>
>> I never got a PSOD on such a "corruption". I still think this is
>> a "cosmetic" bug, but this should be verified by one of the ZFS
>> developers ...
>>
>> - Joerg
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From eric.sproul at circonus.com Mon Aug 24 14:26:40 2015
From: eric.sproul at circonus.com (Eric Sproul)
Date: Mon, 24 Aug 2015 10:26:40 -0400
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com>
References: <55D81839.50301@NetBSD.org> <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com>
Message-ID: 

On Sun, Aug 23, 2015 at 9:19 PM, Dan McDonald wrote:
> I'm not seeing it. Do you maybe have PathMTU issues, or are these same-subnet machines?

This sounds a lot like a split-path routing issue, so knowing whether these machines are on the same subnet would be key.

I've had this happen in the past with multi-homed machines, where the initiating TCP client's packets traverse a router/firewall with stateful filtering and "hair-pin" back onto the local subnet (perhaps due to DNAT or other fancy tricks). Since the destination knows it is directly connected to the network of the source IP, it responds directly back to the client, bypassing the router. Thus the stateful firewall sees only one half of the connection, and misses any negotiated changes, e.g.
TCP window-scaling that might occur. As soon as the client sends a scaled-window packet, the firewall drops it as invalid, and the client experiences a hang in connectivity. The solution of course is, Don't Do That(tm). Multi-homing should be employed along with split-horizon DNS or firewall routing rules that NAT the source to ensure responses come back through the router.

But maybe all this is moot if you're not multi-homing or doing anything similarly "fancy" between your OI and OmniOS hosts.

Eric

From richard at netbsd.org Mon Aug 24 14:35:24 2015
From: richard at netbsd.org (Richard PALO)
Date: Mon, 24 Aug 2015 16:35:24 +0200
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: 
References: <55D81839.50301@NetBSD.org> <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com>
Message-ID: <55DB2BAC.20603@netbsd.org>

On 24/08/15 at 16:26, Eric Sproul wrote:
> On Sun, Aug 23, 2015 at 9:19 PM, Dan McDonald wrote:
>> I'm not seeing it. Do you maybe have PathMTU issues, or are these same-subnet machines?
>
> This sounds a lot like a split-path routing issue, so knowing whether
> these machines are on the same subnet would be key.
>
> I've had this happen in the past with multi-homed machines, where the
> initiating TCP client's packets traverse a router/firewall with
> stateful filtering and "hair-pin" back onto the local subnet (perhaps
> due to DNAT or other fancy tricks). Since the destination knows it is
> directly connected to the network of the source IP, it responds
> directly back to the client, bypassing the router. Thus the stateful
> firewall sees only one half of the connection, and misses any
> negotiated changes, e.g. TCP window-scaling that might occur. As soon
> as the client sends a scaled-window packet, the firewall drops it as
> invalid, and the client experiences a hang in connectivity. The
> solution of course is, Don't Do That(tm). Multi-homing should be
> employed along with split-horizon DNS or firewall routing rules that
> NAT the source to ensure responses come back through the router.
>
> But maybe all this is moot if you're not multi-homing or doing
> anything similarly "fancy" between your OI and OmniOS hosts.
>
> Eric
>
Hi Eric,

The machines do not belong to the same subnet. They are physically remote and the omnios machine is behind a router with port forwarding. The OI machine *is* multihomed, though.

What strikes me most is that previous versions of omnios (and the gate) worked fine; it is only now that I happened to come across this PITA-ful issue.

What could cause this difference in treatment, and is OI at fault or the recent gate?

From eric.sproul at circonus.com Mon Aug 24 16:05:20 2015
From: eric.sproul at circonus.com (Eric Sproul)
Date: Mon, 24 Aug 2015 12:05:20 -0400
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To: <55DB2BAC.20603@netbsd.org>
References: <55D81839.50301@NetBSD.org> <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com> <55DB2BAC.20603@netbsd.org>
Message-ID: 

On Mon, Aug 24, 2015 at 10:35 AM, Richard PALO wrote:
> The machines do not belong to the same subnet. They are physically remote
> and the omnios machine is behind a router with port forwarding.
> The OI machine *is* multihomed, though.
> What strikes me most is that previous versions of omnios (and the gate)
> worked fine; it is only now that I happened to come across this PITA-ful issue.
>
> What could cause this difference in treatment, and is OI at fault or the recent gate?
What you describe sounds network-related, perhaps just a coincidence that it happened "recently". However, it also sounds like the behavior changes depending on whether you use an older BE or a newer one, so that makes it seem *less* likely that it is an issue with the network. I might still try to packet capture both working and non-working ssh sessions and compare them. I would also double-check that your omnios BEs don't have something like ipfilter enabled or perhaps some kernel tunable that you changed but might have forgotten. Eric From richard at netbsd.org Mon Aug 24 16:04:20 2015 From: richard at netbsd.org (Richard PALO) Date: Mon, 24 Aug 2015 18:04:20 +0200 Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9 In-Reply-To: References: <55D81839.50301@NetBSD.org> <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com> <55DB2BAC.20603@netbsd.org> Message-ID: <55DB4084.6090005@netbsd.org> Le 24/08/15 18:05, Eric Sproul a ?crit : > What you describe sounds network-related, perhaps just a coincidence > that it happened "recently". However, it also sounds like the > behavior changes depending on whether you use an older BE or a newer > one, so that makes it seem *less* likely that it is an issue with the > network. I might still try to packet capture both working and > non-working ssh sessions and compare them. I would also double-check > that your omnios BEs don't have something like ipfilter enabled or > perhaps some kernel tunable that you changed but might have forgotten. > > Eric > > I do find the following from the OI machine interesting: > richard at smicro:~$ pfexec kstat -m ipf > module: ipf instance: 0 > name: inbound class: net > acct 0 > bad frag state alloc 0 > bad ip pkt 0 > bad pkt state alloc 0 > block 0 > block, logged 0 > cachehit 57425203 > crtime 154,516657078 > dropped:pps ceiling 0 > ip upd. fail 0 > ipv6 pkt 0 > logged 0 > new frag state compl. pkt 0 > new frag state kept 0 > new pkt kept state 0 > nomatch 92080544 > nomatch, logged 0 > pass 95757622 > pass, logged 3676918 > pullup nok 0 > pullup ok 254596 > return sent 0 > short 0 > skip 57 > snaptime 154,516657078 > src != route 0 > tcp cksum bad 0 > ttl invalid 1099124 > > module: ipf instance: 0 > name: outbound class: net > acct 0 > bad frag state alloc 0 > bad ip pkt 0 > bad pkt state alloc 0 > block 14 > block, logged 0 > cachehit 0 > crtime 154,516663632 > dropped:pps ceiling 0 > ip upd. fail 0 > ipv6 pkt 0 > logged 0 > new frag state compl. pkt 0 > new frag state kept 0 > new pkt kept state 0 > nomatch 123524975 > nomatch, logged 0 > pass 123524967 > pass, logged 0 > pullup nok 0 > pullup ok 252835 > return sent 0 > short 0 > skip 0 > snaptime 154,516663632 > src != route 0 > tcp cksum bad 0 > ttl invalid 0 notice inbound invalids and nomatches both ways... are they a concern? -- Richard PALO From eric.sproul at circonus.com Mon Aug 24 17:14:31 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Mon, 24 Aug 2015 13:14:31 -0400 Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9 In-Reply-To: <55DB4084.6090005@netbsd.org> References: <55D81839.50301@NetBSD.org> <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com> <55DB2BAC.20603@netbsd.org> <55DB4084.6090005@netbsd.org> Message-ID: On Mon, Aug 24, 2015 at 12:04 PM, Richard PALO wrote: > notice inbound invalids and nomatches both ways... are they a concern? I have no idea. 
From mtalbott at lji.org  Thu Aug 27 17:56:40 2015
From: mtalbott at lji.org (Michael Talbott)
Date: Thu, 27 Aug 2015 10:56:40 -0700
Subject: [OmniOS-discuss] 8TB Seagates under load = PANIC?
Message-ID: <73AE450C-EAE1-4ED5-A48C-D7F988BEC154@lji.org>

Anyone out there using 8TB Seagate drives in their storage pools?
Reason I ask is that I recently created a new server with a brand new
Seagate One-Store shelf full of 8TB Seagate drives, and everything was
working pretty well as long as it wasn't under load.  But once the I/O
throughput started getting heavy (700MB/s+) and random reads/writes
started happening, I began to get all kinds of random transport errors
(from at least 50% of the drives) reported by iostat -en.  After a few
days of transferring data at that rate (while other random I/O was
happening as well), it panic'd with:

WARNING: /scsi_vhci/disk at g5000c500794b82b5 (sd87):
	SYNCHRONIZE CACHE command failed (5)

panic[cpu0]/thread=ffffff00f5217c40: assertion failed:
ldi_strategy(dvd->vd_lh,bp) == 0, file:
../../common/fs/zfs/vdev_disk.c, line: 819

I'm thinking that maybe, because these drives are known to have low
random IOP performance, something is timing out, causing an error in
transport that leads to some bad juju?

On a side note, there were no transport errors when I sent data to the
shelf initially (300 TB+).  Now that the data is there and is being
accessed with all kinds of different requests, this happened.  Go
figure.

Ideas?


________________________
Michael Talbott
Systems Administrator
La Jolla Institute

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
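Transport errors like the ones iostat -en counts also land in the FMA
error log, which records the underlying SCSI events with timestamps;
these are standard illumos commands, nothing here is specific to these
drives:

    iostat -En                  # per-device detail, incl. transport errors
    pfexec fmdump -eV | less    # raw FMA error telemetry

Correlating the fmdump timestamps with periods of heavy random I/O can
show whether the errors are command timeouts rather than genuine link
problems.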
From danmcd at omniti.com  Thu Aug 27 18:03:32 2015
From: danmcd at omniti.com (Dan McDonald)
Date: Thu, 27 Aug 2015 14:03:32 -0400
Subject: [OmniOS-discuss] 8TB Seagates under load = PANIC?
In-Reply-To: <73AE450C-EAE1-4ED5-A48C-D7F988BEC154@lji.org>
References: <73AE450C-EAE1-4ED5-A48C-D7F988BEC154@lji.org>
Message-ID:

8TB drives are SMR.  That's a whole new world of emulation and possible
failures.  I would not recommend anyone use SMR drives yet, ESPECIALLY
for a pool under load, apart from single-write-stream archiving.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On Aug 27, 2015, at 1:56 PM, Michael Talbott wrote:
>
> Anyone out there using 8TB Seagate drives in their storage pools?
> Reason I ask is that I recently created a new server with a brand new
> Seagate One-Store shelf full of 8TB Seagate drives, and everything
> was working pretty well as long as it wasn't under load.  But once
> the I/O throughput started getting heavy (700MB/s+) and random
> reads/writes started happening, I began to get all kinds of random
> transport errors (from at least 50% of the drives) reported by
> iostat -en.  After a few days of transferring data at that rate
> (while other random I/O was happening as well), it panic'd with:
>
> WARNING: /scsi_vhci/disk at g5000c500794b82b5 (sd87):
> 	SYNCHRONIZE CACHE command failed (5)
>
> panic[cpu0]/thread=ffffff00f5217c40: assertion failed:
> ldi_strategy(dvd->vd_lh,bp) == 0, file:
> ../../common/fs/zfs/vdev_disk.c, line: 819
>
> I'm thinking that maybe, because these drives are known to have low
> random IOP performance, something is timing out, causing an error in
> transport that leads to some bad juju?
>
> On a side note, there were no transport errors when I sent data to
> the shelf initially (300 TB+).  Now that the data is there and is
> being accessed with all kinds of different requests, this happened.
> Go figure.
>
> Ideas?
>
> ________________________
> Michael Talbott
> Systems Administrator
> La Jolla Institute
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From doug at will.to  Thu Aug 27 18:31:25 2015
From: doug at will.to (Doug Hughes)
Date: Thu, 27 Aug 2015 14:31:25 -0400
Subject: [OmniOS-discuss] trouble with ashift and 4k blocks
In-Reply-To: <55DA3CFB.1020201@will.to>
References: <55D8EE55.7090708@will.to>
 <1353232233.20150823122037@tierarzt-mueller.de>
 <55DA3CFB.1020201@will.to>
Message-ID:

Replaced the ancient 1068E with a SAS 2308 chipset and all is well.
Thanks!  (I rather dislike the ultra-long disk names that result,
compared to the good old c0t0d0 style, but that's life.)  Happily, I
didn't even have to rebuild the zpool, and the mdb output looks as
expected.

On Sun, Aug 23, 2015 at 5:36 PM, Doug Hughes wrote:
> I do suspect that there is a problem with the card and the 2TB/4TB.
> Still, shouldn't my sd.conf entries result in zdb showing ashift=12
> and mdb showing block size 0x1000?
>
> (The comma issue, in answer to the other person, is OK; this is not
> the last entry in sd.conf.)
>
>
> On 8/23/2015 6:20 AM, Alexander Lesle wrote:
>> Hello Doug Hughes and List,
>>
>> I use the same HGST drives, and they all show the full 4 TB capacity
>> in my pools.
>>
>> ,-----[ ]-----
>> |
>> | AVAILABLE DISK SELECTIONS:
>> |        0. c2t0d0
>> |           /pci at 0,0/pci15ad,1976 at 10/sd at 0,0
>> |        1. c2t1d0
>> |           /pci at 0,0/pci15ad,1976 at 10/sd at 1,0
>> |        2. c4t5000CCA23DCCC6BCd0
>> |           /scsi_vhci/disk at g5000cca23dccc6bc
>> |        3. c4t5000CCA23DCD21A4d0
>> |           /scsi_vhci/disk at g5000cca23dcd21a4
>> |        4. c4t5000CCA23DCD25A1d0
>> |           /scsi_vhci/disk at g5000cca23dcd25a1
>> | Specify disk (enter its number):
>> |
>> `-------------------
>>
>> When you write only one item in sd.conf, you must end the entry with
>> a ';' and not a ','.
>> But you should not need to list this HDD in sd.conf at all; it
>> already presents ashift=12.
>>
>> ,-----[ ]-----
>> |
>> | root at aio:/root# zdb -C
>> | pool_aio:
>> |     version: 5000
>> |     name: 'pool_aio'
>> |     state: 0
>> |     txg: 5222458
>> |     pool_guid: 11088269185580178933
>> |     hostid: 720590413
>> |     hostname: 'aio'
>> |     vdev_children: 1
>> |     vdev_tree:
>> |         type: 'root'
>> |         id: 0
>> |         guid: 11088269185580178933
>> |         children[0]:
>> |             type: 'mirror'
>> |             id: 0
>> |             guid: 1388041250297859353
>> |             metaslab_array: 33
>> |             metaslab_shift: 35
>> |             ashift: 12
>> |             asize: 4000773570560
>> |             is_log: 0
>> |             create_txg: 4
>> |             children[0]:
>> |                 type: 'disk'
>> |                 id: 0
>> |                 guid: 18178429901005250887
>> |                 path: '/dev/dsk/c4t5000CCA23DCCC6BCd0s0'
>> |                 devid: 'id1,sd at n5000cca23dccc6bc/a'
>> |                 phys_path: '/scsi_vhci/disk at g5000cca23dccc6bc:a'
>> |                 whole_disk: 1
>> |                 DTL: 45
>> |                 create_txg: 4
>> |             children[1]:
>> |                 type: 'disk'
>> |                 id: 1
>> |                 guid: 12635800974590752762
>> |                 path: '/dev/dsk/c4t5000CCA23DCD21A4d0s0'
>> |                 devid: 'id1,sd at n5000cca23dcd21a4/a'
>> |                 phys_path: '/scsi_vhci/disk at g5000cca23dcd21a4:a'
>> |                 whole_disk: 1
>> |                 DTL: 43
>> |                 create_txg: 4
>> |             children[2]:
>> |                 type: 'disk'
>> |                 id: 2
>> |                 guid: 15588560262687738746
>> |                 path: '/dev/dsk/c4t5000CCA23DCD25A1d0s0'
>> |                 devid: 'id1,sd at n5000cca23dcd25a1/a'
>> |                 phys_path: '/scsi_vhci/disk at g5000cca23dcd25a1:a'
>> |                 whole_disk: 1
>> |                 DTL: 41
>> |                 create_txg: 4
>> |     features_for_read:
>> |         com.delphix:hole_birth
>> |         com.delphix:embedded_data
>> |
>> `-------------------
>> ,-----[ ]-----
>> |
>> | root at aio:/root# zdb | egrep 'ashift| name'
>> |     name: 'pool_aio'
>> |         ashift: 12
>> |     name: 'rpool'
>> |         ashift: 9
>> |
>> `-------------------
>>
>> I think it's a hardware issue.
>> Test what Michael Talbott wrote.
>>
>> On August, 22 2015, 23:49 wrote in [1]:
>>
>>> I'm following this page:
>>> http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks
>>> but I just can't get my HGST 4K disks to get the 4TB capacity in
>>> the zpool.
>>> I've cross-referenced and verified multiple times, and this should
>>> work for sd_config_list=
>>> "ATA HGST HDS724040AL", "physical-block-size:4096",

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
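For reference, the general shape of such an sd.conf entry; a sketch
using the drive model from this thread, with the terminating ';' on the
final entry (commas separate earlier entries).  Note that the vendor
field of the match string is fixed-width (8 characters, space-padded),
which is a common reason entries silently fail to match:

    # /kernel/drv/sd.conf -- vendor "ATA" padded to 8 characters
    sd_config_list =
        "ATA     HGST HDS724040AL", "physical-block-size:4096";

    # re-read the driver configuration after editing
    pfexec update_drv -vf sd

The mdb recipe on the wiki page quoted above can then confirm the
physical block size the driver actually picked up.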
From richard at netbsd.org  Sun Aug 30 08:50:49 2015
From: richard at netbsd.org (Richard PALO)
Date: Sun, 30 Aug 2015 10:50:49 +0200
Subject: [OmniOS-discuss] strangeness ssh into omnios from oi_151a9
In-Reply-To:
References: <55D81839.50301@NetBSD.org>
 <62284A5B-83D7-4A0C-9F3E-CF7BBDA16BD5@omniti.com>
 <55DB2BAC.20603@netbsd.org>
 <55DB4084.6090005@netbsd.org>
Message-ID: <55E2C3E9.9000702@netbsd.org>

On 24/08/15 19:14, Eric Sproul wrote:
> On Mon, Aug 24, 2015 at 12:04 PM, Richard PALO wrote:
>> Notice the inbound invalids, and the nomatches both ways... are they
>> a concern?
>
> I have no idea.  I might try adding an unconditional pass rule for
> the OmniOS system to ensure it's not matching any other ipfilter
> rules, or, if possible, disable ipfilter during the testing.

I've been noticing some talk of issues with IPv6/IPv4 lately...  My
Freebox (on the omnios side) has IPv6 enabled, but it is not enabled on
the OI side, which is behind an OBS router.  I'll try turning that off
to see if things settle down; with some luck, some fixes are on the
way.  (I seem to remember Dan already fixed an issue in this area last
year that I had with the hottail fox.)

-- 
Richard PALO

From mail at steffenwagner.com  Sun Aug 30 14:17:19 2015
From: mail at steffenwagner.com (Steffen Wagner)
Date: Sun, 30 Aug 2015 16:17:19 +0200
Subject: [OmniOS-discuss] OmniOS / Nappit slow iscsi / ZFS performance
 with Proxmox
Message-ID: <001a01d0e32e$951a1660$bf4e4320$@steffenwagner.com>

Hi everyone!

I just set up a small network with 2 nodes:
* 1 proxmox host on Debian Wheezy hosting KVM VMs
* 1 napp-it host on OmniOS stable

The systems are currently connected through a 1 GBit link for general
WAN and LAN communication and a 20 GBit link (two 10 GBit links
aggregated) for the iSCSI communication.
Both connections' bandwidth was confirmed using iperf.

The napp-it system currently has one pool (tank) consisting of 2 mirror
vdevs.  The 4 disks are SAS3 disks connected to a SAS2 backplane and
directly attached (no expander) to the LSI SAS3008 (9300-8i) HBA.
Comstar is running on that machine with 1 target (vm-storage) in 1
target group (vm-storage-group).

Proxmox has this iSCSI target configured as a "ZFS over iSCSI" storage
using a block size of 8k and the "Write cache" option enabled.
This is where the problem starts:

dd if=/dev/zero of=/tank/test bs=1G count=20 conv=fdatasync

This dd test yields around 300 MB/s directly on the napp-it system.

dd if=/dev/zero of=/home/test bs=1G count=20 conv=fdatasync

This dd test yields around 100 MB/s on a VM with its disk on the
napp-it system, connected via iSCSI.

The problem here is not the absolute numbers, as these tests do not
provide accurate numbers; the problem is the difference between the two
values.  I expected at least something around 80% of the local
bandwidth, but this is usually around 30% or less.

What I noticed during the tests: when running the test locally on the
napp-it system, all disks will be fully utilized (read using iostat -x
1).  When running the test inside a VM, the disk utilization barely
reaches 30% (which seems to reflect the results of the bandwidth
displayed by dd).

These 30% are only reached if the logical unit of the VM disk has the
writeback cache enabled.  Disabling it results in 20-30 MB/s with the
dd test mentioned above.  Enabling it also increases the disk
utilization.

These values are also seen during disk migration.  Migrating one disk
results in slow speed and low disk utilization.  Migrating several
disks in parallel will eventually cause 100% disk utilization.

I also tested an NFS share as VM storage in proxmox.  Running the same
test inside a VM on the NFS share yields results around 200-220 MB/s.
This is better (and shows that the traffic is going over the fast link
between the servers), but still not great, as I still lose a third.

I am fairly new to the Solaris and ZFS world, so any help is greatly
appreciated.

Thanks in advance!

Steffen

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From vab at bb-c.de  Sun Aug 30 15:22:34 2015
From: vab at bb-c.de (Volker A. Brandt)
Date: Sun, 30 Aug 2015 17:22:34 +0200
Subject: [OmniOS-discuss] OmniOS / Nappit slow iscsi / ZFS performance
 with Proxmox
In-Reply-To: <001a01d0e32e$951a1660$bf4e4320$@steffenwagner.com>
References: <001a01d0e32e$951a1660$bf4e4320$@steffenwagner.com>
Message-ID: <21987.8122.276207.700528@glaurung.bb-c.de>

> The systems are currently connected through a 1 GBit link for
> general WAN and LAN communication and a 20 GBit link (two 10 GBit
> links aggregated) for the iSCSI communication.

This may or may not make a difference, but if you do link aggregation
and then use the link from only one client IP, then you will only get
one connection on one of the two aggregated links.  In the case of
iSCSI, I think it is better to configure the two links separately and
then use multipathing.


Regards -- Volker
-- 
------------------------------------------------------------------------
Volker A. Brandt               Consulting and Support for Oracle Solaris
Brandt & Brandt Computer GmbH                   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim, GERMANY            Email: vab at bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513              Schuhgröße: 46
Geschäftsführer: Rainer J.H. Brandt und Volker A. Brandt

"When logic and proportion have fallen sloppy dead"
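The reason a single initiator tends not to benefit from the aggregate
is that LACP hashes each flow onto one member port (by MAC, IP, or port
tuple, depending on policy), so one iSCSI session rides one 10 GBit
link.  A quick way to inspect the aggregate's policy and per-port state
on the OmniOS side; aggr0 is an assumed name, substitute the real one
from dladm show-link:

    dladm show-aggr aggr0        # shows the hashing POLICY in use
    dladm show-aggr -L aggr0     # LACP state per member port
    dladm show-aggr -x aggr0     # extended per-port speed/duplex

For the multipathing approach instead, each 10 GBit port would get its
own IP address on its own subnet, and the initiator would open one
iSCSI session per path.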
From mtalbott at lji.org  Sun Aug 30 22:45:49 2015
From: mtalbott at lji.org (Michael Talbott)
Date: Sun, 30 Aug 2015 15:45:49 -0700
Subject: [OmniOS-discuss] OmniOS / Nappit slow iscsi / ZFS performance
 with Proxmox
In-Reply-To: <001a01d0e32e$951a1660$bf4e4320$@steffenwagner.com>
References: <001a01d0e32e$951a1660$bf4e4320$@steffenwagner.com>
Message-ID: <58169405-2CCA-4C66-92DE-52B1192FFADA@lji.org>

This may be a given, but since you didn't mention it in your network
topology: make sure the 1g LAN link is on a different subnet than the
20g iSCSI link.  Otherwise iSCSI traffic might be flowing through the
1g link.  Also, jumbo frames can help with iSCSI.

Additionally, dd speed tests from /dev/zero to a ZFS disk are highly
misleading if you have any compression enabled on the ZFS dataset
(since only 512 bytes of disk are actually written for nearly any
amount of consecutive zeros).

Michael

Sent from my iPhone
> On Aug 30, 2015, at 7:17 AM, Steffen Wagner wrote:
>
> Hi everyone!
>
> I just set up a small network with 2 nodes:
> * 1 proxmox host on Debian Wheezy hosting KVM VMs
> * 1 napp-it host on OmniOS stable
>
> The systems are currently connected through a 1 GBit link for general
> WAN and LAN communication and a 20 GBit link (two 10 GBit links
> aggregated) for the iSCSI communication.
> Both connections' bandwidth was confirmed using iperf.
>
> The napp-it system currently has one pool (tank) consisting of 2
> mirror vdevs.  The 4 disks are SAS3 disks connected to a SAS2
> backplane and directly attached (no expander) to the LSI SAS3008
> (9300-8i) HBA.
> Comstar is running on that machine with 1 target (vm-storage) in 1
> target group (vm-storage-group).
>
> Proxmox has this iSCSI target configured as a "ZFS over iSCSI"
> storage using a block size of 8k and the "Write cache" option
> enabled.
> This is where the problem starts:
>
> dd if=/dev/zero of=/tank/test bs=1G count=20 conv=fdatasync
>
> This dd test yields around 300 MB/s directly on the napp-it system.
>
> dd if=/dev/zero of=/home/test bs=1G count=20 conv=fdatasync
>
> This dd test yields around 100 MB/s on a VM with its disk on the
> napp-it system, connected via iSCSI.
>
> The problem here is not the absolute numbers, as these tests do not
> provide accurate numbers; the problem is the difference between the
> two values.  I expected at least something around 80% of the local
> bandwidth, but this is usually around 30% or less.
>
> What I noticed during the tests: when running the test locally on
> the napp-it system, all disks will be fully utilized (read using
> iostat -x 1).  When running the test inside a VM, the disk
> utilization barely reaches 30% (which seems to reflect the results
> of the bandwidth displayed by dd).
>
> These 30% are only reached if the logical unit of the VM disk has
> the writeback cache enabled.  Disabling it results in 20-30 MB/s
> with the dd test mentioned above.  Enabling it also increases the
> disk utilization.
>
> These values are also seen during disk migration.  Migrating one
> disk results in slow speed and low disk utilization.  Migrating
> several disks in parallel will eventually cause 100% disk
> utilization.
>
> I also tested an NFS share as VM storage in proxmox.  Running the
> same test inside a VM on the NFS share yields results around 200-220
> MB/s.  This is better (and shows that the traffic is going over the
> fast link between the servers), but still not great, as I still lose
> a third.
>
> I am fairly new to the Solaris and ZFS world, so any help is greatly
> appreciated.
>
> Thanks in advance!
>
> Steffen
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
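Following up on Michael's compression point: a quick way to rule it out
is to check the dataset and repeat the test with incompressible data.
A sketch, with the pool name (tank) taken from Steffen's mail and
/var/tmp/rand.bin as an arbitrary scratch file:

    # is compression on?
    zfs get compression tank

    # pre-generate incompressible test data once (avoids a CPU
    # bottleneck from reading /dev/urandom during the timed run)
    dd if=/dev/urandom of=/var/tmp/rand.bin bs=1M count=2048

    # then time the actual write, locally and from inside the VM
    dd if=/var/tmp/rand.bin of=/tank/test bs=1M conv=fdatasync

If the local and over-iSCSI numbers converge with random data, the gap
in the /dev/zero test was an artifact of compression rather than a
transport problem.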