From Kevin.Swab at ColoState.EDU Thu Jan 1 00:30:15 2015 From: Kevin.Swab at ColoState.EDU (Kevin Swab) Date: Wed, 31 Dec 2014 17:30:15 -0700 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: References: <54A44D8C.5090302@ColoState.EDU> Message-ID: <54A49517.6070205@ColoState.EDU> Hello Richard and group, thanks for your reply! I'll look into sg_logs for one of these devices once I have a chance to track that progam down... Thanks for the tip on the 500 ms latency, I wasn't aware that could happen in normal cases. However, I don't believe what I'm seeing constitutes normal behavior. First, some anecdotal evidence: If I pull and replace the suspect drive, my downstream systems stop complaining, and the high service time numbers go away. I threw out 500 ms as a guess to the point at which I start seeing problems. However, I see service times far in excess of that, sometimes over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled a few days ago, during a time when downstream VMWare servers were complaining. (since the sar output is so verbose, I grepped out the info just for the suspect drive): # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' 14:50:00 device %busy avque r+w/s blks/s avwait avserv sd91,a 99 5.3 1 42 0.0 7811.7 sd91,a 100 11.3 1 53 0.0 11016.0 sd91,a 100 3.8 1 75 0.0 3615.8 sd91,a 100 4.9 1 25 0.0 8633.5 sd91,a 93 3.9 1 55 0.0 4385.3 sd91,a 86 3.5 2 75 0.0 2060.5 sd91,a 91 3.1 4 80 0.0 823.8 sd91,a 97 3.5 1 50 0.0 3984.5 sd91,a 100 4.4 1 56 0.0 6068.6 sd91,a 100 5.0 1 55 0.0 8836.0 sd91,a 100 5.7 1 51 0.0 7939.6 sd91,a 98 9.9 1 42 0.0 12526.8 sd91,a 100 7.4 0 10 0.0 36813.6 sd91,a 51 3.8 8 90 0.0 500.2 sd91,a 88 3.4 1 60 0.0 2338.8 sd91,a 100 4.5 1 28 0.0 6969.2 sd91,a 93 3.8 1 59 0.0 5138.9 sd91,a 79 3.1 1 59 0.0 3143.9 sd91,a 99 4.7 1 52 0.0 5598.4 sd91,a 100 4.8 1 62 0.0 6638.4 sd91,a 94 5.0 1 54 0.0 3752.7 For comparison, here's the sar output from another drive in the same pool for the same period of time: # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' 14:50:00 device %busy avque r+w/s blks/s avwait avserv sd82,a 0 0.0 2 28 0.0 5.6 sd82,a 1 0.0 3 51 0.0 5.4 sd82,a 1 0.0 4 66 0.0 6.3 sd82,a 1 0.0 3 48 0.0 4.3 sd82,a 1 0.0 3 45 0.0 6.1 sd82,a 1 0.0 6 82 0.0 2.7 sd82,a 1 0.0 8 112 0.0 2.8 sd82,a 0 0.0 3 27 0.0 1.8 sd82,a 1 0.0 5 80 0.0 3.1 sd82,a 0 0.0 3 35 0.0 3.1 sd82,a 1 0.0 3 35 0.0 3.8 sd82,a 1 0.0 4 49 0.0 3.2 sd82,a 0 0.0 0 0 0.0 4.1 sd82,a 3 0.0 9 84 0.0 4.1 sd82,a 1 0.0 6 55 0.0 3.7 sd82,a 0 0.0 1 23 0.0 7.0 sd82,a 0 0.0 6 57 0.0 1.8 sd82,a 1 0.0 5 70 0.0 2.3 sd82,a 1 0.0 4 55 0.0 3.7 sd82,a 1 0.0 5 72 0.0 4.1 sd82,a 1 0.0 4 54 0.0 3.6 The other drives in this pool all show data similar to that of sd82. Your point about tuning blindly is well taken, and I'm certainly no expert on the IO stack. What's a humble sysadmin to do? For further reference, this system is running r151010. 
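For reference, sg_logs ships with the sg3_utils package, and the check suggested above can be pointed straight at the suspect drive. A rough sketch, with a placeholder device path:

  sg_logs -a /dev/rdsk/cXtWWNdXs2

The read/write/verify error-counter pages and the non-medium error page in that output are where a drive that is quietly retrying or remapping tends to show up, even when nothing is reported to ZFS or /var/adm/messages.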
The drive in question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive was in is using lz4 compression and looks like this: # zpool status data1 pool: data1 state: ONLINE scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 14:40:20 2014 config: NAME STATE READ WRITE CKSUM data1 ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 c6t5000039468CB54F0d0 ONLINE 0 0 0 c6t5000039478CB5138d0 ONLINE 0 0 0 c6t5000039468D000DCd0 ONLINE 0 0 0 c6t5000039468D000E8d0 ONLINE 0 0 0 c6t5000039468D00F5Cd0 ONLINE 0 0 0 c6t5000039478C816CCd0 ONLINE 0 0 0 c6t5000039478C8546Cd0 ONLINE 0 0 0 raidz2-1 ONLINE 0 0 0 c6t5000039478C855F0d0 ONLINE 0 0 0 c6t5000039478C856E8d0 ONLINE 0 0 0 c6t5000039478C856ECd0 ONLINE 0 0 0 c6t5000039478C856F4d0 ONLINE 0 0 0 c6t5000039478C86374d0 ONLINE 0 0 0 c6t5000039478C8C2A8d0 ONLINE 0 0 0 c6t5000039478C8C364d0 ONLINE 0 0 0 raidz2-2 ONLINE 0 0 0 c6t5000039478C9958Cd0 ONLINE 0 0 0 c6t5000039478C995C4d0 ONLINE 0 0 0 c6t5000039478C9DACCd0 ONLINE 0 0 0 c6t5000039478C9DB30d0 ONLINE 0 0 0 c6t5000039478C9DB6Cd0 ONLINE 0 0 0 c6t5000039478CA73B4d0 ONLINE 0 0 0 c6t5000039478CB3A20d0 ONLINE 0 0 0 raidz2-3 ONLINE 0 0 0 c6t5000039478CB3A64d0 ONLINE 0 0 0 c6t5000039478CB3A70d0 ONLINE 0 0 0 c6t5000039478CB3E7Cd0 ONLINE 0 0 0 c6t5000039478CB3EB0d0 ONLINE 0 0 0 c6t5000039478CB3FBCd0 ONLINE 0 0 0 c6t5000039478CB4048d0 ONLINE 0 0 0 c6t5000039478CB4054d0 ONLINE 0 0 0 raidz2-4 ONLINE 0 0 0 c6t5000039478CB424Cd0 ONLINE 0 0 0 c6t5000039478CB4250d0 ONLINE 0 0 0 c6t5000039478CB470Cd0 ONLINE 0 0 0 c6t5000039478CB471Cd0 ONLINE 0 0 0 c6t5000039478CB4E50d0 ONLINE 0 0 0 c6t5000039478CB50A8d0 ONLINE 0 0 0 c6t5000039478CB50BCd0 ONLINE 0 0 0 spares c6t50000394A8CBC93Cd0 AVAIL errors: No known data errors Thanks for your help, Kevin On 12/31/2014 3:22 PM, Richard Elling wrote: > >> On Dec 31, 2014, at 11:25 AM, Kevin Swab wrote: >> >> Hello Everyone, >> >> We've been running OmniOS on a number of SuperMicro 36bay chassis, with >> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >> various SAS HDD's. These systems are serving block storage via Comstar >> and Qlogic FC HBA's, and have been running well for several years. >> >> The problem we've got is that as the drives age, some of them start to >> perform slowly (intermittently) without failing - no zpool or iostat >> errors, and nothing logged in /var/adm/messages. The slow performance >> can be seen as high average service times in iostat or sar. > > Look at the drive's error logs using sg_logs (-a for all) > >> >> When these service times get above 500ms, they start to cause IO >> timeouts on the downstream storage consumers, which is bad... > > 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or SATA NCQ > >> >> I'm wondering - is there a way to tune OmniOS' behavior so that it >> doesn't try so hard to complete IOs to these slow disks, and instead >> just gives up and fails them? > > Yes, the tuning in Alasdair's blog should work as he describes. More below... > >> >> I found an old post from 2011 which states that some tunables exist, >> but are ignored by the mpt_sas driver: >> >> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >> >> Does anyone know the current status of these tunables, or have any other >> suggestions that might help? > > These tunables are on the order of seconds. The default, 60, is obviously too big > unless you have old, slow, SCSI CD-ROMs. But setting it below the manufacturer's > internal limit (default or tuned) can lead to an unstable system. 
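The tunable usually meant here is the sd driver's per-command timeout, sd_io_time. A minimal sketch of the two common ways to set it, with an illustrative value of 10 seconds (whether mpt_sas honours it is exactly the question raised earlier in the thread):

  # in /etc/system, takes effect at the next boot
  set sd:sd_io_time = 10

  # on the running kernel via mdb (0t10 is decimal 10)
  echo "sd_io_time/W 0t10" | mdb -kw

Either way, per the warning above, the value needs to stay above the drive's own internal retry/recovery time.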
Some vendors are > better than others at documenting these, but in any case you'll need to see their spec. > Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. > > There are a lot of tunables in this area at all levels of the architecture. OOB, the OmniOS > settings ensure stable behaviour. Tuning any layer without understanding the others can > lead to unstable systems, as demonstrated by your current downstream consumers. > -- richard > > >> >> Thanks, >> Kevin >> >> >> -- >> ------------------------------------------------------------------- >> Kevin Swab UNIX Systems Administrator >> ACNS Colorado State University >> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss From richard.elling at richardelling.com Thu Jan 1 01:13:04 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Wed, 31 Dec 2014 17:13:04 -0800 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: <54A49517.6070205@ColoState.EDU> References: <54A44D8C.5090302@ColoState.EDU> <54A49517.6070205@ColoState.EDU> Message-ID: <055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> > On Dec 31, 2014, at 4:30 PM, Kevin Swab wrote: > > Hello Richard and group, thanks for your reply! > > I'll look into sg_logs for one of these devices once I have a chance to > track that progam down... > > Thanks for the tip on the 500 ms latency, I wasn't aware that could > happen in normal cases. However, I don't believe what I'm seeing > constitutes normal behavior. > > First, some anecdotal evidence: If I pull and replace the suspect > drive, my downstream systems stop complaining, and the high service time > numbers go away. We call these "wounded soldiers" -- it takes more resources to manage a wounded soldier than a dead soldier, so one strategy of war is to wound your enemy causing them to consume resources tending the wounded. The sg_logs should be enlightening. NB, consider a 4TB disk with 5 platters: if a head or surface starts to go, then you have a 1/10 chance that the data you request is under the damaged head and will need to be recovered by the drive. So it is not uncommon to see 90+% of the I/Os to the drive completing quickly. It is also not unusual to see only a small number of sectors or tracks affected. Detecting these becomes tricky, especially as you reduce the timeout/retry interval, since the problem is rarely seen in the average latency -- that which iostat and sar record. This is an area where we can and are improving. -- richard > > I threw out 500 ms as a guess to the point at which I start seeing > problems. However, I see service times far in excess of that, sometimes > over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled > a few days ago, during a time when downstream VMWare servers were > complaining. 
(since the sar output is so verbose, I grepped out the > info just for the suspect drive): > > # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' > 14:50:00 device %busy avque r+w/s blks/s avwait avserv > sd91,a 99 5.3 1 42 0.0 7811.7 > sd91,a 100 11.3 1 53 0.0 11016.0 > sd91,a 100 3.8 1 75 0.0 3615.8 > sd91,a 100 4.9 1 25 0.0 8633.5 > sd91,a 93 3.9 1 55 0.0 4385.3 > sd91,a 86 3.5 2 75 0.0 2060.5 > sd91,a 91 3.1 4 80 0.0 823.8 > sd91,a 97 3.5 1 50 0.0 3984.5 > sd91,a 100 4.4 1 56 0.0 6068.6 > sd91,a 100 5.0 1 55 0.0 8836.0 > sd91,a 100 5.7 1 51 0.0 7939.6 > sd91,a 98 9.9 1 42 0.0 12526.8 > sd91,a 100 7.4 0 10 0.0 36813.6 > sd91,a 51 3.8 8 90 0.0 500.2 > sd91,a 88 3.4 1 60 0.0 2338.8 > sd91,a 100 4.5 1 28 0.0 6969.2 > sd91,a 93 3.8 1 59 0.0 5138.9 > sd91,a 79 3.1 1 59 0.0 3143.9 > sd91,a 99 4.7 1 52 0.0 5598.4 > sd91,a 100 4.8 1 62 0.0 6638.4 > sd91,a 94 5.0 1 54 0.0 3752.7 > > For comparison, here's the sar output from another drive in the same > pool for the same period of time: > > # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' > 14:50:00 device %busy avque r+w/s blks/s avwait avserv > sd82,a 0 0.0 2 28 0.0 5.6 > sd82,a 1 0.0 3 51 0.0 5.4 > sd82,a 1 0.0 4 66 0.0 6.3 > sd82,a 1 0.0 3 48 0.0 4.3 > sd82,a 1 0.0 3 45 0.0 6.1 > sd82,a 1 0.0 6 82 0.0 2.7 > sd82,a 1 0.0 8 112 0.0 2.8 > sd82,a 0 0.0 3 27 0.0 1.8 > sd82,a 1 0.0 5 80 0.0 3.1 > sd82,a 0 0.0 3 35 0.0 3.1 > sd82,a 1 0.0 3 35 0.0 3.8 > sd82,a 1 0.0 4 49 0.0 3.2 > sd82,a 0 0.0 0 0 0.0 4.1 > sd82,a 3 0.0 9 84 0.0 4.1 > sd82,a 1 0.0 6 55 0.0 3.7 > sd82,a 0 0.0 1 23 0.0 7.0 > sd82,a 0 0.0 6 57 0.0 1.8 > sd82,a 1 0.0 5 70 0.0 2.3 > sd82,a 1 0.0 4 55 0.0 3.7 > sd82,a 1 0.0 5 72 0.0 4.1 > sd82,a 1 0.0 4 54 0.0 3.6 > > The other drives in this pool all show data similar to that of sd82. > > Your point about tuning blindly is well taken, and I'm certainly no > expert on the IO stack. What's a humble sysadmin to do? > > For further reference, this system is running r151010. 
The drive in > question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive > was in is using lz4 compression and looks like this: > > # zpool status data1 > pool: data1 > state: ONLINE > scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 14:40:20 2014 > config: > > NAME STATE READ WRITE CKSUM > data1 ONLINE 0 0 0 > raidz2-0 ONLINE 0 0 0 > c6t5000039468CB54F0d0 ONLINE 0 0 0 > c6t5000039478CB5138d0 ONLINE 0 0 0 > c6t5000039468D000DCd0 ONLINE 0 0 0 > c6t5000039468D000E8d0 ONLINE 0 0 0 > c6t5000039468D00F5Cd0 ONLINE 0 0 0 > c6t5000039478C816CCd0 ONLINE 0 0 0 > c6t5000039478C8546Cd0 ONLINE 0 0 0 > raidz2-1 ONLINE 0 0 0 > c6t5000039478C855F0d0 ONLINE 0 0 0 > c6t5000039478C856E8d0 ONLINE 0 0 0 > c6t5000039478C856ECd0 ONLINE 0 0 0 > c6t5000039478C856F4d0 ONLINE 0 0 0 > c6t5000039478C86374d0 ONLINE 0 0 0 > c6t5000039478C8C2A8d0 ONLINE 0 0 0 > c6t5000039478C8C364d0 ONLINE 0 0 0 > raidz2-2 ONLINE 0 0 0 > c6t5000039478C9958Cd0 ONLINE 0 0 0 > c6t5000039478C995C4d0 ONLINE 0 0 0 > c6t5000039478C9DACCd0 ONLINE 0 0 0 > c6t5000039478C9DB30d0 ONLINE 0 0 0 > c6t5000039478C9DB6Cd0 ONLINE 0 0 0 > c6t5000039478CA73B4d0 ONLINE 0 0 0 > c6t5000039478CB3A20d0 ONLINE 0 0 0 > raidz2-3 ONLINE 0 0 0 > c6t5000039478CB3A64d0 ONLINE 0 0 0 > c6t5000039478CB3A70d0 ONLINE 0 0 0 > c6t5000039478CB3E7Cd0 ONLINE 0 0 0 > c6t5000039478CB3EB0d0 ONLINE 0 0 0 > c6t5000039478CB3FBCd0 ONLINE 0 0 0 > c6t5000039478CB4048d0 ONLINE 0 0 0 > c6t5000039478CB4054d0 ONLINE 0 0 0 > raidz2-4 ONLINE 0 0 0 > c6t5000039478CB424Cd0 ONLINE 0 0 0 > c6t5000039478CB4250d0 ONLINE 0 0 0 > c6t5000039478CB470Cd0 ONLINE 0 0 0 > c6t5000039478CB471Cd0 ONLINE 0 0 0 > c6t5000039478CB4E50d0 ONLINE 0 0 0 > c6t5000039478CB50A8d0 ONLINE 0 0 0 > c6t5000039478CB50BCd0 ONLINE 0 0 0 > spares > c6t50000394A8CBC93Cd0 AVAIL > > errors: No known data errors > > > Thanks for your help, > Kevin > > On 12/31/2014 3:22 PM, Richard Elling wrote: >> >>> On Dec 31, 2014, at 11:25 AM, Kevin Swab wrote: >>> >>> Hello Everyone, >>> >>> We've been running OmniOS on a number of SuperMicro 36bay chassis, with >>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >>> various SAS HDD's. These systems are serving block storage via Comstar >>> and Qlogic FC HBA's, and have been running well for several years. >>> >>> The problem we've got is that as the drives age, some of them start to >>> perform slowly (intermittently) without failing - no zpool or iostat >>> errors, and nothing logged in /var/adm/messages. The slow performance >>> can be seen as high average service times in iostat or sar. >> >> Look at the drive's error logs using sg_logs (-a for all) >> >>> >>> When these service times get above 500ms, they start to cause IO >>> timeouts on the downstream storage consumers, which is bad... >> >> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or SATA NCQ >> >>> >>> I'm wondering - is there a way to tune OmniOS' behavior so that it >>> doesn't try so hard to complete IOs to these slow disks, and instead >>> just gives up and fails them? >> >> Yes, the tuning in Alasdair's blog should work as he describes. More below... >> >>> >>> I found an old post from 2011 which states that some tunables exist, >>> but are ignored by the mpt_sas driver: >>> >>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >>> >>> Does anyone know the current status of these tunables, or have any other >>> suggestions that might help? >> >> These tunables are on the order of seconds. 
The default, 60, is obviously too big >> unless you have old, slow, SCSI CD-ROMs. But setting it below the manufacturer's >> internal limit (default or tuned) can lead to an unstable system. Some vendors are >> better than others at documenting these, but in any case you'll need to see their spec. >> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. >> >> There are a lot of tunables in this area at all levels of the architecture. OOB, the OmniOS >> settings ensure stable behaviour. Tuning any layer without understanding the others can >> lead to unstable systems, as demonstrated by your current downstream consumers. >> -- richard >> >> >>> >>> Thanks, >>> Kevin >>> >>> >>> -- >>> ------------------------------------------------------------------- >>> Kevin Swab UNIX Systems Administrator >>> ACNS Colorado State University >>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>> _______________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.com >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss > From gate03 at landcroft.co.uk Thu Jan 1 09:09:06 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Thu, 1 Jan 2015 19:09:06 +1000 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: References: <20141223120402.6714c763@emeritus> Message-ID: <20150101190907.2a17aa48@emeritus> Just for the record, I now believe that the cause of the problem is excessive heat. The server is located in a domestic garage in Brisbane, Queensland, and the addition of a larger UPS to the rig has increased the amount of heat in there. The server began to suffer from loss of disk drives (i.e., any activity involving the disk would hang) but hasn't faulted since I took measures to keep the temperature down. Thanks to Dan for his suggestions. Michael. From richard.elling at richardelling.com Thu Jan 1 19:17:04 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Thu, 1 Jan 2015 11:17:04 -0800 Subject: [OmniOS-discuss] Sneak modified scsi_vhci.conf in under installer ? In-Reply-To: References: <2C40A4F8-BCBB-4B4C-BAC6-390595CA89BD@richardelling.com> Message-ID: > On Dec 27, 2014, at 11:26 AM, sergei wrote: > > Richard > > The thing is - installer does not show those two disks in the list. This is a different problem, not solvable by scsi_vhci.conf. I've not seen this. What usually happens if scsi_vhci.conf doesn't declare the drive as multipathed and the multipath detection doesn't work is you will see the drive twice in the device tree: once per path. When multipathing is configured, you'll see the drive once in the device tree. > So I can't install into the disks that are supposed to be boot disks. There are only larger Seagate drives offered as install target. My guess was that the installer filters disks by their device path, not letting /pci ones through ? That is why I wanted to have proper scsi_vhci before the installer. Was hoping this can be done with mdb. > > Re: scsi_vhci.conf - systems with 3+ years uptime now are very picky about disk replacements (model/vendor). Almost have to feed them Seagate exclusively. If I told anyone that this server won't take a TOSHIBA SAS drive but will be more happy with Seagate - I doubt many people would take it seriously. And these days I see lots of TOSHIBA disks that come as replacements. Almost makes one wish for *some* tool to add entries to vhci and make them active at runtime. 
This is not my experience. scsi_vhci.conf is a nicety, not a requirement. ? richard > > > On Sat, Dec 27, 2014 at 10:54 AM, Richard Elling > wrote: > > > On Dec 26, 2014, at 2:36 PM, sergei > wrote: > > > > Hi > > > > The disks I want to install OmniOS to are TOSHIBA AL13SEB300 model which scsi_vhci won't take over without proper conf file listing this model under "scsi-vhci-failover-override" line. Right now those disk device path starts with /pci instead of /scsi_vhci. Yet they are showing up in format output ok. What is the trick to fix this without rebuilding install ISO image ? > > easy -- don't rebuild the install ISO image :-) > > > > > I could install into one of (larger) Seagates and then mirror/remove mirror to move boot OS to the proper disks. Is there any easier way ? > > ugh, too much work. > Try this (I'm sure I blogged this a few times, or maybe in the Nexenta knowledge base?) > I'm sure the procedure is in the email archives... > > 1. go ahead and do the installation. > 2. boot into newly installed OS > 3. edit scsi_vhci.conf > 4. shutdown > 5. boot from install media, go to shell > 6. import rpool > 7. export rpool > 8. reboot into newly installed OS > > ZFS is tolerant of path changes, but you have to trick the boot process. > > > > > I think it would benefit Omni if you could keep scsi_vhci with at least some updates. I see Nexenta does include bunch of models into it's default scsi_vhci.conf. > > The root cause is a deficiency in detecting multiple ports. The workaround is to override > in scsi_vhci.conf. The fix is known, just need to find the time... > -- richard > > -- Richard.Elling at RichardElling.com +1-760-896-4422 -------------- next part -------------- An HTML attachment was scrubbed... URL: From gate03 at landcroft.co.uk Thu Jan 1 22:49:37 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Fri, 2 Jan 2015 08:49:37 +1000 Subject: [OmniOS-discuss] adding cua/a as a second login In-Reply-To: <20141231225028.GH29549@bender.unx.csupomona.edu> References: <20141204153051.3e17ac8f@punda-mlia> <20141205073111.0762e2b1@punda-mlia> <4cff01d01010$dc508510$94f18f30$@acm.org> <20141205085057.5e000d9e@punda-mlia> <20141231225028.GH29549@bender.unx.csupomona.edu> Message-ID: <20150102084937.36a704c8@emeritus> On Wed, 31 Dec 2014 14:50:28 -0800 "Paul B. Henson" wrote: > Let me know if you have any questions or problems. Thanks Paul; it works very well. There are two very minor niggles that might impede the acceptance of the change: 1. Generally in *nix, items in a list are separated by a colon or a space; rarely a comma. 2. If login is attempted on a device not in the CONSOLE list, the error message is "not on system console" which is slightly misleading; I think the message should be "login not allowed on this device" or similar. Each of those is very minor and the facility is working well here. Thank you ! Michael. From gate03 at landcroft.co.uk Thu Jan 1 23:19:24 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Fri, 2 Jan 2015 09:19:24 +1000 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150101190907.2a17aa48@emeritus> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> Message-ID: <20150102091924.6210d489@emeritus> Straight after the thermal diagnosis, a more definite culprit arose. Since April 2013 when the server was commissioned, it's had a three-way rpool mirror: two built-in SATA drives and one external 'cigarette packet' USB-2 drive. 
I know, I know, not blistering data-centre performance but the idea is that if we go on holiday, or the house is burning down, I can grab the external drive and not lose all our data. This is a Supermicro SYS 5017C-LF so the drives are not removable or hot-swappable. Anyway, I'd been using the USB-2 drive elsewhere but this morning, when it was still quite cool, I plugged it in and re-attached it to the rpool mirror. Within 1/2 hour the machine had locked-up again. It would have performed only a few % of the resilvering. This comes into the category of 'wounded soldier' mentioned elsewhere in this list but I wonder if anyone can recommend what to do next. The USB-2 drive had been used on a 'live' booted from a Bloody installation USB stick for about 36 hours without problems, so I don't understand why it should suddenly be causing the server to lock up, when it seems to behave properly elsewhere. The goal is to have an easily-removed mirror of the server, so maybe someone can suggest a better way. USB-2 is obviously crippling the machines performance even when it's running properly. Thanks in expectation of any assistance and suggestions. Michael. From mir at miras.org Fri Jan 2 00:03:11 2015 From: mir at miras.org (Michael Rasmussen) Date: Fri, 2 Jan 2015 01:03:11 +0100 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150102091924.6210d489@emeritus> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> Message-ID: <20150102010311.6a7b482f@sleipner.datanom.net> On Fri, 2 Jan 2015 09:19:24 +1000 Michael Mounteney wrote: > > The goal is to have an easily-removed mirror of the server, so maybe > someone can suggest a better way. USB-2 is obviously crippling the > machines performance even when it's running properly. > > Thanks in expectation of any assistance and suggestions. > If you have a spare PCIe port in your server you could consider something like this: http://www.newegg.com/Product/Product.aspx?Item=N82E16816132020 The SATA chip is confirmed to work with OpenSolaris so this must mean it will work with Omnios as well. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: A princess should not be afraid -- not with a brave knight to protect her. -- McCoy, "Shore Leave", stardate 3025.3 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Fri Jan 2 00:56:20 2015 From: mir at miras.org (Michael Rasmussen) Date: Fri, 2 Jan 2015 01:56:20 +0100 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150102010311.6a7b482f@sleipner.datanom.net> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> Message-ID: <20150102015620.6b879ce3@sleipner.datanom.net> On Fri, 2 Jan 2015 01:03:11 +0100 Michael Rasmussen wrote: > > The SATA chip is confirmed to work with OpenSolaris so this must mean > it will work with Omnios as well. 
> Forgot the link: http://osdir.com/ml/os.solaris.opensolaris.storage.general/2007-10/msg00060.html -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: To get back on your feet, miss two car payments. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From gate03 at landcroft.co.uk Fri Jan 2 01:53:57 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Fri, 2 Jan 2015 11:53:57 +1000 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150102010311.6a7b482f@sleipner.datanom.net> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> Message-ID: <20150102115357.082d59ff@emeritus> On Fri, 2 Jan 2015 01:03:11 +0100 Michael Rasmussen wrote: > If you have a spare PCIe port in your server you could consider > something like this: > http://www.newegg.com/Product/Product.aspx?Item=N82E16816132020 Yes, thanks, nice idea but the machine has only one such slot which I'd planned to fill with extra ethernet sockets at some time. I think I should have bought a slightly less minimal machine, but the idea was to minimise power usage in a domestic installation. The director of finance will not authorise a further hardware acquisition. I'll probably try a new USB-2 cigarette-case drive, hopefully of higher quality. Or if I get more ethernet sockets, maybe an iSCSI device, assuming that's allowed as a mirror component. Or maybe quick-release rackmount hardware, if such exists. http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm That page says 6x SATA 2.0 but the box only accommodates 2 HDDs. Michael. From henson at acm.org Fri Jan 2 02:37:17 2015 From: henson at acm.org (Paul B. Henson) Date: Thu, 1 Jan 2015 18:37:17 -0800 Subject: [OmniOS-discuss] adding cua/a as a second login In-Reply-To: <20150102084937.36a704c8@emeritus> References: <20141204153051.3e17ac8f@punda-mlia> <20141205073111.0762e2b1@punda-mlia> <4cff01d01010$dc508510$94f18f30$@acm.org> <20141205085057.5e000d9e@punda-mlia> <20141231225028.GH29549@bender.unx.csupomona.edu> <20150102084937.36a704c8@emeritus> Message-ID: <20150102023717.GJ29549@bender.unx.csupomona.edu> On Fri, Jan 02, 2015 at 08:49:37AM +1000, Michael Mounteney wrote: > 1. Generally in *nix, items in a list are separated by a colon or a > space; rarely a comma. It's always hard to pick a delimiter for a list containing paths, as virtually every convenient character is also a valid part of a path :). As this is a list of devices, I thought commas would be less prevailent than colons. Could have gone the other way I suppose. I don't really care myself, if the review concensus is to change it before integration I'll change it... > 2. If login is attempted on a device not in the CONSOLE list, the > error message is "not on system console" which is slightly misleading; > I think the message should be "login not allowed on this device" or > similar. 
That message was already misleading, the "system console" could be the framebuffer, but CONSOLE set to /dev/ttya, so when logging in to the actual system console you'd be told you weren't on the system console ;). I generally try to go with the least invasive changes needed to implement the new functionality, so tweaking that message wasn't really on my radar. From johan.kragsterman at capvert.se Fri Jan 2 08:18:02 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 2 Jan 2015 09:18:02 +0100 Subject: [OmniOS-discuss] LU read only and r/w for different hosts? Message-ID: Hi! I've been thinking about this for a while, and haven't figured out a solution. I'd like to have a possibility to set LU read only for some hosts, but r/w to others, for the same LU. There are possibilities to set read only, or r/w, on a LU, but that property is valid for all hosts, it is not(afaik) possible to choose which hosts are going to get read only, and which are going to get r/w. This is an access controll operation, and as such, imho, should be controlled by comstar. It is the responsability of the view to handle this, but I haven't seen this anywhere in the comstar/stmf configuration posibilities. Are there someone on this list that can shed some light on this? Best regards from/Med v?nliga h?lsningar fr?n Johan Kragsterman Capvert From johan.kragsterman at capvert.se Fri Jan 2 08:33:13 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 2 Jan 2015 09:33:13 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: Re: Ang: Re: CoreOS In-Reply-To: References: , <509071d00c17bdf9229b0be3eb8bfe8b@blackdot.be>, , Message-ID: Hi Jorge and list! Haven't been active during this time of christmas and new year, but I'm back now... Thanks, Jorge, for digging into this! I will do some more investigations.... About SmartOS and LX branded zones: Well, if I could use them on OmniOS I would be interested, because I'd like a fully working server OS in the bottom, not a crippled OS just developed for running zones on. Do you know if there are possibilities to run these LX zones on OmniOS as well? But generally, I'd prefer to have CoreOS as a KVM guest, since the CoreOS model is very interesting, imo. I guess this discussion will continue in one way or another, now when it turns out that interesting solutions like CoreOS can't be run because of lack of features/old implementation in our KVM... Rgrds Johan -----Jorge Schrauwen skrev: ----- Till: Johan Kragsterman Fr?n: Jorge Schrauwen Datum: 2014-12-20 14:37 Kopia: omnios-discuss at lists.omniti.com ?rende: Re: Ang: Re: Ang: Re: [OmniOS-discuss] CoreOS Hey Johan, I just poked at the qemu image... it seems it wants some stuff not in our old qemu-kvm fork. e.g. fsdev (mouting a filesystem from host to guest). But let's try anyway! # convert qcow2 to raw qemu-img convert coreos_production_qemu_image.img coreos_production_qemu_image.dd # dump this on our zvol dd if=coreos_production_qemu_image.dd of=/dev/zvol/rdsk/core/vms/hosts/coreos/disk0 We now have the correctly formatted data on our zvol... On the plus side it does output nicely to ttya if added to a vm :) So... here is where the kernel dies: (oh it does some kexec bits which are a PITA) --- [ ? ?0.001000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.2 #2 [ ? ?0.001000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ ? ?0.001000] ?0000000000000008 ffff88007a3e7db8 ffffffff814e8915 000000000000e [ ? ?0.001000] ?ffffffff81798190 ffff88007a3e7e38 ffffffff814e4c97 0000000000006 [ ? 
?0.001000] ?0000000000000008 ffff88007a3e7e48 ffff88007a3e7de8 00000000fffb0 [ ? ?0.001000] Call Trace: [ ? ?0.001000] ?[] dump_stack+0x46/0x58 [ ? ?0.001000] ?[] panic+0xc1/0x1f5 [ ? ?0.001000] ?[] setup_IO_APIC+0x7d6/0x83d [ ? ?0.001000] ?[] native_smp_prepare_cpus+0x2bc/0x337 [ ? ?0.001000] ?[] kernel_init_freeable+0xcd/0x212 [ ? ?0.001000] ?[] ? rest_init+0x80/0x80 [ ? ?0.001000] ?[] kernel_init+0xe/0xf0 [ ? ?0.001000] ?[] ret_from_fork+0x7c/0xb0 [ ? ?0.001000] ?[] ? rest_init+0x80/0x80 [ ? ?0.001000] Rebooting in 60 seconds.. --- I actually also have this on a ubuntu vm I am using, it needs noapic kernel option... on the grub prompt (really nice is coreos seems to have grub + console on both tty0 (vga) and ttyS0 (serial ttya). Woo hoo we got past that bit where it fails on IO-APIC, now we just hang on smpboot :( --- [ ? ?0.001000] CPU: Physical Processor ID: 0 [ ? ?0.001000] CPU: Processor Core ID: 0 [ ? ?0.001000] mce: CPU supports 10 MCE banks [ ? ?0.001000] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ ? ?0.001000] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 [ ? ?0.001000] Freeing SMP alternatives memory: 20K (ffffffff82fa1000 - fffffff) [ ? ?0.001000] ftrace: allocating 19518 entries in 77 pages [ ? ?0.001000] smpboot: CPU0: Intel QEMU Virtual CPU version 0.14.1 (fam: 06, m --- Pretty much stuck here... I tried some variations of cpu type (qemu64, Nehalem and host) I also tried using one vcpu but still stuck. Let's just cripple the entire thing and plow are way through: adding 'nosmp noapic noacpi' So yeah at this point coreos is pretty useless... but we fly past smpboot! And... land here: --- [ ? ?0.239823] scsi host0: ata_piix [ ? ?0.239823] scsi host1: ata_piix [ ? ?0.239823] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14 [ ? ?0.239823] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15 --- If it is docker you want, you may as well look at SmartOS's LX Brand stuff, they are racing towards workable docker zones. But since I came this far, lets see if I can make it to the finish... I am using virtio... lets try scsi... nothing... ide... nothing... So this is were it ends. Our qemu-kvm fork is probably just too old. Regards Jorge On 2014-12-20 12:47, Johan Kragsterman wrote: > Hi, Jorge and all! > > > I would be interested in discussing this further, but perhaps > omnios-discuss isn't the right place? Since I don't know if this is > omnios/illumos/coreos specific... > > I did some experimenting: > > I only used CoreOS stable in my tests. > > I tried the iso, but the iso isn't full featured, and doesn't run > docker out of the box. And the docker implementation is of coarse what > everybody is interested in. I got it to boot without problems, but I > had big problems with VNC keymapping due to my Swedish keyboard and > perhaps my Swedish client computer. So I could actually never do > something with it, and since it is not full featured, it is not what I > want to use. > > So instead, I downloaded the img file for qemu, created a volume, and > dd'ed the image to the volume, and then set this volume as boot. That > went fine, to get it to boot. But then, with the default boot option > in grub, it panicked, and restarted every 60 seconds. > > I stopped the grub booting, and chosed the B option. That didn't > panic, but it didn't work either, it was too much that didn't work. > But option A went fine, no panic, and everything seem to work more or > less without problems. 
The only problem here seem to be that I can't > log in, due to the "first log in"-principles they seem to have: It is > only possible to log in via ssh, which means the network have to be > up, and I couldn't get the network to come up....so there I am right > now... > > Regards Johan > > > -----Jorge Schrauwen skrev: ----- > Till: Johan Kragsterman > Fr?n: Jorge Schrauwen > Datum: 2014-12-18 18:07 > Kopia: omnios-discuss at lists.omniti.com > ?rende: Re: Ang: Re: [OmniOS-discuss] CoreOS > > > On 2014-12-18 17:57, Johan Kragsterman wrote: >> Jorge, I was thinking about you when I posted this! I thought you >> would be a possible contributor to this thread... ?More furhter >> down... >> >> >> -----Jorge Schrauwen skrev: ----- >> Till: Johan Kragsterman >> Fr?n: Jorge Schrauwen >> Datum: 2014-12-18 17:38 >> Kopia: omnios-discuss at lists.omniti.com >> ?rende: Re: [OmniOS-discuss] CoreOS >> >> Something like this will probably work: >> >> >> ??/usr/bin/qemu-system-x86_64 >> ?? -name coreos \ >> ?? -enable-kvm \ >> ?? -no-hpet \ >> ?? -m 4096 >> ?? -cpu Nehalem \ >> ?? -smp sockets=1,cores=4,threads=2 \ >> ?? -rtc base=utc,driftfix=slew \ >> ?? -pidfile /tank/coreo/coreos.pid ?\ >> ?? -monitor unix:/tank/coreo/coreos.monitor,server,nowait,nodelay ?\ >> ?? -vga std ?\ >> ?? -vnc :1 ?\ >> ?? -nographic \ >> ?? -drive >> file=/tank/coreos/coreos.iso,if=ide,media=cdrom,index=0,cache=none \ >> ?? -drive >> file=/dev/zvol/rdsk/tank/coreos/disk0,if=virtio,media=disk,index=0,cache=none,boot=on >> \ >> ?? -boot order=cd,once=d \ >> ?? -device >> virtio-net-pci,mac=02:08:20:0c:04:d2,tx=timer,x-txtimer=200000,x-txburst=128,vlan=0 >> \ >> ?? -net vnic,vlan=0,name=net1,ifname=vcoreos0 \ >> ?? -chardev >> socket,id=serial0,path=/tank/coreos/coreos.console,server,nowait \ >> ?? -serial chardev:serial0 \ >> ?? -usb \ >> ?? -usbdevice tablet \ >> ?? -daemonize >> >> You should get vnc at port 5901, seemed to boot for me but I did not >> complete the install. >> >> >> >> At the CoreOS site they say: Start like this: >> >> ./coreos_production_qemu.sh -nographic >> >> and they pass on that string -nographic ?...? >> >> It makes me wonder, because they tell you to connect with the instans >> only over ssh with: ssh -l core -p 2222 localhost ? ... >> >> So I'm not sure if it is possible to connect via VNC...did you >> actually check VNC, to confirm you had a VNC connection? >> >> It should boot and run from the image r/o, so perhaps you just need >> one "disk"? I can see you got two configured, or at least the iso >> file, and then a disk. Don't you think it would be enough with just >> the image file? >> >> Perhaps I just try... > I used the install iso to see if it booted. > -nographic just mean don't spawn a graphical console AKA SDL or simular > window. It does not prevent '-vnc :1' from working. > >> >> >> >> Regards >> >> Jorge >> >> >> >> >> >> >> >> >> >> >> >> >> On 2014-12-18 16:57, Johan Kragsterman wrote: >>> Hi! >>> >>> >>> ?I've been looking at CoreOS and finds it interesting! Since I'd like >>> to have OmniOS as the platform, I need to run CoreOS as a KVM guest. >>> Haven't tested yet, but I downloaded the startscript for qemu, and it >>> looks a little bit "too much" for Illumos KVM... >>> >>> ?It would be nice to get some views on people that have been >>> considering this as well, perhaps some already tested or already >>> running...? >>> >>> ?I've seen that Frederic Alix on this list been blogging about it, >>> but >>> haven't seen if he managed to run it as a KVM guest on OmniOS. 
>>> >>> ?For me it seems to be some complications at first startup, mainly. >>> It >>> doesn't seem to be reachable by VNC... >>> >>> ?Hope to get some input from you guys... >>> >>> >>> Best regards from/Med v?nliga h?lsningar fr?n >>> >>> Johan Kragsterman >>> >>> Capvert >>> >>> _______________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.com >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss From sjorge+ml at blackdot.be Fri Jan 2 08:59:05 2015 From: sjorge+ml at blackdot.be (Jorge Schrauwen) Date: Fri, 02 Jan 2015 09:59:05 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: Re: Ang: Re: CoreOS In-Reply-To: References: , <509071d00c17bdf9229b0be3eb8bfe8b@blackdot.be>, , Message-ID: <54A65DD9.9010401@blackdot.be> For now LX Branded zones are actively being developed on the smartos fork. I do not know of any plan to upstream those changes yet. Regards Jorge On 02/01/2015 09:33, Johan Kragsterman wrote: > Hi Jorge and list! > > Haven't been active during this time of christmas and new year, but I'm back now... > > Thanks, Jorge, for digging into this! > > I will do some more investigations.... > > About SmartOS and LX branded zones: Well, if I could use them on OmniOS I would be interested, because I'd like a fully working server OS in the bottom, not a crippled OS just developed for running zones on. > > Do you know if there are possibilities to run these LX zones on OmniOS as well? > > But generally, I'd prefer to have CoreOS as a KVM guest, since the CoreOS model is very interesting, imo. > > I guess this discussion will continue in one way or another, now when it turns out that interesting solutions like CoreOS can't be run because of lack of features/old implementation in our KVM... > > > Rgrds Johan > > > > > -----Jorge Schrauwen skrev: ----- > Till: Johan Kragsterman > Fr?n: Jorge Schrauwen > Datum: 2014-12-20 14:37 > Kopia: omnios-discuss at lists.omniti.com > ?rende: Re: Ang: Re: Ang: Re: [OmniOS-discuss] CoreOS > > Hey Johan, > > I just poked at the qemu image... it seems it wants some stuff not in > our old qemu-kvm fork. e.g. fsdev (mouting a filesystem from host to > guest). > > But let's try anyway! > > # convert qcow2 to raw > qemu-img convert coreos_production_qemu_image.img > coreos_production_qemu_image.dd > # dump this on our zvol > dd if=coreos_production_qemu_image.dd > of=/dev/zvol/rdsk/core/vms/hosts/coreos/disk0 > > We now have the correctly formatted data on our zvol... > > On the plus side it does output nicely to ttya if added to a vm :) > > So... here is where the kernel dies: (oh it does some kexec bits which > are a PITA) > --- > [ 0.001000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.2 #2 > [ 0.001000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 > [ 0.001000] 0000000000000008 ffff88007a3e7db8 ffffffff814e8915 > 000000000000e > [ 0.001000] ffffffff81798190 ffff88007a3e7e38 ffffffff814e4c97 > 0000000000006 > [ 0.001000] 0000000000000008 ffff88007a3e7e48 ffff88007a3e7de8 > 00000000fffb0 > [ 0.001000] Call Trace: > [ 0.001000] [] dump_stack+0x46/0x58 > [ 0.001000] [] panic+0xc1/0x1f5 > [ 0.001000] [] setup_IO_APIC+0x7d6/0x83d > [ 0.001000] [] native_smp_prepare_cpus+0x2bc/0x337 > [ 0.001000] [] kernel_init_freeable+0xcd/0x212 > [ 0.001000] [] ? rest_init+0x80/0x80 > [ 0.001000] [] kernel_init+0xe/0xf0 > [ 0.001000] [] ret_from_fork+0x7c/0xb0 > [ 0.001000] [] ? rest_init+0x80/0x80 > [ 0.001000] Rebooting in 60 seconds.. 
> --- > > I actually also have this on a ubuntu vm I am using, it needs noapic > kernel option... on the grub prompt (really nice is coreos seems to have > grub + console on both tty0 (vga) and ttyS0 (serial ttya). > > Woo hoo we got past that bit where it fails on IO-APIC, now we just hang > on smpboot :( > > --- > [ 0.001000] CPU: Physical Processor ID: 0 > [ 0.001000] CPU: Processor Core ID: 0 > [ 0.001000] mce: CPU supports 10 MCE banks > [ 0.001000] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 > [ 0.001000] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 > [ 0.001000] Freeing SMP alternatives memory: 20K (ffffffff82fa1000 - > fffffff) > [ 0.001000] ftrace: allocating 19518 entries in 77 pages > [ 0.001000] smpboot: CPU0: Intel QEMU Virtual CPU version 0.14.1 > (fam: 06, m > --- > > Pretty much stuck here... I tried some variations of cpu type (qemu64, > Nehalem and host) I also tried using one vcpu but still stuck. > > Let's just cripple the entire thing and plow are way through: adding > 'nosmp noapic noacpi' > > So yeah at this point coreos is pretty useless... but we fly past > smpboot! > And... land here: > > --- > [ 0.239823] scsi host0: ata_piix > [ 0.239823] scsi host1: ata_piix > [ 0.239823] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 > irq 14 > [ 0.239823] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 > irq 15 > --- > > If it is docker you want, you may as well look at SmartOS's LX Brand > stuff, they are racing towards workable docker zones. > > But since I came this far, lets see if I can make it to the finish... > I am using virtio... lets try scsi... nothing... ide... nothing... > > So this is were it ends. > > Our qemu-kvm fork is probably just too old. > > Regards > > Jorge > > > > On 2014-12-20 12:47, Johan Kragsterman wrote: >> Hi, Jorge and all! >> >> >> I would be interested in discussing this further, but perhaps >> omnios-discuss isn't the right place? Since I don't know if this is >> omnios/illumos/coreos specific... >> >> I did some experimenting: >> >> I only used CoreOS stable in my tests. >> >> I tried the iso, but the iso isn't full featured, and doesn't run >> docker out of the box. And the docker implementation is of coarse what >> everybody is interested in. I got it to boot without problems, but I >> had big problems with VNC keymapping due to my Swedish keyboard and >> perhaps my Swedish client computer. So I could actually never do >> something with it, and since it is not full featured, it is not what I >> want to use. >> >> So instead, I downloaded the img file for qemu, created a volume, and >> dd'ed the image to the volume, and then set this volume as boot. That >> went fine, to get it to boot. But then, with the default boot option >> in grub, it panicked, and restarted every 60 seconds. >> >> I stopped the grub booting, and chosed the B option. That didn't >> panic, but it didn't work either, it was too much that didn't work. >> But option A went fine, no panic, and everything seem to work more or >> less without problems. The only problem here seem to be that I can't >> log in, due to the "first log in"-principles they seem to have: It is >> only possible to log in via ssh, which means the network have to be >> up, and I couldn't get the network to come up....so there I am right >> now... 
>> >> Regards Johan >> >> >> -----Jorge Schrauwen skrev: ----- >> Till: Johan Kragsterman >> Fr?n: Jorge Schrauwen >> Datum: 2014-12-18 18:07 >> Kopia: omnios-discuss at lists.omniti.com >> ?rende: Re: Ang: Re: [OmniOS-discuss] CoreOS >> >> >> On 2014-12-18 17:57, Johan Kragsterman wrote: >>> Jorge, I was thinking about you when I posted this! I thought you >>> would be a possible contributor to this thread... More furhter >>> down... >>> >>> >>> -----Jorge Schrauwen skrev: ----- >>> Till: Johan Kragsterman >>> Fr?n: Jorge Schrauwen >>> Datum: 2014-12-18 17:38 >>> Kopia: omnios-discuss at lists.omniti.com >>> ?rende: Re: [OmniOS-discuss] CoreOS >>> >>> Something like this will probably work: >>> >>> >>> /usr/bin/qemu-system-x86_64 >>> -name coreos \ >>> -enable-kvm \ >>> -no-hpet \ >>> -m 4096 >>> -cpu Nehalem \ >>> -smp sockets=1,cores=4,threads=2 \ >>> -rtc base=utc,driftfix=slew \ >>> -pidfile /tank/coreo/coreos.pid \ >>> -monitor unix:/tank/coreo/coreos.monitor,server,nowait,nodelay \ >>> -vga std \ >>> -vnc :1 \ >>> -nographic \ >>> -drive >>> file=/tank/coreos/coreos.iso,if=ide,media=cdrom,index=0,cache=none \ >>> -drive >>> file=/dev/zvol/rdsk/tank/coreos/disk0,if=virtio,media=disk,index=0,cache=none,boot=on >>> \ >>> -boot order=cd,once=d \ >>> -device >>> virtio-net-pci,mac=02:08:20:0c:04:d2,tx=timer,x-txtimer=200000,x-txburst=128,vlan=0 >>> \ >>> -net vnic,vlan=0,name=net1,ifname=vcoreos0 \ >>> -chardev >>> socket,id=serial0,path=/tank/coreos/coreos.console,server,nowait \ >>> -serial chardev:serial0 \ >>> -usb \ >>> -usbdevice tablet \ >>> -daemonize >>> >>> You should get vnc at port 5901, seemed to boot for me but I did not >>> complete the install. >>> >>> >>> >>> At the CoreOS site they say: Start like this: >>> >>> ./coreos_production_qemu.sh -nographic >>> >>> and they pass on that string -nographic ...? >>> >>> It makes me wonder, because they tell you to connect with the instans >>> only over ssh with: ssh -l core -p 2222 localhost ... >>> >>> So I'm not sure if it is possible to connect via VNC...did you >>> actually check VNC, to confirm you had a VNC connection? >>> >>> It should boot and run from the image r/o, so perhaps you just need >>> one "disk"? I can see you got two configured, or at least the iso >>> file, and then a disk. Don't you think it would be enough with just >>> the image file? >>> >>> Perhaps I just try... >> I used the install iso to see if it booted. >> -nographic just mean don't spawn a graphical console AKA SDL or simular >> window. It does not prevent '-vnc :1' from working. >> >>> >>> >>> >>> Regards >>> >>> Jorge >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> On 2014-12-18 16:57, Johan Kragsterman wrote: >>>> Hi! >>>> >>>> >>>> I've been looking at CoreOS and finds it interesting! Since I'd like >>>> to have OmniOS as the platform, I need to run CoreOS as a KVM guest. >>>> Haven't tested yet, but I downloaded the startscript for qemu, and it >>>> looks a little bit "too much" for Illumos KVM... >>>> >>>> It would be nice to get some views on people that have been >>>> considering this as well, perhaps some already tested or already >>>> running...? >>>> >>>> I've seen that Frederic Alix on this list been blogging about it, >>>> but >>>> haven't seen if he managed to run it as a KVM guest on OmniOS. >>>> >>>> For me it seems to be some complications at first startup, mainly. >>>> It >>>> doesn't seem to be reachable by VNC... >>>> >>>> Hope to get some input from you guys... 
>>>> >>>> >>>> Best regards from/Med v?nliga h?lsningar fr?n >>>> >>>> Johan Kragsterman >>>> >>>> Capvert >>>> >>>> _______________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti.com >>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > -- ~ sjorge From richard.elling at richardelling.com Fri Jan 2 15:02:49 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 2 Jan 2015 07:02:49 -0800 Subject: [OmniOS-discuss] LU read only and r/w for different hosts? In-Reply-To: References: Message-ID: <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com> It has been a while since using comstar, but the SCSI protocol has WERO group reservations. Would that suffice? -- richard > On Jan 2, 2015, at 12:18 AM, Johan Kragsterman wrote: > > Hi! > > > I've been thinking about this for a while, and haven't figured out a solution. > > I'd like to have a possibility to set LU read only for some hosts, but r/w to others, for the same LU. There are possibilities to set read only, or r/w, on a LU, but that property is valid for all hosts, it is not(afaik) possible to choose which hosts are going to get read only, and which are going to get r/w. > > This is an access controll operation, and as such, imho, should be controlled by comstar. It is the responsability of the view to handle this, but I haven't seen this anywhere in the comstar/stmf configuration posibilities. > > Are there someone on this list that can shed some light on this? > > > Best regards from/Med v?nliga h?lsningar fr?n > > Johan Kragsterman > > Capvert > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Fri Jan 2 16:36:53 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 2 Jan 2015 17:36:53 +0100 Subject: [OmniOS-discuss] Ang: Re: LU read only and r/w for different hosts? In-Reply-To: <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com> References: <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com>, Message-ID: -----Richard Elling skrev: ----- Till: Johan Kragsterman Fr?n: Richard Elling Datum: 2015-01-02 16:03 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] LU read only and r/w for different hosts? It has been a while since using comstar, but the SCSI protocol has WERO group reservations. Would that suffice? ?-- richard Yeah, I believe so....question is how I can manage it? It doesn't seem to be a part of stmfadm, so I need to do it elsewhere, if possible...so you, or someone else, got some ideas...? Rgrds Johan > On Jan 2, 2015, at 12:18 AM, Johan Kragsterman wrote: > > Hi! > > > I've been thinking about this for a while, and haven't figured out a solution. > > I'd like to have a possibility to set LU read only for some hosts, but r/w to others, for the same LU. There are possibilities to set read only, or r/w, on a LU, but that property is valid for all hosts, it is not(afaik) possible to choose which hosts are going to get read only, and which are going to get r/w. > > This is an access controll operation, and as such, imho, should be controlled by comstar. It is the responsability of the view to handle this, but I haven't seen this anywhere in the comstar/stmf configuration posibilities. > > Are there someone on this list that can shed some light on this? 
> > > Best regards from/Med v?nliga h?lsningar fr?n > > Johan Kragsterman > > Capvert > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Fri Jan 2 16:52:30 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 2 Jan 2015 17:52:30 +0100 Subject: [OmniOS-discuss] Ang: Ang: Re: LU read only and r/w for different hosts? In-Reply-To: References: , <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com>, Message-ID: -----"OmniOS-discuss" skrev: ----- Till: Richard Elling Fr?n: Johan Kragsterman S?nt av: "OmniOS-discuss" Datum: 2015-01-02 17:38 Kopia: "omnios-discuss at lists.omniti.com" ?rende: [OmniOS-discuss] Ang: Re: LU read only and r/w for different hosts? -----Richard Elling skrev: ----- Till: Johan Kragsterman Fr?n: Richard Elling Datum: 2015-01-02 16:03 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] LU read only and r/w for different hosts? It has been a while since using comstar, but the SCSI protocol has WERO group reservations. Would that suffice? ?-- richard Yeah, I believe so....question is how I can manage it? It doesn't seem to be a part of stmfadm, so I need to do it elsewhere, if possible...so you, or someone else, got some ideas...? Rgrds Johan Hmm, been reading up a little on the PR and WERO....it seems to be registering on DEVICE level, not on LU level, huh? Or do I misinterpret something here...? > On Jan 2, 2015, at 12:18 AM, Johan Kragsterman wrote: > > Hi! > > > I've been thinking about this for a while, and haven't figured out a solution. > > I'd like to have a possibility to set LU read only for some hosts, but r/w to others, for the same LU. There are possibilities to set read only, or r/w, on a LU, but that property is valid for all hosts, it is not(afaik) possible to choose which hosts are going to get read only, and which are going to get r/w. > > This is an access controll operation, and as such, imho, should be controlled by comstar. It is the responsability of the view to handle this, but I haven't seen this anywhere in the comstar/stmf configuration posibilities. > > Are there someone on this list that can shed some light on this? > > > Best regards from/Med v?nliga h?lsningar fr?n > > Johan Kragsterman > > Capvert > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From tim at multitalents.net Fri Jan 2 17:35:24 2015 From: tim at multitalents.net (Tim Rice) Date: Fri, 2 Jan 2015 09:35:24 -0800 (PST) Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150102115357.082d59ff@emeritus> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> <20150102115357.082d59ff@emeritus> Message-ID: On Fri, 2 Jan 2015, Michael Mounteney wrote: > On Fri, 2 Jan 2015 01:03:11 +0100 > http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm > > That page says 6x SATA 2.0 but the box only accommodates 2 HDDs. Then just get a SATA to ESATA cable and find a way to get the ESATA end outside the case. 
-- Tim Rice Multitalents tim at multitalents.net From johan.kragsterman at capvert.se Fri Jan 2 17:52:46 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 2 Jan 2015 18:52:46 +0100 Subject: [OmniOS-discuss] Ang: Ang: Ang: Re: LU read only and r/w for different hosts? In-Reply-To: References: , , <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com>, Message-ID: Hmmm again....a lot of hmmm's here today... Been reading some more, and it looks like it is possible to reserve at LU level. I found a guy that uses an sg-persist command on solaris 11.1, but I don't find it in OmniOS. I did a pkg search, but perhaps I don't know what to search for...? -----"OmniOS-discuss" skrev: ----- Till: Richard Elling Fr?n: Johan Kragsterman S?nt av: "OmniOS-discuss" Datum: 2015-01-02 17:53 Kopia: "omnios-discuss at lists.omniti.com" ?rende: [OmniOS-discuss] Ang: Ang: Re: LU read only and r/w for different hosts? -----"OmniOS-discuss" skrev: ----- Till: Richard Elling Fr?n: Johan Kragsterman S?nt av: "OmniOS-discuss" Datum: 2015-01-02 17:38 Kopia: "omnios-discuss at lists.omniti.com" ?rende: [OmniOS-discuss] Ang: Re: LU read only and r/w for different hosts? -----Richard Elling skrev: ----- Till: Johan Kragsterman Fr?n: Richard Elling Datum: 2015-01-02 16:03 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] LU read only and r/w for different hosts? It has been a while since using comstar, but the SCSI protocol has WERO group reservations. Would that suffice? ?-- richard Yeah, I believe so....question is how I can manage it? It doesn't seem to be a part of stmfadm, so I need to do it elsewhere, if possible...so you, or someone else, got some ideas...? Rgrds Johan Hmm, been reading up a little on the PR and WERO....it seems to be registering on DEVICE level, not on LU level, huh? Or do I misinterpret something here...? > On Jan 2, 2015, at 12:18 AM, Johan Kragsterman wrote: > > Hi! > > > I've been thinking about this for a while, and haven't figured out a solution. > > I'd like to have a possibility to set LU read only for some hosts, but r/w to others, for the same LU. There are possibilities to set read only, or r/w, on a LU, but that property is valid for all hosts, it is not(afaik) possible to choose which hosts are going to get read only, and which are going to get r/w. > > This is an access controll operation, and as such, imho, should be controlled by comstar. It is the responsability of the view to handle this, but I haven't seen this anywhere in the comstar/stmf configuration posibilities. > > Are there someone on this list that can shed some light on this? > > > Best regards from/Med v?nliga h?lsningar fr?n > > Johan Kragsterman > > Capvert > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Fri Jan 2 18:28:00 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Fri, 2 Jan 2015 19:28:00 +0100 Subject: [OmniOS-discuss] Ang: Ang: Ang: Ang: Re: LU read only and r/w for different hosts? 
In-Reply-To: References: , , , <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com>, Message-ID: -----"OmniOS-discuss" skrev: ----- Till: Richard Elling Fr?n: Johan Kragsterman S?nt av: "OmniOS-discuss" Datum: 2015-01-02 18:53 Kopia: "omnios-discuss at lists.omniti.com" ?rende: [OmniOS-discuss] Ang: Ang: Ang: Re: LU read only and r/w for different hosts? Hmmm again....a lot of hmmm's here today... Been reading some more, and it looks like it is possible to reserve at LU level. I found a guy that uses an sg-persist command on solaris 11.1, but I don't find it in OmniOS. I did a pkg search, but perhaps I don't know what to search for...? Me again... Seems to be a good possibility that this can be configured using the sg3_utils? I've seen the reference to that before here on list, but I don't find any pkg... Hint's??? Rgrds Johan -----"OmniOS-discuss" skrev: ----- Till: Richard Elling Fr?n: Johan Kragsterman S?nt av: "OmniOS-discuss" Datum: 2015-01-02 17:53 Kopia: "omnios-discuss at lists.omniti.com" ?rende: [OmniOS-discuss] Ang: Ang: Re: LU read only and r/w for different hosts? -----"OmniOS-discuss" skrev: ----- Till: Richard Elling Fr?n: Johan Kragsterman S?nt av: "OmniOS-discuss" Datum: 2015-01-02 17:38 Kopia: "omnios-discuss at lists.omniti.com" ?rende: [OmniOS-discuss] Ang: Re: LU read only and r/w for different hosts? -----Richard Elling skrev: ----- Till: Johan Kragsterman Fr?n: Richard Elling Datum: 2015-01-02 16:03 Kopia: "omnios-discuss at lists.omniti.com" ?rende: Re: [OmniOS-discuss] LU read only and r/w for different hosts? It has been a while since using comstar, but the SCSI protocol has WERO group reservations. Would that suffice? ?-- richard Yeah, I believe so....question is how I can manage it? It doesn't seem to be a part of stmfadm, so I need to do it elsewhere, if possible...so you, or someone else, got some ideas...? Rgrds Johan Hmm, been reading up a little on the PR and WERO....it seems to be registering on DEVICE level, not on LU level, huh? Or do I misinterpret something here...? > On Jan 2, 2015, at 12:18 AM, Johan Kragsterman wrote: > > Hi! > > > I've been thinking about this for a while, and haven't figured out a solution. > > I'd like to have a possibility to set LU read only for some hosts, but r/w to others, for the same LU. There are possibilities to set read only, or r/w, on a LU, but that property is valid for all hosts, it is not(afaik) possible to choose which hosts are going to get read only, and which are going to get r/w. > > This is an access controll operation, and as such, imho, should be controlled by comstar. It is the responsability of the view to handle this, but I haven't seen this anywhere in the comstar/stmf configuration posibilities. > > Are there someone on this list that can shed some light on this? 
> > > Best regards from/Med v?nliga h?lsningar fr?n > > Johan Kragsterman > > Capvert > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From Kevin.Swab at ColoState.EDU Fri Jan 2 18:32:14 2015 From: Kevin.Swab at ColoState.EDU (Kevin Swab) Date: Fri, 02 Jan 2015 11:32:14 -0700 Subject: [OmniOS-discuss] Ang: Ang: Ang: Ang: Re: LU read only and r/w for different hosts? In-Reply-To: References: , , , <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com>, Message-ID: <54A6E42E.6040408@ColoState.EDU> sg_persist is part of the sg3_utils package. Don't know if there's a repo out there with binaries, but it compiles easily on Omni: http://sg.danny.cz/sg/sg3_utils.html Kevin On 01/02/2015 11:28 AM, Johan Kragsterman wrote: > > > -----"OmniOS-discuss" skrev: ----- > Till: Richard Elling > Fr?n: Johan Kragsterman > S?nt av: "OmniOS-discuss" > Datum: 2015-01-02 18:53 > Kopia: "omnios-discuss at lists.omniti.com" > ?rende: [OmniOS-discuss] Ang: Ang: Ang: Re: LU read only and r/w for different hosts? > > Hmmm again....a lot of hmmm's here today... > > > Been reading some more, and it looks like it is possible to reserve at LU level. I found a guy that uses an sg-persist command on solaris 11.1, but I don't find it in OmniOS. I did a pkg search, but perhaps I don't know what to search for...? > > > > > Me again... > > > Seems to be a good possibility that this can be configured using the sg3_utils? > > I've seen the reference to that before here on list, but I don't find any pkg... > > Hint's??? > > > Rgrds Johan > > > > > > -----"OmniOS-discuss" skrev: ----- > Till: Richard Elling > Fr?n: Johan Kragsterman > S?nt av: "OmniOS-discuss" > Datum: 2015-01-02 17:53 > Kopia: "omnios-discuss at lists.omniti.com" > ?rende: [OmniOS-discuss] Ang: Ang: Re: LU read only and r/w for different hosts? > > > -----"OmniOS-discuss" skrev: ----- > Till: Richard Elling > Fr?n: Johan Kragsterman > S?nt av: "OmniOS-discuss" > Datum: 2015-01-02 17:38 > Kopia: "omnios-discuss at lists.omniti.com" > ?rende: [OmniOS-discuss] Ang: Re: LU read only and r/w for different hosts? > > > -----Richard Elling skrev: ----- > Till: Johan Kragsterman > Fr?n: Richard Elling > Datum: 2015-01-02 16:03 > Kopia: "omnios-discuss at lists.omniti.com" > ?rende: Re: [OmniOS-discuss] LU read only and r/w for different hosts? > > It has been a while since using comstar, but the SCSI protocol has > WERO group reservations. Would that suffice? > > -- richard > > > > Yeah, I believe so....question is how I can manage it? It doesn't seem to be a part of stmfadm, so I need to do it elsewhere, if possible...so you, or someone else, got some ideas...? > > > Rgrds Johan > > > > Hmm, been reading up a little on the PR and WERO....it seems to be registering on DEVICE level, not on LU level, huh? Or do I misinterpret something here...? > > > > > > >> On Jan 2, 2015, at 12:18 AM, Johan Kragsterman wrote: >> >> Hi! 
>> >> >> I've been thinking about this for a while, and haven't figured out a solution. >> >> I'd like to have a possibility to set LU read only for some hosts, but r/w to others, for the same LU. There are possibilities to set read only, or r/w, on a LU, but that property is valid for all hosts, it is not(afaik) possible to choose which hosts are going to get read only, and which are going to get r/w. >> >> This is an access controll operation, and as such, imho, should be controlled by comstar. It is the responsability of the view to handle this, but I haven't seen this anywhere in the comstar/stmf configuration posibilities. >> >> Are there someone on this list that can shed some light on this? >> >> >> Best regards from/Med v?nliga h?lsningar fr?n >> >> Johan Kragsterman >> >> Capvert >> >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- ------------------------------------------------------------------- Kevin Swab UNIX Systems Administrator ACNS Colorado State University Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C From richard.elling at richardelling.com Fri Jan 2 19:12:18 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 2 Jan 2015 11:12:18 -0800 Subject: [OmniOS-discuss] Ang: Ang: Ang: Re: LU read only and r/w for different hosts? In-Reply-To: References: <, > <, > <8464C917-2640-401F-9B44-0B76DF3ED442@RichardElling.com> <, > Message-ID: <52459CB3-D8DA-462F-93C8-4A02E0943469@RichardElling.com> > On Jan 2, 2015, at 9:52 AM, Johan Kragsterman wrote: > > Hmmm again....a lot of hmmm's here today... > > > Been reading some more, and it looks like it is possible to reserve at LU level. You are correct. The spec is for targets as managed by the initiator. If you?d like to propose an enhancement to the spec? :-) > I found a guy that uses an sg-persist command on solaris 11.1, but I don't find it in OmniOS. I did a pkg search, but perhaps I don't know what to search for?? I?m not sure if OmniTI packages it, but source is readily available at http://sg.danny.cz/sg/sg3_utils.html ? richard -- Richard.Elling at RichardElling.com +1-760-896-4422 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Kevin.Swab at ColoState.EDU Fri Jan 2 21:50:29 2015 From: Kevin.Swab at ColoState.EDU (Kevin Swab) Date: Fri, 02 Jan 2015 14:50:29 -0700 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: <055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> References: <54A44D8C.5090302@ColoState.EDU> <54A49517.6070205@ColoState.EDU> <055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> Message-ID: <54A712A5.9080502@ColoState.EDU> I've run 'sg_logs' on the drive I pulled last week. There were a lot of errors in the background scan section of the output, which made it very large, so I put it here: http://pastebin.com/jx5BvSep When I pulled this drive, the SMART health status was OK. However, when I put it in a test system to run 'sg_logs', the status changed to "impending failure...". Had the SMART status changed before pulling the drive, I'm sure 'fmd' would have alerted me to the problem... Since that drive had other indications of trouble, I ran 'sg_logs' on another drive I pulled recently that has a SMART health status of OK, but exhibits similar slow service time behavior: http://pastebin.com/Q0t8Jnug Thanks for taking the time to look at these, please let me know what you find... Kevin On 12/31/2014 06:13 PM, Richard Elling wrote: > >> On Dec 31, 2014, at 4:30 PM, Kevin Swab wrote: >> >> Hello Richard and group, thanks for your reply! >> >> I'll look into sg_logs for one of these devices once I have a chance to >> track that progam down... >> >> Thanks for the tip on the 500 ms latency, I wasn't aware that could >> happen in normal cases. However, I don't believe what I'm seeing >> constitutes normal behavior. >> >> First, some anecdotal evidence: If I pull and replace the suspect >> drive, my downstream systems stop complaining, and the high service time >> numbers go away. > > We call these "wounded soldiers" -- it takes more resources to manage a > wounded soldier than a dead soldier, so one strategy of war is to wound your > enemy causing them to consume resources tending the wounded. The sg_logs > should be enlightening. > > NB, consider a 4TB disk with 5 platters: if a head or surface starts to go, then > you have a 1/10 chance that the data you request is under the damaged head > and will need to be recovered by the drive. So it is not uncommon to see > 90+% of the I/Os to the drive completing quickly. It is also not unusual to see > only a small number of sectors or tracks affected. > > Detecting these becomes tricky, especially as you reduce the timeout/retry > interval, since the problem is rarely seen in the average latency -- that which > iostat and sar record. This is an area where we can and are improving. > -- richard > >> >> I threw out 500 ms as a guess to the point at which I start seeing >> problems. However, I see service times far in excess of that, sometimes >> over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled >> a few days ago, during a time when downstream VMWare servers were >> complaining.
(since the sar output is so verbose, I grepped out the >> info just for the suspect drive): >> >> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' >> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >> sd91,a 99 5.3 1 42 0.0 7811.7 >> sd91,a 100 11.3 1 53 0.0 11016.0 >> sd91,a 100 3.8 1 75 0.0 3615.8 >> sd91,a 100 4.9 1 25 0.0 8633.5 >> sd91,a 93 3.9 1 55 0.0 4385.3 >> sd91,a 86 3.5 2 75 0.0 2060.5 >> sd91,a 91 3.1 4 80 0.0 823.8 >> sd91,a 97 3.5 1 50 0.0 3984.5 >> sd91,a 100 4.4 1 56 0.0 6068.6 >> sd91,a 100 5.0 1 55 0.0 8836.0 >> sd91,a 100 5.7 1 51 0.0 7939.6 >> sd91,a 98 9.9 1 42 0.0 12526.8 >> sd91,a 100 7.4 0 10 0.0 36813.6 >> sd91,a 51 3.8 8 90 0.0 500.2 >> sd91,a 88 3.4 1 60 0.0 2338.8 >> sd91,a 100 4.5 1 28 0.0 6969.2 >> sd91,a 93 3.8 1 59 0.0 5138.9 >> sd91,a 79 3.1 1 59 0.0 3143.9 >> sd91,a 99 4.7 1 52 0.0 5598.4 >> sd91,a 100 4.8 1 62 0.0 6638.4 >> sd91,a 94 5.0 1 54 0.0 3752.7 >> >> For comparison, here's the sar output from another drive in the same >> pool for the same period of time: >> >> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' >> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >> sd82,a 0 0.0 2 28 0.0 5.6 >> sd82,a 1 0.0 3 51 0.0 5.4 >> sd82,a 1 0.0 4 66 0.0 6.3 >> sd82,a 1 0.0 3 48 0.0 4.3 >> sd82,a 1 0.0 3 45 0.0 6.1 >> sd82,a 1 0.0 6 82 0.0 2.7 >> sd82,a 1 0.0 8 112 0.0 2.8 >> sd82,a 0 0.0 3 27 0.0 1.8 >> sd82,a 1 0.0 5 80 0.0 3.1 >> sd82,a 0 0.0 3 35 0.0 3.1 >> sd82,a 1 0.0 3 35 0.0 3.8 >> sd82,a 1 0.0 4 49 0.0 3.2 >> sd82,a 0 0.0 0 0 0.0 4.1 >> sd82,a 3 0.0 9 84 0.0 4.1 >> sd82,a 1 0.0 6 55 0.0 3.7 >> sd82,a 0 0.0 1 23 0.0 7.0 >> sd82,a 0 0.0 6 57 0.0 1.8 >> sd82,a 1 0.0 5 70 0.0 2.3 >> sd82,a 1 0.0 4 55 0.0 3.7 >> sd82,a 1 0.0 5 72 0.0 4.1 >> sd82,a 1 0.0 4 54 0.0 3.6 >> >> The other drives in this pool all show data similar to that of sd82. >> >> Your point about tuning blindly is well taken, and I'm certainly no >> expert on the IO stack. What's a humble sysadmin to do? >> >> For further reference, this system is running r151010. 
The drive in >> question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive >> was in is using lz4 compression and looks like this: >> >> # zpool status data1 >> pool: data1 >> state: ONLINE >> scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 14:40:20 2014 >> config: >> >> NAME STATE READ WRITE CKSUM >> data1 ONLINE 0 0 0 >> raidz2-0 ONLINE 0 0 0 >> c6t5000039468CB54F0d0 ONLINE 0 0 0 >> c6t5000039478CB5138d0 ONLINE 0 0 0 >> c6t5000039468D000DCd0 ONLINE 0 0 0 >> c6t5000039468D000E8d0 ONLINE 0 0 0 >> c6t5000039468D00F5Cd0 ONLINE 0 0 0 >> c6t5000039478C816CCd0 ONLINE 0 0 0 >> c6t5000039478C8546Cd0 ONLINE 0 0 0 >> raidz2-1 ONLINE 0 0 0 >> c6t5000039478C855F0d0 ONLINE 0 0 0 >> c6t5000039478C856E8d0 ONLINE 0 0 0 >> c6t5000039478C856ECd0 ONLINE 0 0 0 >> c6t5000039478C856F4d0 ONLINE 0 0 0 >> c6t5000039478C86374d0 ONLINE 0 0 0 >> c6t5000039478C8C2A8d0 ONLINE 0 0 0 >> c6t5000039478C8C364d0 ONLINE 0 0 0 >> raidz2-2 ONLINE 0 0 0 >> c6t5000039478C9958Cd0 ONLINE 0 0 0 >> c6t5000039478C995C4d0 ONLINE 0 0 0 >> c6t5000039478C9DACCd0 ONLINE 0 0 0 >> c6t5000039478C9DB30d0 ONLINE 0 0 0 >> c6t5000039478C9DB6Cd0 ONLINE 0 0 0 >> c6t5000039478CA73B4d0 ONLINE 0 0 0 >> c6t5000039478CB3A20d0 ONLINE 0 0 0 >> raidz2-3 ONLINE 0 0 0 >> c6t5000039478CB3A64d0 ONLINE 0 0 0 >> c6t5000039478CB3A70d0 ONLINE 0 0 0 >> c6t5000039478CB3E7Cd0 ONLINE 0 0 0 >> c6t5000039478CB3EB0d0 ONLINE 0 0 0 >> c6t5000039478CB3FBCd0 ONLINE 0 0 0 >> c6t5000039478CB4048d0 ONLINE 0 0 0 >> c6t5000039478CB4054d0 ONLINE 0 0 0 >> raidz2-4 ONLINE 0 0 0 >> c6t5000039478CB424Cd0 ONLINE 0 0 0 >> c6t5000039478CB4250d0 ONLINE 0 0 0 >> c6t5000039478CB470Cd0 ONLINE 0 0 0 >> c6t5000039478CB471Cd0 ONLINE 0 0 0 >> c6t5000039478CB4E50d0 ONLINE 0 0 0 >> c6t5000039478CB50A8d0 ONLINE 0 0 0 >> c6t5000039478CB50BCd0 ONLINE 0 0 0 >> spares >> c6t50000394A8CBC93Cd0 AVAIL >> >> errors: No known data errors >> >> >> Thanks for your help, >> Kevin >> >> On 12/31/2014 3:22 PM, Richard Elling wrote: >>> >>>> On Dec 31, 2014, at 11:25 AM, Kevin Swab wrote: >>>> >>>> Hello Everyone, >>>> >>>> We've been running OmniOS on a number of SuperMicro 36bay chassis, with >>>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >>>> various SAS HDD's. These systems are serving block storage via Comstar >>>> and Qlogic FC HBA's, and have been running well for several years. >>>> >>>> The problem we've got is that as the drives age, some of them start to >>>> perform slowly (intermittently) without failing - no zpool or iostat >>>> errors, and nothing logged in /var/adm/messages. The slow performance >>>> can be seen as high average service times in iostat or sar. >>> >>> Look at the drive's error logs using sg_logs (-a for all) >>> >>>> >>>> When these service times get above 500ms, they start to cause IO >>>> timeouts on the downstream storage consumers, which is bad... >>> >>> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or SATA NCQ >>> >>>> >>>> I'm wondering - is there a way to tune OmniOS' behavior so that it >>>> doesn't try so hard to complete IOs to these slow disks, and instead >>>> just gives up and fails them? >>> >>> Yes, the tuning in Alasdair's blog should work as he describes. More below... 
>>> >>>> >>>> I found an old post from 2011 which states that some tunables exist, >>>> but are ignored by the mpt_sas driver: >>>> >>>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >>>> >>>> Does anyone know the current status of these tunables, or have any other >>>> suggestions that might help? >>> >>> These tunables are on the order of seconds. The default, 60, is obviously too big >>> unless you have old, slow, SCSI CD-ROMs. But setting it below the manufacturer's >>> internal limit (default or tuned) can lead to an unstable system. Some vendors are >>> better than others at documenting these, but in any case you'll need to see their spec. >>> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. >>> >>> There are a lot of tunables in this area at all levels of the architecture. OOB, the OmniOS >>> settings ensure stable behaviour. Tuning any layer without understanding the others can >>> lead to unstable systems, as demonstrated by your current downstream consumers. >>> -- richard >>> >>> >>>> >>>> Thanks, >>>> Kevin >>>> >>>> >>>> -- >>>> ------------------------------------------------------------------- >>>> Kevin Swab UNIX Systems Administrator >>>> ACNS Colorado State University >>>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>>> _______________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti.com >>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >> -- ------------------------------------------------------------------- Kevin Swab UNIX Systems Administrator ACNS Colorado State University Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C From richard.elling at richardelling.com Fri Jan 2 22:45:02 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 2 Jan 2015 14:45:02 -0800 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: <54A712A5.9080502@ColoState.EDU> References: <54A44D8C.5090302@ColoState.EDU> <54A49517.6070205@ColoState.EDU> <055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> <54A712A5.9080502@ColoState.EDU> Message-ID: > On Jan 2, 2015, at 1:50 PM, Kevin Swab wrote: > > I've run 'sg_logs' on the drive I pulled last week. There were alot of > errors in the backgroud scan section of the output, which made it very > large, so I put it here: > > http://pastebin.com/jx5BvSep > > When I pulled this drive, the SMART health status was OK. SMART isn?t smart :-P > However, when > I put it in a test system to run 'sg_logs', the status changed to > "impending failure...". Had the SMART status changed before pulling the > drive, I'm sure 'fmd' would have alerted me to the problem? By default, fmd looks for the predictive failure (PFA) and self-test every hour using the disk_transport agent. fmstat should show activity there. When a PFA is seen, then there will be an ereport generated and, for most cases, a syslog message. However, this will not cause a zfs-retire event. Vendors have significant leeway in how they implement SMART. In my experience the only thing you can say for sure is if the vendor thinks the drive?s death is imminent, then you should replace it. I suspect these policies are financially motivated rather than scientific? some amount of truthiness is to be expected. 
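If you want to double-check what fmd has seen on your side, something like this is usually enough (the module and ereport class names here are from memory, so treat them as approximate and adjust to what your release actually reports):

# fmstat | grep -i disk
# fmdump -e | grep -i disk

The first should show the disk-transport agent and its event counts; the second lists any disk-related ereports that have been posted, including the PFA reports mentioned above.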
In the logs, clearly the one disk has lots of errors that have been corrected and the rate is increasing. The rate of change for "Errors corrected with possible delays" may correlate to your performance issues, but the interpretation is left up to the vendors. In the case of this naughty drive, yep it needs replacing. > > Since that drive had other indications of trouble, I ran 'sg_logs' on > another drive I pulled recently that has a SMART health status of OK, > but exibits similar slow service time behavior: > > http://pastebin.com/Q0t8Jnug This one looks mostly healthy. Another place to look for latency issues is the phy logs. In the sg_logs output, this is the Protocol Specific port log page for SAS SSP. Key values are running disparity error count and loss of dword sync count. The trick here is that you need to look at both ends of the wire for each wire. For a simple case, this means looking at both the HBA's phys error counts and the drive's. If you have expanders in the mix, it is more work. You'll want to look at all of the HBA, expander, and drive phys health counters for all phys. This can get tricky because wide ports are mostly dumb. For example, if an HBA has a 4-link wide port (common) and one of the links is acting up (all too common) the latency impacts will be random. To see HBA and expander link health, you can use sg3_utils, its companion smp_utils, or sasinfo (installed as a separate package from OmniOS, IIRC). For example, sasinfo hba-port -l HTH -- richard > > Thanks for taking the time to look at these, please let me know what you > find... > > Kevin > > > > > On 12/31/2014 06:13 PM, Richard Elling wrote: >> >>> On Dec 31, 2014, at 4:30 PM, Kevin Swab wrote: >>> >>> Hello Richard and group, thanks for your reply! >>> >>> I'll look into sg_logs for one of these devices once I have a chance to >>> track that progam down... >>> >>> Thanks for the tip on the 500 ms latency, I wasn't aware that could >>> happen in normal cases. However, I don't believe what I'm seeing >>> constitutes normal behavior. >>> >>> First, some anecdotal evidence: If I pull and replace the suspect >>> drive, my downstream systems stop complaining, and the high service time >>> numbers go away. >> >> We call these "wounded soldiers" -- it takes more resources to manage a >> wounded soldier than a dead soldier, so one strategy of war is to wound your >> enemy causing them to consume resources tending the wounded. The sg_logs >> should be enlightening. >> >> NB, consider a 4TB disk with 5 platters: if a head or surface starts to go, then >> you have a 1/10 chance that the data you request is under the damaged head >> and will need to be recovered by the drive. So it is not uncommon to see >> 90+% of the I/Os to the drive completing quickly. It is also not unusual to see >> only a small number of sectors or tracks affected. >> >> Detecting these becomes tricky, especially as you reduce the timeout/retry >> interval, since the problem is rarely seen in the average latency -- that which >> iostat and sar record. This is an area where we can and are improving. >> -- richard >> >>> >>> I threw out 500 ms as a guess to the point at which I start seeing >>> problems. However, I see service times far in excess of that, sometimes >>> over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled >>> a few days ago, during a time when downstream VMWare servers were >>> complaining.
(since the sar output is so verbose, I grepped out the >>> info just for the suspect drive): >>> >>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' >>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>> sd91,a 99 5.3 1 42 0.0 7811.7 >>> sd91,a 100 11.3 1 53 0.0 11016.0 >>> sd91,a 100 3.8 1 75 0.0 3615.8 >>> sd91,a 100 4.9 1 25 0.0 8633.5 >>> sd91,a 93 3.9 1 55 0.0 4385.3 >>> sd91,a 86 3.5 2 75 0.0 2060.5 >>> sd91,a 91 3.1 4 80 0.0 823.8 >>> sd91,a 97 3.5 1 50 0.0 3984.5 >>> sd91,a 100 4.4 1 56 0.0 6068.6 >>> sd91,a 100 5.0 1 55 0.0 8836.0 >>> sd91,a 100 5.7 1 51 0.0 7939.6 >>> sd91,a 98 9.9 1 42 0.0 12526.8 >>> sd91,a 100 7.4 0 10 0.0 36813.6 >>> sd91,a 51 3.8 8 90 0.0 500.2 >>> sd91,a 88 3.4 1 60 0.0 2338.8 >>> sd91,a 100 4.5 1 28 0.0 6969.2 >>> sd91,a 93 3.8 1 59 0.0 5138.9 >>> sd91,a 79 3.1 1 59 0.0 3143.9 >>> sd91,a 99 4.7 1 52 0.0 5598.4 >>> sd91,a 100 4.8 1 62 0.0 6638.4 >>> sd91,a 94 5.0 1 54 0.0 3752.7 >>> >>> For comparison, here's the sar output from another drive in the same >>> pool for the same period of time: >>> >>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' >>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>> sd82,a 0 0.0 2 28 0.0 5.6 >>> sd82,a 1 0.0 3 51 0.0 5.4 >>> sd82,a 1 0.0 4 66 0.0 6.3 >>> sd82,a 1 0.0 3 48 0.0 4.3 >>> sd82,a 1 0.0 3 45 0.0 6.1 >>> sd82,a 1 0.0 6 82 0.0 2.7 >>> sd82,a 1 0.0 8 112 0.0 2.8 >>> sd82,a 0 0.0 3 27 0.0 1.8 >>> sd82,a 1 0.0 5 80 0.0 3.1 >>> sd82,a 0 0.0 3 35 0.0 3.1 >>> sd82,a 1 0.0 3 35 0.0 3.8 >>> sd82,a 1 0.0 4 49 0.0 3.2 >>> sd82,a 0 0.0 0 0 0.0 4.1 >>> sd82,a 3 0.0 9 84 0.0 4.1 >>> sd82,a 1 0.0 6 55 0.0 3.7 >>> sd82,a 0 0.0 1 23 0.0 7.0 >>> sd82,a 0 0.0 6 57 0.0 1.8 >>> sd82,a 1 0.0 5 70 0.0 2.3 >>> sd82,a 1 0.0 4 55 0.0 3.7 >>> sd82,a 1 0.0 5 72 0.0 4.1 >>> sd82,a 1 0.0 4 54 0.0 3.6 >>> >>> The other drives in this pool all show data similar to that of sd82. >>> >>> Your point about tuning blindly is well taken, and I'm certainly no >>> expert on the IO stack. What's a humble sysadmin to do? >>> >>> For further reference, this system is running r151010. 
The drive in >>> question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive >>> was in is using lz4 compression and looks like this: >>> >>> # zpool status data1 >>> pool: data1 >>> state: ONLINE >>> scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 14:40:20 2014 >>> config: >>> >>> NAME STATE READ WRITE CKSUM >>> data1 ONLINE 0 0 0 >>> raidz2-0 ONLINE 0 0 0 >>> c6t5000039468CB54F0d0 ONLINE 0 0 0 >>> c6t5000039478CB5138d0 ONLINE 0 0 0 >>> c6t5000039468D000DCd0 ONLINE 0 0 0 >>> c6t5000039468D000E8d0 ONLINE 0 0 0 >>> c6t5000039468D00F5Cd0 ONLINE 0 0 0 >>> c6t5000039478C816CCd0 ONLINE 0 0 0 >>> c6t5000039478C8546Cd0 ONLINE 0 0 0 >>> raidz2-1 ONLINE 0 0 0 >>> c6t5000039478C855F0d0 ONLINE 0 0 0 >>> c6t5000039478C856E8d0 ONLINE 0 0 0 >>> c6t5000039478C856ECd0 ONLINE 0 0 0 >>> c6t5000039478C856F4d0 ONLINE 0 0 0 >>> c6t5000039478C86374d0 ONLINE 0 0 0 >>> c6t5000039478C8C2A8d0 ONLINE 0 0 0 >>> c6t5000039478C8C364d0 ONLINE 0 0 0 >>> raidz2-2 ONLINE 0 0 0 >>> c6t5000039478C9958Cd0 ONLINE 0 0 0 >>> c6t5000039478C995C4d0 ONLINE 0 0 0 >>> c6t5000039478C9DACCd0 ONLINE 0 0 0 >>> c6t5000039478C9DB30d0 ONLINE 0 0 0 >>> c6t5000039478C9DB6Cd0 ONLINE 0 0 0 >>> c6t5000039478CA73B4d0 ONLINE 0 0 0 >>> c6t5000039478CB3A20d0 ONLINE 0 0 0 >>> raidz2-3 ONLINE 0 0 0 >>> c6t5000039478CB3A64d0 ONLINE 0 0 0 >>> c6t5000039478CB3A70d0 ONLINE 0 0 0 >>> c6t5000039478CB3E7Cd0 ONLINE 0 0 0 >>> c6t5000039478CB3EB0d0 ONLINE 0 0 0 >>> c6t5000039478CB3FBCd0 ONLINE 0 0 0 >>> c6t5000039478CB4048d0 ONLINE 0 0 0 >>> c6t5000039478CB4054d0 ONLINE 0 0 0 >>> raidz2-4 ONLINE 0 0 0 >>> c6t5000039478CB424Cd0 ONLINE 0 0 0 >>> c6t5000039478CB4250d0 ONLINE 0 0 0 >>> c6t5000039478CB470Cd0 ONLINE 0 0 0 >>> c6t5000039478CB471Cd0 ONLINE 0 0 0 >>> c6t5000039478CB4E50d0 ONLINE 0 0 0 >>> c6t5000039478CB50A8d0 ONLINE 0 0 0 >>> c6t5000039478CB50BCd0 ONLINE 0 0 0 >>> spares >>> c6t50000394A8CBC93Cd0 AVAIL >>> >>> errors: No known data errors >>> >>> >>> Thanks for your help, >>> Kevin >>> >>> On 12/31/2014 3:22 PM, Richard Elling wrote: >>>> >>>>> On Dec 31, 2014, at 11:25 AM, Kevin Swab wrote: >>>>> >>>>> Hello Everyone, >>>>> >>>>> We've been running OmniOS on a number of SuperMicro 36bay chassis, with >>>>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >>>>> various SAS HDD's. These systems are serving block storage via Comstar >>>>> and Qlogic FC HBA's, and have been running well for several years. >>>>> >>>>> The problem we've got is that as the drives age, some of them start to >>>>> perform slowly (intermittently) without failing - no zpool or iostat >>>>> errors, and nothing logged in /var/adm/messages. The slow performance >>>>> can be seen as high average service times in iostat or sar. >>>> >>>> Look at the drive's error logs using sg_logs (-a for all) >>>> >>>>> >>>>> When these service times get above 500ms, they start to cause IO >>>>> timeouts on the downstream storage consumers, which is bad... >>>> >>>> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or SATA NCQ >>>> >>>>> >>>>> I'm wondering - is there a way to tune OmniOS' behavior so that it >>>>> doesn't try so hard to complete IOs to these slow disks, and instead >>>>> just gives up and fails them? >>>> >>>> Yes, the tuning in Alasdair's blog should work as he describes. More below... 
>>>> >>>>> >>>>> I found an old post from 2011 which states that some tunables exist, >>>>> but are ignored by the mpt_sas driver: >>>>> >>>>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >>>>> >>>>> Does anyone know the current status of these tunables, or have any other >>>>> suggestions that might help? >>>> >>>> These tunables are on the order of seconds. The default, 60, is obviously too big >>>> unless you have old, slow, SCSI CD-ROMs. But setting it below the manufacturer's >>>> internal limit (default or tuned) can lead to an unstable system. Some vendors are >>>> better than others at documenting these, but in any case you'll need to see their spec. >>>> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. >>>> >>>> There are a lot of tunables in this area at all levels of the architecture. OOB, the OmniOS >>>> settings ensure stable behaviour. Tuning any layer without understanding the others can >>>> lead to unstable systems, as demonstrated by your current downstream consumers. >>>> -- richard >>>> >>>> >>>>> >>>>> Thanks, >>>>> Kevin >>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------- >>>>> Kevin Swab UNIX Systems Administrator >>>>> ACNS Colorado State University >>>>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >>>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>>>> _______________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.com >>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>> > > -- > ------------------------------------------------------------------- > Kevin Swab UNIX Systems Administrator > ACNS Colorado State University > Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU > GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C -- Richard.Elling at RichardElling.com +1-760-896-4422 -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Sat Jan 3 03:11:33 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 2 Jan 2015 22:11:33 -0500 Subject: [OmniOS-discuss] Kernel panic - I cant find the problem In-Reply-To: <1531165597.20141231214048@tierarzt-mueller.de> References: <1531165597.20141231214048@tierarzt-mueller.de> Message-ID: > On Dec 31, 2014, at 3:40 PM, Alexander Lesle wrote: > > See if dumpadm(1M) shows you have a working place to store kernel crash dumps. If you do, a "savecore" should get you a vmdump.N file. Having a full vmdump.N is useful, and is something people can inspect. Thanks, Dan From danmcd at omniti.com Sat Jan 3 03:16:03 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 2 Jan 2015 22:16:03 -0500 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150101190907.2a17aa48@emeritus> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> Message-ID: <732E6EC6-62DE-43F4-8142-9FEB025E61CF@omniti.com> Pardon the latency... > On Jan 1, 2015, at 4:09 AM, Michael Mounteney wrote: > > Just for the record, I now believe that the cause of the problem is > excessive heat. The server is located in a domestic garage in > Brisbane, Queensland, and the addition of a larger UPS to the rig has > increased the amount of heat in there. The server began to suffer from > loss of disk drives (i.e., any activity involving the disk would hang) > but hasn't faulted since I took measures to keep the temperature down. 
zOMG (as the kids would say) that's terrible! Thanks for the update. Dan From henson at acm.org Sat Jan 3 03:39:41 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 2 Jan 2015 19:39:41 -0800 Subject: [OmniOS-discuss] state of building illumos-gate on omnios-stable In-Reply-To: References: <20141215043037.GA29549@bender.unx.csupomona.edu> <80F9B9EA-5308-4CAA-90B8-1206B03033B9@omniti.com> Message-ID: <20150103033941.GM29549@bender.unx.csupomona.edu> So I've gotten my illumos-omnios build zone going, with the following changes from stock illumos.sh: 47,48c47 < export NIGHTLY_OPTIONS='-FnCDAlmprt' --- > export NIGHTLY_OPTIONS='-nCDlpr' 60c59 < export GATE='testws' --- > export GATE='illumos-omnios-5410.5412' 63c62 < export CODEMGR_WS="$HOME/ws/$GATE" --- > export CODEMGR_WS=/code/work/$GATE 109c108 < export PARENT_WS='' --- > export PARENT_WS=$CODEMGR_WS 112c111 < export CLONE_WS='ssh://anonhg at hg.illumos.org/illumos-gate' --- > export CLONE_WS=$CODEMGR_WS 187c186 < # export PKGPUBLISHER_REDIST='on-redist' --- > #export PKGPUBLISHER_REDIST='omnios' 231a231,240 > export __GNUC='' > export __GNUC4='' > export GCC_ROOT="/opt/gcc-4.4.4" > export CW_GCC_DIR="$GCC_ROOT/bin" > export CW_NO_SHADOW=1 > export ONNV_BUILDNUM=151012 > export MULTI_PROTO=yes > export RELEASE_DATE=2014.12.19 > MAKEFLAGS=k; export MAKEFLAGS > ONLY_LINT_DEFS=-I${SPRO_ROOT}/sunstudio12.1/prod/include/lint; export > ONLY_LINT_DEFS All seems to work ok, except for a failure in the package build section: ==== package build errors (non-DEBUG) ==== dmake: Warning: Command failed for target `packages.i386/system-boot-network.dep' dmake: Warning: Target `install' not remade because of errors I'm not actually using the packages so it doesn't really impact my use case if one fails to build, but any thoughts on why? I'm at: commit 5cc1c75be11cdbfccae217ada6811ad7e60ff1a6 Merge: eb4c8f3 a846f19 Author: Dan McDonald Date: Fri Dec 19 14:09:03 2014 -0500 Merge branch 'upstream' From henson at acm.org Sat Jan 3 03:50:19 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 2 Jan 2015 19:50:19 -0800 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> <20150102115357.082d59ff@emeritus> Message-ID: <20150103035019.GN29549@bender.unx.csupomona.edu> On Fri, Jan 02, 2015 at 09:35:24AM -0800, Tim Rice wrote: > Then just get a SATA to ESATA cable and find a way to get the ESATA > end outside the case. Do you have an unused pci(e) cover plate you can swap out? http://www.amazon.com/KingWin-Esata-Bracket-Cable-ESAC-02/dp/B002TMPWH8 I believe the cables this comes with plug into standard SATA ports on the motherboard and then provide externel esata connectors. From henson at acm.org Sat Jan 3 04:13:20 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 2 Jan 2015 20:13:20 -0800 Subject: [OmniOS-discuss] state of building illumos-gate on omnios-stable In-Reply-To: <20150103033941.GM29549@bender.unx.csupomona.edu> References: <20141215043037.GA29549@bender.unx.csupomona.edu> <80F9B9EA-5308-4CAA-90B8-1206B03033B9@omniti.com> <20150103033941.GM29549@bender.unx.csupomona.edu> Message-ID: <20150103041320.GP29549@bender.unx.csupomona.edu> On Fri, Jan 02, 2015 at 07:39:41PM -0800, Paul B. 
Henson wrote: > > dmake: Warning: Command failed for target > `packages.i386/system-boot-network.dep' > dmake: Warning: Target `install' not remade because of errors Ah, I see there are more details on this hidden in the midst of the nightly log: Generating dependencies for system-boot-network.mog Unable to generate SMF dependency on svc:/system/system-log declared in /code/wo rk/illumos-omnios-5410.5412/proto/root_i386/lib/svc/manifest/network/rarp.xml by svc:/network/rarp:default: FMRI is delivered by multiple files: set(['/var/svc/manifest/site/syslog-ng.xml', '/code/work/illumos-omnios-5410.5412/proto/root_i386/lib/svc/manifest/system/system-log.xml']) I use syslog-ng on my boxes: disabled Dec_21 svc:/system/system-log:default online Dec_21 svc:/system/system-log:syslog-ng I thought the build systems was isolated from the host and only only used stuff in the build or prototype area? Why is it paying attention to a service I have installed on the host system? From gate03 at landcroft.co.uk Sat Jan 3 04:21:11 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sat, 3 Jan 2015 14:21:11 +1000 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150103035019.GN29549@bender.unx.csupomona.edu> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> <20150102115357.082d59ff@emeritus> <20150103035019.GN29549@bender.unx.csupomona.edu> Message-ID: <20150103142111.7570e9be@emeritus> On Fri, 2 Jan 2015 19:50:19 -0800 "Paul B. Henson" wrote: > On Fri, Jan 02, 2015 at 09:35:24AM -0800, Tim Rice wrote: > > Do you have an unused pci(e) cover plate you can swap out? Nope. :-( http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm It has one, which I'm saving for an ethernet card. I suppose I could get mediaeval and do some casemodding, but really, I should have bought something bigger and more flexible but I'm stuck with it now. Michael. From henson at acm.org Sat Jan 3 04:35:34 2015 From: henson at acm.org (Paul B. Henson) Date: Fri, 2 Jan 2015 20:35:34 -0800 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150103142111.7570e9be@emeritus> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> <20150102115357.082d59ff@emeritus> <20150103035019.GN29549@bender.unx.csupomona.edu> <20150103142111.7570e9be@emeritus> Message-ID: <20150103043534.GQ29549@bender.unx.csupomona.edu> On Sat, Jan 03, 2015 at 02:21:11PM +1000, Michael Mounteney wrote: > http://www.supermicro.com/products/system/1U/5017/SYS-5017C-LF.cfm > > It has one, which I'm saving for an ethernet card. I suppose I could > get mediaeval and do some casemodding, but really, I should have bought > something bigger and more flexible but I'm stuck with it now. Ah, I've got a similar unit that's serving as a PBX. There's a pop out above the vga/serial ports that I think is intended for a parallel port. You should be able to break it out and thread some SATA cables through it? Hmm, depending on how handy you are with a dremel you might even be able to buy one of those PCI slot cover cable bundles, unscrew the connector from the cover plate, and attach it through one of the slots in the parallel pop out plate without removing it from the case. Done well enough I bet it would pass for stock ;). Send pictures if you do it :), good luck... 
From mark0x01 at gmail.com Sat Jan 3 04:40:21 2015 From: mark0x01 at gmail.com (Mark) Date: Sat, 03 Jan 2015 17:40:21 +1300 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150103035019.GN29549@bender.unx.csupomona.edu> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> <20150102115357.082d59ff@emeritus> <20150103035019.GN29549@bender.unx.csupomona.edu> Message-ID: <54A772B5.8080505@gmail.com> On 3/01/2015 4:50 p.m., Paul B. Henson wrote: > On Fri, Jan 02, 2015 at 09:35:24AM -0800, Tim Rice wrote: > >> Then just get a SATA to ESATA cable and find a way to get the ESATA >> end outside the case. > > Do you have an unused pci(e) cover plate you can swap out? > > http://www.amazon.com/KingWin-Esata-Bracket-Cable-ESAC-02/dp/B002TMPWH8 > > I believe the cables this comes with plug into standard SATA ports on > the motherboard and then provide externel esata connectors. > You could always adapt the esata bracket to fit in the spare parallel port knockout on the back of the chassis, or just hang the cable out the back. > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > From gate03 at landcroft.co.uk Sat Jan 3 05:14:17 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sat, 3 Jan 2015 15:14:17 +1000 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150103043534.GQ29549@bender.unx.csupomona.edu> References: <20141223120402.6714c763@emeritus> <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> <20150102115357.082d59ff@emeritus> <20150103035019.GN29549@bender.unx.csupomona.edu> <20150103142111.7570e9be@emeritus> <20150103043534.GQ29549@bender.unx.csupomona.edu> Message-ID: <20150103151417.22c4f37c@emeritus> On Fri, 2 Jan 2015 20:35:34 -0800 "Paul B. Henson" wrote: > On Sat, Jan 03, 2015 at 02:21:11PM +1000, Michael Mounteney wrote: > Ah, I've got a similar unit that's serving as a PBX. There's a pop out > above the vga/serial ports that I think is intended for a parallel > port. You should be able to break it out and thread some SATA cables > through it? Hmm, depending on how handy you are with a dremel you > might even be able to buy one of those PCI slot cover cable bundles, > unscrew the connector from the cover plate, and attach it through one > of the slots in the parallel pop out plate without removing it from > the case. Done well enough I bet it would pass for stock ;). Send > pictures if you do it :), good luck... Yes, I was thinking of that but the back of the server is somewhat inaccessible and I haven't had the opportunity to investigate yet. The idea of spraying the interior of the server with metal fragments as I mash out the cutout is unattractive as well, I have to say. Thanks also to Mark who just came in with the same idea. I think this is the way to go. I'm investigating the parts now. Hardware isn't my core skill-set so I have to obtain advice first. Michael. From henson at acm.org Sat Jan 3 05:33:49 2015 From: henson at acm.org (Paul B. 
Henson) Date: Fri, 2 Jan 2015 21:33:49 -0800 Subject: [OmniOS-discuss] sudden loss of networking In-Reply-To: <20150103151417.22c4f37c@emeritus> References: <20150101190907.2a17aa48@emeritus> <20150102091924.6210d489@emeritus> <20150102010311.6a7b482f@sleipner.datanom.net> <20150102115357.082d59ff@emeritus> <20150103035019.GN29549@bender.unx.csupomona.edu> <20150103142111.7570e9be@emeritus> <20150103043534.GQ29549@bender.unx.csupomona.edu> <20150103151417.22c4f37c@emeritus> Message-ID: <20150103053349.GS29549@bender.unx.csupomona.edu> On Sat, Jan 03, 2015 at 03:14:17PM +1000, Michael Mounteney wrote: > Yes, I was thinking of that but the back of the server is somewhat > inaccessible and I haven't had the opportunity to investigate yet. The > idea of spraying the interior of the server with metal fragments as I > mash out the cutout is unattractive as well, I have to say. Granted ;). Perhaps a better plan would be to just pop out the case cover plate (which should break away easily, as that's what it's meant to do), and then modify the PCI cover plate away from the unit to fit the slot. It's hard to eyeball, but I think the width of the cover plate is bigger than the height of the parallel port breakout. You just need the one, right? Cut the plate in half, unscrew the cable from it, hold the plate piece in front of the cutout, and screw in the cable from behind. If you're lucky the profile of the cable connector will be big enough to clamp the case between the plate in front and the connector in rear. If not, you'll need to get a bit creative and use a couple other cut off pieces of the cover plate or the piece you broke off on the inside to hold it together. There are also a couple of breakaway covers above the ethernet ports that might be massaged into ESATA ports too... From gate03 at landcroft.co.uk Sat Jan 3 10:37:27 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sat, 3 Jan 2015 20:37:27 +1000 Subject: [OmniOS-discuss] succinct expression of NFS share parameters Message-ID: <20150103203727.55e9ae78@emeritus> On some of my ZFS volumes, the sharenfs parameter has the value: sec=sys,rw=@192.168.1.0/24:@192.168.2.0/24:@192.168.3.0/24,root=@192.168.1.0/24:@192.168.2.0/24:@192.168.3.0/24 Can that be expressed more succinctly, but still equivalently? Michael. From omnios at citrus-it.net Sat Jan 3 14:00:13 2015 From: omnios at citrus-it.net (Andy) Date: Sat, 3 Jan 2015 14:00:13 +0000 (GMT) Subject: [OmniOS-discuss] succinct expression of NFS share parameters In-Reply-To: <20150103203727.55e9ae78@emeritus> References: <20150103203727.55e9ae78@emeritus> Message-ID: On Sat, 3 Jan 2015, Michael Mounteney wrote: ; On some of my ZFS volumes, the sharenfs parameter has the value: ; ; sec=sys,rw=@192.168.1.0/24:@192.168.2.0/24:@192.168.3.0/24,root=@192.168.1.0/24:@192.168.2.0/24:@192.168.3.0/24 ; ; Can that be expressed more succinctly, but still equivalently? Not much. @192.168.2.0/23 would cover both 192.168.2.0/24 and 192.168.3.0/24 You could use @192.168.0.0/22 which covers the three ranges you have but also includes 192.168.0.0/24 which you might not want. HTH. 
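With the /23, the property would shrink to something like this (the dataset name below is just a placeholder, so adjust and test before relying on it):

# zfs set sharenfs='sec=sys,rw=@192.168.1.0/24:@192.168.2.0/23,root=@192.168.1.0/24:@192.168.2.0/23' tank/volume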
Andy -- Citrus IT Limited | +44 (0)870 199 8000 | enquiries at citrus-it.co.uk Rock House Farm | Green Moor | Wortley | Sheffield | S35 7DQ Registered in England and Wales | Company number 4899123 From gate03 at landcroft.co.uk Sat Jan 3 18:47:44 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sun, 4 Jan 2015 04:47:44 +1000 Subject: [OmniOS-discuss] succinct expression of NFS share parameters In-Reply-To: References: <20150103203727.55e9ae78@emeritus> Message-ID: <20150104044744.01e8fac8@emeritus> On Sat, 3 Jan 2015 14:00:13 +0000 (GMT) Andy wrote: > Not much. @192.168.2.0/23 would cover both 192.168.2.0/24 and > 192.168.3.0/24 You could use @192.168.0.0/22 which covers the three > ranges you have but also includes 192.168.0.0/24 which you might not > want. Hmm, seems a bit 'hacky' but thanks for the reply. Michael. From gate03 at landcroft.co.uk Sun Jan 4 05:24:10 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sun, 4 Jan 2015 15:24:10 +1000 Subject: [OmniOS-discuss] kvm io 10 times slower after r151010 -> r151012 upgrade In-Reply-To: References: <20141210201540.6fe5cfee@emeritus> Message-ID: <20150104152410.4a07f8e5@pantry> Sorry to take so long to get back to you Tobias and I hope this is still relevant. As described elsewhere in this list, I had temporarily to downgrade ssh to achieve interoperability between the OmniOS (bloody) host and the Gentoo Linux guests. First, ssh imposes some overhead: mounty at pantry ~ $ time ssh people exit real 0m0.724s user 0m0.032s sys 0m0.012s that real figure averages around the 0.750s mark. So I decided to perform much bigger transfers to minimise its effect: mounty at pantry ~ $ dd if=/dev/zero bs=1M count=2000 | ssh people dd of=/dev/null 2000+0 records in 2000+0 records out 2097152000 bytes (2.1 GB) copied, 138.436 s, 15.1 MB/s 4096000+0 records in 4096000+0 records out 2097152000 bytes transferred in 137.657582 secs (15234555 bytes/sec) mounty at pantry ~ $ ssh people dd if=/dev/zero bs=1M count=2000 | dd of=/dev/null 2000+0 records in 2000+0 records out 2097152000 bytes transferred in 51.692313 secs (40569901 bytes/sec) 4096000+0 records in 4096000+0 records out 2097152000 bytes (2.1 GB) copied, 52.4503 s, 40.0 MB/s It is puzzling that the in and out figures are so different but I did perform each test three times and the results were approximately the same each time. On the read-off-disk test, here are all three runs: pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=1000 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 65.3406 s, 16.0 MB/s pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=1000 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 1.19789 s, 875 MB/s pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=1000 1000+0 records in 1000+0 records out 1048576000 bytes (1.0 GB) copied, 1.85877 s, 564 MB/s which I've quoted to show that the disk must be cached. So I tried again with more data to eliminate that effect: pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=10240 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB) copied, 710.215 s, 15.1 MB/s I hope that's helpful. Michael. From gate03 at landcroft.co.uk Sun Jan 4 09:42:37 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sun, 4 Jan 2015 19:42:37 +1000 Subject: [OmniOS-discuss] networking from a zone Message-ID: <20150104194237.67708802@pantry> Hello, my server is running a fairly simple firewall. 
The machine has two interfaces: e1000g0 192.168.0.n/24 connected to the cable modem and the internet. e1000g1 192.168.1.1/24 connected to a hub and hence various client machines. The firewall is basically as per http://pastebin.com/4aYyZhJ8 and while this works well for the clients, I can't make it work for a zone. I've got one zone which shares the e1000g1 interface, which provides various internal services which I don't want visible to the outside world, but another zone, which shares the e1000g0 interface, I *do* want to be able to see the outside world, but it won't do much. I can ping an external IP address, but can't do ssh (to an IP address) or DNS for example. Any ideas ? Thanks in expectation. Michael. From tobi at oetiker.ch Sun Jan 4 11:37:48 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Sun, 4 Jan 2015 12:37:48 +0100 (CET) Subject: [OmniOS-discuss] kvm io 10 times slower after r151010 -> r151012 upgrade In-Reply-To: <20150104152410.4a07f8e5@pantry> References: <20141210201540.6fe5cfee@emeritus> <20150104152410.4a07f8e5@pantry> Message-ID: Hi Michael, so your tests wer now exectued on a bloody host ? indicating that the performance went back up in bloody ? cheers tobi Today Michael Mounteney wrote: > Sorry to take so long to get back to you Tobias and I hope this is > still relevant. As described elsewhere in this list, I had temporarily > to downgrade ssh to achieve interoperability between the OmniOS (bloody) > host and the Gentoo Linux guests. > > First, ssh imposes some overhead: > > mounty at pantry ~ $ time ssh people exit > > real 0m0.724s > user 0m0.032s > sys 0m0.012s > > that real figure averages around the 0.750s mark. So I decided to > perform much bigger transfers to minimise its effect: > > mounty at pantry ~ $ dd if=/dev/zero bs=1M count=2000 | ssh people dd of=/dev/null > 2000+0 records in > 2000+0 records out > 2097152000 bytes (2.1 GB) copied, 138.436 s, 15.1 MB/s > 4096000+0 records in > 4096000+0 records out > 2097152000 bytes transferred in 137.657582 secs (15234555 bytes/sec) > > mounty at pantry ~ $ ssh people dd if=/dev/zero bs=1M count=2000 | dd of=/dev/null > 2000+0 records in > 2000+0 records out > 2097152000 bytes transferred in 51.692313 secs (40569901 bytes/sec) > 4096000+0 records in > 4096000+0 records out > 2097152000 bytes (2.1 GB) copied, 52.4503 s, 40.0 MB/s > > It is puzzling that the in and out figures are so different but I did > perform each test three times and the results were approximately the > same each time. On the read-off-disk test, here are all three runs: > > pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=1000 > 1000+0 records in > 1000+0 records out > 1048576000 bytes (1.0 GB) copied, 65.3406 s, 16.0 MB/s > pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=1000 > 1000+0 records in > 1000+0 records out > 1048576000 bytes (1.0 GB) copied, 1.19789 s, 875 MB/s > pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=1000 > 1000+0 records in > 1000+0 records out > 1048576000 bytes (1.0 GB) copied, 1.85877 s, 564 MB/s > > which I've quoted to show that the disk must be cached. So I tried > again with more data to eliminate that effect: > > pantry ~ # dd if=/dev/vda of=/dev/zero bs=1M count=10240 > 10240+0 records in > 10240+0 records out > 10737418240 bytes (11 GB) copied, 710.215 s, 15.1 MB/s > > I hope that's helpful. > > Michael. 
> _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From jimklimov at cos.ru Sun Jan 4 11:49:09 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Sun, 04 Jan 2015 12:49:09 +0100 Subject: [OmniOS-discuss] networking from a zone In-Reply-To: <20150104194237.67708802@pantry> References: <20150104194237.67708802@pantry> Message-ID: <364196AD-A144-4DAD-870F-475F1C6A0649@cos.ru> On 4 January 2015 10:42:37 CET, Michael Mounteney wrote: >Hello, my server is running a fairly simple firewall. The machine has >two interfaces: > >e1000g0 192.168.0.n/24 connected to the cable modem and the internet. >e1000g1 192.168.1.1/24 connected to a hub and hence various client >machines. > >The firewall is basically as per http://pastebin.com/4aYyZhJ8 and while >this works well for the clients, I can't make it work for a zone. I've >got one zone which shares the e1000g1 interface, which provides various >internal services which I don't want visible to the outside world, but >another zone, which shares the e1000g0 interface, I *do* want to be >able >to see the outside world, but it won't do much. I can ping an external >IP address, but can't do ssh (to an IP address) or DNS for example. > >Any ideas ? Thanks in expectation. > >Michael. >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss Hello, by "sharing e1000gX" you mean shared IP stacks (special case of aliases) vs. exclusive stacks (over dedicated NICs, or VNICs bound to NICs)? For the exclusive case, you set up complete routing (i.e. default gateway) in the zone. For the shared case, the zone's interfaces are aliases to NICs used in the GZ and use its IP routing and ARP tables. Also, by default at least in older OpenSolaris IP stacks, communications within one stack bypassed L3-L2-L3 conversion and firewalls for speed and were essentially loopback comms. In your case, the zone which 'shares' the internal e1000g1 can't use its 192.168.1.1 as a router, because the GZ does not have itself as a router, but it seems acceptable for you. Possibly comms between two zones in different subnets work already or can be enabled as that loopback bypass (maybe ipfilter ipf.conf loopback keyword had to do with that). The zone sharing the e1000g0 interface should inherit the GZ's default route via 192.168.0.1(?) modem to the internet and so if the zone has an address in that subnet, and the modem does not filter it away (check lan/mac/acl controls of that router) - the internet should work... Just FYI, there were also less apparent 'fake router' tricks to add support for roiting a different subnet in a shared-stack LZ than what is bound to the GZ - essentially you added an address and support for that subnet on the router, and added a static entry into GZ's ARP table and another default route to the same external router with different address - then it could route an LZ too. For debugging, you can 'snoop' in the zone owning the interface (GZ for shared, LZ for dedicated VNICs) to check what requests go out and what does or does not come back in. 
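For instance, run from the GZ (shared-stack case), something along these lines should show whether the zone's DNS and ssh packets actually leave e1000g0 and whether the replies come back - the zone address here is only a placeholder:

# snoop -d e1000g0 host 192.168.0.50
# snoop -d e1000g0 host 192.168.0.50 and port 53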
I did not look into pastebin, but maybe your GZ firewall does not allow non-icmp packets to/from the zone's IP address on the external interface, or the modem firewall may be to blame... HTH, //Jim -- Typos courtesy of K-9 Mail on my Samsung Android From jimklimov at cos.ru Sun Jan 4 12:02:42 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Sun, 04 Jan 2015 13:02:42 +0100 Subject: [OmniOS-discuss] networking from a zone In-Reply-To: <20150104194237.67708802@pantry> References: <20150104194237.67708802@pantry> Message-ID: <16378959-E563-49F8-86DC-1B4DBC5EB731@cos.ru> On 4 January 2015 10:42:37 CET, Michael Mounteney wrote: >Hello, my server is running a fairly simple firewall. The machine has >two interfaces: > >e1000g0 192.168.0.n/24 connected to the cable modem and the internet. >e1000g1 192.168.1.1/24 connected to a hub and hence various client >machines. > >The firewall is basically as per http://pastebin.com/4aYyZhJ8 and while >this works well for the clients, I can't make it work for a zone. I've >got one zone which shares the e1000g1 interface, which provides various >internal services which I don't want visible to the outside world, but >another zone, which shares the e1000g0 interface, I *do* want to be >able >to see the outside world, but it won't do much. I can ping an external >IP address, but can't do ssh (to an IP address) or DNS for example. > >Any ideas ? Thanks in expectation. > >Michael. >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss Now that I looked over your pastebin, a few things pop out: 1) why not use 'head' and 'group' for different directions on different interfaces? This is especially nice for flexibility as you may later add, change or rename interfaces without going all over the ipf.conf file. 2) the rules for e1000g0 in/out comms name the dynamic address for the interface as 'e1000g0/32', which may limit matching to the GZ address. See if replacing this by the subnet /24 fixes the issue? Does the external LZ have a fixed IP address? You could then plug in specific rules for its network access. 3) you start with 'block in quick on e1000g0 from 192.168.0.0/16 to any', which may preclude access to your router and other hosts on the external segment, before consulting further rules below (due to quick) - check if you do want this. Also, before changing anything and after some uptime to gather enough statistics, use 'ipfstat -hion' to see the rule hit counts - especially if any 'allow's do happen after the many 'block quick's. Also instrument all block's with 'log' and check with 'ipmon | grep -w b' what gets thrown away by this firewall. HTH, //Jim Klimov -- Typos courtesy of K-9 Mail on my Samsung Android From rt at steait.net Mon Jan 5 09:08:41 2015 From: rt at steait.net (Rune Tipsmark) Date: Mon, 5 Jan 2015 09:08:41 +0000 Subject: [OmniOS-discuss] offline dedup Message-ID: <1420448919317.14484@steait.net> hi all, does anyone know if offline dedup is something we can expect in the future of ZFS? I have some backup boxes with 50+TB on them and only 32GB Ram and even zdb -S crashes due to lack of memory. Seems complete overkill to put 256+GB ram in a slow backup box... and if I enable dedup as is, it will crash after writing a few TB - reboot required. br, Rune -------------- next part -------------- An HTML attachment was scrubbed...
URL: From minikola at gmail.com Mon Jan 5 11:18:25 2015 From: minikola at gmail.com (Nikolam) Date: Mon, 5 Jan 2015 12:18:25 +0100 Subject: [OmniOS-discuss] offline dedup In-Reply-To: <1420448919317.14484@steait.net> References: <1420448919317.14484@steait.net> Message-ID: On 1/5/15, Rune Tipsmark wrote: > hi all, > > does anyone know if offline dedup is something we can expect in the future > of ZFS? > > I have some backup boxes with 50+TB on them and only 32GB Ram and even zdb > -S crashes due to lack of memory. Seems complete overkill to put 256+GB ram > in a slow backup box... and if I enable dedup as is, it will crash after > writing a few TB - reboot required. As I understand it, the RAM needed for deduplicated data in use relates to how much data is inside the dataset that is deduplicated. So the problem with large datasets is that they need a large amount of RAM reserved during use IF deduplication is turned on. Offline deduplication would be copying one zfs dataset that is not deduplicated to another dataset that has deduplication turned on. That amount of RAM would be required all the time during the use of a dataset with deduplication turned on. But if deduplication is turned off on the new dataset after copying, then that dataset would not use additional RAM during regular use. So, maybe one could make a snapshot after creating the new deduplicated dataset and then turn deduplication off. Then you do your everyday work with dedup off. At the end of the day, you make a snapshot of the new state of the dataset, make a clone of the old snapshot state, turn ON deduplication on the cloned dataset and do a zfs send between the old and new snapshots. Then you turn off deduplication on the new state of the dataset, snapshot it, clone it again, and then unmount the existing working dataset and remount the new one (possibly destroying the old one that is not deduplicated, while retaining the previous snapshot from before the last deduplication). That way you have your offline deduplication, if you write a script for it (a rough sketch follows below) and make sure that you have enough RAM for deduplication of the newly added data, so that it takes a small enough time to be called "offline" and finishes during off-hours. Initial deduplication of existing data in a dataset could need a lot of RAM if there is a lot of data in it. So one might consider moving data to several smaller datasets and deduplicating them separately. Anyway it needs testing, and deduplication generally needs RAM proportional to the data deduplicated. It helps if what I described above is done regularly, while the pool is being filled up. Maybe one can do a full reliable backup (one does do backups - zfs and replication are no substitute for separate physical backups) and then fill the pool in iterations as described, if you are sure you will use that pool with a small amount of RAM plus scripted offline deduplications. The only problem is that even scripted ("offline") deduplication needs a large amount of RAM the moment deduplication is turned ON on a large dataset, so even offline deduplication on a large dataset would be a problem on small-memory systems. MAYBE one could accept the slowdown of using an SSD (or SSDs), or even space on spindle disks, for the deduplication data that would normally live in RAM. That would require really investing in development: writing the deduplication script and extending ZFS to use on-disk/on-SSD space for the deduplication info.
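To make that snapshot/clone rotation a bit more concrete, here is a rough, untested sketch. The pool and dataset names are made up, and it is a simplified variant of the rotation above: dedup is only switched on while the day's increment is received, and the initial full copy is not deduplicated. One-time setup:

# zfs snapshot tank/data@2015-01-05
# zfs send tank/data@2015-01-05 | zfs receive tank/backup/data

Then each night, roughly:

# zfs snapshot tank/data@2015-01-06
# zfs set dedup=on tank/backup/data
# zfs send -i tank/data@2015-01-05 tank/data@2015-01-06 | zfs receive -F tank/backup/data
# zfs set dedup=off tank/backup/data

Old snapshots can be destroyed on both sides once they are no longer needed as the incremental base. Note that the dedup table for everything already written with dedup on to tank/backup/data still has to be consulted while the receive runs, so the RAM relief is only partial, and as said above it all needs testing.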
It is only a question of how much slower that kind of SSD-backed deduplication would be on large datasets, compared to keeping the data in RAM, and whether it would finish in time to be called "offline" during a low-memory machine's "off hours". As a rule of thumb, one needs RAM if one ever wants to use deduplication, and it works nicely if so. If you don't have enough RAM, just forget about deduplication on large datasets or buy more RAM. One can be careful using snapshots to leverage CoW without deduplication. Or invest development money and development hours to extend ZFS to be able to use SSDs or disks for deduplication space during offline deduplication. Maybe such a feature could be called "Copy on write cp" and used in scripts for offline deduplication, instead of turning deduplication ON and OFF. From fabio at fabiorabelo.wiki.br Mon Jan 5 16:58:57 2015 From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=) Date: Mon, 5 Jan 2015 14:58:57 -0200 Subject: [OmniOS-discuss] Marvel based 10 GB network card Message-ID: Hi to all Does anyone know if these new 10 GB cards: http://www.startech.com/Networking-IO/Adapter-Cards/10gb-pcie-nic~ST10000SPEX are, or will be, supported in OmniOS ? They are incredibly affordable .... They work with Linux, with very good performance ! Fábio Rabelo -------------- next part -------------- An HTML attachment was scrubbed... URL: From groups at tierarzt-mueller.de Mon Jan 5 18:17:30 2015 From: groups at tierarzt-mueller.de (Alexander Lesle) Date: Mon, 5 Jan 2015 19:17:30 +0100 Subject: [OmniOS-discuss] Kernel panic - I cant find the problem In-Reply-To: References: <1531165597.20141231214048@tierarzt-mueller.de> Message-ID: <19133405.20150105191730@tierarzt-mueller.de> Hello Dan McDonald and List, On January, 03 2015, 04:11 wrote in [1]: >> On Dec 31, 2014, at 3:40 PM, Alexander Lesle wrote: >> >> > See if dumpadm(1M) shows you have a working place to store kernel > crash dumps. If you do, a "savecore" should get you a vmdump.N > file. Having a full vmdump.N is useful, and is something people can inspect. Sorry for my late answer. My English is not very good. If I understood it right, you want to know whether the file "vmdump.2" is available on my server? Yes, I have this file backed up, but it is 682 MB big. Here you can download it: https://www.dropbox.com/s/9jvn0puec42xp8n/vmdump.2?dl=0 I hope you can help me. Thanks. -- Best Regards Alexander January, 05 2015 ........ [1] mid:CC78515C-8F79-4A33-ACE2-24AA92B03830 at omniti.com ........ From Josh.Barton at usurf.usu.edu Mon Jan 5 23:29:27 2015 From: Josh.Barton at usurf.usu.edu (Josh Barton) Date: Mon, 5 Jan 2015 23:29:27 +0000 Subject: [OmniOS-discuss] tDom on Solaris Message-ID: <94eff65a47894d1394fc33c07f675787@Perses.usurf.usu.edu> I have had some difficulty getting tDom 0.8.3 or frankly any other version working with OmniOS/Solaris. After a lot of experimentation the only method that worked was to change line 1793 in the makefile (where CC is set to "gcc") to CC="gcc -m64". The configure flag -enable-64bit option made no difference but this one change to the Make file did. Does anyone have any idea why this might be? Josh Barton USU Research Foundation 1695 Research Park Way, North Logan, UT 84321 (435) 713-3089 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From gate03 at landcroft.co.uk Mon Jan 5 23:55:23 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Tue, 6 Jan 2015 09:55:23 +1000 Subject: [OmniOS-discuss] tDom on Solaris In-Reply-To: <94eff65a47894d1394fc33c07f675787@Perses.usurf.usu.edu> References: <94eff65a47894d1394fc33c07f675787@Perses.usurf.usu.edu> Message-ID: <20150106095523.1a55f19e@emeritus> On Mon, 5 Jan 2015 23:29:27 +0000 Josh Barton wrote: > I have had some difficulty getting tDom 0.8.3 or frankly any other > version working with OmniOS/Solaris. After a lot of experimentation > the only method that worked was to change line 1793 in the makefile > (where CC is set to "gcc") to CC="gcc -m64". The configure flag > -enable-64bit option made no difference but this one change to the > Make file did. Does anyone have any idea why this might be? It's linking with 64 bit libraries that are already installed. I had to make a similar change when trying to install Horde, or rather, the Imagemagick interface thereof. Michael. From gate03 at landcroft.co.uk Tue Jan 6 00:04:37 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Tue, 6 Jan 2015 10:04:37 +1000 Subject: [OmniOS-discuss] tDom on Solaris In-Reply-To: <94eff65a47894d1394fc33c07f675787@Perses.usurf.usu.edu> References: <94eff65a47894d1394fc33c07f675787@Perses.usurf.usu.edu> Message-ID: <20150106100437.558b6491@emeritus> On Mon, 5 Jan 2015 23:29:27 +0000 Josh Barton wrote: > I have had some difficulty getting tDom 0.8.3 or frankly any other > version working with OmniOS/Solaris. After a lot of experimentation > the only method that worked was to change line 1793 in the makefile > (where CC is set to "gcc") to CC="gcc -m64". The configure flag > -enable-64bit option made no difference but this one change to the > Make file did. Does anyone have any idea why this might be? Sorry, that previous answer was hasty and unhelpful. gcc by default builds 32 bit objects but the linking stage is trying to link to OmniOS's 64 bit libraries. I couldn't find any documentation of the --enable-64bit flag (so I can't tell what it's supposed to do) but -m64 tells gcc to output for 64 bit, whatever other ideas it might have. As I mentioned previously, you need it to ensure that when ld comes to link it all together, it doesn't have objects of unequal word sizes. I hope that's clear. Michael. From filip.marvan at aira.cz Tue Jan 6 11:16:09 2015 From: filip.marvan at aira.cz (Filip Marvan) Date: Tue, 6 Jan 2015 12:16:09 +0100 Subject: [OmniOS-discuss] High Availability storage with ZFS Message-ID: <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313BD@AIRA-SRV.aira.local> Hi as a few guys before me, I'm thinking again about High Availability storage with ZFS. I know that there is the great commercial RSF-1, but that's quite expensive for my needs. I know that Sašo did a great job on that on his blog http://zfs-create.blogspot.cz but I never found the way how to successfully configure that on current OmniOS versions. So I'm thinking about something more simple. Arrange two LUNs from two OmniOS ZFS storages in one software mirror through fibrechannel. Arrange that mirror in the client, for example with mdadm in Linux. I know that it will have a performance impact and I will lose some ZFS advantages, but I still can use snapshots, backups with send/receive and some other interesting ZFS things, so it could be usable for some projects. Is there anyone who tried that before? Any experience with that? Thank you, Filip Marvan -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6220 bytes Desc: not available URL: From piiv at zhaw.ch Tue Jan 6 12:54:29 2015 From: piiv at zhaw.ch (Vincenzo Pii) Date: Tue, 6 Jan 2015 13:54:29 +0100 Subject: [OmniOS-discuss] High Availability storage with ZFS In-Reply-To: <68b326347c7f419d81880f15407c66bd@SRV-MAIL-001.zhaw.ch> References: <68b326347c7f419d81880f15407c66bd@SRV-MAIL-001.zhaw.ch> Message-ID: 2015-01-06 12:16 GMT+01:00 Filip Marvan : > Hi > > > > as few guys before, I'm thinking again about High Availability storage > with ZFS. I know, that there is great commercial RSF-1, but that's quite > expensive for my needs. > > I know, that Sašo did a great job about that on his blog > http://zfs-create.blogspot.cz but I never found the way, how to > successfully configure that on current OmniOS versions. > > > > So I'm thinking about something more simple. Arrange two LUNs from two > OmniOS ZFS storages in one software mirror through fibrechannel. Arrange > that mirror in client, for example mdadm in Linux. I know, that it will > have performance affect and I will lost some ZFS advantages, but I still > can use snapshots, backups with send/receive and some other interesting ZFS > things, so it could be usable for some projects. > > Is there anyone, who tried that before? Any eperience with that? > > > > Thank you, > > > > Filip Marvan > > > > > Hi Filip, I am not directly answering your question, but I've gone through the configuration of HA (with pacemaker) on OmniOS in the past months and collected all my notes here: http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnios-to-run-a-ha-activepassive-cluster/, maybe it can be useful for you. In my experience, running pacemaker correctly on OmniOS is just the tip of the iceberg, then comes the implementation/configuration of the resource agents (and the cluster itself!). Whether this way is worth it, rather than a quicker and more custom solution, depends on the long-term plans :). Best regards, Vincenzo. -- Vincenzo Pii Researcher, InIT Cloud Computing Lab Zurich University of Applied Sciences (ZHAW) blog.zhaw.ch/icclab -------------- next part -------------- An HTML attachment was scrubbed... URL: From filip.marvan at aira.cz Tue Jan 6 13:08:12 2015 From: filip.marvan at aira.cz (Filip Marvan) Date: Tue, 6 Jan 2015 14:08:12 +0100 Subject: [OmniOS-discuss] High Availability storage with ZFS In-Reply-To: References: <68b326347c7f419d81880f15407c66bd@SRV-MAIL-001.zhaw.ch> Message-ID: <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313F2@AIRA-SRV.aira.local> Hi Vincenzo, your solution is much better, so thank you very much for your notes. I will try that too! Filip From: Vincenzo Pii [mailto:piiv at zhaw.ch] Sent: Tuesday, January 06, 2015 1:54 PM To: Filip Marvan Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] High Availability storage with ZFS 2015-01-06 12:16 GMT+01:00 Filip Marvan : Hi as few guys before, I'm thinking again about High Availability storage with ZFS. I know, that there is great commercial RSF-1, but that's quite expensive for my needs. I know, that Sašo did a great job about that on his blog http://zfs-create.blogspot.cz but I never found the way, how to successfully configure that on current OmniOS versions. So I'm thinking about something more simple. Arrange two LUNs from two OmniOS ZFS storages in one software mirror through fibrechannel.
Arrange that mirror in client, for example mdadm in Linux. I know, that it will have performance affect and I will lost some ZFS advantages, but I still can use snapshots, backups with send/receive and some other interesting ZFS things, so it could be usable for some projects. Is there anyone, who tried that before? Any eperience with that? Thank you, Filip Marvan Hi Filip, I am not directly answering your question, but I've gone through the configuration of HA (with pacemaker) on OmniOS in the past months and collected all my notes here: http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnios-to-run-a-ha-activepassive-cluster/, maybe it can be useful for you. In my experience, running pacemaker correctly on OmniOS is just the tip of the iceberg, then comes the implementation/configuration of the resource agents (and the cluster itself!). If this way is worth it, rather than a quicker and more custom solution, depends on the long term plans :). Best regards, Vincenzo. -- Vincenzo Pii Researcher, InIT Cloud Computing Lab Zurich University of Applied Sciences (ZHAW) blog.zhaw.ch/icclab -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 6220 bytes Desc: not available URL: From KBruene at simmonsperrine.com Tue Jan 6 15:01:43 2015 From: KBruene at simmonsperrine.com (Kyle Bruene) Date: Tue, 6 Jan 2015 15:01:43 +0000 Subject: [OmniOS-discuss] High Availability storage with ZFS Message-ID: <202C92988C5CF249BD3F9F21B2B199CB33E10091@SPMAIL1.spae.local> Re: [OmniOS-discuss] High Availability storage with ZFS Filip, Currently, I have 2 OmniOS hosts running as storage servers with COMSTAR over Fibre Channel to Windows Server 2012 Hyper-V servers. The Windows Servers use a software mirror of luns from both OmniOS boxes. They have been working great, I believe there is only a slight performance penalty. Not exactly HA and it does have some drawbacks, such as resyncs if I need to take down either OmniOS boxes, but it is nice to know I have two full copies of the data at all times. Kyle From stephan.budach at JVM.DE Tue Jan 6 15:19:51 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 6 Jan 2015 16:19:51 +0100 Subject: [OmniOS-discuss] High Availability storage with ZFS In-Reply-To: <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313F2@AIRA-SRV.aira.local> References: <68b326347c7f419d81880f15407c66bd@SRV-MAIL-001.zhaw.ch> <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313F2@AIRA-SRV.aira.local> Message-ID: <54ABFD17.9040907@jvm.de> Am 06.01.15 um 14:08 schrieb Filip Marvan: > > Hi Vincenzo, > > your solution is much more better, so thank you very much for your > notes. I will try that too! > > Filip > > *From:*Vincenzo Pii [mailto:piiv at zhaw.ch] > *Sent:* Tuesday, January 06, 2015 1:54 PM > *To:* Filip Marvan > *Cc:* omnios-discuss at lists.omniti.com > *Subject:* Re: [OmniOS-discuss] High Availability storage with ZFS > > 2015-01-06 12:16 GMT+01:00 Filip Marvan >: > > Hi > > as few guys before, I'm thinking again about High Availability storage > with ZFS. I know, that there is great commercial RSF-1, but that's > quite expensive for my needs. > > I know, that Sa?o did a great job about that on his blog > http://zfs-create.blogspot.cz but I never found the way, how to > successfully configure that on current OmniOS versions. > > So I'm thinking about something more simple. 
Arrange two LUNs from two > OmniOS ZFS storages in one software mirror through fibrechannel. > Arrange that mirror in client, for example mdadm in Linux. I know, > that it will have performance affect and I will lost some ZFS > advantages, but I still can use snapshots, backups with send/receive > and some other interesting ZFS things, so it could be usable for some > projects. > > Is there anyone, who tried that before? Any eperience with that? > > Thank you, > > Filip Marvan > > > Hi Filip, > > I am not directly answering your question, but I've gone through the > configuration of HA (with pacemaker) on OmniOS in the past months and > collected all my notes here: > http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnios-to-run-a-ha-activepassive-cluster/, > maybe it can be useful for you. > > In my experience, running pacemaker correctly on OmniOS is just the > tip of the iceberg, then comes the implementation/configuration of the > resource agents (and the cluster itself!). > > If this way is worth it, rather than a quicker and more custom > solution, depends on the long term plans :). > > Best regards, > > Vincenzo. > Have you looked at the setup that Saso Kiselkov describes in his blog here: http://zfs-create.blogspot.nl/2013/06/building-zfs-storage-appliance-part-1.html It seems to cover most of a pacemaker setup, including the resource agents. Cheers, budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ikaufman at eng.ucsd.edu Tue Jan 6 17:12:13 2015 From: ikaufman at eng.ucsd.edu (Ian Kaufman) Date: Tue, 6 Jan 2015 09:12:13 -0800 Subject: [OmniOS-discuss] High Availability storage with ZFS In-Reply-To: <54ABFD17.9040907@jvm.de> References: <68b326347c7f419d81880f15407c66bd@SRV-MAIL-001.zhaw.ch> <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313F2@AIRA-SRV.aira.local> <54ABFD17.9040907@jvm.de> Message-ID: Hi all, I have modified and played with some simple scripts to create my own heartbeat, STONITH, failover set up. I currently have a pair of redundant systems, each one having redundant heads connected to shared JBODs, with the data replicated to the backup system via ZFS send/recv. I am going to do further testing, as this is a new set up, but I had it down to a 10 second failover without too much hassle between heads at one point, and the delta for the data on the backup system was down to about 30 minutes at one point. Ian On Tue, Jan 6, 2015 at 7:19 AM, Stephan Budach wrote: > Am 06.01.15 um 14:08 schrieb Filip Marvan: > > Hi Vincenzo, > > > > your solution is much more better, so thank you very much for your notes. I > will try that too! > > > > Filip > > > > > > From: Vincenzo Pii [mailto:piiv at zhaw.ch] > Sent: Tuesday, January 06, 2015 1:54 PM > To: Filip Marvan > Cc: omnios-discuss at lists.omniti.com > Subject: Re: [OmniOS-discuss] High Availability storage with ZFS > > > > 2015-01-06 12:16 GMT+01:00 Filip Marvan : > > Hi > > > > as few guys before, I'm thinking again about High Availability storage with > ZFS. I know, that there is great commercial RSF-1, but that's quite > expensive for my needs. > > I know, that Sa?o did a great job about that on his blog > http://zfs-create.blogspot.cz but I never found the way, how to successfully > configure that on current OmniOS versions. > > > > So I'm thinking about something more simple. Arrange two LUNs from two > OmniOS ZFS storages in one software mirror through fibrechannel. Arrange > that mirror in client, for example mdadm in Linux. 
I know, that it will have > performance affect and I will lost some ZFS advantages, but I still can use > snapshots, backups with send/receive and some other interesting ZFS things, > so it could be usable for some projects. > > Is there anyone, who tried that before? Any eperience with that? > > > > Thank you, > > > > Filip Marvan > > > > > > > Hi Filip, > > > > I am not directly answering your question, but I've gone through the > configuration of HA (with pacemaker) on OmniOS in the past months and > collected all my notes here: > http://blog.zhaw.ch/icclab/use-pacemaker-and-corosync-on-illumos-omnios-to-run-a-ha-activepassive-cluster/, > maybe it can be useful for you. > > > > In my experience, running pacemaker correctly on OmniOS is just the tip of > the iceberg, then comes the implementation/configuration of the resource > agents (and the cluster itself!). > > If this way is worth it, rather than a quicker and more custom solution, > depends on the long term plans :). > > > > Best regards, > > Vincenzo. > > > > Have you looked at the setup that Saso Kiselkov describes in his blog here: > http://zfs-create.blogspot.nl/2013/06/building-zfs-storage-appliance-part-1.html > It seems to cover most of a pacemaker setup, including the resource agents. > > > Cheers, > budy > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -- Ian Kaufman Research Systems Administrator UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu From chip at innovates.com Tue Jan 6 17:28:35 2015 From: chip at innovates.com (Schweiss, Chip) Date: Tue, 6 Jan 2015 11:28:35 -0600 Subject: [OmniOS-discuss] High Availability storage with ZFS In-Reply-To: <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313BD@AIRA-SRV.aira.local> References: <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313BD@AIRA-SRV.aira.local> Message-ID: On Tue, Jan 6, 2015 at 5:16 AM, Filip Marvan wrote: > Hi > > > > as few guys before, I'm thinking again about High Availability storage > with ZFS. I know, that there is great commercial RSF-1, but that's quite > expensive for my needs. > > I know, that Sa?o did a great job about that on his blog > http://zfs-create.blogspot.cz but I never found the way, how to > successfully configure that on current OmniOS versions. > > > > So I'm thinking about something more simple. Arrange two LUNs from two > OmniOS ZFS storages in one software mirror through fibrechannel. Arrange > that mirror in client, for example mdadm in Linux. I know, that it will > have performance affect and I will lost some ZFS advantages, but I still > can use snapshots, backups with send/receive and some other interesting ZFS > things, so it could be usable for some projects. > > Is there anyone, who tried that before? Any eperience with that? > While this sounds technically possible, it is not HA. Your client is the single point of failure. I would wager that mdadm would create more availability issues than it would be solving. I run RSF-1 and HA is still hard to achieve. I don't think I have gained any additional up-time overcoming failures, but it definitely helps with planned maintenance. Unfortunately, there are still too many ways a zfs pool can fail that having a second server connected does not help. 
-Chip > > > Thank you, > > > > Filip Marvan > > > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lkateley at kateley.com Tue Jan 6 17:48:22 2015 From: lkateley at kateley.com (Linda Kateley) Date: Tue, 06 Jan 2015 11:48:22 -0600 Subject: [OmniOS-discuss] High Availability storage with ZFS In-Reply-To: References: <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313BD@AIRA-SRV.aira.local> Message-ID: <54AC1FE6.9040109@kateley.com> I thought it was stmsboot and mpathadm on omni? If you are just looking for multipathing to disk? Haven't tried on omni. On 1/6/15 11:28 AM, Schweiss, Chip wrote: > On Tue, Jan 6, 2015 at 5:16 AM, Filip Marvan > wrote: > > Hi > > as few guys before, I'm thinking again about High Availability > storage with ZFS. I know, that there is great commercial RSF-1, > but that's quite expensive for my needs. > > I know, that Sa?o did a great job about that on his blog > http://zfs-create.blogspot.cz but I never found the way, how to > successfully configure that on current OmniOS versions. > > So I'm thinking about something more simple. Arrange two LUNs from > two OmniOS ZFS storages in one software mirror through > fibrechannel. Arrange that mirror in client, for example mdadm in > Linux. I know, that it will have performance affect and I will > lost some ZFS advantages, but I still can use snapshots, backups > with send/receive and some other interesting ZFS things, so it > could be usable for some projects. > > Is there anyone, who tried that before? Any eperience with that? > > > While this sounds technically possible, it is not HA. Your client is > the single point of failure. I would wager that mdadm would create > more availability issues than it would be solving. > > I run RSF-1 and HA is still hard to achieve. I don't think I have > gained any additional up-time overcoming failures, but it definitely > helps with planned maintenance. Unfortunately, there are still too > many ways a zfs pool can fail that having a second server connected > does not help. > > -Chip > > Thank you, > > Filip Marvan > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Tue Jan 6 18:34:07 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Tue, 6 Jan 2015 10:34:07 -0800 Subject: [OmniOS-discuss] High Availability storage with ZFS In-Reply-To: References: <3BE0DEED8863E5429BAE4CAEDF624565039C1B3313BD@AIRA-SRV.aira.local> Message-ID: <13041534-4904-4D7F-8BF3-B0076266CE2D@richardelling.com> > On Jan 6, 2015, at 9:28 AM, Schweiss, Chip wrote: > > On Tue, Jan 6, 2015 at 5:16 AM, Filip Marvan > wrote: > Hi > > > > as few guys before, I'm thinking again about High Availability storage with ZFS. I know, that there is great commercial RSF-1, but that's quite expensive for my needs. 
> > I know, that Sa?o did a great job about that on his blog http://zfs-create.blogspot.cz but I never found the way, how to successfully configure that on current OmniOS versions. > > > > So I'm thinking about something more simple. Arrange two LUNs from two OmniOS ZFS storages in one software mirror through fibrechannel. Arrange that mirror in client, for example mdadm in Linux. I know, that it will have performance affect and I will lost some ZFS advantages, but I still can use snapshots, backups with send/receive and some other interesting ZFS things, so it could be usable for some projects. > > Is there anyone, who tried that before? Any eperience with that? > > > While this sounds technically possible, it is not HA. Your client is the single point of failure. I would wager that mdadm would create more availability issues than it would be solving. > > I run RSF-1 and HA is still hard to achieve. HA: 98% perspiration, 2% good fortune :-) But seriously, providing HA services is much, much more than just running software. -- richard > I don't think I have gained any additional up-time overcoming failures, but it definitely helps with planned maintenance. Unfortunately, there are still too many ways a zfs pool can fail that having a second server connected does not help. > > -Chip > > > > > Thank you, > > > > Filip Marvan > > > > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Swab at ColoState.EDU Tue Jan 6 20:18:22 2015 From: Kevin.Swab at ColoState.EDU (Kevin Swab) Date: Tue, 06 Jan 2015 13:18:22 -0700 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: References: <54A44D8C.5090302@ColoState.EDU> <54A49517.6070205@ColoState.EDU> <055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> <54A712A5.9080502@ColoState.EDU> Message-ID: <54AC430E.7090003@ColoState.EDU> SAS expanders are involved in my systems, so I installed 'sasinfo' and 'smp_utils'. After a bit of poking around in the dark, I came up with the following commands which I think get at the error counters you mentioned. Unfortunately, I had to remove the "wounded soldier" from this system since it was causing problems. 
This output is from the same slot, but with a healthy replacement drive: # sasinfo hba-port -a SUNW-mpt_sas-1 -l HBA Name: SUNW-mpt_sas-1 HBA Port Name: /dev/cfg/c7 Phy Information: Identifier: 0 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 Reset Problem: 0 Identifier: 1 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 Reset Problem: 0 Identifier: 2 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 Reset Problem: 0 Identifier: 3 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 Reset Problem: 0 HBA Port Name: /dev/cfg/c8 Phy Information: Identifier: 4 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 Reset Problem: 0 Identifier: 5 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 Reset Problem: 0 Identifier: 6 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 Reset Problem: 0 Identifier: 7 Link Error Statistics: Invalid Dword: 0 Running Disparity Error: 0 Loss of Dword Sync: 0 # ./smp_discover /dev/smp/expd9 | egrep '(c982|c983)' phy 26:U:attached:[50000394a8cbc982:00 t(SSP)] 6 Gbps # ./smp_discover /dev/smp/expd11 | egrep '(c982|c983)' phy 26:U:attached:[50000394a8cbc983:01 t(SSP)] 6 Gbps # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd9 Report phy error log response: Expander change count: 228 phy identifier: 26 invalid dword count: 0 running disparity error count: 0 loss of dword synchronization count: 0 phy reset problem count: 0 # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd11 Report phy error log response: Expander change count: 228 phy identifier: 26 invalid dword count: 0 running disparity error count: 0 loss of dword synchronization count: 0 phy reset problem count: 0 # "disparity error count" and "loss of dword sync count" are 0 in all of this output, in contrast with the non-zero values seen in the sg_logs output for the "wounded soldier". Am I looking at the right output? Does "phy" in the above commands refer to the HDD itself or the port on the expander it's connected to? Had I been able to run the above commands with the "wounded soldier" still installed, what should I have been looking for? Thanks again for your help, Kevin On 01/02/2015 03:45 PM, Richard Elling wrote: > >> On Jan 2, 2015, at 1:50 PM, Kevin Swab > > wrote: >> >> I've run 'sg_logs' on the drive I pulled last week. There were alot of >> errors in the backgroud scan section of the output, which made it very >> large, so I put it here: >> >> http://pastebin.com/jx5BvSep >> >> When I pulled this drive, the SMART health status was OK. > > SMART isn?t smart :-P > >> However, when >> I put it in a test system to run 'sg_logs', the status changed to >> "impending failure...". Had the SMART status changed before pulling the >> drive, I'm sure 'fmd' would have alerted me to the problem? > > By default, fmd looks for the predictive failure (PFA) and self-test > every hour using the disk_transport > agent. fmstat should show activity there. When a PFA is seen, then there > will be an ereport generated > and, for most cases, a syslog message. However, this will not cause a > zfs-retire event. > > Vendors have significant leeway in how they implement SMART. In my > experience the only thing > you can say for sure is if the vendor thinks the drive?s death is > imminent, then you should replace > it. 
I suspect these policies are financially motivated rather than > scientific? some amount of truthiness > is to be expected. > > In the logs, clearly the one disk has lots of errors that have been > corrected and the rate is increasing. > The rate of change for "Errors corrected with possible delays? may > correlate to your performance issues, > but the interpretation is left up to the vendors. > > In the case of this naughty drive, yep it needs replacing. > >> >> Since that drive had other indications of trouble, I ran 'sg_logs' on >> another drive I pulled recently that has a SMART health status of OK, >> but exibits similar slow service time behavior: >> >> http://pastebin.com/Q0t8Jnug > > This one looks mostly healthy. > > Another place to look for latency issues is the phy logs. In the sg_logs > output, this is the > Protocol Specific port log page for SAS SSP. Key values are running > disparity error > count and loss of dword sync count. The trick here is that you need to > look at both ends > of the wire for each wire. For a simple case, this means looking at both > the HBA?s phys error > counts and the driver. If you have expanders in the mix, it is more > work. You?ll want to look at > all of the HBA, expander, and drive phys health counters for all phys. > > This can get tricky because wide ports are mostly dumb. For example, if > an HBA has a 4-link > wide port (common) and one of the links is acting up (all too common) > the latency impacts > will be random. > > To see HBA and expander link health, you can use sg3_utils, its > companion smp_utils, or > sasinfo (installed as a separate package from OmniOS, IIRC). For example, > sasinfo hba-port -l > > HTH > ? richard > > >> >> Thanks for taking the time to look at these, please let me know what you >> find... >> >> Kevin >> >> >> >> >> On 12/31/2014 06:13 PM, Richard Elling wrote: >>> >>>> On Dec 31, 2014, at 4:30 PM, Kevin Swab >>> > wrote: >>>> >>>> Hello Richard and group, thanks for your reply! >>>> >>>> I'll look into sg_logs for one of these devices once I have a chance to >>>> track that progam down... >>>> >>>> Thanks for the tip on the 500 ms latency, I wasn't aware that could >>>> happen in normal cases. However, I don't believe what I'm seeing >>>> constitutes normal behavior. >>>> >>>> First, some anecdotal evidence: If I pull and replace the suspect >>>> drive, my downstream systems stop complaining, and the high service time >>>> numbers go away. >>> >>> We call these "wounded soldiers" -- it takes more resources to manage a >>> wounded soldier than a dead soldier, so one strategy of war is to >>> wound your >>> enemy causing them to consume resources tending the wounded. The sg_logs >>> should be enlightening. >>> >>> NB, consider a 4TB disk with 5 platters: if a head or surface starts >>> to go, then >>> you have a 1/10 chance that the data you request is under the >>> damaged head >>> and will need to be recovered by the drive. So it is not uncommon to see >>> 90+% of the I/Os to the drive completing quickly. It is also not >>> unusual to see >>> only a small number of sectors or tracks affected. >>> >>> Detecting these becomes tricky, especially as you reduce the >>> timeout/retry >>> interval, since the problem is rarely seen in the average latency -- >>> that which >>> iostat and sar record. This is an area where we can and are improving. >>> -- richard >>> >>>> >>>> I threw out 500 ms as a guess to the point at which I start seeing >>>> problems. 
However, I see service times far in excess of that, sometimes >>>> over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled >>>> a few days ago, during a time when downstream VMWare servers were >>>> complaining. (since the sar output is so verbose, I grepped out the >>>> info just for the suspect drive): >>>> >>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' >>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>> sd91,a 99 5.3 1 42 0.0 7811.7 >>>> sd91,a 100 11.3 1 53 0.0 11016.0 >>>> sd91,a 100 3.8 1 75 0.0 3615.8 >>>> sd91,a 100 4.9 1 25 0.0 8633.5 >>>> sd91,a 93 3.9 1 55 0.0 4385.3 >>>> sd91,a 86 3.5 2 75 0.0 2060.5 >>>> sd91,a 91 3.1 4 80 0.0 823.8 >>>> sd91,a 97 3.5 1 50 0.0 3984.5 >>>> sd91,a 100 4.4 1 56 0.0 6068.6 >>>> sd91,a 100 5.0 1 55 0.0 8836.0 >>>> sd91,a 100 5.7 1 51 0.0 7939.6 >>>> sd91,a 98 9.9 1 42 0.0 12526.8 >>>> sd91,a 100 7.4 0 10 0.0 36813.6 >>>> sd91,a 51 3.8 8 90 0.0 500.2 >>>> sd91,a 88 3.4 1 60 0.0 2338.8 >>>> sd91,a 100 4.5 1 28 0.0 6969.2 >>>> sd91,a 93 3.8 1 59 0.0 5138.9 >>>> sd91,a 79 3.1 1 59 0.0 3143.9 >>>> sd91,a 99 4.7 1 52 0.0 5598.4 >>>> sd91,a 100 4.8 1 62 0.0 6638.4 >>>> sd91,a 94 5.0 1 54 0.0 3752.7 >>>> >>>> For comparison, here's the sar output from another drive in the same >>>> pool for the same period of time: >>>> >>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' >>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>> sd82,a 0 0.0 2 28 0.0 5.6 >>>> sd82,a 1 0.0 3 51 0.0 5.4 >>>> sd82,a 1 0.0 4 66 0.0 6.3 >>>> sd82,a 1 0.0 3 48 0.0 4.3 >>>> sd82,a 1 0.0 3 45 0.0 6.1 >>>> sd82,a 1 0.0 6 82 0.0 2.7 >>>> sd82,a 1 0.0 8 112 0.0 2.8 >>>> sd82,a 0 0.0 3 27 0.0 1.8 >>>> sd82,a 1 0.0 5 80 0.0 3.1 >>>> sd82,a 0 0.0 3 35 0.0 3.1 >>>> sd82,a 1 0.0 3 35 0.0 3.8 >>>> sd82,a 1 0.0 4 49 0.0 3.2 >>>> sd82,a 0 0.0 0 0 0.0 4.1 >>>> sd82,a 3 0.0 9 84 0.0 4.1 >>>> sd82,a 1 0.0 6 55 0.0 3.7 >>>> sd82,a 0 0.0 1 23 0.0 7.0 >>>> sd82,a 0 0.0 6 57 0.0 1.8 >>>> sd82,a 1 0.0 5 70 0.0 2.3 >>>> sd82,a 1 0.0 4 55 0.0 3.7 >>>> sd82,a 1 0.0 5 72 0.0 4.1 >>>> sd82,a 1 0.0 4 54 0.0 3.6 >>>> >>>> The other drives in this pool all show data similar to that of sd82. >>>> >>>> Your point about tuning blindly is well taken, and I'm certainly no >>>> expert on the IO stack. What's a humble sysadmin to do? >>>> >>>> For further reference, this system is running r151010. 
The drive in >>>> question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive >>>> was in is using lz4 compression and looks like this: >>>> >>>> # zpool status data1 >>>> pool: data1 >>>> state: ONLINE >>>> scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 >>>> 14:40:20 2014 >>>> config: >>>> >>>> NAME STATE READ WRITE CKSUM >>>> data1 ONLINE 0 0 0 >>>> raidz2-0 ONLINE 0 0 0 >>>> c6t5000039468CB54F0d0 ONLINE 0 0 0 >>>> c6t5000039478CB5138d0 ONLINE 0 0 0 >>>> c6t5000039468D000DCd0 ONLINE 0 0 0 >>>> c6t5000039468D000E8d0 ONLINE 0 0 0 >>>> c6t5000039468D00F5Cd0 ONLINE 0 0 0 >>>> c6t5000039478C816CCd0 ONLINE 0 0 0 >>>> c6t5000039478C8546Cd0 ONLINE 0 0 0 >>>> raidz2-1 ONLINE 0 0 0 >>>> c6t5000039478C855F0d0 ONLINE 0 0 0 >>>> c6t5000039478C856E8d0 ONLINE 0 0 0 >>>> c6t5000039478C856ECd0 ONLINE 0 0 0 >>>> c6t5000039478C856F4d0 ONLINE 0 0 0 >>>> c6t5000039478C86374d0 ONLINE 0 0 0 >>>> c6t5000039478C8C2A8d0 ONLINE 0 0 0 >>>> c6t5000039478C8C364d0 ONLINE 0 0 0 >>>> raidz2-2 ONLINE 0 0 0 >>>> c6t5000039478C9958Cd0 ONLINE 0 0 0 >>>> c6t5000039478C995C4d0 ONLINE 0 0 0 >>>> c6t5000039478C9DACCd0 ONLINE 0 0 0 >>>> c6t5000039478C9DB30d0 ONLINE 0 0 0 >>>> c6t5000039478C9DB6Cd0 ONLINE 0 0 0 >>>> c6t5000039478CA73B4d0 ONLINE 0 0 0 >>>> c6t5000039478CB3A20d0 ONLINE 0 0 0 >>>> raidz2-3 ONLINE 0 0 0 >>>> c6t5000039478CB3A64d0 ONLINE 0 0 0 >>>> c6t5000039478CB3A70d0 ONLINE 0 0 0 >>>> c6t5000039478CB3E7Cd0 ONLINE 0 0 0 >>>> c6t5000039478CB3EB0d0 ONLINE 0 0 0 >>>> c6t5000039478CB3FBCd0 ONLINE 0 0 0 >>>> c6t5000039478CB4048d0 ONLINE 0 0 0 >>>> c6t5000039478CB4054d0 ONLINE 0 0 0 >>>> raidz2-4 ONLINE 0 0 0 >>>> c6t5000039478CB424Cd0 ONLINE 0 0 0 >>>> c6t5000039478CB4250d0 ONLINE 0 0 0 >>>> c6t5000039478CB470Cd0 ONLINE 0 0 0 >>>> c6t5000039478CB471Cd0 ONLINE 0 0 0 >>>> c6t5000039478CB4E50d0 ONLINE 0 0 0 >>>> c6t5000039478CB50A8d0 ONLINE 0 0 0 >>>> c6t5000039478CB50BCd0 ONLINE 0 0 0 >>>> spares >>>> c6t50000394A8CBC93Cd0 AVAIL >>>> >>>> errors: No known data errors >>>> >>>> >>>> Thanks for your help, >>>> Kevin >>>> >>>> On 12/31/2014 3:22 PM, Richard Elling wrote: >>>>> >>>>>> On Dec 31, 2014, at 11:25 AM, Kevin Swab >>>>> > wrote: >>>>>> >>>>>> Hello Everyone, >>>>>> >>>>>> We've been running OmniOS on a number of SuperMicro 36bay chassis, >>>>>> with >>>>>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >>>>>> various SAS HDD's. These systems are serving block storage via >>>>>> Comstar >>>>>> and Qlogic FC HBA's, and have been running well for several years. >>>>>> >>>>>> The problem we've got is that as the drives age, some of them start to >>>>>> perform slowly (intermittently) without failing - no zpool or iostat >>>>>> errors, and nothing logged in /var/adm/messages. The slow performance >>>>>> can be seen as high average service times in iostat or sar. >>>>> >>>>> Look at the drive's error logs using sg_logs (-a for all) >>>>> >>>>>> >>>>>> When these service times get above 500ms, they start to cause IO >>>>>> timeouts on the downstream storage consumers, which is bad... >>>>> >>>>> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or >>>>> SATA NCQ >>>>> >>>>>> >>>>>> I'm wondering - is there a way to tune OmniOS' behavior so that it >>>>>> doesn't try so hard to complete IOs to these slow disks, and instead >>>>>> just gives up and fails them? >>>>> >>>>> Yes, the tuning in Alasdair's blog should work as he describes. >>>>> More below... 
>>>>> >>>>>> >>>>>> I found an old post from 2011 which states that some tunables exist, >>>>>> but are ignored by the mpt_sas driver: >>>>>> >>>>>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >>>>>> >>>>>> Does anyone know the current status of these tunables, or have any >>>>>> other >>>>>> suggestions that might help? >>>>> >>>>> These tunables are on the order of seconds. The default, 60, is >>>>> obviously too big >>>>> unless you have old, slow, SCSI CD-ROMs. But setting it below the >>>>> manufacturer's >>>>> internal limit (default or tuned) can lead to an unstable system. >>>>> Some vendors are >>>>> better than others at documenting these, but in any case you'll >>>>> need to see their spec. >>>>> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. >>>>> >>>>> There are a lot of tunables in this area at all levels of the >>>>> architecture. OOB, the OmniOS >>>>> settings ensure stable behaviour. Tuning any layer without >>>>> understanding the others can >>>>> lead to unstable systems, as demonstrated by your current >>>>> downstream consumers. >>>>> -- richard >>>>> >>>>> >>>>>> >>>>>> Thanks, >>>>>> Kevin >>>>>> >>>>>> >>>>>> -- >>>>>> ------------------------------------------------------------------- >>>>>> Kevin Swab UNIX Systems Administrator >>>>>> ACNS Colorado State University >>>>>> Phone: (970)491-6572 Email: >>>>>> Kevin.Swab at ColoState.EDU >>>>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>>>>> _______________________________________________ >>>>>> OmniOS-discuss mailing list >>>>>> OmniOS-discuss at lists.omniti.com >>>>>> >>>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>> >> >> -- >> ------------------------------------------------------------------- >> Kevin Swab UNIX Systems Administrator >> ACNS Colorado State University >> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >> >> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C > > -- > > Richard.Elling at RichardElling.com > +1-760-896-4422 > > > -- ------------------------------------------------------------------- Kevin Swab UNIX Systems Administrator ACNS Colorado State University Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C From richard.elling at richardelling.com Tue Jan 6 21:23:14 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Tue, 6 Jan 2015 13:23:14 -0800 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: <54AC430E.7090003@ColoState.EDU> References: <54A44D8C.5090302@ColoState.EDU> <54A49517.6070205@ColoState.EDU> <055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> <54A712A5.9080502@ColoState.EDU> <54AC430E.7090003@ColoState.EDU> Message-ID: <831FFC8C-E381-40FD-A542-F5596904728B@richardelling.com> > On Jan 6, 2015, at 12:18 PM, Kevin Swab wrote: > > SAS expanders are involved in my systems, so I installed 'sasinfo' and > 'smp_utils'. After a bit of poking around in the dark, I came up with > the following commands which I think get at the error counters you > mentioned. Yes, this data looks fine > > Unfortunately, I had to remove the "wounded soldier" from this system > since it was causing problems. 
This output is from the same slot, but > with a healthy replacement drive: > > # sasinfo hba-port -a SUNW-mpt_sas-1 -l > HBA Name: SUNW-mpt_sas-1 > HBA Port Name: /dev/cfg/c7 > Phy Information: > Identifier: 0 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 > Reset Problem: 0 > Identifier: 1 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 > Reset Problem: 0 > Identifier: 2 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 > Reset Problem: 0 > Identifier: 3 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 > Reset Problem: 0 perfect! > HBA Port Name: /dev/cfg/c8 > Phy Information: > Identifier: 4 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 > Reset Problem: 0 > Identifier: 5 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 > Reset Problem: 0 > Identifier: 6 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 > Reset Problem: 0 > Identifier: 7 > Link Error Statistics: > Invalid Dword: 0 > Running Disparity Error: 0 > Loss of Dword Sync: 0 perfect! > > > > # ./smp_discover /dev/smp/expd9 | egrep '(c982|c983)' > phy 26:U:attached:[50000394a8cbc982:00 t(SSP)] 6 Gbps > # ./smp_discover /dev/smp/expd11 | egrep '(c982|c983)' > phy 26:U:attached:[50000394a8cbc983:01 t(SSP)] 6 Gbps > # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd9 > Report phy error log response: > Expander change count: 228 > phy identifier: 26 > invalid dword count: 0 > running disparity error count: 0 > loss of dword synchronization count: 0 > phy reset problem count: 0 > # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd11 > Report phy error log response: > Expander change count: 228 > phy identifier: 26 > invalid dword count: 0 > running disparity error count: 0 > loss of dword synchronization count: 0 > phy reset problem count: 0 > # > > "disparity error count" and "loss of dword sync count" are 0 in all of > this output, in contrast with the non-zero values seen in the sg_logs > output for the "wounded soldier". perfect! > > Am I looking at the right output? Yes, this is not showing any errors, which is a good thing. > Does "phy" in the above commands > refer to the HDD itself or the port on the expander it's connected to? Expander port. The HDD's view is in the sg_logs --page=0x18 /dev/rdsk/... > Had I been able to run the above commands with the "wounded soldier" > still installed, what should I have been looking for? The process is to rule out errors. You have succeeded. -- richard > > Thanks again for your help, > Kevin > > > On 01/02/2015 03:45 PM, Richard Elling wrote: >> >>> On Jan 2, 2015, at 1:50 PM, Kevin Swab >> > wrote: >>> >>> I've run 'sg_logs' on the drive I pulled last week. There were alot of >>> errors in the backgroud scan section of the output, which made it very >>> large, so I put it here: >>> >>> http://pastebin.com/jx5BvSep >>> >>> When I pulled this drive, the SMART health status was OK. >> >> SMART isn?t smart :-P >> >>> However, when >>> I put it in a test system to run 'sg_logs', the status changed to >>> "impending failure...". Had the SMART status changed before pulling the >>> drive, I'm sure 'fmd' would have alerted me to the problem? >> >> By default, fmd looks for the predictive failure (PFA) and self-test >> every hour using the disk_transport >> agent. 
fmstat should show activity there. When a PFA is seen, then there >> will be an ereport generated >> and, for most cases, a syslog message. However, this will not cause a >> zfs-retire event. >> >> Vendors have significant leeway in how they implement SMART. In my >> experience the only thing >> you can say for sure is if the vendor thinks the drive?s death is >> imminent, then you should replace >> it. I suspect these policies are financially motivated rather than >> scientific? some amount of truthiness >> is to be expected. >> >> In the logs, clearly the one disk has lots of errors that have been >> corrected and the rate is increasing. >> The rate of change for "Errors corrected with possible delays? may >> correlate to your performance issues, >> but the interpretation is left up to the vendors. >> >> In the case of this naughty drive, yep it needs replacing. >> >>> >>> Since that drive had other indications of trouble, I ran 'sg_logs' on >>> another drive I pulled recently that has a SMART health status of OK, >>> but exibits similar slow service time behavior: >>> >>> http://pastebin.com/Q0t8Jnug >> >> This one looks mostly healthy. >> >> Another place to look for latency issues is the phy logs. In the sg_logs >> output, this is the >> Protocol Specific port log page for SAS SSP. Key values are running >> disparity error >> count and loss of dword sync count. The trick here is that you need to >> look at both ends >> of the wire for each wire. For a simple case, this means looking at both >> the HBA?s phys error >> counts and the driver. If you have expanders in the mix, it is more >> work. You?ll want to look at >> all of the HBA, expander, and drive phys health counters for all phys. >> >> This can get tricky because wide ports are mostly dumb. For example, if >> an HBA has a 4-link >> wide port (common) and one of the links is acting up (all too common) >> the latency impacts >> will be random. >> >> To see HBA and expander link health, you can use sg3_utils, its >> companion smp_utils, or >> sasinfo (installed as a separate package from OmniOS, IIRC). For example, >> sasinfo hba-port -l >> >> HTH >> ? richard >> >> >>> >>> Thanks for taking the time to look at these, please let me know what you >>> find... >>> >>> Kevin >>> >>> >>> >>> >>> On 12/31/2014 06:13 PM, Richard Elling wrote: >>>> >>>>> On Dec 31, 2014, at 4:30 PM, Kevin Swab >>>> > wrote: >>>>> >>>>> Hello Richard and group, thanks for your reply! >>>>> >>>>> I'll look into sg_logs for one of these devices once I have a chance to >>>>> track that progam down... >>>>> >>>>> Thanks for the tip on the 500 ms latency, I wasn't aware that could >>>>> happen in normal cases. However, I don't believe what I'm seeing >>>>> constitutes normal behavior. >>>>> >>>>> First, some anecdotal evidence: If I pull and replace the suspect >>>>> drive, my downstream systems stop complaining, and the high service time >>>>> numbers go away. >>>> >>>> We call these "wounded soldiers" -- it takes more resources to manage a >>>> wounded soldier than a dead soldier, so one strategy of war is to >>>> wound your >>>> enemy causing them to consume resources tending the wounded. The sg_logs >>>> should be enlightening. >>>> >>>> NB, consider a 4TB disk with 5 platters: if a head or surface starts >>>> to go, then >>>> you have a 1/10 chance that the data you request is under the >>>> damaged head >>>> and will need to be recovered by the drive. So it is not uncommon to see >>>> 90+% of the I/Os to the drive completing quickly. 
It is also not >>>> unusual to see >>>> only a small number of sectors or tracks affected. >>>> >>>> Detecting these becomes tricky, especially as you reduce the >>>> timeout/retry >>>> interval, since the problem is rarely seen in the average latency -- >>>> that which >>>> iostat and sar record. This is an area where we can and are improving. >>>> -- richard >>>> >>>>> >>>>> I threw out 500 ms as a guess to the point at which I start seeing >>>>> problems. However, I see service times far in excess of that, sometimes >>>>> over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled >>>>> a few days ago, during a time when downstream VMWare servers were >>>>> complaining. (since the sar output is so verbose, I grepped out the >>>>> info just for the suspect drive): >>>>> >>>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' >>>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>>> sd91,a 99 5.3 1 42 0.0 7811.7 >>>>> sd91,a 100 11.3 1 53 0.0 11016.0 >>>>> sd91,a 100 3.8 1 75 0.0 3615.8 >>>>> sd91,a 100 4.9 1 25 0.0 8633.5 >>>>> sd91,a 93 3.9 1 55 0.0 4385.3 >>>>> sd91,a 86 3.5 2 75 0.0 2060.5 >>>>> sd91,a 91 3.1 4 80 0.0 823.8 >>>>> sd91,a 97 3.5 1 50 0.0 3984.5 >>>>> sd91,a 100 4.4 1 56 0.0 6068.6 >>>>> sd91,a 100 5.0 1 55 0.0 8836.0 >>>>> sd91,a 100 5.7 1 51 0.0 7939.6 >>>>> sd91,a 98 9.9 1 42 0.0 12526.8 >>>>> sd91,a 100 7.4 0 10 0.0 36813.6 >>>>> sd91,a 51 3.8 8 90 0.0 500.2 >>>>> sd91,a 88 3.4 1 60 0.0 2338.8 >>>>> sd91,a 100 4.5 1 28 0.0 6969.2 >>>>> sd91,a 93 3.8 1 59 0.0 5138.9 >>>>> sd91,a 79 3.1 1 59 0.0 3143.9 >>>>> sd91,a 99 4.7 1 52 0.0 5598.4 >>>>> sd91,a 100 4.8 1 62 0.0 6638.4 >>>>> sd91,a 94 5.0 1 54 0.0 3752.7 >>>>> >>>>> For comparison, here's the sar output from another drive in the same >>>>> pool for the same period of time: >>>>> >>>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' >>>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>>> sd82,a 0 0.0 2 28 0.0 5.6 >>>>> sd82,a 1 0.0 3 51 0.0 5.4 >>>>> sd82,a 1 0.0 4 66 0.0 6.3 >>>>> sd82,a 1 0.0 3 48 0.0 4.3 >>>>> sd82,a 1 0.0 3 45 0.0 6.1 >>>>> sd82,a 1 0.0 6 82 0.0 2.7 >>>>> sd82,a 1 0.0 8 112 0.0 2.8 >>>>> sd82,a 0 0.0 3 27 0.0 1.8 >>>>> sd82,a 1 0.0 5 80 0.0 3.1 >>>>> sd82,a 0 0.0 3 35 0.0 3.1 >>>>> sd82,a 1 0.0 3 35 0.0 3.8 >>>>> sd82,a 1 0.0 4 49 0.0 3.2 >>>>> sd82,a 0 0.0 0 0 0.0 4.1 >>>>> sd82,a 3 0.0 9 84 0.0 4.1 >>>>> sd82,a 1 0.0 6 55 0.0 3.7 >>>>> sd82,a 0 0.0 1 23 0.0 7.0 >>>>> sd82,a 0 0.0 6 57 0.0 1.8 >>>>> sd82,a 1 0.0 5 70 0.0 2.3 >>>>> sd82,a 1 0.0 4 55 0.0 3.7 >>>>> sd82,a 1 0.0 5 72 0.0 4.1 >>>>> sd82,a 1 0.0 4 54 0.0 3.6 >>>>> >>>>> The other drives in this pool all show data similar to that of sd82. >>>>> >>>>> Your point about tuning blindly is well taken, and I'm certainly no >>>>> expert on the IO stack. What's a humble sysadmin to do? >>>>> >>>>> For further reference, this system is running r151010. 
The drive in >>>>> question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive >>>>> was in is using lz4 compression and looks like this: >>>>> >>>>> # zpool status data1 >>>>> pool: data1 >>>>> state: ONLINE >>>>> scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 >>>>> 14:40:20 2014 >>>>> config: >>>>> >>>>> NAME STATE READ WRITE CKSUM >>>>> data1 ONLINE 0 0 0 >>>>> raidz2-0 ONLINE 0 0 0 >>>>> c6t5000039468CB54F0d0 ONLINE 0 0 0 >>>>> c6t5000039478CB5138d0 ONLINE 0 0 0 >>>>> c6t5000039468D000DCd0 ONLINE 0 0 0 >>>>> c6t5000039468D000E8d0 ONLINE 0 0 0 >>>>> c6t5000039468D00F5Cd0 ONLINE 0 0 0 >>>>> c6t5000039478C816CCd0 ONLINE 0 0 0 >>>>> c6t5000039478C8546Cd0 ONLINE 0 0 0 >>>>> raidz2-1 ONLINE 0 0 0 >>>>> c6t5000039478C855F0d0 ONLINE 0 0 0 >>>>> c6t5000039478C856E8d0 ONLINE 0 0 0 >>>>> c6t5000039478C856ECd0 ONLINE 0 0 0 >>>>> c6t5000039478C856F4d0 ONLINE 0 0 0 >>>>> c6t5000039478C86374d0 ONLINE 0 0 0 >>>>> c6t5000039478C8C2A8d0 ONLINE 0 0 0 >>>>> c6t5000039478C8C364d0 ONLINE 0 0 0 >>>>> raidz2-2 ONLINE 0 0 0 >>>>> c6t5000039478C9958Cd0 ONLINE 0 0 0 >>>>> c6t5000039478C995C4d0 ONLINE 0 0 0 >>>>> c6t5000039478C9DACCd0 ONLINE 0 0 0 >>>>> c6t5000039478C9DB30d0 ONLINE 0 0 0 >>>>> c6t5000039478C9DB6Cd0 ONLINE 0 0 0 >>>>> c6t5000039478CA73B4d0 ONLINE 0 0 0 >>>>> c6t5000039478CB3A20d0 ONLINE 0 0 0 >>>>> raidz2-3 ONLINE 0 0 0 >>>>> c6t5000039478CB3A64d0 ONLINE 0 0 0 >>>>> c6t5000039478CB3A70d0 ONLINE 0 0 0 >>>>> c6t5000039478CB3E7Cd0 ONLINE 0 0 0 >>>>> c6t5000039478CB3EB0d0 ONLINE 0 0 0 >>>>> c6t5000039478CB3FBCd0 ONLINE 0 0 0 >>>>> c6t5000039478CB4048d0 ONLINE 0 0 0 >>>>> c6t5000039478CB4054d0 ONLINE 0 0 0 >>>>> raidz2-4 ONLINE 0 0 0 >>>>> c6t5000039478CB424Cd0 ONLINE 0 0 0 >>>>> c6t5000039478CB4250d0 ONLINE 0 0 0 >>>>> c6t5000039478CB470Cd0 ONLINE 0 0 0 >>>>> c6t5000039478CB471Cd0 ONLINE 0 0 0 >>>>> c6t5000039478CB4E50d0 ONLINE 0 0 0 >>>>> c6t5000039478CB50A8d0 ONLINE 0 0 0 >>>>> c6t5000039478CB50BCd0 ONLINE 0 0 0 >>>>> spares >>>>> c6t50000394A8CBC93Cd0 AVAIL >>>>> >>>>> errors: No known data errors >>>>> >>>>> >>>>> Thanks for your help, >>>>> Kevin >>>>> >>>>> On 12/31/2014 3:22 PM, Richard Elling wrote: >>>>>> >>>>>>> On Dec 31, 2014, at 11:25 AM, Kevin Swab >>>>>> > wrote: >>>>>>> >>>>>>> Hello Everyone, >>>>>>> >>>>>>> We've been running OmniOS on a number of SuperMicro 36bay chassis, >>>>>>> with >>>>>>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >>>>>>> various SAS HDD's. These systems are serving block storage via >>>>>>> Comstar >>>>>>> and Qlogic FC HBA's, and have been running well for several years. >>>>>>> >>>>>>> The problem we've got is that as the drives age, some of them start to >>>>>>> perform slowly (intermittently) without failing - no zpool or iostat >>>>>>> errors, and nothing logged in /var/adm/messages. The slow performance >>>>>>> can be seen as high average service times in iostat or sar. >>>>>> >>>>>> Look at the drive's error logs using sg_logs (-a for all) >>>>>> >>>>>>> >>>>>>> When these service times get above 500ms, they start to cause IO >>>>>>> timeouts on the downstream storage consumers, which is bad... >>>>>> >>>>>> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or >>>>>> SATA NCQ >>>>>> >>>>>>> >>>>>>> I'm wondering - is there a way to tune OmniOS' behavior so that it >>>>>>> doesn't try so hard to complete IOs to these slow disks, and instead >>>>>>> just gives up and fails them? >>>>>> >>>>>> Yes, the tuning in Alasdair's blog should work as he describes. >>>>>> More below... 
>>>>>> >>>>>>> >>>>>>> I found an old post from 2011 which states that some tunables exist, >>>>>>> but are ignored by the mpt_sas driver: >>>>>>> >>>>>>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >>>>>>> >>>>>>> Does anyone know the current status of these tunables, or have any >>>>>>> other >>>>>>> suggestions that might help? >>>>>> >>>>>> These tunables are on the order of seconds. The default, 60, is >>>>>> obviously too big >>>>>> unless you have old, slow, SCSI CD-ROMs. But setting it below the >>>>>> manufacturer's >>>>>> internal limit (default or tuned) can lead to an unstable system. >>>>>> Some vendors are >>>>>> better than others at documenting these, but in any case you'll >>>>>> need to see their spec. >>>>>> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. >>>>>> >>>>>> There are a lot of tunables in this area at all levels of the >>>>>> architecture. OOB, the OmniOS >>>>>> settings ensure stable behaviour. Tuning any layer without >>>>>> understanding the others can >>>>>> lead to unstable systems, as demonstrated by your current >>>>>> downstream consumers. >>>>>> -- richard >>>>>> >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Kevin >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ------------------------------------------------------------------- >>>>>>> Kevin Swab UNIX Systems Administrator >>>>>>> ACNS Colorado State University >>>>>>> Phone: (970)491-6572 Email: >>>>>>> Kevin.Swab at ColoState.EDU >>>>>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>>>>>> _______________________________________________ >>>>>>> OmniOS-discuss mailing list >>>>>>> OmniOS-discuss at lists.omniti.com >>>>>>> >>>>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>>> >>> >>> -- >>> ------------------------------------------------------------------- >>> Kevin Swab UNIX Systems Administrator >>> ACNS Colorado State University >>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >>> >>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >> >> -- >> >> Richard.Elling at RichardElling.com >> +1-760-896-4422 >> >> >> > > -- > ------------------------------------------------------------------- > Kevin Swab UNIX Systems Administrator > ACNS Colorado State University > Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU > GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C From Kevin.Swab at ColoState.EDU Tue Jan 6 23:25:45 2015 From: Kevin.Swab at ColoState.EDU (Kevin Swab) Date: Tue, 06 Jan 2015 16:25:45 -0700 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: <831FFC8C-E381-40FD-A542-F5596904728B@richardelling.com> References: <54A44D8C.5090302@ColoState.EDU> <54A49517.6070205@ColoState.EDU> <055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> <54A712A5.9080502@ColoState.EDU> <54AC430E.7090003@ColoState.EDU> <831FFC8C-E381-40FD-A542-F5596904728B@richardelling.com> Message-ID: <54AC6EF9.2040605@ColoState.EDU> Thanks! This has been very educational. Let me see if I have this straight: The zero error counts for the HBA and the expander ports eliminate either of those as the source of the errors seen in the sg_logs output - is that right? So back to my original question: If I see long service times on a drive, and it shows errors in the drive counters you mentioned, but not on the expander ports or HBAs, then is it safe to conclude the fault lies with the drive? 
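(For reference, the full set of checks discussed in this thread can be gathered from all three vantage points roughly like this -- only a sketch; the controller name, expander device, phy number and disk paths are placeholders taken from the examples elsewhere in the thread:

# sasinfo hba-port -a SUNW-mpt_sas-1 -l          # HBA phy error counters
# smp_rep_phy_err_log --phy=26 /dev/smp/expd9    # expander phy facing the drive
# sg_logs --page=0x18 /dev/rdsk/...              # the drive's own view of the link
# sg_logs -a /dev/rdsk/...                       # full drive logs, incl. corrected-error counters

If the link counters are zero at all three points while the drive's corrected-error counts keep climbing, the remaining suspect is the drive itself.)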
Kevin On 01/06/2015 02:23 PM, Richard Elling wrote: > >> On Jan 6, 2015, at 12:18 PM, Kevin Swab wrote: >> >> SAS expanders are involved in my systems, so I installed 'sasinfo' and >> 'smp_utils'. After a bit of poking around in the dark, I came up with >> the following commands which I think get at the error counters you >> mentioned. > > Yes, this data looks fine > >> >> Unfortunately, I had to remove the "wounded soldier" from this system >> since it was causing problems. This output is from the same slot, but >> with a healthy replacement drive: >> >> # sasinfo hba-port -a SUNW-mpt_sas-1 -l >> HBA Name: SUNW-mpt_sas-1 >> HBA Port Name: /dev/cfg/c7 >> Phy Information: >> Identifier: 0 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 >> Reset Problem: 0 >> Identifier: 1 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 >> Reset Problem: 0 >> Identifier: 2 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 >> Reset Problem: 0 >> Identifier: 3 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 >> Reset Problem: 0 > > perfect! > >> HBA Port Name: /dev/cfg/c8 >> Phy Information: >> Identifier: 4 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 >> Reset Problem: 0 >> Identifier: 5 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 >> Reset Problem: 0 >> Identifier: 6 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 >> Reset Problem: 0 >> Identifier: 7 >> Link Error Statistics: >> Invalid Dword: 0 >> Running Disparity Error: 0 >> Loss of Dword Sync: 0 > > perfect! > >> >> >> >> # ./smp_discover /dev/smp/expd9 | egrep '(c982|c983)' >> phy 26:U:attached:[50000394a8cbc982:00 t(SSP)] 6 Gbps >> # ./smp_discover /dev/smp/expd11 | egrep '(c982|c983)' >> phy 26:U:attached:[50000394a8cbc983:01 t(SSP)] 6 Gbps >> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd9 >> Report phy error log response: >> Expander change count: 228 >> phy identifier: 26 >> invalid dword count: 0 >> running disparity error count: 0 >> loss of dword synchronization count: 0 >> phy reset problem count: 0 >> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd11 >> Report phy error log response: >> Expander change count: 228 >> phy identifier: 26 >> invalid dword count: 0 >> running disparity error count: 0 >> loss of dword synchronization count: 0 >> phy reset problem count: 0 >> # >> >> "disparity error count" and "loss of dword sync count" are 0 in all of >> this output, in contrast with the non-zero values seen in the sg_logs >> output for the "wounded soldier". > > perfect! > >> >> Am I looking at the right output? > > Yes, this is not showing any errors, which is a good thing. > >> Does "phy" in the above commands >> refer to the HDD itself or the port on the expander it's connected to? > > Expander port. The HDD's view is in the sg_logs --page=0x18 /dev/rdsk/... > >> Had I been able to run the above commands with the "wounded soldier" >> still installed, what should I have been looking for? > > The process is to rule out errors. You have succeeded. > -- richard > >> >> Thanks again for your help, >> Kevin >> >> >> On 01/02/2015 03:45 PM, Richard Elling wrote: >>> >>>> On Jan 2, 2015, at 1:50 PM, Kevin Swab >>> > wrote: >>>> >>>> I've run 'sg_logs' on the drive I pulled last week. 
There were alot of >>>> errors in the backgroud scan section of the output, which made it very >>>> large, so I put it here: >>>> >>>> http://pastebin.com/jx5BvSep >>>> >>>> When I pulled this drive, the SMART health status was OK. >>> >>> SMART isn?t smart :-P >>> >>>> However, when >>>> I put it in a test system to run 'sg_logs', the status changed to >>>> "impending failure...". Had the SMART status changed before pulling the >>>> drive, I'm sure 'fmd' would have alerted me to the problem? >>> >>> By default, fmd looks for the predictive failure (PFA) and self-test >>> every hour using the disk_transport >>> agent. fmstat should show activity there. When a PFA is seen, then there >>> will be an ereport generated >>> and, for most cases, a syslog message. However, this will not cause a >>> zfs-retire event. >>> >>> Vendors have significant leeway in how they implement SMART. In my >>> experience the only thing >>> you can say for sure is if the vendor thinks the drive?s death is >>> imminent, then you should replace >>> it. I suspect these policies are financially motivated rather than >>> scientific? some amount of truthiness >>> is to be expected. >>> >>> In the logs, clearly the one disk has lots of errors that have been >>> corrected and the rate is increasing. >>> The rate of change for "Errors corrected with possible delays? may >>> correlate to your performance issues, >>> but the interpretation is left up to the vendors. >>> >>> In the case of this naughty drive, yep it needs replacing. >>> >>>> >>>> Since that drive had other indications of trouble, I ran 'sg_logs' on >>>> another drive I pulled recently that has a SMART health status of OK, >>>> but exibits similar slow service time behavior: >>>> >>>> http://pastebin.com/Q0t8Jnug >>> >>> This one looks mostly healthy. >>> >>> Another place to look for latency issues is the phy logs. In the sg_logs >>> output, this is the >>> Protocol Specific port log page for SAS SSP. Key values are running >>> disparity error >>> count and loss of dword sync count. The trick here is that you need to >>> look at both ends >>> of the wire for each wire. For a simple case, this means looking at both >>> the HBA?s phys error >>> counts and the driver. If you have expanders in the mix, it is more >>> work. You?ll want to look at >>> all of the HBA, expander, and drive phys health counters for all phys. >>> >>> This can get tricky because wide ports are mostly dumb. For example, if >>> an HBA has a 4-link >>> wide port (common) and one of the links is acting up (all too common) >>> the latency impacts >>> will be random. >>> >>> To see HBA and expander link health, you can use sg3_utils, its >>> companion smp_utils, or >>> sasinfo (installed as a separate package from OmniOS, IIRC). For example, >>> sasinfo hba-port -l >>> >>> HTH >>> ? richard >>> >>> >>>> >>>> Thanks for taking the time to look at these, please let me know what you >>>> find... >>>> >>>> Kevin >>>> >>>> >>>> >>>> >>>> On 12/31/2014 06:13 PM, Richard Elling wrote: >>>>> >>>>>> On Dec 31, 2014, at 4:30 PM, Kevin Swab >>>>> > wrote: >>>>>> >>>>>> Hello Richard and group, thanks for your reply! >>>>>> >>>>>> I'll look into sg_logs for one of these devices once I have a chance to >>>>>> track that progam down... >>>>>> >>>>>> Thanks for the tip on the 500 ms latency, I wasn't aware that could >>>>>> happen in normal cases. However, I don't believe what I'm seeing >>>>>> constitutes normal behavior. 
>>>>>> >>>>>> First, some anecdotal evidence: If I pull and replace the suspect >>>>>> drive, my downstream systems stop complaining, and the high service time >>>>>> numbers go away. >>>>> >>>>> We call these "wounded soldiers" -- it takes more resources to manage a >>>>> wounded soldier than a dead soldier, so one strategy of war is to >>>>> wound your >>>>> enemy causing them to consume resources tending the wounded. The sg_logs >>>>> should be enlightening. >>>>> >>>>> NB, consider a 4TB disk with 5 platters: if a head or surface starts >>>>> to go, then >>>>> you have a 1/10 chance that the data you request is under the >>>>> damaged head >>>>> and will need to be recovered by the drive. So it is not uncommon to see >>>>> 90+% of the I/Os to the drive completing quickly. It is also not >>>>> unusual to see >>>>> only a small number of sectors or tracks affected. >>>>> >>>>> Detecting these becomes tricky, especially as you reduce the >>>>> timeout/retry >>>>> interval, since the problem is rarely seen in the average latency -- >>>>> that which >>>>> iostat and sar record. This is an area where we can and are improving. >>>>> -- richard >>>>> >>>>>> >>>>>> I threw out 500 ms as a guess to the point at which I start seeing >>>>>> problems. However, I see service times far in excess of that, sometimes >>>>>> over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled >>>>>> a few days ago, during a time when downstream VMWare servers were >>>>>> complaining. (since the sar output is so verbose, I grepped out the >>>>>> info just for the suspect drive): >>>>>> >>>>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' >>>>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>>>> sd91,a 99 5.3 1 42 0.0 7811.7 >>>>>> sd91,a 100 11.3 1 53 0.0 11016.0 >>>>>> sd91,a 100 3.8 1 75 0.0 3615.8 >>>>>> sd91,a 100 4.9 1 25 0.0 8633.5 >>>>>> sd91,a 93 3.9 1 55 0.0 4385.3 >>>>>> sd91,a 86 3.5 2 75 0.0 2060.5 >>>>>> sd91,a 91 3.1 4 80 0.0 823.8 >>>>>> sd91,a 97 3.5 1 50 0.0 3984.5 >>>>>> sd91,a 100 4.4 1 56 0.0 6068.6 >>>>>> sd91,a 100 5.0 1 55 0.0 8836.0 >>>>>> sd91,a 100 5.7 1 51 0.0 7939.6 >>>>>> sd91,a 98 9.9 1 42 0.0 12526.8 >>>>>> sd91,a 100 7.4 0 10 0.0 36813.6 >>>>>> sd91,a 51 3.8 8 90 0.0 500.2 >>>>>> sd91,a 88 3.4 1 60 0.0 2338.8 >>>>>> sd91,a 100 4.5 1 28 0.0 6969.2 >>>>>> sd91,a 93 3.8 1 59 0.0 5138.9 >>>>>> sd91,a 79 3.1 1 59 0.0 3143.9 >>>>>> sd91,a 99 4.7 1 52 0.0 5598.4 >>>>>> sd91,a 100 4.8 1 62 0.0 6638.4 >>>>>> sd91,a 94 5.0 1 54 0.0 3752.7 >>>>>> >>>>>> For comparison, here's the sar output from another drive in the same >>>>>> pool for the same period of time: >>>>>> >>>>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' >>>>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>>>> sd82,a 0 0.0 2 28 0.0 5.6 >>>>>> sd82,a 1 0.0 3 51 0.0 5.4 >>>>>> sd82,a 1 0.0 4 66 0.0 6.3 >>>>>> sd82,a 1 0.0 3 48 0.0 4.3 >>>>>> sd82,a 1 0.0 3 45 0.0 6.1 >>>>>> sd82,a 1 0.0 6 82 0.0 2.7 >>>>>> sd82,a 1 0.0 8 112 0.0 2.8 >>>>>> sd82,a 0 0.0 3 27 0.0 1.8 >>>>>> sd82,a 1 0.0 5 80 0.0 3.1 >>>>>> sd82,a 0 0.0 3 35 0.0 3.1 >>>>>> sd82,a 1 0.0 3 35 0.0 3.8 >>>>>> sd82,a 1 0.0 4 49 0.0 3.2 >>>>>> sd82,a 0 0.0 0 0 0.0 4.1 >>>>>> sd82,a 3 0.0 9 84 0.0 4.1 >>>>>> sd82,a 1 0.0 6 55 0.0 3.7 >>>>>> sd82,a 0 0.0 1 23 0.0 7.0 >>>>>> sd82,a 0 0.0 6 57 0.0 1.8 >>>>>> sd82,a 1 0.0 5 70 0.0 2.3 >>>>>> sd82,a 1 0.0 4 55 0.0 3.7 >>>>>> sd82,a 1 0.0 5 72 0.0 4.1 >>>>>> sd82,a 1 0.0 4 54 0.0 3.6 >>>>>> >>>>>> The other drives in this pool all show data 
similar to that of sd82. >>>>>> >>>>>> Your point about tuning blindly is well taken, and I'm certainly no >>>>>> expert on the IO stack. What's a humble sysadmin to do? >>>>>> >>>>>> For further reference, this system is running r151010. The drive in >>>>>> question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive >>>>>> was in is using lz4 compression and looks like this: >>>>>> >>>>>> # zpool status data1 >>>>>> pool: data1 >>>>>> state: ONLINE >>>>>> scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 >>>>>> 14:40:20 2014 >>>>>> config: >>>>>> >>>>>> NAME STATE READ WRITE CKSUM >>>>>> data1 ONLINE 0 0 0 >>>>>> raidz2-0 ONLINE 0 0 0 >>>>>> c6t5000039468CB54F0d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB5138d0 ONLINE 0 0 0 >>>>>> c6t5000039468D000DCd0 ONLINE 0 0 0 >>>>>> c6t5000039468D000E8d0 ONLINE 0 0 0 >>>>>> c6t5000039468D00F5Cd0 ONLINE 0 0 0 >>>>>> c6t5000039478C816CCd0 ONLINE 0 0 0 >>>>>> c6t5000039478C8546Cd0 ONLINE 0 0 0 >>>>>> raidz2-1 ONLINE 0 0 0 >>>>>> c6t5000039478C855F0d0 ONLINE 0 0 0 >>>>>> c6t5000039478C856E8d0 ONLINE 0 0 0 >>>>>> c6t5000039478C856ECd0 ONLINE 0 0 0 >>>>>> c6t5000039478C856F4d0 ONLINE 0 0 0 >>>>>> c6t5000039478C86374d0 ONLINE 0 0 0 >>>>>> c6t5000039478C8C2A8d0 ONLINE 0 0 0 >>>>>> c6t5000039478C8C364d0 ONLINE 0 0 0 >>>>>> raidz2-2 ONLINE 0 0 0 >>>>>> c6t5000039478C9958Cd0 ONLINE 0 0 0 >>>>>> c6t5000039478C995C4d0 ONLINE 0 0 0 >>>>>> c6t5000039478C9DACCd0 ONLINE 0 0 0 >>>>>> c6t5000039478C9DB30d0 ONLINE 0 0 0 >>>>>> c6t5000039478C9DB6Cd0 ONLINE 0 0 0 >>>>>> c6t5000039478CA73B4d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB3A20d0 ONLINE 0 0 0 >>>>>> raidz2-3 ONLINE 0 0 0 >>>>>> c6t5000039478CB3A64d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB3A70d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB3E7Cd0 ONLINE 0 0 0 >>>>>> c6t5000039478CB3EB0d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB3FBCd0 ONLINE 0 0 0 >>>>>> c6t5000039478CB4048d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB4054d0 ONLINE 0 0 0 >>>>>> raidz2-4 ONLINE 0 0 0 >>>>>> c6t5000039478CB424Cd0 ONLINE 0 0 0 >>>>>> c6t5000039478CB4250d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB470Cd0 ONLINE 0 0 0 >>>>>> c6t5000039478CB471Cd0 ONLINE 0 0 0 >>>>>> c6t5000039478CB4E50d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB50A8d0 ONLINE 0 0 0 >>>>>> c6t5000039478CB50BCd0 ONLINE 0 0 0 >>>>>> spares >>>>>> c6t50000394A8CBC93Cd0 AVAIL >>>>>> >>>>>> errors: No known data errors >>>>>> >>>>>> >>>>>> Thanks for your help, >>>>>> Kevin >>>>>> >>>>>> On 12/31/2014 3:22 PM, Richard Elling wrote: >>>>>>> >>>>>>>> On Dec 31, 2014, at 11:25 AM, Kevin Swab >>>>>>> > wrote: >>>>>>>> >>>>>>>> Hello Everyone, >>>>>>>> >>>>>>>> We've been running OmniOS on a number of SuperMicro 36bay chassis, >>>>>>>> with >>>>>>>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >>>>>>>> various SAS HDD's. These systems are serving block storage via >>>>>>>> Comstar >>>>>>>> and Qlogic FC HBA's, and have been running well for several years. >>>>>>>> >>>>>>>> The problem we've got is that as the drives age, some of them start to >>>>>>>> perform slowly (intermittently) without failing - no zpool or iostat >>>>>>>> errors, and nothing logged in /var/adm/messages. The slow performance >>>>>>>> can be seen as high average service times in iostat or sar. >>>>>>> >>>>>>> Look at the drive's error logs using sg_logs (-a for all) >>>>>>> >>>>>>>> >>>>>>>> When these service times get above 500ms, they start to cause IO >>>>>>>> timeouts on the downstream storage consumers, which is bad... 
>>>>>>> >>>>>>> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or >>>>>>> SATA NCQ >>>>>>> >>>>>>>> >>>>>>>> I'm wondering - is there a way to tune OmniOS' behavior so that it >>>>>>>> doesn't try so hard to complete IOs to these slow disks, and instead >>>>>>>> just gives up and fails them? >>>>>>> >>>>>>> Yes, the tuning in Alasdair's blog should work as he describes. >>>>>>> More below... >>>>>>> >>>>>>>> >>>>>>>> I found an old post from 2011 which states that some tunables exist, >>>>>>>> but are ignored by the mpt_sas driver: >>>>>>>> >>>>>>>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >>>>>>>> >>>>>>>> Does anyone know the current status of these tunables, or have any >>>>>>>> other >>>>>>>> suggestions that might help? >>>>>>> >>>>>>> These tunables are on the order of seconds. The default, 60, is >>>>>>> obviously too big >>>>>>> unless you have old, slow, SCSI CD-ROMs. But setting it below the >>>>>>> manufacturer's >>>>>>> internal limit (default or tuned) can lead to an unstable system. >>>>>>> Some vendors are >>>>>>> better than others at documenting these, but in any case you'll >>>>>>> need to see their spec. >>>>>>> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. >>>>>>> >>>>>>> There are a lot of tunables in this area at all levels of the >>>>>>> architecture. OOB, the OmniOS >>>>>>> settings ensure stable behaviour. Tuning any layer without >>>>>>> understanding the others can >>>>>>> lead to unstable systems, as demonstrated by your current >>>>>>> downstream consumers. >>>>>>> -- richard >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Kevin >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> ------------------------------------------------------------------- >>>>>>>> Kevin Swab UNIX Systems Administrator >>>>>>>> ACNS Colorado State University >>>>>>>> Phone: (970)491-6572 Email: >>>>>>>> Kevin.Swab at ColoState.EDU >>>>>>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>>>>>>> _______________________________________________ >>>>>>>> OmniOS-discuss mailing list >>>>>>>> OmniOS-discuss at lists.omniti.com >>>>>>>> >>>>>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>>>> >>>> >>>> -- >>>> ------------------------------------------------------------------- >>>> Kevin Swab UNIX Systems Administrator >>>> ACNS Colorado State University >>>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >>>> >>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>> >>> -- >>> >>> Richard.Elling at RichardElling.com >>> +1-760-896-4422 >>> >>> >>> >> >> -- >> ------------------------------------------------------------------- >> Kevin Swab UNIX Systems Administrator >> ACNS Colorado State University >> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C -- ------------------------------------------------------------------- Kevin Swab UNIX Systems Administrator ACNS Colorado State University Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C From richard.elling at richardelling.com Tue Jan 6 23:43:40 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Tue, 6 Jan 2015 15:43:40 -0800 Subject: [OmniOS-discuss] slow drive response times In-Reply-To: <54AC6EF9.2040605@ColoState.EDU> References: <54A44D8C.5090302@ColoState.EDU> <54A49517.6070205@ColoState.EDU> 
<055A9B13-DC08-4DA3-9827-BD417545BC98@richardelling.com> <54A712A5.9080502@ColoState.EDU> <54AC430E.7090003@ColoState.EDU> <831FFC8C-E381-40FD-A542-F5596904728B@richardelling.com> <54AC6EF9.2040605@ColoState.EDU> Message-ID: <3332867D-E304-4438-A9E4-F4B989F7D41D@richardelling.com> > On Jan 6, 2015, at 3:25 PM, Kevin Swab wrote: > > Thanks! This has been very educational. Let me see if I have this > straight: The zero error counts for the HBA and the expander ports > eliminate either of those as the source of the errors seen in the > sg_logs output - is that right? Not quite. Zero error counts for HBA, expander, and disk ports eliminates cabling as the source of latency issues. > > So back to my original question: If I see long service times on a > drive, and it shows errors in the drive counters you mentioned, but not > on the expander ports or HBAs, then is it safe to conclude the fault > lies with the drive? With high probability. -- richard > > Kevin > > On 01/06/2015 02:23 PM, Richard Elling wrote: >> >>> On Jan 6, 2015, at 12:18 PM, Kevin Swab wrote: >>> >>> SAS expanders are involved in my systems, so I installed 'sasinfo' and >>> 'smp_utils'. After a bit of poking around in the dark, I came up with >>> the following commands which I think get at the error counters you >>> mentioned. >> >> Yes, this data looks fine >> >>> >>> Unfortunately, I had to remove the "wounded soldier" from this system >>> since it was causing problems. This output is from the same slot, but >>> with a healthy replacement drive: >>> >>> # sasinfo hba-port -a SUNW-mpt_sas-1 -l >>> HBA Name: SUNW-mpt_sas-1 >>> HBA Port Name: /dev/cfg/c7 >>> Phy Information: >>> Identifier: 0 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >>> Reset Problem: 0 >>> Identifier: 1 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >>> Reset Problem: 0 >>> Identifier: 2 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >>> Reset Problem: 0 >>> Identifier: 3 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >>> Reset Problem: 0 >> >> perfect! >> >>> HBA Port Name: /dev/cfg/c8 >>> Phy Information: >>> Identifier: 4 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >>> Reset Problem: 0 >>> Identifier: 5 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >>> Reset Problem: 0 >>> Identifier: 6 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >>> Reset Problem: 0 >>> Identifier: 7 >>> Link Error Statistics: >>> Invalid Dword: 0 >>> Running Disparity Error: 0 >>> Loss of Dword Sync: 0 >> >> perfect! 
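(To sweep every phy on an expander rather than querying them one at a time, a small loop over smp_rep_phy_err_log is enough -- a sketch only; adjust the expander device and the phy count to whatever smp_discover reports for your enclosure:

  p=0
  while [ $p -lt 36 ]; do
      echo "=== expd9 phy $p ==="
      smp_rep_phy_err_log --phy=$p /dev/smp/expd9 | egrep 'dword|disparity|reset'
      p=$((p+1))
  done

Any phy showing non-zero counters can then be matched back to its attached SAS address in the smp_discover output.)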
>> >>> >>> >>> >>> # ./smp_discover /dev/smp/expd9 | egrep '(c982|c983)' >>> phy 26:U:attached:[50000394a8cbc982:00 t(SSP)] 6 Gbps >>> # ./smp_discover /dev/smp/expd11 | egrep '(c982|c983)' >>> phy 26:U:attached:[50000394a8cbc983:01 t(SSP)] 6 Gbps >>> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd9 >>> Report phy error log response: >>> Expander change count: 228 >>> phy identifier: 26 >>> invalid dword count: 0 >>> running disparity error count: 0 >>> loss of dword synchronization count: 0 >>> phy reset problem count: 0 >>> # ./smp_rep_phy_err_log --phy=26 /dev/smp/expd11 >>> Report phy error log response: >>> Expander change count: 228 >>> phy identifier: 26 >>> invalid dword count: 0 >>> running disparity error count: 0 >>> loss of dword synchronization count: 0 >>> phy reset problem count: 0 >>> # >>> >>> "disparity error count" and "loss of dword sync count" are 0 in all of >>> this output, in contrast with the non-zero values seen in the sg_logs >>> output for the "wounded soldier". >> >> perfect! >> >>> >>> Am I looking at the right output? >> >> Yes, this is not showing any errors, which is a good thing. >> >>> Does "phy" in the above commands >>> refer to the HDD itself or the port on the expander it's connected to? >> >> Expander port. The HDD's view is in the sg_logs --page=0x18 /dev/rdsk/... >> >>> Had I been able to run the above commands with the "wounded soldier" >>> still installed, what should I have been looking for? >> >> The process is to rule out errors. You have succeeded. >> -- richard >> >>> >>> Thanks again for your help, >>> Kevin >>> >>> >>> On 01/02/2015 03:45 PM, Richard Elling wrote: >>>> >>>>> On Jan 2, 2015, at 1:50 PM, Kevin Swab >>>> > wrote: >>>>> >>>>> I've run 'sg_logs' on the drive I pulled last week. There were alot of >>>>> errors in the backgroud scan section of the output, which made it very >>>>> large, so I put it here: >>>>> >>>>> http://pastebin.com/jx5BvSep >>>>> >>>>> When I pulled this drive, the SMART health status was OK. >>>> >>>> SMART isn?t smart :-P >>>> >>>>> However, when >>>>> I put it in a test system to run 'sg_logs', the status changed to >>>>> "impending failure...". Had the SMART status changed before pulling the >>>>> drive, I'm sure 'fmd' would have alerted me to the problem? >>>> >>>> By default, fmd looks for the predictive failure (PFA) and self-test >>>> every hour using the disk_transport >>>> agent. fmstat should show activity there. When a PFA is seen, then there >>>> will be an ereport generated >>>> and, for most cases, a syslog message. However, this will not cause a >>>> zfs-retire event. >>>> >>>> Vendors have significant leeway in how they implement SMART. In my >>>> experience the only thing >>>> you can say for sure is if the vendor thinks the drive?s death is >>>> imminent, then you should replace >>>> it. I suspect these policies are financially motivated rather than >>>> scientific? some amount of truthiness >>>> is to be expected. >>>> >>>> In the logs, clearly the one disk has lots of errors that have been >>>> corrected and the rate is increasing. >>>> The rate of change for "Errors corrected with possible delays? may >>>> correlate to your performance issues, >>>> but the interpretation is left up to the vendors. >>>> >>>> In the case of this naughty drive, yep it needs replacing. 
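(To check whether fmd's hourly SMART poll actually noticed anything, something along these lines should show it -- a sketch; the exact module name and ereport classes vary by platform and release:

# fmstat           # look for the disk-transport module and a non-zero ereport count
# fmdump -e        # one line per ereport logged; watch for disk-related classes
# fmadm faulty     # anything fmd has actually diagnosed as a fault

As noted above, a PFA produces an ereport and usually a syslog message, but not a zfs-retire event on its own.)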
>>>> >>>>> >>>>> Since that drive had other indications of trouble, I ran 'sg_logs' on >>>>> another drive I pulled recently that has a SMART health status of OK, >>>>> but exibits similar slow service time behavior: >>>>> >>>>> http://pastebin.com/Q0t8Jnug >>>> >>>> This one looks mostly healthy. >>>> >>>> Another place to look for latency issues is the phy logs. In the sg_logs >>>> output, this is the >>>> Protocol Specific port log page for SAS SSP. Key values are running >>>> disparity error >>>> count and loss of dword sync count. The trick here is that you need to >>>> look at both ends >>>> of the wire for each wire. For a simple case, this means looking at both >>>> the HBA?s phys error >>>> counts and the driver. If you have expanders in the mix, it is more >>>> work. You?ll want to look at >>>> all of the HBA, expander, and drive phys health counters for all phys. >>>> >>>> This can get tricky because wide ports are mostly dumb. For example, if >>>> an HBA has a 4-link >>>> wide port (common) and one of the links is acting up (all too common) >>>> the latency impacts >>>> will be random. >>>> >>>> To see HBA and expander link health, you can use sg3_utils, its >>>> companion smp_utils, or >>>> sasinfo (installed as a separate package from OmniOS, IIRC). For example, >>>> sasinfo hba-port -l >>>> >>>> HTH >>>> ? richard >>>> >>>> >>>>> >>>>> Thanks for taking the time to look at these, please let me know what you >>>>> find... >>>>> >>>>> Kevin >>>>> >>>>> >>>>> >>>>> >>>>> On 12/31/2014 06:13 PM, Richard Elling wrote: >>>>>> >>>>>>> On Dec 31, 2014, at 4:30 PM, Kevin Swab >>>>>> > wrote: >>>>>>> >>>>>>> Hello Richard and group, thanks for your reply! >>>>>>> >>>>>>> I'll look into sg_logs for one of these devices once I have a chance to >>>>>>> track that progam down... >>>>>>> >>>>>>> Thanks for the tip on the 500 ms latency, I wasn't aware that could >>>>>>> happen in normal cases. However, I don't believe what I'm seeing >>>>>>> constitutes normal behavior. >>>>>>> >>>>>>> First, some anecdotal evidence: If I pull and replace the suspect >>>>>>> drive, my downstream systems stop complaining, and the high service time >>>>>>> numbers go away. >>>>>> >>>>>> We call these "wounded soldiers" -- it takes more resources to manage a >>>>>> wounded soldier than a dead soldier, so one strategy of war is to >>>>>> wound your >>>>>> enemy causing them to consume resources tending the wounded. The sg_logs >>>>>> should be enlightening. >>>>>> >>>>>> NB, consider a 4TB disk with 5 platters: if a head or surface starts >>>>>> to go, then >>>>>> you have a 1/10 chance that the data you request is under the >>>>>> damaged head >>>>>> and will need to be recovered by the drive. So it is not uncommon to see >>>>>> 90+% of the I/Os to the drive completing quickly. It is also not >>>>>> unusual to see >>>>>> only a small number of sectors or tracks affected. >>>>>> >>>>>> Detecting these becomes tricky, especially as you reduce the >>>>>> timeout/retry >>>>>> interval, since the problem is rarely seen in the average latency -- >>>>>> that which >>>>>> iostat and sar record. This is an area where we can and are improving. >>>>>> -- richard >>>>>> >>>>>>> >>>>>>> I threw out 500 ms as a guess to the point at which I start seeing >>>>>>> problems. However, I see service times far in excess of that, sometimes >>>>>>> over 30,000 ms! Below is 20 minutes of sar output from a drive I pulled >>>>>>> a few days ago, during a time when downstream VMWare servers were >>>>>>> complaining. 
(since the sar output is so verbose, I grepped out the >>>>>>> info just for the suspect drive): >>>>>>> >>>>>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd91,a)' >>>>>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>>>>> sd91,a 99 5.3 1 42 0.0 7811.7 >>>>>>> sd91,a 100 11.3 1 53 0.0 11016.0 >>>>>>> sd91,a 100 3.8 1 75 0.0 3615.8 >>>>>>> sd91,a 100 4.9 1 25 0.0 8633.5 >>>>>>> sd91,a 93 3.9 1 55 0.0 4385.3 >>>>>>> sd91,a 86 3.5 2 75 0.0 2060.5 >>>>>>> sd91,a 91 3.1 4 80 0.0 823.8 >>>>>>> sd91,a 97 3.5 1 50 0.0 3984.5 >>>>>>> sd91,a 100 4.4 1 56 0.0 6068.6 >>>>>>> sd91,a 100 5.0 1 55 0.0 8836.0 >>>>>>> sd91,a 100 5.7 1 51 0.0 7939.6 >>>>>>> sd91,a 98 9.9 1 42 0.0 12526.8 >>>>>>> sd91,a 100 7.4 0 10 0.0 36813.6 >>>>>>> sd91,a 51 3.8 8 90 0.0 500.2 >>>>>>> sd91,a 88 3.4 1 60 0.0 2338.8 >>>>>>> sd91,a 100 4.5 1 28 0.0 6969.2 >>>>>>> sd91,a 93 3.8 1 59 0.0 5138.9 >>>>>>> sd91,a 79 3.1 1 59 0.0 3143.9 >>>>>>> sd91,a 99 4.7 1 52 0.0 5598.4 >>>>>>> sd91,a 100 4.8 1 62 0.0 6638.4 >>>>>>> sd91,a 94 5.0 1 54 0.0 3752.7 >>>>>>> >>>>>>> For comparison, here's the sar output from another drive in the same >>>>>>> pool for the same period of time: >>>>>>> >>>>>>> # sar -d -f /var/adm/sa/sa28 -s 14:50 -e 15:10 | egrep '(device|sd82,a)' >>>>>>> 14:50:00 device %busy avque r+w/s blks/s avwait avserv >>>>>>> sd82,a 0 0.0 2 28 0.0 5.6 >>>>>>> sd82,a 1 0.0 3 51 0.0 5.4 >>>>>>> sd82,a 1 0.0 4 66 0.0 6.3 >>>>>>> sd82,a 1 0.0 3 48 0.0 4.3 >>>>>>> sd82,a 1 0.0 3 45 0.0 6.1 >>>>>>> sd82,a 1 0.0 6 82 0.0 2.7 >>>>>>> sd82,a 1 0.0 8 112 0.0 2.8 >>>>>>> sd82,a 0 0.0 3 27 0.0 1.8 >>>>>>> sd82,a 1 0.0 5 80 0.0 3.1 >>>>>>> sd82,a 0 0.0 3 35 0.0 3.1 >>>>>>> sd82,a 1 0.0 3 35 0.0 3.8 >>>>>>> sd82,a 1 0.0 4 49 0.0 3.2 >>>>>>> sd82,a 0 0.0 0 0 0.0 4.1 >>>>>>> sd82,a 3 0.0 9 84 0.0 4.1 >>>>>>> sd82,a 1 0.0 6 55 0.0 3.7 >>>>>>> sd82,a 0 0.0 1 23 0.0 7.0 >>>>>>> sd82,a 0 0.0 6 57 0.0 1.8 >>>>>>> sd82,a 1 0.0 5 70 0.0 2.3 >>>>>>> sd82,a 1 0.0 4 55 0.0 3.7 >>>>>>> sd82,a 1 0.0 5 72 0.0 4.1 >>>>>>> sd82,a 1 0.0 4 54 0.0 3.6 >>>>>>> >>>>>>> The other drives in this pool all show data similar to that of sd82. >>>>>>> >>>>>>> Your point about tuning blindly is well taken, and I'm certainly no >>>>>>> expert on the IO stack. What's a humble sysadmin to do? >>>>>>> >>>>>>> For further reference, this system is running r151010. 
The drive in >>>>>>> question is a Toshiba MG03SCA300 (7200rpm SAS), and the pool the drive >>>>>>> was in is using lz4 compression and looks like this: >>>>>>> >>>>>>> # zpool status data1 >>>>>>> pool: data1 >>>>>>> state: ONLINE >>>>>>> scan: resilvered 1.67T in 70h56m with 0 errors on Wed Dec 31 >>>>>>> 14:40:20 2014 >>>>>>> config: >>>>>>> >>>>>>> NAME STATE READ WRITE CKSUM >>>>>>> data1 ONLINE 0 0 0 >>>>>>> raidz2-0 ONLINE 0 0 0 >>>>>>> c6t5000039468CB54F0d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB5138d0 ONLINE 0 0 0 >>>>>>> c6t5000039468D000DCd0 ONLINE 0 0 0 >>>>>>> c6t5000039468D000E8d0 ONLINE 0 0 0 >>>>>>> c6t5000039468D00F5Cd0 ONLINE 0 0 0 >>>>>>> c6t5000039478C816CCd0 ONLINE 0 0 0 >>>>>>> c6t5000039478C8546Cd0 ONLINE 0 0 0 >>>>>>> raidz2-1 ONLINE 0 0 0 >>>>>>> c6t5000039478C855F0d0 ONLINE 0 0 0 >>>>>>> c6t5000039478C856E8d0 ONLINE 0 0 0 >>>>>>> c6t5000039478C856ECd0 ONLINE 0 0 0 >>>>>>> c6t5000039478C856F4d0 ONLINE 0 0 0 >>>>>>> c6t5000039478C86374d0 ONLINE 0 0 0 >>>>>>> c6t5000039478C8C2A8d0 ONLINE 0 0 0 >>>>>>> c6t5000039478C8C364d0 ONLINE 0 0 0 >>>>>>> raidz2-2 ONLINE 0 0 0 >>>>>>> c6t5000039478C9958Cd0 ONLINE 0 0 0 >>>>>>> c6t5000039478C995C4d0 ONLINE 0 0 0 >>>>>>> c6t5000039478C9DACCd0 ONLINE 0 0 0 >>>>>>> c6t5000039478C9DB30d0 ONLINE 0 0 0 >>>>>>> c6t5000039478C9DB6Cd0 ONLINE 0 0 0 >>>>>>> c6t5000039478CA73B4d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB3A20d0 ONLINE 0 0 0 >>>>>>> raidz2-3 ONLINE 0 0 0 >>>>>>> c6t5000039478CB3A64d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB3A70d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB3E7Cd0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB3EB0d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB3FBCd0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB4048d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB4054d0 ONLINE 0 0 0 >>>>>>> raidz2-4 ONLINE 0 0 0 >>>>>>> c6t5000039478CB424Cd0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB4250d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB470Cd0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB471Cd0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB4E50d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB50A8d0 ONLINE 0 0 0 >>>>>>> c6t5000039478CB50BCd0 ONLINE 0 0 0 >>>>>>> spares >>>>>>> c6t50000394A8CBC93Cd0 AVAIL >>>>>>> >>>>>>> errors: No known data errors >>>>>>> >>>>>>> >>>>>>> Thanks for your help, >>>>>>> Kevin >>>>>>> >>>>>>> On 12/31/2014 3:22 PM, Richard Elling wrote: >>>>>>>> >>>>>>>>> On Dec 31, 2014, at 11:25 AM, Kevin Swab >>>>>>>> > wrote: >>>>>>>>> >>>>>>>>> Hello Everyone, >>>>>>>>> >>>>>>>>> We've been running OmniOS on a number of SuperMicro 36bay chassis, >>>>>>>>> with >>>>>>>>> Supermicro motherboards, LSI SAS controllers (9211-8i & 9207-8i) and >>>>>>>>> various SAS HDD's. These systems are serving block storage via >>>>>>>>> Comstar >>>>>>>>> and Qlogic FC HBA's, and have been running well for several years. >>>>>>>>> >>>>>>>>> The problem we've got is that as the drives age, some of them start to >>>>>>>>> perform slowly (intermittently) without failing - no zpool or iostat >>>>>>>>> errors, and nothing logged in /var/adm/messages. The slow performance >>>>>>>>> can be seen as high average service times in iostat or sar. >>>>>>>> >>>>>>>> Look at the drive's error logs using sg_logs (-a for all) >>>>>>>> >>>>>>>>> >>>>>>>>> When these service times get above 500ms, they start to cause IO >>>>>>>>> timeouts on the downstream storage consumers, which is bad... 
>>>>>>>> >>>>>>>> 500 milliseconds is not unusual for a busy HDD with SCSI TCQ or >>>>>>>> SATA NCQ >>>>>>>> >>>>>>>>> >>>>>>>>> I'm wondering - is there a way to tune OmniOS' behavior so that it >>>>>>>>> doesn't try so hard to complete IOs to these slow disks, and instead >>>>>>>>> just gives up and fails them? >>>>>>>> >>>>>>>> Yes, the tuning in Alasdair's blog should work as he describes. >>>>>>>> More below... >>>>>>>> >>>>>>>>> >>>>>>>>> I found an old post from 2011 which states that some tunables exist, >>>>>>>>> but are ignored by the mpt_sas driver: >>>>>>>>> >>>>>>>>> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ >>>>>>>>> >>>>>>>>> Does anyone know the current status of these tunables, or have any >>>>>>>>> other >>>>>>>>> suggestions that might help? >>>>>>>> >>>>>>>> These tunables are on the order of seconds. The default, 60, is >>>>>>>> obviously too big >>>>>>>> unless you have old, slow, SCSI CD-ROMs. But setting it below the >>>>>>>> manufacturer's >>>>>>>> internal limit (default or tuned) can lead to an unstable system. >>>>>>>> Some vendors are >>>>>>>> better than others at documenting these, but in any case you'll >>>>>>>> need to see their spec. >>>>>>>> Expect values on the order of 6 to 15 seconds for modern HDDs and SSDs. >>>>>>>> >>>>>>>> There are a lot of tunables in this area at all levels of the >>>>>>>> architecture. OOB, the OmniOS >>>>>>>> settings ensure stable behaviour. Tuning any layer without >>>>>>>> understanding the others can >>>>>>>> lead to unstable systems, as demonstrated by your current >>>>>>>> downstream consumers. >>>>>>>> -- richard >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Kevin >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> ------------------------------------------------------------------- >>>>>>>>> Kevin Swab UNIX Systems Administrator >>>>>>>>> ACNS Colorado State University >>>>>>>>> Phone: (970)491-6572 Email: >>>>>>>>> Kevin.Swab at ColoState.EDU >>>>>>>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>>>>>>>> _______________________________________________ >>>>>>>>> OmniOS-discuss mailing list >>>>>>>>> OmniOS-discuss at lists.omniti.com >>>>>>>>> >>>>>>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>>>>> >>>>> >>>>> -- >>>>> ------------------------------------------------------------------- >>>>> Kevin Swab UNIX Systems Administrator >>>>> ACNS Colorado State University >>>>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >>>>> >>>>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C >>>> >>>> -- >>>> >>>> Richard.Elling at RichardElling.com >>>> +1-760-896-4422 >>>> >>>> >>>> >>> >>> -- >>> ------------------------------------------------------------------- >>> Kevin Swab UNIX Systems Administrator >>> ACNS Colorado State University >>> Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU >>> GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C > > -- > ------------------------------------------------------------------- > Kevin Swab UNIX Systems Administrator > ACNS Colorado State University > Phone: (970)491-6572 Email: Kevin.Swab at ColoState.EDU > GPG Fingerprint: 7026 3F66 A970 67BD 6F17 8EB8 8A7D 142F 2392 791C From stephan.budach at JVM.DE Wed Jan 7 10:28:08 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 7 Jan 2015 11:28:08 +0100 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k Message-ID: <54AD0A38.1060801@jvm.de> Hello everyone, I am sharing my zfs 
via NFS to a couple of OVM nodes. I noticed really bad NFS read performance, when rsize goes beyond 128k, whereas the performance is just fine at 32k. The issue is, that the ovs-agent, which is performing the actual mount, doesn't accept or pass any NFS mount options to the NFS server. To give some numbers, a rsize of 1mb results in a read throughput of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS export from a OEL 6u4 box has no issues with this, as the read speeds from this export are 108+MB/s regardles of the rsize of the NFS mount. The OmniOS box is currently connected to a 10GbE port at our core 6509, but the NFS client is connected through a 1GbE port only. MTU is at 1500 and can currently not be upped. Anyone having a tip, why a rsize of 64k+ will result in such a performance drop? Thanks, budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Jan 7 15:12:45 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 7 Jan 2015 10:12:45 -0500 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: <54AD0A38.1060801@jvm.de> References: <54AD0A38.1060801@jvm.de> Message-ID: <6E13148E-716D-4A71-A44C-D1FD5C3C43E1@omniti.com> > On Jan 7, 2015, at 5:28 AM, Stephan Budach wrote: > > Hello everyone, > > I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really bad NFS read performance, when rsize goes beyond 128k, whereas the performance is just fine at 32k. The issue is, that the ovs-agent, which is performing the actual mount, doesn't accept or pass any NFS mount options to the NFS server. To give some numbers, a rsize of 1mb results in a read throughput of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS export from a OEL 6u4 box has no issues with this, as the read speeds from this export are 108+MB/s regardles of the rsize of the NFS mount. > > The OmniOS box is currently connected to a 10GbE port at our core 6509, but the NFS client is connected through a 1GbE port only. MTU is at 1500 and can currently not be upped. > Anyone having a tip, why a rsize of 64k+ will result in such a performance drop? Assuming you're running over TCP, perhaps you need to increase the receive window? ndd -set /dev/tcp tcp_xmit_hiwat 1048576 Dan From stephan.budach at JVM.DE Wed Jan 7 15:46:12 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 7 Jan 2015 16:46:12 +0100 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: <6E13148E-716D-4A71-A44C-D1FD5C3C43E1@omniti.com> References: <54AD0A38.1060801@jvm.de> <6E13148E-716D-4A71-A44C-D1FD5C3C43E1@omniti.com> Message-ID: <54AD54C4.4000205@jvm.de> Hi Dan, Am 07.01.15 um 16:12 schrieb Dan McDonald: >> On Jan 7, 2015, at 5:28 AM, Stephan Budach wrote: >> >> Hello everyone, >> >> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really bad NFS read performance, when rsize goes beyond 128k, whereas the performance is just fine at 32k. The issue is, that the ovs-agent, which is performing the actual mount, doesn't accept or pass any NFS mount options to the NFS server. To give some numbers, a rsize of 1mb results in a read throughput of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS export from a OEL 6u4 box has no issues with this, as the read speeds from this export are 108+MB/s regardles of the rsize of the NFS mount. >> >> The OmniOS box is currently connected to a 10GbE port at our core 6509, but the NFS client is connected through a 1GbE port only. 
MTU is at 1500 and can currently not be upped. >> Anyone having a tip, why a rsize of 64k+ will result in such a performance drop? > Assuming you're running over TCP, perhaps you need to increase the receive window? > > ndd -set /dev/tcp tcp_xmit_hiwat 1048576 > > Dan > unfortuanetly?no. Strange thing to note is, that on my 2nd OmniBox, which is at r012, the read speads from the received zfs are in the 60's MB/s. The actual box is still at r006. I tried the settings you suggested on both, and both did not change. budy From richard.elling at richardelling.com Wed Jan 7 17:00:32 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Wed, 7 Jan 2015 09:00:32 -0800 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: <54AD0A38.1060801@jvm.de> References: <54AD0A38.1060801@jvm.de> Message-ID: > On Jan 7, 2015, at 2:28 AM, Stephan Budach wrote: > > Hello everyone, > > I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really bad NFS read performance, when rsize goes beyond 128k, whereas the performance is just fine at 32k. The issue is, that the ovs-agent, which is performing the actual mount, doesn't accept or pass any NFS mount options to the NFS server. The other issue is that illumos/Solaris on x86 tuning of server-side size settings does not work because the compiler optimizes away the tunables. There is a trivial fix, but it requires a rebuild. > To give some numbers, a rsize of 1mb results in a read throughput of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS export from a OEL 6u4 box has no issues with this, as the read speeds from this export are 108+MB/s regardles of the rsize of the NFS mount. Brendan wrote about a similar issue in the Dtrace book as a case study. See chapter 5 case study on ZFS 8KB mirror reads. > > The OmniOS box is currently connected to a 10GbE port at our core 6509, but the NFS client is connected through a 1GbE port only. MTU is at 1500 and can currently not be upped. > Anyone having a tip, why a rsize of 64k+ will result in such a performance drop? It is entirely due to optimizations for small I/O going way back to the 1980s. -- richard > > Thanks, > budy > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexmcwhirter at vantagetitle.com Wed Jan 7 17:52:14 2015 From: alexmcwhirter at vantagetitle.com (Alex McWhirter) Date: Wed, 7 Jan 2015 12:52:14 -0500 Subject: [OmniOS-discuss] pkgdepend reports unresolved dependencies Message-ID: <7F0AA6D1-C3E3-490A-AF60-430148D0A0ED@vantagetitle.com> I?m building pigeonhole for dovecot on r151006 with the template build scripts. Pigeonhole compiles fine, but pkgdepend complains about a lot of unresolved dependencies. These dependencies are actually libraries built by pigeonhole, so i can see why they wouldn?t exist in the installed package directory. Should i tell pkgdepend to look in $DESTDIR$PREFIX for these dependencies? Or maybe i don?t quite understand why pkgdepend thinks they are unresolved. 
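(The fix that comes up further down the thread is to build pigeonhole with a runpath, so the runtime linker -- and therefore pkgdepend -- can find libdovecot at its installed location. Roughly, and assuming the 64-bit dovecot libraries are delivered under /opt/triadic/lib/amd64/dovecot, something like:

# LDFLAGS="-Wl,-R,/opt/triadic/lib/amd64/dovecot" ./configure --with-dovecot=/path/to/dovecot-build
# elfdump -d libdovecot-sieve.so.0.0.0 | egrep 'RPATH|RUNPATH'    # verify the runpath was recorded

The flag syntax and paths here are placeholders, not the actual build script -- see Lauri's reply below for the working example.)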
/tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot.so.0 \ pkg.debug.depend.reason=opt/triadic/lib/amd64/dovecot/libdovecot-sieve.so.0.0.0 \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib/64 \ pkg.debug.depend.path=opt/triadic/lib/amd64 \ pkg.debug.depend.path=usr/lib/64'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot-lda.so.0 \ pkg.debug.depend.reason=opt/triadic/lib/amd64/dovecot/libdovecot-sieve.so.0.0.0 \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib/64 \ pkg.debug.depend.path=opt/triadic/lib/amd64 \ pkg.debug.depend.path=usr/lib/64'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD \ pkg.debug.depend.file=libdovecot-storage.so.0 \ pkg.debug.depend.reason=opt/triadic/lib/amd64/dovecot/libdovecot-sieve.so.0.0.0 \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib/64 \ pkg.debug.depend.path=opt/triadic/lib/amd64 \ pkg.debug.depend.path=usr/lib/64'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot-login.so.0 \ pkg.debug.depend.reason=opt/triadic/usr/libexec/amd64/dovecot/managesieve-login \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib/64 \ pkg.debug.depend.path=opt/triadic/lib/amd64 \ pkg.debug.depend.path=usr/lib/64'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot-login.so.0 \ pkg.debug.depend.reason=opt/triadic/usr/libexec/i386/dovecot/managesieve-login \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib \ pkg.debug.depend.path=opt/triadic/lib/i386 \ pkg.debug.depend.path=usr/lib'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot.so.0 \ pkg.debug.depend.reason=opt/triadic/usr/libexec/amd64/dovecot/managesieve-login \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib/64 \ pkg.debug.depend.path=opt/triadic/lib/amd64 \ pkg.debug.depend.path=usr/lib/64'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot.so.0 \ pkg.debug.depend.reason=opt/triadic/usr/libexec/i386/dovecot/managesieve-login \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib \ pkg.debug.depend.path=opt/triadic/lib/i386 \ pkg.debug.depend.path=usr/lib'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot.so.0 \ pkg.debug.depend.reason=opt/triadic/lib/i386/dovecot/libdovecot-sieve.so.0.0.0 \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib \ pkg.debug.depend.path=opt/triadic/lib/i386 \ pkg.debug.depend.path=usr/lib'. /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot-lda.so.0 \ pkg.debug.depend.reason=opt/triadic/lib/i386/dovecot/libdovecot-sieve.so.0.0.0 \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib \ pkg.debug.depend.path=opt/triadic/lib/i386 \ pkg.debug.depend.path=usr/lib'. 
/tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' depend type=require fmri=__TBD \ pkg.debug.depend.file=libdovecot-storage.so.0 \ pkg.debug.depend.reason=opt/triadic/lib/i386/dovecot/libdovecot-sieve.so.0.0.0 \ pkg.debug.depend.type=elf \ pkg.debug.depend.path=lib \ pkg.debug.depend.path=opt/triadic/lib/i386 \ pkg.debug.depend.path=usr/lib'. --- Dependency resolution failed -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Wed Jan 7 20:11:30 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 7 Jan 2015 21:11:30 +0100 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: References: <54AD0A38.1060801@jvm.de> Message-ID: <54AD92F2.5080209@jvm.de> Am 07.01.15 um 18:00 schrieb Richard Elling: > >> On Jan 7, 2015, at 2:28 AM, Stephan Budach > > wrote: >> >> Hello everyone, >> >> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed >> really bad NFS read performance, when rsize goes beyond 128k, whereas >> the performance is just fine at 32k. The issue is, that the >> ovs-agent, which is performing the actual mount, doesn't accept or >> pass any NFS mount options to the NFS server. > > The other issue is that illumos/Solaris on x86 tuning of server-side > size settings does > not work because the compiler optimizes away the tunables. There is a > trivial fix, but it > requires a rebuild. > >> To give some numbers, a rsize of 1mb results in a read throughput of >> approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a >> NFS export from a OEL 6u4 box has no issues with this, as the read >> speeds from this export are 108+MB/s regardles of the rsize of the >> NFS mount. > > Brendan wrote about a similar issue in the Dtrace book as a case > study. See chapter 5 > case study on ZFS 8KB mirror reads. > >> >> The OmniOS box is currently connected to a 10GbE port at our core >> 6509, but the NFS client is connected through a 1GbE port only. MTU >> is at 1500 and can currently not be upped. >> Anyone having a tip, why a rsize of 64k+ will result in such a >> performance drop? > > It is entirely due to optimizations for small I/O going way back to > the 1980s. > -- richard But, doesn't that mean, that Oracle Solaris will have the same issue or has Oracle addressed that in recent Solaris versions? Not, that I am intending to switch over, but that would be something I'd like to give my SR engineer to chew on? In any way, the first bummer is, that Oracle chose to not have it's ovs-agent be capable of accepting and passing the NFS mount options? Cheers, budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Wed Jan 7 20:48:17 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Wed, 7 Jan 2015 12:48:17 -0800 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: <54AD92F2.5080209@jvm.de> References: <54AD0A38.1060801@jvm.de> <54AD92F2.5080209@jvm.de> Message-ID: <3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com> > On Jan 7, 2015, at 12:11 PM, Stephan Budach wrote: > > Am 07.01.15 um 18:00 schrieb Richard Elling: >> >>> On Jan 7, 2015, at 2:28 AM, Stephan Budach > wrote: >>> >>> Hello everyone, >>> >>> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really bad NFS read performance, when rsize goes beyond 128k, whereas the performance is just fine at 32k. 
The issue is, that the ovs-agent, which is performing the actual mount, doesn't accept or pass any NFS mount options to the NFS server. >> >> The other issue is that illumos/Solaris on x86 tuning of server-side size settings does >> not work because the compiler optimizes away the tunables. There is a trivial fix, but it >> requires a rebuild. >> >>> To give some numbers, a rsize of 1mb results in a read throughput of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS export from a OEL 6u4 box has no issues with this, as the read speeds from this export are 108+MB/s regardles of the rsize of the NFS mount. >> >> Brendan wrote about a similar issue in the Dtrace book as a case study. See chapter 5 >> case study on ZFS 8KB mirror reads. >> >>> >>> The OmniOS box is currently connected to a 10GbE port at our core 6509, but the NFS client is connected through a 1GbE port only. MTU is at 1500 and can currently not be upped. >>> Anyone having a tip, why a rsize of 64k+ will result in such a performance drop? >> >> It is entirely due to optimizations for small I/O going way back to the 1980s. >> -- richard > But, doesn't that mean, that Oracle Solaris will have the same issue or has Oracle addressed that in recent Solaris versions? Not, that I am intending to switch over, but that would be something I'd like to give my SR engineer to chew on? Look for yourself :-) In "broken" systems, such as this Solaris 11.1 system: # echo nfs3_tsize::dis | mdb -k nfs3_tsize: pushq %rbp nfs3_tsize+1: movq %rsp,%rbp nfs3_tsize+4: subq $0x8,%rsp nfs3_tsize+8: movq %rdi,-0x8(%rbp) nfs3_tsize+0xc: movl (%rdi),%eax nfs3_tsize+0xe: leal -0x2(%rax),%ecx nfs3_tsize+0x11: cmpl $0x1,%ecx nfs3_tsize+0x14: jbe +0x12 nfs3_tsize+0x16: cmpl $0x5,%eax nfs3_tsize+0x19: movl $0x100000,%eax nfs3_tsize+0x1e: movl $0x8000,%ecx nfs3_tsize+0x23: cmovl.ne %ecx,%eax nfs3_tsize+0x26: jmp +0x5 nfs3_tsize+0x28: movl $0x100000,%eax nfs3_tsize+0x2d: leave nfs3_tsize+0x2e: ret at +0x19 you'll notice hardwired 1MB by contrast, on a proper system # echo nfs3_tsize::dis | mdb -k nfs3_tsize: pushq %rbp nfs3_tsize+1: movq %rsp,%rbp nfs3_tsize+4: subq $0x10,%rsp nfs3_tsize+8: movq %rdi,-0x8(%rbp) nfs3_tsize+0xc: movl (%rdi),%edx nfs3_tsize+0xe: leal -0x2(%rdx),%eax nfs3_tsize+0x11: cmpl $0x1,%eax nfs3_tsize+0x14: jbe +0x12 nfs3_tsize+0x16: movl -0x37f8ea60(%rip),%eax nfs3_tsize+0x1c: cmpl $0x5,%edx nfs3_tsize+0x1f: cmovl.ne -0x37f8ea72(%rip),%eax nfs3_tsize+0x26: leave nfs3_tsize+0x27: ret nfs3_tsize+0x28: movl -0x37f8ea76(%rip),%eax nfs3_tsize+0x2e: leave nfs3_tsize+0x2f: ret where you can actually tune it according to the Solaris Tunable Parameters guide. NB, we fixed this years ago at Nexenta and I'm certain it has not been upstreamed. There are a number of other related fixes, all of the same nature. If someone is inclined to upstream contact me directly. Once, fixed, you'll be able to change the server's settings for negotiating the rsize/wsize with the clients. Many NAS vendors use smaller limits, and IMHO it is a good idea anyway. For example, see http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html -- richard > > In any way, the first bummer is, that Oracle chose to not have it's ovs-agent be capable of accepting and passing the NFS mount options? > > Cheers, > budy -------------- next part -------------- An HTML attachment was scrubbed... 
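(For reference, the server-side knobs being described here are the documented nfs-module tunables that nfs3_tsize() reads; on a build where the compiler has hard-wired the values as shown in the disassembly above, setting them has no effect. A minimal sketch for a fixed/rebuilt server, with 32k purely as an illustrative value:

   # echo 'set nfs:nfs3_max_transfer_size=32768' >> /etc/system    # takes effect at next boot
   # echo 'nfs3_max_transfer_size/W 0t32768' | mdb -kw             # or poke the live kernel for a quick test

Either way, re-check with 'echo nfs3_tsize::dis | mdb -k' and re-measure; client and server still negotiate, so this only caps the rsize/wsize a client can get, it does not raise what the client asks for.)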
URL: From lotheac at iki.fi Wed Jan 7 21:19:50 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Wed, 7 Jan 2015 23:19:50 +0200 Subject: [OmniOS-discuss] pkgdepend reports unresolved dependencies In-Reply-To: <7F0AA6D1-C3E3-490A-AF60-430148D0A0ED@vantagetitle.com> References: <7F0AA6D1-C3E3-490A-AF60-430148D0A0ED@vantagetitle.com> Message-ID: <20150107211950.GA12369@gutsman.lotheac.fi> On Wed, Jan 07 2015 12:52:14 -0500, Alex McWhirter wrote: > /tmp/build_locadmin/triadic_service_pigeonhole.p5m.int.3 has unresolved dependency ' > depend type=require fmri=__TBD pkg.debug.depend.file=libdovecot.so.0 \ > pkg.debug.depend.reason=opt/triadic/lib/amd64/dovecot/libdovecot-sieve.so.0.0.0 \ > pkg.debug.depend.type=elf \ > pkg.debug.depend.path=lib/64 \ > pkg.debug.depend.path=opt/triadic/lib/amd64 \ > pkg.debug.depend.path=usr/lib/64'. Okay, so the pkg.debug.depend.file (libdovecot.so.0) cannot be found by pkgdepend. pkg.debug.depend.reason points to what needs that file (because the latter links to the former). You were able to build it successfully, so obviously you linked to the correct library, but pkgdepend is telling you that the runtime linker can't find libdovecot.0.0.0 (this would probably bite you at runtime too). I solved exactly this problem by building pigeonhole with rpath appended so that libdovecot.so.0 could be found: https://github.com/niksula/omnios-build-scripts/blob/master/pigeonhole/build.sh#L43 -- Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet From stephan.budach at JVM.DE Wed Jan 7 21:21:03 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 7 Jan 2015 22:21:03 +0100 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: <3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com> References: <54AD0A38.1060801@jvm.de> <54AD92F2.5080209@jvm.de> <3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com> Message-ID: <54ADA33F.2040008@jvm.de> Am 07.01.15 um 21:48 schrieb Richard Elling: > >> On Jan 7, 2015, at 12:11 PM, Stephan Budach > > wrote: >> >> Am 07.01.15 um 18:00 schrieb Richard Elling: >>> >>>> On Jan 7, 2015, at 2:28 AM, Stephan Budach >>> > wrote: >>>> >>>> Hello everyone, >>>> >>>> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed >>>> really bad NFS read performance, when rsize goes beyond 128k, >>>> whereas the performance is just fine at 32k. The issue is, that the >>>> ovs-agent, which is performing the actual mount, doesn't accept or >>>> pass any NFS mount options to the NFS server. >>> >>> The other issue is that illumos/Solaris on x86 tuning of server-side >>> size settings does >>> not work because the compiler optimizes away the tunables. There is >>> a trivial fix, but it >>> requires a rebuild. >>> >>>> To give some numbers, a rsize of 1mb results in a read throughput >>>> of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting >>>> a NFS export from a OEL 6u4 box has no issues with this, as the >>>> read speeds from this export are 108+MB/s regardles of the rsize of >>>> the NFS mount. >>> >>> Brendan wrote about a similar issue in the Dtrace book as a case >>> study. See chapter 5 >>> case study on ZFS 8KB mirror reads. >>> >>>> >>>> The OmniOS box is currently connected to a 10GbE port at our core >>>> 6509, but the NFS client is connected through a 1GbE port only. MTU >>>> is at 1500 and can currently not be upped. >>>> Anyone having a tip, why a rsize of 64k+ will result in such a >>>> performance drop? >>> >>> It is entirely due to optimizations for small I/O going way back to >>> the 1980s. 
>>> -- richard >> But, doesn't that mean, that Oracle Solaris will have the same issue >> or has Oracle addressed that in recent Solaris versions? Not, that I >> am intending to switch over, but that would be something I'd like to >> give my SR engineer to chew on? > > Look for yourself :-) > In "broken" systems, such as this Solaris 11.1 system: > # echo nfs3_tsize::dis | mdb -k > nfs3_tsize: pushq %rbp > nfs3_tsize+1: movq %rsp,%rbp > nfs3_tsize+4: subq $0x8,%rsp > nfs3_tsize+8: movq %rdi,-0x8(%rbp) > nfs3_tsize+0xc: movl (%rdi),%eax > nfs3_tsize+0xe: leal -0x2(%rax),%ecx > nfs3_tsize+0x11: cmpl $0x1,%ecx > nfs3_tsize+0x14: jbe +0x12 > nfs3_tsize+0x16: cmpl $0x5,%eax > nfs3_tsize+0x19: movl $0x100000,%eax > nfs3_tsize+0x1e: movl $0x8000,%ecx > nfs3_tsize+0x23: cmovl.ne %ecx,%eax > nfs3_tsize+0x26: jmp +0x5 > nfs3_tsize+0x28: movl $0x100000,%eax > nfs3_tsize+0x2d: leave > nfs3_tsize+0x2e: ret > > at +0x19 you'll notice hardwired 1MB Ouch! Is that from a NFS client or server? Or rather, I know that the NFS server negotiates the options with the client and if no options are passed from the client to the server, the server sets up the connection with it's defaults. So, this S11.1 output - is that from the NFS server? If yes, it would mean that the NFS server would go with the 1mb rsize/wsize since the OracleVM Server has not provided any options to it. > > by contrast, on a proper system > # echo nfs3_tsize::dis | mdb -k > nfs3_tsize: pushq %rbp > nfs3_tsize+1: movq %rsp,%rbp > nfs3_tsize+4: subq $0x10,%rsp > nfs3_tsize+8: movq %rdi,-0x8(%rbp) > nfs3_tsize+0xc: movl (%rdi),%edx > nfs3_tsize+0xe: leal -0x2(%rdx),%eax > nfs3_tsize+0x11: cmpl $0x1,%eax > nfs3_tsize+0x14: jbe +0x12 > nfs3_tsize+0x16: > movl -0x37f8ea60(%rip),%eax > nfs3_tsize+0x1c: cmpl $0x5,%edx > nfs3_tsize+0x1f: > cmovl.ne -0x37f8ea72(%rip),%eax > nfs3_tsize+0x26: leave > nfs3_tsize+0x27: ret > nfs3_tsize+0x28: > movl -0x37f8ea76(%rip),%eax > nfs3_tsize+0x2e: leave > nfs3_tsize+0x2f: ret > > where you can actually tune it according to the Solaris Tunable > Parameters guide. > > NB, we fixed this years ago at Nexenta and I'm certain it has not been > upstreamed. There are > a number of other related fixes, all of the same nature. If someone is > inclined to upstream > contact me directly. > > Once, fixed, you'll be able to change the server's settings for > negotiating the rsize/wsize with > the clients. Many NAS vendors use smaller limits, and IMHO it is a > good idea anyway. For > example, see > http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html > -- richard > I am mostly satisfied with a transfer size of 32k and as this NFS is used as storage repository for the vdisk images and approx 80 guests are accessing those, so the i/o is random anyway. So smaller I/Os are preferred anyway. However, the NFS export from the OEL box just doesn't have this massive performance hit, even with a rsize/wsize of 1mb. > >> >> In any way, the first bummer is, that Oracle chose to not have it's >> ovs-agent be capable of accepting and passing the NFS mount options? >> >> Cheers, >> budy > Thanks, budy -------------- next part -------------- An HTML attachment was scrubbed... 
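(Where the client's mount can be controlled at all -- unlike the ovs-agent case above -- the transfer size can simply be pinned from the client side. A hedged sketch; the server name, export path and mount point are placeholders:

   # on an OEL/Linux client:
   mount -t nfs -o vers=3,rsize=32768,wsize=32768 omnios-server:/export/repo /mnt/repo
   # on an illumos/Solaris client:
   mount -F nfs -o vers=3,rsize=32768,wsize=32768 omnios-server:/export/repo /mnt/repo

Running 'nfsstat -m' on the client afterwards shows the rsize/wsize that were actually negotiated.)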
URL: From lotheac at iki.fi Wed Jan 7 21:30:35 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Wed, 7 Jan 2015 23:30:35 +0200 Subject: [OmniOS-discuss] pkgdepend reports unresolved dependencies In-Reply-To: <20150107211950.GA12369@gutsman.lotheac.fi> References: <7F0AA6D1-C3E3-490A-AF60-430148D0A0ED@vantagetitle.com> <20150107211950.GA12369@gutsman.lotheac.fi> Message-ID: <20150107213035.GB12369@gutsman.lotheac.fi> On Wed, Jan 07 2015 23:19:50 +0200, Lauri Tirkkonen wrote: > pkgdepend is telling you that the runtime linker can't find > libdovecot.0.0.0 And that's a typo - I meant libdovecot.so.0. -- Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet From richard.elling at richardelling.com Wed Jan 7 23:01:28 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Wed, 7 Jan 2015 15:01:28 -0800 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: <54ADA33F.2040008@jvm.de> References: <54AD0A38.1060801@jvm.de> <54AD92F2.5080209@jvm.de> <3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com> <54ADA33F.2040008@jvm.de> Message-ID: <04BE2451-2C23-461D-8B02-E67CCF8A6C20@richardelling.com> > On Jan 7, 2015, at 1:21 PM, Stephan Budach wrote: > > Am 07.01.15 um 21:48 schrieb Richard Elling: >> >>> On Jan 7, 2015, at 12:11 PM, Stephan Budach > wrote: >>> >>> Am 07.01.15 um 18:00 schrieb Richard Elling: >>>> >>>>> On Jan 7, 2015, at 2:28 AM, Stephan Budach > wrote: >>>>> >>>>> Hello everyone, >>>>> >>>>> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed really bad NFS read performance, when rsize goes beyond 128k, whereas the performance is just fine at 32k. The issue is, that the ovs-agent, which is performing the actual mount, doesn't accept or pass any NFS mount options to the NFS server. >>>> >>>> The other issue is that illumos/Solaris on x86 tuning of server-side size settings does >>>> not work because the compiler optimizes away the tunables. There is a trivial fix, but it >>>> requires a rebuild. >>>> >>>>> To give some numbers, a rsize of 1mb results in a read throughput of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. Mounting a NFS export from a OEL 6u4 box has no issues with this, as the read speeds from this export are 108+MB/s regardles of the rsize of the NFS mount. >>>> >>>> Brendan wrote about a similar issue in the Dtrace book as a case study. See chapter 5 >>>> case study on ZFS 8KB mirror reads. >>>> >>>>> >>>>> The OmniOS box is currently connected to a 10GbE port at our core 6509, but the NFS client is connected through a 1GbE port only. MTU is at 1500 and can currently not be upped. >>>>> Anyone having a tip, why a rsize of 64k+ will result in such a performance drop? >>>> >>>> It is entirely due to optimizations for small I/O going way back to the 1980s. >>>> -- richard >>> But, doesn't that mean, that Oracle Solaris will have the same issue or has Oracle addressed that in recent Solaris versions? Not, that I am intending to switch over, but that would be something I'd like to give my SR engineer to chew on? 
>> >> Look for yourself :-) >> In "broken" systems, such as this Solaris 11.1 system: >> # echo nfs3_tsize::dis | mdb -k >> nfs3_tsize: pushq %rbp >> nfs3_tsize+1: movq %rsp,%rbp >> nfs3_tsize+4: subq $0x8,%rsp >> nfs3_tsize+8: movq %rdi,-0x8(%rbp) >> nfs3_tsize+0xc: movl (%rdi),%eax >> nfs3_tsize+0xe: leal -0x2(%rax),%ecx >> nfs3_tsize+0x11: cmpl $0x1,%ecx >> nfs3_tsize+0x14: jbe +0x12 >> nfs3_tsize+0x16: cmpl $0x5,%eax >> nfs3_tsize+0x19: movl $0x100000,%eax >> nfs3_tsize+0x1e: movl $0x8000,%ecx >> nfs3_tsize+0x23: cmovl.ne %ecx,%eax >> nfs3_tsize+0x26: jmp +0x5 >> nfs3_tsize+0x28: movl $0x100000,%eax >> nfs3_tsize+0x2d: leave >> nfs3_tsize+0x2e: ret >> >> at +0x19 you'll notice hardwired 1MB > Ouch! Is that from a NFS client or server? server > Or rather, I know that the NFS server negotiates the options with the client and if no options are passed from the client to the server, the server sets up the connection with it's defaults. the server and client negotiate, so both can have defaults > So, this S11.1 output - is that from the NFS server? If yes, it would mean that the NFS server would go with the 1mb rsize/wsize since the OracleVM Server has not provided any options to it. You are not mistaken. AFAIK, this has been broken in Solaris x86 for more than 10 years. Fortunately, most people can adjust on the client side, unless you're running ESX or something that is difficult to adjust... like you seem to be. >> >> by contrast, on a proper system >> # echo nfs3_tsize::dis | mdb -k >> nfs3_tsize: pushq %rbp >> nfs3_tsize+1: movq %rsp,%rbp >> nfs3_tsize+4: subq $0x10,%rsp >> nfs3_tsize+8: movq %rdi,-0x8(%rbp) >> nfs3_tsize+0xc: movl (%rdi),%edx >> nfs3_tsize+0xe: leal -0x2(%rdx),%eax >> nfs3_tsize+0x11: cmpl $0x1,%eax >> nfs3_tsize+0x14: jbe +0x12 >> nfs3_tsize+0x16: >> movl -0x37f8ea60(%rip),%eax >> nfs3_tsize+0x1c: cmpl $0x5,%edx >> nfs3_tsize+0x1f: >> cmovl.ne -0x37f8ea72(%rip),%eax >> nfs3_tsize+0x26: leave >> nfs3_tsize+0x27: ret >> nfs3_tsize+0x28: >> movl -0x37f8ea76(%rip),%eax >> nfs3_tsize+0x2e: leave >> nfs3_tsize+0x2f: ret >> >> where you can actually tune it according to the Solaris Tunable Parameters guide. >> >> NB, we fixed this years ago at Nexenta and I'm certain it has not been upstreamed. There are >> a number of other related fixes, all of the same nature. If someone is inclined to upstream >> contact me directly. >> >> Once, fixed, you'll be able to change the server's settings for negotiating the rsize/wsize with >> the clients. Many NAS vendors use smaller limits, and IMHO it is a good idea anyway. For >> example, see http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html >> -- richard >> > I am mostly satisfied with a transfer size of 32k and as this NFS is used as storage repository for the vdisk images and approx 80 guests are accessing those, so the i/o is random anyway. So smaller I/Os are preferred anyway. However, the NFS export from the OEL box just doesn't have this massive performance hit, even with a rsize/wsize of 1mb. Yes, this is not the only issue you're facing. Even with modest hardware and OOB settings, it is easy to soak 1GbE. For ZFS backends, we use 128k as the max rsize/wsize, since that is a practical upper limit (even though you can have larger block sizes in ZFS). 
Here are the OOB tcp parameters we use tcp max_buf rw 16777216 16777216 1048576 8192-1073741824 tcp recv_buf rw 1250000 1250000 1048576 2048-16777216 tcp sack rw active -- active never,passive, active tcp send_buf rw 1250000 1250000 128000 4096-16777216 no real magic here, but if you measure your network closely and it doesn't change much, then you can pre-set the values from your BDP. And, of course, following the USE methodology, check for errors... I can't count the number of times bad transceivers, cabling, or switch settings tripped people up. -- richard >> >>> >>> In any way, the first bummer is, that Oracle chose to not have it's ovs-agent be capable of accepting and passing the NFS mount options? >>> >>> Cheers, >>> budy >> > Thanks, > budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Thu Jan 8 10:25:57 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Thu, 8 Jan 2015 11:25:57 +0100 Subject: [OmniOS-discuss] Slow NFS speeds at rsize > 128k In-Reply-To: <04BE2451-2C23-461D-8B02-E67CCF8A6C20@richardelling.com> References: <54AD0A38.1060801@jvm.de> <54AD92F2.5080209@jvm.de> <3010EE58-59DE-408D-8BFA-28571F9B1A2B@richardelling.com> <54ADA33F.2040008@jvm.de> <04BE2451-2C23-461D-8B02-E67CCF8A6C20@richardelling.com> Message-ID: <54AE5B35.70009@jvm.de> Am 08.01.15 um 00:01 schrieb Richard Elling: > >> On Jan 7, 2015, at 1:21 PM, Stephan Budach > > wrote: >> >> Am 07.01.15 um 21:48 schrieb Richard Elling: >>> >>>> On Jan 7, 2015, at 12:11 PM, Stephan Budach >>> > wrote: >>>> >>>> Am 07.01.15 um 18:00 schrieb Richard Elling: >>>>> >>>>>> On Jan 7, 2015, at 2:28 AM, Stephan Budach >>>>> > wrote: >>>>>> >>>>>> Hello everyone, >>>>>> >>>>>> I am sharing my zfs via NFS to a couple of OVM nodes. I noticed >>>>>> really bad NFS read performance, when rsize goes beyond 128k, >>>>>> whereas the performance is just fine at 32k. The issue is, that >>>>>> the ovs-agent, which is performing the actual mount, doesn't >>>>>> accept or pass any NFS mount options to the NFS server. >>>>> >>>>> The other issue is that illumos/Solaris on x86 tuning of >>>>> server-side size settings does >>>>> not work because the compiler optimizes away the tunables. There >>>>> is a trivial fix, but it >>>>> requires a rebuild. >>>>> >>>>>> To give some numbers, a rsize of 1mb results in a read throughput >>>>>> of approx. 2Mb/s, whereas a rsize of 32k gives me 110Mb/s. >>>>>> Mounting a NFS export from a OEL 6u4 box has no issues with this, >>>>>> as the read speeds from this export are 108+MB/s regardles of the >>>>>> rsize of the NFS mount. >>>>> >>>>> Brendan wrote about a similar issue in the Dtrace book as a case >>>>> study. See chapter 5 >>>>> case study on ZFS 8KB mirror reads. >>>>> >>>>>> >>>>>> The OmniOS box is currently connected to a 10GbE port at our core >>>>>> 6509, but the NFS client is connected through a 1GbE port only. >>>>>> MTU is at 1500 and can currently not be upped. >>>>>> Anyone having a tip, why a rsize of 64k+ will result in such a >>>>>> performance drop? >>>>> >>>>> It is entirely due to optimizations for small I/O going way back >>>>> to the 1980s. >>>>> -- richard >>>> But, doesn't that mean, that Oracle Solaris will have the same >>>> issue or has Oracle addressed that in recent Solaris versions? Not, >>>> that I am intending to switch over, but that would be something I'd >>>> like to give my SR engineer to chew on? 
>>> >>> Look for yourself :-) >>> In "broken" systems, such as this Solaris 11.1 system: >>> # echo nfs3_tsize::dis | mdb -k >>> nfs3_tsize: pushq %rbp >>> nfs3_tsize+1: movq %rsp,%rbp >>> nfs3_tsize+4: subq $0x8,%rsp >>> nfs3_tsize+8: movq %rdi,-0x8(%rbp) >>> nfs3_tsize+0xc: movl (%rdi),%eax >>> nfs3_tsize+0xe: leal -0x2(%rax),%ecx >>> nfs3_tsize+0x11: cmpl $0x1,%ecx >>> nfs3_tsize+0x14: jbe +0x12 >>> nfs3_tsize+0x16: cmpl $0x5,%eax >>> nfs3_tsize+0x19: movl $0x100000,%eax >>> nfs3_tsize+0x1e: movl $0x8000,%ecx >>> nfs3_tsize+0x23: cmovl.ne %ecx,%eax >>> nfs3_tsize+0x26: jmp +0x5 >>> nfs3_tsize+0x28: movl $0x100000,%eax >>> nfs3_tsize+0x2d: leave >>> nfs3_tsize+0x2e: ret >>> >>> at +0x19 you'll notice hardwired 1MB >> Ouch! Is that from a NFS client or server? > > server > >> Or rather, I know that the NFS server negotiates the options with the >> client and if no options are passed from the client to the server, >> the server sets up the connection with it's defaults. > > the server and client negotiate, so both can have defaults > >> So, this S11.1 output - is that from the NFS server? If yes, it would >> mean that the NFS server would go with the 1mb rsize/wsize since the >> OracleVM Server has not provided any options to it. > > You are not mistaken. AFAIK, this has been broken in Solaris x86 for > more than 10 years. > Fortunately, most people can adjust on the client side, unless you're > running ESX or something > that is difficult to adjust... like you seem to be. Yes, I am - and my current workaround is to remount the NFS shares manually, prior to starting any guests that reside on those shares. This is so dumb from Oracle? I have raised an ER for that, since this is the only way to make sure, this scenario can reliably work in any NFS environment, but that's of course totally off-topic. ;) > >>> >>> by contrast, on a proper system >>> # echo nfs3_tsize::dis | mdb -k >>> nfs3_tsize: pushq %rbp >>> nfs3_tsize+1: movq %rsp,%rbp >>> nfs3_tsize+4: subq $0x10,%rsp >>> nfs3_tsize+8: movq %rdi,-0x8(%rbp) >>> nfs3_tsize+0xc: movl (%rdi),%edx >>> nfs3_tsize+0xe: leal -0x2(%rdx),%eax >>> nfs3_tsize+0x11: cmpl $0x1,%eax >>> nfs3_tsize+0x14: jbe +0x12 >>> nfs3_tsize+0x16: >>> movl -0x37f8ea60(%rip),%eax >>> nfs3_tsize+0x1c: cmpl $0x5,%edx >>> nfs3_tsize+0x1f: >>> cmovl.ne -0x37f8ea72(%rip),%eax >>> nfs3_tsize+0x26: leave >>> nfs3_tsize+0x27: ret >>> nfs3_tsize+0x28: >>> movl -0x37f8ea76(%rip),%eax >>> nfs3_tsize+0x2e: leave >>> nfs3_tsize+0x2f: ret >>> >>> where you can actually tune it according to the Solaris Tunable >>> Parameters guide. >>> >>> NB, we fixed this years ago at Nexenta and I'm certain it has not >>> been upstreamed. There are >>> a number of other related fixes, all of the same nature. If someone >>> is inclined to upstream >>> contact me directly. >>> >>> Once, fixed, you'll be able to change the server's settings for >>> negotiating the rsize/wsize with >>> the clients. Many NAS vendors use smaller limits, and IMHO it is a >>> good idea anyway. For >>> example, see >>> http://blog.richardelling.com/2012/04/latency-and-io-size-cars-vs-trains.html >>> -- richard >>> >> I am mostly satisfied with a transfer size of 32k and as this NFS is >> used as storage repository for the vdisk images and approx 80 guests >> are accessing those, so the i/o is random anyway. So smaller I/Os are >> preferred anyway. However, the NFS export from the OEL box just >> doesn't have this massive performance hit, even with a rsize/wsize of >> 1mb. > > Yes, this is not the only issue you're facing. 
Even with modest > hardware and OOB settings, it is > easy to soak 1GbE. For ZFS backends, we use 128k as the max > rsize/wsize, since that is a > practical upper limit (even though you can have larger block sizes in > ZFS). > > Here are the OOB tcp parameters we use > tcp max_buf rw 16777216 16777216 1048576 > 8192-1073741824 > tcp recv_buf rw 1250000 1250000 1048576 > 2048-16777216 > tcp sack rw active -- active > never,passive, > active > tcp send_buf rw 1250000 1250000 128000 > 4096-16777216 > > no real magic here, but if you measure your network closely and it > doesn't change much, then > you can pre-set the values from your BDP. > > And, of course, following the USE methodology, check for errors... I > can't count the number of > times bad transceivers, cabling, or switch settings tripped people up. > -- richard > Thanks for sharing your insights. Do you think, that the situation will be improved once we finish our network transistion from our (mostly) 1GbE network infrastructure to Nexus gear running 10 GbE? Thanks, budy -- Stephan Budach Managing Director Jung von Matt/it-services GmbH Glash?ttenstra?e 79 20357 Hamburg Tel: +49 40-4321-1353 Fax: +49 40-4321-1114 E-Mail: stephan.budach at jvm.de Internet: http://www.jvm.com Gesch?ftsf?hrer: Stephan Budach AG HH HRB 98380 -------------- next part -------------- An HTML attachment was scrubbed... URL: From svavar at pipar-tbwa.is Thu Jan 8 15:25:53 2015 From: svavar at pipar-tbwa.is (=?iso-8859-1?Q?Svavar_=D6rn_Eysteinsson?=) Date: Thu, 8 Jan 2015 15:25:53 +0000 Subject: [OmniOS-discuss] Controller and or HD recomendations. ZFS storage server - upgrade. Message-ID: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> Hello list. I'm in the need to upgrade/expand my storage OmniOS ZFS storage server. which has the following hardware today : Supermicro AOC USAS L8i - LSI SAS 1068E PCIe controller. (which has I think a 2TB HD limit) 8x Hitachi Hitachi HDS72302 2TB disks (Deskstar 7K3000) 1x datapool which is configured as 2x vdevs of raidz datapool raidz1-0 c3t4d0 c4t1d0 c3t3d0 c4t4d0 raidz1-1 c4t0d0 c4t2d0 c3t1d0 c3t2d0 Total of 14.5TB raw space, 10.5TB usable space. Now, I need more space and clearly I need another controller and or disks. The controller is stated above, only supports 2TB disks and those 2TB Hitachi babies which are awesome have a 512 bytes per sector. So my vdevs have a ashift=9. I have never ever had any problems what so ever with those Hitachi disks in the last 4 years or soe. It was a mistake I think not creating the pool with ashift=12. As I need to create a new pool which consists of 4K drives in a ashift=12 pool correct ? Do people recommend any specific PCI-Express controllers that are preferred with ZFS and supports large disks. 4TB+ any success/bad stories on 4TB+ disks from the manufactures ? Seagate/WD/Toshiba/HGST ... ? Correct me if I'm wrong, is it correct that the optimal number of disks on 4K(sectors) vdev is 128 / (number of disks - parity disks) = should be a flat number ? So in my case I have a 128 / 4 - 1 = 31 or is it 128/3 = 42,66 (do not include the parity drive) in a vdev ? This storage server is mainly used as a archive server and or file services. Just wandering if the optimal number of disks in a vdev should matter in this case. Any suggestions and success stories much appreciated. Thanks allot. 
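(One way to confirm what an existing pool was created with, and what a new one picks up, is to read the cached pool config with zdb; 'newpool' below is only a placeholder for the pool built on the 4K drives:

   # zdb -C datapool | grep ashift    # existing pool on the 512n Hitachis, expect ashift: 9
   # zdb -C newpool | grep ashift     # a pool created on true 4K drives should show ashift: 12

If the new drives report 512e and the pool still comes up with ashift=9, that is where the sd-config-list override mentioned later in this thread comes into play.)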
Best regards, Svavar O Reykjavik - Iceland From mir at miras.org Thu Jan 8 16:09:09 2015 From: mir at miras.org (Michael Rasmussen) Date: Thu, 8 Jan 2015 17:09:09 +0100 Subject: [OmniOS-discuss] Controller and or HD recomendations. ZFS storage server - upgrade. In-Reply-To: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> References: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> Message-ID: <20150108170909.338d40bd@sleipner.datanom.net> On Thu, 8 Jan 2015 15:25:53 +0000 Svavar ?rn Eysteinsson wrote: > > Any suggestions and success stories much appreciated. > LSI SAS 2008 and LSI SAS 2308 should be a sure purchase and comes cheap these days. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: A newspaper is a circulating library with high blood pressure. -- Arthure "Bugs" Baer -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Thu Jan 8 16:14:41 2015 From: mir at miras.org (Michael Rasmussen) Date: Thu, 8 Jan 2015 17:14:41 +0100 Subject: [OmniOS-discuss] Controller and or HD recomendations. ZFS storage server - upgrade. In-Reply-To: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> References: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> Message-ID: <20150108171441.11d70539@sleipner.datanom.net> Forgot this blog post in my previous mail which has a lot of god stuff but a bid dated though: http://blog.zorinaq.com/?e=10 -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: 10.0 times 0.1 is hardly ever 1.0. - The Elements of Programming Style (Kernighan & Plaugher) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From nagele at wildbit.com Thu Jan 8 16:21:54 2015 From: nagele at wildbit.com (Chris Nagele) Date: Thu, 8 Jan 2015 11:21:54 -0500 Subject: [OmniOS-discuss] Controller and or HD recomendations. ZFS storage server - upgrade. In-Reply-To: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> References: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> Message-ID: > Do people recommend any specific PCI-Express controllers that are preferred > with ZFS and supports large disks. 4TB+ In the past we used the LSI 9211-8i, which seems pretty standard for ZFS / Solaris. More recently we started to use the LSI 9207-8i. Both work well. Chris From chip at innovates.com Thu Jan 8 16:48:57 2015 From: chip at innovates.com (Schweiss, Chip) Date: Thu, 8 Jan 2015 10:48:57 -0600 Subject: [OmniOS-discuss] Controller and or HD recomendations. ZFS storage server - upgrade. 
In-Reply-To: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> References: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> Message-ID: On Thu, Jan 8, 2015 at 9:25 AM, Svavar ?rn Eysteinsson wrote: > Do people recommend any specific PCI-Express controllers that are preferred > with ZFS and supports large disks. 4TB+ > > any success/bad stories on 4TB+ disks from the manufactures ? > Seagate/WD/Toshiba/HGST ... ? > As far as I know all enterprise grade 4TB disks are still 512b native sectors so you're still okay at 4TB. At 6TB things start changing. If you're upgrading you can always to a zfs send/receive to a new 4K pool, assuming you have the drive slots to make this happen. I have had good luck with Seagate 4TB Constellation SAS. I have a few hundred of these running every since they were first released. LSI 2008 or 2308 based HBAs are a good move to 6Gb SAS. The support for the 3008 is in the latest OmniOS, but it may still be a bit buggy as it hasn't been widely used yet. -Chip -------------- next part -------------- An HTML attachment was scrubbed... URL: From mir at miras.org Thu Jan 8 16:54:49 2015 From: mir at miras.org (Michael Rasmussen) Date: Thu, 8 Jan 2015 17:54:49 +0100 Subject: [OmniOS-discuss] Controller and or HD recomendations. ZFS storage server - upgrade. In-Reply-To: References: <912C3990-0EAC-47CF-B710-82FE6CB58424@pipar-tbwa.is> Message-ID: <20150108175449.4e03be23@sleipner.datanom.net> On Thu, 8 Jan 2015 11:21:54 -0500 Chris Nagele wrote: > > Do people recommend any specific PCI-Express controllers that are preferred > > with ZFS and supports large disks. 4TB+ > > In the past we used the LSI 9211-8i, which seems pretty standard for > ZFS / Solaris. More recently we started to use the LSI 9207-8i. Both > work well. > Yep LSI 9211-8i = SAS 2008 LSI 9207-8i = SAS 2308 I was referring to the controller and not the product number;-) -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Isn't it nice that people who prefer Los Angeles to San Francisco live there? -- Herb Caen -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From danmcd at omniti.com Thu Jan 8 17:31:40 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 8 Jan 2015 12:31:40 -0500 Subject: [OmniOS-discuss] OpenSSL updated to 1.0.1k ---> PLEASE UPDATE Message-ID: Relevant security advisory: https://www.openssl.org/news/secadv_20150108.txt I've updated 006 (LTS), 010 (last-stable), 012 (stable), and 013 (bloody) to OpenSSL version 1.0.1k. The vulnerabilities detailed above could've been worse, but since the update is available, I figured it was better to have 'em sooner rather than later. Make sure you *restart* any services using openssl after updating the libraries. This update won't require a reboot, but any openssl-using services should get restarted after the update. Happy updating! 
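(A minimal sketch of the update-and-restart sequence; the openssl package name is the usual OmniOS one but may differ per release, and the nginx service is only a stand-in for whatever on the box links against libssl:

   # pkg update
   # pkg list library/security/openssl         # confirm 1.0.1k landed
   # pmap $(pgrep nginx) | grep libssl         # processes still mapping the old library need a restart
   # svcadm restart svc:/network/http:nginx    # restart each affected service

This mirrors the advice above: update the libraries, then bounce anything that still has the old libssl mapped.)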
Dan From tobi at oetiker.ch Fri Jan 9 08:29:03 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Fri, 9 Jan 2015 09:29:03 +0100 (CET) Subject: [OmniOS-discuss] HGST HUS724030ALS640 vs WD WD4001FYYG Message-ID: We are looking into buying some new 4TB drives for our omnios ZFS boxes. The race is between these two devices ... both sas, both 512n both 4T. HGST HUS724030ALS640 vs WD WD4001FYYG I have heard anectotal evidence that the WD Re drives were much more preformant than the HGST offering. 20 MB/s (HGST) vs 140 MB/s (WD Re) scrub performance in a Mirror setup We are using the HGST devices at the moment and they did not feel especially nimble ... can anyone share their experiance ? cheers tobi -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From sim.ple at live.nl Fri Jan 9 21:33:37 2015 From: sim.ple at live.nl (Randy S) Date: Fri, 9 Jan 2015 22:33:37 +0100 Subject: [OmniOS-discuss] assertion failed for thread / omnios r12 Message-ID: Hi all, Maybe this has been covered already (I saw a bug about this so I thought this occurence should not be present in omnios r12) but when I do a zdb -d rpool after having upgraded the rpool to the latest version, I get a : assertion failed for thread 0xfffffd7fff162a40, thread-id 1: spa_writeable(vd->vdev_spa), file ../../../uts/common/fs/zfs/vdev.c, line 1566 What can have caused this. zpool upgrade rpool This system supports ZFS pool feature flags. Enabled the following features on 'rpool': lz4_compress multi_vdev_crash_dump spacemap_histogram enabled_txg hole_birth extensible_dataset embedded_data bookmarks filesystem_limits Is there a way I can disable this spacemap feature after having done the upgrade? It seems that Bug #5165 (https://www.illumos.org/issues/5165) is still in there. Regards, R -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Fri Jan 9 21:48:40 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 9 Jan 2015 16:48:40 -0500 Subject: [OmniOS-discuss] assertion failed for thread / omnios r12 In-Reply-To: References: Message-ID: > On Jan 9, 2015, at 4:33 PM, Randy S wrote: > > > Is there a way I can disable this spacemap feature after having done the upgrade? Not really. > It seems that Bug #5165 (https://www.illumos.org/issues/5165) is still in there. We should probably backport that to r151012. Are you okay otherwise? If you're not booting because of this? Or just seeing zdb assertion failures? Dan From sim.ple at live.nl Fri Jan 9 21:58:52 2015 From: sim.ple at live.nl (Randy S) Date: Fri, 9 Jan 2015 22:58:52 +0100 Subject: [OmniOS-discuss] assertion failed for thread / omnios r12 In-Reply-To: References: , Message-ID: Hi Dan, I haven't noticed any effects otherwise till now. System can be booted with no problem. It seems only assertion failures causing the zdb command not to function. Randy > Subject: Re: [OmniOS-discuss] assertion failed for thread / omnios r12 > From: danmcd at omniti.com > Date: Fri, 9 Jan 2015 16:48:40 -0500 > CC: omnios-discuss at lists.omniti.com > To: sim.ple at live.nl > > > > On Jan 9, 2015, at 4:33 PM, Randy S wrote: > > > > > > Is there a way I can disable this spacemap feature after having done the upgrade? > > Not really. > > > It seems that Bug #5165 (https://www.illumos.org/issues/5165) is still in there. > > We should probably backport that to r151012. Are you okay otherwise? If you're not booting because of this? Or just seeing zdb assertion failures? 
> > Dan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Sat Jan 10 02:02:27 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Fri, 9 Jan 2015 18:02:27 -0800 Subject: [OmniOS-discuss] assertion failed for thread / omnios r12 In-Reply-To: References: Message-ID: > On Jan 9, 2015, at 1:33 PM, Randy S wrote: > > Hi all, > > Maybe this has been covered already (I saw a bug about this so I thought this occurence should not be present in omnios r12) but when I do a zdb -d rpool after having upgraded the rpool to the latest version, I get a : > assertion failed for thread 0xfffffd7fff162a40, thread-id 1: spa_writeable(vd->vdev_spa), file ../../../uts/common/fs/zfs/vdev.c, line 1566 > > What can have caused this. Its a bug, zdb doesn't open the pool for writing, so it can't be writable. > > zpool upgrade rpool > This system supports ZFS pool feature flags. > > Enabled the following features on 'rpool': > lz4_compress > multi_vdev_crash_dump > spacemap_histogram > enabled_txg > hole_birth > extensible_dataset > embedded_data > bookmarks > filesystem_limits > > Is there a way I can disable this spacemap feature after having done the upgrade? > It seems that Bug #5165 (https://www.illumos.org/issues/5165) is still in there. yep zdb is intended for debugging and isn't guaranteed to run successfully on imported pools. There is likely some other way to get the info your looking for... so what are you looking for? -- richard > > Regards, > > R > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexmcwhirter at vantagetitle.com Sat Jan 10 04:34:25 2015 From: alexmcwhirter at vantagetitle.com (Alex McWhirter) Date: Fri, 9 Jan 2015 23:34:25 -0500 Subject: [OmniOS-discuss] Postfix adding con files to sbin Message-ID: <15D58A68-4424-456D-BCD2-D2C0CA621444@vantagetitle.com> I?m compiling postfix 2.11.3 using the template build scripts with a few modifications. Postfix doesn?t use a configure script, but rather uses makefiles to compile. I?ve worked most of the kinks out, but for some reason it is adding configuration files to etc and sbin, and i can?t figure out why. It also seems that the binary files are built 32 bit, even with the -m64 flags. I tested the binary files using the file command. Here is the isaexec log. Making isaexec stub binaries --- bin ------ junk ------ postalias ------ postcat ------ postconf ------ postdrop ------ postfix ------ postkick ------ postlock ------ postlog ------ postmap ------ postmulti ------ postqueue ------ postsuper --- sbin ------ anvil ------ bounce ------ cleanup ------ discard ------ dnsblog ------ error ------ flush ------ lmtp ------ local ------ main.cf ------ master ------ master.cf ------ nqmgr ------ oqmgr ------ pickup ------ pipe ------ post-install ------ postfix-files ------ postfix-script ------ postfix-wrapper ------ postmulti-script ------ postscreen ------ proxymap ------ qmgr ------ qmqpd ------ scache ------ showq ------ smtp ------ smtpd ------ spawn ------ tlsmgr ------ tlsproxy ------ trivial-rewrite ------ verify ------ virtual You can see main.cf and master.cf in the sbin directory. Here is a copy of my build script. PROG=postfix # Load support functions . 
../../lib/functions.sh VER=2.11.3 VERHUMAN=$VER PKG=triadic/service/smtp/postfix SUMMARY="TODO" DESC="TODO" ARCH="" configure32() { ARCH=$ISAPART logmsg "--- configure (make makefiles)" logcmd $MAKE makefiles CCARGS='-DNO_NIS \ -DDEF_COMMAND_DIR=\"/opt/triadic/sbin/i386\" \ -DDEF_CONFIG_DIR=\"/opt/triadic/etc/postfix\" \ -DDEF_DAEMON_DIR=\"/opt/triadic/usr/libexec/i386/postfix\" \ -DDEF_DATA_DIR=\"/opt/triadic/var/postfix\" \ -DDEF_MAILQ_PATH=\"/opt/triadic/bin/i386/mailq\" \ -DDEF_MANPAGE_DIR=\"/opt/triadic/usr/share/i386/man\" \ -DDEF_NEWALIAS_PATH=\"/opt/triadic/bin/i386/newaliases\" \ -DDEF_QUEUE_DIR=\"/opt/triadic/var/spool/postfix\" \ -DDEF_SENDMAIL_PATH=\"/opt/triadic/sbin/i386/sendmail\" \ ' || logerr "Failed make makefiles command" } configure64() { ARCH=$ISAPART64 logmsg "--- configure (make makefiles)" logcmd $MAKE makefiles CCARGS='-DNO_NIS \ -DDEF_COMMAND_DIR=\"/opt/triadic/sbin/amd64\" \ -DDEF_CONFIG_DIR=\"/opt/triadic/etc/postfix\" \ -DDEF_DAEMON_DIR=\"/opt/triadic/usr/libexec/amd64/postfix\" \ -DDEF_DATA_DIR=\"/opt/triadic/var/postfix\" \ -DDEF_MAILQ_PATH=\"/opt/triadic/bin/amd64/mailq\" \ -DDEF_MANPAGE_DIR=\"/opt/triadic/usr/share/amd64/man\" \ -DDEF_NEWALIAS_PATH=\"/opt/triadic/bin/amd64/newaliases\" \ -DDEF_QUEUE_DIR=\"/opt/triadic/var/spool/postfix\" \ -DDEF_SENDMAIL_PATH=\"/opt/triadic/sbin/amd64/sendmail\" \ ' || logerr "Failed make makefiles command" } make_clean() { logmsg "--- make clean" logcmd $MAKE distclean || \ logcmd $MAKE clean logmsg "--- *** WARNING *** make (dist)clean Failed" } make_install() { logmsg "--- make install" logcmd /bin/sh postfix-install -non-interactive \ install_root=$DESTDIR \ config_directory=$PREFIX/etc/postfix \ data_directory=$PREFIX/var/postfix \ daemon_directory=$PREFIX/sbin/$ARCH \ command_directory=$PREFIX/bin/$ARCH \ queue_directory=$PREFIX/var/spool/postfix \ sendmail_path=$PREFIX/bin/$ARCH \ newaliases_path=$PREFIX/bin/$ARCH \ mailq_path=$PREFIX/bin/$ARCH \ manpage_directory=$PREFIX/usr/share/$ARCH/man \ readme_directory=\"no\" \ html_directory=\"no\" \ || logerr "make install failed" } init download_source $PROG $PROG $VER patch_source prep_build build make_isa_stub #make_package #clean_up # Vim hints # vim:ts=4:sw=4:et: As far as i can tell i have all configuration options defined, so I?m not really sure why I?m getting these results. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sim.ple at live.nl Sat Jan 10 09:32:48 2015 From: sim.ple at live.nl (Randy S) Date: Sat, 10 Jan 2015 10:32:48 +0100 Subject: [OmniOS-discuss] assertion failed for thread / omnios r12 In-Reply-To: References: , Message-ID: Hi Richard, I was checking for the presence of hidden clones, if any. Randy Subject: Re: [OmniOS-discuss] assertion failed for thread / omnios r12 From: richard.elling at richardelling.com Date: Fri, 9 Jan 2015 18:02:27 -0800 CC: omnios-discuss at lists.omniti.com To: sim.ple at live.nl On Jan 9, 2015, at 1:33 PM, Randy S wrote: Hi all, Maybe this has been covered already (I saw a bug about this so I thought this occurence should not be present in omnios r12) but when I do a zdb -d rpool after having upgraded the rpool to the latest version, I get a : assertion failed for thread 0xfffffd7fff162a40, thread-id 1: spa_writeable(vd->vdev_spa), file ../../../uts/common/fs/zfs/vdev.c, line 1566 What can have caused this. Its a bug, zdb doesn't open the pool for writing, so it can't be writable. zpool upgrade rpool This system supports ZFS pool feature flags. 
Enabled the following features on 'rpool': lz4_compress multi_vdev_crash_dump spacemap_histogram enabled_txg hole_birth extensible_dataset embedded_data bookmarks filesystem_limits Is there a way I can disable this spacemap feature after having done the upgrade? It seems that Bug #5165 (https://www.illumos.org/issues/5165) is still in there. yep zdb is intended for debugging and isn't guaranteed to run successfully on importedpools. There is likely some other way to get the info your looking for... so what are you looking for? -- richard Regards, R _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.r.eremin at gmail.com Sat Jan 10 18:46:36 2015 From: alexander.r.eremin at gmail.com (Alexander) Date: Sat, 10 Jan 2015 21:46:36 +0300 Subject: [OmniOS-discuss] sometimes hsfs causes panic Message-ID: Hello, while testing iso installation, for some iso?s I have panic from ASSERT in hsfs_vnops.c line 1231 (when cpio works). This happens from time to time, has anyone seen such a thing? Screenshot is attached. --? Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2015-01-08 at 19.23.15.png Type: image/png Size: 54766 bytes Desc: not available URL: From jimklimov at cos.ru Mon Jan 12 13:52:12 2015 From: jimklimov at cos.ru (Jim Klimov) Date: Mon, 12 Jan 2015 14:52:12 +0100 Subject: [OmniOS-discuss] assertion failed for thread / omnios r12 In-Reply-To: References: Message-ID: <23198E21-375D-49D3-BBCD-5AB3BBC60765@cos.ru> On 10 January 2015 03:02:27 CET, Richard Elling wrote: > >> On Jan 9, 2015, at 1:33 PM, Randy S wrote: >> >> Hi all, >> >> Maybe this has been covered already (I saw a bug about this so I >thought this occurence should not be present in omnios r12) but when I >do a zdb -d rpool after having upgraded the rpool to the latest >version, I get a : >> assertion failed for thread 0xfffffd7fff162a40, thread-id 1: >spa_writeable(vd->vdev_spa), file ../../../uts/common/fs/zfs/vdev.c, >line 1566 >> >> What can have caused this. > >Its a bug, zdb doesn't open the pool for writing, so it can't be >writable. > >> >> zpool upgrade rpool >> This system supports ZFS pool feature flags. >> >> Enabled the following features on 'rpool': >> lz4_compress >> multi_vdev_crash_dump >> spacemap_histogram >> enabled_txg >> hole_birth >> extensible_dataset >> embedded_data >> bookmarks >> filesystem_limits >> >> Is there a way I can disable this spacemap feature after having done >the upgrade? >> It seems that Bug #5165 (https://www.illumos.org/issues/5165) is >still in there. > >yep > >zdb is intended for debugging and isn't guaranteed to run successfully >on imported >pools. There is likely some other way to get the info your looking >for... so what are >you looking for? 
> -- richard > > >> >> Regards, >> >> R >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > >------------------------------------------------------------------------ > >_______________________________________________ >OmniOS-discuss mailing list >OmniOS-discuss at lists.omniti.com >http://lists.omniti.com/mailman/listinfo/omnios-discuss @Randy: See also if the anti-assertion options in zdb would help (i.e. -AAA)? Note that when used like this, unlike other zdb options, (e.g. multiple -d -d -d = -ddd), the different -A* options seemed to set different flags in zdb, at least when I looked at the code a couple of years ago. So you might need to set several of these like 'zdb -A -AAA -d -e rpool' or whatever. Of course, most assertions are tripped for a reason, so the view of a 'live' pool would likely seem inconsistent as zdb traverses the tree, parts of which may be obsolete or overwritten by the time it gets there. Jim -- Typos courtesy of K-9 Mail on my Samsung Android From danmcd at omniti.com Mon Jan 12 19:04:23 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 12 Jan 2015 14:04:23 -0500 Subject: [OmniOS-discuss] Any bloody users out there? Message-ID: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> I'm close to an update (with some exciting developments), but I'm wondering if anyone is using bloody for actual data. I ask because as of right this second, both illumos-gate and illumos-omnios have a potential problem in ZFS. Consider this a ping to see if bloody users are out there, and how they may or may not be affected by potentially bad bugs. Thanks, Dan From gate03 at landcroft.co.uk Mon Jan 12 21:37:24 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Tue, 13 Jan 2015 07:37:24 +1000 Subject: [OmniOS-discuss] Any bloody users out there? In-Reply-To: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> References: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> Message-ID: <20150113073724.0aa7a168@emeritus> On Mon, 12 Jan 2015 14:04:23 -0500 Dan McDonald wrote: > I'm close to an update (with some exciting developments), but I'm > wondering if anyone is using bloody for actual data. I ask because > as of right this second, both illumos-gate and illumos-omnios have a > potential problem in ZFS. Oh dear. I am. Should I revert to stable / LTS ? Michael. From alexmcwhirter at vantagetitle.com Mon Jan 12 21:42:03 2015 From: alexmcwhirter at vantagetitle.com (Alex McWhirter) Date: Mon, 12 Jan 2015 16:42:03 -0500 Subject: [OmniOS-discuss] PERL Modules in PKG Message-ID: <8A815BEF-D1A8-4257-8158-B923212216B6@vantagetitle.com> I?m building a software package that requires some extra perl modules. In the build script it is easy enough to add in the commands to install these modules, but how do i make these commands carry over to the IPS repository? essentially when an end user executes ?pkg install package? it has to also install a set of perl modules via cpan. Or would it be best to compile these perl modules into IP packages and set them as dependencies? From vab at bb-c.de Mon Jan 12 21:55:05 2015 From: vab at bb-c.de (Volker A. Brandt) Date: Mon, 12 Jan 2015 22:55:05 +0100 Subject: [OmniOS-discuss] PERL Modules in PKG In-Reply-To: <8A815BEF-D1A8-4257-8158-B923212216B6@vantagetitle.com> References: <8A815BEF-D1A8-4257-8158-B923212216B6@vantagetitle.com> Message-ID: <21684.17081.907538.853087@glaurung.bb-c.de> Alex McWhirter writes: [...] 
> Or would it be best > to compile these perl modules into IP packages and set them as > dependencies? Yes. If you do this more often you might want to script this. Best regards -- Volker A. Brandt -- ------------------------------------------------------------------------ Volker A. Brandt Consulting and Support for Oracle Solaris Brandt & Brandt Computer GmbH WWW: http://www.bb-c.de/ Am Wiesenpfad 6, 53340 Meckenheim, GERMANY Email: vab at bb-c.de Handelsregister: Amtsgericht Bonn, HRB 10513 Schuhgr??e: 46 Gesch?ftsf?hrer: Rainer J.H. Brandt und Volker A. Brandt "When logic and proportion have fallen sloppy dead" From danmcd at omniti.com Mon Jan 12 22:09:58 2015 From: danmcd at omniti.com (Dan McDonald) Date: Mon, 12 Jan 2015 17:09:58 -0500 Subject: [OmniOS-discuss] Any bloody users out there? In-Reply-To: <20150113073724.0aa7a168@emeritus> References: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> <20150113073724.0aa7a168@emeritus> Message-ID: <243EB2A5-1F9E-48D0-B4EF-0856D2DD2AD4@omniti.com> No need to panic. I'm asking because there is an issue in Illumos-gate and it's also in the source, but not yet on the bloody repo. I'm trying to balance risk. Dan Sent from my iPhone (typos, autocorrect, and all) > On Jan 12, 2015, at 4:37 PM, Michael Mounteney wrote: > > On Mon, 12 Jan 2015 14:04:23 -0500 > Dan McDonald wrote: > >> I'm close to an update (with some exciting developments), but I'm >> wondering if anyone is using bloody for actual data. I ask because >> as of right this second, both illumos-gate and illumos-omnios have a >> potential problem in ZFS. > > Oh dear. I am. Should I revert to stable / LTS ? > > Michael. From danmcd at omniti.com Tue Jan 13 14:50:44 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 13 Jan 2015 09:50:44 -0500 Subject: [OmniOS-discuss] Any bloody users out there? In-Reply-To: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> References: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> Message-ID: <84AAB9B7-BAF9-449D-871C-02BF2BD51EE6@omniti.com> > On Jan 12, 2015, at 2:04 PM, Dan McDonald wrote: > > I'm close to an update (with some exciting developments), but I'm wondering if anyone is using bloody for actual data. I ask because as of right this second, both illumos-gate and illumos-omnios have a potential problem in ZFS. > Many are sending me unicast mails when we should be discussing this on list. . . . Scroll down here and look for "ZFS" http://echelog.com/logs/browse/illumos/1421017200 Very recently introduced code changes likely caused them. These changes HAVE NOT been compiled and pushed out into the bloody IPS server yet, BUT they are in the illumos-omnios master branch so you can see them (and use them if you compile yourself). Dan From henk at hlangeveld.nl Tue Jan 13 16:45:56 2015 From: henk at hlangeveld.nl (Henk Langeveld) Date: Tue, 13 Jan 2015 17:45:56 +0100 Subject: [OmniOS-discuss] adding cua/a as a second login In-Reply-To: <20150102084937.36a704c8@emeritus> References: <20141204153051.3e17ac8f@punda-mlia> <20141205073111.0762e2b1@punda-mlia> <4cff01d01010$dc508510$94f18f30$@acm.org> <20141205085057.5e000d9e@punda-mlia> <20141231225028.GH29549@bender.unx.csupomona.edu> <20150102084937.36a704c8@emeritus> Message-ID: <54B54BC4.1080603@hlangeveld.nl> On 01/01/15 23:49, Michael Mounteney wrote: > 1. Generally in *nix, items in a list are separated by a colon or a > space; rarely a comma. No need to worry. Commas have been used as separators throughout unix history. Examples: group(4), terminfo(4). 
I tend to associate colons with different *fields* in a line, each with a different meaning. Look for instance at group(4) where different fields of a record are separated by colons, and the actual list of users held together with commas. Cheers, Henk From fabio at fabiorabelo.wiki.br Thu Jan 15 13:47:24 2015 From: fabio at fabiorabelo.wiki.br (=?UTF-8?Q?F=C3=A1bio_Rabelo?=) Date: Thu, 15 Jan 2015 11:47:24 -0200 Subject: [OmniOS-discuss] 6TB HDs Message-ID: Hi to all In a new system, will be used just 6 TB Hard Discs, with LSI controlers, no expander whasoever . WD RED, TQ chassis from supermicro . I need to set Ashift to 12, or the Omni will do it automatically ? Thanks in advance ... -------------- next part -------------- An HTML attachment was scrubbed... URL: From tobi at oetiker.ch Thu Jan 15 14:39:39 2015 From: tobi at oetiker.ch (Tobias Oetiker) Date: Thu, 15 Jan 2015 15:39:39 +0100 (CET) Subject: [OmniOS-discuss] 6TB HDs In-Reply-To: References: Message-ID: Fabio, Today F?bio Rabelo wrote: > Hi to all > > In a new system, will be used just 6 TB Hard Discs, with LSI controlers, no > expander whasoever . there are 6TB 512n disks now ... just saying ... cheers tobi > WD RED, TQ chassis from supermicro . > > I need to set Ashift to 12, or the Omni will do it automatically ? > > Thanks in advance ... > -- Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland www.oetiker.ch tobi at oetiker.ch +41 62 775 9902 From hasslerd at gmx.li Thu Jan 15 15:09:09 2015 From: hasslerd at gmx.li (Dominik Hassler) Date: Thu, 15 Jan 2015 16:09:09 +0100 Subject: [OmniOS-discuss] 6TB HDs In-Reply-To: References: Message-ID: <54B7D815.5020602@gmx.li> AFAIK it depends on what the disks report. if they report 4k you are fine. if they report 512e, check if they are "on the list", if not, add them: http://wiki.illumos.org/display/illumos/List+of+sd-config-list+entries+for+Advanced-Format+drives On 01/15/2015 03:39 PM, Tobias Oetiker wrote: > Fabio, > > Today F?bio Rabelo wrote: > >> Hi to all >> >> In a new system, will be used just 6 TB Hard Discs, with LSI controlers, no >> expander whasoever . > > there are 6TB 512n disks now ... just saying ... > > cheers > tobi > > >> WD RED, TQ chassis from supermicro . >> >> I need to set Ashift to 12, or the Omni will do it automatically ? >> >> Thanks in advance ... >> > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- A non-text attachment was scrubbed... Name: 0xF9ECC6A5.asc Type: application/pgp-keys Size: 2686 bytes Desc: not available URL: From danmcd at omniti.com Thu Jan 15 20:56:09 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 15 Jan 2015 15:56:09 -0500 Subject: [OmniOS-discuss] Curl now at version 7.40, please update! Message-ID: <78999ADD-BCD7-4F76-8A91-A1685113BB4D@omniti.com> It won't force a reboot, but you *may* have to restart services that depend on libcurl. Thanks, Dan From richard at netbsd.org Fri Jan 16 05:51:06 2015 From: richard at netbsd.org (Richard PALO) Date: Fri, 16 Jan 2015 06:51:06 +0100 Subject: [OmniOS-discuss] Any bloody users out there? 
In-Reply-To: <243EB2A5-1F9E-48D0-B4EF-0856D2DD2AD4@omniti.com> References: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> <20150113073724.0aa7a168@emeritus> <243EB2A5-1F9E-48D0-B4EF-0856D2DD2AD4@omniti.com> Message-ID: <54B8A6CA.8010806@netbsd.org> Le 12/01/15 23:09, Dan McDonald a ?crit : > No need to panic. I'm asking because there is an issue in Illumos-gate and it's also in the source, but not yet on the bloody repo. > > I'm trying to balance risk. > > Dan > Has this issue been addressed? Its particular status is obscure at best. Do #5514 and/or #5542 have anything to do with it? From richard at netbsd.org Fri Jan 16 05:51:06 2015 From: richard at netbsd.org (Richard PALO) Date: Fri, 16 Jan 2015 06:51:06 +0100 Subject: [OmniOS-discuss] Any bloody users out there? In-Reply-To: <243EB2A5-1F9E-48D0-B4EF-0856D2DD2AD4@omniti.com> References: <1C663720-54A5-413D-B966-B1C67B624543@omniti.com> <20150113073724.0aa7a168@emeritus> <243EB2A5-1F9E-48D0-B4EF-0856D2DD2AD4@omniti.com> Message-ID: <54B8A6CA.8010806@netbsd.org> Le 12/01/15 23:09, Dan McDonald a ?crit : > No need to panic. I'm asking because there is an issue in Illumos-gate and it's also in the source, but not yet on the bloody repo. > > I'm trying to balance risk. > > Dan > Has this issue been addressed? Its particular status is obscure at best. Do #5514 and/or #5542 have anything to do with it? From danmcd at kebe.com Sun Jan 18 07:02:45 2015 From: danmcd at kebe.com (Dan McDonald) Date: Sun, 18 Jan 2015 02:02:45 -0500 Subject: [OmniOS-discuss] Caldav suggestions? Message-ID: I'm surveying Caldav servers. Most require PHP, which bothers me from a security POV. Am I being overly harsh on PHP? Are there ones that don't require PHP? And most of them seem to require a database, I'd prefer either MariaDB or PostgreSQL. Any clues are welcome. Thanks, Dan From gate03 at landcroft.co.uk Sun Jan 18 07:44:58 2015 From: gate03 at landcroft.co.uk (Michael Mounteney) Date: Sun, 18 Jan 2015 17:44:58 +1000 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: References: Message-ID: <20150118174458.611d9b92@emeritus> On Sun, 18 Jan 2015 02:02:45 -0500 Dan McDonald wrote: > I'm surveying Caldav servers. Most require PHP, which bothers me > from a security POV. Am I being overly harsh on PHP? Are there ones > that don't require PHP? You cannot be too harsh on PHP. It is a dog's breakfast of a language and it would have been better if it had been drowned in a bucket at birth. That being said, you can install a service in a zone that doesn't have a world-facing interface, n'est-ce pas ? Michael. From lotheac at iki.fi Sun Jan 18 10:12:32 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Sun, 18 Jan 2015 12:12:32 +0200 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: References: Message-ID: <20150118101232.GB21898@gutsman.lotheac.fi> On Sun, Jan 18 2015 02:02:45 -0500, Dan McDonald wrote: > I'm surveying Caldav servers. Most require PHP, which bothers me from > a security POV. Am I being overly harsh on PHP? Are there ones that > don't require PHP? I'm running radicale at home, and have packaged it for OmniOS in the niksula.hut.fi repo. http://radicale.org/ -- Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet From mir at miras.org Sun Jan 18 13:31:33 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 14:31:33 +0100 Subject: [OmniOS-discuss] Caldav suggestions? 
In-Reply-To: References: Message-ID: <20150118143133.34d4d37c@sleipner.datanom.net> On Sun, 18 Jan 2015 02:02:45 -0500 Dan McDonald wrote: > I'm surveying Caldav servers. Most require PHP, which bothers me from a security POV. Am I being overly harsh on PHP? Are there ones that don't require PHP? > > And most of them seem to require a database, I'd prefer either MariaDB or PostgreSQL. > If it is going to scale you want to use a database for storage. Caldav calendar servers I no of: Davical (http://www.davical.org/) GPL: pro: - Standards compliance - Storage backend is PostgreSQL ( >= 8.3) - Widely used and large user community - LDAP integration for authentication - Very light on resources con: - Written in PHP but most of the functionality is implemented in PgPLsql Calendar and Contacts Server (http://calendarserver.org/) Apache license 2.0: pro: - Standards compliance - Storage backend can be configured to use LDAP - LDAP integration for authentication - Written in Python con: - Use Python Twister so a great number of Python libraries is required - Resource demanding Despite Davical is based on PHP I would recommend this due to simplicity and resource requirements. IMHO Davical is more fitted to the idea behind Omnios - small foot print and code base. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Polymer physicists are into chains. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Sun Jan 18 13:34:58 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 14:34:58 +0100 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: <20150118101232.GB21898@gutsman.lotheac.fi> References: <20150118101232.GB21898@gutsman.lotheac.fi> Message-ID: <20150118143458.1297b50b@sleipner.datanom.net> On Sun, 18 Jan 2015 12:12:32 +0200 Lauri Tirkkonen wrote: > > I'm running radicale at home, and have packaged it for OmniOS in the > niksula.hut.fi repo. http://radicale.org/ > I don't like the idea of a file based storage backend - it simply does not scale well. Radicale is also not standards compliant which is a big problem to me since it can be very problematic for client integration. The usage and community support seems also very small. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Jacquin's Postulate on Democratic Government: No man's life, liberty, or property are safe while the legislature is in session. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Sun Jan 18 13:40:47 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 14:40:47 +0100 Subject: [OmniOS-discuss] Caldav suggestions? 
In-Reply-To: <20150118143133.34d4d37c@sleipner.datanom.net> References: <20150118143133.34d4d37c@sleipner.datanom.net> Message-ID: <20150118144047.50441c5f@sleipner.datanom.net> On Sun, 18 Jan 2015 14:31:33 +0100 Michael Rasmussen wrote: > > Despite Davical is based on PHP I would recommend this due to > simplicity and resource requirements. IMHO Davical is more fitted to > the idea behind Omnios - small foot print and code base. > PS. I forgot to mention that I am slightly biased since I have been involved in the project from the beginning with Andrew (the former maintainer). Davical is now community maintained. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: FORTUNE PROVIDES QUESTIONS FOR THE GREAT ANSWERS: #31 A: Chicken Teriyaki. Q: What is the name of the world's oldest kamikaze pilot? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From lists at marzocchi.net Sun Jan 18 13:53:19 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Sun, 18 Jan 2015 14:53:19 +0100 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: <20150118143133.34d4d37c@sleipner.datanom.net> References: <20150118143133.34d4d37c@sleipner.datanom.net> Message-ID: >> I'm surveying Caldav servers. Most require PHP, which bothers me from a security POV. Am I being overly harsh on PHP? Are there ones that don't require PHP? >> >> And most of them seem to require a database, I'd prefer either MariaDB or PostgreSQL. >> > If it is going to scale you want to use a database for storage. > > Caldav calendar servers I no of: > > Davical (http://www.davical.org/) GPL: > pro: > - Standards compliance > - Storage backend is PostgreSQL ( >= 8.3) > - Widely used and large user community > - LDAP integration for authentication > - Very light on resources > con: > - Written in PHP but most of the functionality is implemented in PgPLsql A guide is found here too, it should apply to OmniOS relatively well: http://www.jasspa.com/oi/oi_setup.pdf > Calendar and Contacts Server (http://calendarserver.org/) Apache > license 2.0: > pro: > - Standards compliance > - Storage backend can be configured to use LDAP > - LDAP integration for authentication > - Written in Python > con: > - Use Python Twister so a great number of Python libraries is required > - Resource demanding I was trying to compile this one after checking https://en.wikipedia.org/wiki/Comparison_of_CalDAV_and_CardDAV_implementations but I wasn?t able to get it working, then I had to take a break and I haven?t restarted yet. If you try, let me know and I give you the info I got until now. Olaf From danmcd at omniti.com Sun Jan 18 16:04:23 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 18 Jan 2015 11:04:23 -0500 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: References: <20150118143133.34d4d37c@sleipner.datanom.net> Message-ID: Thanks everyone for the collected advice so far. Some things I forgot to mention: - it only needs to scale to my family (<10) - it will be public facing on https, likely in my existing webserver zone. - most or all of my clients will be MacOS or iOS. 
Dan Sent from my iPhone (typos, autocorrect, and all) From mir at miras.org Sun Jan 18 16:40:50 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 17:40:50 +0100 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: References: <20150118143133.34d4d37c@sleipner.datanom.net> Message-ID: <20150118174050.0e087084@sleipner.datanom.net> On Sun, 18 Jan 2015 11:04:23 -0500 Dan McDonald wrote: > Thanks everyone for the collected advice so far. Some things I forgot to mention: > > - it only needs to scale to my family (<10) > > - it will be public facing on https, likely in my existing webserver zone. > > - most or all of my clients will be MacOS or iOS. > I see. I thought it was supposed to be added to Omnios default;-) In that case there is no problem what so ever with file based. I have just tested latest release of radicale and I must admit it has come a long way since my last visit (0.7). For your requirements, with the exception of database, I see no problems with using radicale. Latest Omnios stable: Python 2.6.8 (unknown, Sep 13 2014, 04:19:45) fits the bill: "It is known to work on Python 2.6, ..." BTW their new experimental interface to databases using sqlalchemy does support PostgreSQL and MySQL (therefore also MariaDB). I plan doing experimentation involving Radicale-0.10 (daemon mode), OpenLDAP (authentication), and PostgreSQL for storage backend. If someone is interesting I can provide a howto to the list? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: What's so funny? -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Sun Jan 18 17:00:25 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 18:00:25 +0100 Subject: [OmniOS-discuss] openldap-server install problems Message-ID: <20150118180025.0f2bdf36@sleipner.datanom.net> Hi list, Trying to install openldap-server but seems packaging is out of sync? 
sudo pkg install openldap-server Creating Plan - pkg install: No solution was found to satisfy constraints maintained incorporations: pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z pkg://omnios/incorporation/jeos/illumos-gate at 11,5.11-0.151012:20140913T032317Z pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151012:20140913T033401Z pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151012:20141219T200422Z Plan Creation: dependency error(s) in proposed packages: No suitable version of required package pkg://ms.omniti.com/omniti/network/openldap-server at 2.4.40,5.11-0.151006:20140930T201405Z found: Reject: pkg://ms.omniti.com/omniti/network/openldap-server at 2.4.40,5.11-0.151006:20140930T201405Z Reason: A version for 'incorporate' dependency on pkg:/entire at 11,5.11-0.151006 cannot be found Plan Creation: Errors in installed packages due to proposed changes: No suitable version of installed package pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z found Reject: pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z pkg://omnios/entire at 11,5.11-0.151012:20141219T200421Z Reason: Excluded by proposed incorporation 'omniti/network/openldap-server' This version is excluded by installed incorporation pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z Any advice? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: "His great aim was to escape from civilization, and, as soon as he had money, he went to Southern California." -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From danmcd at omniti.com Sun Jan 18 18:51:53 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 18 Jan 2015 13:51:53 -0500 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150118180025.0f2bdf36@sleipner.datanom.net> References: <20150118180025.0f2bdf36@sleipner.datanom.net> Message-ID: <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> The ms.omniti.com repo isn't officially supported; it's for OmniTI Internal use, but we keep it open as a nicety. I'd recommend checking out the omniti-ms branch of OmniOS-build and see if there's something there that can be fixed. Either that, or if someone else here can build/publish 012 versions of the appropriate packages, that'd fix things as well. Sorry, Dan Ps. When 014 ships, it's the next LTS, so many of the ms.omniti.com packages will be updated for 014. Sent from my iPhone (typos, autocorrect, and all) > On Jan 18, 2015, at 12:00 PM, Michael Rasmussen wrote: > > Hi list, > > Trying to install openldap-server but seems packaging is out of sync? 
> > sudo pkg install openldap-server > Creating Plan - > pkg install: No solution was found to satisfy constraints > > maintained incorporations: > > pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z > pkg://omnios/incorporation/jeos/illumos-gate at 11,5.11-0.151012:20140913T032317Z > pkg://omnios/consolidation/osnet/osnet-incorporation at 0.5.11,5.11-0.151012:20140913T033401Z > pkg://omnios/incorporation/jeos/omnios-userland at 11,5.11-0.151012:20141219T200422Z > > Plan Creation: dependency error(s) in proposed packages: > > No suitable version of required package > pkg://ms.omniti.com/omniti/network/openldap-server at 2.4.40,5.11-0.151006:20140930T201405Z > found: Reject: > pkg://ms.omniti.com/omniti/network/openldap-server at 2.4.40,5.11-0.151006:20140930T201405Z > Reason: A version for 'incorporate' dependency on > pkg:/entire at 11,5.11-0.151006 cannot be found > > Plan Creation: Errors in installed packages due to proposed changes: > > No suitable version of installed package > pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z found Reject: > pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z > pkg://omnios/entire at 11,5.11-0.151012:20141219T200421Z Reason: > Excluded by proposed incorporation 'omniti/network/openldap-server' > This version is excluded by installed incorporation > pkg://omnios/entire at 11,5.11-0.151012:20141027T191658Z > > Any advice? > > -- > Hilsen/Regards > Michael Rasmussen > > Get my public GnuPG keys: > michael rasmussen cc > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E > mir datanom net > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C > mir miras org > http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 > -------------------------------------------------------------- > /usr/games/fortune -es says: > "His great aim was to escape from civilization, and, as soon as he had > money, he went to Southern California." > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From danmcd at omniti.com Sun Jan 18 18:57:22 2015 From: danmcd at omniti.com (Dan McDonald) Date: Sun, 18 Jan 2015 13:57:22 -0500 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: <20150118101232.GB21898@gutsman.lotheac.fi> References: <20150118101232.GB21898@gutsman.lotheac.fi> Message-ID: <448C0805-0079-41EF-B297-A4B86F86440F@omniti.com> > On Jan 18, 2015, at 5:12 AM, Lauri Tirkkonen wrote: > > On Sun, Jan 18 2015 02:02:45 -0500, Dan McDonald wrote: >> I'm surveying Caldav servers. Most require PHP, which bothers me from >> a security POV. Am I being overly harsh on PHP? Are there ones that >> don't require PHP? > > I'm running radicale at home, and have packaged it for OmniOS in the > niksula.hut.fi repo. http://radicale.org/ Wow! This may be a winner. How nicely does it play with iOS and MacOS? Thanks, Dan From lotheac at iki.fi Sun Jan 18 19:02:19 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Sun, 18 Jan 2015 21:02:19 +0200 Subject: [OmniOS-discuss] Caldav suggestions? In-Reply-To: <448C0805-0079-41EF-B297-A4B86F86440F@omniti.com> References: <20150118101232.GB21898@gutsman.lotheac.fi> <448C0805-0079-41EF-B297-A4B86F86440F@omniti.com> Message-ID: <20150118190219.GC21898@gutsman.lotheac.fi> On Sun, Jan 18 2015 13:57:22 -0500, Dan McDonald wrote: > Wow! This may be a winner. How nicely does it play with iOS and > MacOS? I don't use either of those so I can't offer any anecdotes on that, sorry. 
-- Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet From mir at miras.org Sun Jan 18 19:43:03 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 20:43:03 +0100 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> Message-ID: <20150118204303.04fcc71d@sleipner.datanom.net> On Sun, 18 Jan 2015 13:51:53 -0500 Dan McDonald wrote: > I'd recommend checking out the omniti-ms branch of OmniOS-build and see if there's something there that can be fixed. Either that, or if someone else here can build/publish 012 versions of the appropriate packages, that'd fix things as well. > Unfortunately my experience with building for pkg is zero so I will not be able to provide a package so using configure, make, make install will be my solution. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Some people need a good imaginary cure for their painful imaginary ailment. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Sun Jan 18 19:49:17 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 20:49:17 +0100 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> Message-ID: <20150118204917.48733b07@sleipner.datanom.net> On Sun, 18 Jan 2015 13:51:53 -0500 Dan McDonald wrote: > > I'd recommend checking out the omniti-ms branch of OmniOS-build and see if there's something there that can be fixed. Either that, or if someone else here can build/publish 012 versions of the appropriate packages, that'd fix things as well. > I just fund this on github: https://github.com/niksula/omnios-build-scripts/tree/master/openldap Can this be used? And how to use? @Lauri Is this your build tree? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Use variable names that mean something. - The Elements of Programming Style (Kernighan & Plaugher) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From lotheac at iki.fi Sun Jan 18 20:00:27 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Sun, 18 Jan 2015 22:00:27 +0200 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150118204917.48733b07@sleipner.datanom.net> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> Message-ID: <20150118200027.GD21898@gutsman.lotheac.fi> On Sun, Jan 18 2015 20:49:17 +0100, Michael Rasmussen wrote: > I just fund this on github: > https://github.com/niksula/omnios-build-scripts/tree/master/openldap > Can this be used? > And how to use? > > @Lauri > Is this your build tree? Yes, I maintain those scripts as well as the binary repository at http://pkg.niksula.hut.fi/ - this is also mentioned at http://omnios.omniti.com/wiki.php/Packaging -- Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet From mir at miras.org Sun Jan 18 22:44:52 2015 From: mir at miras.org (Michael Rasmussen) Date: Sun, 18 Jan 2015 23:44:52 +0100 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150118200027.GD21898@gutsman.lotheac.fi> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> Message-ID: <20150118234452.51cef490@sleipner.datanom.net> On Sun, 18 Jan 2015 22:00:27 +0200 Lauri Tirkkonen wrote: > > Yes, I maintain those scripts as well as the binary repository at > http://pkg.niksula.hut.fi/ - this is also mentioned at > http://omnios.omniti.com/wiki.php/Packaging > But the are for 151006 only? depend fmri=pkg:/system/library at 0.5.11-0.151006 type=require -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: One father is more than a hundred schoolmasters. -- George Herbert -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From lotheac at iki.fi Mon Jan 19 06:04:37 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Mon, 19 Jan 2015 08:04:37 +0200 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150118234452.51cef490@sleipner.datanom.net> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> <20150118234452.51cef490@sleipner.datanom.net> Message-ID: <20150119060436.GE21898@gutsman.lotheac.fi> On Sun, Jan 18 2015 23:44:52 +0100, Michael Rasmussen wrote: > But the are for 151006 only? > depend fmri=pkg:/system/library at 0.5.11-0.151006 type=require They are built on 151006, but they can be installed on later releases because they don't have incorporate dependencies, and can even work thanks to binary compatibility. At work we still run LTS, but I am running quite a few packages at home on 151012 too. 
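(One way to verify that for a given package - assuming the niksula publisher is already configured on the system - is to dump its manifest from the repository and look for incorporate-type depend actions, which are what pin a package to a specific release:

  # pkg contents -r -m openldap-server | grep 'type=incorporate'

No output means no incorporate dependency, so the package isn't tied to a particular entire@ release.)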
Of course, no guarantees :) -- Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet From rt at steait.net Mon Jan 19 11:55:09 2015 From: rt at steait.net (Rune Tipsmark) Date: Mon, 19 Jan 2015 11:55:09 +0000 Subject: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed Message-ID: <1421668509186.85242@steait.net> hi all, just in case there are other people out there using their ZFS box against vSphere 5.1 or later... I found my storage vmotion were slow... really slow... not much info available and so after a while of trial and error I found a nice combo that works very well in terms of performance, latency as well as throughput and storage vMotion. - Use ZFS volumes instead of thin provisioned LU's - Volumes support two of the VAAI features - Use thick provisioning disks, lazy zeroed disks in my case reduced storage vMotion by 90% or so - machine 1 dropped from 8? minutes to 23 seconds and machine 2 dropped from ~7 minutes to 54 seconds... a rather nice improvement simply by changing from thin to thick provisioning. - I dropped my Qlogic HBA max queue depth from default 64 to 16 on all ESXi hosts and now I see an average latency of less than 1ms per data store (on 8G fibre channel). Of course there are spikes when doing storage vMotion at these speeds but its well worth it. I am getting to the point where I am almost happy with my ZFS backend for vSphere. br, Rune -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Mon Jan 19 12:57:28 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 19 Jan 2015 04:57:28 -0800 Subject: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed In-Reply-To: <1421668509186.85242@steait.net> References: <1421668509186.85242@steait.net> Message-ID: <9DEFD87E-4EBC-4B30-826F-BE1C99F4ADA5@richardelling.com> > On Jan 19, 2015, at 3:55 AM, Rune Tipsmark wrote: > > hi all, > > just in case there are other people out there using their ZFS box against vSphere 5.1 or later... I found my storage vmotion were slow... really slow... not much info available and so after a while of trial and error I found a nice combo that works very well in terms of performance, latency as well as throughput and storage vMotion. > > - Use ZFS volumes instead of thin provisioned LU's - Volumes support two of the VAAI features > AFAIK, ZFS is not available in VMware. Do you mean run iSCSI to connect the ESX box to the server running ZFS? If so... > - Use thick provisioning disks, lazy zeroed disks in my case reduced storage vMotion by 90% or so - machine 1 dropped from 8? minutes to 23 seconds and machine 2 dropped from ~7 minutes to 54 seconds... a rather nice improvement simply by changing from thin to thick provisioning. > This makes no difference in ZFS. The "thick provisioned" volume is simply a volume with a reservation. All allocations are copy-on-write. So the only difference between a "thick" and "thin" volume occurs when you run out of space in the pool. > - I dropped my Qlogic HBA max queue depth from default 64 to 16 on all ESXi hosts and now I see an average latency of less than 1ms per data store (on 8G fibre channel). Of course there are spikes when doing storage vMotion at these speeds but its well worth it. > I usually see storage vmotion running at wire speed for well configured systems. When you get into the 2GByte/sec range this can get tricky, because maintaining that flow through the RAM and disks requires nontrivial amounts of hardware. 
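(A concrete sketch of the reservation point above - pool and volume names are made up - the only difference between the two provisioning styles on the ZFS side is the refreservation property:

  # zfs create -V 200G tank/thick_lu      # "thick": refreservation=200G is set automatically
  # zfs create -s -V 200G tank/thin_lu    # sparse/"thin": refreservation=none
  # zfs get refreservation tank/thick_lu tank/thin_lu

Either way the actual writes are copy-on-write and allocated on demand.)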
More likely, you're seeing the effects of caching, which is very useful for storage vmotion and allows you to hit line rate. > > I am getting to the point where I am almost happy with my ZFS backend for vSphere. > excellent! -- richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Mon Jan 19 13:23:04 2015 From: rt at steait.net (Rune Tipsmark) Date: Mon, 19 Jan 2015 13:23:04 +0000 Subject: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed In-Reply-To: <9DEFD87E-4EBC-4B30-826F-BE1C99F4ADA5@richardelling.com> References: <1421668509186.85242@steait.net>, <9DEFD87E-4EBC-4B30-826F-BE1C99F4ADA5@richardelling.com> Message-ID: <1421673783752.75353@steait.net> ________________________________ From: Richard Elling Sent: Monday, January 19, 2015 1:57 PM To: Rune Tipsmark Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed On Jan 19, 2015, at 3:55 AM, Rune Tipsmark > wrote: hi all, just in case there are other people out there using their ZFS box against vSphere 5.1 or later... I found my storage vmotion were slow... really slow... not much info available and so after a while of trial and error I found a nice combo that works very well in terms of performance, latency as well as throughput and storage vMotion. - Use ZFS volumes instead of thin provisioned LU's - Volumes support two of the VAAI features AFAIK, ZFS is not available in VMware. Do you mean run iSCSI to connect the ESX box to the server running ZFS? If so... >> I run 8G Fibre Channel - Use thick provisioning disks, lazy zeroed disks in my case reduced storage vMotion by 90% or so - machine 1 dropped from 8? minutes to 23 seconds and machine 2 dropped from ~7 minutes to 54 seconds... a rather nice improvement simply by changing from thin to thick provisioning. This makes no difference in ZFS. The "thick provisioned" volume is simply a volume with a reservation. All allocations are copy-on-write. So the only difference between a "thick" and "thin" volume occurs when you run out of space in the pool. >> I am talking thick provisioning in VMware, that's where it makes a huge difference - I dropped my Qlogic HBA max queue depth from default 64 to 16 on all ESXi hosts and now I see an average latency of less than 1ms per data store (on 8G fibre channel). Of course there are spikes when doing storage vMotion at these speeds but its well worth it. I usually see storage vmotion running at wire speed for well configured systems. When you get into the 2GByte/sec range this can get tricky, because maintaining that flow through the RAM and disks requires nontrivial amounts of hardware. >> I don't even get close to wire speed unfortunately my SLOGs can only do around 5-600 MBbyte/sec with sync=always. More likely, you're seeing the effects of caching, which is very useful for storage vmotion and allows you to hit line rate. >> Not sure this is the case with using sync=always? I am getting to the point where I am almost happy with my ZFS backend for vSphere. excellent! -- richard -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mir at miras.org Mon Jan 19 16:41:06 2015 From: mir at miras.org (Michael Rasmussen) Date: Mon, 19 Jan 2015 17:41:06 +0100 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150119060436.GE21898@gutsman.lotheac.fi> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> <20150118234452.51cef490@sleipner.datanom.net> <20150119060436.GE21898@gutsman.lotheac.fi> Message-ID: <20150119174106.79bdedc8@sleipner.datanom.net> On Mon, 19 Jan 2015 08:04:37 +0200 Lauri Tirkkonen wrote: > > They are built on 151006, but they can be installed on later releases > because they don't have incorporate dependencies, and can even work > thanks to binary compatibility. At work we still run LTS, but I am > running quite a few packages at home on 151012 too. Of course, no > guarantees :) > openldap-server installed without a hitch - haven't tested yet;-) A humble request: Could you be persuaded to update the Radicale package to 0.10 and at the same time repackage to use the supplied Python version in Omnios which is 2.6? The reason for Radicale 0.10: 1) "This version should bring some interesting discovery and auto-configuration features, mostly with Apple clients." 2) Database integration via sqlalchemy The reason is that using Python 3.3 means no - IMAP Auth - LDAP Auth - PAM Auth I can live without IMAP and PAM Auth but for enterprise usage no LDAP Auth is a show stopper. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Fay: The British police force used to be run by men of integrity. Truscott: That is a mistake which has been rectified. -- Joe Orton, "Loot" -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From lotheac at iki.fi Mon Jan 19 16:56:19 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Mon, 19 Jan 2015 18:56:19 +0200 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150119174106.79bdedc8@sleipner.datanom.net> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> <20150118234452.51cef490@sleipner.datanom.net> <20150119060436.GE21898@gutsman.lotheac.fi> <20150119174106.79bdedc8@sleipner.datanom.net> Message-ID: <20150119165619.GC24621@gutsman.lotheac.fi> On Mon, Jan 19 2015 17:41:06 +0100, Michael Rasmussen wrote: > A humble request: > Could you be persuaded to update the Radicale package to 0.10 and at > the same time repackage to use the supplied Python version in Omnios > which is 2.6? I could probably update to 0.10 later this week but I'd rather keep using my own python as per KYSTY. You're of course free to fork ;) > The reason is that using Python 3.3 means no > - IMAP Auth > - LDAP Auth > - PAM Auth I don't understand why python3.3 implies no support for those things (although I admit I'm not using authentication in radicale, I let my web server do that). 
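(For anyone wanting to copy that setup, a minimal sketch of web-server-managed auth in front of Radicale - nginx is assumed here, as is Radicale listening on its default 127.0.0.1:5232; paths and the URL prefix are placeholders:

  location /radicale/ {
      auth_basic           "CalDAV";
      auth_basic_user_file /etc/nginx/radicale.htpasswd;
      proxy_pass           http://127.0.0.1:5232/;
  }

Radicale itself can then run with authentication disabled, since only the proxy can reach it.)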
-- Lauri Tirkkonen | lotheac @ IRCnet From richard.elling at richardelling.com Mon Jan 19 17:25:03 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 19 Jan 2015 09:25:03 -0800 Subject: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed In-Reply-To: <1421673783752.75353@steait.net> References: <1421668509186.85242@steait.net> <, <9DEFD87E-4EBC-4B30-826F-BE1C99F4ADA5@richardelling.com> <>> <1421673783752.75353@steait.net> Message-ID: Thanks Rune, more below... > On Jan 19, 2015, at 5:23 AM, Rune Tipsmark wrote: > > From: Richard Elling > Sent: Monday, January 19, 2015 1:57 PM > To: Rune Tipsmark > Cc: omnios-discuss at lists.omniti.com > Subject: Re: [OmniOS-discuss] ZFS Volumes and vSphere Disks - Storage vMotion Speed > > >> On Jan 19, 2015, at 3:55 AM, Rune Tipsmark > wrote: >> >> hi all, >> >> just in case there are other people out there using their ZFS box against vSphere 5.1 or later... I found my storage vmotion were slow... really slow... not much info available and so after a while of trial and error I found a nice combo that works very well in terms of performance, latency as well as throughput and storage vMotion. >> >> - Use ZFS volumes instead of thin provisioned LU's - Volumes support two of the VAAI features >> > > AFAIK, ZFS is not available in VMware. Do you mean run iSCSI to connect the ESX box to > the server running ZFS? If so... > >> I run 8G Fibre Channel ok, still it is COMSTAR, so the backend is the same >> - Use thick provisioning disks, lazy zeroed disks in my case reduced storage vMotion by 90% or so - machine 1 dropped from 8? minutes to 23 seconds and machine 2 dropped from ~7 minutes to 54 seconds... a rather nice improvement simply by changing from thin to thick provisioning. >> > > This makes no difference in ZFS. The "thick provisioned" volume is simply a volume with a reservation. > All allocations are copy-on-write. So the only difference between a "thick" and "thin" volume occurs when > you run out of space in the pool. > >> I am talking thick provisioning in VMware, that's where it makes a huge difference yes, you should always let VMware think it is thick provisioned, even if it isn't. VMware is too ignorant of copy-on-write file systems to be able to make good decisions. >> - I dropped my Qlogic HBA max queue depth from default 64 to 16 on all ESXi hosts and now I see an average latency of less than 1ms per data store (on 8G fibre channel). Of course there are spikes when doing storage vMotion at these speeds but its well worth it. >> > > I usually see storage vmotion running at wire speed for well configured systems. When you get > into the 2GByte/sec range this can get tricky, because maintaining that flow through the RAM > and disks requires nontrivial amounts of hardware. > >> I don't even get close to wire speed unfortunately my SLOGs can only do around 5-600 MBbyte/sec with sync=always. Indeed, the systems we make fast have enough hardware to be fast. > More likely, you're seeing the effects of caching, which is very useful for storage vmotion and > allows you to hit line rate. > > >> Not sure this is the case with using sync=always? Caching will make a big difference. You should also see effective use of the ZFS prefetcher. Thanks for sharing your experience. -- richard >> >> I am getting to the point where I am almost happy with my ZFS backend for vSphere. >> > > excellent! > -- richard > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mir at miras.org Mon Jan 19 17:29:28 2015 From: mir at miras.org (Michael Rasmussen) Date: Mon, 19 Jan 2015 18:29:28 +0100 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150119165619.GC24621@gutsman.lotheac.fi> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> <20150118234452.51cef490@sleipner.datanom.net> <20150119060436.GE21898@gutsman.lotheac.fi> <20150119174106.79bdedc8@sleipner.datanom.net> <20150119165619.GC24621@gutsman.lotheac.fi> Message-ID: <20150119182928.449a9aa7@sleipner.datanom.net> On Mon, 19 Jan 2015 18:56:19 +0200 Lauri Tirkkonen wrote: > > I could probably update to 0.10 later this week but I'd rather keep > using my own python as per KYSTY. You're of course free to fork ;) > Would it be difficult to make a 2.6 version? PS. I have no knowledge of pkg package format. > > I don't understand why python3.3 implies no support for those things > (although I admit I'm not using authentication in radicale, I let my web > server do that). > See: http://radicale.org/user_documentation/#idpython-versions-and-os-support -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Star Trek Lives! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From lotheac at iki.fi Mon Jan 19 17:43:49 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Mon, 19 Jan 2015 19:43:49 +0200 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150119182928.449a9aa7@sleipner.datanom.net> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> <20150118234452.51cef490@sleipner.datanom.net> <20150119060436.GE21898@gutsman.lotheac.fi> <20150119174106.79bdedc8@sleipner.datanom.net> <20150119165619.GC24621@gutsman.lotheac.fi> <20150119182928.449a9aa7@sleipner.datanom.net> Message-ID: <20150119174349.GD24621@gutsman.lotheac.fi> On Mon, Jan 19 2015 18:29:28 +0100, Michael Rasmussen wrote: > Would it be difficult to make a 2.6 version? > PS. I have no knowledge of pkg package format. Not very - likely it'd be sufficient to modify or remove this line from the build script: https://github.com/niksula/omnios-build-scripts/blob/master/radicale/build.sh#L40 Although I'll leave setting up a build environment as an exercise to the reader. http://omnios.omniti.com/wiki.php/Packaging#How-tos > See: > http://radicale.org/user_documentation/#idpython-versions-and-os-support Ah, ok. You'd also need to install those extra libraries, just building against python2.6 isn't sufficient. -- Lauri Tirkkonen | lotheac @ IRCnet From wverb73 at gmail.com Tue Jan 20 02:59:16 2015 From: wverb73 at gmail.com (W Verb) Date: Mon, 19 Jan 2015 18:59:16 -0800 Subject: [OmniOS-discuss] VAAI Testing Message-ID: Hi All, After seeing the recent message regarding ZFS, iSCSI, zvols and ESXi, I decided to follow up on where full VAAI support is. 
I found Dan?s message from August: http://lists.omniti.com/pipermail/omnios-discuss/2014-August/002957.html Is anyone working on his points 1 and 2? Is anyone keeping track of the testing offers for #3? I do a fair amount of SQA, and am willing to organize and write tests if needed. I also have a reasonable lab environment with which to test the code. -Warren V -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexmcwhirter at vantagetitle.com Tue Jan 20 05:25:43 2015 From: alexmcwhirter at vantagetitle.com (Alex McWhirter) Date: Tue, 20 Jan 2015 00:25:43 -0500 Subject: [OmniOS-discuss] PERL modules as packages conflict Message-ID: I have perl and some modules all setup as packages, but theres a conflict. Each module wants to make changes to perllocal.pod and pkg detects this as a conflict. The following packages all deliver file actions to opt/triadic/lib/i386/perl5/5.20.1/i386/perllocal.pod: pkg://stable.pkg.triadic.us/triadic/perl-net-dns at 0.81,5.11-0.151006:20150108T224214Z pkg://stable.pkg.triadic.us/triadic/perl-html-parser at 3.71,5.11-0.151006:20150108T221603Z These packages may not be installed together. Any non-conflicting set may be, or the packages must be corrected before they can be installed. i would imagine this problem has been run into before with the OMNIperl package set, i would like to know how it was resolved? I couldn?t find build scripts for OMNIperl anywhere. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lotheac at iki.fi Tue Jan 20 07:16:49 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Tue, 20 Jan 2015 09:16:49 +0200 Subject: [OmniOS-discuss] PERL modules as packages conflict In-Reply-To: References: Message-ID: <20150120071649.GG24621@gutsman.lotheac.fi> On Tue, Jan 20 2015 00:25:43 -0500, Alex McWhirter wrote: > I have perl and some modules all setup as packages, but theres a > conflict. Each module wants to make changes to perllocal.pod and pkg > detects this as a conflict. > > i would imagine this problem has been run into before with the > OMNIperl package set, i would like to know how it was resolved? I > couldn?t find build scripts for OMNIperl anywhere. I can't speak for omniti-perl, but the perl template in omnios-build uses make_pure_install by default: https://github.com/omniti-labs/omnios-build/blob/omniti-ms/lib/functions.sh#L792 Failing that you could add a package transform to not ship perllocal.pod with your packages in either local.mog or lib/global-transforms.mog. Something like: drop> -- Lauri Tirkkonen | lotheac @ IRCnet From paul.jochum at alcatel-lucent.com Tue Jan 20 11:43:44 2015 From: paul.jochum at alcatel-lucent.com (Paul Jochum) Date: Tue, 20 Jan 2015 05:43:44 -0600 Subject: [OmniOS-discuss] Who here had lockd/nlockmgr problems? In-Reply-To: <9A9651B5-B71B-44D3-90C1-BCF96B4ECCE8@omniti.com> References: <968CF721-8839-49E2-8F04-9FD912E78E68@omniti.com> <9A9651B5-B71B-44D3-90C1-BCF96B4ECCE8@omniti.com> Message-ID: <54BE3F70.8080303@alcatel-lucent.com> Hi Dan: Resurrecting an older thread here. Do you know if a fix was submitted for this problem, and if submitted, if/when will it be picked up in OmniOS? We have had this problem when trying to upgrade multiple machines to R151012 in our environment, and decided to stay at r151010 hoping it will get fixed soon. thanks, Paul On 11/17/2014 09:12 AM, Dan McDonald wrote: > ISTR someone here had a problem where his/her nlockmgr SMF service wouldn't start. 
I've encountered this problem myself recently (on a OI VM, but it's running the same new open-source lockd that's in OmniOS with r151010 and later), and wanted to share a conversation from the illumos developer's list. > > FYI, > Dan > > >> Begin forwarded message: >> >> Subject: Re: [developer] lockd not starting? >> From: Dan McDonald >> Date: November 17, 2014 at 10:10:38 AM EST >> Cc: illumos Developer , Matt Amdur >> To: Sebastien Roy >> >> >>> On Nov 16, 2014, at 7:54 PM, Sebastien Roy wrote: >>> >>> Hey Dan, >>> >>> I think we're looking into a similar problem here at Delphix. We've noticed that the nfs/nlockmgr service goes into maintenance mode after a timeout of the SM_CRASH call to statd. >>> >>> Upon startup, the statd daemon is blocked notifying clients. These clients are cached in /var/statmon/sm.bak/... If any of these clients are unreachable, the SM_CRASH call issued by klm will timeout (this timeout is much shorter than that which statd uses to give up on client notifications), and the nfs/nlockmgr service will then end up in maintenance state. >>> >>> We've been discussing possible fixes to this problem. One easy fix that Matt Amdur (Cc'ed) proposed would be to shorten the timeout that statd uses to notify clients to reduce the chance that lockd's SM_CRASH call would timeout in the event of an unreachable client. We haven't implemented the fix yet. >>> >>> A workaround is to clear statd's cached clients. >> Okay! This makes some modicum of sense. Here's what I have now: >> >> # ls -lt /var/statmon/sm.bak/ >> total 2 >> lrwxrwxrwx 1 daemon daemon 10 Nov 12 19:55 ipv4.10.0.1.68 -> Everywhere >> lrwxrwxrwx 1 daemon daemon 28 Nov 11 12:44 ipv4.10.8.3.241 -> everywhere.office.omniti.com >> # nslookup everywhere >> Server: 10.8.3.1 >> Address: 10.8.3.1#53 >> >> Name: everywhere.office.omniti.com >> Address: 10.8.3.241 >> >> # >> >> While travelling last week, I had to park myself on a 10.0.1.0/24 network. My VMs are configured to be link-sharers (same-link peers), so I was using NFS over the same link to my VM. Once I renumbered back to the intended OmniTI networks, things started going screwy. >> >> I'm going to remove the 10.0.1.68 one and try again... BINGO! >> >> # svcs -xv nlockmgr >> svc:/network/nfs/nlockmgr:default (NFS lock manager) >> State: maintenance since November 16, 2014 04:00:05 PM EST >> Reason: Start method failed repeatedly, last exited with status 1. >> See: http://illumos.org/msg/SMF-8000-KS >> See: man -M /usr/share/man -s 1M lockd >> See: /var/svc/log/network-nfs-nlockmgr:default.log >> Impact: 1 dependent service is not running: >> svc:/network/nfs/server:default >> # rm /var/statmon/sm.bak/ipv4.10.0.1.68 >> # svcadm clear nlockmgr >> # svcs -xv nlockmgr >> svc:/network/nfs/nlockmgr:default (NFS lock manager) >> State: online since November 17, 2014 10:08:53 AM EST >> See: man -M /usr/share/man -s 1M lockd >> See: /var/svc/log/network-nfs-nlockmgr:default.log >> Impact: None. >> # >> >> This is actually good, in that I believe the circumstances for this bug are now reproducible. >> >> Thanks! 
>> Dan >> >> > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From zmalone at omniti.com Tue Jan 20 12:40:38 2015 From: zmalone at omniti.com (Zach Malone) Date: Tue, 20 Jan 2015 07:40:38 -0500 Subject: [OmniOS-discuss] PERL modules as packages conflict In-Reply-To: <20150120071649.GG24621@gutsman.lotheac.fi> References: <20150120071649.GG24621@gutsman.lotheac.fi> Message-ID: The OmniTI perl was built out a different repo, although the functions and build scripts were similar. I'll follow up here and see if we can publish it publicly, I think it was just overlooked. --Zach On Tue, Jan 20, 2015 at 2:16 AM, Lauri Tirkkonen wrote: > On Tue, Jan 20 2015 00:25:43 -0500, Alex McWhirter wrote: >> I have perl and some modules all setup as packages, but theres a >> conflict. Each module wants to make changes to perllocal.pod and pkg >> detects this as a conflict. >> >> i would imagine this problem has been run into before with the >> OMNIperl package set, i would like to know how it was resolved? I >> couldn?t find build scripts for OMNIperl anywhere. > > I can't speak for omniti-perl, but the perl template in omnios-build > uses make_pure_install by default: > > https://github.com/omniti-labs/omnios-build/blob/omniti-ms/lib/functions.sh#L792 > > Failing that you could add a package transform to not ship perllocal.pod > with your packages in either local.mog or lib/global-transforms.mog. > Something like: > > drop> > > -- > Lauri Tirkkonen | lotheac @ IRCnet > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From stephan.budach at JVM.DE Tue Jan 20 13:15:02 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 20 Jan 2015 14:15:02 +0100 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? Message-ID: <54BE54D6.509@jvm.de> Hi guys, we just experienced a lock-up on one of our OmniOS r006 boxes in a way that we had to reset it to get it working again. This box is running on a SuperMicro storage server and it had been checked using smartctl by our check_mk client each 10 mins. Looking through the logs, I found these messages being repeatedly written to them? 
Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000cca22bc46337 (sd12): Dec 20 03:18:17 nfsvmpool01 Error for Command: Error Level: Recovered Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0 Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: PK1361 Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Sense Key: Soft_Error Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] ASC: 0x0 (), ASCQ: 0x1d, FRU: 0x0 Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000cca22bc4e51d (sd11): Dec 20 03:18:19 nfsvmpool01 Error for Command: Error Level: Recovered Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Requested Block: 0 Error Block: 0 Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: PK1361 Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Sense Key: Soft_Error Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] ASC: 0x0 (), ASCQ: 0x1d, FRU: 0x0 Dec 20 03:18:21 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk at g5000cca22bc512c5 (sd3): Dec 20 03:18:21 nfsvmpool01 Error for Command: Error Level: Recovered Could it be, that the use of smartctl somehow caused that lock-up? Thanks, budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Tue Jan 20 13:59:27 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 20 Jan 2015 14:59:27 +0100 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? In-Reply-To: <54BE54D6.509@jvm.de> References: <54BE54D6.509@jvm.de> Message-ID: <54BE5F3F.8010205@jvm.de> Am 20.01.15 um 14:15 schrieb Stephan Budach: > Hi guys, > > we just experienced a lock-up on one of our OmniOS r006 boxes in a way > that we had to reset it to get it working again. This box is running > on a SuperMicro storage server and it had been checked using smartctl > by our check_mk client each 10 mins. > > Looking through the logs, I found these messages being repeatedly > written to them? > > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000cca22bc46337 (sd12): > Dec 20 03:18:17 nfsvmpool01 Error for Command: 0x85> Error Level: Recovered > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Requested > Block: 0 Error Block: 0 > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Vendor: > ATA Serial Number: PK1361 > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Sense Key: > Soft_Error > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] ASC: 0x0 > (), ASCQ: 0x1d, FRU: 0x0 > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000cca22bc4e51d (sd11): > Dec 20 03:18:19 nfsvmpool01 Error for Command: 0x85> Error Level: Recovered > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Requested > Block: 0 Error Block: 0 > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Vendor: > ATA Serial Number: PK1361 > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Sense Key: > Soft_Error > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] ASC: 0x0 > (), ASCQ: 0x1d, FRU: 0x0 > Dec 20 03:18:21 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000cca22bc512c5 (sd3): > Dec 20 03:18:21 nfsvmpool01 Error for Command: 0x85> Error Level: Recovered > > Could it be, that the use of smartctl somehow caused that lock-up? 
> > Thanks, > budy Seems that this was the real issue: => this was smartctl: Jan 20 13:14:04 nfsvmpool01 scsi: [ID 107833 kern.notice] ASC: 0x3a (medium not present - tray closed), ASCQ: 0x1, FRU: 0x0 Jan 20 13:18:58 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: /pci at 0,0/pci8086,3c08 at 3/pci1000,3020 at 0 (mpt_sas1): Jan 20 13:18:58 nfsvmpool01 MPT Firmware Fault, code: 2651 Jan 20 13:19:00 nfsvmpool01 scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c08 at 3/pci1000,3020 at 0 (mpt_sas1): Jan 20 13:19:00 nfsvmpool01 mpt1 Firmware version v15.0.0.0 (?) Jan 20 13:19:00 nfsvmpool01 scsi: [ID 365881 kern.info] /pci at 0,0/pci8086,3c08 at 3/pci1000,3020 at 0 (mpt_sas1): Jan 20 13:19:00 nfsvmpool01 mpt1: IOC Operational. => System reset: Jan 20 13:30:45 nfsvmpool01 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version omnios-b281e50 64-bit Jan 20 13:30:45 nfsvmpool01 genunix: [ID 877030 kern.notice] Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved. Tried a bit on googling about that fault and came up with this one from the LSI SCS Engineering Release Notice: (SCGCQ00257616 - Port of SCGCQ00237417) HEADLINE: Controller may fault on bad response with incomplete write data transfer DESC OF CHANGE: When completing a write IO with incomplete data transfer with bad status, clean the IO from the transmit hardware to prevent it from accessing an invalid memory address while attempting to service the already-completed IO. TO REPRODUCE: Run heavy write IO against a very large topology of SAS drives. Repeatedly cause multiple drives to send response frames containing sense data for outstanding IOs before the initiator has finished transferring the write data for the IOs ISSUE DESC: f a SAS drive sends a response frame with response or sense data for a write command before the transfer length specified in the last XferReady frame is satisfied, an 0xD04 or 0x2651 fault may occur. The question is, why did the box lock up? It seems that only one of the LSI HBAs was affected and my zpools are entirey spread across two HBAs, except the cache logs: root at nfsvmpool01:/var/adm# zpool status sasTank pool: sasTank state: ONLINE scan: scrub repaired 0 in 0h8m with 0 errors on Wed Dec 24 09:21:40 2014 config: NAME STATE READ WRITE CKSUM sasTank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c2t5000CCA04106EAA5d0 ONLINE 0 0 0 c5t5000CCA04106EE41d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c3t5000CCA02A9BE9E1d0 ONLINE 0 0 0 c6t5000CCA02ADEE805d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c4t5000CCA04106EF21d0 ONLINE 0 0 0 c7t5000CCA04106C1F5d0 ONLINE 0 0 0 logs c1t5001517803D653E2d0p1 ONLINE 0 0 0 c1t5001517803D83760d0p1 ONLINE 0 0 0 cache c1t50015179596C5A85d0 ONLINE 0 0 0 errors: No known data errors root at nfsvmpool01:/var/adm# zpool status sataTank pool: sataTank state: ONLINE scan: scrub repaired 0 in 10h39m with 0 errors on Wed Dec 24 20:22:27 2014 config: NAME STATE READ WRITE CKSUM sataTank ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 c1t5000CCA22BC4E51Dd0 ONLINE 0 0 0 c1t5000CCA22BC512C5d0 ONLINE 0 0 0 mirror-1 ONLINE 0 0 0 c1t5000CCA22BC51BADd0 ONLINE 0 0 0 c1t5000CCA22BC46337d0 ONLINE 0 0 0 mirror-2 ONLINE 0 0 0 c1t5000CCA22BC51BB9d0 ONLINE 0 0 0 c1t5000CCA23DED646Fd0 ONLINE 0 0 0 logs c1t5001517803D653E2d0p2 ONLINE 0 0 0 c1t5001517803D83760d0p2 ONLINE 0 0 0 cache c1t5001517803D00E64d0 ONLINE 0 0 0 errors: No known data errors Cheers, budy -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From danmcd at omniti.com Tue Jan 20 15:40:24 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 20 Jan 2015 10:40:24 -0500 Subject: [OmniOS-discuss] Who here had lockd/nlockmgr problems? In-Reply-To: <54BE3F70.8080303@alcatel-lucent.com> References: <968CF721-8839-49E2-8F04-9FD912E78E68@omniti.com> <9A9651B5-B71B-44D3-90C1-BCF96B4ECCE8@omniti.com> <54BE3F70.8080303@alcatel-lucent.com> Message-ID: <305CC122-4584-4791-9D63-18C210FA3DC2@omniti.com> > On Jan 20, 2015, at 6:43 AM, Paul Jochum wrote: > > Hi Dan: > > Resurrecting an older thread here. > > Do you know if a fix was submitted for this problem, and if submitted, if/when will it be picked up in OmniOS? > > We have had this problem when trying to upgrade multiple machines to R151012 in our environment, and decided to stay at r151010 hoping it will get fixed soon. There have been some changes in nlockmgr, but not *specifically* for that problem. Given there's now a known workaround (deleting the files in statmon), I think the community hasn't been as gung-ho to fix it. Dan From danmcd at omniti.com Tue Jan 20 15:42:23 2015 From: danmcd at omniti.com (Dan McDonald) Date: Tue, 20 Jan 2015 10:42:23 -0500 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? In-Reply-To: <54BE5F3F.8010205@jvm.de> References: <54BE54D6.509@jvm.de> <54BE5F3F.8010205@jvm.de> Message-ID: Check the firmware revisions on both mpt_sas controllers. It's possible one need up-or-down grading. There are known good and known bad revisions of the mpt_sas firmware. Other on this list are more cognizant of what those revisions are. Dan From narayan.desai at gmail.com Tue Jan 20 16:08:29 2015 From: narayan.desai at gmail.com (Narayan Desai) Date: Tue, 20 Jan 2015 10:08:29 -0600 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? In-Reply-To: <54BE54D6.509@jvm.de> References: <54BE54D6.509@jvm.de> Message-ID: I'm pretty sure these are just the SAS controller containing about getting messages it couldn't decode. (in this case, encapsulated SMART command responses from the drives). AFAIK there is no way to suppress them. -nld On Tue, Jan 20, 2015 at 7:15 AM, Stephan Budach wrote: > Hi guys, > > we just experienced a lock-up on one of our OmniOS r006 boxes in a way > that we had to reset it to get it working again. This box is running on a > SuperMicro storage server and it had been checked using smartctl by our > check_mk client each 10 mins. > > Looking through the logs, I found these messages being repeatedly written > to them? 
> > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000cca22bc46337 (sd12): > Dec 20 03:18:17 nfsvmpool01 Error for Command: > Error Level: Recovered > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Requested > Block: 0 Error Block: 0 > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Vendor: > ATA Serial Number: PK1361 > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] Sense Key: > Soft_Error > Dec 20 03:18:17 nfsvmpool01 scsi: [ID 107833 kern.notice] ASC: 0x0 > (), ASCQ: 0x1d, FRU: 0x0 > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000cca22bc4e51d (sd11): > Dec 20 03:18:19 nfsvmpool01 Error for Command: > Error Level: Recovered > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Requested > Block: 0 Error Block: 0 > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Vendor: > ATA Serial Number: PK1361 > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] Sense Key: > Soft_Error > Dec 20 03:18:19 nfsvmpool01 scsi: [ID 107833 kern.notice] ASC: 0x0 > (), ASCQ: 0x1d, FRU: 0x0 > Dec 20 03:18:21 nfsvmpool01 scsi: [ID 107833 kern.warning] WARNING: > /scsi_vhci/disk at g5000cca22bc512c5 (sd3): > Dec 20 03:18:21 nfsvmpool01 Error for Command: > Error Level: Recovered > > Could it be, that the use of smartctl somehow caused that lock-up? > > Thanks, > budy > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zmalone at omniti.com Tue Jan 20 16:14:46 2015 From: zmalone at omniti.com (Zach Malone) Date: Tue, 20 Jan 2015 11:14:46 -0500 Subject: [OmniOS-discuss] PERL modules as packages conflict In-Reply-To: References: Message-ID: On Tue, Jan 20, 2015 at 12:25 AM, Alex McWhirter wrote: > I have perl and some modules all setup as packages, but theres a conflict. > Each module wants to make changes to perllocal.pod and pkg detects this as a > conflict. > > The following packages all deliver file actions to > opt/triadic/lib/i386/perl5/5.20.1/i386/perllocal.pod: > > > pkg://stable.pkg.triadic.us/triadic/perl-net-dns at 0.81,5.11-0.151006:20150108T224214Z > > pkg://stable.pkg.triadic.us/triadic/perl-html-parser at 3.71,5.11-0.151006:20150108T221603Z > > These packages may not be installed together. Any non-conflicting set may > be, or the packages must be corrected before they can be installed. > > i would imagine this problem has been run into before with the OMNIperl > package set, i would like to know how it was resolved? I couldn?t find build > scripts for OMNIperl anywhere. I just published https://github.com/omniti-labs/omnios-build-perl to github. Hopefully that helps you with having a framework for building perl packages. From lotheac at iki.fi Tue Jan 20 16:36:46 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Tue, 20 Jan 2015 18:36:46 +0200 Subject: [OmniOS-discuss] PERL modules as packages conflict In-Reply-To: References: Message-ID: <20150120163646.GH24621@gutsman.lotheac.fi> On Tue, Jan 20 2015 11:14:46 -0500, Zach Malone wrote: > I just published https://github.com/omniti-labs/omnios-build-perl to > github. Hopefully that helps you with having a framework for building > perl packages. Oh, nice! 
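For the conflict itself, the usual cure is simply not to deliver perllocal.pod at all, since every module build rewrites it. A minimal sketch, assuming the omnios-build convention of a per-package local.mog; the path regex is guessed from the error message above, not taken from any published build:

cat > build/perl-net-dns/local.mog <<'EOF'
# drop the file that every perl module build touches, so the packages stop conflicting
<transform file path=.*/perl5/.*/perllocal\.pod$ -> drop>
EOF

The same line would go into the local.mog of each module package that currently delivers the file.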
I only wish this had been available sooner -- I've made something similar, if a lot more rudimentary (and probably buggier): https://github.com/niksula/omnios-build-scripts/tree/master/perl-modules -- Lauri Tirkkonen | lotheac @ IRCnet From rt at steait.net Tue Jan 20 17:11:27 2015 From: rt at steait.net (Rune Tipsmark) Date: Tue, 20 Jan 2015 17:11:27 +0000 Subject: [OmniOS-discuss] VAAI Testing In-Reply-To: References: Message-ID: <1421773886357.33232@steait.net> I would be able to help test if its stable in my environment as well. I can't program though. br, Rune ________________________________ From: OmniOS-discuss on behalf of W Verb Sent: Tuesday, January 20, 2015 3:59 AM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] VAAI Testing Hi All, After seeing the recent message regarding ZFS, iSCSI, zvols and ESXi, I decided to follow up on where full VAAI support is. I found Dan's message from August: http://lists.omniti.com/pipermail/omnios-discuss/2014-August/002957.html Is anyone working on his points 1 and 2? Is anyone keeping track of the testing offers for #3? I do a fair amount of SQA, and am willing to organize and write tests if needed. I also have a reasonable lab environment with which to test the code. -Warren V -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Tue Jan 20 19:30:32 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 20 Jan 2015 20:30:32 +0100 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? In-Reply-To: References: <54BE54D6.509@jvm.de> <54BE5F3F.8010205@jvm.de> Message-ID: <54BEACD8.9000807@jvm.de> Am 20.01.15 um 16:42 schrieb Dan McDonald: > Check the firmware revisions on both mpt_sas controllers. It's possible one need up-or-down grading. > > There are known good and known bad revisions of the mpt_sas firmware. Other on this list are more cognizant of what those revisions are. > > Dan > Hi Dan, thanks - I do have all of my boxes equipped with the same hardware - LSI 2907-8i. I am downloading the MRM software from LSI and will install it on my backup host to check the firmware revisions of both HBAs on that box first. Cheers, budy From richard.elling at richardelling.com Tue Jan 20 19:38:18 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Tue, 20 Jan 2015 11:38:18 -0800 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? In-Reply-To: <54BEACD8.9000807@jvm.de> References: <54BE54D6.509@jvm.de> <54BE5F3F.8010205@jvm.de> <54BEACD8.9000807@jvm.de> Message-ID: <9561C326-099B-4883-A5C9-5C77EE00EA68@richardelling.com> > On Jan 20, 2015, at 11:30 AM, Stephan Budach wrote: > > Am 20.01.15 um 16:42 schrieb Dan McDonald: >> Check the firmware revisions on both mpt_sas controllers. It's possible one need up-or-down grading. >> >> There are known good and known bad revisions of the mpt_sas firmware. Other on this list are more cognizant of what those revisions are. >> >> Dan >> > Hi Dan, > > thanks - I do have all of my boxes equipped with the same hardware - LSI 2907-8i. I am downloading the MRM software from LSI and will install it on my backup host to check the firmware revisions of both HBAs on that box first. 
avoid P20 like the plague (google it for the blow-by-blow accounts of pain) -- richard > > Cheers, > budy > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From stephan.budach at JVM.DE Tue Jan 20 19:39:54 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 20 Jan 2015 20:39:54 +0100 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? In-Reply-To: <9561C326-099B-4883-A5C9-5C77EE00EA68@richardelling.com> References: <54BE54D6.509@jvm.de> <54BE5F3F.8010205@jvm.de> <54BEACD8.9000807@jvm.de> <9561C326-099B-4883-A5C9-5C77EE00EA68@richardelling.com> Message-ID: <54BEAF0A.2050601@jvm.de> Am 20.01.15 um 20:38 schrieb Richard Elling: >> On Jan 20, 2015, at 11:30 AM, Stephan Budach wrote: >> >> Am 20.01.15 um 16:42 schrieb Dan McDonald: >>> Check the firmware revisions on both mpt_sas controllers. It's possible one need up-or-down grading. >>> >>> There are known good and known bad revisions of the mpt_sas firmware. Other on this list are more cognizant of what those revisions are. >>> >>> Dan >>> >> Hi Dan, >> >> thanks - I do have all of my boxes equipped with the same hardware - LSI 2907-8i. I am downloading the MRM software from LSI and will install it on my backup host to check the firmware revisions of both HBAs on that box first. > avoid P20 like the plague (google it for the blow-by-blow accounts of pain) > -- richard > I will, thanks Richard! From mir at miras.org Tue Jan 20 20:19:16 2015 From: mir at miras.org (Michael Rasmussen) Date: Tue, 20 Jan 2015 21:19:16 +0100 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? In-Reply-To: <9561C326-099B-4883-A5C9-5C77EE00EA68@richardelling.com> References: <54BE54D6.509@jvm.de> <54BE5F3F.8010205@jvm.de> <54BEACD8.9000807@jvm.de> <9561C326-099B-4883-A5C9-5C77EE00EA68@richardelling.com> Message-ID: <20150120211916.45584977@sleipner.datanom.net> On Tue, 20 Jan 2015 11:38:18 -0800 Richard Elling wrote: > > avoid P20 like the plague (google it for the blow-by-blow accounts of pain) Several people on the list have downgraded from P20 to P18. P20 seems to be a pile of junk. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: It's more than magnificent-it's mediocre. -Samuel Goldwyn -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From stephan.budach at JVM.DE Tue Jan 20 20:28:46 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 20 Jan 2015 21:28:46 +0100 Subject: [OmniOS-discuss] OmniOS r06 locked up due to smartctl running? 
In-Reply-To: <20150120211916.45584977@sleipner.datanom.net> References: <54BE54D6.509@jvm.de> <54BE5F3F.8010205@jvm.de> <54BEACD8.9000807@jvm.de> <9561C326-099B-4883-A5C9-5C77EE00EA68@richardelling.com> <20150120211916.45584977@sleipner.datanom.net> Message-ID: <54BEBA7E.9040307@jvm.de> Am 20.01.15 um 21:19 schrieb Michael Rasmussen: > On Tue, 20 Jan 2015 11:38:18 -0800 > Richard Elling wrote: > >> avoid P20 like the plague (google it for the blow-by-blow accounts of pain) > Several people on the list have downgraded from P20 to P18. P20 seems > to be a pile of junk. > Afaik, my older ones are on p15 while my newer ones are on p17. It's the p15's that grumbled on my today, so I might get them all to p18, if this is, what most people in here are using with their 2907's. Thanks, budy From info at houseofancients.nl Tue Jan 20 21:21:52 2015 From: info at houseofancients.nl (Floris van Essen ..:: House of Ancients Amstafs ::..) Date: Tue, 20 Jan 2015 21:21:52 +0000 Subject: [OmniOS-discuss] VAAI Testing In-Reply-To: References: Message-ID: <356582D1FC91784992ABB4265A16ED487F498FB8@vEX01.mindstorm-internet.local> I would be more than willing to test too! Met vriendelijke groet / With kind regards, Floris van Essen Van: OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] Namens W Verb Verzonden: dinsdag 20 januari 2015 3:59 Aan: omnios-discuss at lists.omniti.com Onderwerp: [OmniOS-discuss] VAAI Testing Hi All, After seeing the recent message regarding ZFS, iSCSI, zvols and ESXi, I decided to follow up on where full VAAI support is. I found Dan?s message from August: http://lists.omniti.com/pipermail/omnios-discuss/2014-August/002957.html Is anyone working on his points 1 and 2? Is anyone keeping track of the testing offers for #3? I do a fair amount of SQA, and am willing to organize and write tests if needed. I also have a reasonable lab environment with which to test the code. -Warren V ...:: House of Ancients ::... American Staffordshire Terriers +31-628-161-350 +31-614-198-389 Het Perk 48 4903 RB Oosterhout Netherlands www.houseofancients.nl -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Tue Jan 20 22:19:40 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Tue, 20 Jan 2015 23:19:40 +0100 Subject: [OmniOS-discuss] How to use LSI's storcli? Message-ID: <54BED47C.8070800@jvm.de> Hi guys, I am struggling to get storcli to work - well, not actually to work, but to show me any of my 2907 HBAs, as storcli reports a number of 0 installed HBAs: root at nfsvmpool02:/opt/MegaRAID/CLI# ./storcli show Status Code = 0 Status = Success Description = None Number of Controllers = 0 Host Name = nfsvmpool02 Operating System = SunOS5.11 So, how do I make use of it? Thanks, budy From henson at acm.org Wed Jan 21 01:45:51 2015 From: henson at acm.org (Paul B. Henson) Date: Tue, 20 Jan 2015 17:45:51 -0800 Subject: [OmniOS-discuss] bad smf manifest hosed system :( Message-ID: <12da01d0351b$ff074500$fd15cf00$@acm.org> So I was working on updating openntpd in pkgsrc to the new portable release and adding an smf manifest for it. Thanks to Joyent sponsored work, pkgsrc supports smf and can automatically install a service for a package. So I went to install the package, and surprisingly the import of the manifest failed: svccfg_libscf.c:7750: fmri_to_entity() failed with unexpected error 1007. Aborting. 
/var/opt/pkg/db/pkg/openntpd-5.7p2/+INSTALL: line 1550: 8281 Abort (core dumped) /usr/sbin/svccfg import ${PKG_PREFIX}/lib/svc/manifest/openntpd.xml Much MUCH MUCH more surprisingly, it totally hosed my system 8-/. I was working remotely, and after the above error showed up, I lost connectivity. Fortunately, I have a remote serial console, and was able to get to the box, where I found: SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major EVENT-TIME: Fri Jan 16 17:54:09 PST 2015 PLATFORM: X9SRE-X9SRE-3F-X9SRi-X9SRi-3F, CSN: 0123456789, HOSTNAME: storage SOURCE: software-diagnosis, REV: 0.1 EVENT-ID: c4c93f8b-8c63-6195-c1de-8b864f1f6433 DESC: A service failed - a start, stop or refresh method failed. Refer to http://illumos.org/msg/SMF-8000-YX for more information. AUTO-RESPONSE: The service has been placed into the maintenance state. IMPACT: svc:/pkgsrc/openntpd:default is unavailable. REC-ACTION: Run 'svcs -xv svc:/pkgsrc/openntpd:default' to determine the generic reason wh y the service failed, the location of any logfiles, and a list of other services impacted. And after that, the following repeated over and over: Assertion failed: r == 0, file libscf.c, line 58 [ network/physical:default starting (physical network interfaces) ] [ network/ipfilter:simple starting (IP Filter) ] [ milestone/network:default starting (Network milestone) ] Assertion failed: r == 0, file libscf.c, line 58 [ network/physical:default starting (physical network interfaces) ] [ network/ipfilter:simple starting (IP Filter) ] [ milestone/network:default starting (Network milestone) ] Assertion failed: r == 0, file libscf.c, line 58 I was able to login to the maintenance console, and evidently my manifest was poorly formed or something, as the FMRI wasn't quite right: # svcs -x svc:/svc:/network/ntp:pkgsrc-openntpd has no "restarter" property group; ignoring. Unfortunately, it would not let me delete it: # svccfg delete svc:/svc:/network/ntp:pkgsrc-openntpd # svcs -a | grep ntp disabled Dec_20 svc:/network/ntp:default - - svc:/svc:/network/ntp:pkgsrc-openntpd # svccfg delete svc:/svc:/network/ntp:pkgsrc-openntpd # svccfg delete svc:/network/ntp:pkgsrc-openntpd # svcs -a | grep ntp disabled Dec_20 svc:/network/ntp:default - - svc:/svc:/network/ntp:pkgsrc-openntpd Finally, I had to resort to running /lib/svc/bin/restore_repository and restore the backup of the repository from the last boot . Fortunately, after doing that and rebooting the box seems happy again, albeit with an unexpected production outage in the middle of the day 8-/. I suppose maybe I should have tested the package on some other system first, but the last thing I expected was trying to import a manifest to cause a complete network outage :(. 
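For anyone else who ends up in the same hole, the recovery itself boils down to a couple of commands from the console maintenance shell. This is a sketch from memory rather than a transcript of what I typed, so double-check the prompts restore_repository gives you:

# see which automatic backups svc.startd has kept around
ls -l /etc/svc/repository-boot* /etc/svc/repository-manifest_import*
# interactive; answer "boot" to roll the repository back to its state at last boot
/lib/svc/bin/restore_repository
reboot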
I no longer have the manifest that caused the problem, as I was careful to delete all traces of it before rebooting the system to make sure it didn't somehow get introduced again, but I do have a crap load of cores: -rw------- 1 root root 6945929 Jan 20 15:58 core.svccfg.8281 -rw------- 1 root root 8804517 Jan 20 15:58 core.svc.startd.10 -rw------- 1 root root 9134877 Jan 20 15:58 core.svc.startd.8286 -rw------- 1 root root 9126589 Jan 20 15:58 core.svc.startd.8342 -rw------- 1 root root 9126589 Jan 20 16:01 core.svc.startd.8890 -rw------- 1 root root 9138973 Jan 20 16:01 core.svc.startd.8946 -rw------- 1 root root 9122493 Jan 20 16:01 core.svc.startd.9002 -rw------- 1 root root 9126589 Jan 20 16:02 core.svc.startd.9064 -rw------- 1 root root 9126589 Jan 20 16:02 core.svc.startd.9120 -rw------- 1 root root 9130717 Jan 20 16:02 core.svc.startd.9176 -rw------- 1 root root 9122525 Jan 20 16:02 core.svc.startd.9234 -rw------- 1 root root 9306813 Jan 20 16:03 core.svc.startd.9296 -rw------- 1 root root 9018857 Jan 20 16:03 core.svc.startd.9307 -rw------- 1 root root 9018857 Jan 20 16:03 core.svc.startd.9309 I could probably re-create the manifest I had that failed to import. This seems like a pretty nasty bug, if there were any syntax or other issues with the manifest svccfg should have failed cleanly and not corrupted the repository :(. From mir at miras.org Wed Jan 21 02:30:48 2015 From: mir at miras.org (Michael Rasmussen) Date: Wed, 21 Jan 2015 03:30:48 +0100 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150119174349.GD24621@gutsman.lotheac.fi> References: <20150118180025.0f2bdf36@sleipner.datanom.net> <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> <20150118234452.51cef490@sleipner.datanom.net> <20150119060436.GE21898@gutsman.lotheac.fi> <20150119174106.79bdedc8@sleipner.datanom.net> <20150119165619.GC24621@gutsman.lotheac.fi> <20150119182928.449a9aa7@sleipner.datanom.net> <20150119174349.GD24621@gutsman.lotheac.fi> Message-ID: <20150121033048.0166ab88@sleipner.datanom.net> Hi Lauri, While building python-ldap I discovered that your openldap package is a 32 bit build and I was wondering if you intend to make a 64 bit package? The reason for your 32 bit build (https://www.illumos.org/issues/4215) is in state resolved and has been since 2014-16-10 so I should mean this has been fixed in 151006. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Health nuts are going to feel stupid someday, lying in hospitals dying of nothing. -- Redd Foxx -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From lotheac at iki.fi Wed Jan 21 07:17:56 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Wed, 21 Jan 2015 09:17:56 +0200 Subject: [OmniOS-discuss] openldap-server install problems In-Reply-To: <20150121033048.0166ab88@sleipner.datanom.net> References: <1E349315-2273-448A-A1EB-4FACFE9464B2@omniti.com> <20150118204917.48733b07@sleipner.datanom.net> <20150118200027.GD21898@gutsman.lotheac.fi> <20150118234452.51cef490@sleipner.datanom.net> <20150119060436.GE21898@gutsman.lotheac.fi> <20150119174106.79bdedc8@sleipner.datanom.net> <20150119165619.GC24621@gutsman.lotheac.fi> <20150119182928.449a9aa7@sleipner.datanom.net> <20150119174349.GD24621@gutsman.lotheac.fi> <20150121033048.0166ab88@sleipner.datanom.net> Message-ID: <20150121071756.GA23628@gutsman.lotheac.fi> On Wed, Jan 21 2015 03:30:48 +0100, Michael Rasmussen wrote: > While building python-ldap I discovered that your openldap package is a > 32 bit build and I was wondering if you intend to make a 64 bit package? > > The reason for your 32 bit build (https://www.illumos.org/issues/4215) > is in state resolved and has been since 2014-16-10 so I should mean > this has been fixed in 151006. But it hasn't. The commit listed in that ticket is 9048537, and: gutsman /gutsman/ws/illumos-omnios % git branch -a --contains 9048537 * master r151012 remotes/origin/HEAD -> origin/master remotes/origin/master remotes/origin/r151008 remotes/origin/r151010 remotes/origin/r151012 remotes/origin/upstream You can see it's not in 151006, so that's why. -- Lauri Tirkkonen | lotheac @ IRCnet From johan.kragsterman at capvert.se Wed Jan 21 07:33:48 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Wed, 21 Jan 2015 08:33:48 +0100 Subject: [OmniOS-discuss] Ang: How to use LSI's storcli? In-Reply-To: <54BED47C.8070800@jvm.de> References: <54BED47C.8070800@jvm.de> Message-ID: Hi! -----"OmniOS-discuss" skrev: ----- Till: Fr?n: Stephan Budach S?nt av: "OmniOS-discuss" Datum: 2015-01-20 23:21 ?rende: [OmniOS-discuss] How to use LSI's storcli? Hi guys, I am struggling to get storcli to work - well, not actually to work, but to show me any of my 2907 HBAs, as storcli reports a number of 0 installed HBAs: root at nfsvmpool02:/opt/MegaRAID/CLI# ./storcli show Status Code = 0 Status = Success Description = None Number of Controllers = 0 Host Name = nfsvmpool02 Operating System ?= SunOS5.11 So, how do I make use of it? Isn't the StorCLI a tool for megaraid controllers? If your controller is an HBA, it is not a megaraid controller. By the way, did you mean 9207-8i, when you wrote 2907-8i? LSISAS9207-8i is an HBA, and it is NOT a MegaRaid controller, so I guess it shouldn't show up in the StorCLI. Why would you need the StorCLI? Rgrds Johan Thanks, budy _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From stephan.budach at JVM.DE Wed Jan 21 08:12:45 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 21 Jan 2015 09:12:45 +0100 Subject: [OmniOS-discuss] Ang: How to use LSI's storcli? In-Reply-To: References: <54BED47C.8070800@jvm.de> Message-ID: <54BF5F7D.3050006@jvm.de> Hi Johan, Am 21.01.15 um 08:33 schrieb Johan Kragsterman: > Hi! > > > -----"OmniOS-discuss" skrev: ----- > Till: > Fr?n: Stephan Budach > S?nt av: "OmniOS-discuss" > Datum: 2015-01-20 23:21 > ?rende: [OmniOS-discuss] How to use LSI's storcli? 
> > Hi guys, > > I am struggling to get storcli to work - well, not actually to work, but > to show me any of my 2907 HBAs, as storcli reports a number of 0 > installed HBAs: > > root at nfsvmpool02:/opt/MegaRAID/CLI# ./storcli show > Status Code = 0 > Status = Success > Description = None > > Number of Controllers = 0 > Host Name = nfsvmpool02 > Operating System = SunOS5.11 > > So, how do I make use of it? > > > > > > > Isn't the StorCLI a tool for megaraid controllers? Well, I thought that the MRM supported both the plain HBAs and the RAID ones. Interestingly, on my Linux boxe both, the storcli and the MegaCLI seem to be the same binary. Plus, when installing the MRM package with all it's bells and whistles, my *actually* 9207-8i do show up in the UI and can be managed. I just checked the readme from the MRM software package and it specifically notes support for the LSI SAS 9207-8i. > > If your controller is an HBA, it is not a megaraid controller. By the way, did you mean 9207-8i, when you wrote 2907-8i? LSISAS9207-8i is an HBA, and it is NOT a MegaRaid controller, so I guess it shouldn't show up in the StorCLI. > > > Why would you need the StorCLI? I wanted to check and probably update the FW on the 9207 from 15 to 18, as I seem to encounter an issue with a specific bug in the FW of those models. Thanks, budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Wed Jan 21 08:45:59 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Wed, 21 Jan 2015 09:45:59 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: How to use LSI's storcli? In-Reply-To: <54BF5F7D.3050006@jvm.de> References: <54BF5F7D.3050006@jvm.de>, <54BED47C.8070800@jvm.de> Message-ID: Hi again! -----Stephan Budach skrev: ----- Till: Johan Kragsterman Fr?n: Stephan Budach Datum: 2015-01-21 09:13 Kopia: ?rende: Re: Ang: [OmniOS-discuss] How to use LSI's storcli? Hi Johan, Am 21.01.15 um 08:33 schrieb Johan Kragsterman: Hi! -----"OmniOS-discuss" ?skrev: ----- Till: Fr?n: Stephan Budach S?nt av: "OmniOS-discuss" Datum: 2015-01-20 23:21 ?rende: [OmniOS-discuss] How to use LSI's storcli? Hi guys, I am struggling to get storcli to work - well, not actually to work, but to show me any of my 2907 HBAs, as storcli reports a number of 0 installed HBAs: root at nfsvmpool02:/opt/MegaRAID/CLI# ./storcli show Status Code = 0 Status = Success Description = None Number of Controllers = 0 Host Name = nfsvmpool02 Operating System ?= SunOS5.11 So, how do I make use of it? Isn't the StorCLI a tool for megaraid controllers? Well, I thought that the MRM supported both the plain HBAs and the RAID ones. Interestingly, on my Linux boxe both, the storcli and the MegaCLI seem to be the same binary. Plus, when installing the MRM package with all it's bells and whistles, my *actually* 9207-8i do show up in the UI and can be managed. I just checked the readme from the MRM software package and it specifically notes support for the LSI SAS 9207-8i. Well, then I supposer I was wrong! I was reading online that the StorCLI was meant for managing MegaRaid adpters, and for sure LSISAS9207-8i is not a MegaRaid adapter, it has not the MegaRaid bios on it, as well as no MegaRaid firmware. But if the readme says it is supported, then I guess it is! And I was wondering why you would need to "manage" an HBA, since it is not much to manage...a Raid adapter is something to manage, but an HBA...? 
But of coarse, you can consider that firmware is something to manage, though I NEVER flash firmware online, since too much can happen...so I wouldn't use StorCLI for that purpose, I would do it offline with a dos utility. By the way, I've been using firmware P19 without any trouble so far, though not tested under heavy load. Rgrds Johan If your controller is an HBA, it is not a megaraid controller. By the way, did you mean 9207-8i, when you wrote 2907-8i? LSISAS9207-8i is an HBA, and it is NOT a MegaRaid controller, so I guess it shouldn't show up in the StorCLI. Why would you need the StorCLI? I wanted to check and probably update the FW on the 9207 from 15 to 18, as I seem to encounter an issue with a specific bug in the FW of those models. Thanks, budy From stephan.budach at JVM.DE Wed Jan 21 08:54:34 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 21 Jan 2015 09:54:34 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: How to use LSI's storcli? In-Reply-To: References: <54BF5F7D.3050006@jvm.de>, <54BED47C.8070800@jvm.de> Message-ID: <54BF694A.4090809@jvm.de> Hi, so? I settled for the simpler sas2flash option? this works under OmniOS and basically provides what I need: checking/updating the FW of the HBA(s). Although the 9207-8i is listed in the readme of the MRM software - and it actually works, if you fire up that big clunky Java app, it is obviously unsupported by storcli or MegaCLI. The sas2flash utility perfectly fits my needs, so I will go with that. Now, the only thing to check is, which FW to use. I have already heard that P20 should be avoided and that most poeple are F18, but F18 is not available at LSI's site, just F19 and F20. Thanks, budy From phil.harman at gmail.com Wed Jan 21 08:53:14 2015 From: phil.harman at gmail.com (Phil Harman) Date: Wed, 21 Jan 2015 08:53:14 +0000 Subject: [OmniOS-discuss] Ang: Re: Ang: How to use LSI's storcli? In-Reply-To: References: <54BF5F7D.3050006@jvm.de> <, > <54BED47C.8070800@jvm.de> Message-ID: Stephan, You need to go hunting for the illusive sas2flash utility. While you are searching, track down lsiutil and sas2ircu, which can also be useful with LSI 9207 HBAs. Phil > On 21 Jan 2015, at 08:45, Johan Kragsterman wrote: > > > Hi again! > > > > -----Stephan Budach skrev: ----- > Till: Johan Kragsterman > Fr?n: Stephan Budach > Datum: 2015-01-21 09:13 > Kopia: > ?rende: Re: Ang: [OmniOS-discuss] How to use LSI's storcli? > > Hi Johan, > > Am 21.01.15 um 08:33 schrieb Johan Kragsterman: > Hi! > > > -----"OmniOS-discuss" skrev: ----- > Till: > Fr?n: Stephan Budach > S?nt av: "OmniOS-discuss" > Datum: 2015-01-20 23:21 > ?rende: [OmniOS-discuss] How to use LSI's storcli? > > Hi guys, > > I am struggling to get storcli to work - well, not actually to work, but > to show me any of my 2907 HBAs, as storcli reports a number of 0 > installed HBAs: > > root at nfsvmpool02:/opt/MegaRAID/CLI# ./storcli show > Status Code = 0 > Status = Success > Description = None > > Number of Controllers = 0 > Host Name = nfsvmpool02 > Operating System = SunOS5.11 > > So, how do I make use of it? > > > > Isn't the StorCLI a tool for megaraid controllers? > > Well, I thought that the MRM supported both the plain HBAs and the RAID ones. Interestingly, on my Linux boxe both, the storcli and the MegaCLI seem to be the same binary. Plus, when installing the MRM package with all it's bells and whistles, my *actually* 9207-8i do show up in the UI and can be managed. 
I just checked the readme from the MRM software package and it specifically notes support for the LSI SAS 9207-8i. > > > > > > > > > > Well, then I supposer I was wrong! > > I was reading online that the StorCLI was meant for managing MegaRaid adpters, and for sure LSISAS9207-8i is not a MegaRaid adapter, it has not the MegaRaid bios on it, as well as no MegaRaid firmware. > But if the readme says it is supported, then I guess it is! > > And I was wondering why you would need to "manage" an HBA, since it is not much to manage...a Raid adapter is something to manage, but an HBA...? But of coarse, you can consider that firmware is something to manage, though I NEVER flash firmware online, since too much can happen...so I wouldn't use StorCLI for that purpose, I would do it offline with a dos utility. By the way, I've been using firmware P19 without any trouble so far, though not tested under heavy load. > > > > Rgrds Johan > > > > > > > > > > > If your controller is an HBA, it is not a megaraid controller. By the way, did you mean 9207-8i, when you wrote 2907-8i? LSISAS9207-8i is an HBA, and it is NOT a MegaRaid controller, so I guess it shouldn't show up in the StorCLI. > > > Why would you need the StorCLI? > > I wanted to check and probably update the FW on the 9207 from 15 to 18, as I seem to encounter an issue with a specific bug in the FW of those models. > > Thanks, > budy > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss From johan.kragsterman at capvert.se Wed Jan 21 09:27:49 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Wed, 21 Jan 2015 10:27:49 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: Re: Ang: How to use LSI's storcli? In-Reply-To: <54BF694A.4090809@jvm.de> References: <54BF694A.4090809@jvm.de>, <54BF5F7D.3050006@jvm.de>, <54BED47C.8070800@jvm.de> Message-ID: An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Wed Jan 21 09:40:33 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 21 Jan 2015 10:40:33 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: How to use LSI's storcli? In-Reply-To: References: <54BF5F7D.3050006@jvm.de> <, > <54BED47C.8070800@jvm.de> Message-ID: <54BF7411.8020405@jvm.de> Hi Phil, Am 21.01.15 um 09:53 schrieb Phil Harman: > Stephan, > > You need to go hunting for the illusive sas2flash utility. While you are searching, track down lsiutil and sas2ircu, which can also be useful with LSI 9207 HBAs. > > Phil > yeah - got that by now, plus this whacky lsigetsolaris support script? ;) Thanks for pointing that out to me. Cheers, budy From chip at innovates.com Wed Jan 21 14:38:32 2015 From: chip at innovates.com (Schweiss, Chip) Date: Wed, 21 Jan 2015 08:38:32 -0600 Subject: [OmniOS-discuss] Ang: Re: Ang: How to use LSI's storcli? In-Reply-To: <54BF694A.4090809@jvm.de> References: <54BF5F7D.3050006@jvm.de> <54BED47C.8070800@jvm.de> <54BF694A.4090809@jvm.de> Message-ID: On Wed, Jan 21, 2015 at 2:54 AM, Stephan Budach wrote: > > The sas2flash utility perfectly fits my needs, so I will go with that. > Now, the only thing to check is, which FW to use. I have already heard that > P20 should be avoided and that most poeple are F18, but F18 is not > available at LSI's site, just F19 and F20. You definitely don't want P20. I haven't heard much about P19, but most are happy with P17 and P18. You can still get all the old versions on the LSI website. 
Use the download search: http://www.lsi.com/support/pages/download-search.aspx After selecting your adapter, click on the "Archived" list. -Chip > > Thanks, > budy > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Wed Jan 21 16:35:22 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Wed, 21 Jan 2015 17:35:22 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: How to use LSI's storcli? In-Reply-To: References: <54BF5F7D.3050006@jvm.de> <54BED47C.8070800@jvm.de> <54BF694A.4090809@jvm.de> Message-ID: <54BFD54A.4020004@jvm.de> Am 21.01.15 um 15:38 schrieb Schweiss, Chip: > On Wed, Jan 21, 2015 at 2:54 AM, Stephan Budach > wrote: > > > The sas2flash utility perfectly fits my needs, so I will go with > that. Now, the only thing to check is, which FW to use. I have > already heard that P20 should be avoided and that most poeple are > F18, but F18 is not available at LSI's site, just F19 and F20. > > > You definitely don't want P20. I haven't heard much about P19, but > most are happy with P17 and P18. > > You can still get all the old versions on the LSI website. Use the > download search: http://www.lsi.com/support/pages/download-search.aspx > After selecting your adapter, click on the "Archived" list. > > -Chip Ahh? got it - thanks! I missed that one. Now, I am waiting for LSI to take a look at why I got thet HBA lock-up? Thanks, budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From mir at miras.org Wed Jan 21 19:08:04 2015 From: mir at miras.org (Michael Rasmussen) Date: Wed, 21 Jan 2015 20:08:04 +0100 Subject: [OmniOS-discuss] omnios-build broken Message-ID: <20150121200804.7a805dcc@sleipner.datanom.net> Hi all, Did: 1) git clone https://github.com/omniti-labs/omnios-build.git 2) git checkout -b r151006 3) git branch --set-upstream-to=origin/r151006 r151006 4) git pull Response: Auto-merging lib/site.sh CONFLICT (content): Merge conflict in lib/site.sh Auto-merging lib/functions.sh CONFLICT (content): Merge conflict in lib/functions.sh Auto-merging lib/config.sh CONFLICT (content): Merge conflict in lib/config.sh Auto-merging build/xz/build.sh CONFLICT (content): Merge conflict in build/xz/build.sh Auto-merging build/release/root/etc/release CONFLICT (content): Merge conflict in build/release/root/etc/release Auto-merging build/release/name.p5m CONFLICT (content): Merge conflict in build/release/name.p5m Auto-merging build/python26/patches/series CONFLICT (content): Merge conflict in build/python26/patches/series CONFLICT (modify/delete): build/perl/perl-5142_module_sun-solaris.p5m deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/perl/perl-5142_module_sun-solaris.p5m left in tree. CONFLICT (modify/delete): build/perl/perl-5142_manual.p5m deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/perl/perl-5142_manual.p5m left in tree. CONFLICT (modify/delete): build/perl/perl-5142.p5m deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/perl/perl-5142.p5m left in tree. 
CONFLICT (modify/delete): build/perl/perl-5142-64.p5m deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/perl/perl-5142-64.p5m left in tree. Auto-merging build/pciutils/build.sh CONFLICT (content): Merge conflict in build/pciutils/build.sh Auto-merging build/pci.ids/pci.ids CONFLICT (content): Merge conflict in build/pci.ids/pci.ids CONFLICT (modify/delete): build/nspr/build.sh deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/nspr/build.sh left in tree. CONFLICT (modify/delete): build/mozilla-nss/files/SunOS5.11_i86pc.mk deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/mozilla-nss/files/SunOS5.11_i86pc.mk left in tree. CONFLICT (modify/delete): build/math/local.mog deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/math/local.mog left in tree. CONFLICT (modify/delete): build/math/build.sh deleted in HEAD and modified in 1977b2c316c798901b49ebf97fe7cd5601e6fdb9. Version 1977b2c316c798901b49ebf97fe7cd5601e6fdb9 of build/math/build.sh left in tree. Auto-merging build/libxml2/build.sh Auto-merging build/libtool/build.sh CONFLICT (content): Merge conflict in build/libtool/build.sh Auto-merging build/libffi/patches/unwind-instead-of-exceptions.patch CONFLICT (add/add): Merge conflict in build/libffi/patches/unwind-instead-of-exceptions.patch Auto-merging build/kayak/build.sh CONFLICT (content): Merge conflict in build/kayak/build.sh Auto-merging build/jeos/omnios-userland.p5m CONFLICT (content): Merge conflict in build/jeos/omnios-userland.p5m Auto-merging build/jeos/illumos-gate.p5m CONFLICT (content): Merge conflict in build/jeos/illumos-gate.p5m Auto-merging build/illumos/build.sh Auto-merging build/illumos-kvm/build.sh CONFLICT (content): Merge conflict in build/illumos-kvm/build.sh Auto-merging build/git/build.sh CONFLICT (content): Merge conflict in build/git/build.sh Auto-merging build/gcc48/build.sh CONFLICT (content): Merge conflict in build/gcc48/build.sh Auto-merging build/expat/patches/unwind.patch CONFLICT (add/add): Merge conflict in build/expat/patches/unwind.patch Auto-merging build/expat/build.sh Auto-merging build/entire/entire.p5m CONFLICT (content): Merge conflict in build/entire/entire.p5m Auto-merging build/curl/build.sh Auto-merging build/ca-bundle/build.sh CONFLICT (content): Merge conflict in build/ca-bundle/build.sh Auto-merging build/bind/build.sh CONFLICT (content): Merge conflict in build/bind/build.sh Auto-merging build/bash/patches/series CONFLICT (content): Merge conflict in build/bash/patches/series Auto-merging build/bash/build.sh CONFLICT (content): Merge conflict in build/bash/build.sh Automatic merge failed; fix conflicts and then commit the result. -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Remark of Dr. Baldwin's concerning upstarts: We don't care to eat toadstools that think they are truffles. 
-- Mark Twain, "Pudd'nhead Wilson's Calendar" -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From lotheac at iki.fi Wed Jan 21 19:15:44 2015 From: lotheac at iki.fi (Lauri Tirkkonen) Date: Wed, 21 Jan 2015 21:15:44 +0200 Subject: [OmniOS-discuss] omnios-build broken In-Reply-To: <20150121200804.7a805dcc@sleipner.datanom.net> References: <20150121200804.7a805dcc@sleipner.datanom.net> Message-ID: <20150121191543.GC23628@gutsman.lotheac.fi> On Wed, Jan 21 2015 20:08:04 +0100, Michael Rasmussen wrote: > 1) git clone https://github.com/omniti-labs/omnios-build.git > 2) git checkout -b r151006 You cloned the repository and created a new branch called r151006 from the currently checked out branch, which would be master. Try checkout without -b at this point instead. > 3) git branch --set-upstream-to=origin/r151006 r151006 > 4) git pull ... and then tried to merge origin/r151006 into it. -- Lauri Tirkkonen | lotheac @ IRCnet From mir at miras.org Wed Jan 21 19:26:31 2015 From: mir at miras.org (Michael Rasmussen) Date: Wed, 21 Jan 2015 20:26:31 +0100 Subject: [OmniOS-discuss] omnios-build broken In-Reply-To: <20150121191543.GC23628@gutsman.lotheac.fi> References: <20150121200804.7a805dcc@sleipner.datanom.net> <20150121191543.GC23628@gutsman.lotheac.fi> Message-ID: <20150121202631.26e0d6c5@sleipner.datanom.net> On Wed, 21 Jan 2015 21:15:44 +0200 Lauri Tirkkonen wrote: > On Wed, Jan 21 2015 20:08:04 +0100, Michael Rasmussen wrote: > > 1) git clone https://github.com/omniti-labs/omnios-build.git > > 2) git checkout -b r151006 > > You cloned the repository and created a new branch called r151006 from > the currently checked out branch, which would be master. Try checkout > without -b at this point instead. > > > 3) git branch --set-upstream-to=origin/r151006 r151006 > > 4) git pull > > ... and then tried to merge origin/r151006 into it. > Forget my noise. Long day. I missed: - git fetch origin - git checkout -b origin/r151006 - git checkout -b r151006 -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: If at first you don't succeed, you're doing about average. -- Leonard Levinson -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From contact at jacobvosmaer.nl Wed Jan 21 23:29:17 2015 From: contact at jacobvosmaer.nl (Jacob Vosmaer) Date: Thu, 22 Jan 2015 00:29:17 +0100 Subject: [OmniOS-discuss] Who here had lockd/nlockmgr problems? In-Reply-To: <305CC122-4584-4791-9D63-18C210FA3DC2@omniti.com> References: <968CF721-8839-49E2-8F04-9FD912E78E68@omniti.com> <9A9651B5-B71B-44D3-90C1-BCF96B4ECCE8@omniti.com> <54BE3F70.8080303@alcatel-lucent.com> <305CC122-4584-4791-9D63-18C210FA3DC2@omniti.com> Message-ID: Hi Paul, Dan I wrote in about this a while back. At the time I was optimistic about deleting files from statmon.bak. 
I have since realized the problem has not disappeared, and all I have now is a faster hammer to whack this problem with: $ cat fix-nfs #!/bin/sh sudo rm -rf /var/statmon/sm.bak/* sudo svcadm restart nfs/status sudo svcadm clear nfs/nlockmgr sudo svcadm restart nfs/ If an NFS client was there when the server shut down, but is gone when the NFS server comes up, I need to reach for this. What I have been reluctant to do until now is to set up something (an SMF?) that deletes the files in /var/statmon/sm.bak on boot. Cheers, Jacob 20 Jan 2015 16:44, "Dan McDonald" wrote: > > > On Jan 20, 2015, at 6:43 AM, Paul Jochum > wrote: > > > > Hi Dan: > > > > Resurrecting an older thread here. > > > > Do you know if a fix was submitted for this problem, and if > submitted, if/when will it be picked up in OmniOS? > > > > We have had this problem when trying to upgrade multiple machines to > R151012 in our environment, and decided to stay at r151010 hoping it will > get fixed soon. > > There have been some changes in nlockmgr, but not *specifically* for that > problem. Given there's now a known workaround (deleting the files in > statmon), I think the community hasn't been as gung-ho to fix it. > > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephan.budach at JVM.DE Thu Jan 22 08:48:16 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Thu, 22 Jan 2015 09:48:16 +0100 Subject: [OmniOS-discuss] Rolling FW upgrade on LSI 9207 Message-ID: <54C0B950.6070704@jvm.de> Hi guys, I do have two zpools of mirrored vdevs, where each vdev is spread over two LSI 9207-8i HBAs. Is it possible to perform a round-robin FW upgrade on the LSI HBAs, without rebooting the box? I'd like to keep the zpools up while I am performing the upgrade and I thought of breaking the mirrors, performing the FW upgrade on the first 9207, re-attaching the devices and doing the same again for the other half of the zpools vdevs. Has anyone done this before, or is a host reboot mandantory? Thanks, budy -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.kragsterman at capvert.se Thu Jan 22 09:39:28 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 22 Jan 2015 10:39:28 +0100 Subject: [OmniOS-discuss] Ang: Rolling FW upgrade on LSI 9207 In-Reply-To: <54C0B950.6070704@jvm.de> References: <54C0B950.6070704@jvm.de> Message-ID: Hi! -----"OmniOS-discuss" skrev: ----- Till: Fr?n: Stephan Budach S?nt av: "OmniOS-discuss" Datum: 2015-01-22 09:50 ?rende: [OmniOS-discuss] Rolling FW upgrade on LSI 9207 Hi guys, I do have two zpools of mirrored vdevs, where each vdev is spread over two LSI 9207-8i HBAs. Is it possible to perform a round-robin FW upgrade on the LSI HBAs, without rebooting the box? I'd like to keep the zpools up while I am performing the upgrade and I thought of breaking the mirrors, performing the FW upgrade on the first 9207, re-attaching the devices and doing the same again for the other half of the zpools vdevs. Has anyone done this before, or is a host reboot mandantory? Stephan! As I said before, I would NEVER put myself, and my infrastructure services, in the position where I couldn't find a maintanance window to take it down, to perform a firmware upgrade. 
IMHO, there are too much that can go wrong, and if it goes wrong....well, then you're in big trouble... If you couldn't even afford to take it down for a firmware upgrade, how would you handle a major problem with firmware or bricked HBA's? Bad things happens now and then, especially when you're not prepared for them... Regards Johan Thanks, budy _______________________________________________ OmniOS-discuss mailing list OmniOS-discuss at lists.omniti.com http://lists.omniti.com/mailman/listinfo/omnios-discuss From stephan.budach at JVM.DE Thu Jan 22 09:54:35 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Thu, 22 Jan 2015 10:54:35 +0100 Subject: [OmniOS-discuss] Ang: Rolling FW upgrade on LSI 9207 In-Reply-To: References: <54C0B950.6070704@jvm.de> Message-ID: <54C0C8DB.9000903@jvm.de> Hi Johan, Am 22.01.15 um 10:39 schrieb Johan Kragsterman: > Hi! > > > > -----"OmniOS-discuss" skrev: ----- > Till: > Fr?n: Stephan Budach > S?nt av: "OmniOS-discuss" > Datum: 2015-01-22 09:50 > ?rende: [OmniOS-discuss] Rolling FW upgrade on LSI 9207 > > Hi guys, > > I do have two zpools of mirrored vdevs, where each vdev is spread over two LSI 9207-8i HBAs. Is it possible to perform a round-robin FW upgrade on the LSI HBAs, without rebooting the box? I'd like to keep the zpools up while I am performing the upgrade and I thought of breaking the mirrors, performing the FW upgrade on the first 9207, re-attaching the devices and doing the same again for the other half of the zpools vdevs. > > Has anyone done this before, or is a host reboot mandantory? > > > > Stephan! > > As I said before, I would NEVER put myself, and my infrastructure services, in the position where I couldn't find a maintanance window to take it down, to perform a firmware upgrade. > > IMHO, there are too much that can go wrong, and if it goes wrong....well, then you're in big trouble... Yeah - I do know that, but it happens, that I don't have a choice at this time. Also, I didn't say, that I couldn't take down the whole server, but I'd rather avoid that. I also don't see any issues with my request, despite that I don't know, if a FW upgrade on the HBA, along with a subsequent reset of that same HBA will suffice, or if the mpt driver will go crazy on me for that. I am well experienced with ZFS to perform the rest without expecting any issues related to ZFS. I will certainly announce a maintenance window for this action, anyway. This NGS store hosts over 60 VMs. > > If you couldn't even afford to take it down for a firmware upgrade, how would you handle a major problem with firmware or bricked HBA's? Bad things happens now and then, especially when you're not prepared for them... Well? it depends on the level of preparation you are willing - are able - to put in. This host gets backed up via zfs send/recv each 4th hour, so it's not, that I am completely unprepared. However, this is a transision phase for us and we are planning for HA in the coming months. 
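Roughly, the 4-hourly job is nothing more than a recursive snapshot plus an incremental send. A simplified sketch - the pool, the target host and the state file are placeholders, not our actual setup:

#!/bin/sh
# incremental zfs replication, run from cron every 4 hours
POOL=sasTank                         # placeholder
TARGET=backuphost                    # placeholder
STATE=/var/run/${POOL}-last-repl     # remembers the previous snapshot name
NEW=repl-$(date +%Y%m%d%H%M)
PREV=$(cat "$STATE")

zfs snapshot -r "${POOL}@${NEW}"
if zfs send -R -i "@${PREV}" "${POOL}@${NEW}" | ssh "$TARGET" zfs recv -Fdu backup; then
        zfs destroy -r "${POOL}@${PREV}"
        echo "$NEW" > "$STATE"
fi
# old snapshots on the receiving side are pruned by a separate job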
Thanks, budy From johan.kragsterman at capvert.se Thu Jan 22 10:19:09 2015 From: johan.kragsterman at capvert.se (Johan Kragsterman) Date: Thu, 22 Jan 2015 11:19:09 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: Rolling FW upgrade on LSI 9207 In-Reply-To: <54C0C8DB.9000903@jvm.de> References: <54C0C8DB.9000903@jvm.de>, <54C0B950.6070704@jvm.de> Message-ID: -----Stephan Budach skrev: ----- Till: Johan Kragsterman Fr?n: Stephan Budach Datum: 2015-01-22 10:54 Kopia: ?rende: Re: Ang: [OmniOS-discuss] Rolling FW upgrade on LSI 9207 Hi Johan, Am 22.01.15 um 10:39 schrieb Johan Kragsterman: > Hi! > > > > -----"OmniOS-discuss" skrev: ----- > Till: > Fr?n: Stephan Budach > S?nt av: "OmniOS-discuss" > Datum: 2015-01-22 09:50 > ?rende: [OmniOS-discuss] Rolling FW upgrade on LSI 9207 > > Hi guys, > > I do have two zpools of mirrored vdevs, where each vdev is spread over two LSI 9207-8i HBAs. Is it possible to perform a round-robin FW upgrade on the LSI HBAs, without rebooting the box? I'd like to keep the zpools up while I am performing the upgrade and I thought of breaking the mirrors, performing the FW upgrade on the first 9207, re-attaching the devices and doing the same again for the other half of the zpools vdevs. > > Has anyone done this before, or is a host reboot mandantory? > > > > Stephan! > > As I said before, I would NEVER put myself, and my infrastructure services, in the position where I couldn't find a maintanance window to take it down, to perform a firmware upgrade. > > IMHO, there are too much that can go wrong, and if it goes wrong....well, then you're in big trouble... Yeah - I do know that, but it happens, that I don't have a choice at this time. Also, I didn't say, that I couldn't take down the whole server, but I'd rather avoid that. I also don't see any issues with my request, despite that I don't know, if a FW upgrade on the HBA, along with a subsequent reset of that same HBA will suffice, or if the mpt driver will go crazy on me for that. I am well experienced with ZFS to perform the rest without expecting any issues related to ZFS. I will certainly announce a maintenance window for this action, anyway. This NGS store hosts over 60 VMs. > > If you couldn't even afford to take it down for a firmware upgrade, how would you handle a major problem with firmware or bricked HBA's? Bad things happens now and then, especially when you're not prepared for them... Well… it depends on the level of preparation you are willing - are able - to put in. This host gets backed up via zfs send/recv each 4th hour, so it's not, that I am completely unprepared. However, this is a transision phase for us and we are planning for HA in the coming months. Thanks, budy Stephan! Well, I didn't consider data loss as the problem, but service down time, since you are reluctant to take it down, it is probably a problem for you with down time. And if down time for firmware upgrade a problem for you, I'm sure downtime for fixing a major firmware issue or a bricked HBA problem would be a lot worse... Regarding your planning for HA setup, it is most interesting for this list, since A LOT OF PEOLPLE have discussed it here, and major concerns have been expressed. It would be interesting to follow your project on the list, not only for me, but MANY on this list, I'm sure!!! So, would you be interested in sharing your HA project with the list? 
I don't think the list so far been involved in a project that you could follow from start to goal, so that would be very interesting, as well as to be able to follow your choice of infrastructure and sofware solutions! I also guess, that for you, to share this with the list, would give you a nice support, since you would involve a lot of people with a lot of experience... If you choose to share this with the list, pls start a new thread with perhaps: HA project as the subject. If you like, I can start the thread with asking you questions about your project? Regards Johan From stephan.budach at JVM.DE Thu Jan 22 10:46:10 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Thu, 22 Jan 2015 11:46:10 +0100 Subject: [OmniOS-discuss] Ang: Rolling FW upgrade on LSI 9207 In-Reply-To: <2502ce20310c4572aa96014a32d674bb@exchange02.cblinux.co.uk> References: <54C0B950.6070704@jvm.de> <54C0C8DB.9000903@jvm.de> <2502ce20310c4572aa96014a32d674bb@exchange02.cblinux.co.uk> Message-ID: <54C0D4F2.4010909@jvm.de> Hi Carl, Am 22.01.15 um 11:05 schrieb Carl Brunning: > HI > Yes I've done it using lsiutils > You do the firmware update in that and then do reset of the port (99) I will use the sas2flash utility on OmniOS, but it has also the capabilities to reset the HBA after upgrade. > > This if it a mirror pool will degrade the pool as it reset the port so no talking to the disk for a second then it all come back > Wait for the resilver to check everything > Then do the same for the other card Great , that's even more simple than I thought it to be, but of course a small resilver will not take long, since that action will be performed on the weekend, when the traffic is low and probaly not many txgs will happen between the reset of the HBA and the re-appearance of the drives. > > But please make sure you have done mirror if you not you will lose a vdev and that means pain lol Yeah - absolutely. I am sure, that I designed both zpools to be equally spread over the two HBAs, but I will make sure of it, nevertheless. > > Hope that helps you It does indeed! Thanks a ton. > > Thanks > Carl > > Cheers, budy From stephan.budach at JVM.DE Thu Jan 22 10:56:40 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Thu, 22 Jan 2015 11:56:40 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: Rolling FW upgrade on LSI 9207 In-Reply-To: References: <54C0C8DB.9000903@jvm.de>, <54C0B950.6070704@jvm.de> Message-ID: <54C0D768.4040405@jvm.de> Hi Johan, Am 22.01.15 um 11:19 schrieb Johan Kragsterman: > > > > > Stephan! > > Well, I didn't consider data loss as the problem, but service down time, since you are reluctant to take it down, it is probably a problem for you with down time. And if down time for firmware upgrade a problem for you, I'm sure downtime for fixing a major firmware issue or a bricked HBA problem would be a lot worse... > > > > Regarding your planning for HA setup, it is most interesting for this list, since A LOT OF PEOLPLE have discussed it here, and major concerns have been expressed. Yeah, I know. I know of a number of threads, that circle around HA setups and I yet don't even know, how I will do it. I am running an Oracle VM cluster and currently two different setups provide the shared storage for it. One of them is this OmniOS box, where the 9207 went nuts and caused the lock-up of the whole system. It will likely boil down to a NFS-HA setup, if that's possible, but the question will be, how the backend storage will be provided. 
> > It would be interesting to follow your project on the list, not only for me, but MANY on this list, I'm sure!!! > > So, would you be interested in sharing your HA project with the list? > > I don't think the list so far been involved in a project that you could follow from start to goal, so that would be very interesting, as well as to be able to follow your choice of infrastructure and sofware solutions! > > I also guess, that for you, to share this with the list, would give you a nice support, since you would involve a lot of people with a lot of experience... > > If you choose to share this with the list, pls start a new thread with perhaps: HA project as the subject. > > If you like, I can start the thread with asking you questions about your project? If anybody is interested in how my project is going, I will be happily provide any information, as long as it fits to OmniOS. OmniOS got me really hooked and I do like it a lot, but we do already have a RAC setup, which makes an excellent choice for a cluster, so it could also be, that the NFS shares might be served by those nodes. In the end you'll always have to pick the tool, which is best suited for the job. ;) Cheers, budy From stephan.budach at JVM.DE Thu Jan 22 11:00:29 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Thu, 22 Jan 2015 12:00:29 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: Rolling FW upgrade on LSI 9207 In-Reply-To: References: <54C0C8DB.9000903@jvm.de>, <54C0B950.6070704@jvm.de> Message-ID: <54C0D84D.3070901@jvm.de> Ahh? and I almost forgot. I will of course first try that procedure on the backup OmniOS box? ;) Should that fail, the next zfs send/recv will take care of that? Thanks for all your valiuable input, budy From nrhuff at umn.edu Thu Jan 22 20:11:37 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Thu, 22 Jan 2015 14:11:37 -0600 Subject: [OmniOS-discuss] getgrnam_r hangs if buffer too small Message-ID: <54C15979.8090302@umn.edu> I am running 151006 and we have some very large groups. If the buffer passed to getgrnam_r is too small to fit the group entry it seems to just hang. I think it is supposed to return NULL and set errno to ERANGE. If the buffer is big enough it returns the information fine. -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From danmcd at omniti.com Thu Jan 22 20:58:27 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 22 Jan 2015 15:58:27 -0500 Subject: [OmniOS-discuss] getgrnam_r hangs if buffer too small In-Reply-To: <54C15979.8090302@umn.edu> References: <54C15979.8090302@umn.edu> Message-ID: > On Jan 22, 2015, at 3:11 PM, Nathan Huff wrote: > > I am running 151006 and we have some very large groups. If the buffer passed to getgrnam_r is too small to fit the group entry it seems to just hang. I think it is supposed to return NULL and set errno to ERANGE. If the buffer is big enough it returns the information fine. When your process hangs (assuming it's easily reproducible) could you utter: pstack and share the stack with the list, please? And for bonus points, take a core dump of it as well: gcore I *suspect* this affects all OmniOS versions. The code in question is quite old, with last-changes predating illumos itself. Thanks! 
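While reproducing this, it is also worth comparing how large the offending group entry actually is with the buffer size the caller passes, since the hang only shows up once the entry no longer fits. A small sketch (the group name and program name are placeholders):

getent group somebiggroup | wc -c    # bytes the group entry actually needs
pstack $(pgrep -f a.out)             # stack of the hung test program, as requested above
gcore $(pgrep -f a.out)              # and a core dump of it as well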
Dan From nrhuff at umn.edu Thu Jan 22 22:06:36 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Thu, 22 Jan 2015 16:06:36 -0600 Subject: [OmniOS-discuss] getgrnam_r hangs if buffer too small In-Reply-To: References: <54C15979.8090302@umn.edu> Message-ID: <54C1746C.8020203@umn.edu> I should have also mentioned that this is using the samba winbind nss module. As I am looking at the code I think the problem is that when the buffer is too small the winbind nss module set errno to ERANGE and then returns NSS_TRYAGAIN. In Illumos in nss_commons.c there is a function retry_test that looks at the return value, but not the errno. This causes the nss_search function to loop endlessly since the buffer never gets resized. It looks like the nss modules in Illumos return UNAVAIL instead of TRYAGAIN for cases where the buffer isn't big enough. I will probably try patching the Samba sources and see if that fixes the issue. I couldn't find any documentation that would say which is correct in the general case. The only thing I could find was for glibc which wants TRYAGAIN in this case. I don't know if there is any use for it, but the pstack is below. 2971: ./a.out fef04ae5 nanosleep (8047c28, 8047c20) feef3244 sleep (5, 2d7, fee104c8, 8047cb8, fee84523, fefa0b28) + 31 fee9b385 nss_search (fef74520, fee83eb0, 4, 8047cb8, 0, 1) + 1a5 fee845c0 getgrnam_r (8050f8b, 8047d10, 80611b0, 400, 80611b0, 80611c0) + 9d 08050e89 main (1, 8047d60, 8047d68, 8050bf2, 8050f60, 0) + 59 08050c53 _start (1, 8047e28, 0, 8047e30, 8047e44, 8047e58) + 83 I have a core file, but I think I understand what is going on enough so it probably isn't necessary. On 2015-01-22 2:58 PM, Dan McDonald wrote: > >> On Jan 22, 2015, at 3:11 PM, Nathan Huff wrote: >> >> I am running 151006 and we have some very large groups. If the buffer passed to getgrnam_r is too small to fit the group entry it seems to just hang. I think it is supposed to return NULL and set errno to ERANGE. If the buffer is big enough it returns the information fine. > > When your process hangs (assuming it's easily reproducible) could you utter: > > pstack > > and share the stack with the list, please? > > And for bonus points, take a core dump of it as well: > > gcore > > I *suspect* this affects all OmniOS versions. The code in question is quite old, with last-changes predating illumos itself. > > Thanks! > Dan > -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From stephan.budach at JVM.DE Fri Jan 23 10:48:08 2015 From: stephan.budach at JVM.DE (Stephan Budach) Date: Fri, 23 Jan 2015 11:48:08 +0100 Subject: [OmniOS-discuss] Ang: Re: Ang: Rolling FW upgrade on LSI 9207 In-Reply-To: <54C0D84D.3070901@jvm.de> References: <54C0C8DB.9000903@jvm.de>, <54C0B950.6070704@jvm.de> <54C0D84D.3070901@jvm.de> Message-ID: <54C226E8.6080102@jvm.de> So just for everyone to know: that didn't work on my backup host. After downloading the fw update and resetting the HBA, the communication to the drives was lost and it couldn't be restored other than through a host reset. Seems that the mpt driver didn't like that? 
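For reference, the round-robin sequence discussed in this thread boils down to roughly the sketch below. The controller index, firmware image name and pool name are placeholders, the sas2flash option syntax differs between releases (check its help output first), and as the report above shows, some mpt_sas/firmware combinations may still need a full host reset before the drives reappear:

zpool status -x                     # make sure every pool is healthy before starting
sas2flash -listall                  # note the controller index of the HBA to update
sas2flash -c 0 -f 9207-8i_IT.bin    # flash controller 0 (hypothetical image name)
# the HBA resets, its disks drop for a moment, and ZFS resilvers the
# affected mirror halves once they come back -- wait for that to finish
zpool status tank | grep -iE 'resilver|state'
# only then repeat the same steps for the second HBA (-c 1)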
Cheers, budy From nrhuff at umn.edu Fri Jan 23 21:23:32 2015 From: nrhuff at umn.edu (Nathan Huff) Date: Fri, 23 Jan 2015 15:23:32 -0600 Subject: [OmniOS-discuss] getgrnam_r hangs if buffer too small In-Reply-To: <54C1746C.8020203@umn.edu> References: <54C15979.8090302@umn.edu> <54C1746C.8020203@umn.edu> Message-ID: <54C2BBD4.7000603@umn.edu> Patching the winbind nss module to return NSS_UNAVAIL in the buffer to small case fixed the issue. Turns out there was a year old open bug report about this. I submitted my patch so hopefully this can get fixed upstream as well. On 2015-01-22 4:06 PM, Nathan Huff wrote: > I should have also mentioned that this is using the samba winbind nss > module. As I am looking at the code I think the > problem is that when the buffer is too small the winbind nss module set > errno to ERANGE and then returns NSS_TRYAGAIN. > > In Illumos in nss_commons.c there is a function retry_test that looks at > the return value, but not the errno. This causes the nss_search > function to loop endlessly since the buffer never gets resized. It looks > like the nss modules in Illumos return UNAVAIL instead of TRYAGAIN for > cases where the buffer isn't big enough. I will probably try patching > the Samba sources and see if that fixes the issue. I couldn't find any > documentation that would say which is correct in the general case. The > only thing I could find was for glibc which wants TRYAGAIN in this case. > > I don't know if there is any use for it, but the pstack is below. > 2971: ./a.out > fef04ae5 nanosleep (8047c28, 8047c20) > feef3244 sleep (5, 2d7, fee104c8, 8047cb8, fee84523, fefa0b28) + 31 > fee9b385 nss_search (fef74520, fee83eb0, 4, 8047cb8, 0, 1) + 1a5 > fee845c0 getgrnam_r (8050f8b, 8047d10, 80611b0, 400, 80611b0, 80611c0) > + 9d > 08050e89 main (1, 8047d60, 8047d68, 8050bf2, 8050f60, 0) + 59 > 08050c53 _start (1, 8047e28, 0, 8047e30, 8047e44, 8047e58) + 83 > > I have a core file, but I think I understand what is going on enough so > it probably isn't necessary. > > On 2015-01-22 2:58 PM, Dan McDonald wrote: >> >>> On Jan 22, 2015, at 3:11 PM, Nathan Huff wrote: >>> >>> I am running 151006 and we have some very large groups. If the >>> buffer passed to getgrnam_r is too small to fit the group entry it >>> seems to just hang. I think it is supposed to return NULL and set >>> errno to ERANGE. If the buffer is big enough it returns the >>> information fine. >> >> When your process hangs (assuming it's easily reproducible) could you >> utter: >> >> pstack >> >> and share the stack with the list, please? >> >> And for bonus points, take a core dump of it as well: >> >> gcore >> >> I *suspect* this affects all OmniOS versions. The code in question is >> quite old, with last-changes predating illumos itself. >> >> Thanks! >> Dan >> > -- Nathan Huff System Administrator Academic Health Center Information Systems University of Minnesota 612-626-9136 From rt at steait.net Sat Jan 24 17:25:15 2015 From: rt at steait.net (Rune Tipsmark) Date: Sat, 24 Jan 2015 17:25:15 +0000 Subject: [OmniOS-discuss] iostat skip first output Message-ID: <1422120313964.61159@steait.net> hi all, I am just writing some scripts to gather performance data from iostat... or at least trying... I would like to completely skip the first output since boot from iostat output and just get right to the period I specified with the data current from that period. Is this possible at all? br, Rune -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rt at steait.net Sat Jan 24 19:42:14 2015 From: rt at steait.net (Rune Tipsmark) Date: Sat, 24 Jan 2015 19:42:14 +0000 Subject: [OmniOS-discuss] iostat skip first output In-Reply-To: <1422120313964.61159@steait.net> References: <1422120313964.61159@steait.net> Message-ID: <1422128532172.63117@steait.net> nevermind, I just made it into tokens and counted my way though it... maybe not the best way but it works... root at zfs10:/usr/lib/check_mk_agent/local# cat disk_iostat.sh varInterval=5 varOutput=$(iostat -xn $varInterval 2 | grep c[0-99]); tokens=( $varOutput ) tokenCount=$(echo ${tokens[*]} | wc -w ) tokenStart=$(((tokenCount/2)-1)) tokenInterval=11 tokenEnd=$((tokenCount)) for i in $(eval echo {$tokenStart..$tokenEnd..$tokenInterval}); do echo 0 disk_busy_${tokens[$i]} percent=${tokens[$i-1]} ${tokens[$i-1]} % average disk utilization last $varInterval seconds; echo 0 disk_latency_${tokens[$i]} ms=${tokens[$i-3]} ${tokens[$i-3]} ms response time average last $varInterval seconds; done ________________________________ From: OmniOS-discuss on behalf of Rune Tipsmark Sent: Saturday, January 24, 2015 6:25 PM To: omnios-discuss at lists.omniti.com Subject: [OmniOS-discuss] iostat skip first output hi all, I am just writing some scripts to gather performance data from iostat... or at least trying... I would like to completely skip the first output since boot from iostat output and just get right to the period I specified with the data current from that period. Is this possible at all? br, Rune -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Sun Jan 25 00:02:57 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Sat, 24 Jan 2015 16:02:57 -0800 Subject: [OmniOS-discuss] iostat skip first output In-Reply-To: <1422120313964.61159@steait.net> References: <1422120313964.61159@steait.net> Message-ID: > On Jan 24, 2015, at 9:25 AM, Rune Tipsmark wrote: > > hi all, I am just writing some scripts to gather performance data from iostat... or at least trying... I would like to completely skip the first output since boot from iostat output and just get right to the period I specified with the data current from that period. Is this possible at all? > iostat -xn 10 2 | awk '$1 == "extended" && NR > 2 {show=1} show == 1' NB, this is just a derivative of a sample period. A better approach is to store long-term trends in a database intended for such use. If that is too much work, then you should consider storing the raw data that iostat uses for this: kstat -p 'sd::/sd[0-9]+$/' or in JSON: kstat -jp 'sd::/sd[0-9]+$/' insert shameless plug for Circonus here :-) -- richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From rt at steait.net Sun Jan 25 02:59:00 2015 From: rt at steait.net (Rune Tipsmark) Date: Sun, 25 Jan 2015 02:59:00 +0000 Subject: [OmniOS-discuss] iostat skip first output In-Reply-To: References: <1422120313964.61159@steait.net>, Message-ID: <1422154738752.45599@steait.net> hi Richard, thanks for that input, will see what I can do with it. I do store data and graph it so I can keep track of things :) br, Rune ________________________________ From: Richard Elling Sent: Sunday, January 25, 2015 1:02 AM To: Rune Tipsmark Cc: omnios-discuss at lists.omniti.com Subject: Re: [OmniOS-discuss] iostat skip first output On Jan 24, 2015, at 9:25 AM, Rune Tipsmark > wrote: hi all, I am just writing some scripts to gather performance data from iostat... 
or at least trying... I would like to completely skip the first output since boot from iostat output and just get right to the period I specified with the data current from that period. Is this possible at all? iostat -xn 10 2 | awk '$1 == "extended" && NR > 2 {show=1} show == 1' NB, this is just a derivative of a sample period. A better approach is to store long-term trends in a database intended for such use. If that is too much work, then you should consider storing the raw data that iostat uses for this: kstat -p 'sd::/sd[0-9]+$/' or in JSON: kstat -jp 'sd::/sd[0-9]+$/' insert shameless plug for Circonus here :-) -- richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From mir at miras.org Mon Jan 26 23:02:38 2015 From: mir at miras.org (Michael Rasmussen) Date: Tue, 27 Jan 2015 00:02:38 +0100 Subject: [OmniOS-discuss] package build help Message-ID: <20150127000238.24213969@sleipner.datanom.net> Hi all, I am trying to build a package which links to libpg.so.5 (postgresql) installed in /opt/pgsql/lib To be able to use the package the users LD_LIBRARY_PATH needs to contain /opt/pgsql/lib. How do you handle this when installing a package? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: All truths are true to an extend, including this one. -XA -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From doug at will.to Mon Jan 26 23:14:12 2015 From: doug at will.to (Doug Hughes) Date: Mon, 26 Jan 2015 18:14:12 -0500 Subject: [OmniOS-discuss] package build help In-Reply-To: <20150127000238.24213969@sleipner.datanom.net> References: <20150127000238.24213969@sleipner.datanom.net> Message-ID: <54C6CA44.40209@will.to> Agh! ick. please don't use LD_LIBRARY_PATH. If at all possible, rebuild your binary with -R /opt/pgsql/lib passed to the linker (when combined with NFS, LD_LIBRARY_PATH can lead to all sorts of evil behaviors, when combined with NFS and 64/32 bit, multiply by 2!) On 1/26/2015 6:02 PM, Michael Rasmussen wrote: > Hi all, > > I am trying to build a package which links to libpg.so.5 (postgresql) > installed in /opt/pgsql/lib > > To be able to use the package the users LD_LIBRARY_PATH needs to > contain /opt/pgsql/lib. > > How do you handle this when installing a package? > > > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mir at miras.org Mon Jan 26 23:22:49 2015 From: mir at miras.org (Michael Rasmussen) Date: Tue, 27 Jan 2015 00:22:49 +0100 Subject: [OmniOS-discuss] package build help In-Reply-To: <54C6CA44.40209@will.to> References: <20150127000238.24213969@sleipner.datanom.net> <54C6CA44.40209@will.to> Message-ID: <20150127002249.238fce6d@sleipner.datanom.net> On Mon, 26 Jan 2015 18:14:12 -0500 Doug Hughes wrote: > Agh! ick. please don't use LD_LIBRARY_PATH. 
If at all possible, rebuild your binary with -R /opt/pgsql/lib passed to the linker > build.sh already contains: LDFLAGS64="$LDFLAGS64 -L/opt/pgsqL/lib/amd64 -R/opt/pgsql/lib/amd64" But does not help. Package I am build is a python package. Is it possible to run crle -l from a package? -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: A fool must now and then be right by chance. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From mir at miras.org Mon Jan 26 23:57:26 2015 From: mir at miras.org (Michael Rasmussen) Date: Tue, 27 Jan 2015 00:57:26 +0100 Subject: [OmniOS-discuss] package build help In-Reply-To: <20150127002249.238fce6d@sleipner.datanom.net> References: <20150127000238.24213969@sleipner.datanom.net> <54C6CA44.40209@will.to> <20150127002249.238fce6d@sleipner.datanom.net> Message-ID: <20150127005726.6a1b7855@sleipner.datanom.net> On Tue, 27 Jan 2015 00:22:49 +0100 Michael Rasmussen wrote: > On Mon, 26 Jan 2015 18:14:12 -0500 > Doug Hughes wrote: > > > Agh! ick. please don't use LD_LIBRARY_PATH. If at all possible, rebuild your binary with -R /opt/pgsql/lib passed to the linker > > > build.sh already contains: > LDFLAGS64="$LDFLAGS64 -L/opt/pgsqL/lib/amd64 -R/opt/pgsql/lib/amd64" > But does not help. Package I am build is a python package. > setup.py constructs this: gcc -m64 -shared -Wl,-Bsymbolic build/temp.solaris-2.11-i86pc-2.6/psycopg/psycopgmodule.o build/temp.solaris-2.11-i86pc-2.6/psycopg/pqpath.o build/temp.solaris-2.11-i86pc-2.6/psycopg/typecast.o build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols.o build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols_proto.o build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_type.o build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_int.o build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_type.o build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_int.o build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_type.o build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_int.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_qstring.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pboolean.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_binary.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_asis.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_list.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_datetime.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pfloat.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pdecimal.o build/temp.solaris-2.11-i86pc-2.6/psycopg/utils.o -L/usr/lib/amd64 -L/opt/pgsql/lib/amd64 -lpython2.6 -lpq -o build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so [mir at pkg:psycopg2-2.0.14]$ ldd build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so libpython2.6.so.1.0 => /usr/lib/64/libpython2.6.so.1.0 libpq.so.5 => (file not found) libgcc_s.so.1 => /usr/lib/64/libgcc_s.so.1 libc.so.1 => /lib/64/libc.so.1 libsocket.so.1 => /lib/64/libsocket.so.1 libnsl.so.1 => /lib/64/libnsl.so.1 libm.so.2 => /lib/64/libm.so.2 libmp.so.2 => /lib/64/libmp.so.2 libmd.so.1 => /lib/64/libmd.so.1 if I change the above to contain -R/opt/pgsql/lib/amd64 
it works: gcc -m64 -shared -Wl,-Bsymbolic build/temp.solaris-2.11-i86pc-2.6/psycopg/psycopgmodule.o build/temp.solaris-2.11-i86pc-2.6/psycopg/pqpath.o build/temp.solaris-2.11-i86pc-2.6/psycopg/typecast.o build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols.o build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols_proto.o build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_type.o build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_int.o build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_type.o build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_int.o build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_type.o build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_int.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_qstring.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pboolean.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_binary.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_asis.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_list.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_datetime.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pfloat.o build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pdecimal.o build/temp.solaris-2.11-i86pc-2.6/psycopg/utils.o -L/usr/lib/amd64 -L/opt/pgsql/lib/amd64 -R/opt/pgsql/lib/amd64 -lpython2.6 -lpq -o build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so [mir at pkg:psycopg2-2.0.14]$ ldd build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so libpython2.6.so.1.0 => /usr/lib/64/libpython2.6.so.1.0 libpq.so.5 => /opt/pgsql/lib/amd64/libpq.so.5 libgcc_s.so.1 => /usr/lib/64/libgcc_s.so.1 libc.so.1 => /lib/64/libc.so.1 libsocket.so.1 => /lib/64/libsocket.so.1 libnsl.so.1 => /lib/64/libnsl.so.1 libm.so.2 => /lib/64/libm.so.2 libssl.so.1.0.0 => /lib/64/libssl.so.1.0.0 libcrypto.so.1.0.0 => /lib/64/libcrypto.so.1.0.0 libpthread.so.1 => /lib/64/libpthread.so.1 libmp.so.2 => /lib/64/libmp.so.2 libmd.so.1 => /lib/64/libmd.so.1 libz.so => /usr/lib/64/libz.so -- Hilsen/Regards Michael Rasmussen Get my public GnuPG keys: michael rasmussen cc http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E mir datanom net http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C mir miras org http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917 -------------------------------------------------------------- /usr/games/fortune -es says: Two brothers, Mort and Bill, like to sail. While Bill has a great deal of experience, he certainly isn't the rigger Mort is. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 181 bytes Desc: OpenPGP digital signature URL: From wverb73 at gmail.com Mon Jan 26 23:57:51 2015 From: wverb73 at gmail.com (W Verb) Date: Mon, 26 Jan 2015 15:57:51 -0800 Subject: [OmniOS-discuss] VAAI Testing In-Reply-To: <356582D1FC91784992ABB4265A16ED487F498FB8@vEX01.mindstorm-internet.local> References: <356582D1FC91784992ABB4265A16ED487F498FB8@vEX01.mindstorm-internet.local> Message-ID: Hello All, Thank you for your kind offers to help test this. I chatted briefly with Dan M. a few days ago, and I have offered to root through the illumos-nexenta repository to isolate and commit the changes he mentioned in illumos-gate. That should take care of points 1 and 2. I don't have a timeline for this, but I hope to scope the work in the next few days. As for testing, I've begun looking at the old STF suite, which may or may not be the right toolset to use for this. If anyone has other suggestions, please respond to this post. 
http://blog.delphix.com/jkennedy/2012/01/18/resurrecting-the-zfs-test-suite/ I'll post again once I have made some progress. -Warren V On Tue, Jan 20, 2015 at 1:21 PM, Floris van Essen ..:: House of Ancients Amstafs ::.. wrote: > I would be more than willing to test too! > > > > *Met vriendelijke groet / With kind regards,* > > > > * Floris van Essen * > > > > *Van:* OmniOS-discuss [mailto:omnios-discuss-bounces at lists.omniti.com] *Namens > *W Verb > *Verzonden:* dinsdag 20 januari 2015 3:59 > *Aan:* omnios-discuss at lists.omniti.com > *Onderwerp:* [OmniOS-discuss] VAAI Testing > > > > Hi All, > > > > After seeing the recent message regarding ZFS, iSCSI, zvols and ESXi, I > decided to follow up on where full VAAI support is. > > > > I found Dan?s message from August: > http://lists.omniti.com/pipermail/omnios-discuss/2014-August/002957.html > > > > Is anyone working on his points 1 and 2? > > > > Is anyone keeping track of the testing offers for #3? > > > > I do a fair amount of SQA, and am willing to organize and write tests if > needed. I also have a reasonable lab environment with which to test the > code. > > > > -Warren V > > ...:: House of Ancients ::... > American Staffordshire Terriers > > +31-628-161-350 > +31-614-198-389 > Het Perk 48 > 4903 RB > Oosterhout > Netherlands > www.houseofancients.nl > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at will.to Tue Jan 27 00:15:14 2015 From: doug at will.to (Doug Hughes) Date: Mon, 26 Jan 2015 19:15:14 -0500 Subject: [OmniOS-discuss] package build help In-Reply-To: <20150127005726.6a1b7855@sleipner.datanom.net> References: <20150127000238.24213969@sleipner.datanom.net> <54C6CA44.40209@will.to> <20150127002249.238fce6d@sleipner.datanom.net> <20150127005726.6a1b7855@sleipner.datanom.net> Message-ID: <54C6D892.3050105@will.to> On 1/26/2015 6:57 PM, Michael Rasmussen wrote: > On Tue, 27 Jan 2015 00:22:49 +0100 > Michael Rasmussen wrote: > >> On Mon, 26 Jan 2015 18:14:12 -0500 >> Doug Hughes wrote: >> >>> Agh! ick. please don't use LD_LIBRARY_PATH. If at all possible, rebuild your binary with -R /opt/pgsql/lib passed to the linker >>> >> build.sh already contains: >> LDFLAGS64="$LDFLAGS64 -L/opt/pgsqL/lib/amd64 -R/opt/pgsql/lib/amd64" >> But does not help. Package I am build is a python package. 
>> > setup.py constructs this: > gcc -m64 -shared -Wl,-Bsymbolic > build/temp.solaris-2.11-i86pc-2.6/psycopg/psycopgmodule.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/pqpath.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/typecast.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols_proto.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_type.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_int.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_type.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_int.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_type.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_int.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_qstring.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pboolean.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_binary.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_asis.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_list.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_datetime.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pfloat.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pdecimal.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/utils.o -L/usr/lib/amd64 > -L/opt/pgsql/lib/amd64 -lpython2.6 -lpq -o > build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so > [mir at pkg:psycopg2-2.0.14]$ ldd > build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so > libpython2.6.so.1.0 => /usr/lib/64/libpython2.6.so.1.0 > libpq.so.5 => (file not found) libgcc_s.so.1 > => /usr/lib/64/libgcc_s.so.1 libc.so.1 > => /lib/64/libc.so.1 libsocket.so.1 > => /lib/64/libsocket.so.1 libnsl.so.1 > => /lib/64/libnsl.so.1 libm.so.2 => /lib/64/libm.so.2 > libmp.so.2 => /lib/64/libmp.so.2 libmd.so.1 > => /lib/64/libmd.so.1 > > if I change the above to contain -R/opt/pgsql/lib/amd64 it works: > gcc -m64 -shared -Wl,-Bsymbolic > build/temp.solaris-2.11-i86pc-2.6/psycopg/psycopgmodule.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/pqpath.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/typecast.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/microprotocols_proto.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_type.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/connection_int.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_type.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/cursor_int.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_type.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/lobject_int.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_qstring.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pboolean.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_binary.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_asis.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_list.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_datetime.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pfloat.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/adapter_pdecimal.o > build/temp.solaris-2.11-i86pc-2.6/psycopg/utils.o -L/usr/lib/amd64 > -L/opt/pgsql/lib/amd64 -R/opt/pgsql/lib/amd64 -lpython2.6 -lpq -o > build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so > [mir at pkg:psycopg2-2.0.14]$ ldd > build/lib.solaris-2.11-i86pc-2.6/psycopg2/64/_psycopg.so > libpython2.6.so.1.0 => /usr/lib/64/libpython2.6.so.1.0 > libpq.so.5 => /opt/pgsql/lib/amd64/libpq.so.5 libgcc_s.so.1 > => /usr/lib/64/libgcc_s.so.1 libc.so.1 > => /lib/64/libc.so.1 
libsocket.so.1 > => /lib/64/libsocket.so.1 libnsl.so.1 > => /lib/64/libnsl.so.1 libm.so.2 => /lib/64/libm.so.2 > libssl.so.1.0.0 => /lib/64/libssl.so.1.0.0 libcrypto.so.1.0.0 > => /lib/64/libcrypto.so.1.0.0 libpthread.so.1 > => /lib/64/libpthread.so.1 libmp.so.2 > => /lib/64/libmp.so.2 libmd.so.1 => /lib/64/libmd.so.1 > libz.so => /usr/lib/64/libz.so ldd -s should show you the search path that your built object is actually using From wverb73 at gmail.com Tue Jan 27 01:16:50 2015 From: wverb73 at gmail.com (W Verb) Date: Mon, 26 Jan 2015 17:16:50 -0800 Subject: [OmniOS-discuss] Mildly confusing ZFS iostat output Message-ID: Hello All, I am mildly confused by something iostat does when displaying statistics for a zpool. Before I begin rooting through the iostat source, does anyone have an idea of why I am seeing high "wait" and "wsvc_t" values for "ppool" when my devices apparently are not busy? I would have assumed that the stats for the pool would be the sum of the stats for the zdevs.... extended device statistics r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device 10.0 9183.0 40.5 344942.0 0.0 1.8 0.0 0.2 0 178 c4 1.0 187.0 4.0 19684.0 0.0 0.1 0.0 0.5 0 8 c4t5000C5006A597B93d0 2.0 199.0 12.0 20908.0 0.0 0.1 0.0 0.6 0 12 c4t5000C500653DE049d0 2.0 197.0 8.0 20788.0 0.0 0.2 0.0 0.8 0 15 c4t5000C5003607D87Bd0 0.0 202.0 0.0 20908.0 0.0 0.1 0.0 0.6 0 11 c4t5000C5006A5903A2d0 0.0 189.0 0.0 19684.0 0.0 0.1 0.0 0.5 0 10 c4t5000C500653DEE58d0 5.0 957.0 16.5 1966.5 0.0 0.1 0.0 0.1 0 7 c4t50026B723A07AC78d0 0.0 201.0 0.0 20787.9 0.0 0.1 0.0 0.7 0 14 c4t5000C5003604ED37d0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t5000C500653E447Ad0 0.0 3525.0 0.0 110107.7 0.0 0.5 0.0 0.2 0 51 c4t500253887000690Dd0 0.0 3526.0 0.0 110107.7 0.0 0.5 0.0 0.1 1 50 c4t5002538870006917d0 10.0 6046.0 40.5 344941.5 837.4 1.9 138.3 0.3 23 67 ppool For those following the VAAI thread, this is the system I will be using as my testbed. 
Here is the structure of ppool (taken at a different time than above): root at sanbox:/root# zpool iostat -v ppool capacity operations bandwidth pool alloc free read write read write ------------------------- ----- ----- ----- ----- ----- ----- ppool 191G 7.97T 23 637 140K 15.0M mirror 63.5G 2.66T 7 133 46.3K 840K c4t5000C5006A597B93d0 - - 1 13 24.3K 844K c4t5000C500653DEE58d0 - - 1 13 24.1K 844K mirror 63.6G 2.66T 7 133 46.5K 839K c4t5000C5006A5903A2d0 - - 1 13 24.0K 844K c4t5000C500653DE049d0 - - 1 13 24.6K 844K mirror 63.5G 2.66T 7 133 46.8K 839K c4t5000C5003607D87Bd0 - - 1 13 24.5K 843K c4t5000C5003604ED37d0 - - 1 13 24.4K 843K logs - - - - - - mirror 301M 222G 0 236 0 12.5M c4t5002538870006917d0 - - 0 236 5 12.5M c4t500253887000690Dd0 - - 0 236 5 12.5M cache - - - - - - c4t50026B723A07AC78d0 62.3G 11.4G 19 113 83.0K 1.07M ------------------------- ----- ----- ----- ----- ----- ----- root at sanbox:/root# zfs get all ppool NAME PROPERTY VALUE SOURCE ppool type filesystem - ppool creation Sat Jan 24 18:37 2015 - ppool used 5.16T - ppool available 2.74T - ppool referenced 96K - ppool compressratio 1.51x - ppool mounted yes - ppool quota none default ppool reservation none default ppool recordsize 128K default ppool mountpoint /ppool default ppool sharenfs off default ppool checksum on default ppool compression lz4 local ppool atime on default ppool devices on default ppool exec on default ppool setuid on default ppool readonly off default ppool zoned off default ppool snapdir hidden default ppool aclmode discard default ppool aclinherit restricted default ppool canmount on default ppool xattr on default ppool copies 1 default ppool version 5 - ppool utf8only off - ppool normalization none - ppool casesensitivity sensitive - ppool vscan off default ppool nbmand off default ppool sharesmb off default ppool refquota none default ppool refreservation none default ppool primarycache all default ppool secondarycache all default ppool usedbysnapshots 0 - ppool usedbydataset 96K - ppool usedbychildren 5.16T - ppool usedbyrefreservation 0 - ppool logbias latency default ppool dedup off default ppool mlslabel none default ppool sync standard local ppool refcompressratio 1.00x - ppool written 96K - ppool logicalused 445G - ppool logicalreferenced 9.50K - ppool filesystem_limit none default ppool snapshot_limit none default ppool filesystem_count none default ppool snapshot_count none default ppool redundant_metadata all default Currently, ppool contains a single 5TB zvol that I am hosting as an iSCSI block device. At the zdev level, I have ensured that the ashift is 12 for all devices, all physical devices are 4k-native SATA, and the cache/log SSDs are also set for 4k. The block sizes are manually set in sd.conf, and confirmed with "echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'". The zvol blocksize is 4k, and the iSCSI block transfer size is 512B (not that it matters). All drives contain a single Solaris2 partition with an EFI label, and are properly aligned: format> verify Volume name = < > ascii name = bytes/sector = 512 sectors = 5860533167 accessible sectors = 5860533134 Part Tag Flag First Sector Size Last Sector 0 usr wm 256 2.73TB 5860516750 1 unassigned wm 0 0 0 2 unassigned wm 0 0 0 3 unassigned wm 0 0 0 4 unassigned wm 0 0 0 5 unassigned wm 0 0 0 6 unassigned wm 0 0 0 8 reserved wm 5860516751 8.00MB 5860533134 I scrubbed the pool last night, which completed without error. 
From "zdb ppool", I have extracted (with minor formatting): capacity operations bandwidth ---- errors ---- description used avail read write read write read write cksum ppool 339G 7.82T 26.6K 0 175M 0 0 0 5 mirror 113G 2.61T 8.87K 0 58.5M 0 0 0 2 /dev/dsk/c4t5000C5006A597B93d0s0 3.15K 0 48.8M 0 0 0 2 /dev/dsk/c4t5000C500653DEE58d0s0 3.10K 0 49.0M 0 0 0 2 mirror 113G 2.61T 8.86K 0 58.5M 0 0 0 8 /dev/dsk/c4t5000C5006A5903A2d0s0 3.12K 0 48.7M 0 0 0 8 /dev/dsk/c4t5000C500653DE049d0s0 3.08K 0 48.9M 0 0 0 8 mirror 113G 2.61T 8.86K 0 58.5M 0 0 0 10 /dev/dsk/c4t5000C5003607D87Bd0s0 2.48K 0 48.8M 0 0 0 10 /dev/dsk/c4t5000C5003604ED37d0s0 2.47K 0 48.9M 0 0 0 10 log mirror 44.0K 222G 0 0 37 0 0 0 0 /dev/dsk/c4t5002538870006917d0s0 0 0 290 0 0 0 0 /dev/dsk/c4t500253887000690Dd0s0 0 0 290 0 0 0 0 Cache /dev/dsk/c4t50026B723A07AC78d0s0 0 73.8G 0 0 35 0 0 0 0 Spare /dev/dsk/c4t5000C500653E447Ad0s0 4 0 136K 0 0 0 0 This shows a few checksum errors, which is not consistent with the output of "zfs status -v", and "iostat -eE" shows no physical error count. I again see the discrepancy between the "ppool" value and what I would expect, which would be a sum of the cksum errors for each vdev. I also observed a ton of leaked space, which I expect from a live pool, as well as a single: db_blkptr_cb: Got error 50 reading <96, 1, 2, 3fc8> DVA[0]=<1:1dc4962000:1000> DVA[1]=<2:1dc4654000:1000> [L2 zvol object] fletcher4 lz4 LE contiguous unique double size=4000L/a00P birth=52386L/52386P fill=4825 cksum=c70e8a7765:f2a dce34f59c:c8a289b51fe11d:7e0af40fe154aab4 -- skipping By the way, I also found: Uberblock: magic = 000000000*0bab10c* Wow. Just wow. -Warren V -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.elling at richardelling.com Tue Jan 27 04:14:14 2015 From: richard.elling at richardelling.com (Richard Elling) Date: Mon, 26 Jan 2015 20:14:14 -0800 Subject: [OmniOS-discuss] Mildly confusing ZFS iostat output In-Reply-To: References: Message-ID: <3DA1E148-656B-4373-A4DD-A5BEB23E3A7B@richardelling.com> > On Jan 26, 2015, at 5:16 PM, W Verb wrote: > > Hello All, > > I am mildly confused by something iostat does when displaying statistics for a zpool. Before I begin rooting through the iostat source, does anyone have an idea of why I am seeing high "wait" and "wsvc_t" values for "ppool" when my devices apparently are not busy? I would have assumed that the stats for the pool would be the sum of the stats for the zdevs.... welcome to queuing theory! ;-) First, iostat knows nothing about the devices being measured. It is really just a processor for kstats of type KSTAT_TYPE_IO (see the kstat(3kstat) man page for discussion) For that type, you get a 2-queue set. For many cases, 2-queues is a fine model, but when there is only one interesting queue, sometimes developers choose to put less interesting info in the "wait" queue. Second, it is the responsibility of the developer to define the queues. In the case of pools, the queues are defined as: wait = vdev_queue_io_add() until vdev_queue_io_remove() run = vdev_queue_pending_add() until vdev_queue_pending_remove() The run queue is closer to the actual measured I/O to the vdev (the juicy performance bits) The wait queue is closer to the transaction engine and includes time for aggregation. Thus we expect the wait queue to be higher, especially for async workloads. But since I/Os can and do get aggregated prior to being sent to the vdev, it is not a very useful measure of overall performance. 
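Two things are worth checking directly before tuning anything here: the raw per-pool kstat that iostat is summarising for the ppool row, and the FMA error log for the cksum discrepancy. A sketch -- the zfs:0:ppool io kstat name is an assumption based on how recent illumos publishes per-pool I/O statistics, so list it first to confirm it exists on this release:

kstat -l zfs:0:ppool    # does the pool-level io kstat exist here?
kstat -p zfs:0:ppool    # raw counters; the w* (wait-queue) fields feed the wait/wsvc_t
                        # columns and the r* (run-queue) fields feed actv/asvc_t
fmdump -e | tail -20    # recent FMA ereports -- checksum and transport errors are
                        # recorded here even when iostat -eE stays clean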
In other words, optimizing this away could actually hurt performance. In general, worry about the run queues and don't worry so much about the wait queues. NB, iostat calls "run" queues "active" queues. You say Tomato, I say 'mater. -- richard > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 10.0 9183.0 40.5 344942.0 0.0 1.8 0.0 0.2 0 178 c4 > 1.0 187.0 4.0 19684.0 0.0 0.1 0.0 0.5 0 8 c4t5000C5006A597B93d0 > 2.0 199.0 12.0 20908.0 0.0 0.1 0.0 0.6 0 12 c4t5000C500653DE049d0 > 2.0 197.0 8.0 20788.0 0.0 0.2 0.0 0.8 0 15 c4t5000C5003607D87Bd0 > 0.0 202.0 0.0 20908.0 0.0 0.1 0.0 0.6 0 11 c4t5000C5006A5903A2d0 > 0.0 189.0 0.0 19684.0 0.0 0.1 0.0 0.5 0 10 c4t5000C500653DEE58d0 > 5.0 957.0 16.5 1966.5 0.0 0.1 0.0 0.1 0 7 c4t50026B723A07AC78d0 > 0.0 201.0 0.0 20787.9 0.0 0.1 0.0 0.7 0 14 c4t5000C5003604ED37d0 > 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4t5000C500653E447Ad0 > 0.0 3525.0 0.0 110107.7 0.0 0.5 0.0 0.2 0 51 c4t500253887000690Dd0 > 0.0 3526.0 0.0 110107.7 0.0 0.5 0.0 0.1 1 50 c4t5002538870006917d0 > 10.0 6046.0 40.5 344941.5 837.4 1.9 138.3 0.3 23 67 ppool > > > For those following the VAAI thread, this is the system I will be using as my testbed. > > Here is the structure of ppool (taken at a different time than above): > > root at sanbox:/root# zpool iostat -v ppool > capacity operations bandwidth > pool alloc free read write read write > ------------------------- ----- ----- ----- ----- ----- ----- > ppool 191G 7.97T 23 637 140K 15.0M > mirror 63.5G 2.66T 7 133 46.3K 840K > c4t5000C5006A597B93d0 - - 1 13 24.3K 844K > c4t5000C500653DEE58d0 - - 1 13 24.1K 844K > mirror 63.6G 2.66T 7 133 46.5K 839K > c4t5000C5006A5903A2d0 - - 1 13 24.0K 844K > c4t5000C500653DE049d0 - - 1 13 24.6K 844K > mirror 63.5G 2.66T 7 133 46.8K 839K > c4t5000C5003607D87Bd0 - - 1 13 24.5K 843K > c4t5000C5003604ED37d0 - - 1 13 24.4K 843K > logs - - - - - - > mirror 301M 222G 0 236 0 12.5M > c4t5002538870006917d0 - - 0 236 5 12.5M > c4t500253887000690Dd0 - - 0 236 5 12.5M > cache - - - - - - > c4t50026B723A07AC78d0 62.3G 11.4G 19 113 83.0K 1.07M > ------------------------- ----- ----- ----- ----- ----- ----- > > root at sanbox:/root# zfs get all ppool > NAME PROPERTY VALUE SOURCE > ppool type filesystem - > ppool creation Sat Jan 24 18:37 2015 - > ppool used 5.16T - > ppool available 2.74T - > ppool referenced 96K - > ppool compressratio 1.51x - > ppool mounted yes - > ppool quota none default > ppool reservation none default > ppool recordsize 128K default > ppool mountpoint /ppool default > ppool sharenfs off default > ppool checksum on default > ppool compression lz4 local > ppool atime on default > ppool devices on default > ppool exec on default > ppool setuid on default > ppool readonly off default > ppool zoned off default > ppool snapdir hidden default > ppool aclmode discard default > ppool aclinherit restricted default > ppool canmount on default > ppool xattr on default > ppool copies 1 default > ppool version 5 - > ppool utf8only off - > ppool normalization none - > ppool casesensitivity sensitive - > ppool vscan off default > ppool nbmand off default > ppool sharesmb off default > ppool refquota none default > ppool refreservation none default > ppool primarycache all default > ppool secondarycache all default > ppool usedbysnapshots 0 - > ppool usedbydataset 96K - > ppool usedbychildren 5.16T - > ppool usedbyrefreservation 0 - > ppool logbias latency default > ppool dedup off default > ppool mlslabel none default > ppool sync standard local > ppool refcompressratio 
1.00x - > ppool written 96K - > ppool logicalused 445G - > ppool logicalreferenced 9.50K - > ppool filesystem_limit none default > ppool snapshot_limit none default > ppool filesystem_count none default > ppool snapshot_count none default > ppool redundant_metadata all default > > Currently, ppool contains a single 5TB zvol that I am hosting as an iSCSI block device. At the zdev level, I have ensured that the ashift is 12 for all devices, all physical devices are 4k-native SATA, and the cache/log SSDs are also set for 4k. The block sizes are manually set in sd.conf, and confirmed with "echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'". The zvol blocksize is 4k, and the iSCSI block transfer size is 512B (not that it matters). > > All drives contain a single Solaris2 partition with an EFI label, and are properly aligned: > format> verify > > Volume name = < > > ascii name = > bytes/sector = 512 > sectors = 5860533167 > accessible sectors = 5860533134 > Part Tag Flag First Sector Size Last Sector > 0 usr wm 256 2.73TB 5860516750 > 1 unassigned wm 0 0 0 > 2 unassigned wm 0 0 0 > 3 unassigned wm 0 0 0 > 4 unassigned wm 0 0 0 > 5 unassigned wm 0 0 0 > 6 unassigned wm 0 0 0 > 8 reserved wm 5860516751 8.00MB 5860533134 > > I scrubbed the pool last night, which completed without error. From "zdb ppool", I have extracted (with minor formatting): > > capacity operations bandwidth ---- errors ---- > description used avail read write read write read write cksum > ppool 339G 7.82T 26.6K 0 175M 0 0 0 5 > mirror 113G 2.61T 8.87K 0 58.5M 0 0 0 2 > /dev/dsk/c4t5000C5006A597B93d0s0 3.15K 0 48.8M 0 0 0 2 > /dev/dsk/c4t5000C500653DEE58d0s0 3.10K 0 49.0M 0 0 0 2 > > mirror 113G 2.61T 8.86K 0 58.5M 0 0 0 8 > /dev/dsk/c4t5000C5006A5903A2d0s0 3.12K 0 48.7M 0 0 0 8 > /dev/dsk/c4t5000C500653DE049d0s0 3.08K 0 48.9M 0 0 0 8 > > mirror 113G 2.61T 8.86K 0 58.5M 0 0 0 10 > /dev/dsk/c4t5000C5003607D87Bd0s0 2.48K 0 48.8M 0 0 0 10 > /dev/dsk/c4t5000C5003604ED37d0s0 2.47K 0 48.9M 0 0 0 10 > > log mirror 44.0K 222G 0 0 37 0 0 0 0 > /dev/dsk/c4t5002538870006917d0s0 0 0 290 0 0 0 0 > /dev/dsk/c4t500253887000690Dd0s0 0 0 290 0 0 0 0 > Cache > /dev/dsk/c4t50026B723A07AC78d0s0 > 0 73.8G 0 0 35 0 0 0 0 > Spare > /dev/dsk/c4t5000C500653E447Ad0s0 4 0 136K 0 0 0 0 > > This shows a few checksum errors, which is not consistent with the output of "zfs status -v", and "iostat -eE" shows no physical error count. I again see the discrepancy between the "ppool" value and what I would expect, which would be a sum of the cksum errors for each vdev. > > I also observed a ton of leaked space, which I expect from a live pool, as well as a single: > db_blkptr_cb: Got error 50 reading <96, 1, 2, 3fc8> DVA[0]=<1:1dc4962000:1000> DVA[1]=<2:1dc4654000:1000> [L2 zvol object] fletcher4 lz4 LE contiguous unique double size=4000L/a00P birth=52386L/52386P fill=4825 cksum=c70e8a7765:f2a > dce34f59c:c8a289b51fe11d:7e0af40fe154aab4 -- skipping > > > By the way, I also found: > > Uberblock: > magic = 0000000000bab10c > > Wow. Just wow. > > > -Warren V > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wverb73 at gmail.com Tue Jan 27 06:27:53 2015 From: wverb73 at gmail.com (W Verb) Date: Mon, 26 Jan 2015 22:27:53 -0800 Subject: [OmniOS-discuss] Mildly confusing ZFS iostat output In-Reply-To: <3DA1E148-656B-4373-A4DD-A5BEB23E3A7B@richardelling.com> References: <3DA1E148-656B-4373-A4DD-A5BEB23E3A7B@richardelling.com> Message-ID: Thank you Richard. I also found a quite detailed writeup with kstat examples here: http://sunsite.uakom.sk/sunworldonline/swol-09-1997/swol-09-perf.html It's a little old, but I think it gets to the heart of the matter. -Warren V On Mon, Jan 26, 2015 at 8:14 PM, Richard Elling < richard.elling at richardelling.com> wrote: > > On Jan 26, 2015, at 5:16 PM, W Verb wrote: > > Hello All, > > I am mildly confused by something iostat does when displaying statistics > for a zpool. Before I begin rooting through the iostat source, does > anyone have an idea of why I am seeing high "wait" and "wsvc_t" values > for "ppool" when my devices apparently are not busy? I would have assumed > that the stats for the pool would be the sum of the stats for the zdevs.... > > > welcome to queuing theory! ;-) > > First, iostat knows nothing about the devices being measured. It is really > just a processor > for kstats of type KSTAT_TYPE_IO (see the kstat(3kstat) man page for > discussion) For that > type, you get a 2-queue set. For many cases, 2-queues is a fine model, but > when there is > only one interesting queue, sometimes developers choose to put less > interesting info in the > "wait" queue. > > Second, it is the responsibility of the developer to define the queues. In > the case of pools, > the queues are defined as: > wait = vdev_queue_io_add() until vdev_queue_io_remove() > run = vdev_queue_pending_add() until vdev_queue_pending_remove() > > The run queue is closer to the actual measured I/O to the vdev (the juicy > performance bits) > The wait queue is closer to the transaction engine and includes time for > aggregation. > Thus we expect the wait queue to be higher, especially for async > workloads. But since I/Os > can and do get aggregated prior to being sent to the vdev, it is not a > very useful measure of > overall performance. In other words, optimizing this away could actually > hurt performance. > > In general, worry about the run queues and don't worry so much about the > wait queues. > NB, iostat calls "run" queues "active" queues. You say Tomato, I say > 'mater. > -- richard > > > > extended device statistics > r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device > 10.0 9183.0 40.5 344942.0 0.0 1.8 0.0 0.2 0 178 c4 > 1.0 187.0 4.0 19684.0 0.0 0.1 0.0 0.5 0 8 > c4t5000C5006A597B93d0 > 2.0 199.0 12.0 20908.0 0.0 0.1 0.0 0.6 0 12 > c4t5000C500653DE049d0 > 2.0 197.0 8.0 20788.0 0.0 0.2 0.0 0.8 0 15 > c4t5000C5003607D87Bd0 > 0.0 202.0 0.0 20908.0 0.0 0.1 0.0 0.6 0 11 > c4t5000C5006A5903A2d0 > 0.0 189.0 0.0 19684.0 0.0 0.1 0.0 0.5 0 10 > c4t5000C500653DEE58d0 > 5.0 957.0 16.5 1966.5 0.0 0.1 0.0 0.1 0 7 > c4t50026B723A07AC78d0 > 0.0 201.0 0.0 20787.9 0.0 0.1 0.0 0.7 0 14 > c4t5000C5003604ED37d0 > 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 > c4t5000C500653E447Ad0 > 0.0 3525.0 0.0 110107.7 0.0 0.5 0.0 0.2 0 51 > c4t500253887000690Dd0 > 0.0 3526.0 0.0 110107.7 0.0 0.5 0.0 0.1 1 50 > c4t5002538870006917d0 > 10.0 6046.0 40.5 344941.5 837.4 1.9 138.3 0.3 23 67 ppool > > > For those following the VAAI thread, this is the system I will be using as > my testbed. 
> > Here is the structure of ppool (taken at a different time than above): > > root at sanbox:/root# zpool iostat -v ppool > capacity operations bandwidth > pool alloc free read write read write > ------------------------- ----- ----- ----- ----- ----- ----- > ppool 191G 7.97T 23 637 140K 15.0M > mirror 63.5G 2.66T 7 133 46.3K 840K > c4t5000C5006A597B93d0 - - 1 13 24.3K 844K > c4t5000C500653DEE58d0 - - 1 13 24.1K 844K > mirror 63.6G 2.66T 7 133 46.5K 839K > c4t5000C5006A5903A2d0 - - 1 13 24.0K 844K > c4t5000C500653DE049d0 - - 1 13 24.6K 844K > mirror 63.5G 2.66T 7 133 46.8K 839K > c4t5000C5003607D87Bd0 - - 1 13 24.5K 843K > c4t5000C5003604ED37d0 - - 1 13 24.4K 843K > logs - - - - - - > mirror 301M 222G 0 236 0 12.5M > c4t5002538870006917d0 - - 0 236 5 12.5M > c4t500253887000690Dd0 - - 0 236 5 12.5M > cache - - - - - - > c4t50026B723A07AC78d0 62.3G 11.4G 19 113 83.0K 1.07M > ------------------------- ----- ----- ----- ----- ----- ----- > > root at sanbox:/root# zfs get all ppool > NAME PROPERTY VALUE SOURCE > ppool type filesystem - > ppool creation Sat Jan 24 18:37 2015 - > ppool used 5.16T - > ppool available 2.74T - > ppool referenced 96K - > ppool compressratio 1.51x - > ppool mounted yes - > ppool quota none default > ppool reservation none default > ppool recordsize 128K default > ppool mountpoint /ppool default > ppool sharenfs off default > ppool checksum on default > ppool compression lz4 local > ppool atime on default > ppool devices on default > ppool exec on default > ppool setuid on default > ppool readonly off default > ppool zoned off default > ppool snapdir hidden default > ppool aclmode discard default > ppool aclinherit restricted default > ppool canmount on default > ppool xattr on default > ppool copies 1 default > ppool version 5 - > ppool utf8only off - > ppool normalization none - > ppool casesensitivity sensitive - > ppool vscan off default > ppool nbmand off default > ppool sharesmb off default > ppool refquota none default > ppool refreservation none default > ppool primarycache all default > ppool secondarycache all default > ppool usedbysnapshots 0 - > ppool usedbydataset 96K - > ppool usedbychildren 5.16T - > ppool usedbyrefreservation 0 - > ppool logbias latency default > ppool dedup off default > ppool mlslabel none default > ppool sync standard local > ppool refcompressratio 1.00x - > ppool written 96K - > ppool logicalused 445G - > ppool logicalreferenced 9.50K - > ppool filesystem_limit none default > ppool snapshot_limit none default > ppool filesystem_count none default > ppool snapshot_count none default > ppool redundant_metadata all default > > Currently, ppool contains a single 5TB zvol that I am hosting as an iSCSI > block device. At the zdev level, I have ensured that the ashift is 12 for > all devices, all physical devices are 4k-native SATA, and the cache/log > SSDs are also set for 4k. The block sizes are manually set in sd.conf, > and confirmed with "echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'". > The zvol blocksize is 4k, and the iSCSI block transfer size is 512B (not > that it matters). 
> > All drives contain a single Solaris2 partition with an EFI label, and are > properly aligned: > format> verify > > Volume name = < > > ascii name = > bytes/sector = 512 > sectors = 5860533167 > accessible sectors = 5860533134 > Part Tag Flag First Sector Size Last Sector > 0 usr wm 256 2.73TB > 5860516750 > 1 unassigned wm 0 0 0 > 2 unassigned wm 0 0 0 > 3 unassigned wm 0 0 0 > 4 unassigned wm 0 0 0 > 5 unassigned wm 0 0 0 > 6 unassigned wm 0 0 0 > 8 reserved wm 5860516751 8.00MB 5860533134 > > I scrubbed the pool last night, which completed without error. From "zdb > ppool", I have extracted (with minor formatting): > > capacity operations bandwidth ---- errors > ---- > description used avail read write read write read write > cksum > ppool 339G 7.82T 26.6K 0 175M 0 0 > 0 5 > mirror 113G 2.61T 8.87K 0 58.5M 0 0 > 0 2 > /dev/dsk/c4t5000C5006A597B93d0s0 3.15K 0 48.8M 0 0 > 0 2 > /dev/dsk/c4t5000C500653DEE58d0s0 3.10K 0 49.0M 0 0 > 0 2 > > mirror 113G 2.61T 8.86K 0 58.5M 0 0 > 0 8 > /dev/dsk/c4t5000C5006A5903A2d0s0 3.12K 0 48.7M 0 0 > 0 8 > /dev/dsk/c4t5000C500653DE049d0s0 3.08K 0 48.9M 0 0 > 0 8 > > mirror 113G 2.61T 8.86K 0 58.5M 0 0 > 0 10 > /dev/dsk/c4t5000C5003607D87Bd0s0 2.48K 0 48.8M 0 0 > 0 10 > /dev/dsk/c4t5000C5003604ED37d0s0 2.47K 0 48.9M 0 0 > 0 10 > > log mirror 44.0K 222G 0 0 37 0 0 > 0 0 > /dev/dsk/c4t5002538870006917d0s0 0 0 290 0 0 > 0 0 > /dev/dsk/c4t500253887000690Dd0s0 0 0 290 0 0 > 0 0 > Cache > /dev/dsk/c4t50026B723A07AC78d0s0 > 0 73.8G 0 0 35 0 0 > 0 0 > Spare > /dev/dsk/c4t5000C500653E447Ad0s0 4 0 136K 0 0 > 0 0 > > This shows a few checksum errors, which is not consistent with the output > of "zfs status -v", and "iostat -eE" shows no physical error count. I > again see the discrepancy between the "ppool" value and what I would > expect, which would be a sum of the cksum errors for each vdev. > > I also observed a ton of leaked space, which I expect from a live pool, as > well as a single: > db_blkptr_cb: Got error 50 reading <96, 1, 2, 3fc8> > DVA[0]=<1:1dc4962000:1000> DVA[1]=<2:1dc4654000:1000> [L2 zvol object] > fletcher4 lz4 LE contiguous unique double size=4000L/a00P > birth=52386L/52386P fill=4825 > cksum=c70e8a7765:f2a > dce34f59c:c8a289b51fe11d:7e0af40fe154aab4 -- skipping > > > By the way, I also found: > > Uberblock: > magic = 000000000*0bab10c* > > Wow. Just wow. > > > -Warren V > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric.sproul at circonus.com Tue Jan 27 20:54:20 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Tue, 27 Jan 2015 15:54:20 -0500 Subject: [OmniOS-discuss] Marvel based 10 GB network card In-Reply-To: References: Message-ID: On Mon, Jan 5, 2015 at 11:58 AM, F?bio Rabelo wrote: > Hi to all > > Someone knows if this new 10 GB cards : > > http://www.startech.com/Networking-IO/Adapter-Cards/10gb-pcie-nic~ST10000SPEX > > are, or will be supported in OmniOS ? > > They are incredibly afordable .... > > Works with linux, with very good performance ! And that's about all it works with, apparently. All I could find was http://www.tehutinetworks.net/?t=LV&L1=5&L7=100 The adapter is based on the TN4010, according to the specs above. 
I don't see any Tehuti parts listed at http://illumos.org/hcl/ nor anything with their PCI vendor ID: https://pci-ids.ucw.cz/read/PC/1fc9 Sounds like they only care about Windows and Linux. Eric From eric.sproul at circonus.com Wed Jan 28 15:44:12 2015 From: eric.sproul at circonus.com (Eric Sproul) Date: Wed, 28 Jan 2015 10:44:12 -0500 Subject: [OmniOS-discuss] PERL Modules in PKG In-Reply-To: <21684.17081.907538.853087@glaurung.bb-c.de> References: <8A815BEF-D1A8-4257-8158-B923212216B6@vantagetitle.com> <21684.17081.907538.853087@glaurung.bb-c.de> Message-ID: On Mon, Jan 12, 2015 at 4:55 PM, Volker A. Brandt wrote: > Alex McWhirter writes: > [...] >> Or would it be best >> to compile these perl modules into IP packages and set them as >> dependencies? > > Yes. If you do this more often you might want to script this. Personally I prefer native packages for everything, but those well-versed in CPAN can probably safely script things. Sometimes older module versions get archived and so disappear from CPAN mirrors, and you have to be smart and find them on the BackPAN... If you're thinking of going the package route, have a look at https://github.com/omniti-labs/omnios-build-perl which is an adaptation of the OmniOS build system that packages CPAN modules. The commitment stance for these is the same as for ms.omniti.com-- the specific modules and versions exist to support OmniTI's internal operations, but are open to the community to use if desired. Additions and changes may or may not be accepted. Eric From jmlittle at gmail.com Wed Jan 28 16:49:42 2015 From: jmlittle at gmail.com (Joe Little) Date: Wed, 28 Jan 2015 08:49:42 -0800 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates Message-ID: I recently switched one file server from Nexenta 4 Community (still uses closed NLM I believe) to OmniOS r151012. Immediately, users started to complain from various Linux clients that locking was failing. Most of those clients explicitly set their NFS version to 3. I finally isolated that the locking does not fail on NFS v4 and have worked on transition where possible. But presently, no NFS v3 client and successfully lock against OmniOS NFS v3 locking service. I've confirmed that the locking service is running and is present using rpcinfo, matching one for one in services from previous OpenSolaris and Illumos variants. One example from a user: $ strace /bin/tcsh [...] open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0 dup(0) = 1 dup(1) = 2 dup(2) = 3 dup(3) = 4 dup(4) = 5 dup(5) = 6 close(5) = 0 close(4) = 0 close(3) = 0 close(2) = 0 close(1) = 0 close(0) = 0 fcntl(6, F_SETFD, FD_CLOEXEC) = 0 fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No locks available)" -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Jan 28 17:02:07 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 28 Jan 2015 12:02:07 -0500 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: <008AFBB4-F95B-4965-9DD9-763AB8231A6F@omniti.com> You should bring this up with the illumos developers list. (And I'm surprised Nexenta hasn't shipped the new lockmgr yet since they were the ones who ported it.) ANYWAY, please bring this up on the illumos developer's list. There may already be a bug open. 
Dan From youzhong at gmail.com Wed Jan 28 17:23:28 2015 From: youzhong at gmail.com (Youzhong Yang) Date: Wed, 28 Jan 2015 12:23:28 -0500 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: max threads of nlockmgr is set to 20 I think. Bump up this value then you can get rid of 'no locks available' error. To confirm the current value: echo ::svc_pool nlm | mdb -k | grep 'Max threads' On Wed, Jan 28, 2015 at 11:49 AM, Joe Little wrote: > I recently switched one file server from Nexenta 4 Community (still uses > closed NLM I believe) to OmniOS r151012. > > Immediately, users started to complain from various Linux clients that > locking was failing. Most of those clients explicitly set their NFS version > to 3. I finally isolated that the locking does not fail on NFS v4 and have > worked on transition where possible. But presently, no NFS v3 client and > successfully lock against OmniOS NFS v3 locking service. I've confirmed > that the locking service is running and is present using rpcinfo, matching > one for one in services from previous OpenSolaris and Illumos variants. One > example from a user: > > $ strace /bin/tcsh > > [...] > > open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0 > > dup(0) = 1 > > dup(1) = 2 > > dup(2) = 3 > > dup(3) = 4 > > dup(4) = 5 > > dup(5) = 6 > > close(5) = 0 > > close(4) = 0 > > close(3) = 0 > > close(2) = 0 > > close(1) = 0 > > close(0) = 0 > > fcntl(6, F_SETFD, FD_CLOEXEC) = 0 > > fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) > > > HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No > > locks available)" > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmlittle at gmail.com Wed Jan 28 18:02:19 2015 From: jmlittle at gmail.com (Joe Little) Date: Wed, 28 Jan 2015 10:02:19 -0800 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: Just to answer this question, I had already bumped that up based on some suggestions on the net: root at miele:/root# echo ::svc_pool nlm | mdb -k | grep 'Max threads' mdb: failed to add kvm_pte_chain walker: walk name already in use mdb: failed to add kvm_rmap_desc walker: walk name already in use mdb: failed to add kvm_mmu_page_header walker: walk name already in use mdb: failed to add kvm_pte_chain walker: walk name already in use mdb: failed to add kvm_rmap_desc walker: walk name already in use mdb: failed to add kvm_mmu_page_header walker: walk name already in use Max threads = 80 Still no locking w/ v3. On Wed, Jan 28, 2015 at 9:23 AM, Youzhong Yang wrote: > max threads of nlockmgr is set to 20 I think. Bump up this value then you > can get rid of 'no locks available' error. > > To confirm the current value: > > echo ::svc_pool nlm | mdb -k | grep 'Max threads' > > On Wed, Jan 28, 2015 at 11:49 AM, Joe Little wrote: > >> I recently switched one file server from Nexenta 4 Community (still uses >> closed NLM I believe) to OmniOS r151012. >> >> Immediately, users started to complain from various Linux clients that >> locking was failing. Most of those clients explicitly set their NFS version >> to 3. I finally isolated that the locking does not fail on NFS v4 and have >> worked on transition where possible. 
But presently, no NFS v3 client and >> successfully lock against OmniOS NFS v3 locking service. I've confirmed >> that the locking service is running and is present using rpcinfo, matching >> one for one in services from previous OpenSolaris and Illumos variants. One >> example from a user: >> >> $ strace /bin/tcsh >> >> [...] >> >> open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0 >> >> dup(0) = 1 >> >> dup(1) = 2 >> >> dup(2) = 3 >> >> dup(3) = 4 >> >> dup(4) = 5 >> >> dup(5) = 6 >> >> close(5) = 0 >> >> close(4) = 0 >> >> close(3) = 0 >> >> close(2) = 0 >> >> close(1) = 0 >> >> close(0) = 0 >> >> fcntl(6, F_SETFD, FD_CLOEXEC) = 0 >> >> fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) >> >> >> HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No >> >> locks available)" >> >> _______________________________________________ >> OmniOS-discuss mailing list >> OmniOS-discuss at lists.omniti.com >> http://lists.omniti.com/mailman/listinfo/omnios-discuss >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From youzhong at gmail.com Wed Jan 28 18:13:54 2015 From: youzhong at gmail.com (Youzhong Yang) Date: Wed, 28 Jan 2015 13:13:54 -0500 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: Depending on how many active locks your system needs to handle, 80 might be a small value. We use a different distro of illumos-gate and we set max threads to 1024, so far so good we are happy with the open source nlockmgr except the nlockmgr startup issue when machine reboots. On Wed, Jan 28, 2015 at 1:02 PM, Joe Little wrote: > Just to answer this question, I had already bumped that up based on some > suggestions on the net: > > root at miele:/root# echo ::svc_pool nlm | mdb -k | grep 'Max threads' > > mdb: failed to add kvm_pte_chain walker: walk name already in use > > mdb: failed to add kvm_rmap_desc walker: walk name already in use > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use > > mdb: failed to add kvm_pte_chain walker: walk name already in use > > mdb: failed to add kvm_rmap_desc walker: walk name already in use > > mdb: failed to add kvm_mmu_page_header walker: walk name already in use > > Max threads = 80 > > Still no locking w/ v3. > > On Wed, Jan 28, 2015 at 9:23 AM, Youzhong Yang wrote: > >> max threads of nlockmgr is set to 20 I think. Bump up this value then you >> can get rid of 'no locks available' error. >> >> To confirm the current value: >> >> echo ::svc_pool nlm | mdb -k | grep 'Max threads' >> >> On Wed, Jan 28, 2015 at 11:49 AM, Joe Little wrote: >> >>> I recently switched one file server from Nexenta 4 Community (still uses >>> closed NLM I believe) to OmniOS r151012. >>> >>> Immediately, users started to complain from various Linux clients that >>> locking was failing. Most of those clients explicitly set their NFS version >>> to 3. I finally isolated that the locking does not fail on NFS v4 and have >>> worked on transition where possible. But presently, no NFS v3 client and >>> successfully lock against OmniOS NFS v3 locking service. I've confirmed >>> that the locking service is running and is present using rpcinfo, matching >>> one for one in services from previous OpenSolaris and Illumos variants. One >>> example from a user: >>> >>> $ strace /bin/tcsh >>> >>> [...] 
>>> >>> open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0 >>> >>> dup(0) = 1 >>> >>> dup(1) = 2 >>> >>> dup(2) = 3 >>> >>> dup(3) = 4 >>> >>> dup(4) = 5 >>> >>> dup(5) = 6 >>> >>> close(5) = 0 >>> >>> close(4) = 0 >>> >>> close(3) = 0 >>> >>> close(2) = 0 >>> >>> close(1) = 0 >>> >>> close(0) = 0 >>> >>> fcntl(6, F_SETFD, FD_CLOEXEC) = 0 >>> >>> fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) >>> >>> >>> HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No >>> >>> locks available)" >>> >>> _______________________________________________ >>> OmniOS-discuss mailing list >>> OmniOS-discuss at lists.omniti.com >>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmlittle at gmail.com Wed Jan 28 18:23:00 2015 From: jmlittle at gmail.com (Joe Little) Date: Wed, 28 Jan 2015 10:23:00 -0800 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: I just set it to 1024 and still locking times out. On Wed, Jan 28, 2015 at 10:13 AM, Youzhong Yang wrote: > Depending on how many active locks your system needs to handle, 80 might > be a small value. > > We use a different distro of illumos-gate and we set max threads to 1024, > so far so good we are happy with the open source nlockmgr except the > nlockmgr startup issue when machine reboots. > > > > On Wed, Jan 28, 2015 at 1:02 PM, Joe Little wrote: > >> Just to answer this question, I had already bumped that up based on some >> suggestions on the net: >> >> root at miele:/root# echo ::svc_pool nlm | mdb -k | grep 'Max threads' >> >> mdb: failed to add kvm_pte_chain walker: walk name already in use >> >> mdb: failed to add kvm_rmap_desc walker: walk name already in use >> >> mdb: failed to add kvm_mmu_page_header walker: walk name already in use >> >> mdb: failed to add kvm_pte_chain walker: walk name already in use >> >> mdb: failed to add kvm_rmap_desc walker: walk name already in use >> >> mdb: failed to add kvm_mmu_page_header walker: walk name already in use >> >> Max threads = 80 >> >> Still no locking w/ v3. >> >> On Wed, Jan 28, 2015 at 9:23 AM, Youzhong Yang >> wrote: >> >>> max threads of nlockmgr is set to 20 I think. Bump up this value then >>> you can get rid of 'no locks available' error. >>> >>> To confirm the current value: >>> >>> echo ::svc_pool nlm | mdb -k | grep 'Max threads' >>> >>> On Wed, Jan 28, 2015 at 11:49 AM, Joe Little wrote: >>> >>>> I recently switched one file server from Nexenta 4 Community (still >>>> uses closed NLM I believe) to OmniOS r151012. >>>> >>>> Immediately, users started to complain from various Linux clients that >>>> locking was failing. Most of those clients explicitly set their NFS version >>>> to 3. I finally isolated that the locking does not fail on NFS v4 and have >>>> worked on transition where possible. But presently, no NFS v3 client and >>>> successfully lock against OmniOS NFS v3 locking service. I've confirmed >>>> that the locking service is running and is present using rpcinfo, matching >>>> one for one in services from previous OpenSolaris and Illumos variants. One >>>> example from a user: >>>> >>>> $ strace /bin/tcsh >>>> >>>> [...] 
>>>> >>>> open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0 >>>> >>>> dup(0) = 1 >>>> >>>> dup(1) = 2 >>>> >>>> dup(2) = 3 >>>> >>>> dup(3) = 4 >>>> >>>> dup(4) = 5 >>>> >>>> dup(5) = 6 >>>> >>>> close(5) = 0 >>>> >>>> close(4) = 0 >>>> >>>> close(3) = 0 >>>> >>>> close(2) = 0 >>>> >>>> close(1) = 0 >>>> >>>> close(0) = 0 >>>> >>>> fcntl(6, F_SETFD, FD_CLOEXEC) = 0 >>>> >>>> fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) >>>> >>>> >>>> HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No >>>> >>>> locks available)" >>>> >>>> _______________________________________________ >>>> OmniOS-discuss mailing list >>>> OmniOS-discuss at lists.omniti.com >>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From youzhong at gmail.com Wed Jan 28 18:36:12 2015 From: youzhong at gmail.com (Youzhong Yang) Date: Wed, 28 Jan 2015 13:36:12 -0500 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: I would suggest capturing packets, find out if the 'no locks available' is returned from the server. If it is, do dtrace on the server, find out where it returns nlm4_denied_nolocks . http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/klm/nlm_service.c#460 Again, as Dan suggested, it would be better to post on illumos-dev list. On Wed, Jan 28, 2015 at 1:23 PM, Joe Little wrote: > I just set it to 1024 and still locking times out. > > On Wed, Jan 28, 2015 at 10:13 AM, Youzhong Yang > wrote: > >> Depending on how many active locks your system needs to handle, 80 might >> be a small value. >> >> We use a different distro of illumos-gate and we set max threads to 1024, >> so far so good we are happy with the open source nlockmgr except the >> nlockmgr startup issue when machine reboots. >> >> >> >> On Wed, Jan 28, 2015 at 1:02 PM, Joe Little wrote: >> >>> Just to answer this question, I had already bumped that up based on some >>> suggestions on the net: >>> >>> root at miele:/root# echo ::svc_pool nlm | mdb -k | grep 'Max threads' >>> >>> mdb: failed to add kvm_pte_chain walker: walk name already in use >>> >>> mdb: failed to add kvm_rmap_desc walker: walk name already in use >>> >>> mdb: failed to add kvm_mmu_page_header walker: walk name already in use >>> >>> mdb: failed to add kvm_pte_chain walker: walk name already in use >>> >>> mdb: failed to add kvm_rmap_desc walker: walk name already in use >>> >>> mdb: failed to add kvm_mmu_page_header walker: walk name already in use >>> >>> Max threads = 80 >>> >>> Still no locking w/ v3. >>> >>> On Wed, Jan 28, 2015 at 9:23 AM, Youzhong Yang >>> wrote: >>> >>>> max threads of nlockmgr is set to 20 I think. Bump up this value then >>>> you can get rid of 'no locks available' error. >>>> >>>> To confirm the current value: >>>> >>>> echo ::svc_pool nlm | mdb -k | grep 'Max threads' >>>> >>>> On Wed, Jan 28, 2015 at 11:49 AM, Joe Little >>>> wrote: >>>> >>>>> I recently switched one file server from Nexenta 4 Community (still >>>>> uses closed NLM I believe) to OmniOS r151012. >>>>> >>>>> Immediately, users started to complain from various Linux clients that >>>>> locking was failing. Most of those clients explicitly set their NFS version >>>>> to 3. I finally isolated that the locking does not fail on NFS v4 and have >>>>> worked on transition where possible. 
But presently, no NFS v3 client and >>>>> successfully lock against OmniOS NFS v3 locking service. I've confirmed >>>>> that the locking service is running and is present using rpcinfo, matching >>>>> one for one in services from previous OpenSolaris and Illumos variants. One >>>>> example from a user: >>>>> >>>>> $ strace /bin/tcsh >>>>> >>>>> [...] >>>>> >>>>> open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0 >>>>> >>>>> dup(0) = 1 >>>>> >>>>> dup(1) = 2 >>>>> >>>>> dup(2) = 3 >>>>> >>>>> dup(3) = 4 >>>>> >>>>> dup(4) = 5 >>>>> >>>>> dup(5) = 6 >>>>> >>>>> close(5) = 0 >>>>> >>>>> close(4) = 0 >>>>> >>>>> close(3) = 0 >>>>> >>>>> close(2) = 0 >>>>> >>>>> close(1) = 0 >>>>> >>>>> close(0) = 0 >>>>> >>>>> fcntl(6, F_SETFD, FD_CLOEXEC) = 0 >>>>> >>>>> fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) >>>>> >>>>> >>>>> HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No >>>>> >>>>> locks available)" >>>>> >>>>> _______________________________________________ >>>>> OmniOS-discuss mailing list >>>>> OmniOS-discuss at lists.omniti.com >>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jmlittle at gmail.com Wed Jan 28 18:52:42 2015 From: jmlittle at gmail.com (Joe Little) Date: Wed, 28 Jan 2015 10:52:42 -0800 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: Already forwarded to illumos-discuss, and they already have the snoop, and the denied lock segment On Wed, Jan 28, 2015 at 10:36 AM, Youzhong Yang wrote: > I would suggest capturing packets, find out if the 'no locks available' is > returned from the server. If it is, do dtrace on the server, find out where > it returns nlm4_denied_nolocks > > . > > > http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/klm/nlm_service.c#460 > > Again, as Dan suggested, it would be better to post on illumos-dev list. > > On Wed, Jan 28, 2015 at 1:23 PM, Joe Little wrote: > >> I just set it to 1024 and still locking times out. >> >> On Wed, Jan 28, 2015 at 10:13 AM, Youzhong Yang >> wrote: >> >>> Depending on how many active locks your system needs to handle, 80 might >>> be a small value. >>> >>> We use a different distro of illumos-gate and we set max threads to >>> 1024, so far so good we are happy with the open source nlockmgr except the >>> nlockmgr startup issue when machine reboots. >>> >>> >>> >>> On Wed, Jan 28, 2015 at 1:02 PM, Joe Little wrote: >>> >>>> Just to answer this question, I had already bumped that up based on >>>> some suggestions on the net: >>>> >>>> root at miele:/root# echo ::svc_pool nlm | mdb -k | grep 'Max threads' >>>> >>>> mdb: failed to add kvm_pte_chain walker: walk name already in use >>>> >>>> mdb: failed to add kvm_rmap_desc walker: walk name already in use >>>> >>>> mdb: failed to add kvm_mmu_page_header walker: walk name already in use >>>> >>>> mdb: failed to add kvm_pte_chain walker: walk name already in use >>>> >>>> mdb: failed to add kvm_rmap_desc walker: walk name already in use >>>> >>>> mdb: failed to add kvm_mmu_page_header walker: walk name already in use >>>> >>>> Max threads = 80 >>>> >>>> Still no locking w/ v3. >>>> >>>> On Wed, Jan 28, 2015 at 9:23 AM, Youzhong Yang >>>> wrote: >>>> >>>>> max threads of nlockmgr is set to 20 I think. Bump up this value then >>>>> you can get rid of 'no locks available' error. 
>>>>> >>>>> To confirm the current value: >>>>> >>>>> echo ::svc_pool nlm | mdb -k | grep 'Max threads' >>>>> >>>>> On Wed, Jan 28, 2015 at 11:49 AM, Joe Little >>>>> wrote: >>>>> >>>>>> I recently switched one file server from Nexenta 4 Community (still >>>>>> uses closed NLM I believe) to OmniOS r151012. >>>>>> >>>>>> Immediately, users started to complain from various Linux clients >>>>>> that locking was failing. Most of those clients explicitly set their NFS >>>>>> version to 3. I finally isolated that the locking does not fail on NFS v4 >>>>>> and have worked on transition where possible. But presently, no NFS v3 >>>>>> client and successfully lock against OmniOS NFS v3 locking service. I've >>>>>> confirmed that the locking service is running and is present using rpcinfo, >>>>>> matching one for one in services from previous OpenSolaris and Illumos >>>>>> variants. One example from a user: >>>>>> >>>>>> $ strace /bin/tcsh >>>>>> >>>>>> [...] >>>>>> >>>>>> open("/home/REDACTED/.history", O_RDWR|O_CREAT, 0600) = 0 >>>>>> >>>>>> dup(0) = 1 >>>>>> >>>>>> dup(1) = 2 >>>>>> >>>>>> dup(2) = 3 >>>>>> >>>>>> dup(3) = 4 >>>>>> >>>>>> dup(4) = 5 >>>>>> >>>>>> dup(5) = 6 >>>>>> >>>>>> close(5) = 0 >>>>>> >>>>>> close(4) = 0 >>>>>> >>>>>> close(3) = 0 >>>>>> >>>>>> close(2) = 0 >>>>>> >>>>>> close(1) = 0 >>>>>> >>>>>> close(0) = 0 >>>>>> >>>>>> fcntl(6, F_SETFD, FD_CLOEXEC) = 0 >>>>>> >>>>>> fcntl(6, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) >>>>>> >>>>>> >>>>>> HERE fcntl hangs for 1-2 min and finally returns with "-1 ENOLCK (No >>>>>> >>>>>> locks available)" >>>>>> >>>>>> _______________________________________________ >>>>>> OmniOS-discuss mailing list >>>>>> OmniOS-discuss at lists.omniti.com >>>>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss >>>>>> >>>>>> >>>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Wed Jan 28 18:56:40 2015 From: danmcd at omniti.com (Dan McDonald) Date: Wed, 28 Jan 2015 13:56:40 -0500 Subject: [OmniOS-discuss] NFS v3 locking broken in latest OmniOS r151012 and updates In-Reply-To: References: Message-ID: > On Jan 28, 2015, at 1:52 PM, Joe Little wrote: > > Already forwarded to illumos-discuss, and they already have the snoop, and the denied lock segment Thank you for moving it over there. In the future, such a technical problem should go to the developers' list (developer at lists.illumos.org), but Marcel's gotcha covered already on -discuss. Thank you! Dan From lists at marzocchi.net Thu Jan 29 21:13:55 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 29 Jan 2015 22:13:55 +0100 Subject: [OmniOS-discuss] Corrupted ZFS metadata? Message-ID: <5C57DEC9-2FE9-4047-BB09-8B489A425BC1@marzocchi.net> Hello, last week I tried to open some NEF (raw) photos I?m keeping in my OmniOS server, served to my Mac 10.10.x via netatalk 3.1.x and I found that they were corrupted: Lightroom gave me ?unexpected EOF? and also Apple Preview and Nikon ViewNX 2 were telling and showing me corrupted data (good only up to a certain amount of pixels and then nothing). I had the idea of checking my oldest snapshot and I directly copied the oldest version I had onto the current one (raw files are not subject to changes by Lightroom, but I didn?t overwrite the XMP sidecar files with the metadata). It worked, all the photos of that corrupted folder are back again. 
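That kind of restore can be done entirely through the hidden .zfs directory; a minimal sketch, with an illustrative dataset mountpoint and the snapshot name used in the diff below:

$ cd /tank/photos
$ cp -p .zfs/snapshot/daily-2013-08-10-03:15:00/Lightroom/Photos/2010/Takumar/*.NEF Lightroom/Photos/2010/Takumar/

The -p flag keeps the files' original timestamps, and leaving the *.xmp sidecars out of the copy preserves the newer Lightroom metadata.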
This means something went wrong while they were on the OmniOS server, who knows what caused it (OS X and AFP, or OS X and SMB I also used, or ? ). Today I had the idea of doing a diff between the oldest-snapshot (now current) version and the one from one month ago and also from one week ago. $ for i in $(ls Lightroom/Photos/2010/Takumar/*.NEF); do diff .zfs/snapshot/daily-2013-08-10-03\:15\:00/$i .zfs/snapshot/znapzend-2014-12-28-000000/$i; done No differences of any kind. Nothing. This means the file was not corrupted but something else was? Could anyone explain? thanks. Also, is there a way I could check if more photos/files are in the same situation, without checking 10s of thousands of them via Lightroom? Thanks again. Olaf From danmcd at omniti.com Thu Jan 29 21:23:50 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 29 Jan 2015 16:23:50 -0500 Subject: [OmniOS-discuss] Corrupted ZFS metadata? In-Reply-To: <5C57DEC9-2FE9-4047-BB09-8B489A425BC1@marzocchi.net> References: <5C57DEC9-2FE9-4047-BB09-8B489A425BC1@marzocchi.net> Message-ID: <95AEF4F6-CFFC-4B24-8AB1-70B32DBDF616@omniti.com> Try running "zpool scrub" on your pool. Make sure it doesn't indicate anything. Dan From lists at marzocchi.net Thu Jan 29 21:27:31 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 29 Jan 2015 22:27:31 +0100 Subject: [OmniOS-discuss] Corrupted ZFS metadata? In-Reply-To: <95AEF4F6-CFFC-4B24-8AB1-70B32DBDF616@omniti.com> References: <5C57DEC9-2FE9-4047-BB09-8B489A425BC1@marzocchi.net> <95AEF4F6-CFFC-4B24-8AB1-70B32DBDF616@omniti.com> Message-ID: <7E566302-A0D7-4D27-81EB-A22C2A17C00B@marzocchi.net> Scrubs are performed biweekly by a cron job that should send me the output in case of errors. I never got anything and indeed I still see no errors in zpool status. In case it happens again I won?t overwrite the files and I?ll contact the list to investigate further, I suppose it?s late now to do anything. Olaf > Il giorno 29/gen/2015, alle ore 22:23, Dan McDonald ha scritto: > > Try running "zpool scrub" on your pool. Make sure it doesn't indicate anything. > > Dan > From danmcd at omniti.com Thu Jan 29 21:29:28 2015 From: danmcd at omniti.com (Dan McDonald) Date: Thu, 29 Jan 2015 16:29:28 -0500 Subject: [OmniOS-discuss] Corrupted ZFS metadata? In-Reply-To: <7E566302-A0D7-4D27-81EB-A22C2A17C00B@marzocchi.net> References: <5C57DEC9-2FE9-4047-BB09-8B489A425BC1@marzocchi.net> <95AEF4F6-CFFC-4B24-8AB1-70B32DBDF616@omniti.com> <7E566302-A0D7-4D27-81EB-A22C2A17C00B@marzocchi.net> Message-ID: <4DE2883C-F7DC-4610-A3ED-470786E5C58F@omniti.com> > On Jan 29, 2015, at 4:27 PM, Olaf Marzocchi wrote: > > Scrubs are performed biweekly by a cron job that should send me the output in case of errors. I never got anything and indeed I still see no errors in zpool status. > > In case it happens again I won?t overwrite the files and I?ll contact the list to investigate further, I suppose it?s late now to do anything. One last question: which OmniOS release are you running? Dan From lists at marzocchi.net Thu Jan 29 21:32:31 2015 From: lists at marzocchi.net (Olaf Marzocchi) Date: Thu, 29 Jan 2015 22:32:31 +0100 Subject: [OmniOS-discuss] Corrupted ZFS metadata? 
In-Reply-To: <4DE2883C-F7DC-4610-A3ED-470786E5C58F@omniti.com> References: <5C57DEC9-2FE9-4047-BB09-8B489A425BC1@marzocchi.net> <95AEF4F6-CFFC-4B24-8AB1-70B32DBDF616@omniti.com> <7E566302-A0D7-4D27-81EB-A22C2A17C00B@marzocchi.net> <4DE2883C-F7DC-4610-A3ED-470786E5C58F@omniti.com> Message-ID: > Il giorno 29/gen/2015, alle ore 22:29, Dan McDonald ha scritto: > > >> On Jan 29, 2015, at 4:27 PM, Olaf Marzocchi wrote: >> >> Scrubs are performed biweekly by a cron job that should send me the output in case of errors. I never got anything and indeed I still see no errors in zpool status. >> >> In case it happens again I won?t overwrite the files and I?ll contact the list to investigate further, I suppose it?s late now to do anything. > > One last question: which OmniOS release are you running? Latest one, non-bloody: r151012 From sim.ple at live.nl Fri Jan 30 11:22:46 2015 From: sim.ple at live.nl (Randy S) Date: Fri, 30 Jan 2015 12:22:46 +0100 Subject: [OmniOS-discuss] omnios kvm options Message-ID: Hi, Does anyone know if it's possible to give a vm (e.g ubuntu) created in kvm on omniosr12 direct access to the sas disks (e.g. enclosure), making it possible for the vm to e.g. check health, apply configurations etc. This is a test project. If so, an example of how to achieve this would be nice. Thanx in advance. R -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexmcwhirter at vantagetitle.com Fri Jan 30 15:24:03 2015 From: alexmcwhirter at vantagetitle.com (Alex McWhirter) Date: Fri, 30 Jan 2015 10:24:03 -0500 Subject: [OmniOS-discuss] omnios kvm options In-Reply-To: References: Message-ID: As long as illumos-kvm is fairly up to date with the official KVM repo then something like this should do. http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM But keep in mind, that passes through an entire PCI device. So you will have to pass over the entire RAID controller, not just one enclosure attached to it. You also cannot pass over the controller that you are booting from. So you will need a secondary controller if you don?t already have one. > On Jan 30, 2015, at 6:22 AM, Randy S wrote: > > Hi, > > Does anyone know if it's possible to give a vm (e.g ubuntu) created in kvm on omniosr12 direct access to the sas disks (e.g. enclosure), making it possible for the vm to e.g. check health, apply configurations etc. This is a test project. > > If so, an example of how to achieve this would be nice. > > Thanx in advance. > > R > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From danmcd at omniti.com Fri Jan 30 15:59:08 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 30 Jan 2015 10:59:08 -0500 Subject: [OmniOS-discuss] omnios kvm options In-Reply-To: References: Message-ID: > On Jan 30, 2015, at 10:24 AM, Alex McWhirter wrote: > > As long as illumos-kvm is fairly up to date with the official KVM repo then something like this should do. > > http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM > > But keep in mind, that passes through an entire PCI device. So you will have to pass over the entire RAID controller, not just one enclosure attached to it. The KVM in OmniOS is the same one in SmartOS, and the kvm-cmd is a few revs behind because some illumos changes haven't been upstreamed yet. 
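As an alternative to passing through the whole controller, a disk can also be handed to the guest as a raw block device. A rough sketch of the relevant qemu arguments follows; the binary name and the disk path are placeholders, and the -drive options are plain upstream QEMU syntax, so check what the OmniOS kvm command actually accepts:

# qemu-system-x86_64 -m 2048 -enable-kvm \
    -drive file=/dev/rdsk/cXtYYYYYYYYYYYYYYYYdZp0,if=virtio,format=raw,cache=none \
    ... (NIC, VNC and boot arguments as usual)

The guest then sees an ordinary virtio disk, so SES enclosure management and SMART-style health queries still won't work from inside the VM; for that the HBA itself has to be passed through, as described above.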
You should consult SmartOS documention about KVM and raw disks, it'll apply to OmniOS as well. Dan From sim.ple at live.nl Fri Jan 30 19:02:00 2015 From: sim.ple at live.nl (Randy S) Date: Fri, 30 Jan 2015 20:02:00 +0100 Subject: [OmniOS-discuss] omnios kvm options In-Reply-To: References: , Message-ID: Thanks guys ! Will check it out. R > Subject: Re: [OmniOS-discuss] omnios kvm options > From: danmcd at omniti.com > Date: Fri, 30 Jan 2015 10:59:08 -0500 > CC: sim.ple at live.nl; omnios-discuss at lists.omniti.com > To: alexmcwhirter at vantagetitle.com > > > > On Jan 30, 2015, at 10:24 AM, Alex McWhirter wrote: > > > > As long as illumos-kvm is fairly up to date with the official KVM repo then something like this should do. > > > > http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM > > > > But keep in mind, that passes through an entire PCI device. So you will have to pass over the entire RAID controller, not just one enclosure attached to it. > > The KVM in OmniOS is the same one in SmartOS, and the kvm-cmd is a few revs behind because some illumos changes haven't been upstreamed yet. > > You should consult SmartOS documention about KVM and raw disks, it'll apply to OmniOS as well. > > Dan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hakansom at ohsu.edu Fri Jan 30 19:46:47 2015 From: hakansom at ohsu.edu (Marion Hakanson) Date: Fri, 30 Jan 2015 11:46:47 -0800 Subject: [OmniOS-discuss] mpt_sas & ixgbe kernel buffer alloc failures Message-ID: <201501301946.t0UJklu9007386@kyklops.ohsu.edu> Greetings, We've had a tough one to fix here, and I'm hoping for some help. This is an OmniOS-151012 system, and it started suffering ZFS pool lockups under busy I/O conditions a couple months ago. The server is an Intel S2600WP server (128GB RAM), and a Quanta M4600H JBOD, connected via an LSI SAS 9206-16e HBA (with four SAS cables in a multipath arrangement). The JBOD has 60x 4TB drives, configured as five, 12-drive raidz3 vdevs in one pool. Initial testing of this system had been done with a 9207-8e (two SAS cables), and it was stable under as heavy of a load I could generate using filebench, etc. We updated the 9206-16e to LSI's P19 firmware (was P17), to match what had been on the 9207-8e for the stress testing, but the problem remained. We then took two of the four SAS cables out of service, and at that time a scrub passed without further lockups. However as time has passed, the lockups returned, giving the usual device timeouts, and also now warning that the HBA's firmware signature was invalid. A reboot and power-cycle of the JBOD were necessary to get things completely unstuck (and the firmware signature warnings disappeared too). Suspecting a faulty HBA, we swapped back in the known-good 9206-8e (and two SAS cables), but now we're facing what looks like a different recurring problem. Rather than SAS timeouts, we've started seeing errors logged like these: ixgbe: [ID 611667 kern.info] NOTICE: ixgbe1: ixgbe_rx_copy: allocate buffer failed . . . scsi: [ID 107833 kern.warning] WARNING: /pci at 0,0/pci 8086,e04 at 2/pci1000,3040 at 0 (mpt_sas0): Unable to allocate dma memory for extra SGL. scsi: [ID 107833 kern.warning] WARNING: /pci at 0,0/pci 8086,e04 at 2/pci1000,3040 at 0 (mpt_sas0): MPT SGL mem alloc failed The system is not completely hung, and existing login sessions continue to work, but new login sessions cannot be established (no network response). 
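Before the box goes fully unresponsive, it's worth seeing which kernel caches are holding the memory; a quick sketch using the stock mdb dcmd (the output is long, and the columns to watch are "mem in use" and "alloc fail"):

# echo ::kmastat | mdb -k

The coarser per-consumer breakdown from ::memstat is shown below.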
A scrub on the pool has produced lots of checksum errors as well, though it's hard to know if these are real checksum errors (perhaps from earlier hang/reboot incidents), or ones induced by the lack of DMA buffers, etc. We've tried some different BIOS settings in an attempt to give more memory-mapped device address space. Enabling "memory-mapped devices above 4GB" unfortunately causes the dual Intel X540 10GbE NIC's to not be attached by the OS, and produces these kernel warnings at boot: NOTICE: unsupported 64-bit prefetch memory on pci-pci bridge [0/3/2] NOTICE: unsupported 64-bit prefetch memory on pci-pci bridge [0/17/0] Enabling "Mmaximize memory below 4GB" doesn't cause any problems, but did not alleviate the problem either. Here's a "::memstat" dump, after the dma memory allocation errors have started showing up: # echo "::memstat" | mdb -k Page Summary Pages MB %Tot ------------ ---------------- ---------------- ---- Kernel 25477565 99521 76% ZFS File Data 6275294 24512 19% Anon 19707 76 0% Exec and libs 724 2 0% Page cache 4292 16 0% Free (cachelist) 7535 29 0% Free (freelist) 1747133 6824 5% Total 33532250 130985 Physical 33532249 130985 # Is it just me, or is that an awful lot of kernel memory in use? Regards, Marion From danmcd at omniti.com Fri Jan 30 20:34:06 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 30 Jan 2015 15:34:06 -0500 Subject: [OmniOS-discuss] mpt_sas & ixgbe kernel buffer alloc failures In-Reply-To: <201501301946.t0UJklu9007386@kyklops.ohsu.edu> References: <201501301946.t0UJklu9007386@kyklops.ohsu.edu> Message-ID: > On Jan 30, 2015, at 2:46 PM, Marion Hakanson wrote: > > Here's a "::memstat" dump, after the dma memory allocation errors > have started showing up: > > # echo "::memstat" | mdb -k > Page Summary Pages MB %Tot > ------------ ---------------- ---------------- ---- > Kernel 25477565 99521 76% > ZFS File Data 6275294 24512 19% > Anon 19707 76 0% > Exec and libs 724 2 0% > Page cache 4292 16 0% > Free (cachelist) 7535 29 0% > Free (freelist) 1747133 6824 5% > > Total 33532250 130985 > Physical 33532249 130985 > # > > > Is it just me, or is that an awful lot of kernel memory in use? That does seem to be a lot. What does ::kmausers say? (It may be a lot of output...) Also, is it possible for you to: 1.) Put this line in /etc/system: set kmem_flags=0xf 2.) Reboot your system. 3.) When you encounter this situation again (it may happen sooner with memory debugging enabled), utter "reboot -d" to get a kernel core dump for memory leak analysis? Thanks, Dan From youzhong at gmail.com Fri Jan 30 20:42:46 2015 From: youzhong at gmail.com (Youzhong Yang) Date: Fri, 30 Jan 2015 15:42:46 -0500 Subject: [OmniOS-discuss] mpt_sas & ixgbe kernel buffer alloc failures In-Reply-To: References: <201501301946.t0UJklu9007386@kyklops.ohsu.edu> Message-ID: Not sure if this is the same issue. We experienced kma problem when ipnet driver is loaded. Basically ipnet is a memory eater. 
Adding the following lines in /etc/system exclude:drv/bpf exclude:drv/ipnet On Fri, Jan 30, 2015 at 3:34 PM, Dan McDonald wrote: > > > On Jan 30, 2015, at 2:46 PM, Marion Hakanson wrote: > > > > Here's a "::memstat" dump, after the dma memory allocation errors > > have started showing up: > > > > # echo "::memstat" | mdb -k > > Page Summary Pages MB %Tot > > ------------ ---------------- ---------------- ---- > > Kernel 25477565 99521 76% > > ZFS File Data 6275294 24512 19% > > Anon 19707 76 0% > > Exec and libs 724 2 0% > > Page cache 4292 16 0% > > Free (cachelist) 7535 29 0% > > Free (freelist) 1747133 6824 5% > > > > Total 33532250 130985 > > Physical 33532249 130985 > > # > > > > > > Is it just me, or is that an awful lot of kernel memory in use? > > That does seem to be a lot. > > What does ::kmausers say? (It may be a lot of output...) > > Also, is it possible for you to: > > 1.) Put this line in /etc/system: > > set kmem_flags=0xf > > 2.) Reboot your system. > > 3.) When you encounter this situation again (it may happen sooner with > memory debugging enabled), utter "reboot -d" to get a kernel core dump for > memory leak analysis? > > Thanks, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hakansom at ohsu.edu Fri Jan 30 21:32:41 2015 From: hakansom at ohsu.edu (Marion Hakanson) Date: Fri, 30 Jan 2015 13:32:41 -0800 Subject: [OmniOS-discuss] mpt_sas & ixgbe kernel buffer alloc failures In-Reply-To: Message from Dan McDonald of "Fri, 30 Jan 2015 15:34:06 EST." Message-ID: <201501302132.t0ULWfuD007572@kyklops.ohsu.edu> Thanks for the suggestions. Say, I'm due a couple updates. Do you know what are in these? Changed packages: omnios developer/debug/mdb 0.5.11,5.11-0.151012:20140913T033502Z -> 0.5.11,5.11-0.151012:20150119T1703 19Z driver/storage/mpt_sas 0.5.11,5.11-0.151012:20140913T033536Z -> 0.5.11,5.11-0.151012:20150119T1703 46Z . . . system/kernel 0.5.11,5.11-0.151012:20140913T033629Z -> 0.5.11,5.11-0.151012:20141209T0247 21Z Responses interspersed.... danmcd at omniti.com said: > What does ::kmausers say? (It may be a lot of output...) Not so much: mdb: KMF_AUDIT is not enabled for any caches danmcd at omniti.com said: > Also, is it possible for you to: > 1.) Put this line in /etc/system: > set kmem_flags=0xf > 2.) Reboot your system. > 3.) When you encounter this situation again (it may happen sooner with memory > debugging enabled), utter "reboot -d" to get a kernel core dump for memory > leak analysis? I've got the kmem_flags setting queued up in /etc/system now. Getting a crash dump may be a challenge, since the dump area is on the ZFS JBOD attached by the LSI SAS HBA (root is on a 16GB SATADOM unit). Might work, or we can try putting dump on a USB drive. youzhong at gmail.com said: > Not sure if this is the same issue. We experienced kma problem when ipnet > driver is loaded. Basically ipnet is a memory eater. > > Adding the following lines in /etc/system > > exclude:drv/bpf > exclude:drv/ipnet I've added those too. About to go reboot, the thing is starting to go unresponsive again.... 
Thanks and regards, Marion On Fri, Jan 30, 2015 at 3:34 PM, Dan McDonald wrote: > > > On Jan 30, 2015, at 2:46 PM, Marion Hakanson wrote: > > > > Here's a "::memstat" dump, after the dma memory allocation errors > > have started showing up: > > > > # echo "::memstat" | mdb -k > > Page Summary Pages MB %Tot > > ------------ ---------------- ---------------- ---- > > Kernel 25477565 99521 76% > > ZFS File Data 6275294 24512 19% > > Anon 19707 76 0% > > Exec and libs 724 2 0% > > Page cache 4292 16 0% > > Free (cachelist) 7535 29 0% > > Free (freelist) 1747133 6824 5% > > > > Total 33532250 130985 > > Physical 33532249 130985 > > # > > > > > > Is it just me, or is that an awful lot of kernel memory in use? > > That does seem to be a lot. > > What does ::kmausers say? (It may be a lot of output...) > > Also, is it possible for you to: > > 1.) Put this line in /etc/system: > > set kmem_flags=0xf > > 2.) Reboot your system. > > 3.) When you encounter this situation again (it may happen sooner with > memory debugging enabled), utter "reboot -d" to get a kernel core dump for > memory leak analysis? > > Thanks, > Dan > > _______________________________________________ > OmniOS-discuss mailing list > OmniOS-discuss at lists.omniti.com > http://lists.omniti.com/mailman/listinfo/omnios-discuss > From danmcd at omniti.com Fri Jan 30 21:57:17 2015 From: danmcd at omniti.com (Dan McDonald) Date: Fri, 30 Jan 2015 16:57:17 -0500 Subject: [OmniOS-discuss] mpt_sas & ixgbe kernel buffer alloc failures In-Reply-To: <201501302132.t0ULWfuD007572@kyklops.ohsu.edu> References: <201501302132.t0ULWfuD007572@kyklops.ohsu.edu> Message-ID: <63BDFF19-6975-46F9-B66E-CDA7100EB96F@omniti.com> > On Jan 30, 2015, at 4:32 PM, Marion Hakanson wrote: > > Thanks for the suggestions. > > Say, I'm due a couple updates. Do you know what are in these? > > Changed packages: > omnios > developer/debug/mdb > 0.5.11,5.11-0.151012:20140913T033502Z -> 0.5.11,5.11-0.151012:20150119T1703 > 19Z > driver/storage/mpt_sas > 0.5.11,5.11-0.151012:20140913T033536Z -> 0.5.11,5.11-0.151012:20150119T1703 > 46Z > . . . > system/kernel > 0.5.11,5.11-0.151012:20140913T033629Z -> 0.5.11,5.11-0.151012:20141209T0247 > 21Z Five commits after 012 shipped: https://github.com/omniti-labs/illumos-omnios/commits/r151012 One's a security fix, the other are mpt_sas improvements. Dan From hakansom at ohsu.edu Fri Jan 30 22:01:36 2015 From: hakansom at ohsu.edu (Marion Hakanson) Date: Fri, 30 Jan 2015 14:01:36 -0800 Subject: [OmniOS-discuss] mpt_sas & ixgbe kernel buffer alloc failures In-Reply-To: Message from Marion Hakanson of "Fri, 30 Jan 2015 13:32:41 PST." Message-ID: <201501302201.t0UM1aJO007661@kyklops.ohsu.edu> youzhong at gmail.com said: > Not sure if this is the same issue. We experienced kma problem when ipnet > driver is loaded. Basically ipnet is a memory eater. > > Adding the following lines in /etc/system > > exclude:drv/bpf > exclude:drv/ipnet Say, if I disable "ipnet", doesn't that mean "dladm" will no longer work? I do see our 10GbE NIC's listed in /dev/ipnet/, and they are configured with static IP's using "ipadm". # ls -l /dev/ipnet total 0 crw-rw-rw- 1 root sys 64, 3 Jan 30 13:55 ixgbe0 crw-rw-rw- 1 root sys 64, 4 Jan 30 13:55 ixgbe1 crw-rw-rw- 1 root sys 64, 2 Jan 30 13:55 lo0 # dladm show-link LINK CLASS MTU STATE BRIDGE OVER igb0 phys 1500 unknown -- -- igb1 phys 1500 unknown -- -- ixgbe0 phys 1500 up -- -- ixgbe1 phys 1500 up -- -- # Regards, Marion