From steve at linuxsuite.org  Tue Apr  1 14:17:40 2014
From: steve at linuxsuite.org (steve at linuxsuite.org)
Date: Tue, 1 Apr 2014 10:17:40 -0400
Subject: [OmniOS-discuss] How to disable ata module / driver at boot
In-Reply-To: <201403312331.s2VNVOIW011926@elvis.arl.psu.edu>
References: <7409d33d8efc08eccda1cecdc31bd7ea.squirrel@emailmg.netfirms.com>
	<201403312331.s2VNVOIW011926@elvis.arl.psu.edu>
Message-ID: <6cda07987dc35bb6735ccd08af13f165.squirrel@emailmg.netfirms.com>

> In message
> <7409d33d8efc08eccda1cecdc31bd7ea.squirrel at emailmg.netfirms.com>,
> steve at linuxsuite.org writes:
>> May not be related, but I would like to reboot so that OmniOS does not
>> see the device by not loading the driver / module. I do not need the
>> device after system install.
>
> disable-ata=true
> <URL:http://permalink.gmane.org/gmane.os.solaris.opensolaris.indiana/8851>
>

        Thanks. Is there an entry that can be put into /etc/system that
will prevent the module from loading also?

       -steve

> John
> groenveld at acm.org
>


From jdg117 at elvis.arl.psu.edu  Tue Apr  1 15:18:07 2014
From: jdg117 at elvis.arl.psu.edu (John D Groenveld)
Date: Tue, 01 Apr 2014 11:18:07 -0400
Subject: [OmniOS-discuss] How to disable ata module / driver at boot
In-Reply-To: Your message of "Tue, 01 Apr 2014 10:17:40 EDT."
	<6cda07987dc35bb6735ccd08af13f165.squirrel@emailmg.netfirms.com> 
References: <7409d33d8efc08eccda1cecdc31bd7ea.squirrel@emailmg.netfirms.com>
	<201403312331.s2VNVOIW011926@elvis.arl.psu.edu>
	<6cda07987dc35bb6735ccd08af13f165.squirrel@emailmg.netfirms.com>
Message-ID: <201404011518.s31FI7LJ021915@elvis.arl.psu.edu>

In message <6cda07987dc35bb6735ccd08af13f165.squirrel at emailmg.netfirms.com>,
steve at linuxsuite.org writes:
> Thanks. Is there an entry that can be put into /etc/system that
> will prevent the module from loading also?

	exclude: ata 

John
groenveld at acm.org
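
For reference, a minimal Python sketch of how one might list the modules
that an /etc/system file excludes. It assumes the "exclude: <module>" form
shown above and that comment lines start with "*" or "#". The function name
excluded_modules and the sample text are hypothetical and not part of any
illumos tool.

```python
def excluded_modules(system_text):
    """Return module names named by 'exclude:' lines in /etc/system-style text."""
    modules = []
    for raw in system_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (assumed to start with '*' or '#').
        if not line or line[0] in "*#":
            continue
        keyword, sep, value = line.partition(":")
        if sep and keyword.strip() == "exclude" and value.strip():
            modules.append(value.strip())
    return modules


# Hypothetical sample contents, not taken from a real system.
SAMPLE = """\
* keep the ata driver from loading
exclude: ata
set zfs:zfs_arc_max = 0x100000000
"""

print(excluded_modules(SAMPLE))
```

Changes to /etc/system take effect only at the next boot.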

From groups at tierarzt-mueller.de  Wed Apr  2 11:47:52 2014
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Wed, 2 Apr 2014 13:47:52 +0200
Subject: [OmniOS-discuss] Bad kernel fault at addr=0xffffff04f8e08b88
Message-ID: <8210579099.20140402134752@tierarzt-mueller.de>

Hello All

I had a kernel panic and don't know what happened.

Message on console:
Apr  2 12:19:42 aio fmd: [ID 377184 daemon.error] SUNW-MSG-ID: SUNOS-8000-KL, TYPE: Defect, VER: 1, SEVERITY: Major
Apr  2 12:19:42 aio EVENT-TIME: Mi. Apr  2 12:19:42 CEST 2014
Apr  2 12:19:42 aio PLATFORM: VMware-Virtual-Platform, CSN: VMware-56-4d-8a-b3-c5-36-3b-b8-27-ef-49-0b-c8-94-81-50, HOSTNAME: aio
Apr  2 12:19:42 aio SOURCE: software-diagnosis, REV: 0.1
Apr  2 12:19:42 aio EVENT-ID: 1630fc26-9694-e811-803c-956e16302b39
Apr  2 12:19:42 aio DESC: The system has rebooted after a kernel panic.  Refer to http://illumos.org/msg/SUNOS-8000-KL for more information.
Apr  2 12:19:42 aio AUTO-RESPONSE: The failed system image was dumped to the dump device.  If savecore is enabled (see dumpadm(1M)) a copy of the dump will be written to the savecore directory /var/crash/unknown.
Apr  2 12:19:42 aio IMPACT: There may be some performance impact while the panic is copied to the savecore directory.  Disk space usage by panics can be substantial.
Apr  2 12:19:42 aio REC-ACTION: If savecore is not enabled then please take steps to preserve the crash image.
Apr  2 12:19:42 aio Use 'fmdump -Vp -u 1630fc26-9694-e811-803c-956e16302b39' to view more panic detail.  Please refer to the knowledge article for addi

But what is the defect?

Apr  2 12:19:44 aio ^Mpanic[cpu1]/thread=ffffff04ebd5b840:
Apr  2 12:19:44 aio genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff001f591630 addr=ffffff04f8e08b88
Apr  2 12:19:44 aio unix: [ID 100000 kern.notice] 
Apr  2 12:19:44 aio unix: [ID 839527 kern.notice] nc: 
Apr  2 12:19:44 aio unix: [ID 753105 kern.notice] #pf Page fault
Apr  2 12:19:44 aio unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xffffff04f8e08b88
Apr  2 12:19:44 aio unix: [ID 243837 kern.notice] pid=10842, pc=0xfffffffffbb34880, sp=0xffffff001f591720, eflags=0x10282
Apr  2 12:19:44 aio unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 406b8<osxsav,xmme,fxsr,pge,pae,pse,de>
Apr  2 12:19:44 aio unix: [ID 624947 kern.notice] cr2: ffffff04f8e08b88
Apr  2 12:19:44 aio unix: [ID 625075 kern.notice] cr3: 436a09000
Apr  2 12:19:44 aio unix: [ID 625715 kern.notice] cr8: c
Apr  2 12:19:44 aio unix: [ID 100000 kern.notice] 
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]       rdi: ffffff001f5917b8 rsi: ffffff04f8e08b88 rdx:          80bd000
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]       rcx:                0  r8:                0  r9:                2
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]       rax:         ffffffff rbx: ffffff04edba0408 rbp: ffffff001f591730
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]       r10: fffffffffbcf3500 r11:                0 r12: ffffff001f5917b8
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]       r13: ffffff04f8e08ba8 r14: ffffff04f8e08b88 r15:               20
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]       fsb:                0 gsb: ffffff04ea0e1580  ds:               4b
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]        es:               4b  fs:                0  gs:              1c3
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]       trp:                e err:                0 rip: fffffffffbb34880
Apr  2 12:19:44 aio unix: [ID 592667 kern.notice]        cs:               30 rfl:            10282 rsp: ffffff001f591720
Apr  2 12:19:44 aio unix: [ID 266532 kern.notice]        ss:               38
Apr  2 12:19:44 aio unix: [ID 100000 kern.notice] 
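
As a reading aid for the register dump above: "trp: e" is trap 14, the x86
page fault, and "err" is the page-fault error code. The Python sketch below
decodes the low error-code bits as defined by the x86 architecture. The
function name decode_pf_error is made up for illustration.

```python
def decode_pf_error(err):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if err & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if err & 0x2 else "read",
        # bit 2: 0 = supervisor (kernel) mode, 1 = user mode
        "mode": "user" if err & 0x4 else "kernel",
        # bit 4: 1 = fault on an instruction fetch
        "instruction_fetch": bool(err & 0x10),
    }


# Values from the dump above: trp = e, err = 0
print(int("e", 16))          # trap number in decimal
print(decode_pf_error(0x0))
```

With err = 0 this reads as a kernel-mode read of a page that was not
present, at the address held in cr2.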

What was I doing? The system was running a replication job (napp-it) to
back up a filesystem from aio_server to backup_server.

Any hints?

-- 
Best Regards
Alexander
April, 02 2014
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kernelcrash.PNG
Type: image/png
Size: 141570 bytes
Desc: not available
URL: <https://omniosce.org/ml-archive/attachments/20140402/2e2e8dd6/attachment-0001.png>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: messages.txt
URL: <https://omniosce.org/ml-archive/attachments/20140402/2e2e8dd6/attachment-0001.txt>

From ben at fluffy.co.uk  Wed Apr  2 12:01:04 2014
From: ben at fluffy.co.uk (Ben Summers)
Date: Wed, 2 Apr 2014 13:01:04 +0100
Subject: [OmniOS-discuss] Bad kernel fault at addr=0xffffff04f8e08b88
In-Reply-To: <8210579099.20140402134752@tierarzt-mueller.de>
References: <8210579099.20140402134752@tierarzt-mueller.de>
Message-ID: <204F8923-878C-4520-869D-07951FAFE6EB@fluffy.co.uk>


Alexander

I note this is a VMware VM. If you install VMware tools, you will get crashes when you power off the VM in some versions of VMware.

Which VMware are you using? And can you try it without the "VMware Host-Guest Filesystem" and "vmblock" features?

Ben



--
http://bens.me.uk


From dswartz at druber.com  Wed Apr  2 12:13:30 2014
From: dswartz at druber.com (Dan Swartzendruber)
Date: Wed, 02 Apr 2014 08:13:30 -0400
Subject: [OmniOS-discuss] Bad kernel fault at addr=0xffffff04f8e08b88
Message-ID: <hl2pasmm82socvki2ad8vcs0.1396440810905@email.android.com>

I have had this with current omnios and esxi 5.1 but not guest vs.  Haven't tried with esxi 5.5 yet.

Ben Summers <ben at fluffy.co.uk> wrote:

>
>Alexander
>
>I note this is a VMware VM. If you install VMware tools, you will get crashes when you power off the VM in some versions of VMware.
>
>Which VMware are you using? And can you try it without the "VMware Host-Guest Filesystem" and "vmblock" features?
>
>Ben

From groups at tierarzt-mueller.de  Wed Apr  2 14:46:50 2014
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Wed, 2 Apr 2014 16:46:50 +0200
Subject: [OmniOS-discuss] Bad kernel fault at addr=0xffffff04f8e08b88
In-Reply-To: <204F8923-878C-4520-869D-07951FAFE6EB@fluffy.co.uk>
References: <8210579099.20140402134752@tierarzt-mueller.de>
	<204F8923-878C-4520-869D-07951FAFE6EB@fluffy.co.uk>
Message-ID: <1116464220.20140402164650@tierarzt-mueller.de>

Hello Ben Summers and List,

On April, 02 2014, 14:01 <Ben Summers> wrote in [1]:

> I note this is a VMware VM. If you install VMware tools, you will
> get crashes when you power off the VM in some versions of VMware.
> Which VMware are you using?
Yes, OmniOS is running on ESXi 5.5 with two vmxnet3 vNICs.
But I did not power off the VM; I was copying a snapshot from the VM to my
standalone OmniOS backup server (napp-it/Job/replicate).

> And can you try it without the "VMware
> Host-Guest Filesystem" and "vmblock" features?

I don't understand what I have to do.
Could you explain it to me, please?

-- 
Best Regards
Alexander
April, 02 2014
........
[1] mid:204F8923-878C-4520-869D-07951FAFE6EB at fluffy.co.uk
........


From groups at tierarzt-mueller.de  Wed Apr  2 14:54:36 2014
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Wed, 2 Apr 2014 16:54:36 +0200
Subject: [OmniOS-discuss] Bad kernel fault at addr=0xffffff04f8e08b88
In-Reply-To: <hl2pasmm82socvki2ad8vcs0.1396440810905@email.android.com>
References: <hl2pasmm82socvki2ad8vcs0.1396440810905@email.android.com>
Message-ID: <1877857159.20140402165436@tierarzt-mueller.de>

Hello Ben, Dan and List,

I forgot to send the output from fmdump:

root at aio:~# fmdump -Vp -u 1630fc26-9694-e811-803c-956e16302b39
TIME                           UUID                                 SUNW-MSG-ID
Apr 02 2014 12:19:42.085291000 1630fc26-9694-e811-803c-956e16302b39 SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Apr 02 12:19:42.0809 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Apr 02 12:20:24.9371 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 1630fc26-9694-e811-803c-956e16302b39
        code = SUNOS-8000-KL
        diag-time = 1396433982 81733
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.1630fc26-9694-e811-803c-956e16302b39
                resource = sw:///:path=/var/crash/unknown/.1630fc26-9694-e811-803c-956e16302b39
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.1
                os-instance-uuid = 1630fc26-9694-e811-803c-956e16302b39
                panicstr = BAD TRAP: type=e (#pf Page fault) rp=ffffff001f591630 addr=ffffff04f8e08b88
                panicstack = unix:real_mode_stop_cpu_stage2_end+9de3 () | unix:trap+db3 () | unix:cmntrap+e6 () | genunix:as_segcompar+10 () | genunix:avl_find+72 () | genunix:as_segat+3d () | genunix:as_fault+27a () | unix:pagefault+96 () | unix:trap+d23 () | unix:cmntrap+e6 () | unix:bcopy_altentry+55a () | genunix:uiomove+f8 () | fifofs:fifo_read+192 () | genunix:fop_read+5b () | genunix:read+2a7 () | genunix:read32+1e () | unix:brand_sys_sysenter+1c9 () |
                crashtime = 1396433985
                panic-time =  2. April 2014 12:19:45 CEST CEST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x533be43e 0x5156ff8

root at aio:~#

Hope it helps to help me. :-)
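
A small Python sketch for reading two fields of the fmdump output above: it
splits the panicstack string into frames and converts crashtime from epoch
seconds to UTC. The strings are copied from the output; the helper name
split_panicstack is hypothetical.

```python
from datetime import datetime, timezone

PANICSTACK = (
    "unix:real_mode_stop_cpu_stage2_end+9de3 () | unix:trap+db3 () | "
    "unix:cmntrap+e6 () | genunix:as_segcompar+10 () | genunix:avl_find+72 () | "
    "genunix:as_segat+3d () | genunix:as_fault+27a () | unix:pagefault+96 () | "
    "unix:trap+d23 () | unix:cmntrap+e6 () | unix:bcopy_altentry+55a () | "
    "genunix:uiomove+f8 () | fifofs:fifo_read+192 () | genunix:fop_read+5b () | "
    "genunix:read+2a7 () | genunix:read32+1e () | unix:brand_sys_sysenter+1c9 () |"
)
CRASHTIME = 1396433985


def split_panicstack(panicstack):
    """Split a panicstack string into (module, function, hex offset) tuples."""
    frames = []
    for entry in panicstack.split("|"):
        entry = entry.replace("()", "").strip()
        if not entry:
            continue
        module, _, rest = entry.partition(":")
        function, _, offset = rest.partition("+")
        frames.append((module, function, offset))
    return frames


# Print one frame per line, most recent call first (as in the original string).
for module, function, offset in split_panicstack(PANICSTACK):
    print(f"{module:8} {function}+0x{offset}")

print(datetime.fromtimestamp(CRASHTIME, tz=timezone.utc).isoformat())
```

Read from the bottom up, the frames suggest the thread was in read() on a
FIFO (fifofs:fifo_read) and copying data with uiomove when the fault
occurred. The converted crashtime, 10:19:45 UTC, matches the 12:19:45 CEST
panic-time.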

-- 
Best Regards
Alexander
April, 02 2014


From dswartz at druber.com  Fri Apr  4 19:58:05 2014
From: dswartz at druber.com (Dan Swartzendruber)
Date: Fri, 4 Apr 2014 15:58:05 -0400
Subject: [OmniOS-discuss] Installing local packages?
In-Reply-To: <1393535433.707.6.camel@exilis.si-consulting.us>
References: <CAEc-0iWuU5+O1B+E5yo+APFHK1WMX5iBYws9smiut23bb4Q2pA@mail.gmail.com>
	<1393535433.707.6.camel@exilis.si-consulting.us>
Message-ID: <3b67171d9cc159088bcb71bc9862f73a.squirrel@webmail.druber.com>


Okay, this has got to be something incredibly obvious and stupid, but I've
been looking at it for an hour and am stumped.  I wanted to try a
pacemaker active/passive cluster using omnios.  I found saso's guide on
zfs-create.blogspot.com.  So I downloaded, bunzipped and untarred his
archive.  He says to install the prebuilt packages in the
prebuilt_packages subdir using the pkgadd command.  No matter what I do, I
can't get this to work.  The four packages are all gzipped, but after
copying them to /var/spool/pkg, whether I gunzip them or not, I get:

pkgadd: ERROR: no packages were found in </var/spool/pkg>

Yet:

root at vsa3:/var/spool/pkg# ls -l
total 65461
-rw-r--r-- 1 root root 17152000 Apr  4 15:56 CNCclusterglue.pkg
-rw-r--r-- 1 root root 13154816 Apr  4 15:56 CNCheartbeat.pkg
-rw-r--r-- 1 root root 39056896 Apr  4 15:56 CNCpacemaker.pkg
-rw-r--r-- 1 root root  1488896 Apr  4 15:56 CNCrsrcagents.pkg

FWIW, if I try 'pkgadd -d .' while in the prebuilt_packages subdir, I get
the same message, only referring to that subdir.  Google has been 1000%
useless for resolving this.  Any tips would be much appreciated.  Thanks!


From danmcd at omniti.com  Fri Apr  4 20:07:03 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Fri, 4 Apr 2014 16:07:03 -0400
Subject: [OmniOS-discuss] Installing local packages?
In-Reply-To: <3b67171d9cc159088bcb71bc9862f73a.squirrel@webmail.druber.com>
References: <CAEc-0iWuU5+O1B+E5yo+APFHK1WMX5iBYws9smiut23bb4Q2pA@mail.gmail.com>
	<1393535433.707.6.camel@exilis.si-consulting.us>
	<3b67171d9cc159088bcb71bc9862f73a.squirrel@webmail.druber.com>
Message-ID: <27167B29-1E7A-4DE2-B4A8-BBA3A3FF09AF@omniti.com>


On Apr 4, 2014, at 3:58 PM, Dan Swartzendruber <dswartz at druber.com> wrote:

> 
> Okay, this has got to be something incredibly obvious and stupid, but I've
> been looking at it for an hour and am stumped.  I wanted to try a
> pacemaker active/passive cluster using omnios.  I found saso's guide on
> zfs-create.blogspot.com.  So I downloaded, bunzipped and untarred his
> archive.  He says to install the prebuild packages in the
> prebuilt_packages subdir using the pkgadd command.  No matter what I do, I
> can't get this to work.  The four packages are all gzipped, but after
> copying them to /var/spool/pkg, whether I gunzip them or not, I get:
> 
> pkgadd: ERROR: no packages were found in </var/spool/pkg>
> 
> Yet:
> 
> root at vsa3:/var/spool/pkg# ls -l
> total 65461
> -rw-r--r-- 1 root root 17152000 Apr  4 15:56 CNCclusterglue.pkg
> -rw-r--r-- 1 root root 13154816 Apr  4 15:56 CNCheartbeat.pkg
> -rw-r--r-- 1 root root 39056896 Apr  4 15:56 CNCpacemaker.pkg
> -rw-r--r-- 1 root root  1488896 Apr  4 15:56 CNCrsrcagents.pkg
> 
> FWIW, if I try 'pkgadd -d .' while in the prebuild_packages subdir, I get
> the same message, only referring to that subdir.

pkgadd -d CNCclusterglue.pkg

That should give you whatever's in that .pkg file.  Repeat with the other .pkg files.

Dan


From dswartz at druber.com  Fri Apr  4 20:14:19 2014
From: dswartz at druber.com (Dan Swartzendruber)
Date: Fri, 4 Apr 2014 16:14:19 -0400
Subject: [OmniOS-discuss] Installing local packages?
In-Reply-To: <27167B29-1E7A-4DE2-B4A8-BBA3A3FF09AF@omniti.com>
References: <CAEc-0iWuU5+O1B+E5yo+APFHK1WMX5iBYws9smiut23bb4Q2pA@mail.gmail.com>
	<1393535433.707.6.camel@exilis.si-consulting.us>
	<3b67171d9cc159088bcb71bc9862f73a.squirrel@webmail.druber.com>
	<27167B29-1E7A-4DE2-B4A8-BBA3A3FF09AF@omniti.com>
Message-ID: <bec81e0101d4de6a7f39dc75e7438da2.squirrel@webmail.druber.com>

> FWIW, if I try 'pkgadd -d .' while in the prebuilt_packages subdir, I get
> the same message, only referring to that subdir.
>
> pkgadd -d CNCclusterglue.pkg
>
> That should give you whateever's in that .pkg file.  Repeat with the other
> .pkg files.

Ah, okay, that's got it, thanks!  Kinda puzzled by the manpage which seems
to be telling me if I do 'pkgadd' with no arguments, it will serve up any
packages in /var/spool/pkg and if I give '-d SOMEDIR', it will do so for
'SOMEDIR'.  Hmmm...


From esproul at omniti.com  Fri Apr  4 20:58:15 2014
From: esproul at omniti.com (Eric Sproul)
Date: Fri, 4 Apr 2014 16:58:15 -0400
Subject: [OmniOS-discuss] Installing local packages?
In-Reply-To: <bec81e0101d4de6a7f39dc75e7438da2.squirrel@webmail.druber.com>
References: <CAEc-0iWuU5+O1B+E5yo+APFHK1WMX5iBYws9smiut23bb4Q2pA@mail.gmail.com>
	<1393535433.707.6.camel@exilis.si-consulting.us>
	<3b67171d9cc159088bcb71bc9862f73a.squirrel@webmail.druber.com>
	<27167B29-1E7A-4DE2-B4A8-BBA3A3FF09AF@omniti.com>
	<bec81e0101d4de6a7f39dc75e7438da2.squirrel@webmail.druber.com>
Message-ID: <CA+QY2RRBubzAOp3dpdPjNP82VLesQZa_scUHXaBoddQ07_XURQ@mail.gmail.com>

On Fri, Apr 4, 2014 at 4:14 PM, Dan Swartzendruber <dswartz at druber.com> wrote:
> Ah, okay, that's got it, thanks!  Kinda puzzle at the manpage which seems
> to be telling me if I do 'pkgadd' with no arguments, it will serve up any
> packages in /var/spool/pkg and if I give '-d SOMEDIR', it will do so for
> 'SOMEDIR'.  Hmmm...

These are SVR4 packages, which can exist either as a "datastream"
(single-file archive) or as a "file system", which is a directory
layout.  In SVR4 parlance, -d means "device" which could be a file,
directory or any other block or character device.

The man page of pkgtrans(1) has gory details if you're morbidly curious.

Eric
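
To make the two formats concrete, here is a rough Python sketch that guesses
which one a path holds. It assumes that datastream files begin with the text
"# PaCkAgE DaTaStReAm" and that a file-system format package is a directory
containing pkginfo and pkgmap. The function name classify_package and the
EXAMPLEpkg names are made up for the demonstration.

```python
import os
import tempfile

DATASTREAM_MAGIC = b"# PaCkAgE DaTaStReAm"


def classify_package(path):
    """Guess whether path is an SVR4 datastream file or a file-system package."""
    if os.path.isdir(path):
        # A file-system format package is a directory holding pkginfo and pkgmap.
        has_meta = all(
            os.path.isfile(os.path.join(path, name)) for name in ("pkginfo", "pkgmap")
        )
        return "file system" if has_meta else "unknown"
    with open(path, "rb") as fh:
        head = fh.read(len(DATASTREAM_MAGIC))
    return "datastream" if head == DATASTREAM_MAGIC else "unknown"


# Demonstration with throwaway files; the names here are made up.
with tempfile.TemporaryDirectory() as tmp:
    stream = os.path.join(tmp, "EXAMPLEpkg.pkg")
    with open(stream, "wb") as fh:
        fh.write(DATASTREAM_MAGIC + b"\nEXAMPLEpkg 1 100\n")
    pkgdir = os.path.join(tmp, "EXAMPLEpkg")
    os.mkdir(pkgdir)
    for name in ("pkginfo", "pkgmap"):
        open(os.path.join(pkgdir, name), "w").close()
    print(classify_package(stream))   # datastream
    print(classify_package(pkgdir))   # file system
    print(classify_package(tmp))      # unknown
```

On that reading, a directory that holds only .pkg datastream files contains
no file-system format package, which would be consistent with 'pkgadd -d .'
reporting no packages while 'pkgadd -d CNCclusterglue.pkg' works.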

From dswartz at druber.com  Fri Apr  4 21:05:46 2014
From: dswartz at druber.com (Dan Swartzendruber)
Date: Fri, 4 Apr 2014 17:05:46 -0400
Subject: [OmniOS-discuss] Installing local packages?
In-Reply-To: <CA+QY2RRBubzAOp3dpdPjNP82VLesQZa_scUHXaBoddQ07_XURQ@mail.gmail.com>
References: <CAEc-0iWuU5+O1B+E5yo+APFHK1WMX5iBYws9smiut23bb4Q2pA@mail.gmail.com>
	<1393535433.707.6.camel@exilis.si-consulting.us>
	<3b67171d9cc159088bcb71bc9862f73a.squirrel@webmail.druber.com>
	<27167B29-1E7A-4DE2-B4A8-BBA3A3FF09AF@omniti.com>
	<bec81e0101d4de6a7f39dc75e7438da2.squirrel@webmail.druber.com>
	<CA+QY2RRBubzAOp3dpdPjNP82VLesQZa_scUHXaBoddQ07_XURQ@mail.gmail.com>
Message-ID: <aa34ed2fb3315f6610829a4e8502d69b.squirrel@webmail.druber.com>

> On Fri, Apr 4, 2014 at 4:14 PM, Dan Swartzendruber <dswartz at druber.com>
> wrote:
>> Ah, okay, that's got it, thanks!  Kinda puzzle at the manpage which
>> seems
>> to be telling me if I do 'pkgadd' with no arguments, it will serve up
>> any
>> packages in /var/spool/pkg and if I give '-d SOMEDIR', it will do so for
>> 'SOMEDIR'.  Hmmm...
>
> These are SVR4 packages, which can exist either as a "datastream"
> (single-file archive) or as a "file system", which is a directory
> layout.  In SVR4 parlance, -d means "device" which could be a file,
> directory or any other block or character device.

Yeah, I get that.  What I don't get is the manpage telling me 'pkgadd -d
/foo' will install any packages in the directory '/foo', but it doesn't :(
 Either I am stupid or the wording in the manpage is confusing.  At any
rate, it worked finally :)


From johan.kragsterman at capvert.se  Mon Apr  7 09:19:05 2014
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Mon, 7 Apr 2014 11:19:05 +0200
Subject: [OmniOS-discuss] crash
Message-ID: <OF268EA78E.A066D404-ONC1257CB3.0031E16A-C1257CB3.00332F80@inse.com>


Hej!


Got a crash here, that I would like someone have a look at.

Hardware is a Dell T5500 workstation with dual Xeon L5520 and 36 GB ram, OS/rpool on an Intel SSD SLC, "mainppol" on mirrored Seagate ST4000VN000(new) with an SSD Samsung 840 EVO(new) as L2arc.
Disabled bge0 on mo'bo', and a quad intel gbit nic as the working interfaces.


I run a single KVM VM, Edubuntu 13.10, on the machine. The crash came while I was building a new chroot environment for the LTSP thin client system.


Here is the info about the crash and what I did to get at it:


OmniOS 5.11     omnios-6de5e81  2013.11.27

OmniOS v11 r151008

root@omni:/var/crash/unknown# ls
bounds  unix.0  vmcore.0  vmdump.0
root@omni:/var/crash/unknown# mdb -k unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs sata sd ip hook neti sockfs arp usba uhci stmf stmf_sbd md lofs random idm nfs crypto ptm kvm cpc smbsrv ufs logindmux nsmb ]
> ::status
debugging crash dump vmcore.0 (64-bit) from omni
operating system: 5.11 omnios-6de5e81 (i86pc)
image uuid: a5e10116-5ed1-68ce-eba1-86f6ade3d5f5
panic message: I/O to pool 'mainpool' appears to be hung.
dump content: kernel pages only
> ::stack
vpanic()
vdev_deadman+0x10b(ffffff0a277f0540)
vdev_deadman+0x4a(ffffff0a1eea6040)
vdev_deadman+0x4a(ffffff0a1dfea580)
spa_deadman+0xad(ffffff0a1cd8a580)
cyclic_softint+0xf3(fffffffffbc30d20, 0)
cbe_low_level+0x14()
av_dispatch_softvect+0x78(2)
dispatch_softint+0x39(0, 0)
switch_sp_and_call+0x13()
dosoftint+0x44(ffffff0045805a50)
do_interrupt+0xba(ffffff0045805a50, 1)
_interrupt+0xba()
acpi_cpu_cstate+0x11b(ffffff0a1ce9e670)
cpu_acpi_idle+0x8d()
cpu_idle_adaptive+0x13()
idle+0xa7()
thread_start+8()
> ::msgbuf
MESSAGE                                                               
NOTICE: vnic1001 link down
NOTICE: e1000g3 link up, 1000 Mbps, full duplex
NOTICE: vnic1001 link up, 1000 Mbps, unknown duplex
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x526849 data 8
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x526849 data 8
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0
unhandled wrmsr: 0x0 data 0           
vcpu 1 received sipi with vector # 10
kvm_lapic_reset: vcpu=ffffff0a36c8e000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 2 received sipi with vector # 10
kvm_lapic_reset: vcpu=ffffff0a36c86000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 3 received sipi with vector # 10
kvm_lapic_reset: vcpu=ffffff0a36c7e000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 4 received sipi with vector # 10
vcpu 7 received sipi with vector # 10
vcpu 6 received sipi with vector # 10
kvm_lapic_reset: vcpu=ffffff0a36cbe000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=ffffff0a36cc6000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 5 received sipi with vector # 10
kvm_lapic_reset: vcpu=ffffff0a36c76000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
kvm_lapic_reset: vcpu=ffffff0a36cce000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled wrmsr: 0x0 data 0
vcpu 1 received sipi with vector # 98
kvm_lapic_reset: vcpu=ffffff0a36c8e000, id=1, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 2 received sipi with vector # 98
kvm_lapic_reset: vcpu=ffffff0a36c86000, id=2, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 3 received sipi with vector # 98
kvm_lapic_reset: vcpu=ffffff0a36c7e000, id=3, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 4 received sipi with vector # 98
kvm_lapic_reset: vcpu=ffffff0a36c76000, id=4, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 5 received sipi with vector # 98
kvm_lapic_reset: vcpu=ffffff0a36cce000, id=5, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 6 received sipi with vector # 98
kvm_lapic_reset: vcpu=ffffff0a36cc6000, id=6, base_msr= fee00800 PRIx64 base_address=fee00000
vcpu 7 received sipi with vector # 98
kvm_lapic_reset: vcpu=ffffff0a36cbe000, id=7, base_msr= fee00800 PRIx64 base_address=fee00000
unhandled rdmsr: 0xfe89f030
unhandled wrmsr: 0x525f43 data 2000000001
unhandled rdmsr: 0xfe89f030           
unhandled wrmsr: 0x525f43 data 2000000001
unhandled rdmsr: 0xfe89f030
unhandled wrmsr: 0x525f43 data 2000000001
unhandled rdmsr: 0xfe89f030
unhandled wrmsr: 0x525f43 data 2000000001
unhandled rdmsr: 0xfe89f030
unhandled wrmsr: 0x525f43 data 2000000001
unhandled rdmsr: 0xfe89f030
unhandled wrmsr: 0x525f43 data 2000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
unhandled rdmsr: 0xff31ca8c
unhandled wrmsr: 0x525f43 data 10000000001
NOTICE: e1000g3 link down
NOTICE: vnic1001 link down
NOTICE: e1000g3 link up, 100 Mbps, full duplex
NOTICE: vnic1001 link up, 100 Mbps, unknown duplex
WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5a545088 timed out

WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5dc38160 timed out

WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5dc642e0 timed out

WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out

WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out

WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out

WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out

WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a57020388 timed out

WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a57020388 timed out

WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a57020388 timed out

WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5fe32b90 timed out

WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5fe32b90 timed out

NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major


panic[cpu0]/thread=ffffff00458cbc40: 
I/O to pool 'mainpool' appears to be hung.


ffffff00458cba20 zfs:vdev_deadman+10b ()
ffffff00458cba70 zfs:vdev_deadman+4a ()
ffffff00458cbac0 zfs:vdev_deadman+4a ()
ffffff00458cbaf0 zfs:spa_deadman+ad ()
ffffff00458cbb90 genunix:cyclic_softint+f3 ()
ffffff00458cbba0 unix:cbe_low_level+14 ()
ffffff00458cbbf0 unix:av_dispatch_softvect+78 ()
ffffff00458cbc20 unix:dispatch_softint+39 ()
ffffff00458059a0 unix:switch_sp_and_call+13 ()
ffffff00458059e0 unix:dosoftint+44 ()
ffffff0045805a40 unix:do_interrupt+ba ()
ffffff0045805a50 unix:cmnint+ba ()
ffffff0045805bc0 unix:acpi_cpu_cstate+11b ()
ffffff0045805bf0 unix:cpu_acpi_idle+8d ()
ffffff0045805c00 unix:cpu_idle_adaptive+13 ()
ffffff0045805c20 unix:idle+a7 ()      
ffffff0045805c30 unix:thread_start+8 ()

syncing file systems...               
 done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port


It would be nice to get some info about this from someone who has more of a clue than I do...


Best regards from/Med vänliga hälsningar från

Johan Kragsterman

Capvert


From skiselkov.ml at gmail.com  Mon Apr  7 09:37:50 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Mon, 07 Apr 2014 11:37:50 +0200
Subject: [OmniOS-discuss] crash
In-Reply-To: <OF268EA78E.A066D404-ONC1257CB3.0031E16A-C1257CB3.00332F80@inse.com>
References: <OF268EA78E.A066D404-ONC1257CB3.0031E16A-C1257CB3.00332F80@inse.com>
Message-ID: <534271EE.70903@gmail.com>

On 4/7/14, 11:19 AM, Johan Kragsterman wrote:
> 
> Hej!
> 
> 
> Got a crash here that I would like someone to have a look at.
> 
> [..snip..]
> 
>> ::stack
> vpanic()
> vdev_deadman+0x10b(ffffff0a277f0540)
> vdev_deadman+0x4a(ffffff0a1eea6040)
> vdev_deadman+0x4a(ffffff0a1dfea580)
> spa_deadman+0xad(ffffff0a1cd8a580)
> cyclic_softint+0xf3(fffffffffbc30d20, 0)
> cbe_low_level+0x14()
> av_dispatch_softvect+0x78(2)
> dispatch_softint+0x39(0, 0)
> switch_sp_and_call+0x13()
> dosoftint+0x44(ffffff0045805a50)
> do_interrupt+0xba(ffffff0045805a50, 1)
> _interrupt+0xba()
> acpi_cpu_cstate+0x11b(ffffff0a1ce9e670)
> cpu_acpi_idle+0x8d()
> cpu_idle_adaptive+0x13()
> idle+0xa7()
> thread_start+8()
> [..snip..]
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5a545088 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5dc38160 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5dc642e0 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a57020388 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 0 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 1 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5fe32b90 timed out
> 
> WARNING: ahci0: watchdog port 2 satapkt 0xffffff0a5fe32b90 timed out
> 
> NOTICE: SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
> 
> 
> panic[cpu0]/thread=ffffff00458cbc40: 
> I/O to pool 'mainpool' appears to be hung.
> 
> 
> ffffff00458cba20 zfs:vdev_deadman+10b ()
> ffffff00458cba70 zfs:vdev_deadman+4a ()
> ffffff00458cbac0 zfs:vdev_deadman+4a ()
> ffffff00458cbaf0 zfs:spa_deadman+ad ()
> ffffff00458cbb90 genunix:cyclic_softint+f3 ()
> ffffff00458cbba0 unix:cbe_low_level+14 ()
> ffffff00458cbbf0 unix:av_dispatch_softvect+78 ()
> ffffff00458cbc20 unix:dispatch_softint+39 ()
> ffffff00458059a0 unix:switch_sp_and_call+13 ()
> ffffff00458059e0 unix:dosoftint+44 ()
> ffffff0045805a40 unix:do_interrupt+ba ()
> ffffff0045805a50 unix:cmnint+ba ()
> ffffff0045805bc0 unix:acpi_cpu_cstate+11b ()
> ffffff0045805bf0 unix:cpu_acpi_idle+8d ()
> ffffff0045805c00 unix:cpu_idle_adaptive+13 ()
> ffffff0045805c20 unix:idle+a7 ()      
> ffffff0045805c30 unix:thread_start+8 ()
> 
> syncing file systems...               
>  done
> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> NOTICE: ahci0: ahci_tran_reset_dport port 0 reset port
> 
> It would be nice to get some info about this from someone who has more of a clue than I do...

Essentially, this says that your SATA controller hung in a bad state
that isn't recoverable:
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/fs/zfs/spa_misc.c#L256-L261

I'd suspect the SATA controller. If this panic comes with any
regularity, try working around the SATA controller by using a substitute
HBA and disabling the old one to see if it goes away.
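
To make the deadman behaviour a bit more concrete, here is a toy Python
model of the idea: a periodic check looks at the oldest outstanding I/O
and gives up if it has been pending for too long. This is only a sketch
and not the illumos code; the function names, the millisecond bookkeeping
and the LIMIT_MS value are assumptions made up for the example. Only the
panic string is taken from the dump above.

```python
# Toy model of a deadman check; NOT the illumos implementation.
# All names and the threshold below are hypothetical.

def oldest_io_age_ms(now_ms, issue_times_ms):
    """Age of the longest-outstanding I/O, or 0 if nothing is in flight."""
    if not issue_times_ms:
        return 0
    return now_ms - min(issue_times_ms)


def deadman_check(pool, now_ms, issue_times_ms, limit_ms):
    """Return the oldest I/O age, or raise if it exceeds limit_ms."""
    age = oldest_io_age_ms(now_ms, issue_times_ms)
    if age > limit_ms:
        # Same wording as the panic message in the crash dump.
        raise RuntimeError("I/O to pool '%s' appears to be hung." % pool)
    return age


# Assumed threshold, chosen only for this example.
LIMIT_MS = 1000 * 1000

# Healthy case: the oldest I/O was issued one second ago.
print(deadman_check("mainpool", 5000, [4000, 4900], LIMIT_MS))

# Hung case: an I/O issued long ago never completed.
try:
    deadman_check("mainpool", 2000000, [500], LIMIT_MS)
except RuntimeError as err:
    print(err)
```

The point of the model is that the check does not care why the I/O is
stuck. A controller that stops completing commands, as the ahci0 watchdog
timeouts suggest here, is enough to trip it.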

Cheers,
-- 
Saso

From johan.kragsterman at capvert.se  Mon Apr  7 09:57:19 2014
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Mon, 7 Apr 2014 11:57:19 +0200
Subject: [OmniOS-discuss] crash
In-Reply-To: <534271EE.70903@gmail.com>
References: <534271EE.70903@gmail.com>,
	<OF268EA78E.A066D404-ONC1257CB3.0031E16A-C1257CB3.00332F80@inse.com>
Message-ID: <OF8489B8FA.73540C66-ONC1257CB3.00355683-C1257CB3.0036AFE1@inse.com>

An HTML attachment was scrubbed...
URL: <https://omniosce.org/ml-archive/attachments/20140407/1c7ec272/attachment-0001.html>

From jesus at omniti.com  Tue Apr  8 01:51:46 2014
From: jesus at omniti.com (Theo Schlossnagle)
Date: Mon, 7 Apr 2014 21:51:46 -0400
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
Message-ID: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>

Today was an unfortunate day for the Internet as a particularly devastating
and quite longstanding bug was revealed in OpenSSL 1.0.1.

OmniOS uses OpenSSL 1.0.1 and, like all other distributions (regardless of
operating system) that use OpenSSL 1.0.1, is vulnerable.

While I'd normally link to the CVE directly, there is a particularly well
organized site dedicated to this bug with many reference documents linked
from it.  If you are interested in the details of the bug (and if you care
about security, you should be interested), please visit
http://heartbleed.com/

Earlier today we updated our builds to use OpenSSL 1.0.1g which addresses
this particular bug (CVE-2014-0160).  We've rerolled and published packages
for all supported OmniOS releases: bloody, r151008 and r151006 LTS.

The package FMRIs are as follows:

For r151006 LTS:
pkg://omnios/library/security/openssl@1.0.1.7,5.11-0.151006:20140407T211430Z

For r151008:
pkg://omnios/library/security/openssl@1.0.1.7,5.11-0.151008:20140407T220403Z

For bloody:
pkg://omnios/library/security/openssl@1.0.1.7,5.11-0.151009:20140407T211119Z

These packages do not require a new BE or a reboot.  You can perform this
upgrade with minimal service interruption. Please update your systems now
and restart any services that link against OpenSSL libraries to arrive at a
safe state.
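
For anyone scripting a check against the FMRIs listed above, here is a
minimal Python sketch that splits a full FMRI into its parts. It assumes
the usual pkg(5) layout of publisher, package name, component version,
build version, branch and timestamp. The parse_fmri name and the returned
keys are made up for this example and are not part of any pkg tooling.

```python
import re

# Assumed layout: pkg://publisher/name@version,build-branch:timestamp
FMRI_RE = re.compile(
    r"^pkg://(?P<publisher>[^/]+)/(?P<name>[^@]+)@"
    r"(?P<version>[^,]+),(?P<build>[^-]+)-(?P<branch>[^:]+):(?P<timestamp>\S+)$"
)


def parse_fmri(fmri):
    """Split a full pkg FMRI into a dict of its components."""
    # List archives sometimes render '@' as ' at '; undo that first.
    fmri = fmri.strip().replace(" at ", "@")
    match = FMRI_RE.match(fmri)
    if match is None:
        raise ValueError("not a full pkg FMRI: %r" % fmri)
    return match.groupdict()


info = parse_fmri(
    "pkg://omnios/library/security/openssl@1.0.1.7,5.11-0.151008:20140407T220403Z"
)
print(info["name"], info["version"], info["branch"], info["timestamp"])
```

Comparing the branch and timestamp fields against what is installed is
one way to confirm that a box has picked up the rerolled package.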

On a side note, April 7th is National Beer Day and an OmniTI corporate
holiday.  We considered this security issue critical enough to stop
drinking beer and dive into providing updates.  If we thought this security
issue warranted interruption of our celebration of National Beer Day, you
too should take it very seriously.

Best regards,

Theo

From Kevin.Swab at ColoState.EDU  Tue Apr  8 03:32:22 2014
From: Kevin.Swab at ColoState.EDU (Kevin Swab)
Date: Mon, 07 Apr 2014 21:32:22 -0600
Subject: [OmniOS-discuss] kernel panic
Message-ID: <53436DC6.6080208@ColoState.EDU>

I've got OmniOS 151008j running on a home file server, and the other day
it went into a reboot loop, displaying a kernel panic on the console
just after the kernel banner was printed.

The panic message on screen showed some zfs function calls, so following
that lead, I booted off the install media, mounted my root pool and
removed /etc/zfs/zpool.cache.  The system was able to boot after that, but
when I attempt to import the pool containing my data, it panics again.

FMD shows that a reboot occurred after a kernel panic, and says more
info is available from fmdump.  Here's the stack trace from 'fmdump':

# fmdump -Vp -u 38f6aa49-6c97-4675-b526-e455b1ae215b
TIME                           UUID                                 SUNW-MSG-ID
Apr 07 2014 21:03:45.097921000 38f6aa49-6c97-4675-b526-e455b1ae215b SUNOS-8000-KL

  TIME                 CLASS                                 ENA
  Apr 07 21:03:45.0237 ireport.os.sunos.panic.dump_available 0x0000000000000000
  Apr 07 21:03:03.8496 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 38f6aa49-6c97-4675-b526-e455b1ae215b
        code = SUNOS-8000-KL
        diag-time = 1396926225 62791
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash/unknown/.38f6aa49-6c97-4675-b526-e455b1ae215b
                resource = sw:///:path=/var/crash/unknown/.38f6aa49-6c97-4675-b526-e455b1ae215b
                savecore-succcess = 1
                dump-dir = /var/crash/unknown
                dump-files = vmdump.1
                os-instance-uuid = 38f6aa49-6c97-4675-b526-e455b1ae215b
                panicstr = BAD TRAP: type=e (#pf Page fault)
rp=ffffff000fadafc0 addr=2b8 occurred in module "unix" due to a NULL
pointer dereference
                panicstack = unix:die+df () | unix:trap+db3 () |
unix:cmntrap+e6 () | unix:mutex_enter+b () | zfs:zio_buf_alloc+25 () |
zfs:arc_get_data_buf+2b8 () | zfs:arc_buf_alloc+b5 () | zfs:arc_read+42b
() | zfs:dsl_scan_prefetch+a7 () | zfs:dsl_scan_recurse+16f () |
zfs:dsl_scan_visitbp+eb () | zfs:dsl_scan_visitdnode+bd () |
zfs:dsl_scan_recurse+439 () | zfs:dsl_scan_visitbp+eb () |
zfs:dsl_scan_visit_rootbp+61 () | zfs:dsl_scan_visit+26b () |
zfs:dsl_scan_sync+12f () | zfs:spa_sync+334 () | zfs:txg_sync_thread+227
() | unix:thread_start+8 () |
                crashtime = 1396801998
                panic-time = Sun Apr  6 10:33:18 2014 MDT
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x53436711 0x5d627e8


I'd really like to recover the data on that pool if possible.  Any
suggestions on what I can try next?
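
In case it helps anyone read the trace, here is a small Python sketch that
turns a panicstack string like the one above into a list of
(module, function, offset) frames. The frame pattern is inferred from this
one fmdump output, so treat it as an assumption. The parse_panicstack name
is made up for the example.

```python
import re

# Frame shape inferred from the fmdump output: module:function+hexoffset ()
FRAME_RE = re.compile(
    r"^(?P<module>\w+):(?P<function>\w+)\+(?P<offset>[0-9a-f]+) \(\)$"
)


def parse_panicstack(text):
    """Split a '|'-separated panicstack into (module, function, offset)."""
    frames = []
    # Collapse the mail client's line wrapping before splitting on '|'.
    for chunk in " ".join(text.split()).split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = FRAME_RE.match(chunk)
        if match is None:
            raise ValueError("unrecognised frame: %r" % chunk)
        frames.append(
            (match.group("module"), match.group("function"),
             int(match.group("offset"), 16))
        )
    return frames


# First few frames of the panicstack, with the wrapping left as pasted.
stack = """unix:die+df () | unix:trap+db3 () |
unix:cmntrap+e6 () | unix:mutex_enter+b () | zfs:zio_buf_alloc+25 () |
zfs:arc_get_data_buf+2b8 () | zfs:arc_buf_alloc+b5 () | zfs:arc_read+42b
() |"""

frames = parse_panicstack(stack)
# The innermost zfs frame is the last zfs code to run before the trap.
first_zfs = next(frame for frame in frames if frame[0] == "zfs")
print(first_zfs)
```

Frames are listed innermost first, so the first zfs entry is where the
scan code handed off to the mutex_enter call that took the page fault.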

Thanks,
Kevin


From jimklimov at cos.ru  Tue Apr  8 13:35:27 2014
From: jimklimov at cos.ru (Jim Klimov)
Date: Tue, 08 Apr 2014 15:35:27 +0200
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
Message-ID: <5343FB1F.4040300@cos.ru>

On 2014-04-08 03:51, Theo Schlossnagle wrote:
> Today was an unfortunate day for the Internet as a particularly
> devastating and quite longstanding bug was revealed in OpenSSL 1.0.1.

Thanks for the heads-up!

Can anyone please elaborate on this question, though: some of the
legacy systems (i.e. Solaris 10 based) out in the field have not,
in fact, seen or used OpenSSL past 0.9.8-something; and ran some
SSL-protected email, openvpn, web or ldap services (though the
latter is probably using some java security layer). It is however
not known what SSL implementations and versions were used by the
users of these systems. Are such setups vulnerable (given that
the server side had no heartbeat handshake code with the bug) to
the extent that everything should be urgently upgraded or not?

Thanks,
//Jim

From skiselkov.ml at gmail.com  Tue Apr  8 13:44:23 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Tue, 08 Apr 2014 15:44:23 +0200
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <5343FB1F.4040300@cos.ru>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
	<5343FB1F.4040300@cos.ru>
Message-ID: <5343FD37.9000605@gmail.com>

On 4/8/14, 3:35 PM, Jim Klimov wrote:
> On 2014-04-08 03:51, Theo Schlossnagle wrote:
>> Today was an unfortunate day for the Internet as a particularly
>> devastating and quite longstanding bug was revealed in OpenSSL 1.0.1.
> 
> Thanks for the heads-up!
> 
> Can anyone please elaborate on this question, though: some of the
> legacy systems (i.e. Solaris 10 based) out in the field have not,
> in fact, seen or used OpenSSL past 0.9.8-something; and ran some
> SSL-protected email, openvpn, web or ldap services (though the
> latter is probably using some java security layer). It is however
> not known what SSL implementations and versions were used by the
> users of these systems. Are such setups vulnerable (given that
> the server side had no heartbeat handshake code with the bug) to
> the extent that everything should be urgently upgraded or not?

Anything up to and including OpenSSL 1.0.0 isn't vulnerable to this. (Most
legacy systems, including OI, still run on the OpenSSL 0.9.8
release train.)

Cheers,
-- 
Saso

From dswartz at druber.com  Tue Apr  8 13:45:30 2014
From: dswartz at druber.com (Dan Swartzendruber)
Date: Tue, 8 Apr 2014 09:45:30 -0400
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
Message-ID: <46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>

@1.0.1.7,5.11-0.151009:20140407T211119Z
>
> These packages do not require a new BE or a reboot.  You can perform this
> upgrade with minimal service interruption. Please update your systems now
> and restart any services that link against OpenSSL libraries to arrive at
> a
> safe state.

Theo, I am puzzled.  I updated my box, and it did create a boot
environment with the fix in it, so I can't get it until I reboot...  Maybe
I updated the wrong way?  I did 'pkg image-update' which is how I usually
do things...


From johan.kragsterman at capvert.se  Tue Apr  8 13:49:34 2014
From: johan.kragsterman at capvert.se (Johan Kragsterman)
Date: Tue, 8 Apr 2014 15:49:34 +0200
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>
References: <46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>,
	<CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
Message-ID: <OFFC96F383.1FAA8FF4-ONC1257CB4.004BF300-C1257CB4.004BF303@inse.com>

An HTML attachment was scrubbed...
URL: <https://omniosce.org/ml-archive/attachments/20140408/15d0694e/attachment.html>

From gmason at msu.edu  Tue Apr  8 14:18:44 2014
From: gmason at msu.edu (Greg Mason)
Date: Tue, 8 Apr 2014 10:18:44 -0400
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
	<46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>
Message-ID: <D71AB8CD-2DDF-46EC-8989-9D1BD3355157@msu.edu>

> 
> Theo, I am puzzled.  I updated my box, and it did create a boot
> environment with the fix in it, so I can't get it until I reboot...  Maybe
> I updated the wrong way?  I did 'pkg image-update' which is how I usually
> do things?

Dan,

If you simply do a 'pkg install' or 'pkg update' it will install the new OpenSSL package in the current BE.

-Greg


From dswartz at druber.com  Tue Apr  8 14:21:35 2014
From: dswartz at druber.com (Dan Swartzendruber)
Date: Tue, 8 Apr 2014 10:21:35 -0400
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <OFFC96F383.1FAA8FF4-ONC1257CB4.004BF300-C1257CB4.004BF303@inse.com>
References: <46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>,
	<CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
	<OFFC96F383.1FAA8FF4-ONC1257CB4.004BF300-C1257CB4.004BF303@inse.com>
Message-ID: <3941ee54fe1eca513abb21bec1d52634.squirrel@webmail.druber.com>

>
> Hi, Dan!
>
> I just did a pkg install, and it worked like a charm, no new BE...you can
> probably remove the one you just did, and do a new pkg install

Worked great, thanks!


From jimklimov at cos.ru  Tue Apr  8 14:24:00 2014
From: jimklimov at cos.ru (Jim Klimov)
Date: Tue, 08 Apr 2014 16:24:00 +0200
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <5343FD37.9000605@gmail.com>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
	<5343FB1F.4040300@cos.ru> <5343FD37.9000605@gmail.com>
Message-ID: <53440680.6020407@cos.ru>

On 2014-04-08 15:44, Saso Kiselkov wrote:
> Anything up to and including OpenSSL 1.0.0 isn't vulnerable to this. (Most
> legacy systems, including OI, still run on the OpenSSL 0.9.8
> release train)

Thanks, I've read that statement ;)

I just wanted to make sure: if we have an OpenSSL 0.9.8 enabled
server and an OpenSSL 1.0.1* (vulnerable) client, and someone has
sniffed and saved the traffic, does that disclose the sensitive
data or not?

For instance, I can't yet figure out if this heartbeat handshake is
something new introduced in 1.0.1 series and so the whole procedure
is skipped when a new OpenSSL connects with an old OpenSSL? Or not?..

Thanks,
//Jim


From dswartz at druber.com  Tue Apr  8 15:00:38 2014
From: dswartz at druber.com (Dan Swartzendruber)
Date: Tue, 8 Apr 2014 11:00:38 -0400
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <D71AB8CD-2DDF-46EC-8989-9D1BD3355157@msu.edu>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
	<46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>
	<D71AB8CD-2DDF-46EC-8989-9D1BD3355157@msu.edu>
Message-ID: <f855c682fda31d5b5af3d47a75c8ba40.squirrel@webmail.druber.com>

>>
>> Theo, I am puzzled.  I updated my box, and it did create a boot
>> environment with the fix in it, so I can't get it until I reboot...
>> Maybe I updated the wrong way?  I did 'pkg image-update' which is how
>> I usually do things

>
> Dan,
>
> If you simply do a 'pkg install' or 'pkg update' it will install the new
> OpenSSL package in the current BE.

Yes, that worked, thanks.  I've just been used to 'pkg image-update'...


From cks at cs.toronto.edu  Tue Apr  8 15:13:42 2014
From: cks at cs.toronto.edu (Chris Siebenmann)
Date: Tue, 08 Apr 2014 11:13:42 -0400
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: Your message of Tue, 08 Apr 2014 16:24:00 +0200.
	<53440680.6020407@cos.ru>
Message-ID: <20140408151342.2BDB41A0463@apps0.cs.toronto.edu>

| On 2014-04-08 15:44, Saso Kiselkov wrote:
| > Anything up to and including OpenSSL 1.0.0 isn't vulnerable to this. (Most
| > legacy systems, including OI, still run on the OpenSSL 0.9.8
| > release train)
| 
| Thanks, I've read that statement ;)
| 
| I just wanted to make sure: if we have an OpenSSL 0.9.8 enabled
| server and an OpenSSL 1.0.1* (vulnerable) client, and someone has
| sniffed and saved the traffic, does that disclose the sensitive
| data or not?
| 
| For instance, I can't yet figure out if this heartbeat handshake is
| something new introduced in 1.0.1 series and so the whole procedure is
| skipped when a new OpenSSL connects with an old OpenSSL? Or not?..

 My understanding of the bug is that it requires active exploitation by
one end of the connection (either the client against the server or the
server against the client, if the client holds any sensitive material).
It's not a passive bug that can be exploited by a third party that is
just listening in because it involves introducing a deliberate protocol
violation[*].

 The bug is only present in OpenSSL versions that support heartbeats.
This was apparently introduced in 1.0.1, which dates from early 2012
(and is closed in 1.0.1g or patched versions of earlier 1.0.1 releases).

	- cks
[*: very crudely summarized, the bug is that you send the other end a
    heartbeat request that says 'echo back these 64K bytes' but
    don't actually supply anywhere near that many bytes to echo back.
    The other end then overruns your input buffer and sends you back
    whatever memory was beyond it.
]
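
As a toy illustration of the footnote, here is a Python sketch of the
over-read. It is not OpenSSL code: the flat bytearray standing in for
process memory, the function names and the sample secret are all made up
for the example. The "fixed" variant drops a request whose claimed length
exceeds what was actually sent, which is my understanding of the approach
taken in 1.0.1g.

```python
def build_memory(payload, secret):
    """Place the request payload directly before unrelated secret data."""
    return bytearray(payload + secret)


def heartbeat_vulnerable(memory, claimed_len):
    """Echo claimed_len bytes, trusting the length field in the request."""
    return bytes(memory[:claimed_len])


def heartbeat_fixed(memory, actual_len, claimed_len):
    """Drop requests that claim more bytes than were actually received."""
    if claimed_len > actual_len:
        return b""
    return bytes(memory[:claimed_len])


payload = b"ping"
secret = b"-----PRIVATE KEY-----"
memory = build_memory(payload, secret)

# The attacker sends 4 bytes but claims the payload is much longer.
leak = heartbeat_vulnerable(memory, claimed_len=len(memory))
safe = heartbeat_fixed(memory, actual_len=len(payload), claimed_len=len(memory))

print(leak)
print(safe)
```

The vulnerable reply contains the secret that happened to sit after the
request in memory, while the checked version returns nothing. A passive
sniffer never gets to send the lying length field, which is why simply
recording traffic is not enough to exploit this.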

From mir at miras.org  Tue Apr  8 15:15:23 2014
From: mir at miras.org (Michael Rasmussen)
Date: Tue, 8 Apr 2014 17:15:23 +0200
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <f855c682fda31d5b5af3d47a75c8ba40.squirrel@webmail.druber.com>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
	<46dd41948454127d1c756bdb1caf34bf.squirrel@webmail.druber.com>
	<D71AB8CD-2DDF-46EC-8989-9D1BD3355157@msu.edu>
	<f855c682fda31d5b5af3d47a75c8ba40.squirrel@webmail.druber.com>
Message-ID: <20140408171523.51b11654@sleipner.datanom.net>

On Tue, 8 Apr 2014 11:00:38 -0400
"Dan Swartzendruber" <dswartz at druber.com> wrote:

> 
> Yes, that worked, thanks.  I've just been used to 'pkg image-update'...
> 
The reason some people, like me, see a new BE being created is
that a pkg update also pulls in driver/storage/mpt_sas.

This is strange, since this is an update on top of r151008j, but no formal
release mentions driver/storage/mpt_sas, so I wonder where this is
coming from?

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael <at> rasmussen <dot> cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir <at> datanom <dot> net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir <at> miras <dot> org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
How's it going in those MODULAR LOVE UNITS??

From esproul at omniti.com  Tue Apr  8 15:15:49 2014
From: esproul at omniti.com (Eric Sproul)
Date: Tue, 8 Apr 2014 11:15:49 -0400
Subject: [OmniOS-discuss] OmniOS OpenSSL 1.0.1g and CVE-2014-0160
In-Reply-To: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
References: <CACLsApuxK_dpgpwFjtNt2nJQUc363rgPuC7P1UjnLX7CPxK8cg@mail.gmail.com>
Message-ID: <CA+QY2RR1ecyzQyzEaOpuwb8rJKBnOKU5q7xS3X4yUP3ZX1bWpw@mail.gmail.com>

On Mon, Apr 7, 2014 at 9:51 PM, Theo Schlossnagle <jesus at omniti.com> wrote:
> For r151008:
> pkg://omnios/library/security/openssl@1.0.1.7,5.11-0.151008:20140407T220403Z
>

FYI, I just re-spun the r151008 package to clear up an issue where the
unsigned manifest appeared in the repo catalog alongside the signed
version.  It's a quirk^Wfeature of how pkg(5) does signing that it
does not alter the version of the package, so effectively we had two
different hashes for the same "version" of the openssl manifest.  This
caused confusion for some pkg* tools and sub-commands but not others.
For instance, update/install was *not* affected, but pkgrecv(1) was.

The new spin is
pkg://omnios/library/security/openssl@1.0.1.7,5.11-0.151008:20140408T142844Z

Sorry for the inconvenience.  We've clarified our package signing
process to ensure this does not recur.

Eric

From groups at tierarzt-mueller.de  Tue Apr  8 18:13:15 2014
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Tue, 8 Apr 2014 20:13:15 +0200
Subject: [OmniOS-discuss] Pool degraded
Message-ID: <1456071949.20140408201315@tierarzt-mueller.de>

Hello All,

I have a pool with mirrors and one spare.
Now my pool is degraded, and I thought that OmniOS/ZFS would activate the
spare by itself and start a resilver.

# zpool status -x
  pool: pool_ripley
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 84K in 0h0m with 0 errors on Sun Mar 23 15:09:08 2014
config:

        NAME                       STATE     READ WRITE CKSUM
        pool_ripley                DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c1t5000CCA22BC16BC5d0  ONLINE       0     0     0
            c1t5000CCA22BEEF6A3d0  UNAVAIL      0     0     0  cannot open
          mirror-1                 ONLINE       0     0     0
            c1t5000CCA22BC8D31Ad0  ONLINE       0     0     0
            c1t5000CCA22BF612C4d0  ONLINE       0     0     0
          .
          .
          .

        spares
          c1t5000CCA22BF5B9DEd0    AVAIL

But nothing happened.
OK, then I did it myself.
# zpool replace -f pool_ripley c1t5000CCA22BEEF6A3d0 c1t5000CCA22BF5B9DEd0
Resilvering started immediately.

# zpool status -x
  pool: pool_ripley
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 1.53T in 3h12m with 0 errors on Sun Apr  6 17:48:51 2014
config:

        NAME                         STATE     READ WRITE CKSUM
        pool_ripley                  DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c1t5000CCA22BC16BC5d0    ONLINE       0     0     0
            spare-1                  DEGRADED     0     0     0
              c1t5000CCA22BEEF6A3d0  UNAVAIL      0     0     0  cannot open
              c1t5000CCA22BF5B9DEd0  ONLINE       0     0     0
          mirror-1                   ONLINE       0     0     0
            c1t5000CCA22BC8D31Ad0    ONLINE       0     0     0
            c1t5000CCA22BF612C4d0    ONLINE       0     0     0
         .
         .
         .
        spares
          c1t5000CCA22BF5B9DEd0      INUSE     currently in use

After resilvering I powered off, unplugged the broken HDD from
case slot 1 and moved the spare from slot 21 to slot 1.
The pool is still degraded, and I can't remove the broken HDD.

# zpool remove pool_ripley c1t5000CCA22BEEF6A3d0
cannot remove c1t5000CCA22BEEF6A3d0: only inactive hot spares,
cache, top-level, or log devices can be removed

What can I do to throw out the broken HDD, tell ZFS that the
spare is now a member of mirror-0, and remove it from the spare list?
Why didn't the spare device jump in automatically and resilver the
pool?

Thanks.

-- 
Best Regards
Alexander
April, 08 2014


From groups at tierarzt-mueller.de  Tue Apr  8 19:09:28 2014
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Tue, 8 Apr 2014 21:09:28 +0200
Subject: [OmniOS-discuss] Pool degraded
In-Reply-To: <53443D37.7020307@ColoState.EDU>
References: <1456071949.20140408201315@tierarzt-mueller.de>
	<53443D37.7020307@ColoState.EDU>
Message-ID: <1697492574.20140408210928@tierarzt-mueller.de>

Hello Kevin Swab and List,

On April, 08 2014, 20:17 <Kevin Swab> wrote in [1]:

> Instead of a 'zpool remove ...', you want to do a 'zpool detach ...' to
> get rid of the old device.

That's it.
zpool detach ... "removes" the broken device from the pool.
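
For the pool shown earlier, the device to detach is the one reported as
UNAVAIL. A minimal sketch, assuming the 'zpool status' excerpt from this
thread as input: it only prints the detach command rather than running
it, and the variable names are made up for illustration.

```shell
# Excerpt of the 'zpool status' output quoted earlier in this thread.
# On a live system this text would come from: zpool status pool_ripley
status_sample='          mirror-0                   DEGRADED     0     0     0
            c1t5000CCA22BC16BC5d0    ONLINE       0     0     0
            spare-1                  DEGRADED     0     0     0
              c1t5000CCA22BEEF6A3d0  UNAVAIL      0     0     0  cannot open
              c1t5000CCA22BF5B9DEd0  ONLINE       0     0     0'

# Print (do not execute) a detach command for every device whose
# STATE column reads UNAVAIL.
detach_cmds=$(printf '%s\n' "$status_sample" |
  awk -v pool=pool_ripley '$2 == "UNAVAIL" { print "zpool detach " pool " " $1 }')
echo "$detach_cmds"
```

Review the printed command before running it by hand.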

> If you turn the 'autoreplace' property on
> for the pool, the spare will automatically kick in the next time a drive
> fails...

Are you sure? Because man zpool tells me otherwise:

,-----[ man zpool ]-----
|
| autoreplace=on | off
| 
|          Controls automatic device replacement. If set to  "off",
|          device  replacement must be initiated by the administra-
|          tor by using the "zpool  replace"  command.  If  set  to
|          "on",  any  new device, found in the same physical loca-
|          tion as a device that previously belonged to  the  pool,
|          is  automatically  formatted  and  replaced. The default
|          behavior is "off". This property can also be referred to
|          by its shortened column name, "replace".
|
`-------------------

As I understand it, when I pull out a device and put a new device in
the _same_ case slot, ZFS starts a resilver and drops the old one
automatically.
When the property is off I have to use the command zpool replace ... ...,
which is what I did.

But in my case, the spare device was already in the case and _assigned to_
this pool.
So the 'Hot Spares' section says:
,-----[ man zpool ]-----
|
| ZFS allows devices to  be  associated  with  pools  as  "hot
| spares".  These  devices  are not actively used in the pool,
| but  when  an  active  device  fails,  it  is  automatically
| replaced  by  a hot spare.
|
`-------------------

Or have I misunderstood?



-- 
Best Regards
Alexander
April, 08 2014
........
[1] mid:53443D37.7020307 at ColoState.EDU
........


From Kevin.Swab at ColoState.EDU  Tue Apr  8 20:22:46 2014
From: Kevin.Swab at ColoState.EDU (Kevin Swab)
Date: Tue, 08 Apr 2014 14:22:46 -0600
Subject: [OmniOS-discuss] Pool degraded
In-Reply-To: <1697492574.20140408210928@tierarzt-mueller.de>
References: <1456071949.20140408201315@tierarzt-mueller.de>
	<53443D37.7020307@ColoState.EDU>
	<1697492574.20140408210928@tierarzt-mueller.de>
Message-ID: <53445A96.8090902@ColoState.EDU>

Hello, and sorry for accidentally failing to "reply-all" on your first
message...

The man page seems misleading or incomplete on the subject of
"autoreplace" and spares.  Setting 'autoreplace=on' should cause your
hot spare to kick in during a drive failure - with over 1100 spindles
running ZFS here, we've had the "opportunity" to test it many times! ;-)
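
For reference, the property is set and checked with 'zpool set' and
'zpool get'. A minimal dry-run sketch, assuming the pool name
pool_ripley from earlier in the thread; the 'run' helper and the DRY_RUN
variable are illustrative only and not part of any tool. Whether this
property also controls hot-spare activation rests on the experience
described above, not on the man page.

```shell
pool=pool_ripley

# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 on a real system to actually run the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Turn the property on, then read it back to confirm the setting.
out=$(run zpool set autoreplace=on "$pool"; run zpool get autoreplace "$pool")
echo "$out"
```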

I couldn't find any authoritative references for this, but here are a few
unauthoritative ones:

http://my.safaribooksonline.com/book/operating-systems-and-server-administration/solaris/9780137049639/managing-storage-pools/ch02lev1sec7

http://stanley-huang.blogspot.com/2009/09/how-to-set-autoreplace-in-zfs-pool.html

http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm

Hope this helps,
Kevin


-- 
-------------------------------------------------------------------
Kevin Swab                          UNIX Systems Administrator
ACNS                                Colorado State University
Phone: (970)491-6572                Email: Kevin.Swab at ColoState.EDU
GPG Fingerprint: 7026 3F66 A970 67BD 6F17  8EB8 8A7D 142F 2392 791C

From daleg at omniti.com  Wed Apr  9 19:02:38 2014
From: daleg at omniti.com (Dale Ghent)
Date: Wed, 9 Apr 2014 15:02:38 -0400
Subject: [OmniOS-discuss] [ANN] OmniOS releases 151006_049 and 151008t
Message-ID: <8EB23259-C87C-4C9D-8EB7-D8F2E9899445@omniti.com>


Hello, this week brings a new release for both 151006 and 151008. These releases include OpenSSL 1.0.1g, which corrects the "Heartbleed" vulnerability, as well as fixes which are relevant to users of the mpt_sas and ipmi BMC drivers.

* 151008t release notes: http://omnios.omniti.com/wiki.php/ReleaseNotes#r151008t

* 151006_049 release notes: http://omnios.omniti.com/wiki.php/ReleaseNotes#r151006_049

ISO, USB, and Kayak images which reflect these versions are available at http://omnios.omniti.com/wiki.php/Installation

Of course, all updated packages are available from the pkg.omniti.com IPS repository.

/dale

From henson at acm.org  Thu Apr 10 20:45:31 2014
From: henson at acm.org (Paul B. Henson)
Date: Thu, 10 Apr 2014 13:45:31 -0700
Subject: [OmniOS-discuss] update from r151008f -> r151008t -- two reboots?
Message-ID: <20140410204531.GC1367@bender.unx.csupomona.edu>

I'm currently running r151008f, and was looking to update to r151008t to
get the openssl fix. However, it looks like the pkg update from r151008j
has to be installed first, by itself? Necessitating a reboot into that
new BE, before installing the rest of the updates into another new BE,
and then rebooting into that to be done?

# pkg update -n                                                   
WARNING: pkg(5) appears to be out of date, and should be updated before
running update.  Please update pkg(5) by executing 'pkg install
pkg:/package/pkg' as a privileged user and then retry the update.

# pkg install -n pkg:/package/pkg
            Packages to update:   2
       Create boot environment: Yes
Create backup boot environment:  No

It would be nice to be able to install all updates with one reboot, it's
not like I'm running Windows ;).

Thanks...

From groups at tierarzt-mueller.de  Fri Apr 11 08:39:05 2014
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Fri, 11 Apr 2014 10:39:05 +0200
Subject: [OmniOS-discuss] Pool degraded
In-Reply-To: <53445A96.8090902@ColoState.EDU>
References: <1456071949.20140408201315@tierarzt-mueller.de>
	<53443D37.7020307@ColoState.EDU>
	<1697492574.20140408210928@tierarzt-mueller.de>
	<53445A96.8090902@ColoState.EDU>
Message-ID: <227786046.20140411103905@tierarzt-mueller.de>

Hello Kevin Swab and List,

Thanks, Kevin, your contribution helped me.

It would be nice if someone official from illumos or OmniOS would confirm
it and update the zpool(1M) man page.

On April, 08 2014, 22:22 <Kevin Swab> wrote in [1]:

> The man page seems misleading or incomplete on the subject of
> "autoreplace" and spares.  Setting 'autoreplace=on' should cause your
> hot spare to kick in during a drive failure - with over 1100 spindles
> running ZFS here, we've had the "opportunity" to test it many times! ;-)

> I couldn't find any authoritative references for this, but here are a few
> unauthoritative ones:

> http://my.safaribooksonline.com/book/operating-systems-and-server-administration/solaris/9780137049639/managing-storage-pools/ch02lev1sec7

> http://stanley-huang.blogspot.com/2009/09/how-to-set-autoreplace-in-zfs-pool.html

> http://www.datadisk.co.uk/html_docs/sun/sun_zfs_cs.htm

> Hope this helps,
> Kevin



-- 
Best Regards
Alexander
April, 11 2014
........
[1] mid:53445A96.8090902 at ColoState.EDU
........


From kjf at taylorbritt.com  Fri Apr 11 19:12:14 2014
From: kjf at taylorbritt.com (Ken F)
Date: Fri, 11 Apr 2014 19:12:14 +0000 (UTC)
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com>
Message-ID: <loom.20140411T210747-249@post.gmane.org>


Can you provide a link to the dl_sea_fw software for download as I can not
find it anywhere?
Thanks.
Ken


From daleg at omniti.com  Tue Apr 15 18:04:48 2014
From: daleg at omniti.com (Dale Ghent)
Date: Tue, 15 Apr 2014 14:04:48 -0400
Subject: [OmniOS-discuss] [ANN] web/curl 7.36.0 package available
Message-ID: <D4652C16-E25C-4A71-985A-8C4712768E56@omniti.com>


web/curl 7.36.0 has been released for 151006 and 151008 to address two security issues:

Info on the security issues addressed with this version:
http://curl.haxx.se/docs/adv_20140326A.html
http://curl.haxx.se/docs/adv_20140326B.html

Package FMRIs:
pkg://omnios/web/curl@7.36.0,5.11-0.151006:20140414T214024Z
pkg://omnios/web/curl@7.36.0,5.11-0.151008:20140414T215242Z

/dale

From skiselkov.ml at gmail.com  Tue Apr 15 22:30:20 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Wed, 16 Apr 2014 00:30:20 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <52FC9C12.9090900@smartjog.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com>
Message-ID: <534DB2FC.9090404@gmail.com>

Hi,

I've hit this exact same issue on my recent SEAGATE ST2000NM0023 drives.
Can you please direct me to where I can get the firmware package?
Perhaps we could also post the link publicly, so that people can find it
through google or some such method.

Thanks!

Best wishes,
-- 
Saso

On 2/13/14, 11:18 AM, Thibault VINCENT wrote:
> On 02/12/2014 09:59 PM, Steamer wrote:
>> Did you ever find a solution to the overheating faults with the
>> ST4000NM0023?
>>  
>> I'm currently having the exact same issue with ST1000NM0023 drives,
>> seems like seagate has the user temp probe set at 40'C. The manual
>> states that the temperature settings are programmable via smart, but I
>> haven't found a way to do that.
> 
> Hello Emile,
> 
> I've found a workaround but the definitive fix should be handled by
> Illumos I guess. There is no open ticket, first I was waiting for
> something to happen with #4051 before going back to using that distro
> and kernel.
> 
> Here's the story:
> The SCSI specification defines two registers to store the temperature
> thresholds in SMART data. One contains the recommended maximum operation
> temperature for best MTBF, and the other register is for the absolute
> maximum rating. Usually the industry has always put the same value in
> both, and that is the absolute maximum. That's why we always see
> something like 60/65°C from SMART. But recently Seagate has changed that
> because it was asked by a large OS company to comply with the
> specification for better hardware monitoring integration. The change did
> not only occur in newer products but in a firmware update for existing
> disks and that was applied to the production line which explains some
> disks may or may not expose this problem although they are the same
> model. Our disks are of the Megalodon series and all share the same
> firmware basecode.
> 
> So any Seagate disk will now trigger faults in FMA if they have a
> firmware with the newer policy. Also I think other brands will follow
> the same path.
> 
> Like other members suggested in that thread, maybe nothing should change
> in FMA but let's face it, you can't maintain a temperature steadily
> under 40°C in a JBOD of hundreds of busy disks. Especially in
> eco-friendly datacenters. IMHO we should not trigger a fault on the
> lower threshold, and certainly not a drive retirement. It breaks storage
> servers on reboot or before a pool import, also spare disks could
> disappear with the retirement triggered.
> 
> The workaround is to downgrade firmware to the last version before the
> change, and to reset the register with an SCSI command. It is not
> possible to set the register to a user specified value like the
> documentation suggests, they confirmed it.
> 
> I'm sending a working firmware to you in a private mail. I'm not aware
> of any issue working with that older version and hopefully it should
> upload to 1TB drives as well.
> I'm applying it like this but from Linux not OmniOS:
> # ./dl_sea_fw-0.2.3_32 -f Megalodon_StdOEM_SAS_0002+C84C.lod -m ST4000NM0023
> # ./dl_sea_fw-0.2.3_32 -i
> 
> Then you should reset the drives so they reload the firmware.
> Here's our example for 4TB drives:
> -------------
> for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
>   sg_reset -d $i
> done
> -------------
> 
> And reset the register that contains value from the previous firmware.
> It doesn't work well so we've got this script to run a few times until
> all disks got it. Again it matches 4TB Megalodon.
> -------------
> for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
>   echo -n "$i "
>   if sg_logs $i --page=0x0d | grep 'Reference temperature = 68 C' > /dev/null ; then
>     echo 'ok'
>   else
>     sg_logs $i --page=0x0d --reset
>     echo 'reset'
>   fi
> done
> -------------
> 
> 
> Cheers
> 


From matthias-omn-discuss at mteege.de  Wed Apr 16 09:29:41 2014
From: matthias-omn-discuss at mteege.de (Matthias Teege)
Date: Wed, 16 Apr 2014 11:29:41 +0200
Subject: [OmniOS-discuss] No updates available for zone but an old openssl?
Message-ID: <1849611f-16fe-420d-8135-c2a544d6d836@mteege.de>

Hallo,

I've upgraded my omnios system with pkg update.  After that the new
openssl is installed.

root@tst:~# openssl version
OpenSSL 1.0.1g 7 Apr 2014

But there are no updates for the zone:

root@tst:~# zlogin t1
[Connected to zone 't1' pts/2]
Last login: Wed Apr 16 05:57:01 on pts/2
OmniOS 5.11     omnios-6de5e81  2013.11.27
root@t1:~# openssl version
OpenSSL 1.0.1f 6 Jan 2014
root@t1:~# pkg update -vn
No updates available for this image.

How do I update the zone?

Many thanks
Matthias

From mailinglists at qutic.com  Wed Apr 16 10:47:51 2014
From: mailinglists at qutic.com (qutic development)
Date: Wed, 16 Apr 2014 12:47:51 +0200
Subject: [OmniOS-discuss] No updates available for zone but an old
	openssl?
In-Reply-To: <1849611f-16fe-420d-8135-c2a544d6d836@mteege.de>
References: <1849611f-16fe-420d-8135-c2a544d6d836@mteege.de>
Message-ID: <F94877DF-E03E-4FE7-AA15-8ADC553BE132@qutic.com>

> How do I update the zone?

http://omnios.omniti.com/wiki.php/GeneralAdministration#UpgradingWithNon-GlobalZones

From groups at tierarzt-mueller.de  Wed Apr 16 11:00:39 2014
From: groups at tierarzt-mueller.de (Alexander Lesle)
Date: Wed, 16 Apr 2014 13:00:39 +0200
Subject: [OmniOS-discuss] Granular control of fma modules
In-Reply-To: <etPan.5306518a.507ed7ab.12a0@abp.local>
References: <etPan.5303f80e.79e2a9e3.4ee@abp.local>
	<94925D68-A787-4BF4-9A26-AD9CF80D7268@cooperi.net>
	<etPan.5306518a.507ed7ab.12a0@abp.local>
Message-ID: <1008203753.20140416130039@tierarzt-mueller.de>

Hello List,

Sorry for hijacking this thread.

I am looking for information and help on disk-transport.conf and on
changing properties for an FMA module, but I can't find the right
manual.

Background: I added this line at the end of the file
/usr/lib/fm/fmd/plugins/disk-transport.conf
setprop interval 6h

After restarting fmd, it becomes faulty.

Where can I get some help on FMA and on setting other properties in
the various modules?
Links?
man?


Thanks.

-- 
Best Regards
Alexander
April, 16 2014
........
[1] mid:etPan.5306518a.507ed7ab.12a0 at abp.local
........


From Kevin.Swab at ColoState.EDU  Wed Apr 16 16:39:21 2014
From: Kevin.Swab at ColoState.EDU (Kevin Swab)
Date: Wed, 16 Apr 2014 10:39:21 -0600
Subject: [OmniOS-discuss] kernel panic
In-Reply-To: <53436DC6.6080208@ColoState.EDU>
References: <53436DC6.6080208@ColoState.EDU>
Message-ID: <534EB239.6000609@ColoState.EDU>

Any thoughts on this one?  I can provide some more info if that helps.
The system is all desktop-grade hardware, with a core-i3 540 CPU and
8gigs of (non-ecc) ram.  The pool in question is a 3-disk raidz built on
Toshiba DT01ACA3 3T SATA drives attached to the motherboard SATA ports.
 The pool was working fine for about 12 months prior to the panic.  The
pool originally had dedup running, but the stack trace from an isolated
panic about 2 months ago indicated dedup problems, so I turned it off.

In an attempt to eliminate hardware problems, I've tried the following:

- Ran memtest86+ for about 30 hours, no errors found
- Ran SMART long tests on all the drives, no errors
- Read the entire drive with 'dd' to /dev/null (all 3 drives), no errors
reported by dd or iostat
- Put the drives in another machine w/ an LSI SAS controller, same result.
- dd'ed the contents of the drives to 3 borrowed SAS drives, and
attempted to import the pool from there, same results.
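
The full-surface read check in the list above can be scripted roughly as
follows. This is only a minimal sketch: the function name read_check and the
device paths in the usage comment are illustrative assumptions, not the exact
commands that were run.

```shell
#!/bin/sh
# read_check: read a device (or file) end to end and discard the data.
# dd exits non-zero when a read fails partway through, so its exit status
# is used as the pass/fail signal.
read_check() {
    if dd if="$1" of=/dev/null bs=1024k 2>/dev/null; then
        echo "$1: read ok"
    else
        echo "$1: READ FAILED"
        return 1
    fi
}

# Hypothetical usage on the three pool members (paths are assumptions):
#   for d in c2t2d0 c2t3d0 c2t4d0; do read_check "/dev/rdsk/${d}p0"; done
```

A clean pass here only shows that every sector is readable; it says nothing
about whether the data that comes back is what ZFS originally wrote.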

I found this page with steps that solved a similar problem for someone else:

http://sigtar.com/2009/10/19/opensolaris-zfs-recovery-after-kernel-panic/

Importing the pool read-only as suggested still results in a kernel
panic.  The 'zdb' command mentioned dumps core before completing:

# zpool import
   pool: data1
     id: 17144127232233481271
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        data1       ONLINE
          raidz1-0  ONLINE
            c2t3d0  ONLINE
            c2t2d0  ONLINE
            c2t4d0  ONLINE
# zdb -e -bcsvL data1

Traversing all blocks to verify checksums ...

assertion failed for thread 0xfffffd7fff162a40, thread-id 1: c <
SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT, file
../../../uts/common/fs/zfs/zio.c, line 226
Abort (core dumped)
#

# zpool import -F -f -o readonly=on -R /mnt data1
plankton console login:
panic[cpu1]/thread=ffffff000ef07c40: BAD TRAP: type=e (#pf Page fault)
rp=ffffff000ef07530 addr=278 occurred in module "unix" due to a NULL
pointer dereference

sched: #pf Page fault
Bad kernel fault at addr=0x278
pid=0, pc=0xfffffffffb85ed1b, sp=0xffffff000ef07628, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4:
26f8<vmxe,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 278 cr3: bc00000 cr8: c

        rdi:              278 rsi:                4 rdx: ffffff000ef07c40
        rcx:                0  r8: ffffff02d9168840  r9:                2
        rax:                0 rbx:              278 rbp: ffffff000ef07680
        r10: fffffffffb8540bc r11: ffffff02d91b7000 r12:                0
        r13:                1 r14:                4 r15:                0
        fsb:                0 gsb: ffffff02cbb4dac0  ds:               4b
         es:               4b  fs:                0  gs:              1c3
        trp:                e err:                2 rip: fffffffffb85ed1b
         cs:               30 rfl:            10246 rsp: ffffff000ef07628
         ss:               38

ffffff000ef07410 unix:die+df ()
ffffff000ef07520 unix:trap+db3 ()
ffffff000ef07530 unix:cmntrap+e6 ()
ffffff000ef07680 unix:mutex_enter+b ()
ffffff000ef076a0 zfs:zio_buf_alloc+25 ()
ffffff000ef076e0 zfs:arc_get_data_buf+1d0 ()
ffffff000ef07730 zfs:arc_buf_alloc+b5 ()
ffffff000ef07820 zfs:arc_read+42b ()
ffffff000ef07880 zfs:traverse_prefetch_metadata+9d ()
ffffff000ef07970 zfs:traverse_visitbp+38b ()
ffffff000ef07a00 zfs:traverse_dnode+8b ()
ffffff000ef07af0 zfs:traverse_visitbp+5fd ()
ffffff000ef07b90 zfs:traverse_prefetch_thread+79 ()
ffffff000ef07c20 genunix:taskq_d_thread+b7 ()
ffffff000ef07c30 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 0:44 100% done
100% done: 146470 pages dumped, dump succeeded
rebooting...

Most other 'zdb' commands I've tried also dump core.

I really want to recover the data on this pool if at all possible.  I
can provide crash dumps if needed.  Barring recovery, I would at least
like to understand what went wrong so I can avoid doing it again in the
future.

Please, can anyone help?
Thanks - Kevin


On 04/07/2014 09:32 PM, Kevin Swab wrote:
> I've got OmniOS 151008j running on a home file server, and the other day
> it went into a reboot loop, displaying a kernel panic on the console
> just after the kernel banner was printed.
> 
> The panic message on screen showed some zfs function calls so following
> that lead, I booted off the install media, mounted my root pool and
> removed /etc/zpool.cache.  The system was able to boot after that but
> when I attempt to import the pool containing my data, it panics again.
> 
> FMD shows that a reboot occurred after a kernel panic, and says more
> info is available from fmdump.  Here's the stack trace from 'fmdump':
> 
> # fmdump -Vp -u 38f6aa49-6c97-4675-b526-e455b1ae215b
> TIME                           UUID
> SUNW-MSG-ID
> Apr 07 2014 21:03:45.097921000 38f6aa49-6c97-4675-b526-e455b1ae215b
> SUNOS-8000-KL
> 
>   TIME                 CLASS                                 ENA
>   Apr 07 21:03:45.0237 ireport.os.sunos.panic.dump_available
> 0x0000000000000000
>   Apr 07 21:03:03.8496 ireport.os.sunos.panic.dump_pending_on_device
> 0x0000000000000000
> 
> nvlist version: 0
>         version = 0x0
>         class = list.suspect
>         uuid = 38f6aa49-6c97-4675-b526-e455b1ae215b
>         code = SUNOS-8000-KL
>         diag-time = 1396926225 62791
>         de = fmd:///module/software-diagnosis
>         fault-list-sz = 0x1
>         fault-list = (array of embedded nvlists)
>         (start fault-list[0])
>         nvlist version: 0
>                 version = 0x0
>                 class = defect.sunos.kernel.panic
>                 certainty = 0x64
>                 asru =
> sw:///:path=/var/crash/unknown/.38f6aa49-6c97-4675-b526-e455b1ae215b
>                 resource =
> sw:///:path=/var/crash/unknown/.38f6aa49-6c97-4675-b526-e455b1ae215b
>                 savecore-succcess = 1
>                 dump-dir = /var/crash/unknown
>                 dump-files = vmdump.1
>                 os-instance-uuid = 38f6aa49-6c97-4675-b526-e455b1ae215b
>                 panicstr = BAD TRAP: type=e (#pf Page fault)
> rp=ffffff000fadafc0 addr=2b8 occurred in module "unix" due to a NULL
> pointer dereference
>                 panicstack = unix:die+df () | unix:trap+db3 () |
> unix:cmntrap+e6 () | unix:mutex_enter+b () | zfs:zio_buf_alloc+25 () |
> zfs:arc_get_data_buf+2b8 () | zfs:arc_buf_alloc+b5 () | zfs:arc_read+42b
> () | zfs:dsl_scan_prefetch+a7 () | zfs:dsl_scan_recurse+16f () |
> zfs:dsl_scan_visitbp+eb () | zfs:dsl_scan_visitdnode+bd () |
> zfs:dsl_scan_recurse+439 () | zfs:dsl_scan_visitbp+eb () |
> zfs:dsl_scan_visit_rootbp+61 () | zfs:dsl_scan_visit+26b () |
> zfs:dsl_scan_sync+12f () | zfs:spa_sync+334 () | zfs:txg_sync_thread+227
> () | unix:thread_start+8 () |
>                 crashtime = 1396801998
>                 panic-time = Sun Apr  6 10:33:18 2014 MDT
>         (end fault-list[0])
> 
>         fault-status = 0x1
>         severity = Major
>         __ttl = 0x1
>         __tod = 0x53436711 0x5d627e8
> 
> 
> 
> I'd really like to recover the data on that pool if possible, any
> suggestions on what I can try next?
> 
> Thanks,
> Kevin
> 
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
> 

-- 
-------------------------------------------------------------------
Kevin Swab                          UNIX Systems Administrator
ACNS                                Colorado State University
Phone: (970)491-6572                Email: Kevin.Swab at ColoState.EDU
GPG Fingerprint: 7026 3F66 A970 67BD 6F17  8EB8 8A7D 142F 2392 791C

From danmcd at omniti.com  Wed Apr 16 17:44:46 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 16 Apr 2014 13:44:46 -0400
Subject: [OmniOS-discuss] kernel panic
In-Reply-To: <534EB239.6000609@ColoState.EDU>
References: <53436DC6.6080208@ColoState.EDU> <534EB239.6000609@ColoState.EDU>
Message-ID: <C95B213C-FC7F-4002-AA20-50C264A03613@omniti.com>


On Apr 16, 2014, at 12:39 PM, Kevin Swab <Kevin.Swab at colostate.edu> wrote:
> <SNIP!>


> Traversing all blocks to verify checksums ...
> 
> assertion failed for thread 0xfffffd7fff162a40, thread-id 1: c <
> SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT, file
> ../../../uts/common/fs/zfs/zio.c, line 226
> Abort (core dumped)
> #
> 
> # zpool import -F -f -o readonly=on -R /mnt data1
> plankton console login:
> panic[cpu1]/thread=ffffff000ef07c40: BAD TRAP: type=e (#pf Page fault)
> rp=ffffff000ef07530 addr=278 occurred in module "unix" due to a NULL
> pointer dereference

Interesting.

We've seen one (just one) panic just like this in-house.  In our case, some very strange corruption was written to disk, and ZFS couldn't cope.  I have a request out to the ZFS community to improve the coping mechanisms.  :)

I've some dumb questions:

	1.) Earlier in the thread, you mention these are SATA drives.  When the panic occurred, were they attached via AHCI?  Or to a controller of some sort?  You mention you tried attaching these disks to an mpt_sas controller to try and recover them.  Our machine was using plain SATA drives attached via AHCI.

	2.) Is the kernel coredump available?  If this is what we were seeing, I'd VERY much like to see what your corruption actually looks like. Knowing might help us root-cause the corruption in the first place.

The corruption is of the blkptr_t, in particular its size, which ZFS now assumes is sane.  zdb indicates this via an assertion failure, a non-debug kernel will just panic when it goes dereferencing a pointer in hyperspace.  The coping mechanism involved would throw an IO error if an insane size is read off disk.
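
The bound in the zdb assertion can be illustrated with a little shell
arithmetic. This is a sketch under assumptions: the two constants and the
index formula are recalled from the illumos ZFS source of that era, not taken
from this thread, and the function names are made up for illustration.

```shell
#!/bin/sh
# Assumed values from illumos ZFS circa 2014: 512-byte minimum block
# (shift of 9) and 128K maximum block size.
SPA_MINBLOCKSHIFT=9
SPA_MAXBLOCKSIZE=131072

# zio_buf_index SIZE: the buffer-cache index derived from a block size,
# i.e. c = (size - 1) >> SPA_MINBLOCKSHIFT (assumed formula).
zio_buf_index() {
    echo $(( ($1 - 1) >> SPA_MINBLOCKSHIFT ))
}

# size_is_sane SIZE: succeeds only if the index satisfies the asserted bound
# c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT (256 with the values above).
size_is_sane() {
    [ "$1" -ge 1 ] || return 1
    [ "$(zio_buf_index "$1")" -lt $(( SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT )) ]
}

# Example: size_is_sane 131072 succeeds, size_is_sane 131073 fails.
```

A size that fails this test yields an index past the end of the cache table,
which would be consistent with zdb asserting and the kernel faulting on
whatever it finds there.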

The biggest question, of course, is how the corruption was introduced.  THAT's why I want to see your coredump.  If your corruption is close to ours - ours has a disk name of all things scribbled there - we share a common source of corruption.

> I really want to recover the data on this pool if at all possible.  I
> can provide crash dumps if needed.  Barring recovery, I would at least
> like to understand what went wrong so I can avoid doing it again in the
> future.

If we can get a version of ZFS that can cope with corrupted blkptrs, that may help in recovery.

I know *IN THIS PARTICULAR CODEPATH* how to cope, but I'm concerned it would expose other errors, and even read-only, I don't want to perform such experiments on a customer's data.  :)

It does seem, however, that our box is in the same state, so I will try it there.  If I have success, I can share the modified "zfs" module.

Dan


From Kevin.Swab at ColoState.EDU  Wed Apr 16 19:32:23 2014
From: Kevin.Swab at ColoState.EDU (Kevin Swab)
Date: Wed, 16 Apr 2014 13:32:23 -0600
Subject: [OmniOS-discuss] kernel panic
In-Reply-To: <C95B213C-FC7F-4002-AA20-50C264A03613@omniti.com>
References: <53436DC6.6080208@ColoState.EDU> <534EB239.6000609@ColoState.EDU>
	<C95B213C-FC7F-4002-AA20-50C264A03613@omniti.com>
Message-ID: <534EDAC7.5060009@ColoState.EDU>

Hello Dan - Thanks for your help, I really appreciate it!  Answers to
your questions are inline below....

On 04/16/2014 11:44 AM, Dan McDonald wrote:
> 
> On Apr 16, 2014, at 12:39 PM, Kevin Swab <Kevin.Swab at colostate.edu> wrote:
>> <SNIP!>
> 
> 
>> Traversing all blocks to verify checksums ...
>>
>> assertion failed for thread 0xfffffd7fff162a40, thread-id 1: c <
>> SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT, file
>> ../../../uts/common/fs/zfs/zio.c, line 226
>> Abort (core dumped)
>> #
>>
>> # zpool import -F -f -o readonly=on -R /mnt data1
>> plankton console login:
>> panic[cpu1]/thread=ffffff000ef07c40: BAD TRAP: type=e (#pf Page fault)
>> rp=ffffff000ef07530 addr=278 occurred in module "unix" due to a NULL
>> pointer dereference
> 
> Interesting.
> 
> We've seen one (just one) panic just like this in-house.  In our case, some very strange corruption was written to disk, and ZFS couldn't cope.  I have a request out to the ZFS community to improve the coping mechanisms.  :)
> 
> I've some dumb questions:
> 
> 	1.) Earlier in the thread, you mention these are SATA drives.  When the panic occurred, were they attached via AHCI?  Or to a controller of some sort?  You mention you tried attaching these disks to an mpt_sas controller to try and recover them.  Our machine was using plain SATA drives attached via AHCI.

Yes, at the time of the initial panic, the drives were attached to
motherboard SATA ports that are configured to run in AHCI mode.  At the
current time, they are in a test machine at work attached via mpt_sas.

> 
> 	2.) Is the kernel coredump available?  If this is what we were seeing, I'd VERY much like to see what your corruption actually looks like. Knowing might help us root-cause the corruption in the first place.

I believe the original crash dump files are available on my home
fileserver, I'll check tonight.  I can reproduce the crash at will in my
test system at work and have those crash dump files available now.
Which would you like to see?


> The corruption is of the blkptr_t, in particular its size, which ZFS now assumes is sane.  zdb indicates this via an assertion failure, a non-debug kernel will just panic when it goes dereferencing a pointer in hyperspace.  The coping mechanism involved would throw an IO error if an insane size is read off disk.
> 
> The biggest question, of course, is how the corruption was introduced.  THAT's why I want to see your coredump.  If your corruption is close to ours - ours has a disk name of all things scribbled there - we share a common source of corruption.
> 
>> I really want to recover the data on this pool if at all possible.  I
>> can provide crash dumps if needed.  Barring recovery, I would at least
>> like to understand what went wrong so I can avoid doing it again in the
>> future.
> 
> If we can get a version of ZFS that can cope with corrupted blkptrs, that may help in recovery.
> 
> I know *IN THIS PARTICULAR CODEPATH* how to cope, but I'm concerned it would expose other errors, and even read-only, I don't want to perform such experiments on a customer's data.  :)
> 

I appreciate your caution, but without a fix of some kind, my data's
gone anyway so I'm willing to experiment...


> It does seem, however, that our box is in the same state, so I will try it there.  If I have success, I can share the modified "zfs" module.
> 
> Dan
> 

Thanks!  that would be great.  Let me know what I can do to help...

-- 
-------------------------------------------------------------------
Kevin Swab                          UNIX Systems Administrator
ACNS                                Colorado State University
Phone: (970)491-6572                Email: Kevin.Swab at ColoState.EDU
GPG Fingerprint: 7026 3F66 A970 67BD 6F17  8EB8 8A7D 142F 2392 791C

From danmcd at omniti.com  Wed Apr 16 19:35:07 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 16 Apr 2014 15:35:07 -0400
Subject: [OmniOS-discuss] kernel panic
In-Reply-To: <534EDAC7.5060009@ColoState.EDU>
References: <53436DC6.6080208@ColoState.EDU> <534EB239.6000609@ColoState.EDU>
	<C95B213C-FC7F-4002-AA20-50C264A03613@omniti.com>
	<534EDAC7.5060009@ColoState.EDU>
Message-ID: <800EF407-523E-4612-9C9D-AE4B158C68E8@omniti.com>

Doesn't matter where the panic is from --> it's caused by a corrupt block on the disk.

A vmdump.N would be nice.  You're running 008, I see, so I can use an 008 box to examine the dump.
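
For anyone following along, a compressed vmdump.N is normally expanded with
savecore before mdb can open it. The helper below is a rough sketch: the
function names are made up, and it assumes savecore reuses the suffix N for
the unix.N and vmcore.N files it writes.

```shell
#!/bin/sh
# dump_suffix FILE: print the numeric suffix N of a vmdump.N file name
# (prints nothing if the name does not match).
dump_suffix() {
    basename "$1" | sed -n 's/^vmdump\.\([0-9][0-9]*\)$/\1/p'
}

# expand_dump FILE [DIR]: uncompress vmdump.N into DIR (default: current
# directory) and print the mdb command that would open the result.
# Assumption: the expanded files are named unix.N and vmcore.N.
expand_dump() {
    n=$(dump_suffix "$1")
    [ -n "$n" ] || { echo "not a vmdump.N file: $1" >&2; return 1; }
    dir=${2:-.}
    savecore -f "$1" "$dir" || return 1
    echo "mdb $dir/unix.$n $dir/vmcore.$n"
}

# Hypothetical usage: expand_dump /var/crash/unknown/vmdump.2 /var/tmp
```

Inside mdb, ::status and ::stack are the usual first commands to confirm the
panic string and stack match what was posted.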

Dan


From Kevin.Swab at ColoState.EDU  Wed Apr 16 21:04:21 2014
From: Kevin.Swab at ColoState.EDU (Kevin Swab)
Date: Wed, 16 Apr 2014 15:04:21 -0600
Subject: [OmniOS-discuss] kernel panic
In-Reply-To: <800EF407-523E-4612-9C9D-AE4B158C68E8@omniti.com>
References: <53436DC6.6080208@ColoState.EDU> <534EB239.6000609@ColoState.EDU>
	<C95B213C-FC7F-4002-AA20-50C264A03613@omniti.com>
	<534EDAC7.5060009@ColoState.EDU>
	<800EF407-523E-4612-9C9D-AE4B158C68E8@omniti.com>
Message-ID: <534EF055.3060904@ColoState.EDU>

Thanks again Dan - sending "vmdump.2" in a separate message...

On 04/16/2014 01:35 PM, Dan McDonald wrote:
> Doesn't matter where the panic is from --> it's caused by a corrupt block on the disk.
> 
> A vmdump.N would be nice.  You're running 008, I see, so I can use an 008 box to examine the dump.
> 
> Dan
> 

-- 
-------------------------------------------------------------------
Kevin Swab                          UNIX Systems Administrator
ACNS                                Colorado State University
Phone: (970)491-6572                Email: Kevin.Swab at ColoState.EDU
GPG Fingerprint: 7026 3F66 A970 67BD 6F17  8EB8 8A7D 142F 2392 791C

From chip at innovates.com  Thu Apr 17 15:40:02 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Thu, 17 Apr 2014 10:40:02 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <534DB2FC.9090404@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
Message-ID: <CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>

You can get the Seagate firmwares from this link:

https://apps1.seagate.com/downloads/request.html

It seems they no longer link to this on their site; I found it in an old
email from them.

-Chip


On Tue, Apr 15, 2014 at 5:30 PM, Saso Kiselkov <skiselkov.ml at gmail.com>wrote:

> Hi,
>
> I've hit this exact same issue on my recent SEAGATE ST2000NM0023 drives.
> Can you please direct me to where I can get the firmware package?
> Perhaps we could also post the link publicly, so that people can find it
> through google or some such method.
>
> Thanks!
>
> Best wishes,
> --
> Saso
>
> On 2/13/14, 11:18 AM, Thibault VINCENT wrote:
> > On 02/12/2014 09:59 PM, Steamer wrote:
> >> Did you ever find a solution to the overheating faults with the
> >> ST4000NM0023?
> >>
> >> I'm currently having the exact same issue with ST1000NM0023 drives,
> >> seems like seagate has the user temp probe set at 40'C. The manual
> >> states that the temperature settings are programmable via smart, but I
> >> haven't found a way to do that.
> >
> > Hello Emile,
> >
> > I've found a workaround but the definitive fix should be handled by
> > Illumos I guess. There is no open ticket, first I was waiting for
> > something to happen with #4051 before going back to using that distro
> > and kernel.
> >
> > Here's the story:
> > The SCSI specification defines two registers to store the temperature
> > thresholds in SMART data. One contains the recommended maximum operation
> > temperature for best MTBF, and the other register is for the absolute
> > maximum rating. Usually the industry has always put the same value in
> > both, and that is the absolute maximum. That's why we always see
> > something like 60/65°C from SMART. But recently Seagate changed that
> > because it was asked by a large OS company to comply with the
> > specification for better hardware monitoring integration. The change was
> > made not only in newer products but also in a firmware update for existing
> > disks, which was applied on the production line as well. That explains why
> > some disks may or may not expose this problem although they are the same
> > model. Our disks are of the Megalodon series and all share the same
> > firmware code base.
> >
> > So any Seagate disk will now trigger faults in FMA if they have a
> > firmware with the newer policy. Also I think other brands will follow
> > the same path.
> >
> > Like other members suggested in that thread, maybe nothing should change
> > in FMA but let's face it, you can't maintain a temperature steadily
> > under 40°C in a JBOD of hundreds of busy disks. Especially in
> > eco-friendly datacenters. IMHO we should not trigger a fault on the
> > lower threshold, and certainly not a drive retirement. It breaks storage
> > servers on reboot or before a pool import, also spare disks could
> > disappear with the retirement triggered.
> >
> > The workaround is to downgrade firmware to the last version before the
> > change, and to reset the register with an SCSI command. It is not
> > possible to set the register to a user specified value like the
> > documentation suggests, they confirmed it.
> >
> > I'm sending a working firmware to you in a private mail. I'm not aware
> > of any issue working with that older version and hopefully it should
> > upload to 1TB drives as well.
> > I'm applying it like this but from Linux not OmniOS:
> > # ./dl_sea_fw-0.2.3_32 -f Megalodon_StdOEM_SAS_0002+C84C.lod -m ST4000NM0023
> > # ./dl_sea_fw-0.2.3_32 -i
> >
> > Then you should reset the drives so they reload the firmware.
> > Here's our example for 4TB drives:
> > -------------
> > for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
> >   sg_reset -d $i
> > done
> > -------------
> >
> > Then reset the register, which still contains the value from the previous
> > firmware. The reset doesn't work reliably, so we run this script a few
> > times until all disks have it. Again, it matches 4TB Megalodon drives.
> > -------------
> > for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
> >   echo -n "$i "
> >   if sg_logs $i --page=0x0d | grep 'Reference temperature = 68 C' > /dev/null ; then
> >     echo 'ok'
> >   else
> >     sg_logs $i --page=0x0d --reset
> >     echo 'reset'
> >   fi
> > done
> > -------------
> >
> >
> > Cheers
> >
>
>

From skiselkov.ml at gmail.com  Thu Apr 17 16:15:23 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Thu, 17 Apr 2014 18:15:23 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
Message-ID: <534FFE1B.2060201@gmail.com>

On 4/17/14, 5:40 PM, Schweiss, Chip wrote:
> You can get the Seagate firmwares from this link:
> 
> https://apps1.seagate.com/downloads/request.html
> 
> It seems they no longer link to this on their site; I found it in an
> old email from them.

I found the same form, but the damn thing can't find my drive by S/N
(Z1Y18H7V0000C4196NRF).

Cheers,
-- 
Saso

From chip at innovates.com  Thu Apr 17 16:27:05 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Thu, 17 Apr 2014 11:27:05 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <534FFE1B.2060201@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
Message-ID: <CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>

On Thu, Apr 17, 2014 at 11:15 AM, Saso Kiselkov <skiselkov.ml at gmail.com>wrote:

>
> I found the same form, but the damn thing can't find my drive by S/N
> (Z1Y18H7V0000C4196NRF).
>
>
Use the short form of the S/N: Z1Y18H7V
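
If you are feeding FMA-reported serials into the form for many drives,
trimming can be scripted. A tiny sketch, assuming the short form is simply
the first 8 characters (inferred from this one example only; short_serial is
a made-up name):

```shell
#!/bin/sh
# short_serial SN: keep only the first 8 characters of a serial number.
# Assumption: the "short form" is the 8-character prefix of the long form.
short_serial() {
    printf '%s\n' "$1" | cut -c1-8
}

# Example: short_serial Z1Y18H7V0000C4196NRF
```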

-Chip


> Cheers,
> --
> Saso
>

From mweiss at cimlbr.com  Thu Apr 17 16:50:26 2014
From: mweiss at cimlbr.com (Matt Weiss)
Date: Thu, 17 Apr 2014 11:50:26 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>	<52FC9C12.9090900@smartjog.com>
	<534DB2FC.9090404@gmail.com>	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
Message-ID: <53500652.5020803@cimlbr.com>

https://apps1.seagate.com/downloads/certificate.html?action=performDownload&key=393947625083

Don't know how long the link will work

On 4/17/2014 11:27 AM, Schweiss, Chip wrote:
>
>
>
> On Thu, Apr 17, 2014 at 11:15 AM, Saso Kiselkov 
> <skiselkov.ml at gmail.com <mailto:skiselkov.ml at gmail.com>> wrote:
>
>
>     I found the same form, but the damn thing can't find my drive by S/N
>     (Z1Y18H7V0000C4196NRF).
>
>
> Use the short form of the S/N: Z1Y18H7V
>
> -Chip
>
>     Cheers,
>     --
>     Saso
>
>
>
>


From cscoman at gmail.com  Thu Apr 17 17:37:17 2014
From: cscoman at gmail.com (Jason Cox)
Date: Thu, 17 Apr 2014 10:37:17 -0700
Subject: [OmniOS-discuss] Granular control of fma modules
In-Reply-To: <989404F2-E09D-4681-9CD4-9614CD795518@RichardElling.com>
References: <etPan.5303f80e.79e2a9e3.4ee@abp.local>
	<989404F2-E09D-4681-9CD4-9614CD795518@RichardElling.com>
Message-ID: <CAC4WUHpDb041vsie29z+Uv+zcCXeu+jtqP7u+c5KLWCNa6WbUQ@mail.gmail.com>

So I am running into this for a server I built for production. I unloaded
the module, but I am guessing that this means I will not be able to tell
when a drive is failed now outside of the normal way you can tell when a
drive is having issues. Thinking long term here, what other options do I
have? Just run smartmontools to monitor the drives, I guess?
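
For the smartmontools route, something along these lines could pull the
temperatures out of smartctl output. It is only a sketch: smart_temps is a
made-up name, the two line labels are what smartctl prints for SAS/SCSI disks
as far as I recall, and the device path in the usage comment is an assumption.

```shell
#!/bin/sh
# smart_temps: read `smartctl -A` output for a SAS/SCSI disk on stdin and
# print "<current> <trip>" temperatures in degrees C.
# Returns non-zero if either line is missing.
smart_temps() {
    awk -F: '
        /^Current Drive Temperature/ { cur = $2 + 0 }
        /^Drive Trip Temperature/    { trip = $2 + 0 }
        END { if (cur != "" && trip != "") print cur, trip; else exit 1 }
    '
}

# Hypothetical usage (device path is an assumption):
#   smartctl -A -d scsi /dev/rdsk/c2t3d0s0 | smart_temps
```

Run from cron, this gives a temperature check that does not depend on the
disk-transport module being loaded.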

Also is Seagate going to provide a way to update the firmware once it is
available or do we have to try and RMA the drives or just swap them as they
fail...  I love how the spec says they are good from 5-60c, but the
firmware says 40c as the threshold temp.

Thanks


On Sat, Feb 22, 2014 at 9:50 PM, Richard Elling <
richard.elling at richardelling.com> wrote:

> On Feb 18, 2014, at 4:17 PM, Anh Quach <anhquach at me.com> wrote:
>
> > Is it possible to tell the disk-transport FMA module to ignore
> over-temperature on only a certain set of disks?
>
> In Solaris 11, yes this is possible. However, the open source community
> has not implemented it
> yet, AFAIK.
>
> >
> > I'm doing testing with some Seagate Constellation.3's that seem to run
> hotter even at idle than the rest of my disks (39-44 C) and they are
> continually getting flagged for over temp. I know I can disable the temp
> alert for that module but I don't want to disable it for all disks, just
> these new Seagates.
>
> You can unload disk-transport altogether as a workaround. The root cause
> is a bug in
> the Seagate firmware introduced in version 3 of their firmware. A fix is
> in the works
> for version 4, available RSN.
>  -- richard
>
> --
>
> ZFS storage and performance consulting at http://www.RichardElling.com
>
>
>
>
>
>
>


-- 
Jason Cox

From skiselkov.ml at gmail.com  Thu Apr 17 17:49:20 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Thu, 17 Apr 2014 19:49:20 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
Message-ID: <53501420.9090307@gmail.com>

On 4/17/14, 6:27 PM, Schweiss, Chip wrote:
> 
> Use the short form of the S/N: Z1Y18H7V

Ok, thanks, didn't know there were two forms... (FMA only prints one).

-- 
Saso


From cscoman at gmail.com  Thu Apr 17 17:58:03 2014
From: cscoman at gmail.com (Jason Cox)
Date: Thu, 17 Apr 2014 10:58:03 -0700
Subject: [OmniOS-discuss] Granular control of fma modules
In-Reply-To: <CAC4WUHpDb041vsie29z+Uv+zcCXeu+jtqP7u+c5KLWCNa6WbUQ@mail.gmail.com>
References: <etPan.5303f80e.79e2a9e3.4ee@abp.local>
	<989404F2-E09D-4681-9CD4-9614CD795518@RichardElling.com>
	<CAC4WUHpDb041vsie29z+Uv+zcCXeu+jtqP7u+c5KLWCNa6WbUQ@mail.gmail.com>
Message-ID: <CAC4WUHqH6wngWKfjDXSaZp_iT2-k5wQORRMTVTHkqJA_2opVSw@mail.gmail.com>

Sorry, I guess I jumped on the send button a little too soon. I found
another thread that mentions the firmware update for the drives and how to
get it. I guess I will be updating my drives and re-enabling the module.


On Thu, Apr 17, 2014 at 10:37 AM, Jason Cox <cscoman at gmail.com> wrote:

> So I am running into this for a server I built for production. I unloaded
> the module, but I am guessing that this means I will not be able to tell
> when a drive is failed now outside of the normal way you can tell when a
> drive is having issues. Thinking long term here, what other options do I
> have? Just look at running smartmontool to monitor the drive I guess?
>
> Also is Seagate going to provide a way to update the firmware once it is
> available or do we have to try and RMA the drives or just swap them as they
> fail...  I love how the spec says they are good from 5-60c, but the
> firmware says 40c as the threshold temp.
>
> Thanks
>
>
> On Sat, Feb 22, 2014 at 9:50 PM, Richard Elling <
> richard.elling at richardelling.com> wrote:
>
>> On Feb 18, 2014, at 4:17 PM, Anh Quach <anhquach at me.com> wrote:
>>
>> > Is it possible to tell the disk-transport FMA module to ignore
>> over-temperature on only a certain set of disks?
>>
>> In Solaris 11, yes this is possible. However, the open source community
>> has not implemented it
>> yet, AFAIK.
>>
>> >
>> > I'm doing testing with some Seagate Constellation.3's that seem to run
>> hotter even at idle than the rest of my disks (39-44 C) and they are
>> continually getting flagged for over temp. I know I can disable the temp
>> alert for that module but I don't want to disable it for all disks, just
>> these new Seagates.
>>
>> You can unload disk-transport altogether as a workaround. The root cause
>> is a bug in
>> the Seagate firmware introduced in version 3 of their firmware. A fix is
>> in the works
>> for version 4, available RSN.
>>  -- richard
>>
>> --
>>
>> ZFS storage and performance consulting at http://www.RichardElling.com
>>
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Jason Cox
>


-- 
Jason Cox

From chip at innovates.com  Fri Apr 18 19:23:27 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Fri, 18 Apr 2014 14:23:27 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <53501420.9090307@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
Message-ID: <CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>

I've flashed 0004 to some of my Constellations so far.   The drives are now
set at a reference temperature of 60C which is much better than 40C.

I had to disable multipathing to get these disks to flash.   I'm not sure
if this is an issue with the drive or the Supermicro JBOD.

I disabled multipathing and I'm getting them to flash.

-Chip


On Thu, Apr 17, 2014 at 12:49 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 4/17/14, 6:27 PM, Schweiss, Chip wrote:
> >
> > Use the short form of the S/N: Z1Y18H7V
>
> Ok, thanks, didn't know there were two forms... (FMA only prints one).
>
> --
> Saso
>
>

From skiselkov.ml at gmail.com  Fri Apr 18 20:35:09 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Fri, 18 Apr 2014 22:35:09 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
Message-ID: <53518C7D.7060703@gmail.com>

On 4/18/14, 9:23 PM, Schweiss, Chip wrote:
> I've flashed 0004 to some of my Constellations so far.   The drives are
> now set at a reference temperature of 60C which is much better than 40C.  
> 
> I had to disable multipathing to get these disks to flash.   I'm not
> sure if this is an issue with the drive or the Supermicro JBOD.  
> 
> I disabled multipathing and I'm getting them to flash.

I'm still trying to figure out how to flash them, as the flashing tools
only seem to be available for Linux :(

Guess I'm gonna have to ask the customer to take the machine offline for
a while.

Cheers,
-- 
Saso

From chip at innovates.com  Fri Apr 18 20:49:48 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Fri, 18 Apr 2014 15:49:48 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <53518C7D.7060703@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
Message-ID: <CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>

I used Santools, which is a licensed product.

From what I understand, lsiutil and sg_write_buffer from sg3_utils can do it
too.  The mode for sg_write_buffer may need to be set to 7 instead of 5 as
stated in the firmware docs.

-Chip


On Fri, Apr 18, 2014 at 3:35 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 4/18/14, 9:23 PM, Schweiss, Chip wrote:
> > I've flashed 0004 to some of my Constellations so far.   The drives are
> > now set at a reference temperature of 60C which is much better than 40C.
> >
> > I had to disable multipathing to get these disks to flash.   I'm not
> > sure if this is an issue with the drive or the Supermicro JBOD.
> >
> > I disabled multipathing and I'm getting them to flash.
>
> I'm still trying to figure out how to flash them, as the flashing tools
> only seem to be available for Linux :(
>
> Guess I'm gonna have to ask the customer to take the machine offline for
> a while.
>
> Cheers,
> --
> Saso
>

From skiselkov.ml at gmail.com  Fri Apr 18 21:23:16 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Fri, 18 Apr 2014 23:23:16 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
Message-ID: <535197C4.9010507@gmail.com>

On 4/18/14, 10:49 PM, Schweiss, Chip wrote:
> I used Santools, which is a licensed product.  
> 
> From what I understand, lsiutil and sg_write_buffer from sg3_utils can do
> it too.  The mode for sg_write_buffer may need to be set to 7 instead of
> 5 as stated in the firmware docs.

Hey cool, didn't know sg3_utils was compilable on non-Linux systems.
Will try it out, thanks!

-- 
Saso


From matthias-omn-discuss at mteege.de  Sun Apr 20 11:47:25 2014
From: matthias-omn-discuss at mteege.de (Matthias Teege)
Date: Sun, 20 Apr 2014 13:47:25 +0200
Subject: [OmniOS-discuss] No updates available for zone but an old
 openssl?
In-Reply-To: <F94877DF-E03E-4FE7-AA15-8ADC553BE132@qutic.com>
References: <1849611f-16fe-420d-8135-c2a544d6d836@mteege.de>
	<F94877DF-E03E-4FE7-AA15-8ADC553BE132@qutic.com>
Message-ID: <6d498b05-34ac-470e-8ed0-72b799f99cff@mteege.de>

On Wed, Apr 16, 2014 at 12:47:51PM +0200, qutic development wrote:

Hi,

> > How do I update the zone?
> 
> http://omnios.omniti.com/wiki.php/GeneralAdministration#UpgradingWithNon-GlobalZones

the Omnios publisher was missing. After

root at tst:~# pkg -R /export/t1/root/ set-publisher -g http://pkg.omniti.com/omnios/release/ omnios
root at tst:~# pkg -R /export/t1/root publisher
PUBLISHER                             TYPE     STATUS   URI
cs.umd.edu                            origin   online   http://pkg.cs.umd.edu/
omnios                                origin   online   http://pkg.omniti.com/omnios/release/
root at tst:~# pkg -R /export/t1/root update

it works.
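
For anyone who wants to script the same check before running an update in a
zone, here is a minimal Python sketch. It assumes the `pkg publisher` output
keeps the four-column layout shown above (newer pkg versions may print extra
columns); the function name parse_publishers is made up for illustration.

```python
def parse_publishers(text):
    """Return {publisher: uri} parsed from `pkg publisher` style output.

    Assumes the first line is the PUBLISHER/TYPE/STATUS/URI header and
    that the URI is the last whitespace-separated field on each row.
    """
    publishers = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4:
            publishers[fields[0]] = fields[-1]
    return publishers


# Sample copied from the listing above.
SAMPLE = """\
PUBLISHER                             TYPE     STATUS   URI
cs.umd.edu                            origin   online   http://pkg.cs.umd.edu/
omnios                                origin   online   http://pkg.omniti.com/omnios/release/
"""

pubs = parse_publishers(SAMPLE)
if "omnios" not in pubs:
    print("omnios publisher missing; the zone cannot pick up OmniOS updates")
else:
    print("omnios publisher configured:", pubs["omnios"])
```

In practice the text would come from running `pkg -R <zone root> publisher`;
the sample string here only stands in for that output.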

Thanks
Matthias

From chip at innovates.com  Mon Apr 21 13:12:56 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Mon, 21 Apr 2014 08:12:56 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <535197C4.9010507@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<535197C4.9010507@gmail.com>
Message-ID: <CALeZrrQcCTK2JtDxUQYt5pkM=EzRV5TU8BAdz-kStgkrNoWrbQ@mail.gmail.com>

I have 20 disks that went offline because they reached 40C before I applied
the firmware update.

I tried marking them as repaired in fmadm, but that didn't make any
difference.

Does anyone know the trick to bring these back online to OmniOS?

-Chip


On Fri, Apr 18, 2014 at 4:23 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 4/18/14, 10:49 PM, Schweiss, Chip wrote:
> > I used Santools, which is a licensed product.
> >
> > From what I understand, lsiutil and sg_write_buffer from sg3_utils can do
> > it too.  The mode for sg_write_buffer may need to be set to 7 instead of
> > 5 as stated in the firmware docs.
>
> Hey cool, didn't know sg3_utils was compilable on non-Linux systems.
> Will try it out, thanks!
>
> --
> Saso
>
>

From chip at innovates.com  Mon Apr 21 16:19:44 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Mon, 21 Apr 2014 11:19:44 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrQcCTK2JtDxUQYt5pkM=EzRV5TU8BAdz-kStgkrNoWrbQ@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<535197C4.9010507@gmail.com>
	<CALeZrrQcCTK2JtDxUQYt5pkM=EzRV5TU8BAdz-kStgkrNoWrbQ@mail.gmail.com>
Message-ID: <CALeZrrQ1PxoiTnGCrq1tCGVK1=Sjd5N8Dx=deqL4w2YYXrYe+A@mail.gmail.com>

I suspect these drives have self-destructed.

Can anyone confirm this firmware issue causes the drives to permanently go
offline?

-Chip


On Mon, Apr 21, 2014 at 8:12 AM, Schweiss, Chip <chip at innovates.com> wrote:

> I have 20 disks that went offline because they reached 40C before I
> applied the firmware update.
>
> I tried marking them as repaired in fmadm, but that didn't make any
> difference.
>
> Does anyone know the trick to bring these back online to OmniOS?
>
> -Chip
>
>
> On Fri, Apr 18, 2014 at 4:23 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:
>
>> On 4/18/14, 10:49 PM, Schweiss, Chip wrote:
>> > I used Santools, which is a licensed product.
>> >
>> > From what I understand, lsiutil and sg_write_buffer from sg3_utils can do
>> > it too.  The mode for sg_write_buffer may need to be set to 7 instead of
>> > 5 as stated in the firmware docs.
>>
>> Hey cool, didn't know sg3_utils was compilable on non-Linux systems.
>> Will try it out, thanks!
>>
>> --
>> Saso
>>
>>
>

From jimklimov at cos.ru  Mon Apr 21 16:28:55 2014
From: jimklimov at cos.ru (Jim Klimov)
Date: Mon, 21 Apr 2014 18:28:55 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrQ1PxoiTnGCrq1tCGVK1=Sjd5N8Dx=deqL4w2YYXrYe+A@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<535197C4.9010507@gmail.com>
	<CALeZrrQcCTK2JtDxUQYt5pkM=EzRV5TU8BAdz-kStgkrNoWrbQ@mail.gmail.com>
	<CALeZrrQ1PxoiTnGCrq1tCGVK1=Sjd5N8Dx=deqL4w2YYXrYe+A@mail.gmail.com>
Message-ID: <53554747.1070204@cos.ru>

Is it possible to boot into another OS (perhaps the LiveCD) and/or
connect these disks to another host, just to check if they respond
to read requests? Essentially this would help confirm or reject the
theory of self-destruction.

HTH,
//Jim

On 2014-04-21 18:19, Schweiss, Chip wrote:
> I suspect these drives have self-destructed.
>
> Can anyone confirm this firmware issue causes the drives to permanently
> go offline?
>
> -Chip
>
>
> On Mon, Apr 21, 2014 at 8:12 AM, Schweiss, Chip <chip at innovates.com
> <mailto:chip at innovates.com>> wrote:
>
>     I have 20 disks that went offline because they reached 40C before I
>     applied the firmware update.
>
>     I tried marking them as repaired in fmadm, but that didn't make any
>     difference.
>
>     Does anyone know the trick to bring these back online to OmniOS?

From skiselkov.ml at gmail.com  Tue Apr 22 09:36:03 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Tue, 22 Apr 2014 11:36:03 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
Message-ID: <53563803.5030905@gmail.com>

On 4/18/14, 10:49 PM, Schweiss, Chip wrote:
> I used Santools, which is a licensed product.  
> 
> From what I understand, lsiutil and sg_write_buffer from sg3_utils can do
> it too.  The mode for sg_write_buffer may need to be set to 7 instead of
> 5 as stated in the firmware docs.
>

Sadly, I had no luck with either lsiutil or sg_write_buffer from
sg3-utils. lsiutil is only for older MPT HBAs (I have an MPT 2.0 one)
and sg_write_buffer fails with the following error:

# sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD --length=1625600 \
    --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
    Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
ioctl(USCSICMD) failed with os_err (errno) = 22
write buffer: pass through os error: Invalid argument
Write buffer failed res=-1
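
As a sanity check on the "Write buffer cmd" line above, the ten bytes can be
decoded by hand. The Python sketch below assumes the standard 10-byte SCSI
WRITE BUFFER layout (opcode 0x3b, mode in the low five bits of byte 1, buffer
ID, 3-byte buffer offset, 3-byte parameter list length, control byte); the
function name is made up for illustration.

```python
def decode_write_buffer_cdb(hex_bytes):
    """Decode a 10-byte SCSI WRITE BUFFER CDB given as a hex string."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10 or cdb[0] != 0x3B:
        raise ValueError("not a 10-byte WRITE BUFFER CDB")
    return {
        "mode": cdb[1] & 0x1F,                       # low 5 bits of byte 1
        "buffer_id": cdb[2],
        "buffer_offset": int.from_bytes(cdb[3:6], "big"),
        "parameter_list_length": int.from_bytes(cdb[6:9], "big"),
        "control": cdb[9],
    }


# Bytes copied from the sg_write_buffer -v output above.
fields = decode_write_buffer_cdb("3b 05 00 00 00 00 18 ce 00 00")
print(fields)
```

The length field 0x18ce00 is 1625600, so the command asks the drive to take
the whole firmware image in a single transfer, in mode 5.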

I also tried the following device names:
  /dev/rdsk/c9t5000C500578F774Bd0p0
  /dev/dsk/c9t5000C500578F774Bd0
  /dev/dsk/c9t5000C500578F774Bd0p0

The OS also printed the following error:

WARNING: mpt_sas: coding error detected, the driver is using
ddi_dma_attr(9S) incorrectly. There is a small risk of data corruption
in particular with large I/Os. The driver should be replaced with a
corrected version for proper system operation. To disable this warning,
add 'set rootnex:rootnex_bind_warn=0' to /etc/system(4).

Staring at the code near usr/src/uts/i86pc/io/rootnex.c:3305, this means
that the driver can't submit a DMA job this large, which means that I
can't really fix this at all (this is really way outside of my field).
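
If the single 1625600-byte transfer is what trips the DMA check, one thing to
consider is a segmented download (mode 7, "download microcode with offsets
and save", as mentioned earlier in the thread). The Python sketch below only
computes what the per-chunk WRITE BUFFER CDBs would look like; it sends
nothing to a device. The 32768-byte chunk size and the function name are
hypothetical, and whether this firmware accepts a segmented download, or
whether that would avoid the rootnex warning, is untested.

```python
WRITE_BUFFER = 0x3B
MODE_OFFSETS_AND_SAVE = 0x07  # download microcode with offsets and save


def chunked_cdbs(total_length, chunk_size, buffer_id=0):
    """Build one 10-byte WRITE BUFFER CDB per chunk of the firmware image."""
    if not 0 < chunk_size <= 0xFFFFFF:
        raise ValueError("chunk size must fit in the 3-byte length field")
    if total_length > 0x1000000:
        raise ValueError("image too large for the 3-byte offset field")
    cdbs = []
    for offset in range(0, total_length, chunk_size):
        length = min(chunk_size, total_length - offset)
        cdbs.append(
            bytes([WRITE_BUFFER, MODE_OFFSETS_AND_SAVE, buffer_id])
            + offset.to_bytes(3, "big")   # buffer offset
            + length.to_bytes(3, "big")   # parameter list length
            + b"\x00"                     # control byte
        )
    return cdbs


# 1625600 is the image length from the command above; 32768 is an arbitrary
# illustrative chunk size.
plan = chunked_cdbs(1625600, 32768)
print(len(plan), "chunks")
print("first:", " ".join(f"{b:02x}" for b in plan[0]))
print("last: ", " ".join(f"{b:02x}" for b in plan[-1]))
```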

Any ideas on what to do next?

Cheers,
-- 
Saso

From mir at miras.org  Tue Apr 22 09:53:41 2014
From: mir at miras.org (Michael Rasmussen)
Date: Tue, 22 Apr 2014 11:53:41 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <53563803.5030905@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
Message-ID: <20140422115341.4c0a26a9@sleipner.datanom.net>

On Tue, 22 Apr 2014 11:36:03 +0200
Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> 
> Any ideas on what to do next?
> 
Could you boot the system from a live linux distro and run the tools
from it? Maybe support for linux is better.

I can recommend systemrescuecd (based on gentoo).

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael <at> rasmussen <dot> cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir <at> datanom <dot> net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir <at> miras <dot> org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
All bad precedents began as justifiable measures.
		-- Gaius Julius Caesar, quoted in "The Conspiracy of
		   Catiline", by Sallust

From skiselkov.ml at gmail.com  Tue Apr 22 09:58:35 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Tue, 22 Apr 2014 11:58:35 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <20140422115341.4c0a26a9@sleipner.datanom.net>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>	<52FC9C12.9090900@smartjog.com>
	<534DB2FC.9090404@gmail.com>	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>	<534FFE1B.2060201@gmail.com>	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>	<53501420.9090307@gmail.com>	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>	<53518C7D.7060703@gmail.com>	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>	<53563803.5030905@gmail.com>
	<20140422115341.4c0a26a9@sleipner.datanom.net>
Message-ID: <53563D4B.7020105@gmail.com>

On 4/22/14, 11:53 AM, Michael Rasmussen wrote:
> On Tue, 22 Apr 2014 11:36:03 +0200
> Saso Kiselkov <skiselkov.ml at gmail.com> wrote:
> 
>>
>> Any ideas on what to do next?
>>
> Could you boot the system from a live linux distro and run the tools
> from it? Maybe support for linux is better.
> 
> I can recommend systemrescuecd (based on gentoo).

I can't, the system is in production.

-- 
Saso


From chip at innovates.com  Tue Apr 22 15:03:38 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Tue, 22 Apr 2014 10:03:38 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <53563803.5030905@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
Message-ID: <CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>

Are you sure you have SAS multipath disabled on the disk you are trying to
flash?

I couldn't get these to flash at all with MP enabled.  I too kept getting
OS related errors.

For one system I did an stmsboot -d, for another I just pulled one of the
SAS cables to each JBOD.

-Chip


On Tue, Apr 22, 2014 at 4:36 AM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 4/18/14, 10:49 PM, Schweiss, Chip wrote:
> > I used Santools, which is a licensed product.
> >
> > From what I understand, lsiutil and sg_write_buffer from sg3_utils can do
> > it too.  The mode for sg_write_buffer may need to be set to 7 instead of
> > 5 as stated in the firmware docs.
> >
>
> Sadly, I had no luck with either lsiutil or sg_write_buffer from
> sg3-utils. lsiutil is only for older MPT HBAs (I have an MPT 2.0 one)
> and sg_write_buffer fails with the following error:
>
> # sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD --length=1625600
> --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
>     Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
> ioctl(USCSICMD) failed with os_err (errno) = 22
> write buffer: pass through os error: Invalid argument
> Write buffer failed res=-1
>
> I also tried the following device names:
>   /dev/rdsk/c9t5000C500578F774Bd0p0
>   /dev/dsk/c9t5000C500578F774Bd0
>   /dev/dsk/c9t5000C500578F774Bd0p0
>
> The OS also printed the following error:
>
> WARNING: mpt_sas: coding error detected, the driver is using
> ddi_dma_attr(9S) incorrectly. There is a small risk of data corruption
> in particular with large I/Os. The driver should be replaced with a
> corrected version for proper system operation. To disable this warning,
> add 'set rootnex:rootnex_bind_warn=0' to /etc/system(4).
>
> Staring at the code near usr/src/uts/i86pc/io/rootnex.c:3305, this means
> that the driver can't submit a DMA job this large, which means that I
> can't really fix this at all (this is really way outside of my field).
>
> Any ideas on what to do next?
>
> Cheers,
> --
> Saso
>

From richard.elling at richardelling.com  Tue Apr 22 16:15:32 2014
From: richard.elling at richardelling.com (Richard Elling)
Date: Tue, 22 Apr 2014 09:15:32 -0700
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrQ1PxoiTnGCrq1tCGVK1=Sjd5N8Dx=deqL4w2YYXrYe+A@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<535197C4.9010507@gmail.com>
	<CALeZrrQcCTK2JtDxUQYt5pkM=EzRV5TU8BAdz-kStgkrNoWrbQ@mail.gmail.com>
	<CALeZrrQ1PxoiTnGCrq1tCGVK1=Sjd5N8Dx=deqL4w2YYXrYe+A@mail.gmail.com>
Message-ID: <0DFDE814-0EC4-4ABB-9752-6C5F910F7F8B@RichardElling.com>


On Apr 21, 2014, at 9:19 AM, Schweiss, Chip <chip at innovates.com> wrote:

> I suspect these drives have self-destructed.
> 
> Can anyone confirm this firmware issue causes the drives to permanently go offline?

They are fine. FMA retires them, so you have to coerce the OS to reinstantiate them.
In my case, they were in the lab, and we reinstall OSes continuously, so it wasn't a 
problem for us :-) You might have a look at cfgadm -al, and see if it is in a state that
can be coerced... the docs are poor in this area :-( and this is not a frequent operation :-)
 -- richard

--

Richard.Elling at RichardElling.com
+1-760-896-4422



From chip at innovates.com  Tue Apr 22 16:36:24 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Tue, 22 Apr 2014 11:36:24 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <0DFDE814-0EC4-4ABB-9752-6C5F910F7F8B@RichardElling.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<535197C4.9010507@gmail.com>
	<CALeZrrQcCTK2JtDxUQYt5pkM=EzRV5TU8BAdz-kStgkrNoWrbQ@mail.gmail.com>
	<CALeZrrQ1PxoiTnGCrq1tCGVK1=Sjd5N8Dx=deqL4w2YYXrYe+A@mail.gmail.com>
	<0DFDE814-0EC4-4ABB-9752-6C5F910F7F8B@RichardElling.com>
Message-ID: <CALeZrrRGv-N0RQAepAJE1g28xJNisD=Ni4+dbF3XtNG7CtzjCQ@mail.gmail.com>

On Tue, Apr 22, 2014 at 11:15 AM, Richard Elling <
richard.elling at richardelling.com> wrote:

>
> On Apr 21, 2014, at 9:19 AM, Schweiss, Chip <chip at innovates.com> wrote:
>
> I suspect these drives have self-destructed.
>
> Can anyone confirm this firmware issue causes the drives to permanently go
> offline?
>
>
> They are fine. FMA retires them, so you have to coerce the OS to
> reinstantiate them.
> In my case, they were in the lab, and we reinstall OSes continuously, so
> it wasn't a
> problem for us :-) You might have a look at cfgadm -al, and see if it is
> in a state that
> can be coerced... the docs are poor in this area :-( and this is not a
> frequent operation :-)
>  -- richard
>

After running devfsadm -C the device stubs aren't there anymore.   They
don't show up in 'cfgadm -al'.

I can see them from the HBA BIOS.   So I'm still leaning towards the disks
being okay, with OmniOS refusing to talk to them.

So once 'retired', even marking the device repaired will not allow it to be
mounted?

-Chip

From hakansom at ohsu.edu  Tue Apr 22 17:15:48 2014
From: hakansom at ohsu.edu (Marion Hakanson)
Date: Tue, 22 Apr 2014 10:15:48 -0700
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: Message from Saso Kiselkov <skiselkov.ml@gmail.com>
	of "Tue, 22 Apr 2014 11:36:03 +0200." <53563803.5030905@gmail.com>
Message-ID: <201404221715.s3MHFm4B000071@kyklops.ohsu.edu>

skiselkov.ml at gmail.com said:
> Sadly, I had no luck with either lsiutil or sg_write_buffer from sg3-utils.
> lsiutil is only for older MPT HBAs (I have an MPT 2.0 one) and
> sg_write_buffer fails with the following error: 
> . . .
> Staring at the code near usr/src/uts/i86pc/io/rootnex.c:3305, this means that
> the driver can't submit a DMA job this large, which means that I can't really
> fix this at all (this is really way outside of my field).
> 
> Any ideas on what to do next? 

Have any of you tried the "fwflash" utility (comes with OmniOS, oi151a7, etc.)?
When I do "fwflash -l" it does list out the Seagate 2TB and 4TB drives on a
couple of our systems here (multipath enabled).  I don't have any drives
needing firmware updates, so haven't tested out that functionality yet.

Regards,

Marion


From skiselkov.ml at gmail.com  Tue Apr 22 17:58:58 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Tue, 22 Apr 2014 19:58:58 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
Message-ID: <5356ADE2.5070605@gmail.com>

On 4/22/14, 5:03 PM, Schweiss, Chip wrote:
> Are you sure you have SAS multipath disabled on the disk you are trying
> to flash?
> 
> I couldn't get these to flash at all with MP enabled.  I too kept
> getting OS related errors.
> 
> For one system I did an stmsboot -d, for another I just pulled one of
> the SAS cables to each JBOD.

Oh, you're right, hadn't considered that. I'll have to try this out,
even though it means downtime.

Cheers,
-- 
Saso

From richard.elling at richardelling.com  Tue Apr 22 20:08:57 2014
From: richard.elling at richardelling.com (Richard Elling)
Date: Tue, 22 Apr 2014 13:08:57 -0700
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <5356ADE2.5070605@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
	<5356ADE2.5070605@gmail.com>
Message-ID: <C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>

On Apr 22, 2014, at 10:58 AM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 4/22/14, 5:03 PM, Schweiss, Chip wrote:
>> Are you sure you have SAS multipath disabled on the disk you are trying
>> to flash?
>> 
>> I couldn't get these to flash at all with MP enabled.  I too kept
>> getting OS related errors.
>> 
>> For one system I did an stmsboot -d, for another I just pulled one of
>> the SAS cables to each JBOD.
> 
> Oh, you're right, hadn't considered that. I'll have to try this out,
> even though it means downtime.

mpathadm(1m) allows you to enable/disable paths on the fly, without pulling cables.
 -- richard

--

Richard.Elling at RichardElling.com
+1-760-896-4422



From chip at innovates.com  Tue Apr 22 20:10:35 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Tue, 22 Apr 2014 15:10:35 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
	<5356ADE2.5070605@gmail.com>
	<C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
Message-ID: <CALeZrrS93dKRGj+J++8dr7=C20rFpUFRdkTxq92p-wyiFuGECg@mail.gmail.com>

mpathadm also panics the kernel on OmniOS if there are any offline disks.

Proceed with caution.


On Tue, Apr 22, 2014 at 3:08 PM, Richard Elling <
richard.elling at richardelling.com> wrote:

> On Apr 22, 2014, at 10:58 AM, Saso Kiselkov <skiselkov.ml at gmail.com>
> wrote:
>
> On 4/22/14, 5:03 PM, Schweiss, Chip wrote:
>
> Are you sure you have SAS multipath disabled on the disk you are trying
> to flash?
>
> I couldn't get these to flash at all with MP enabled.  I too kept
> getting OS related errors.
>
> For one system I did an stmsboot -d, for another I just pulled one of
> the SAS cables to each JBOD.
>
>
> Oh, you're right, hadn't considered that. I'll have to try this out,
> even though it means downtime.
>
>
> mpathadm(1m) allows you to enable/disable paths on the fly, without
> pulling cables.
>  -- richard
>
>   --
>
> Richard.Elling at RichardElling.com
> +1-760-896-4422
>
>
>
>

From skiselkov.ml at gmail.com  Tue Apr 22 20:17:15 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Tue, 22 Apr 2014 22:17:15 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
	<5356ADE2.5070605@gmail.com>
	<C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
Message-ID: <5356CE4B.50606@gmail.com>

On 4/22/14, 10:08 PM, Richard Elling wrote:
> On Apr 22, 2014, at 10:58 AM, Saso Kiselkov <skiselkov.ml at gmail.com
> <mailto:skiselkov.ml at gmail.com>> wrote:
> 
>> On 4/22/14, 5:03 PM, Schweiss, Chip wrote:
>>> Are you sure you have SAS multipath disabled on the disk you are trying
>>> to flash?
>>>
>>> I couldn't get these to flash at all with MP enabled.  I too kept
>>> getting OS related errors.
>>>
>>> For one system I did an stmsboot -d, for another I just pulled one of
>>> the SAS cables to each JBOD.
>>
>> Oh, you're right, hadn't considered that. I'll have to try this out,
>> even though it means downtime.
> 
> mpathadm(1m) allows you to enable/disable paths on the fly, without
> pulling cables.

I know, but if I understand it correctly, I need to not only disable a
particular path, I need to disable mpath support entirely to get
sg_write_buffer to talk to mpt_sas directly, instead of going through
the scsi_vhci glob in the middle (which, presumably, is what's causing
this problem). If I'm misunderstanding this, please do set me straight.

Cheers,
-- 
Saso

From chip at innovates.com  Tue Apr 22 20:31:59 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Tue, 22 Apr 2014 15:31:59 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <5356CE4B.50606@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
	<5356ADE2.5070605@gmail.com>
	<C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
	<5356CE4B.50606@gmail.com>
Message-ID: <CALeZrrS06iSg_0Za-HnjXo9rCQjq8jS8Oh+Y3iChMEcN6w1pHw@mail.gmail.com>

On Tue, Apr 22, 2014 at 3:17 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

>
> I know, but if I understand it correctly, I need to not only disable a
> particular path, I need to disable mpath support entirely to get
> sg_write_buffer to talk to mpt_sas directly, instead of going through
> the scsi_vhci glob in the middle (which, presumably, is what's causing
> this problem). If I'm misunderstanding this, please do set me straight.
>
> Cheers,
> --
> Saso
>

Actually, no.  Disabling a physical path works too.   That is how I stumbled
upon the MP issue.  I plugged one of my paths into a second server to
attempt using Linux to flash the firmware.   When the flash started working
from the primary server, I never loaded Linux on the second server.

I think the problem is actually in the disk accepting firmware via
multipath, not so much in the OS.  The OS throws the error when a message
sent down a second path gets rejected by the drive.

-Chip

From skiselkov.ml at gmail.com  Tue Apr 22 21:02:39 2014
From: skiselkov.ml at gmail.com (Saso Kiselkov)
Date: Tue, 22 Apr 2014 23:02:39 +0200
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrS06iSg_0Za-HnjXo9rCQjq8jS8Oh+Y3iChMEcN6w1pHw@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
	<5356ADE2.5070605@gmail.com>
	<C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
	<5356CE4B.50606@gmail.com>
	<CALeZrrS06iSg_0Za-HnjXo9rCQjq8jS8Oh+Y3iChMEcN6w1pHw@mail.gmail.com>
Message-ID: <5356D8EF.7020604@gmail.com>

On 4/22/14, 10:31 PM, Schweiss, Chip wrote:
> 
> On Tue, Apr 22, 2014 at 3:17 PM, Saso Kiselkov <skiselkov.ml at gmail.com
> <mailto:skiselkov.ml at gmail.com>> wrote:
> 
> 
>     I know, but if I understand it correctly, I need to not only disable a
>     particular path, I need to disable mpath support entirely to get
>     sg_write_buffer to talk to mpt_sas directly, instead of going through
>     the scsi_vhci glob in the middle (which, presumably, is what's causing
>     this problem). If I'm misunderstanding this, please do set me straight.
> 
>     Cheers,
>     --
>     Saso
> 
> 
> Actually no.  Disabling a physical path works too.   That is how I
> stumbled upon the MP issue.  I plugged one of my paths into a second
> server to attempt using Linux to flash the firmware.   When the flash
> started working from the primary server, I never loaded Linux in the
> second server.
> 
> I think the problem is actually in the disk accepting firmware via
> multipath not so much the OS.  The OS throws the error when a message
> down a second path gets rejected by the drive. 

Still no luck, though it's possible I'm doing it wrong:

# mpathadm disable path -l /dev/rdsk/c9t5000C500578F774Bd0s2 \
  -i w5b8ca3a0e5029c00 -t w5000c500578f774a

# mpathadm show lu /dev/rdsk/c9t5000C500578F774Bd0s2
Logical Unit:  /dev/rdsk/c9t5000C500578F774Bd0s2
        mpath-support:  libmpscsi_vhci.so
        Vendor:  SEAGATE
        Product:  ST2000NM0023
        Revision:  0003
        Name Type:  unknown type
        Name:  5000c500578f774b
        Asymmetric:  no
        Current Load Balance:  round-robin
        Logical Unit Group ID:  NA
        Auto Failback:  on
        Auto Probing:  NA

        Paths:
                Initiator Port Name:  w5b8ca3a0e5029c00
                Target Port Name:  w5000c500578f774a
                Override Path:  NA
                Path State:  OK
                Disabled:  yes

                Initiator Port Name:  w5b8ca3a0e5029c00
                Target Port Name:  w5000c500578f7749
                Override Path:  NA
                Path State:  OK
                Disabled:  no

        Target Ports:
                Name:  w5000c500578f774a
                Relative ID:  0

                Name:  w5000c500578f7749
                Relative ID:  0

# sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD \
  --length=1625600 --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
    Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
ioctl(USCSICMD) failed with os_err (errno) = 22
write buffer: pass through os error: Invalid argument
Write buffer failed res=-1

The situation is the same regardless of which path I disable. At the
point of the sg_write_buffer, I also get a single SCSI error logged by
"iostat -E", so it's clear something is going wrong on the SCSI bus.
I suspect it might have something to do with what you mentioned, but
I'm not enough of a SCSI guru to figure this out.
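
For what it's worth, the CDB in the verbose output can at least be checked
by hand. A minimal sketch, assuming the standard 10-byte WRITE BUFFER
layout (opcode in byte 0, mode in byte 1, parameter list length in bytes
6-8, big-endian); the variable names are only for illustration:

```shell
# CDB as printed by "sg_write_buffer -v" above
cdb="3b 05 00 00 00 00 18 ce 00 00"

# Split the ten bytes into positional parameters $1..$10
set -- $cdb

opcode=$1                 # 3b = WRITE BUFFER
mode=$((0x$2))            # byte 1: mode field
length=$((0x$7$8$9))      # bytes 6-8: parameter list length

echo "opcode=$opcode mode=$mode length=$length"
```

The decoded mode and length match --mode=5 and --length=1625600, so the
command block itself looks well-formed, and the failure seems to be
reported by the pass-through layer rather than caused by a mangled request.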

Cheers,
-- 
Saso

From richard.elling at richardelling.com  Tue Apr 22 22:42:51 2014
From: richard.elling at richardelling.com (Richard Elling)
Date: Tue, 22 Apr 2014 15:42:51 -0700
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <5356D8EF.7020604@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
	<5356ADE2.5070605@gmail.com>
	<C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
	<5356CE4B.50606@gmail.com>
	<CALeZrrS06iSg_0Za-HnjXo9rCQjq8jS8Oh+Y3iChMEcN6w1pHw@mail.gmail.com>
	<5356D8EF.7020604@gmail.com>
Message-ID: <D37A68AB-5FBE-4182-97A4-5E60E6CB5226@RichardElling.com>

going out on a limb...

On Apr 22, 2014, at 2:02 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> On 4/22/14, 10:31 PM, Schweiss, Chip wrote:
>> 
>> On Tue, Apr 22, 2014 at 3:17 PM, Saso Kiselkov <skiselkov.ml at gmail.com
>> <mailto:skiselkov.ml at gmail.com>> wrote:
>> 
>> 
>>    I know, but if I understand it correctly, I need to not only disable a
>>    particular path, I need to disable mpath support entirely to get
>>    sg_write_buffer to talk to mpt_sas directly, instead of going through
>>    the scsi_vhci glob in the middle (which, presumably, is what's causing
>>    this problem). If I'm misunderstanding this, please do set me straight.
>> 
>>    Cheers,
>>    --
>>    Saso
>> 
>> 
>> Actually no.  Disabling a physical path works too.   That is how I
>> stumbled upon the MP issue.  I plugged one of my paths into a second
>> server to attempt using Linux to flash the firmware.   When the flash
>> started working from the primary server, I never loaded Linux in the
>> second server.
>> 
>> I think the problem is actually in the disk accepting firmware via
>> multipath not so much the OS.  The OS throws the error when a message
>> down a second path gets rejected by the drive. 

This is plausible. The default multipath policy of round-robin means that it will
chop up such big transfers across both ports. One would think that the drives
would treat this as one server, multiple queues, but my recent experience with
drive firmware bugs reaffirms the old adage: never assume anything.

> 
> Still no luck, though it's possible I'm doing it wrong:
> 
> # mpathadm disable path -l /dev/rdsk/c9t5000C500578F774Bd0s2 \
>  -i w5b8ca3a0e5029c00 -t w5000c500578f774a
> 
> # mpathadm show lu /dev/rdsk/c9t5000C500578F774Bd0s2
> Logical Unit:  /dev/rdsk/c9t5000C500578F774Bd0s2
>        mpath-support:  libmpscsi_vhci.so
>        Vendor:  SEAGATE
>        Product:  ST2000NM0023
>        Revision:  0003
>        Name Type:  unknown type
>        Name:  5000c500578f774b
>        Asymmetric:  no
>        Current Load Balance:  round-robin
>        Logical Unit Group ID:  NA
>        Auto Failback:  on
>        Auto Probing:  NA
> 
>        Paths:
>                Initiator Port Name:  w5b8ca3a0e5029c00
>                Target Port Name:  w5000c500578f774a
>                Override Path:  NA
>                Path State:  OK
>                Disabled:  yes
> 
>                Initiator Port Name:  w5b8ca3a0e5029c00
>                Target Port Name:  w5000c500578f7749
>                Override Path:  NA
>                Path State:  OK
>                Disabled:  no

The other lesson I've learned recently is that some drive firmware is 
keyed to look at one port over the other for certain operations :-(
While I have no knowledge or suspicion of it in this specific case, you might
try switching ports.

> 
>        Target Ports:
>                Name:  w5000c500578f774a
>                Relative ID:  0
> 
>                Name:  w5000c500578f7749
>                Relative ID:  0
> 
> # sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD \
>  --length=1625600 --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
>    Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
> ioctl(USCSICMD) failed with os_err (errno) = 22
> write buffer: pass through os error: Invalid argument
> Write buffer failed res=-1
> 
> The situation is the same regardless of which path I disable. At the
> point of the sg_write_buffer, I also get a single SCSI error logged by
> "iostat -E", so it's clear there's something wrong going on on the SCSI
> bus. I suspect it might have something to do with what you mentioned,
> but I'm just no SCSI guru to figure this out.

fmdump -eV shows SCSI error reports in detail. 
 -- richard 

--

Richard.Elling at RichardElling.com
+1-760-896-4422



From chip at innovates.com  Wed Apr 23 18:06:54 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Wed, 23 Apr 2014 13:06:54 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <5356D8EF.7020604@gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<53563803.5030905@gmail.com>
	<CALeZrrS1==6tJy+3wXRuWAV-7z=P_9XSGSWpvduSZ5DapRUmGw@mail.gmail.com>
	<5356ADE2.5070605@gmail.com>
	<C18F6433-53E7-4863-B2E0-167D147BF396@RichardElling.com>
	<5356CE4B.50606@gmail.com>
	<CALeZrrS06iSg_0Za-HnjXo9rCQjq8jS8Oh+Y3iChMEcN6w1pHw@mail.gmail.com>
	<5356D8EF.7020604@gmail.com>
Message-ID: <CALeZrrQd035jwy_u-TmcVzcRmP3V6VTkW0fGEGgb=ZcVjYSFpA@mail.gmail.com>

On Tue, Apr 22, 2014 at 4:02 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

>
> # sg_write_buffer -v --in=MegalodonES3-SAS-STD-0004.LOD \
>   --length=1625600 --mode=5 /dev/rdsk/c9t5000C500578F774Bd0
>     Write buffer cmd: 3b 05 00 00 00 00 18 ce 00 00
> ioctl(USCSICMD) failed with os_err (errno) = 22
> write buffer: pass through os error: Invalid argument
> Write buffer failed res=-1
>
> The situation is the same regardless of which path I disable. At the
> point of the sg_write_buffer, I also get a single SCSI error logged by
> "iostat -E", so it's clear there's something wrong going on on the SCSI
> bus. I suspect it might have something to do with what you mentioned,
> but I'm just no SCSI guru to figure this out.
>
> Cheers,
> --
> Saso
>

Like I said, I use Santools.  However, David Lethe, the author of Santools,
who was a great help to me in working through this, informed me that on
Solaris sg_write_buffer should be run with --mode=7 and possibly even with
--length set to 16384.    I have not tested this.
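
Also untested, but a chunked download might look roughly like the sketch
below. It only prints the commands (dry run) so the offsets can be
reviewed first. The file name, device and total size are the ones from
Saso's attempt; the plan_chunks helper is hypothetical, and I am assuming
the installed sg3_utils supports --skip and --offset, so check the man
page before running anything for real.

```shell
# Dry-run sketch: prints sg_write_buffer commands instead of running them.
fw=MegalodonES3-SAS-STD-0004.LOD
dev=/dev/rdsk/c9t5000C500578F774Bd0
total=1625600      # firmware image size in bytes
chunk=16384        # bytes per WRITE BUFFER command

# plan_chunks TOTAL CHUNK: print one "offset length" pair per chunk
plan_chunks() {
    off=0
    while [ "$off" -lt "$1" ]; do
        len=$(( $1 - off ))
        # Cap every chunk except possibly the last at the chunk size
        if [ "$len" -gt "$2" ]; then
            len=$2
        fi
        echo "$off $len"
        off=$(( off + len ))
    done
}

# --skip selects the position in the file, --offset the buffer offset
plan_chunks "$total" "$chunk" | while read -r off len; do
    echo sg_write_buffer --mode=7 --in="$fw" --skip="$off" \
        --offset="$off" --length="$len" "$dev"
done
```

With these numbers that is 100 commands, the last one carrying the
remaining 3584 bytes.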

For me, Santools has been well worth its investment on every ZFS server
I've deployed.

-Chip

From chip at innovates.com  Wed Apr 23 18:25:45 2014
From: chip at innovates.com (Schweiss, Chip)
Date: Wed, 23 Apr 2014 13:25:45 -0500
Subject: [OmniOS-discuss] Overheating faults with ST4000NM0023
In-Reply-To: <CALeZrrRGv-N0RQAepAJE1g28xJNisD=Ni4+dbF3XtNG7CtzjCQ@mail.gmail.com>
References: <6B4199959CFA4E1CB067BBF9270654CB@486dx4>
	<52FC9C12.9090900@smartjog.com> <534DB2FC.9090404@gmail.com>
	<CALeZrrRDJcykO+ez_s8NjzvzHUHvA=KHq1R5dFcN_q40eeN5uA@mail.gmail.com>
	<534FFE1B.2060201@gmail.com>
	<CALeZrrQz+iTdrmbbZaCnrTz0SGTGFuz7KQn0xfyUJGVdAa__jw@mail.gmail.com>
	<53501420.9090307@gmail.com>
	<CALeZrrSyGY=WmcJ61nTP8pvT5heRL8kGxF4Rak39cGo8GHjq=w@mail.gmail.com>
	<53518C7D.7060703@gmail.com>
	<CALeZrrSL3qMwnvg8ZGcq90ZOddqSivrm-z5eO+tEG=HhkoPKNg@mail.gmail.com>
	<535197C4.9010507@gmail.com>
	<CALeZrrQcCTK2JtDxUQYt5pkM=EzRV5TU8BAdz-kStgkrNoWrbQ@mail.gmail.com>
	<CALeZrrQ1PxoiTnGCrq1tCGVK1=Sjd5N8Dx=deqL4w2YYXrYe+A@mail.gmail.com>
	<0DFDE814-0EC4-4ABB-9752-6C5F910F7F8B@RichardElling.com>
	<CALeZrrRGv-N0RQAepAJE1g28xJNisD=Ni4+dbF3XtNG7CtzjCQ@mail.gmail.com>
Message-ID: <CALeZrrSTcu2ymxJhCQN9jeSLooOAhLdvTemLDzgmmr6SAGjt0Q@mail.gmail.com>

I can confirm the disks are fine.  Getting around FMA is darn near
impossible, judging from the information I've collected.

I attached the disks to another server still running OmniOS, but disabled
the FMA service before doing so.   I then flashed the 0004 firmware to these
disks.   Upon reboot, the original server now sees the disks just fine.

There has to be a way to "un-retire" disks so they can be flashed, but I
have not found such a way.

-Chip


On Tue, Apr 22, 2014 at 11:36 AM, Schweiss, Chip <chip at innovates.com> wrote:

>
> On Tue, Apr 22, 2014 at 11:15 AM, Richard Elling <
> richard.elling at richardelling.com> wrote:
>
>>
>> On Apr 21, 2014, at 9:19 AM, Schweiss, Chip <chip at innovates.com> wrote:
>>
>> I suspecting these drives have self-destructed.
>>
>> Can anyone confirm this firmware issue causes the drives to permanently
>> go offline?
>>
>>
>> They are fine. FMA retires them, so you have to coerce the OS to
>> reinstantiate them.
>> In my case, they were in the lab, and we reinstall OSes continuously, so
>> it wasn't a
>> problem for us :-) You might have a look at cfgadm -al, and see if it is
>> in a state that
>> can be coerced... the docs are poor in this area :-( and this is not a
>> frequent operation :-)
>>  -- richard
>>
>
> After running devfsadm -C the device stubs aren't there anymore.   They
> don't show up in 'cfgadm -al'
>
> I can see them from the HBA BIOs.   So I'm still leaning towards the disk
> are okay, but OmniOS refuses to talk to them.
>
> So once 'retired', even marking the device repaired will not allow it to
> be mounted?
>
> -Chip
>

From kai at meder.info  Sun Apr 27 16:24:00 2014
From: kai at meder.info (Kai Meder)
Date: Sun, 27 Apr 2014 18:24:00 +0200
Subject: [OmniOS-discuss] Install on (not from) USB-Stick
Message-ID: <ljjav3$50n$1@ger.gmane.org>

Hello,

Is it possible to install OmniOS on a USB 2 stick for production
home use as a ZFS NAS without thrashing the stick to death?

Any recommendations, advice, proven USB sticks?
Thanks a lot


From lists at marzocchi.net  Sun Apr 27 16:41:42 2014
From: lists at marzocchi.net (Olaf Marzocchi)
Date: Sun, 27 Apr 2014 18:41:42 +0200
Subject: [OmniOS-discuss] Install on (not from) USB-Stick
In-Reply-To: <ljjav3$50n$1@ger.gmane.org>
References: <ljjav3$50n$1@ger.gmane.org>
Message-ID: <4D0B3F97-E47D-41D3-A9CB-384B66A2E054@marzocchi.net>

Copy-on-write reduces strain on the flash cells, and you can also use a branded SD card, which should have wear leveling. If the card is considerably larger than what you need, the wear leveling will be effective.

http://electronics.stackexchange.com/questions/27619/is-it-true-that-a-sd-mmc-card-does-wear-levelling-with-its-own-controller

Olaf


Il giorno 27/apr/2014, alle ore 18:24, Kai Meder <kai at meder.info> ha scritto:

> 
> Hello,
> 
> is it possible to install an OmniOS to an USB2-Stick for production home-use of a ZFS-NAS without trashing the stick to death?
> 
> Any recommendations, advice, proven USB-Sticks?
> Thanks alot
> 
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss


From kai at meder.info  Sun Apr 27 20:57:57 2014
From: kai at meder.info (Kai Meder)
Date: Sun, 27 Apr 2014 22:57:57 +0200
Subject: [OmniOS-discuss] OmniOS and Intel Atom Avoton vs. Rangeley
Message-ID: <ljjr0o$nhe$1@ger.gmane.org>

Hello,

I am about to buy either the new Intel Atom C2750 "Avoton" or C2758 
"Rangeley", the difference being higher Turbo clocks (Avoton) vs. Intel 
QuickAssist support (Rangeley).

Does OmniOS take any advantage of Rangeley's QuickAssist feature set, or is 
any support foreseeable in the future?

Avoton is about 30 EUR more expensive than Rangeley, so if there is even 
the slightest support for Rangeley I would buy it and save some money. 
However, if there is absolutely no point in buying Rangeley's 
QuickAssist-thingy, I will choose Avoton's higher clock speeds and its 
30 EUR premium...

Thanks


From kai at meder.info  Sun Apr 27 21:17:56 2014
From: kai at meder.info (Kai Meder)
Date: Sun, 27 Apr 2014 23:17:56 +0200
Subject: [OmniOS-discuss] Install on (not from) USB-Stick
In-Reply-To: <4D0B3F97-E47D-41D3-A9CB-384B66A2E054@marzocchi.net>
References: <ljjav3$50n$1@ger.gmane.org>
	<4D0B3F97-E47D-41D3-A9CB-384B66A2E054@marzocchi.net>
Message-ID: <535D7404.3080908@meder.info>

Thank you,
do you think it's feasible to install OmniOS on a normal SanDisk 
MicroSDHC "Ultra" 16GB Class 10 card, via a normal MicroSDHC-USB adapter?
My current installation takes only 6GB.

I am currently investigating whether their "Ultra" series also supports 
wear leveling or whether it is a feature only available in their top 
"Extreme" lines.

Is a modern Lexar/SanDisk USB3 stick >16GB OK as well, or is SD to be 
preferred in principle?

Thanks

Olaf Marzocchi schrieb:
> Copy on write reduces strain on the flash units, but also you can take a branded SD card and it should have wear leveling. If the size is quite bigger than what you need, the wear leveling will be effective.
>
> http://electronics.stackexchange.com/questions/27619/is-it-true-that-a-sd-mmc-card-does-wear-levelling-with-its-own-controller
>
> Olaf
>
>
>
> Il giorno 27/apr/2014, alle ore 18:24, Kai Meder<kai at meder.info>  ha scritto:
>
>> Hello,
>>
>> is it possible to install an OmniOS to an USB2-Stick for production home-use of a ZFS-NAS without trashing the stick to death?
>>
>> Any recommendations, advice, proven USB-Sticks?
>> Thanks alot
>>
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>


From mmabis at vmware.com  Mon Apr 28 05:15:52 2014
From: mmabis at vmware.com (Matthew Mabis)
Date: Sun, 27 Apr 2014 22:15:52 -0700 (PDT)
Subject: [OmniOS-discuss] Install on (not from) USB-Stick
In-Reply-To: <4D0B3F97-E47D-41D3-A9CB-384B66A2E054@marzocchi.net>
References: <ljjav3$50n$1@ger.gmane.org>
	<4D0B3F97-E47D-41D3-A9CB-384B66A2E054@marzocchi.net>
Message-ID: <2002310730.192526.1398662152058.JavaMail.root@vmware.com>

Kai,

Be careful who you buy your board from. When I bought the ASRock C2750D4I, Support gave me a lot of crap because Solaris wasn't on their supported matrix.  I had to install other OSes just to prove the issue wasn't Solaris related.

I sold off that board because I was having a lot of issues with the board freezing.  Just some friendly advice!


Matt Mabis 
Sr. Consultant PSO (End User Computing) 
VCA-DCV/WM,VCP-DCV/DT,VCAP-DCA/DCD/DTD 
mmabis at vmware.com 
3401 Hillview Avenue, Palo Alto, CA 94304 
530.481.5405 Mobile 


----- Original Message -----
From: "Olaf Marzocchi" <lists at marzocchi.net>
To: "Kai Meder" <kai at meder.info>
Cc: omnios-discuss at lists.omniti.com
Sent: Sunday, April 27, 2014 4:41:42 PM
Subject: Re: [OmniOS-discuss] Install on (not from) USB-Stick

Copy on write reduces strain on the flash units, but also you can take a branded SD card and it should have wear leveling. If the size is quite bigger than what you need, the wear leveling will be effective.

http://electronics.stackexchange.com/questions/27619/is-it-true-that-a-sd-mmc-card-does-wear-levelling-with-its-own-controller

Olaf


Il giorno 27/apr/2014, alle ore 18:24, Kai Meder <kai at meder.info> ha scritto:

> 
> Hello,
> 
> is it possible to install an OmniOS to an USB2-Stick for production home-use of a ZFS-NAS without trashing the stick to death?
> 
> Any recommendations, advice, proven USB-Sticks?
> Thanks alot
> 
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss


From alex.ranskis at gmail.com  Mon Apr 28 13:17:44 2014
From: alex.ranskis at gmail.com (Alex)
Date: Mon, 28 Apr 2014 15:17:44 +0200
Subject: [OmniOS-discuss] "zpool import" triggers deadlock in somes cases ?
	(metaslab_group_taskqs)
Message-ID: <CA+VdLjBGsF00WA+72M3WE=BFXHaNgttvWB8K1EgGuOf7L2UC0Q@mail.gmail.com>

Hello,

I'm trying to understand this behavior, which I see on servers connected to
an external disk enclosure. (I cannot reproduce it on a simple 1 disk VM)

# kstat -c taskq | grep metaslab_group_tasksq| wc -l
1112

# zpool import >/dev/null

# kstat -c taskq | grep metaslab_group_tasksq| wc -l
1160


We are accumulating 'metaslab_group_tasksq' instances:

module: unix                            instance: 513
name:   metaslab_group_tasksq           class:    taskq
        crtime                          842173.739164514
        executed                        0
        maxtasks                        0
        nactive                         0
        nalloc                          0
        pid                             0
        priority                        60
        snaptime                        842774.709253006
        tasks                           0
        threads                         3
        totaltime                       0


The "zpool import" command itself runs fine. I get the same behavior
whether there are pools to import or not.

However, kernel threads are piling up: for each condition variable (CV) there are 3 waiting threads:
> ffffff05844fe080::wchaninfo -v
ADDR             TYPE NWAITERS   THREAD           PROC
ffffff05844fe080 cond        3:  ffffff0021c58c40 sched
                                 ffffff0021c5ec40 sched
                                 ffffff0021c64c40 sched

and they're all blocking, with a similar stack :
> ffffff0021c58c40::findstack -v
stack pointer for thread ffffff0021c58c40: ffffff0021c58a80
[ ffffff0021c58a80 _resume_from_idle+0xf4() ]
  ffffff0021c58ab0 swtch+0x141()
  ffffff0021c58af0 cv_wait+0x70(ffffff05844fe080, ffffff05844fe070)
  ffffff0021c58b60 taskq_thread_wait+0xbe(ffffff05844fe050, ffffff05844fe070, ffffff05844fe080, ffffff0021c58bc0, ffffffffffffffff)
  ffffff0021c58c20 taskq_thread+0x37c(ffffff05844fe050)
  ffffff0021c58c30 thread_start+8()


the taskq seems to be created by a call to metaslab_group_create(), here :
              zfs`vdev_alloc+0x54a
              zfs`spa_config_parse+0x48
              zfs`spa_config_parse+0xda
              zfs`spa_config_valid+0x78
              zfs`spa_load_impl+0xa81
              zfs`spa_load+0x14e
              zfs`spa_tryimport+0xaa
              zfs`zfs_ioc_pool_tryimport+0x51
              zfs`zfsdev_ioctl+0x4a7
              genunix`cdev_ioctl+0x39
              specfs`spec_ioctl+0x60
              genunix`fop_ioctl+0x55
              genunix`ioctl+0x9b
              unix`sys_syscall32+0xff


I'm out of my depth here; any pointers for investigating further would be
much appreciated!
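
In case someone wants to compare on their own system, the growth can be
put into numbers with a small sketch like the one below. The count_taskqs
helper is just something made up for this mail, and the thread estimate
assumes every leaked taskq has 3 threads, as in the kstat sample above.

```shell
# Count taskq instances in kstat output read from stdin
count_taskqs() {
    grep -c 'metaslab_group_tasksq'
}

# On the affected host the counts would be collected like this:
#   before=$(kstat -c taskq | count_taskqs)
#   zpool import >/dev/null
#   after=$(kstat -c taskq | count_taskqs)
# The values below are the ones from my run above.
before=1112
after=1160
threads_per_taskq=3

leaked=$((after - before))
echo "$leaked new taskqs, $((leaked * threads_per_taskq)) idle threads per import"
```

So each "zpool import" run seems to leave 48 taskqs behind, which would be
144 idle kernel threads if they all look like the sample.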

cheers,
alex

From esproul at omniti.com  Mon Apr 28 14:01:04 2014
From: esproul at omniti.com (Eric Sproul)
Date: Mon, 28 Apr 2014 10:01:04 -0400
Subject: [OmniOS-discuss] OmniOS and Intel Atom Avoton vs. Rangeley
In-Reply-To: <ljjr0o$nhe$1@ger.gmane.org>
References: <ljjr0o$nhe$1@ger.gmane.org>
Message-ID: <CA+QY2RTaovYLxtqsqD85S-LYkvuzfwRtv=wFCmix7-mU8du1+w@mail.gmail.com>

On Sun, Apr 27, 2014 at 4:57 PM, Kai Meder <kai at meder.info> wrote:
> Hello,
>
> I am about to buy either the new Intel Atom C2750 "Avoton" or C2758
> "Rangeley", difference being higher Turbo Clocks (Avoton) vs. Intel
> QuickAssist support (Rangeley).
>
> Does OmniOS take any advantage of Rangeleys QuickAssist featureset or is any
> support forseeable in the future?

My guess is no.  I can't make sense of the marketing buzzword-laden
press releases that I see.  It seems to be aimed more at
single-purpose embedded use cases, and not likely to be found on
general-purpose deployments.

>
> Avoton is about 30 EUR more expensive than Rangeley, so if there is only the
> slightest support for Rangeley I would buy it and save some money. However,
> if there is absolutely no point in buying Rangeleys QuickAssist-thingy, I
> will choose Avoton's higher clock speeds and its 30 eur premium...

TurboBoost at least stands a chance of helping a typical OS workload.  :)

Eric

From youzhong at gmail.com  Mon Apr 28 14:22:18 2014
From: youzhong at gmail.com (Youzhong Yang)
Date: Mon, 28 Apr 2014 10:22:18 -0400
Subject: [OmniOS-discuss] "zpool import" triggers deadlock in somes
 cases ? (metaslab_group_taskqs)
In-Reply-To: <CA+VdLjBGsF00WA+72M3WE=BFXHaNgttvWB8K1EgGuOf7L2UC0Q@mail.gmail.com>
References: <CA+VdLjBGsF00WA+72M3WE=BFXHaNgttvWB8K1EgGuOf7L2UC0Q@mail.gmail.com>
Message-ID: <CADpNCvbdM4s3u862+i6YrHWW8SU0t8wBz8cEzeYoDBhj=AeVDg@mail.gmail.com>

This could be the following issue:

https://www.illumos.org/issues/4730


On Mon, Apr 28, 2014 at 9:17 AM, Alex <alex.ranskis at gmail.com> wrote:

> Hello,
>
> I'm trying to understand this behavior, which I see on servers connected
> to an external disk enclosure. (I cannot reproduce it on a simple 1 disk VM)
>
> # kstat -c taskq | grep metaslab_group_tasksq| wc -l
> 1112
>
> # zpool import >/dev/null
>
> # kstat -c taskq | grep metaslab_group_tasksq| wc -l
> 1160
>
>
> we are accumulating 'metaslab_group_taskqs'
>
> module: unix                            instance: 513
> name:   metaslab_group_tasksq           class:    taskq
>         crtime                          842173.739164514
>         executed                        0
>         maxtasks                        0
>         nactive                         0
>         nalloc                          0
>         pid                             0
>         priority                        60
>         snaptime                        842774.709253006
>         tasks                           0
>         threads                         3
>         totaltime                       0
>
>
> The "zpool import" command itself runs fine. I get the same behavior
> whether there are pools to import or not.
>
> but kernel threads are piling up, for each CV there are 3 threads :
> > ffffff05844fe080::wchaninfo -v
> ADDR             TYPE NWAITERS   THREAD           PROC
> ffffff05844fe080 cond        3:  ffffff0021c58c40 sched
>                                  ffffff0021c5ec40 sched
>                                  ffffff0021c64c40 sched
>
> and they're all blocking, with a similar stack :
> > ffffff0021c58c40::findstack -v
> stack pointer for thread ffffff0021c58c40: ffffff0021c58a80
> [ ffffff0021c58a80 _resume_from_idle+0xf4() ]
>   ffffff0021c58ab0 swtch+0x141()
>   ffffff0021c58af0 cv_wait+0x70(ffffff05844fe080, ffffff05844fe070)
>   ffffff0021c58b60 taskq_thread_wait+0xbe(ffffff05844fe050,
> ffffff05844fe070, ffffff05844fe080, ffffff0021c58bc0, ffffffffffffffff)
>   ffffff0021c58c20 taskq_thread+0x37c(ffffff05844fe050)
>   ffffff0021c58c30 thread_start+8()
>
>
> the taskq seems to be created by a call to metaslab_group_create(), here :
>               zfs`vdev_alloc+0x54a
>               zfs`spa_config_parse+0x48
>               zfs`spa_config_parse+0xda
>               zfs`spa_config_valid+0x78
>               zfs`spa_load_impl+0xa81
>               zfs`spa_load+0x14e
>               zfs`spa_tryimport+0xaa
>               zfs`zfs_ioc_pool_tryimport+0x51
>               zfs`zfsdev_ioctl+0x4a7
>               genunix`cdev_ioctl+0x39
>               specfs`spec_ioctl+0x60
>               genunix`fop_ioctl+0x55
>               genunix`ioctl+0x9b
>               unix`sys_syscall32+0xff
>
>
> I'm out of my depth here, any pointer to investigate further would be much
> appreciated !
>
> cheers,
> alex
>
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>
>

From alex.ranskis at gmail.com  Mon Apr 28 15:05:36 2014
From: alex.ranskis at gmail.com (Alex)
Date: Mon, 28 Apr 2014 17:05:36 +0200
Subject: [OmniOS-discuss] "zpool import" triggers deadlock in somes
 cases ? (metaslab_group_taskqs)
In-Reply-To: <CADpNCvbdM4s3u862+i6YrHWW8SU0t8wBz8cEzeYoDBhj=AeVDg@mail.gmail.com>
References: <CA+VdLjBGsF00WA+72M3WE=BFXHaNgttvWB8K1EgGuOf7L2UC0Q@mail.gmail.com>
	<CADpNCvbdM4s3u862+i6YrHWW8SU0t8wBz8cEzeYoDBhj=AeVDg@mail.gmail.com>
Message-ID: <CA+VdLjBwyhh8mkYcG5ex1v1j1-LVMMHv21d_QPYJbhxySiTpqQ@mail.gmail.com>

Hi,
Thanks for your feedback! It does not hang in my case, but it may be
related anyway.
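
A rough sketch of how the before/after count from the first mail could be
scripted, for anyone who wants to compare on their own systems. The function
names are made up for illustration, and it assumes kstat and zpool are in the
PATH and that it is run as root:

```bash
#!/bin/bash
# Hypothetical helpers, not part of any illumos tool.

# Count the kstat records on stdin whose "name:" line matches the taskq name.
count_taskqs() {
    grep -c "^name:[[:space:]]*$1[[:space:]]"
}

# Print how many taskqs of that name one "zpool import" scan leaves behind.
taskq_growth() {
    local name=${1:-metaslab_group_tasksq} before after
    before=$(kstat -c taskq | count_taskqs "$name")
    zpool import >/dev/null 2>&1
    after=$(kstat -c taskq | count_taskqs "$name")
    echo $(( after - before ))
}
```

Calling taskq_growth with no arguments runs one scan and prints the difference.
With the numbers quoted below (1112 before, 1160 after) that difference is 48.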


On 28 April 2014 16:22, Youzhong Yang <youzhong at gmail.com> wrote:

> This could be the following issue:
>
> https://www.illumos.org/issues/4730
>
>
>
> On Mon, Apr 28, 2014 at 9:17 AM, Alex <alex.ranskis at gmail.com> wrote:
>
>> Hello,
>>
>> I'm trying to understand this behavior, which I see on servers connected
>> to an external disk enclosure. (I cannot reproduce it on a simple 1 disk VM)
>>
>> # kstat -c taskq | grep metaslab_group_tasksq| wc -l
>> 1112
>>
>> # zpool import >/dev/null
>>
>> # kstat -c taskq | grep metaslab_group_tasksq| wc -l
>> 1160
>>
>>
>> we are accumulating 'metaslab_group_taskqs'
>>
>> module: unix                            instance: 513
>> name:   metaslab_group_tasksq           class:    taskq
>>         crtime                          842173.739164514
>>         executed                        0
>>         maxtasks                        0
>>         nactive                         0
>>         nalloc                          0
>>         pid                             0
>>         priority                        60
>>         snaptime                        842774.709253006
>>         tasks                           0
>>         threads                         3
>>         totaltime                       0
>>
>>
>> The "zpool import" command itself runs fine. I get the same behavior
>> whether there are pools to import or not.
>>
>> but kernel threads are piling up, for each CV there are 3 threads :
>> > ffffff05844fe080::wchaninfo -v
>> ADDR             TYPE NWAITERS   THREAD           PROC
>> ffffff05844fe080 cond        3:  ffffff0021c58c40 sched
>>                                  ffffff0021c5ec40 sched
>>                                  ffffff0021c64c40 sched
>>
>> and they're all blocking, with a similar stack :
>> > ffffff0021c58c40::findstack -v
>> stack pointer for thread ffffff0021c58c40: ffffff0021c58a80
>> [ ffffff0021c58a80 _resume_from_idle+0xf4() ]
>>   ffffff0021c58ab0 swtch+0x141()
>>   ffffff0021c58af0 cv_wait+0x70(ffffff05844fe080, ffffff05844fe070)
>>   ffffff0021c58b60 taskq_thread_wait+0xbe(ffffff05844fe050,
>> ffffff05844fe070, ffffff05844fe080, ffffff0021c58bc0, ffffffffffffffff)
>>   ffffff0021c58c20 taskq_thread+0x37c(ffffff05844fe050)
>>   ffffff0021c58c30 thread_start+8()
>>
>>
>> the taskq seems to be created by a call to metaslab_group_create(), here :
>>               zfs`vdev_alloc+0x54a
>>               zfs`spa_config_parse+0x48
>>               zfs`spa_config_parse+0xda
>>               zfs`spa_config_valid+0x78
>>               zfs`spa_load_impl+0xa81
>>               zfs`spa_load+0x14e
>>               zfs`spa_tryimport+0xaa
>>               zfs`zfs_ioc_pool_tryimport+0x51
>>               zfs`zfsdev_ioctl+0x4a7
>>               genunix`cdev_ioctl+0x39
>>               specfs`spec_ioctl+0x60
>>               genunix`fop_ioctl+0x55
>>               genunix`ioctl+0x9b
>>               unix`sys_syscall32+0xff
>>
>>
>> I'm out of my depth here, any pointer to investigate further would be
>> much appreciated !
>>
>> cheers,
>> alex
>>
>> _______________________________________________
>> OmniOS-discuss mailing list
>> OmniOS-discuss at lists.omniti.com
>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>>
>>
>

From steve at linuxsuite.org  Mon Apr 28 15:27:47 2014
From: steve at linuxsuite.org (steve at linuxsuite.org)
Date: Mon, 28 Apr 2014 11:27:47 -0400
Subject: [OmniOS-discuss] Hang on Dell R710 with r151004?
Message-ID: <8ed132750a7c6a7b91ba26904d99aea3.squirrel@emailmg.netfirms.com>

  Hi,

           I have 2 Dell R710s used as ZFS storage, both running r151004.
About every 2 or 3 weeks they hang with all services unresponsive and must
be power cycled. I do not suspect hardware, as it happens on both machines
and the hardware worked fine with other OSes.

         Both systems use the mpt_sas driver. I noticed that there have been
many updates to mpt_sas since r151004. Without knowing the specifics of those
driver changes, is it possible that there is a bug that is causing the
system hangs?

        Or could this be some kind of resource starvation?

       I disabled the ata driver, as it was logging some errors around "hang
time" and is not required after install, but now the system hangs
without any logged errors.

        Ideas?

        Will upgrading to r151006 or r151008 fix this?

        thanx - steve


From steve at linuxsuite.org  Mon Apr 28 17:21:31 2014
From: steve at linuxsuite.org (steve at linuxsuite.org)
Date: Mon, 28 Apr 2014 13:21:31 -0400
Subject: [OmniOS-discuss] Hang on Dell R710 with r151004?
In-Reply-To: <8ed132750a7c6a7b91ba26904d99aea3.squirrel@emailmg.netfirms.com>
References: <8ed132750a7c6a7b91ba26904d99aea3.squirrel@emailmg.netfirms.com>
Message-ID: <f7045decc31ac96dfed0754eb5d825c3.squirrel@emailmg.netfirms.com>

>   Hi,
>
>            I have 2 Dell R710's as a ZFS storage that are running r151004.
> About every 2 or 3 weeks
> they will hang with all services unresponsive and must be power cycled. I
> do not

   Hmm... I read something about C-states on Dell R710s....


root at blahblah:~# kstat |grep current_cstate; kstat |grep supported_max_cstates
        current_cstate                  3
        current_cstate                  0
        current_cstate                  3
        current_cstate                  3
        current_cstate                  3
        current_cstate                  3
        current_cstate                  3
        current_cstate                  3
        supported_max_cstates           2
        supported_max_cstates           2
        supported_max_cstates           2
        supported_max_cstates           2
        supported_max_cstates           2
        supported_max_cstates           2
        supported_max_cstates           2
        supported_max_cstates           2

         Is this an issue? Do C-states need to be disabled in the BIOS?
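
         The two greps above lose track of which CPU each value belongs
to. A small sketch that pairs them up per CPU, reading "kstat -p" style
lines (module:instance:name:statistic, then the value). The function name is
made up, and the assumption that these statistics live in the cpu_info
module is mine:

```bash
#!/bin/bash
# Hypothetical helper: pair current_cstate with supported_max_cstates per CPU.
# Expects parseable kstat lines on stdin, e.g. from: kstat -p -m cpu_info
cstate_table() {
    awk '
        { split($1, k, ":") }            # k[2] = instance, k[4] = statistic
        k[4] == "current_cstate"        { cur[k[2]] = $2 }
        k[4] == "supported_max_cstates" { max[k[2]] = $2 }
        END {
            for (i in cur)
                printf "cpu %s: current_cstate=%s supported_max_cstates=%s\n",
                       i, cur[i], max[i]
        }
    ' | sort -k2,2n
}
```

         Usage would be something like "kstat -p -m cpu_info | cstate_table".
It only tabulates the numbers; it does not say whether a given combination
is a problem.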

        thanx - steve

> suspect hardware as it happens on both machines and hardware worked fine
> with other OS's.
>
>          Both systems use the mpt_sas driver. I noticed that there have
> been
> many updates to mpt_sas since r151004. Not knowing the specifics of the
> driver issue, is it possible that there is a bug that is causing system
> hangs?
>
>         Or could this be some kind of resource starvation?
>
>        I disabled ata driver as it was logging some errors around "hang
> time" and is not required after install, but now system hangs
> without any logged errors.
>
>         Ideas?
>
>         Will upgrading to r151006 or r151008 fix this?
>
>         thanx - steve
>


From danmcd at omniti.com  Mon Apr 28 17:40:01 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Mon, 28 Apr 2014 13:40:01 -0400
Subject: [OmniOS-discuss] Hang on Dell R710 with r151004?
In-Reply-To: <f7045decc31ac96dfed0754eb5d825c3.squirrel@emailmg.netfirms.com>
References: <8ed132750a7c6a7b91ba26904d99aea3.squirrel@emailmg.netfirms.com>
	<f7045decc31ac96dfed0754eb5d825c3.squirrel@emailmg.netfirms.com>
Message-ID: <326D1AB0-20FD-496C-B93A-5C70E1BFF5C7@omniti.com>


On Apr 28, 2014, at 1:21 PM, steve at linuxsuite.org wrote:

>>  Hi,
>> 
>>           I have 2 Dell R710's as a ZFS storage that are running r151004.
>> About every 2 or 3 weeks
>> they will hang with all services unresponsive and must be power cycled. I
>> do not
> 
>   Hmm... read something about cstates on dell r710's....

You should disable C-states.

Also, upgrading to r151006 or r151008 will get you mpt_sas improvements as well.

Dan


From mir at miras.org  Mon Apr 28 17:50:03 2014
From: mir at miras.org (Michael Rasmussen)
Date: Mon, 28 Apr 2014 19:50:03 +0200
Subject: [OmniOS-discuss] Hang on Dell R710 with r151004?
In-Reply-To: <326D1AB0-20FD-496C-B93A-5C70E1BFF5C7@omniti.com>
References: <8ed132750a7c6a7b91ba26904d99aea3.squirrel@emailmg.netfirms.com>
	<f7045decc31ac96dfed0754eb5d825c3.squirrel@emailmg.netfirms.com>
	<326D1AB0-20FD-496C-B93A-5C70E1BFF5C7@omniti.com>
Message-ID: <20140428195003.2ef0c535@sleipner.datanom.net>

On Mon, 28 Apr 2014 13:40:01 -0400
Dan McDonald <danmcd at omniti.com> wrote:

> 
> You should disable C-states.
> 
Are the C-states mentioned the ones added with the Haswell chipset?

-- 
Hilsen/Regards
Michael Rasmussen

Get my public GnuPG keys:
michael <at> rasmussen <dot> cc
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xD3C9A00E
mir <at> datanom <dot> net
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE501F51C
mir <at> miras <dot> org
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xE3E80917
--------------------------------------------------------------
/usr/games/fortune -es says:
Causes moderate eye irritation.

From takashiary at gmail.com  Wed Apr 30 15:01:44 2014
From: takashiary at gmail.com (takashi ary)
Date: Thu, 1 May 2014 00:01:44 +0900
Subject: [OmniOS-discuss] zfs diff UTF-8 probrem
Message-ID: <CANXaD7gTbeVXTJ6R6i67vGNnc=o2qQqUbyssqm3MV1keYZcFuw@mail.gmail.com>

Hello,

When will OmniOS fix illumos bug #4448?
https://www.illumos.org/issues/4448


OmniOS r151008 behavior

root at omnios1:~# uname -v
omnios-6de5e81
root at omnios1:~#
root at omnios1:~# zfs diff -HF tank at test
M       /       /tank/
+       F       /tank/abcd\37777777703\37777777651fg
root at omnios1:~#
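
The long escapes look like a byte that was sign-extended to 32 bits before
being printed in octal. That cause is only my guess, but the arithmetic can be
checked: \37777777703 is 0xFFFFFFC3, and keeping only the low 8 bits gives
octal 303. A small sketch (the function name is made up for illustration):

```bash
#!/bin/bash
# Hypothetical helper: reduce a long octal escape to its low byte.
low_byte_octal() {
    # The leading 0 makes bash arithmetic read the digits as octal.
    printf '%03o\n' "$(( 0$1 & 0xFF ))"
}
```

"low_byte_octal 37777777703" prints 303 and "low_byte_octal 37777777651"
prints 251. The bytes 0303 0251 (0xC3 0xA9) are the UTF-8 encoding of "é".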


I tried the patch from zfsonlinux.
https://github.com/zfsonlinux/zfs/issues/1172

root at omnios1:~# ls -l /root/zfsdiff/lib
total 201
lrwxrwxrwx 1 root root     11 Apr 30 16:17 libzfs.so -> libzfs.so.1
-rwxr-xr-x 1 root bin  324932 Apr 28 20:29 libzfs.so.1
root at omnios1:~#
root at omnios1:~# LD_LIBRARY_PATH=/root/zfsdiff/lib zfs diff -HF tank at test
M       /       /tank/
+       F       /tank/abcd\303\251fg
root at omnios1:~#


I created a wrapper script.

root at omnios1:~# cat /root/zfsdiff/zfsdiff.sh
#!/bin/bash

LIBZFS_DIR=/root/zfsdiff/lib

# Pass the arguments through unchanged, then decode the octal escapes per line.
LD_LIBRARY_PATH=$LIBZFS_DIR zfs diff "$@" | awk '{cmd = "printf \"a" $0 "\""; cmd | getline line; close(cmd); sub(/^a/,"",line); print line}'
root at omnios1:~#
root at omnios1:~# /root/zfsdiff/zfsdiff.sh -HF tank at test
M       /       /tank/
+       F       /tank/abcd?fg
root at omnios1:~#
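
The awk wrapper above starts a shell "printf" for every line, so file names
containing quotes, "$" or "%" would be interpreted by that shell. A possible
alternative is to let bash decode the escapes itself with printf's %b. This is
only a sketch: the function name is made up, it relies on bash accepting \NNN
octal escapes in %b, and it assumes the names contain no backslash sequences
other than those octal escapes:

```bash
#!/bin/bash
# Hypothetical helper: decode \NNN octal escapes on stdin, one line at a time.
decode_octal_escapes() {
    local line
    # "|| [ -n ... ]" keeps a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%b\n' "$line"
    done
}
```

It would replace the awk stage in the wrapper, for example:
LD_LIBRARY_PATH=$LIBZFS_DIR zfs diff "$@" | decode_octal_escapes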


Thanks

From danmcd at omniti.com  Wed Apr 30 15:12:28 2014
From: danmcd at omniti.com (Dan McDonald)
Date: Wed, 30 Apr 2014 11:12:28 -0400
Subject: [OmniOS-discuss] zfs diff UTF-8 probrem
In-Reply-To: <CANXaD7gTbeVXTJ6R6i67vGNnc=o2qQqUbyssqm3MV1keYZcFuw@mail.gmail.com>
References: <CANXaD7gTbeVXTJ6R6i67vGNnc=o2qQqUbyssqm3MV1keYZcFuw@mail.gmail.com>
Message-ID: <2F9CED63-403A-4FF9-A6C5-76884A2BF60B@omniti.com>


On Apr 30, 2014, at 11:01 AM, takashi ary <takashiary at gmail.com> wrote:

> Hello,
> 
> When OmniOS fix illumos Bug #4448 ?
> https://www.illumos.org/issues/4448
> 

Your best bet is to raise this issue on the Illumos ZFS list:  zfs at lists.illumos.org.

Dan