<div dir="ltr"><div><div><div>I had the exact same failure mode last week. With over 1000 spindles, I see this about once a month.<br><br>I can also publish my dump if anyone actually wants to try to fix this problem, but I think several dumps of this same failure are already linked to tickets in Illumos-gate.<br><br></div>Pools should, for the most part, be set to failmode=panic or wait, but a single failed disk in a redundant pool should not cause a panic. On the system where this happened to me, failmode was set to wait. It is also on r151012, waiting on a maintenance window to upgrade to r151014. My pool is raidz3, so there is no reason not to kick out a bad disk.<br><br></div>All my disks are SAS in DataON JBODs, dual-connected across two LSI HBAs. BTW, pull a SAS cable and you get a panic too, not degraded multipath. Illumos seems to panic on just about any SAS event these days, regardless of redundancy.<br><br></div>-Chip<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 18, 2015 at 3:08 PM, Paul B. Henson <span dir="ltr">&lt;<a href="mailto:henson@acm.org" target="_blank">henson@acm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Mon, May 18, 2015 at 06:25:34PM +0000, Jeff Stockett wrote:<br>
> A drive failed in one of our Supermicro 5048R-E1CR36L servers running<br>
> OmniOS r151012 last night, and somewhat unexpectedly, the whole system<br>
> seems to have panicked.<br>
<br>
</span>You don't happen to have failmode set to panic on the pool?<br>
<br>
From the zpool manpage:<br>
<br>
failmode=wait | continue | panic<br>
Controls the system behavior in the event of catastrophic pool<br>
failure. This condition is typically a result of a loss of<br>
connectivity to the underlying storage device(s) or a failure of<br>
all devices within the pool. The behavior of such an event is<br>
determined as follows:<br>
<br>
wait<br>
Blocks all I/O access until the device connectivity is<br>
recovered and the errors are cleared. This is the<br>
default behavior.<br>
<br>
continue<br>
Returns EIO to any new write I/O requests but allows<br>
reads to any of the remaining healthy devices. Any<br>
write requests that have yet to be committed to disk<br>
would be blocked.<br>
<br>
panic<br>
Prints out a message to the console and generates a<br>
system crash dump.<br>
<div class="HOEnZb"><div class="h5"><br>
_______________________________________________<br>
OmniOS-discuss mailing list<br>
<a href="mailto:OmniOS-discuss@lists.omniti.com">OmniOS-discuss@lists.omniti.com</a><br>
<a href="http://lists.omniti.com/mailman/listinfo/omnios-discuss" target="_blank">http://lists.omniti.com/mailman/listinfo/omnios-discuss</a><br>
</div></div></blockquote></div><br></div>