<div dir="ltr"><div><div><div>I had the exact same failure mode last week. With over 1000 spindles I see this about once a month.<br><br>I can publish my dump as well if anyone actually wants to try to fix this problem, but I think several dumps of the same thing are already linked to tickets in Illumos-gate.<br><br></div>Pools for the most part should be set to failmode=panic or wait, but a failed disk should not cause a panic. On the system where this happened to me, failmode was set to wait. It is also on r151012, waiting on a window to upgrade to r151014. My pool is raidz3, so there is no reason not to just kick out a bad disk.<br><br></div>All my disks are SAS in DataON JBODs, dual connected across two LSI HBAs. BTW, pull a SAS cable and you get a panic too, not degraded multipath. Illumos seems to panic on just about any SAS event these days, regardless of redundancy.<br><br></div>-Chip<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 18, 2015 at 3:08 PM, Paul B. Henson <span dir="ltr">&lt;<a href="mailto:henson@acm.org" target="_blank">henson@acm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Mon, May 18, 2015 at 06:25:34PM +0000, Jeff Stockett wrote:<br> > A drive failed in one of our supermicro 5048R-E1CR36L servers running<br> > omnios r151012 last night, and somewhat unexpectedly, the whole system<br> > seems to have panicked.<br> <br> </span>You don't happen to have failmode set to panic on the pool?<br> <br> From the zpool manpage:<br> <br> failmode=wait | continue | panic<br> Controls the system behavior in the event of catastrophic pool<br> failure. This condition is typically a result of a loss of<br> connectivity to the underlying storage device(s) or a failure of<br> all devices within the pool. 
The behavior of such an event is<br> determined as follows:<br> <br> wait<br> Blocks all I/O access until the device connectivity is<br> recovered and the errors are cleared. This is the<br> default behavior.<br> <br> continue<br> Returns EIO to any new write I/O requests but allows<br> reads to any of the remaining healthy devices. Any<br> write requests that have yet to be committed to disk<br> would be blocked.<br> <br> panic<br> Prints out a message to the console and generates a<br> system crash dump.<br> </blockquote></div><br></div>
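<div>For anyone who wants to check which pools have failmode set to panic, here is a rough sketch. It assumes you have captured the default tabular output of <code>zpool get failmode</code> (columns NAME, PROPERTY, VALUE, SOURCE) into a string; the pool names and sample output below are made up for illustration, and the helper names are hypothetical.</div>

```python
def parse_failmode(zpool_get_output):
    """Map pool name -> failmode value from `zpool get failmode` output.

    Assumes the default column layout: NAME PROPERTY VALUE SOURCE.
    """
    modes = {}
    for line in zpool_get_output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything that is not failmode.
        if len(fields) < 3 or fields[1] != "failmode":
            continue
        modes[fields[0]] = fields[2]
    return modes


def pools_set_to_panic(modes):
    """Return the sorted names of pools whose failmode is 'panic'."""
    return sorted(name for name, mode in modes.items() if mode == "panic")


# Hypothetical sample output; "tank" and "data" are invented pool names.
SAMPLE = """\
NAME   PROPERTY  VALUE     SOURCE
tank   failmode  wait      default
data   failmode  panic     local
"""

if __name__ == "__main__":
    print(pools_set_to_panic(parse_failmode(SAMPLE)))
```

<div>A pool flagged this way can be switched with <code>zpool set failmode=wait</code> followed by the pool name. Note that this only covers the case asked about in the quoted message; as described above, the panic on a single failed disk also occurred with failmode set to wait.</div>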