<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">Ooops… should have waited with sending
      that message after I rebootet the S11.1 host…<br>
      <br>
      <br>
      Am 25.01.17 um 23:41 schrieb Stephan Budach:<br>
    </div>
    <blockquote cite="mid:d0e0e202-0233-8c58-d7a5-2ba107c6a133@jvm.de"
      type="cite">
      <div class="moz-cite-prefix">Hi Richard,<br>
        <br>
        Am 25.01.17 um 20:27 schrieb Richard Elling:<br>
      </div>
      <blockquote
        cite="mid:36C501AD-8AF8-49F1-960C-E7A2C0F0B85F@richardelling.com"
        type="cite">
        Hi Stephan,
        <div class=""><br class="">
          <div>
            <blockquote type="cite" class="">
              <div class="">On Jan 25, 2017, at 5:54 AM, Stephan Budach
                <<a moz-do-not-send="true"
                  href="mailto:stephan.budach@jvm.de" class="">stephan.budach@JVM.DE</a>>
                wrote:</div>
              <br class="Apple-interchange-newline">
              <div class="">
                <div bgcolor="#FFFFFF" text="#000000" class=""> Hi guys,<br
                    class="">
                  <br class="">
                  I have been trying to import a zpool built on a
                  3-way mirror of LUNs provided by three omniOS boxes
                  via iSCSI. This zpool had been working flawlessly
                  until a random reboot of the S11.1 host; since then,
                  S11.1 has been unable to import it.<br class="">
                  <br class="">
                  The zpool consists of three 108 TB LUNs, each backed
                  by a raidz2 zvol… yeah, I know we shouldn't have done
                  that in the first place, but performance was not the
                  primary goal here, as this is a backup/archive
                  pool.<br class="">
                  <br class="">
                  When issuing a zpool import, it says this:<br class="">
                  <br class="">
                  <tt class="">root@solaris11atest2:~# zpool import</tt><tt
                    class=""><br class="">
                  </tt><tt class="">  pool: vsmPool10</tt><tt class=""><br
                      class="">
                  </tt><tt class="">    id: 12653649504720395171</tt><tt
                    class=""><br class="">
                  </tt><tt class=""> state: DEGRADED</tt><tt class=""><br
                      class="">
                  </tt><tt class="">status: The pool was last accessed
                    by another system.</tt><tt class=""><br class="">
                  </tt><tt class="">action: The pool can be imported
                    despite missing or damaged devices.  The</tt><tt
                    class=""><br class="">
                  </tt><tt class="">        fault tolerance of the pool
                    may be compromised if imported.</tt><tt class=""><br
                      class="">
                  </tt><tt class="">   see: <a moz-do-not-send="true"
                      class="moz-txt-link-freetext"
                      href="http://support.oracle.com/msg/ZFS-8000-EY">http://support.oracle.com/msg/ZFS-8000-EY</a></tt><tt
                    class=""><br class="">
                  </tt><tt class="">config:</tt><tt class=""><br
                      class="">
                  </tt><tt class=""><br class="">
                  </tt><tt class="">       
                    vsmPool10                                  DEGRADED</tt><tt
                    class=""><br class="">
                  </tt><tt class="">         
                    mirror-0                                 DEGRADED</tt><tt
                    class=""><br class="">
                  </tt><tt class="">           
                    c0t600144F07A3506580000569398F60001d0  DEGRADED 
                    corrupted data</tt><tt class=""><br class="">
                  </tt><tt class="">           
                    c0t600144F07A35066C00005693A0D90001d0  DEGRADED 
                    corrupted data</tt><tt class=""><br class="">
                  </tt><tt class="">           
                    c0t600144F07A35001A00005693A2810001d0  DEGRADED 
                    corrupted data</tt><tt class=""><br class="">
                  </tt><tt class=""><br class="">
                  </tt><tt class="">device details:</tt><tt class=""><br
                      class="">
                  </tt><tt class=""><br class="">
                  </tt><tt class="">       
                    c0t600144F07A3506580000569398F60001d0   
                    DEGRADED         scrub/resilver needed</tt><tt
                    class=""><br class="">
                  </tt><tt class="">        status: ZFS detected errors
                    on this device.</tt><tt class=""><br class="">
                  </tt><tt class="">                The device is
                    missing some data that is recoverable.</tt><tt
                    class=""><br class="">
                  </tt><tt class=""><br class="">
                  </tt><tt class="">       
                    c0t600144F07A35066C00005693A0D90001d0   
                    DEGRADED         scrub/resilver needed</tt><tt
                    class=""><br class="">
                  </tt><tt class="">        status: ZFS detected errors
                    on this device.</tt><tt class=""><br class="">
                  </tt><tt class="">                The device is
                    missing some data that is recoverable.</tt><tt
                    class=""><br class="">
                  </tt><tt class=""><br class="">
                  </tt><tt class="">       
                    c0t600144F07A35001A00005693A2810001d0   
                    DEGRADED         scrub/resilver needed</tt><tt
                    class=""><br class="">
                  </tt><tt class="">        status: ZFS detected errors
                    on this device.</tt><tt class=""><br class="">
                  </tt><tt class="">                The device is
                    missing some data that is recoverable.</tt><tt
                    class=""><br class="">
                  </tt><br class="">
                  However, when actually running zpool import -f
                  vsmPool10, the system starts to perform a lot of
                  writes on the LUNs, and iostat reports an alarming
                  increase in h/w errors:<br class="">
                  <br class="">
                  <tt class="">root@solaris11atest2:~# iostat -xeM 5</tt><tt
                    class=""><br class="">
                  </tt><tt class="">                         extended
                    device statistics         ---- errors ---</tt><tt
                    class=""><br class="">
                  </tt><tt class="">device    r/s    w/s   Mr/s   Mw/s
                    wait actv  svc_t  %w  %b s/w h/w trn tot</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd0       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd1       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd2       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0  71   0  71</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd3       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd4       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd5       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">                         extended
                    device statistics         ---- errors ---</tt><tt
                    class=""><br class="">
                  </tt><tt class="">device    r/s    w/s   Mr/s   Mw/s
                    wait actv  svc_t  %w  %b s/w h/w trn tot</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd0      14.2  147.3    0.7    0.4 
                    0.2  0.1    2.0   6   9   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd1      14.2    8.4    0.4    0.0 
                    0.0  0.0    0.3   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd2       0.0    4.2    0.0    0.0 
                    0.0  0.0    0.0   0   0   0  92   0  92</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd3     157.3   46.2    2.1    0.2 
                    0.0  0.7    3.7   0  14   0  30   0  30</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd4     123.9   29.4    1.6    0.1 
                    0.0  1.7   10.9   0  36   0  40   0  40</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd5     142.5   43.0    2.0    0.1 
                    0.0  1.9   10.2   0  45   0  88   0  88</tt><tt
                    class=""><br class="">
                  </tt><tt class="">                         extended
                    device statistics         ---- errors ---</tt><tt
                    class=""><br class="">
                  </tt><tt class="">device    r/s    w/s   Mr/s   Mw/s
                    wait actv  svc_t  %w  %b s/w h/w trn tot</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd0       0.0  234.5    0.0    0.6 
                    0.2  0.1    1.4   6  10   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd1       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd2       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0  92   0  92</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd3       3.6   64.0    0.0    0.5 
                    0.0  4.3   63.2   0  63   0 235   0 235</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd4       3.0   67.0    0.0    0.6 
                    0.0  4.2   60.5   0  68   0 298   0 298</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd5       4.2   59.6    0.0    0.4 
                    0.0  5.2   81.0   0  72   0 406   0 406</tt><tt
                    class=""><br class="">
                  </tt><tt class="">                         extended
                    device statistics         ---- errors ---</tt><tt
                    class=""><br class="">
                  </tt><tt class="">device    r/s    w/s   Mr/s   Mw/s
                    wait actv  svc_t  %w  %b s/w h/w trn tot</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd0       0.0  234.8    0.0    0.7 
                    0.4  0.1    2.2  11  10   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd1       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0   0   0   0</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd2       0.0    0.0    0.0    0.0 
                    0.0  0.0    0.0   0   0   0  92   0  92</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd3       5.4   54.4    0.0    0.3 
                    0.0  2.9   48.5   0  67   0 384   0 384</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd4       6.0   53.4    0.0    0.3 
                    0.0  4.6   77.7   0  87   0 519   0 519</tt><tt
                    class=""><br class="">
                  </tt><tt class="">sd5       6.0   60.8    0.0    0.3 
                    0.0  4.8   72.5   0  87   0 727   0 727</tt><tt
                    class=""><br class="">
                  </tt></div>
              </div>
            </blockquote>
            <div><br class="">
            </div>
            <div>h/w errors are a classification of other errors. The
              full error list is available from "iostat -E" and will
              be important in tracking this down.</div>
            <div><br class="">
            </div>
            <div>A better, more detailed analysis can be gleaned from
              the "fmdump -e" ereports that should be associated with
              each h/w error. However, there are dozens of possible
              causes, so we don't have enough info here to fully
              understand it.</div>
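            <div><br class="">
            </div>
            <div>For example (an illustrative invocation, not output
              from this system):</div>
            <pre>
# per-device error detail: s/w, h/w and transport error counts plus
# vendor/product/serial and the media/device error breakdown
iostat -E
</pre>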
            <div> — richard</div>
            <br>
          </div>
        </div>
      </blockquote>
      Well… I can't provide you with the output of fmdump -e: I am
      currently unable to type the '-' at the console due to some fancy
      keyboard layout issues, and I can't log in via ssh either (I can
      authenticate, but I never get a shell, which may be due to the
      running zpool import). I can confirm, though, that plain fmdump
      shows nothing at all. I could just reset the S11.1 host after
      removing the zpool.cache file, so that the system will not try to
      import the zpool right away upon restart…<br>
      <br>
      …plus I might then get the chance to set the keyboard layout
      right after the reboot, but that's another issue…<br>
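      <br>
      (A minimal sketch of the cache-file step, assuming the stock
      Solaris location of the file:)<br>
      <pre>
# move the cache file aside so that no pool is auto-imported at boot
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
</pre>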
      <br>
    </blockquote>
    After resetting the S11.1 host and getting the keyboard layout
    right, I issued fmdump -e, and there they are… lots of:<br>
    <br>
    <pre>
Jan 25 23:25:13.5643 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8944 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8945 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.8946 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9274 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9275 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9276 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9277 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9282 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9284 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9285 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9286 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9287 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9288 ereport.fs.zfs.dev.merr.write
Jan 25 23:25:13.9290 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9294 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9301 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:25:13.9306 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
Jan 25 23:50:44.7195 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7306 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:50:44.7434 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4386 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4579 ereport.io.scsi.cmd.disk.dev.rqs.derr
Jan 25 23:53:31.4710 ereport.io.scsi.cmd.disk.dev.rqs.derr
</pre>
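    (To see which LUN and what SCSI sense data each of these events
    carries, the verbose form should help; an illustrative
    invocation:)<br>
    <pre>
# dump the full ereport payloads: device path, sense key, asc/ascq
fmdump -eV | less
</pre>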
    <br>
    These ereports seem to be media errors and device errors on the
    zvols that back the LUNs of this zpool… I am wondering why this
    happens.<br>
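    <br>
    Until that is understood, I might try a read-only import to look at
    the pool without provoking any more writes; assuming S11.1 honors
    the readonly property at import time, that would be:<br>
    <pre>
# import the pool read-only, so ZFS issues no writes to the LUNs
zpool import -o readonly=on vsmPool10
</pre>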
    <br>
    Stephan<br>
  </body>
</html>