[OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

Richard Elling richard.elling at richardelling.com
Wed Jan 25 19:27:05 UTC 2017


Hi Stephan,

> On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.budach at JVM.DE> wrote:
> 
> Hi guys,
> 
> I have been trying to import a zpool built on a three-way mirror of LUNs provided by three OmniOS boxes via iSCSI. This zpool had been working flawlessly until a random reboot of the S11.1 host. Since then, S11.1 has been unable to import the zpool.
> 
> This zpool consists of three 108TB LUNs, each backed by a raidz2 zvol… yeah, I know we shouldn't have done that in the first place, but performance was not the primary goal, as this is a backup/archive pool.
> 
> When issuing a zpool import, it says this:
> 
> root at solaris11atest2:~# zpool import
>   pool: vsmPool10
>     id: 12653649504720395171
>  state: DEGRADED
> status: The pool was last accessed by another system.
> action: The pool can be imported despite missing or damaged devices.  The
>         fault tolerance of the pool may be compromised if imported.
>    see: http://support.oracle.com/msg/ZFS-8000-EY
> config:
> 
>         vsmPool10                                  DEGRADED
>           mirror-0                                 DEGRADED
>             c0t600144F07A3506580000569398F60001d0  DEGRADED  corrupted data
>             c0t600144F07A35066C00005693A0D90001d0  DEGRADED  corrupted data
>             c0t600144F07A35001A00005693A2810001d0  DEGRADED  corrupted data
> 
> device details:
> 
>         c0t600144F07A3506580000569398F60001d0    DEGRADED         scrub/resilver needed
>         status: ZFS detected errors on this device.
>                 The device is missing some data that is recoverable.
> 
>         c0t600144F07A35066C00005693A0D90001d0    DEGRADED         scrub/resilver needed
>         status: ZFS detected errors on this device.
>                 The device is missing some data that is recoverable.
> 
>         c0t600144F07A35001A00005693A2810001d0    DEGRADED         scrub/resilver needed
>         status: ZFS detected errors on this device.
>                 The device is missing some data that is recoverable.
> 
> However, when actually running zpool import -f vsmPool10, the system starts to perform a lot of writes on the LUNs, and iostat reports an alarming increase in h/w errors:
> 
> root at solaris11atest2:~# iostat -xeM 5
>                          extended device statistics         ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  71   0  71
> sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd4       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd5       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>                          extended device statistics         ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0      14.2  147.3    0.7    0.4  0.2  0.1    2.0   6   9   0   0   0   0
> sd1      14.2    8.4    0.4    0.0  0.0  0.0    0.3   0   0   0   0   0   0
> sd2       0.0    4.2    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3     157.3   46.2    2.1    0.2  0.0  0.7    3.7   0  14   0  30   0  30
> sd4     123.9   29.4    1.6    0.1  0.0  1.7   10.9   0  36   0  40   0  40
> sd5     142.5   43.0    2.0    0.1  0.0  1.9   10.2   0  45   0  88   0  88
>                          extended device statistics         ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0  234.5    0.0    0.6  0.2  0.1    1.4   6  10   0   0   0   0
> sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3       3.6   64.0    0.0    0.5  0.0  4.3   63.2   0  63   0 235   0 235
> sd4       3.0   67.0    0.0    0.6  0.0  4.2   60.5   0  68   0 298   0 298
> sd5       4.2   59.6    0.0    0.4  0.0  5.2   81.0   0  72   0 406   0 406
>                          extended device statistics         ---- errors ---
> device    r/s    w/s   Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
> sd0       0.0  234.8    0.0    0.7  0.4  0.1    2.2  11  10   0   0   0   0
> sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
> sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
> sd3       5.4   54.4    0.0    0.3  0.0  2.9   48.5   0  67   0 384   0 384
> sd4       6.0   53.4    0.0    0.3  0.0  4.6   77.7   0  87   0 519   0 519
> sd5       6.0   60.8    0.0    0.3  0.0  4.8   72.5   0  87   0 727   0 727

h/w errors are a broad classification covering several underlying error types. The full error list is available
from "iostat -E" and will be important for tracking this down.
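For reference, a minimal sketch of pulling the full per-device error detail; the sdN names are taken from the iostat -xeM listing above and may map differently on your host:

```shell
# Dump the cumulative error counters (soft, hard, transport) plus
# device identity (vendor, product, serial, size) since boot for the
# devices that showed h/w errors in the earlier listing.
iostat -E sd2 sd3 sd4 sd5

# Or dump all devices and page through the output.
iostat -E | less
```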

A better, more detailed analysis can be gleaned from the "fmdump -e" ereports that should be
associated with each h/w error. However, there are dozens of possible causes, so there isn't
enough info here yet to fully understand the problem.
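As a sketch, something like the following should pull those ereports; the timestamp below is illustrative only, and should be adjusted to cover the window of the import attempt:

```shell
# List the raw error reports (ereports) logged by FMA; -e selects the
# error log rather than the diagnosed-fault log.
fmdump -e

# Verbose detail (-V) for ereports since a given time (-t), which
# typically includes the SCSI sense data behind each h/w error.
fmdump -eV -t "25Jan17 19:00"
```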
 — richard

> 
> 
> I have tried pulling data from the LUNs using dd to /dev/null and didn't get any h/w errors; they only appeared once I actually tried to import the zpool. As the h/w errors are constantly rising, I am wondering what could cause this and whether anything can be done about it?
> 
> Cheers,
> Stephan
> _______________________________________________
> OmniOS-discuss mailing list
> OmniOS-discuss at lists.omniti.com
> http://lists.omniti.com/mailman/listinfo/omnios-discuss

