[OmniOS-discuss] issue importing zpool on S11.1 from omniOS LUNs

Stephan Budach stephan.budach at jvm.de
Thu Jan 26 08:20:20 UTC 2017


Hi Richard,

gotcha… read on, below…

On 26.01.17 at 00:43, Richard Elling wrote:
> more below…
>
>> On Jan 25, 2017, at 3:01 PM, Stephan Budach <stephan.budach at JVM.DE> wrote:
>>
>> Ooops… I should have waited to send that message until after I rebooted
>> the S11.1 host…
>>
>>
>>> On 25.01.17 at 23:41, Stephan Budach wrote:
>>> Hi Richard,
>>>
>>>> On 25.01.17 at 20:27, Richard Elling wrote:
>>>> Hi Stephan,
>>>>
>>>>> On Jan 25, 2017, at 5:54 AM, Stephan Budach <stephan.budach at JVM.DE> wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> I have been trying to import a zpool based on a 3-way mirror
>>>>> provided by three omniOS boxes via iSCSI. This zpool had been
>>>>> working flawlessly until some random reboot of the S11.1 host.
>>>>> Since then, S11.1 has been unable to import this zpool.
>>>>>
>>>>> This zpool consists of three 108TB LUNs, each backed by a raidz2-based zvol…
>>>>> yeah, I know, we shouldn't have done that in the first place, but
>>>>> performance was not the primary goal here, as this one is a
>>>>> backup/archive pool.
>>>>>
>>>>> When issuing a zpool import, it says this:
>>>>>
>>>>> root@solaris11atest2:~# zpool import
>>>>>   pool: vsmPool10
>>>>>     id: 12653649504720395171
>>>>>  state: DEGRADED
>>>>> status: The pool was last accessed by another system.
>>>>> action: The pool can be imported despite missing or damaged devices.  The
>>>>>         fault tolerance of the pool may be compromised if imported.
>>>>>    see: http://support.oracle.com/msg/ZFS-8000-EY
>>>>> config:
>>>>>
>>>>> vsmPool10                                    DEGRADED
>>>>>   mirror-0                                   DEGRADED
>>>>>     c0t600144F07A3506580000569398F60001d0    DEGRADED  corrupted data
>>>>>     c0t600144F07A35066C00005693A0D90001d0    DEGRADED  corrupted data
>>>>>     c0t600144F07A35001A00005693A2810001d0    DEGRADED  corrupted data
>>>>>
>>>>> device details:
>>>>>
>>>>> c0t600144F07A3506580000569398F60001d0  DEGRADED  scrub/resilver needed
>>>>>         status: ZFS detected errors on this device.
>>>>>                 The device is missing some data that is recoverable.
>>>>>
>>>>> c0t600144F07A35066C00005693A0D90001d0  DEGRADED  scrub/resilver needed
>>>>>         status: ZFS detected errors on this device.
>>>>>                 The device is missing some data that is recoverable.
>>>>>
>>>>> c0t600144F07A35001A00005693A2810001d0  DEGRADED  scrub/resilver needed
>>>>>         status: ZFS detected errors on this device.
>>>>>                 The device is missing some data that is recoverable.
>>>>>
>>>>> However, when actually running zpool import -f vsmPool10, the system
>>>>> starts to perform a lot of writes on the LUNs and iostat reports an
>>>>> alarming increase in h/w errors:
>>>>>
>>>>> root@solaris11atest2:~# iostat -xeM 5
>>>>> extended device statistics         ---- errors ---
>>>>> device    r/s    w/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
>>>>> sd0       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>>>>> sd1       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>>>>> sd2       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0  71   0  71
>>>>> sd3       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>>>>> sd4       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>>>>> sd5       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>>>>> extended device statistics         ---- errors ---
>>>>> device    r/s    w/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
>>>>> sd0      14.2  147.3 0.7    0.4  0.2  0.1    2.0   6   9   0   0   0   0
>>>>> sd1      14.2    8.4 0.4    0.0  0.0  0.0    0.3   0   0   0   0   0   0
>>>>> sd2       0.0    4.2 0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
>>>>> sd3     157.3   46.2 2.1    0.2  0.0  0.7    3.7   0  14   0  30   0  30
>>>>> sd4     123.9   29.4 1.6    0.1  0.0  1.7   10.9   0  36   0  40   0  40
>>>>> sd5     142.5   43.0 2.0    0.1  0.0  1.9   10.2   0  45   0  88   0  88
>>>>> extended device statistics         ---- errors ---
>>>>> device    r/s    w/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
>>>>> sd0       0.0  234.5 0.0    0.6  0.2  0.1    1.4   6  10   0   0   0   0
>>>>> sd1       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>>>>> sd2       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
>>>>> sd3       3.6   64.0 0.0    0.5  0.0  4.3   63.2   0  63   0 235   0 235
>>>>> sd4       3.0   67.0 0.0    0.6  0.0  4.2   60.5   0  68   0 298   0 298
>>>>> sd5       4.2   59.6 0.0    0.4  0.0  5.2   81.0   0  72   0 406   0 406
>>>>> extended device statistics         ---- errors ---
>>>>> device    r/s    w/s Mr/s   Mw/s wait actv  svc_t  %w  %b s/w h/w trn tot
>>>>> sd0       0.0  234.8 0.0    0.7  0.4  0.1    2.2  11  10   0   0   0   0
>>>>> sd1       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0
>>>>> sd2       0.0    0.0 0.0    0.0  0.0  0.0    0.0   0   0   0  92   0  92
>>>>> sd3       5.4   54.4 0.0    0.3  0.0  2.9   48.5   0  67   0 384   0 384
>>>>> sd4       6.0   53.4 0.0    0.3  0.0  4.6   77.7   0  87   0 519   0 519
>>>>> sd5       6.0   60.8 0.0    0.3  0.0  4.8   72.5   0  87   0 727   0 727
>>>>
>>>> h/w errors are a classification of other errors. The full error
>>>> list is available from "iostat -E" and will
>>>> be important for tracking this down.
>>>>
>>>> A better, more detailed analysis can be gleaned from the "fmdump 
>>>> -e" ereports that should be
>>>> associated with each h/w error. However, there are dozens of causes 
>>>> of these so we don’t have
>>>> enough info here to fully understand.
>>>>  — richard
>>>>
>>> Well… I can't provide you with the output of fmdump -e right now: I am
>>> currently unable to get the '-' typed into the console, due to some
>>> fancy keyboard layout issue, and I am not able to log in via ssh either
>>> (I can authenticate, but I don't get to a shell, which may be due to
>>> the running zpool import). I can confirm, however, that fmdump shows
>>> nothing at all. I could just reset the S11.1 host after removing the
>>> zpool.cache file, so that the system will not try to import the zpool
>>> right away upon restart…
>>>
>>> …plus I might get the chance to set the keyboard layout right after
>>> the reboot, but that's another issue…
>>>
>> After resetting the S11.1 host and getting the keyboard layout right,
>> I issued fmdump -e and there they are… lots of:
>>
>> Jan 25 23:25:13.5643 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.8944 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.8945 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.8946 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9274 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9275 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9276 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9277 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9282 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9284 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9285 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9286 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9287 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9288 ereport.fs.zfs.dev.merr.write
>> Jan 25 23:25:13.9290 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9294 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9301 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:25:13.9306 ereport.io.scsi.cmd.disk.dev.rqs.merr.write
>> Jan 25 23:50:44.7195 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:50:44.7306 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:50:44.7434 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:53:31.4386 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:53:31.4579 ereport.io.scsi.cmd.disk.dev.rqs.derr
>> Jan 25 23:53:31.4710 ereport.io.scsi.cmd.disk.dev.rqs.derr
>>
>>
>> These seem to be media errors and disk errors on the zpools/zvols
>> that make up the LUNs for this zpool… I am wondering why this happens.
>
> yes, good question
> That we get media errors "merr" on write is one clue. To find out more
> details, "fmdump -eV"
> will show in gory detail the exact SCSI asc/ascq codes, LBAs, etc.
>
> ZFS is COW, so if the LUs are backed by ZFS and there isn’t enough 
> free space, then this is
> the sort of error we expect. But there could be other reasons.
>  — richard
>
Oh Lord… I really think that this is it… this is what the zpool/zvol
looks like on one of the three targets:

root@tr1206900:/root# zpool list
NAME            SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool          29,8G  21,5G  8,27G         -    45%    72%  1.00x  ONLINE  -
tr1206900data   109T   106T  3,41T         -    51%    96%  1.00x  ONLINE  -

root@tr1206900:/root# zfs list -r tr1206900data
NAME                      USED  AVAIL  REFER  MOUNTPOINT
tr1206900data            86,6T      0   236K /tr1206900data
tr1206900data/vsmPool10  86,6T      0  86,6T  -

root@tr1206900:/root# zfs get all tr1206900data/vsmPool10
NAME                     PROPERTY              VALUE                  SOURCE
tr1206900data/vsmPool10  type                  volume                 -
tr1206900data/vsmPool10  creation              Mo. Jan 11 12:57 2016  -
tr1206900data/vsmPool10  used                  86,6T                  -
tr1206900data/vsmPool10  available             0                      -
tr1206900data/vsmPool10  referenced            86,6T                  -
tr1206900data/vsmPool10  compressratio         1.00x                  -
tr1206900data/vsmPool10  reservation           none                   default
tr1206900data/vsmPool10  volsize               109T                   local
tr1206900data/vsmPool10  volblocksize          128K                   -
tr1206900data/vsmPool10  checksum              on                     default
tr1206900data/vsmPool10  compression           off                    default
tr1206900data/vsmPool10  readonly              off                    default
tr1206900data/vsmPool10  copies                1                      default
tr1206900data/vsmPool10  refreservation        none                   default
tr1206900data/vsmPool10  primarycache          all                    local
tr1206900data/vsmPool10  secondarycache        all                    default
tr1206900data/vsmPool10  usedbysnapshots       0                      -
tr1206900data/vsmPool10  usedbydataset         86,6T                  -
tr1206900data/vsmPool10  usedbychildren        0                      -
tr1206900data/vsmPool10  usedbyrefreservation  0                      -
tr1206900data/vsmPool10  logbias               latency                default
tr1206900data/vsmPool10  dedup                 off                    default
tr1206900data/vsmPool10  mlslabel              none                   default
tr1206900data/vsmPool10  sync                  standard               default
tr1206900data/vsmPool10  refcompressratio      1.00x                  -
tr1206900data/vsmPool10  written               86,6T                  -
tr1206900data/vsmPool10  logicalused           86,5T                  -
tr1206900data/vsmPool10  logicalreferenced     86,5T                  -
tr1206900data/vsmPool10  snapshot_limit        none                   default
tr1206900data/vsmPool10  snapshot_count        none                   default
tr1206900data/vsmPool10  redundant_metadata    all                    default

This must be the dumbest failure one can possibly have when setting up
a zvol iSCSI target. Someone - no, it wasn't me, actually, though that
doesn't do me any good either - created a zvol equal in size to the
entire zpool, and now it is just as you suspected: the backing zvol has
run out of space.
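
For reference, a quick way to spot this kind of over-provisioning on the
target side (just a sketch, using the dataset names from above) would be
to compare the zvol's volsize against the backing pool:

root@tr1206900:/root# zfs get volsize,used,available,refreservation tr1206900data/vsmPool10
root@tr1206900:/root# zpool list tr1206900data

With refreservation at none the zvol is effectively thin-provisioned, so
ZFS happily let it grow until available hit 0, and from then on every new
write to the LU fails, which is presumably what surfaces as the
merr.write ereports on the initiator.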

So, the only option would be to add additional space to these zpools,
so that the zvols can actually occupy the space they claim to have?
Should be manageable… I could present some iSCSI LUNs to the targets
themselves and add another vdev, roughly along the lines of the sketch
below. There will be some serious cleanup needed afterwards…
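
Just a sketch of what I mean (c0tXXXXXXXXd0 is a placeholder for whatever
device name the new LUN gets on the target; zpool add will likely want -f
because a single-disk vdev doesn't match the pool's existing raidz2
redundancy):

root@tr1206900:/root# zpool add -f tr1206900data c0tXXXXXXXXd0

As far as I know such a top-level vdev cannot be removed again afterwards,
and a single non-redundant iSCSI LUN would become a point of failure for
the whole pool, so this could only ever be a stop-gap until the data is
migrated off.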

What about the "free" 3,41T in the zpool itself? Could that somehow be
utilised?


Thanks,
Stephan