[OmniOS-discuss] How bad are these controller / io errors??

Narayan Desai narayan.desai at gmail.com
Fri Aug 16 01:25:08 UTC 2013


We're seeing something similar on the same gear (LSI/Supermicro expanders,
LSI controllers, SATA drives).

We've tried standard hardware debugging (cable reseat/replacement, etc.) and
the problem in our case seems to follow the SAS expander backplane.

We did a disk-by-disk migration onto a different expander and the errors
stopped.

How high are your error counts? (In our case, we were getting about 1500
per day per device.) Is your performance impacted? (It was in our case.)
 -nld
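
To put a number like that on your own box, one option is to tally the
mpt_sas "Log info ... received for target N" records per day and per target.
A rough Python sketch, assuming each syslog record has been unwrapped onto a
single line (the copies in the quoted mail below are wrapped by the mailer).
The names count_log_info and SAMPLE are made up for illustration, and a
per-target count is only a stand-in for a per-device count:

```python
import re
from collections import Counter

# Matches one unwrapped record, e.g.
#   kern.info<6>: Aug 13 10:39:11 dfs1 #011Log info 0x31120303 received for target 13.
LOG_INFO = re.compile(
    r"(?P<day>[A-Z][a-z]{2}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ "
    r".*Log info (?P<code>0x[0-9a-fA-F]+) received for target (?P<target>\d+)"
)


def count_log_info(lines):
    """Return a Counter keyed by (day, target) of 'Log info' records."""
    counts = Counter()
    for line in lines:
        m = LOG_INFO.search(line)
        if m:
            # Normalise "Aug  3" and "Aug 3" to the same key.
            day = " ".join(m.group("day").split())
            counts[(day, int(m.group("target")))] += 1
    return counts


# Two records taken from the quoted syslog excerpt, unwrapped by hand.
SAMPLE = [
    "kern.info<6>: Aug 13 10:39:11 dfs1 #011Log info 0x31120303 received for target 13.",
    "kern.info<6>: Aug 13 10:39:11 dfs1 #011Log info 0x31120303 received for target 13.",
]

for (day, target), n in sorted(count_log_info(SAMPLE).items()):
    print(f"{day}  target {target}: {n}")
```

Feeding it a whole day of the real log instead of SAMPLE gives a figure that
can be compared against the rate above.
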


On Tue, Aug 13, 2013 at 10:20 AM, <steve at linuxsuite.org> wrote:

>
>    Howdy!
>
>    This is a SuperMicro JBOD with SATA disks. I am aware of the issues
> of having SATA on SAS, but was wondering just how serious these kinds
> of errors are. A scrub of the pool completes without noticeable
> problems. I did a lot of stress testing earlier and could not get a
> failure. Disabling NCQ on the controller was necessary. What is the
> practical risk to data?
>
>         See below info for iostat / syslog
>
>  thanx - steve
>
>            syslog info
>
> kern.warning<4>: Aug 13 10:39:10 dfs1 scsi: [ID 243001 kern.warning]
> WARNING: /pci@0,0/pci8086,340d@6/pci1000,3080@0 (mpt_sas0):
> kern.warning<4>: Aug 13 10:39:10 dfs1 #011mptsas_handle_event_sync:
> IOCStatus=0x8000, IOCLogInfo=0x31120303
> kern.warning<4>: Aug 13 10:39:10 dfs1 scsi: [ID 243001 kern.warning]
> WARNING: /pci@0,0/pci8086,340d@6/pci1000,3080@0 (mpt_sas0):
> kern.warning<4>: Aug 13 10:39:10 dfs1 #011mptsas_handle_event_sync:
> IOCStatus=0x8000, IOCLogInfo=0x31120436
> kern.warning<4>: Aug 13 10:39:10 dfs1 scsi: [ID 243001 kern.warning]
> WARNING: /pci@0,0/pci8086,340d@6/pci1000,3080@0 (mpt_sas0):
> kern.warning<4>: Aug 13 10:39:10 dfs1 #011mptsas_handle_event:
> IOCStatus=0x8000, IOCLogInfo=0x31120303
> kern.warning<4>: Aug 13 10:39:10 dfs1 scsi: [ID 243001 kern.warning]
> WARNING: /pci@0,0/pci8086,340d@6/pci1000,3080@0 (mpt_sas0):
>
> Blah Blah...
>
> kern.warning<4>: Aug 13 10:39:10 dfs1 #011mptsas_handle_event:
> IOCStatus=0x8000, IOCLogInfo=0x31120436
> kern.info<6>: Aug 13 10:39:11 dfs1 scsi: [ID 365881 kern.info]
> /pci@0,0/pci8086,340d@6/pci1000,3080@0 (mpt_sas0):
> kern.info<6>: Aug 13 10:39:11 dfs1 #011Log info 0x31120303 received for
> target 13.
> kern.info<6>: Aug 13 10:39:11 dfs1 #011scsi_status=0x0, ioc_status=0x804b,
> scsi_state=0xc
> kern.info<6>: Aug 13 10:39:11 dfs1 scsi: [ID 365881 kern.info]
> /pci@0,0/pci8086,340d@6/pci1000,3080@0 (mpt_sas0):
> kern.info<6>: Aug 13 10:39:11 dfs1 #011Log info 0x31120303 received for
> target 13.
> kern.info<6>: Aug 13 10:39:11 dfs1 #011scsi_status=0x0, ioc_status=0x804b,
> scsi_state=0xc
> kern.info<6>: Aug 13 10:39:11 dfs1 scsi: [ID 365881 kern.info]
> /pci@0,0/pci8086,340d@6/pci1000,3080@0 (mpt_sas0):
>
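
The IOCLogInfo words in the excerpt above can be split into fields before
looking them up. A minimal sketch, assuming the 32-bit layout used by the
loginfo union in the Linux mpt drivers (bus type in the top 4 bits, then a
4-bit originator, an 8-bit code and a 16-bit subcode). That layout is an
assumption carried over from another driver, not something stated in this
thread, and what each value means has to come from LSI's documentation:

```python
def decode_loginfo(value):
    """Split a 32-bit IOCLogInfo word into four bit fields.

    The field boundaries are an assumption borrowed from the loginfo
    union in the Linux mpt drivers.
    """
    return {
        "bus_type": (value >> 28) & 0xF,    # bits 31..28
        "originator": (value >> 24) & 0xF,  # bits 27..24
        "code": (value >> 16) & 0xFF,       # bits 23..16
        "subcode": value & 0xFFFF,          # bits 15..0
    }


# The two IOCLogInfo values that appear in the syslog excerpt.
for raw in (0x31120303, 0x31120436):
    f = decode_loginfo(raw)
    print(
        f"{raw:#010x}: bus_type={f['bus_type']:#x} "
        f"originator={f['originator']:#x} "
        f"code={f['code']:#04x} subcode={f['subcode']:#06x}"
    )
```

Under that assumption both values share the same bus type, originator and
code and differ only in the subcode.
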
>           Output of iostat -En
>
>          Looks like "Hard Errors" and "No Device" correspond. What
> do "Transport Errors" and "Recoverable" mean? I see no evidence
> of data corruption or loss. Does ZFS handle and recover from these
> errors safely?
>
>
> c5t5000C500489947A8d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 11
> Vendor: ATA      Product: ST3000DM001-9YN1 Revision: CC4H Serial No: W1F0AAMA
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
> Illegal Request: 2 Predictive Failure Analysis: 0
>
> c5t5000C500525EB2B9d0 Soft Errors: 0 Hard Errors: 5 Transport Errors: 46
> Vendor: ATA      Product: ST3000DM001-9YN1 Revision: CC4H Serial No: W1F0QM5H
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 5 Recoverable: 0
> Illegal Request: 5 Predictive Failure Analysis: 0
>
> c5t5000C50045561CEAd0 Soft Errors: 0 Hard Errors: 1 Transport Errors: 7
> Vendor: ATA      Product: ST3000DM001-9YN1 Revision: CC4H Serial No: W1F09G4Q
> Size: 3000.59GB <3000592982016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
> Illegal Request: 1 Predictive Failure Analysis: 0
>
>
>
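
For tracking these counters over time, the iostat -En output above can be
turned into per-device numbers. A rough sketch, assuming the raw command
output (no "> " quoting and no mailer line wrapping). The names
parse_iostat_en and SAMPLE are hypothetical, and the counter names are just
the ones that appear in the listing above:

```python
import re

# Counter names exactly as they appear in the iostat -En listing.
COUNTERS = (
    "Soft Errors", "Hard Errors", "Transport Errors",
    "Media Error", "Device Not Ready", "No Device",
    "Recoverable", "Illegal Request", "Predictive Failure Analysis",
)
COUNTER_RE = re.compile(
    "(" + "|".join(re.escape(c) for c in COUNTERS) + r"):\s*(\d+)"
)


def parse_iostat_en(text):
    """Return {device: {counter name: int}} from iostat -En style text."""
    devices = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        # A record starts on the line carrying the device name.
        if "Soft Errors:" in line:
            current = devices.setdefault(line.split()[0], {})
        if current is not None:
            for name, value in COUNTER_RE.findall(line):
                current[name] = int(value)
    return devices


# First device from the listing above, with the serial line rejoined.
SAMPLE = """\
c5t5000C500489947A8d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 11
Vendor: ATA      Product: ST3000DM001-9YN1 Revision: CC4H Serial No: W1F0AAMA
Size: 3000.59GB <3000592982016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
"""

for dev, c in parse_iostat_en(SAMPLE).items():
    print(dev, "hard:", c["Hard Errors"], "transport:", c["Transport Errors"])
```

Running it against snapshots taken a day apart and subtracting gives a
per-day, per-device rate.
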