<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Chip was responding to an issue we have been having (since 9/18) with our online systems becoming unresponsive over NFS and CIFS. During these hangs we also see that a plain “ls” works, but an “ls -l” hangs. NFS cannot be restarted, and we are forced to reboot. I was hopeful that the NFS settings would help, but on Tuesday of this week our offline replication target experienced the same issue. We were about to perform a scrub when we noticed in our Napp-it interface that “ZFS Filesystems” was not displaying. After SSHing to the system, we saw that “ls -l” did not respond. This was a pure replication target with no NFS access, so I don’t see how the NFS lock tuning could be the solution. A “reboot -d” was performed, and we have a 3.9GB dump. If you have a preferred location to receive such dumps, I would be more than happy to share.<o:p></o:p></span></p>
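<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">(A hedged aside on the dump: a plain “ls” only reads directory entries, while “ls -l” also stat()s every entry, so this pattern usually means attribute lookups are blocking in the kernel. If mdb is available, the dump can be opened along these lines – the paths and module names are illustrative, not confirmed:)<o:p></o:p></span></p>

```shell
# Sketch (crash-dump paths assumed): open the dump written by "reboot -d"
cd /var/crash/$(hostname)
mdb unix.0 vmcore.0
# Inside mdb, group kernel thread stacks by module to localize the hang;
# piles of threads blocked in zfs or nfssrv functions would be telling:
#   > ::stacks -m zfs
#   > ::stacks -m nfssrv
```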
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I should note that we just started using OmniOS this summer, so we have always been on r151014. These were newly created units that performed perfectly for 2.5 months, and now we are having hangs every 1-2 weeks. Here is a timeline that I shared with Guenther Alka:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">(I know it’s not best practice, but we had to use “export” as the name of our data pool)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">9/14 – notified of the L2ARC issue – removed the L2ARC device from the export pool and stopped nightly replication jobs. Ran a scrub on the replicated NAS appliance and found no issues<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">9/15 – ran “zdb -bbccsv export” on the replicated unit and it came up clean<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">9/16 – updated OmniOS and Napp-it on the replicated unit; re-added the L2ARC device; re-enabled the replication jobs<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">9/18 – in the morning, we were notified that the main NAS unit had stopped responding to CIFS and “ls -l” was not working – NFS continued to function. So this was prior to any rebuild of the system disk, upgrade of OmniOS, or re-enabling of the L2ARC device. That night the system disk was rebuilt from scratch from the original CD, with a pkg update performed prior to importing the data pool.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">9/19-9/21 – everything appeared to be working well. The replication jobs that ran at 1, 2, and 3am this morning completed just fine. Around 5:40am this morning we were notified that NFS had stopped serving files. After logging in, we found that the “ls -l” issue had returned, and CIFS was non-functional as well. Also, the “Pools” tab in Napp-it worked, but the “ZFS Filesystems” tab did not. I find it interesting that an “ls -l” failed while I was in /root – the rpool is brand new, with an updated OS and no L2ARC device.<o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I have performed scrubs and “zdb -bbccvs” on both units shortly after this, and no errors were found. For the system that hung on Tuesday, a new scrub was clean, but the zdb showed leaks and over 33k checksum errors (full disclosure – several replication jobs were run during the zdb). A subsequent scrub and “zpool status” showed no problems. I performed another round of replications and all completed without errors. I have now exported that pool and am running another zdb, which is still in progress but showing clean so far.<o:p></o:p></span></p>
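<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">(For what it’s worth, zdb walks the pool outside the ZFS transaction path, so running it against an imported pool that is receiving replications can report phantom leaks and checksum errors. A sketch of the safer sequence, using the flags from this thread:)<o:p></o:p></span></p>

```shell
# Quiesce the pool (ours is named "export"), then have zdb read
# the on-disk state of the exported pool via -e:
zpool export export
zdb -e -bbccsv export
```

<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">A clean run here, together with the clean scrubs, would suggest the earlier zdb errors were artifacts of the concurrent replications rather than real corruption.<o:p></o:p></span></p>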
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Just in case I have done something insanely stupid with the system configuration, here it is (both units are identical except for the OS – if the config seems strange, these were conversions of two old CORAID chassis):<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">OS: omnios-f090f73 (the online unit is at omnios-cffff65)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Mobo: Supermicro H8DG6-H8DGi<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">RAM: 128GB ECC (Hynix, which is really Samsung I think)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Controllers: 2x LSI 9201-16i with SAS-SATA break-out cables going to a SAS846TQ backplane<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">System disk: mirrored Innodisk 128GB SATADOMs (SATADOM-MV 3ME) off mobo<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">ZIL: mirrored Samsung SSD 845D (400GB) off mobo<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">L2ARC: SAMSUNG MZHPU512 (512GB PCIe)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">NICs: Intel 10G (X520)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Data disks: 24x WD 1TB Enterprise SATA (WDC WD1002FBYS)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Any help/insight is greatly appreciated.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Bill<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> OmniOS-discuss [mailto:omnios-discuss-bounces@lists.omniti.com]
<b>On Behalf Of </b>Yavor Tomov<br>
<b>Sent:</b> Thursday, October 22, 2015 12:37 PM<br>
<b>To:</b> jimklimov@cos.ru<br>
<b>Cc:</b> OmniOS-discuss<br>
<b>Subject:</b> Re: [OmniOS-discuss] OmniOS backup box hanging regularly<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Hi Tovarishch Jim,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I had a similar issue with my box, and it was related to NFS locks. I assume you are using NFS because of the Linux backups. The solution was posted by Chip on the mailing list; a copy of his solution is below:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:9.5pt;color:black">"I've seen issues like this when you run out of NFS locks. NFSv3 in Illumos is really slow at releasing locks. </span><span style="font-size:10.0pt;color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:10.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:9.5pt;color:black">On all my NFS servers I do:</span><span style="font-size:10.0pt;color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:10.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:9.5pt;color:black">sharectl set -p lockd_listen_backlog=256 nfs</span><span style="font-size:10.0pt;color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:9.5pt;color:black">sharectl set -p lockd_servers=2048 nfs</span><span style="font-size:10.0pt;color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:10.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:9.5pt;color:black">Everywhere I can, I use NFSv4 instead of v3. It handles locks much better."</span><span style="font-size:10.0pt;color:black"><o:p></o:p></span></p>
</div>
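<div>
<p class="MsoNormal">(A hedged aside: the current values can be checked with sharectl before and after the change, and the lock manager has to be restarted – or the box rebooted – before new values take effect; the service name is assumed from stock illumos:)<o:p></o:p></p>
</div>

```shell
# Inspect the current lock-daemon tunables:
sharectl get -p lockd_listen_backlog nfs
sharectl get -p lockd_servers nfs
# Restart the NFS lock manager so changed values are picked up
# (assumption: no active client locks you need to preserve):
svcadm restart svc:/network/nfs/nlockmgr
```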
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:10.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:9.5pt;color:black">All the Best</span><span style="font-size:10.0pt;color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:#FDFDFD"><span style="font-size:10.0pt;color:black">Yavor <o:p></o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Thu, Oct 22, 2015 at 11:59 AM, Jim Klimov <<a href="mailto:jim@cos.ru" target="_blank">jim@cos.ru</a>> wrote:<o:p></o:p></p>
<p>Hello all,<o:p></o:p></p>
<p>I have this HP-Z400 workstation with 16GB ECC (should be) RAM running OmniOS bloody, which acts as a backup server for our production systems (regularly rsync'ing large files off Linux boxes, and rotating ZFS auto-snapshots to keep its space free). Sometimes it also runs replicas of infrastructure (DHCP, DNS) and was set up as a VirtualBox + phpVirtualBox host to test that out, but no VMs are running.<o:p></o:p></p>
<p>So the essential loads are ZFS snapshots and ZFS scrubs :)<o:p></o:p></p>
<p>And it freezes roughly every week. It stops responding to ping and to login attempts via SSH or the physical console - it processes keypresses on the latter, but does not present a login prompt. It used to be stable; such regular hangs began around summertime.<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>My primary guess would be flaky disks, maybe timing out under load or going to sleep or whatever... But I have yet to prove it, or any other theory. Maybe the CPU is overheating due to regular near-100% load with disk I/O... At least I want to rule out OS errors and rule out (or point out) operator/box errors as much as possible - which is something I can change to try and fix ;)<o:p></o:p></p>
<p>Before I proceed to the TL;DR screenshots, I'll summarize what I see:<o:p></o:p></p>
<p>* In the "top" output, processes owned by zfssnap lead most of the time... But even the SSH shell is noticeably slow to respond (1 sec per line when just pressing enter to clear the screen to prepare nice screenshots).<o:p></o:p></p>
<p>* SMART was not enabled on the 3TB mirrored "pool" SATA disks (it is now, with long tests initiated), but was in place on the "rpool" SAS disk, where it logged some corrected ECC errors - but none uncorrected.<o:p></o:p></p>
<p>Maybe the cabling should be reseated.<o:p></o:p></p>
<p>* iostat shows disks are generally not busy (they don't audibly rattle nor visibly blink all the time, either)<o:p></o:p></p>
<p>* zpool scrubs return clean<o:p></o:p></p>
<p>* there are partitions of the system rpool disk (10K RPM SAS) used as log and cache devices for the main data pool on 3TB SATA disks. The system disk is fast and underutilized, so what the heck ;) And it was not a problem for the first year of this system's
honest and stable workouts. These devices are pretty empty at the moment.<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>I have enabled deadman panics according to Wiki, but none have happened so far:<o:p></o:p></p>
<p># cat /etc/system | egrep -v '(^\*|^$)'<br>
set snooping=1<br>
set pcplusmp:apic_panic_on_nmi=1<br>
set apix:apic_panic_on_nmi = 1<o:p></o:p></p>
<p> <o:p></o:p></p>
<p> <o:p></o:p></p>
<p>In the "top" output, processes owned by zfssnap lead most of the time:<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>last pid: 22599; load avg: 12.9, 12.2, 11.2; up 0+09:52:11 18:34:41<br>
140 processes: 125 sleeping, 13 running, 2 on cpu<br>
CPU states: 0.0% idle, 22.9% user, 77.1% kernel, 0.0% iowait, 0.0% swap<br>
Memory: 16G phys mem, 1765M free mem, 2048M total swap, 2048M free swap<br>
Seconds to delay:<br>
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND<br>
21389 zfssnap 1 43 2 863M 860M run 5:04 35.61% zfs<br>
22360 zfssnap 1 52 2 118M 115M run 0:37 16.50% zfs<br>
21778 zfssnap 1 52 2 563M 560M run 3:15 13.17% zfs<br>
21278 zfssnap 1 52 2 947M 944M run 5:32 6.91% zfs<br>
21881 zfssnap 1 43 2 433M 431M run 2:31 5.41% zfs<br>
21852 zfssnap 1 52 2 459M 456M run 2:39 5.16% zfs<br>
21266 zfssnap 1 43 2 906M 903M run 5:18 3.95% zfs<br>
21757 zfssnap 1 43 2 597M 594M run 3:26 2.91% zfs<br>
21274 zfssnap 1 52 2 930M 927M cpu/0 5:27 2.78% zfs<br>
22588 zfssnap 1 43 2 30M 27M run 0:08 2.48% zfs<br>
22580 zfssnap 1 52 2 49M 46M run 0:14 0.71% zfs<br>
22038 root 1 59 0 5312K 3816K cpu/1 0:01 0.10% top<br>
22014 root 1 59 0 8020K 4988K sleep 0:00 0.02% sshd<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>Average "iostats" are not that busy:<o:p></o:p></p>
<p> <o:p></o:p></p>
<p># zpool iostat -Td 5<br>
Thu Oct 22 18:24:59 CEST 2015<br>
capacity operations bandwidth<br>
pool alloc free read write read write<br>
---------- ----- ----- ----- ----- ----- -----<br>
pool 2.52T 207G 802 116 28.3M 840K<br>
rpool 33.0G 118G 0 4 4.52K 58.7K<br>
---------- ----- ----- ----- ----- ----- -----<o:p></o:p></p>
<p>Thu Oct 22 18:25:04 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 10 0 97.9K<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:25:09 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 0 0 0<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:25:14 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 9 0 93.5K<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:25:19 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 0 0 0<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:25:24 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 0 0 0<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:25:29 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 0 0 0<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:25:34 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 0 0 0<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:25:39 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 16 0 374K<br>
---------- ----- ----- ----- ----- ----- -----<br>
...<o:p></o:p></p>
<p>Thu Oct 22 18:33:49 CEST 2015<br>
pool 2.52T 207G 0 0 0 0<br>
rpool 33.0G 118G 0 11 0 94.5K<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:33:54 CEST 2015<br>
pool 2.52T 207G 0 13 819 80.0K<br>
rpool 33.0G 118G 0 0 0 0<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:33:59 CEST 2015<br>
pool 2.52T 207G 0 129 0 1.06M<br>
rpool 33.0G 118G 0 0 0 0<br>
---------- ----- ----- ----- ----- ----- -----<br>
Thu Oct 22 18:34:04 CEST 2015<br>
pool 2.52T 207G 0 55 0 503K<br>
rpool 33.0G 118G 0 11 0 97.9K<br>
---------- ----- ----- ----- ----- ----- -----<br>
...<o:p></o:p></p>
<p>just occasional bursts of work. <o:p></o:p></p>
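<p>(A hedged follow-up idea: zpool iostat averages across each pool, so a single slow or timing-out disk can hide in it. The per-device view would show that directly:)<o:p></o:p></p>

```shell
# Per-device stats: asvc_t is average service time (ms), %b is percent busy.
# One disk with asvc_t in the hundreds while its mirror partner stays low
# would point at flaky hardware or a drive stuck in error recovery.
iostat -xn 5
```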
<p>I've now enabled SMART on the disks (2x3TB mirror "pool" and 1x300GB "rpool"), ran some short tests, and triggered long tests (hopefully they'll complete by tomorrow); current results are:<br>
<br>
<br>
# for D in /dev/rdsk/c0*s0; do echo "===== $D :"; smartctl -d sat,12 -a $D ; done ; for D in /dev/rdsk/c4*s0 ; do echo "===== $D :"; smartctl -d scsi -a $D ; done<br>
===== /dev/rdsk/c0t3d0s0 :<br>
smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)<br>
Copyright (C) 2002-12, Bruce Allen, Christian Franke, <a href="http://www.smartmontools.org/" target="_blank">
www.smartmontools.org</a><o:p></o:p></p>
<p>=== START OF INFORMATION SECTION ===<br>
Device Model: WDC WD3003FZEX-00Z4SA0<br>
Serial Number: WD-WCC5D1KKU0PA<br>
LU WWN Device Id: 5 0014ee 2610716b7<br>
Firmware Version: 01.01A01<br>
User Capacity: 3,000,592,982,016 bytes [3.00 TB]<br>
Sector Sizes: 512 bytes logical, 4096 bytes physical<br>
Rotation Rate: 7200 rpm<br>
Device is: Not in smartctl database [for details use: -P showall]<br>
ATA Version is: ACS-2 (minor revision not indicated)<br>
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)<br>
Local Time is: Thu Oct 22 18:45:28 2015 CEST<br>
SMART support is: Available - device has SMART capability.<br>
SMART support is: Enabled<o:p></o:p></p>
<p>=== START OF READ SMART DATA SECTION ===<br>
SMART overall-health self-assessment test result: PASSED<o:p></o:p></p>
<p>General SMART Values:<br>
Offline data collection status: (0x82) Offline data collection activity<br>
was completed without error.<br>
Auto Offline Data Collection: Enabled.<br>
Self-test execution status: ( 249) Self-test routine in progress...<br>
90% of test remaining.<br>
Total time to complete Offline<br>
data collection: (32880) seconds.<br>
Offline data collection<br>
capabilities: (0x7b) SMART execute Offline immediate.<br>
Auto Offline data collection on/off support.<br>
Suspend Offline collection upon new<br>
command.<br>
Offline surface scan supported.<br>
Self-test supported.<br>
Conveyance Self-test supported.<br>
Selective Self-test supported.<br>
SMART capabilities: (0x0003) Saves SMART data before entering<br>
power-saving mode.<br>
Supports SMART auto save timer.<br>
Error logging capability: (0x01) Error logging supported.<br>
General Purpose Logging supported.<br>
Short self-test routine<br>
recommended polling time: ( 2) minutes.<br>
Extended self-test routine<br>
recommended polling time: ( 357) minutes.<br>
Conveyance self-test routine<br>
recommended polling time: ( 5) minutes.<br>
SCT capabilities: (0x7035) SCT Status supported.<br>
SCT Feature Control supported.<br>
SCT Data Table supported.<o:p></o:p></p>
<p>SMART Attributes Data Structure revision number: 16<br>
Vendor Specific SMART Attributes with Thresholds:<br>
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE<br>
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0<br>
3 Spin_Up_Time 0x0027 246 154 021 Pre-fail Always - 6691<br>
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 14<br>
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0<br>
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0<br>
9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 4869<br>
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0<br>
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0<br>
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 14<br>
16 Unknown_Attribute 0x0022 130 070 000 Old_age Always - 2289651870502<br>
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 12<br>
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2<br>
194 Temperature_Celsius 0x0022 117 111 000 Old_age Always - 35<br>
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0<br>
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0<br>
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0<br>
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0<br>
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0<o:p></o:p></p>
<p>SMART Error Log Version: 1<br>
No Errors Logged<o:p></o:p></p>
<p>SMART Self-test log structure revision number 1<br>
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error<br>
# 1 Short offline Completed without error 00% 4869 -<o:p></o:p></p>
<p>SMART Selective self-test log data structure revision number 1<br>
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS<br>
1 0 0 Not_testing<br>
2 0 0 Not_testing<br>
3 0 0 Not_testing<br>
4 0 0 Not_testing<br>
5 0 0 Not_testing<br>
Selective self-test flags (0x0):<br>
After scanning selected spans, do NOT read-scan remainder of disk.<br>
If Selective self-test is pending on power-up, resume after 0 minute delay.<o:p></o:p></p>
<p>===== /dev/rdsk/c0t5d0s0 :<br>
smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)<br>
Copyright (C) 2002-12, Bruce Allen, Christian Franke, <a href="http://www.smartmontools.org/" target="_blank">
www.smartmontools.org</a><o:p></o:p></p>
<p>=== START OF INFORMATION SECTION ===<br>
Model Family: Seagate SV35<br>
Device Model: ST3000VX000-1ES166<br>
Serial Number: Z500S3L8<br>
LU WWN Device Id: 5 000c50 079e3757b<br>
Firmware Version: CV26<br>
User Capacity: 3,000,592,982,016 bytes [3.00 TB]<br>
Sector Sizes: 512 bytes logical, 4096 bytes physical<br>
Rotation Rate: 7200 rpm<br>
Device is: In smartctl database [for details use: -P show]<br>
ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b<br>
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)<br>
Local Time is: Thu Oct 22 18:45:28 2015 CEST<br>
SMART support is: Available - device has SMART capability.<br>
SMART support is: Enabled<o:p></o:p></p>
<p>=== START OF READ SMART DATA SECTION ===<br>
SMART overall-health self-assessment test result: PASSED<o:p></o:p></p>
<p>General SMART Values:<br>
Offline data collection status: (0x00) Offline data collection activity<br>
was never started.<br>
Auto Offline Data Collection: Disabled.<br>
Self-test execution status: ( 249) Self-test routine in progress...<br>
90% of test remaining.<br>
Total time to complete Offline<br>
data collection: ( 80) seconds.<br>
Offline data collection<br>
capabilities: (0x73) SMART execute Offline immediate.<br>
Auto Offline data collection on/off support.<br>
Suspend Offline collection upon new<br>
command.<br>
No Offline surface scan supported.<br>
Self-test supported.<br>
Conveyance Self-test supported.<br>
Selective Self-test supported.<br>
SMART capabilities: (0x0003) Saves SMART data before entering<br>
power-saving mode.<br>
Supports SMART auto save timer.<br>
Error logging capability: (0x01) Error logging supported.<br>
General Purpose Logging supported.<br>
Short self-test routine<br>
recommended polling time: ( 1) minutes.<br>
Extended self-test routine<br>
recommended polling time: ( 325) minutes.<br>
Conveyance self-test routine<br>
recommended polling time: ( 2) minutes.<br>
SCT capabilities: (0x10b9) SCT Status supported.<br>
SCT Error Recovery Control supported.<br>
SCT Feature Control supported.<br>
SCT Data Table supported.<o:p></o:p></p>
<p>SMART Attributes Data Structure revision number: 10<br>
Vendor Specific SMART Attributes with Thresholds:<br>
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE<br>
1 Raw_Read_Error_Rate 0x000f 105 099 006 Pre-fail Always - 8600880<br>
3 Spin_Up_Time 0x0003 096 094 000 Pre-fail Always - 0<br>
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 19<br>
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0<br>
7 Seek_Error_Rate 0x000f 085 060 030 Pre-fail Always - 342685681<br>
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 4214<br>
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0<br>
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 19<br>
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0<br>
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0<br>
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0<br>
189 High_Fly_Writes 0x003a 028 028 000 Old_age Always - 72<br>
190 Airflow_Temperature_Cel 0x0022 069 065 045 Old_age Always - 31 (Min/Max 29/32)<br>
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0<br>
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19<br>
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 28<br>
194 Temperature_Celsius 0x0022 031 040 000 Old_age Always - 31 (0 20 0 0 0)<br>
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0<br>
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0<br>
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0<o:p></o:p></p>
<p>SMART Error Log Version: 1<br>
No Errors Logged<o:p></o:p></p>
<p>SMART Self-test log structure revision number 1<br>
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error<br>
# 1 Extended offline Self-test routine in progress 90% 4214 -<br>
# 2 Short offline Completed without error 00% 4214 -<o:p></o:p></p>
<p>SMART Selective self-test log data structure revision number 1<br>
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS<br>
1 0 0 Not_testing<br>
2 0 0 Not_testing<br>
3 0 0 Not_testing<br>
4 0 0 Not_testing<br>
5 0 0 Not_testing<br>
Selective self-test flags (0x0):<br>
After scanning selected spans, do NOT read-scan remainder of disk.<br>
If Selective self-test is pending on power-up, resume after 0 minute delay.<o:p></o:p></p>
<p>===== /dev/rdsk/c4t5000CCA02A1292DDd0s0 :<br>
smartctl 6.0 2012-10-10 r3643 [i386-pc-solaris2.11] (local build)<br>
Copyright (C) 2002-12, Bruce Allen, Christian Franke, <a href="http://www.smartmontools.org/" target="_blank">
www.smartmontools.org</a><o:p></o:p></p>
<p>Vendor: HITACHI<br>
Product: HUS156030VLS600<br>
Revision: HPH1<br>
User Capacity: 300,000,000,000 bytes [300 GB]<br>
Logical block size: 512 bytes<br>
Logical Unit id: 0x5000cca02a1292dc<br>
Serial number: LVVA6NHS<br>
Device type: disk<br>
Transport protocol: SAS<br>
Local Time is: Thu Oct 22 18:45:29 2015 CEST<br>
Device supports SMART and is Enabled<br>
Temperature Warning Enabled<br>
SMART Health Status: OK<o:p></o:p></p>
<p>Current Drive Temperature: 45 C<br>
Drive Trip Temperature: 70 C<br>
Manufactured in week 14 of year 2012<br>
Specified cycle count over device lifetime: 50000<br>
Accumulated start-stop cycles: 80<br>
Elements in grown defect list: 0<br>
Vendor (Seagate) cache information<br>
Blocks sent to initiator = 2340336504406016<o:p></o:p></p>
<p>Error counter log:<br>
Errors Corrected by Total Correction Gigabytes Total<br>
ECC rereads/ errors algorithm processed uncorrected<br>
fast | delayed rewrites corrected invocations [10^9 bytes] errors<br>
read: 0 888890 0 888890 0 29326.957 0<br>
write: 0 961315 0 961315 0 6277.560 0<o:p></o:p></p>
<p>Non-medium error count: 283<o:p></o:p></p>
<p>SMART Self-test log<br>
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]<br>
Description number (hours)<br>
# 1 Background long Self test in progress ... - NOW - [- - -]<br>
# 2 Background long Aborted (device reset ?) - 14354 - [- - -]<br>
# 3 Background short Completed - 14354 - [- - -]<br>
# 4 Background long Aborted (device reset ?) - 14354 - [- - -]<br>
# 5 Background long Aborted (device reset ?) - 14354 - [- - -]<o:p></o:p></p>
<p>Long (extended) Self Test duration: 2506 seconds [41.8 minutes]<o:p></o:p></p>
<p> <o:p></o:p></p>
<p>The zpool scrub results and general layout:<o:p></o:p></p>
<p> <o:p></o:p></p>
<p># zpool status -v<br>
pool: pool<br>
state: ONLINE<br>
scan: scrub repaired 0 in 164h13m with 0 errors on Thu Oct 22 18:13:33 2015<br>
config:<o:p></o:p></p>
<p> NAME STATE READ WRITE CKSUM<br>
pool ONLINE 0 0 0<br>
mirror-0 ONLINE 0 0 0<br>
c0t3d0 ONLINE 0 0 0<br>
c0t5d0 ONLINE 0 0 0<br>
logs<br>
c4t5000CCA02A1292DDd0p2 ONLINE 0 0 0<br>
cache<br>
c4t5000CCA02A1292DDd0p3 ONLINE 0 0 0<o:p></o:p></p>
<p>errors: No known data errors<o:p></o:p></p>
<p> pool: rpool<br>
state: ONLINE<br>
status: Some supported features are not enabled on the pool. The pool can<br>
still be used, but some features are unavailable.<br>
action: Enable all features using 'zpool upgrade'. Once this is done,<br>
the pool may no longer be accessible by software that does not support<br>
the features. See zpool-features(5) for details.<br>
scan: scrub repaired 0 in 3h3m with 0 errors on Thu Oct 8 04:12:35 2015<br>
config:<o:p></o:p></p>
<p> NAME STATE READ WRITE CKSUM<br>
rpool ONLINE 0 0 0<br>
c4t5000CCA02A1292DDd0s0 ONLINE 0 0 0<o:p></o:p></p>
<p>errors: No known data errors<o:p></o:p></p>
<p># zpool list -v<br>
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT<br>
pool 2.72T 2.52T 207G - 68% 92% 1.36x ONLINE /<br>
mirror 2.72T 2.52T 207G - 68% 92%<br>
c0t3d0 - - - - - -<br>
c0t5d0 - - - - - -<br>
log - - - - - -<br>
c4t5000CCA02A1292DDd0p2 8G 148K 8.00G - 0% 0%<br>
cache - - - - - -<br>
c4t5000CCA02A1292DDd0p3 120G 1.80G 118G - 0% 1%<br>
rpool 151G 33.0G 118G - 76% 21% 1.00x ONLINE -<br>
c4t5000CCA02A1292DDd0s0 151G 33.0G 118G - 76% 21%<o:p></o:p></p>
<p>Note the long scrub time may have included the downtime while the system was frozen until it was rebooted.<o:p></o:p></p>
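<p>(One more hedged check suggested by the listing above: "pool" sits at 92% capacity with a 1.36x dedup ratio, and a dedup table that no longer fits in RAM is a classic cause of long stalls when snapshots are destroyed. zdb can size the DDT:)<o:p></o:p></p>

```shell
# Summarize the dedup table (DDT) for the data pool; total entries times
# roughly 320 bytes per in-core entry approximates the RAM needed to keep
# the whole DDT cached. (A sketch; best run when the pool is idle.)
zdb -DD pool
```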
<p> <o:p></o:p></p>
<p>Thanks in advance for the fresh pairs of eyeballs,<br>
Jim Klimov<o:p></o:p></p>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
_______________________________________________<br>
OmniOS-discuss mailing list<br>
<a href="mailto:OmniOS-discuss@lists.omniti.com">OmniOS-discuss@lists.omniti.com</a><br>
<a href="http://lists.omniti.com/mailman/listinfo/omnios-discuss" target="_blank">http://lists.omniti.com/mailman/listinfo/omnios-discuss</a><o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>