[OmniOS-discuss] OmniOS random freezes

Saso Kiselkov skiselkov.ml at gmail.com
Mon Jan 6 23:32:54 UTC 2014


On 1/6/14, 10:50 PM, Niels Goossens wrote:
> 3. Is the pool healthy?
> A. The drives are consumer-grade SATA drives and about 3 years old. They
> are not really used that much - they used to be in my OpenSolaris-based
> NAS before I upgraded that to something bigger. Smartctl tells me SMART
> status of all drives is OK. There are no other log entries that lead me
> to believe a drive is bad. Zpool status is OK.

Just as a test, you could fill the pool with as much test data as you
can (incompressible data works best, e.g. repeated copies of a movie
file) and then run "zpool scrub <poolname>" to verify all the
checksums. Once the scrub has finished, "zpool status -v <poolname>"
lists any checksum errors it found.
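
A rough sketch of that procedure as shell functions. The pool name
"tank", the directory /tank/scrubtest and the function names are
placeholders of mine, not anything from your setup, so adjust them
before use:

```shell
# Write one file of random (incompressible) data into DIR and then make
# COPIES copies of it. SIZE_MB is the size of each file in MiB.
# Assumes dedup is off on the dataset; with dedup on, identical copies
# would take up very little extra space.
fill_with_test_data() {
    dir=$1
    copies=$2
    size_mb=$3

    mkdir -p "$dir" || return 1

    # /dev/urandom output does not compress, so every block lands on disk.
    dd if=/dev/urandom of="$dir/seed.bin" bs=1024k count="$size_mb" \
        2>/dev/null || return 1

    i=1
    while [ "$i" -le "$copies" ]; do
        # cp fails once the pool is full, which ends the loop early.
        cp "$dir/seed.bin" "$dir/copy.$i.bin" || return 1
        i=$((i + 1))
    done
}

# Start a scrub on pool $1 and print its status. "zpool scrub" returns
# immediately; re-run "zpool status -v" later to see the final result.
scrub_and_report() {
    zpool scrub "$1" && zpool status -v "$1"
}
```

For example, "fill_with_test_data /tank/scrubtest 100 1024" followed by
"scrub_and_report tank" would write roughly 100 GiB and then scrub it.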

> 4. Is there anything in Supermicro IPMI?
> A. The following, which has occurred only twice now:
> 
> 2013/11/23 19:53:28 Correctable Memory ECC @ DIMM2A(CPU1) - Asserted
> 2013/11/23 19:53:29 Uncorrectable Memory ECC @ DIMM2A(CPU1) - Asserted
> 2013/11/23 22:28:56 Correctable Memory ECC @ DIMM2A(CPU1) - Asserted
> 2013/11/23 22:28:57 Uncorrectable Memory ECC @ DIMM2A(CPU1) - Asserted
> 
> Even though I'd rather not see this error, I'm not alarmed considering
> it has not occurred since.

This could indicate degrading or failing ECC memory. Try running
memtest86+ on the machine for a while to see if it reports anything
useful. You can grab a pre-built ISO at http://www.memtest.org/#downiso
or, alternatively, the gzipped bootable binary from the same site. In
the latter case, gunzip the file and put it somewhere on your root
filesystem, e.g. /platform/i86pc/memtest86+-4.20.bin. Then boot it from
GRUB by entering the following GRUB commands:

findroot (pool_rpool,0,a)	<- partition number + slice
bootfs rpool/ROOT/omnios	<- see "beadm list" for the exact name
kernel /platform/i86pc/memtest86+-4.20.bin
boot
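
Independently of memtest, it may be worth checking now and then whether
those ECC events come back and whether they stay on the same DIMM. A
small sketch, assuming you save the IPMI event log to a text file with
one event per line in the format you pasted; the function name is made
up, and it needs a POSIX awk (on older Solaris-derived systems that may
mean nawk instead of awk):

```shell
# Read IPMI event lines on stdin and print "<DIMM> <kind> <count>",
# one line per DIMM and error kind, sorted.
count_ecc_events() {
    awk '/Memory ECC @/ {
        # Test for "Uncorrectable" first, since it contains "correctable".
        kind = ($0 ~ /Uncorrectable/) ? "uncorrectable" : "correctable"
        # Pull out the DIMM label, e.g. DIMM2A from "DIMM2A(CPU1)".
        if (match($0, /DIMM[0-9A-Za-z]+/)) {
            key = substr($0, RSTART, RLENGTH) " " kind
            n[key]++
        }
    }
    END { for (k in n) print k, n[k] }' | sort
}
```

Usage would be something like "count_ecc_events < ipmi-events.txt",
where ipmi-events.txt is whatever file you exported the log to.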

> 5. Are core dump or crash files available?
> A. I've set up dumpadm and coreadm only today. There are no core files in
> /, or crash files in /var/crash. There are no log entries in /var/log.
> There is nothing in /var/adm/messages, the last entry there is hours
> before the machine freezes.

System crash dumps are usually saved on the dump device (rpool/dump)
until you manually retrieve them. If you run "savecore" without any
arguments it will try to extract the crash dump from the dump device and
save it to /var/crash/<hostname>.
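
For reference, a sketch of how that could look in practice. The function
names are made up for illustration, and /var/crash/<hostname> is only
the usual default; "dumpadm" prints the directory actually configured on
your box:

```shell
# Show the dump configuration, then pull any pending dump off the dump
# device. Needs to run as root.
retrieve_crash_dump() {
    dumpadm        # prints the dump device and the savecore directory
    savecore -v    # -v: verbose; saves into the configured directory
}

# Print the crash dump files found in directory $1 (vmdump.N, vmcore.N,
# unix.N), one name per line. Prints nothing if there are none.
list_crash_dumps() {
    dir=$1
    for f in "$dir"/vmdump.* "$dir"/vmcore.* "$dir"/unix.*; do
        # An unmatched glob stays a literal pattern, so test for a file.
        if [ -f "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}
```

After a freeze and reboot you would run "retrieve_crash_dump" and then
"list_crash_dumps /var/crash/<hostname>". If you end up with a
compressed vmdump.N file, "savecore -f vmdump.N" expands it into the
unix.N/vmcore.N pair that mdb can open.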

Cheers,
-- 
Saso

