[OmniOS-discuss] omniOS r018 crashed due to scsi/iSCSI issue

Stephan Budach stephan.budach at jvm.de
Fri Jan 13 07:43:42 UTC 2017


Hi Dan,

I just wanted to know whether you would be interested in the core dump of 
this crash:

Jan 11 07:28:52 zfsha02gh79 scsi: [ID 107833 kern.warning] WARNING: 
/scsi_vhci/disk@g600144f090d09613000056b8a83b0007 (sd27):
Jan 11 07:28:52 zfsha02gh79     incomplete write- retrying
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 
/scsi_vhci/disk@g600144f0564d504f4f4c3035534c3133 (sd47): Command 
Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,3
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 
/scsi_vhci/disk@g600144f0564d504f4f4c3035534c3039 (sd44): Command 
Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,0
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 
/scsi_vhci/disk@g600144f0564d504f4f4c3035534c3131 (sd46): Command 
Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,2
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 
/scsi_vhci/disk@g600144f0564d504f4f4c3035534c3130 (sd45): Command 
Timeout on path iscsi0/disk@0000iqn.2016-02.de.jvm:nfsvmpool05ssd030002,1
Jan 12 17:30:22 zfsha02gh79 iscsi: [ID 431120 kern.warning] WARNING: 
iscsi connection(26/3f) closing connection - target requested reason:0x7
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 
/scsi_vhci/disk@g600144f090d09613000056b8a7f10003 (sd19): Command 
Timeout on path iscsi0/disk@0000iqn.2015-03.de.jvm:nfsvmpool05ssd010002,2
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 
/scsi_vhci/disk@g600144f090d09613000056b8a7fc0004 (sd21): Command 
Timeout on path iscsi0/disk@0000iqn.2015-03.de.jvm:nfsvmpool05ssd010002,3
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 
/scsi_vhci/disk@g600144f090d09613000056b8a84a0008 (sd29): Command 
Timeout on path iscsi0/disk@0000iqn.2015-03.de.jvm:nfsvmpool05ssd010002,7
Jan 12 17:30:22 zfsha02gh79 scsi: [ID 243001 kern.warning] WARNING: 
/scsi_vhci (scsi_vhci0):
Jan 12 17:30:22 zfsha02gh79 unix: [ID 836849 kern.notice]
Jan 12 17:30:22 zfsha02gh79 panic[cpu1]/thread=ffffff00f6539c40:
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 335743 kern.notice] BAD TRAP: 
type=e (#pf Page fault) rp=ffffff00f6539610 addr=10 occurred in module 
"scsi_vhci" due to a NULL pointer dereference
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 unix: [ID 839527 kern.notice] sched:
Jan 12 17:30:22 zfsha02gh79 unix: [ID 753105 kern.notice] #pf Page fault
Jan 12 17:30:22 zfsha02gh79 unix: [ID 532287 kern.notice] Bad kernel 
fault at addr=0x10
Jan 12 17:30:22 zfsha02gh79 unix: [ID 243837 kern.notice] pid=0, 
pc=0xfffffffff7948e15, sp=0xffffff00f6539700, eflags=0x10246
Jan 12 17:30:22 zfsha02gh79 unix: [ID 211416 kern.notice] cr0: 
8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 
1426f8<smep,osxsav,vmxe,xmme,fxsr,pge,mce,pae,pse,de>
Jan 12 17:30:22 zfsha02gh79 unix: [ID 624947 kern.notice] cr2: 10
Jan 12 17:30:22 zfsha02gh79 unix: [ID 625075 kern.notice] cr3: c000000
Jan 12 17:30:22 zfsha02gh79 unix: [ID 625715 kern.notice] cr8: 0
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice]       rdi: 
ffffff226adb90d8 rsi:                1 rdx: ffffff227063d400
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
rcx:                2  r8:                0  r9: fffffffff794bd10
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
rax:                0 rbx:                1 rbp: ffffff00f6539780
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
r10:                0 r11: ffffff00f65397b0 r12:                2
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
r13:                1 r14:                0 r15:                4
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
fsb:                0 gsb: ffffff21f0e81040  ds:               4b
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
es:               4b  fs:                0  gs:              1c3
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
trp:                e err:                0 rip: fffffffff7948e15
Jan 12 17:30:22 zfsha02gh79 unix: [ID 592667 kern.notice] 
cs:               30 rfl:            10246 rsp: ffffff00f6539700
Jan 12 17:30:22 zfsha02gh79 unix: [ID 266532 kern.notice] 
ss:               38
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f65394f0 unix:die+df ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539600 unix:trap+dd8 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539610 unix:_cmntrap+e6 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539780 scsi_vhci:vhci_scsi_reset_target+75 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f65397d0 scsi_vhci:vhci_recovery_reset+7d ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539820 scsi_vhci:vhci_pathinfo_offline+e5 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f65398c0 scsi_vhci:vhci_pathinfo_state_change+d5 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539950 genunix:i_mdi_pi_state_change+16a ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539990 genunix:mdi_pi_offline+39 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539a20 iscsi:iscsi_lun_offline+b3 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539a60 iscsi:iscsi_sess_offline_luns+4d ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539ab0 iscsi:iscsi_sess_state_logged_in+11e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539b00 iscsi:iscsi_sess_state_machine+13e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539b60 iscsi:iscsi_client_notify_task+17e ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539c20 genunix:taskq_thread+2d0 ()
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 655072 kern.notice] 
ffffff00f6539c30 unix:thread_start+8 ()
Jan 12 17:30:22 zfsha02gh79 unix: [ID 100000 kern.notice]
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 672855 kern.notice] syncing 
file systems...
Jan 12 17:30:24 zfsha02gh79 genunix: [ID 904073 kern.notice]  done
Jan 12 17:30:22 zfsha02gh79 genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Jan 12 17:30:22 zfsha02gh79 ahci: [ID 405573 kern.info] NOTICE: ahci0: 
ahci_tran_reset_dport port 0 reset port
Jan 12 17:48:01 zfsha02gh79 genunix: [ID 100000 kern.notice]
Jan 12 17:48:02 zfsha02gh79 genunix: [ID 665016 kern.notice] 100% 
done: 4721646 pages dumped,

This happened under fairly high load, while I was copying a 
200G file from a snapshot back to its original place on its zvol. 
Luckily these are RSF-1 nodes, and the other one took over 
so quickly that my VM cluster didn't even seem to notice 
the issue. However, at that time I was connected to the crashing host 
via ssh and my heart skipped a beat. ;)

As I have (involuntarily) freed this node of its duties, I could jump 
to r020 on it, but I wonder whether there have been any changes to the 
scsi_vhci layer at all in recent times…

Cheers,
Stephan

-- 
Krebs's 3 Basic Rules for Online Safety
1st - "If you didn't go looking for it, don't install it!"
2nd - "If you installed it, update it."
3rd - "If you no longer need it, remove it."
http://krebsonsecurity.com/2011/05/krebss-3-basic-rules-for-online-safety


Stephan Budach
Head of IT
Jung von Matt AG
Glashüttenstraße 79
20357 Hamburg


Tel: +49 40-4321-1353
Fax: +49 40-4321-1114
E-Mail: stephan.budach at jvm.de
Internet: http://www.jvm.com
CiscoJabber Video: https://exp-e2.jvm.de/call/stephan.budach

Executive Board: Dr. Peter Figge, Jean-Remy von Matt, Larissa Pohl, Thomas Strerath, Götz Ulmer
Chairman of the Supervisory Board: Hans Hermann Münchmeyer
AG HH HRB 72893
