<html> <head> <meta content="text/html; charset=windows-1252" http-equiv="Content-Type"> </head> <body bgcolor="#FFFFFF" text="#000000"> After having trouble almost every week and always missing the window to catch the bastard, today I finally had the opportunity to catch it in action :)<br> <br> As it turns out, it looks like a ZFS (not likely) or HW (probably) problem. While in the "hangup" state, iSCSI and the network worked flawlessly and I was able to connect to the iSCSI target (but mounting the FS and issuing commands (show LVM volumes, ...) was really slow). I was also able to work on the server, so it wasn't locked up.<br> <br> Then I decided to check the ZFS file system. I tried to create a file in the ZFS mount directory by issuing 'touch test-file', and the command froze. I tried to interrupt it with CTRL+C, without success. I tried to kill the process with kill -9, but that did not help either. Looking at the iostat output, there were some reads happening, but absolutely no writes (0, nada).<br> <br> I used 'lsiutils' to connect to my LSI HBA and issued a port reset and a hard SAS link reset in the hope it would come back, but it was still frozen. I also checked the 'phy counters' in lsiutils, and some devices showed errors, but those could be due to the port/link reset.<br> <br> Long story short, after 30 min everything returned to normal, without any error message in the logs or anywhere else. The bad thing is, the iSCSI target froze again a few minutes later and the only way to resolve it was to restart the server :(<br> <br> Matej<br> <br> <div class="moz-cite-prefix">On 12. 05. 2015 07:13, Matej Zerovnik wrote:<br> </div> <blockquote cite="mid:55518BFF.6080608@zunaj.si" type="cite"> <meta content="text/html; charset=windows-1252" http-equiv="Content-Type"> I know building a single 50-drive RaidZ2 is a bad idea. As I said, it's a legacy setup that I can't easily change. I already have a backup pool with 7x10-drive RaidZ2 vdevs, which I hope to switch to this week. 
I hope to get better results and fewer crashes...<br> <br> What is interesting is that when the 'event' happens, the server works normally and ZFS is accessible and writable (at least, there are no errors in the log files); only iSCSI reports errors and drops the connection. Another interesting thing is that after the 'event', all writes stop and only reads continue for another 30 min. After those 30 min, all traffic stops for half an hour. After that, everything starts coming back up... Weird?!<br> <br> Matej<br> <br> <div class="moz-cite-prefix">On 09. 05. 2015 02:49, Richard Elling wrote:<br> </div> <blockquote cite="mid:40C78E86-F32D-4588-AF98-EB9820019960@richardelling.com" type="cite"> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252"> <br class=""> <div> <blockquote type="cite" class=""> <div class="">On May 5, 2015, at 9:48 AM, Matej Zerovnik <<a moz-do-not-send="true" href="mailto:matej@zunaj.si" class="">matej@zunaj.si</a>> wrote:</div> <br class="Apple-interchange-newline"> <div class=""> <meta http-equiv="Content-Type" content="text/html; charset=windows-1252" class=""> <div class=""> <div class=""> <div style="font-family: Calibri,sans-serif; font-size: 11pt;" class="">I will replace the hardware in about 4 months with all SAS drives, but I would love to have a working setup for the time being as well ;)<br class=""> <br class=""> I looked at the SMART stats and there don't seem to be any errors. Also, no hard/soft/transport errors were reported by any drive. I will take a look at service times tomorrow, maybe feed the drive stats into Graphite and look at them over a longer period.<br class=""> <br class=""> I looked at the iostat -x stats today, and the stats for the pool itself reported 100% busy most of the time, 98-100% wait, 500-1300 transactions in the queue, around 500 active, ... The first line, which is the average since boot, says the average service time is around 1600 ms, which seems like aaaalot. 
Could it be due to the really big queue?<br class=""> <br class=""> Would it help to create 5 10-drive raidz pools instead of one with 50 drives?<br class=""> </div> </div> </div> </div> </blockquote> <div><br class=""> </div> <div>It is a bad idea to build a single raidz set with 50 drives. Very bad. Hence the zpool</div> <div>man page says, "The recommended number is between 3 and 9 to help increase performance."</div> <div>But this recommendation applies to reliability, too.</div> <div> -- richard</div> </div> <br class=""> </blockquote> <br> <br> <fieldset class="mimeAttachmentHeader"></fieldset> <br> <pre wrap="">_______________________________________________ OmniOS-discuss mailing list <a class="moz-txt-link-abbreviated" href="mailto:OmniOS-discuss@lists.omniti.com">OmniOS-discuss@lists.omniti.com</a> <a class="moz-txt-link-freetext" href="http://lists.omniti.com/mailman/listinfo/omnios-discuss">http://lists.omniti.com/mailman/listinfo/omnios-discuss</a> </pre> </blockquote> <br> </body> </html>