<div dir="ltr">Hm, how clean is your fabric? Any errors, deadlocks, etc.?<div> -nld</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, May 12, 2014 at 6:41 PM, David Bomba <span dir="ltr">&lt;<a href="mailto:turbo124@gmail.com" target="_blank">turbo124@gmail.com</a>&gt;</span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi Narayan,<br><br></div>We do not use iSER.<br><br>We use SRP for VMware, and IPoIB for XenServer.<br> <br></div>In our case, our VMs operate as expected. However, it is when copying data between Storage Repos that we see the disconnects, irrespective of SCSI transport.<br> </div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On 13 May 2014 09:32, Narayan Desai <span dir="ltr">&lt;<a href="mailto:narayan.desai@gmail.com" target="_blank">narayan.desai@gmail.com</a>&gt;</span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Are you perchance using iSCSI/iSER? We've seen similar timeouts that don't seem to correspond to hardware issues. From what we can tell, something causes iSCSI heartbeats not to be processed, so the client eventually times out the block device and tries to reinitialize it. <div> <br></div><div>In our case, we're running VMs using KVM on Linux hosts. The guest detects block device death, and won't recover without a reboot. </div><div><br></div><div>FWIW, switching to iSCSI directly over IPoIB works great for identical workloads. We've seen this with 151006 and I think 151008. We've not yet tried it with 151010.
This smells like some problem in COMSTAR's iSCSI/iSER driver.</div> <div> -nld</div></div><div class="gmail_extra"><br><br><div class="gmail_quote"><div><div>On Mon, May 12, 2014 at 5:13 PM, David Bomba <span dir="ltr">&lt;<a href="mailto:turbo124@gmail.com" target="_blank">turbo124@gmail.com</a>&gt;</span> wrote:<br> </div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>Hi guys,<br> <br> We have ~10 OmniOS-powered ZFS storage arrays used to drive virtual machines under XenServer + VMware over an InfiniBand interconnect.<br> <br> Our usual recipe is to use either LSI HBAs or Areca cards in pass-through mode with internal SAS drives.<br> <br> This has worked flawlessly with OmniOS 6/8.<br> <br> Recently we deployed a slightly different configuration:<br> <br> HP DL380 G6<br> 64 GB RAM<br> X5650 proc<br> LSI 9208-e card<br> HP MDS 600 / SSA 70 external enclosure<br> 30 TOSHIBA-MK2001TRKB-1001-1.82TB SAS2 drives in mirrored configuration.<br> <br> Despite the following message in dmesg, the array appeared to be working as expected:<br> <br> scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340f@8/pci1000,30b0@0 (mpt_sas1):<br> May 13 04:01:07 s6 Log info 0x31140000 received for target 11.<br> <br> Despite this message we pushed into production, and whilst the performance of the array has been good, as soon as we perform high write IO, performance drops from 22k IOPS to 100 IOPS. This causes the target to disconnect from the hypervisors, and general mayhem ensues for the VMs.<br> <br> While performance is degraded, no other messages appear in dmesg.<br> <br> Where should we begin to debug this? Could this be a symptom of not enough RAM?
We have flashed the LSI cards to the latest firmware with no change in performance.<br> <br> Thanks in advance!<br></div></div> _______________________________________________<br> OmniOS-discuss mailing list<br> <a href="mailto:OmniOS-discuss@lists.omniti.com" target="_blank">OmniOS-discuss@lists.omniti.com</a><br> <a href="http://lists.omniti.com/mailman/listinfo/omnios-discuss" target="_blank">http://lists.omniti.com/mailman/listinfo/omnios-discuss</a><br> </blockquote></div><br></div> </blockquote></div><br></div> </div></div></blockquote></div><br></div>