[OmniOS-discuss] Comstar Disconnects under high load.

David Bomba turbo124 at gmail.com
Mon May 12 23:41:45 UTC 2014


Hi Narayan,

We do not use iSER.

We use SRP for VMWare, and IPoIB for XenServer.

In our case, our VMs operate as expected. However, it is when copying data
between Storage Repos that we see the disconnects, irrespective of SCSI
transport.
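
As a first step, watching per-disk service times while reproducing one of
these Storage Repo copies should show whether a single drive (for example
the "target 11" in the mpt_sas message quoted below) is stalling the whole
pool. A rough sketch of what to watch, using the stock illumos tools
("tank" stands in for the actual pool name):

  # per-device latency and queueing, 1-second samples
  iostat -xn 1

  # per-vdev view of the same pool while the copy runs
  zpool iostat -v tank 1

  # accumulated per-device error counters and the FMA error log
  iostat -En
  fmdump -eV | tail -50

A disk whose asvc_t or %b sits far above its mirror partner during the
slowdown would point at that drive or its enclosure slot rather than at
COMSTAR itself.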


On 13 May 2014 09:32, Narayan Desai <narayan.desai at gmail.com> wrote:

> Are you perchance using iSCSI/iSER? We've seen similar timeouts that don't
> seem to correspond to hardware issues. From what we can tell, something
> causes iSCSI heartbeats not to be processed, so the client eventually times
> out the block device and tries to reinitialize it.
>
> In our case, we're running VMs using KVM on Linux hosts. The guest detects
> block device death, and won't recover without a reboot.
>
> FWIW, switching to iSCSI directly over IPoIB works great for identical
> workloads. We've seen this with 151006 and I think 151008. We've not yet
> tried it with 151010. This smells like some problem in COMSTAR's iSCSI/iSER
> driver.
>  -nld
>
>
> On Mon, May 12, 2014 at 5:13 PM, David Bomba <turbo124 at gmail.com> wrote:
>
>> Hi guys,
>>
>> We have ~10 OmniOS-powered ZFS storage arrays used to drive virtual
>> machines under XenServer and VMware over an InfiniBand interconnect.
>>
>> Our usual recipe is to use either LSI HBAs or Areca cards in pass-through
>> mode with internal SAS drives.
>>
>> This has worked flawlessly with OmniOS r151006/r151008.
>>
>> Recently we deployed a slightly different configuration:
>>
>> HP DL380 G6
>> 64GB ram
>> X5650 proc
>> LSI 9208-e card
>> HP MDS 600 / SSA 70 external enclosure
>> 30 TOSHIBA MK2001TRKB-1001 (1.82 TB) SAS2 drives in a mirrored configuration.
>>
>> Despite the following message in dmesg, the array appeared to be working
>> as expected:
>>
>> scsi: [ID 365881 kern.info] /pci@0,0/pci8086,340f@8/pci1000,30b0@0 (mpt_sas1):
>> May 13 04:01:07 s6      Log info 0x31140000 received for target 11.
>>
>> Despite this message we pushed into production, and whilst the performance
>> of the array has been good, as soon as we perform heavy write IO the
>> performance drops from 22k IOPS down to 100 IOPS. This causes the target to
>> disconnect from the hypervisors, and general mayhem ensues for the VMs.
>>
>> During the period when performance degrades, no other messages appear in
>> dmesg.
>>
>> Where should we begin to debug this? Could this be a symptom of not
>> enough RAM? We have flashed the LSI cards to the latest firmware with no
>> change in performance.
>>
>> Thanks in advance!
>>
>
>
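
On Narayan's point above about iSCSI heartbeats not being processed: on a
Linux initiator (open-iscsi, which also drives iSER on his KVM hosts) the
timeouts that decide when the block device gets failed live in
/etc/iscsi/iscsid.conf. This is only a sketch of where to look; the
parameter names are the standard open-iscsi ones and the values shown are
the commonly shipped defaults, not a recommendation:

  # how often the initiator sends a NOP-Out ping, and how long it waits
  # for the NOP-In reply before failing the connection
  node.conn[0].timeo.noop_out_interval = 5
  node.conn[0].timeo.noop_out_timeout = 5

  # how long a failed session may sit in recovery before the SCSI layer
  # fails outstanding I/O up to the block device
  node.session.timeo.replacement_timeout = 120

  # show the values actually in effect for an existing node record
  iscsiadm -m node -T <target-iqn> -o show | grep timeo

If the target really does stop answering NOPs under load, raising these
only delays the guest-visible failure; it does not fix the stall on the
COMSTAR side.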
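
On the RAM question above: 64GB should be ample for a mirrored pool of this
size unless dedup is in play, but it is cheap to check memory pressure and
ARC behaviour while the slowdown is actually happening. Another rough
sketch, illumos side, with "tank" again standing in for the real pool name:

  # kernel vs. ARC vs. free memory breakdown
  echo ::memstat | mdb -k

  # ARC size, target size and hit/miss counters
  kstat -n arcstats | egrep 'size|hits|misses'

  # a sustained non-zero scan rate (sr column) means real memory pressure
  vmstat 1

  # confirm dedup is off; a large DDT with too little RAM can produce
  # exactly this kind of write cliff
  zpool list tank
  zfs get -r dedup tank | head

If memory looks healthy, the next suspects are the path to the external
enclosure (SAS cabling and expanders in the MDS 600) and the mpt_sas
"Log info 0x31140000" events, which are worth correlating in time with the
IOPS drop.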

