[OmniOS-discuss] kernel panic "kernel heap corruption detected" when creating zero eager disks

Thu Mar 26 15:47:47 UTC 2015

On Thu, Mar 26, 2015 at 11:37 AM, Dan McDonald <danmcd at omniti.com> wrote:

> I mentioned earlier:
>
> > I know Nexenta's done a LOT of improvements on this in illumos-nexenta.
> > It might be time to upstream some of what they've done.  I know it's a
> > moving target (COMSTAR is not a well-written subsystem), so it may take
> > some unravelling.
>
> I was looking at Nexenta's changes.  They HAVE done a lot of work in these
> areas, and at some point someone needs to upstream them.  Nexenta isn't
> under an obligation to upstream, just to publish, which they have.
>
> I found one particular bug that MAY have manifested as your problem.
> Because 014's coming up, I can't get to it at the moment. If you've built
> kernel modules before, I can tell you where the fix should go and
> approximately what the fix is.  You'd have to test it, however.
>
> Sorry I can't be of more immediate assistance,
> Dan
>
>
>
Hi Dan (just saw your latest reply as I was writing this),

Thanks for all the time you've put into this. It certainly sounds like some
of the Nexenta COMSTAR work might be useful. Is R151014 released yet? It
looks like all the documentation is there, but it mentions Apr 3, 2015. Is
there any reason to believe the panic might be fixed in that release if it
has few or no COMSTAR changes? (It sounds like it isn't, now that I've read
your latest.)

It looks like I'll have to make do with lazy zeroed or thin provisioned
disks of 10TB+ for my Veeam tests, provided that doesn't cause another
kernel panic. I'm hesitant to create these now during business hours (and I
shouldn't be; these are normal VM provisioning tasks on available
storage!). In your estimation, would eager zeroed vs. lazy zeroed vs. thin
provisioned vmdks make any difference with that WRITE_SAME code? The
majority of my VMs use eager zeroed disks, but again, never at this size.

If there is anything you need me to test (in R151014, or beyond?), the
panic is easy enough for me to reproduce. I timed myself last night: it
took about 2 hours to gracefully shut down or save all the VMs, trigger the
crash dump, and get the infrastructure back up. I should probably try it on
Hyper-V as well when I get time, but I believe most of those disks are
Dynamic (thin) rather than Fixed (eager zeroed), and I don't believe Hyper-V
has an equivalent to lazy zeroed. The Hyper-V environment only runs our test
VMs, after all, which aren't as performance sensitive.

If you can tell me where the fix should go, I can probably try it out, even
though I haven't built any kernel modules before (though I'm sure there are
enough resources for me to draw on). I'll start by making myself a build
server on a VM. Is this
http://wiki.illumos.org/display/illumos/How+To+Build+illumos still current?