[OmniOS-discuss] LX: real ksh93 broken

Ludovic Orban lorban at bitronix.be
Wed May 10 19:44:37 UTC 2017


Okay, I found what causes ksh to misbehave. It's in sh_init(), when
shgd->lim.child_max is initialized with the results of
getconf("CHILD_MAX"), see:
https://github.com/att/ast/blob/master/src/cmd/ksh93/sh/init.c#L1289

I commented out that line, hardcoded shgd->lim.child_max to 128, rebuilt,
and voilà: ksh works as it should.

Now I have to dig into that getconf() call to figure out what the
returned value is and where it comes from. Sounds trivial, but my C is
*very* rusty, the asm gcc generates looks nothing like what the JVM's JIT
generates (which gives me the wrong reflexes, as I'm used to the latter), and
I'm not very familiar with mdb.

Oh well, that turned into a nice debugging re-training session, which I very
much needed. It reminds me of the good old days at my first job, when I was
porting Linux apps to Solaris.

Thank you for maintaining such a well-designed and pleasant to use OS!


On Wed, May 10, 2017 at 3:59 PM, Dan McDonald <danmcd at omniti.com> wrote:

> Wow, thank you for the further deep-diving.
>
> > On May 10, 2017, at 5:21 AM, Ludovic Orban <lorban at bitronix.be> wrote:
> >
> > Looking at ksh's sources, my understanding is that job_post is stuck in
> > that else clause:
> >        else
> >        {
> >               /* create a new job */
> >               while((pw->p_job = job_alloc()) < 0)
> >                      job_wait((pid_t)1);
> >               pw->p_nxtjob = job.pwlist;
> >               pw->p_nxtproc = 0;
> >        }
> >
> > Digging into the sources and stepping through the instructions of
> > job_alloc and job_byjid, it looks like ksh cannot allocate a job id
> > because it believes they're all reserved. But so far, all this code works
> > purely on internal structures of ksh, so an LX bug would have no impact.
> >
> > I'll continue looking into this as time permits and I'll post an update
> > if I find anything worth mentioning.
> >
>
> Be careful of narrowing your focus too far.  I see some things worth
> considering:
>
> 1.) Is the "if" you're not showing me dependent on something in global
> state that may have been mis-initialized by an LX emulation bug?
>
> 2.) Same question as #1, but applied to job_alloc() and job_wait().
>
> I'm guessing LX in OmniOS is failing because I mismerged or plain forgot
> something, given that Nahum says he can run ksh93 on SmartOS just fine.
>
>
> Please make sure you're looking at the bigger picture, but THANK YOU for
> the further investigation.
>
> Dan
>
>