[OmniOS-discuss] getgrnam_r hangs if buffer too small

Nathan Huff nrhuff at umn.edu
Fri Jan 23 21:23:32 UTC 2015


Patching the winbind nss module to return NSS_UNAVAIL in the
buffer-too-small case fixed the issue.  It turns out there was a year-old
open bug report about this.  I submitted my patch, so hopefully this can
get fixed upstream as well.
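
To illustrate why changing the status code ends the hang, here is a toy
model in C.  It is not the illumos or Samba code: every name in it
(toy_backend, toy_search, TOY_ENTRY_SIZE, the max_tries cap) is
hypothetical, and the real nss_search retries without such a cap.  It only
shows that retrying on TRYAGAIN with an unchanged buffer can never
succeed, whereas UNAVAIL ends the search on the first call.

```c
#include <stddef.h>

/* Simplified stand-ins for the NSS status codes named in this thread. */
typedef enum { TOY_SUCCESS, TOY_UNAVAIL, TOY_TRYAGAIN } toy_status_t;

/* Pretend the group entry needs this many bytes (made-up number). */
#define TOY_ENTRY_SIZE 4096

/* Counts backend calls so the retry behaviour can be observed. */
static int toy_backend_calls;

/*
 * Hypothetical backend: succeeds if the buffer is large enough, otherwise
 * reports whatever status the module is configured to use for that case.
 */
static toy_status_t
toy_backend(size_t buflen, toy_status_t small_buf_status)
{
	toy_backend_calls++;
	return (buflen >= TOY_ENTRY_SIZE ? TOY_SUCCESS : small_buf_status);
}

/*
 * Toy search loop: retries on TRYAGAIN, looking only at the status (not at
 * errno) and never resizing the buffer.  max_tries is an artificial cap so
 * that the example terminates.
 */
static toy_status_t
toy_search(size_t buflen, toy_status_t small_buf_status, int max_tries)
{
	toy_status_t st = TOY_TRYAGAIN;
	int i;

	toy_backend_calls = 0;
	for (i = 0; i < max_tries; i++) {
		st = toy_backend(buflen, small_buf_status);
		if (st != TOY_TRYAGAIN)
			break;
	}
	return (st);
}
```

With a 1024-byte buffer, toy_search(1024, TOY_TRYAGAIN, 10) uses up all ten
tries, while toy_search(1024, TOY_UNAVAIL, 10) stops after one call.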

On 2015-01-22 4:06 PM, Nathan Huff wrote:
> I should have also mentioned that this is using the samba winbind nss
> module.  Looking at the code, I think the
> problem is that when the buffer is too small the winbind nss module sets
> errno to ERANGE and then returns NSS_TRYAGAIN.
>
> In illumos, nss_common.c has a function retry_test that looks at
> the return value but not at errno.  This causes the nss_search
> function to loop endlessly, since the buffer never gets resized.  It looks
> like the nss modules in illumos return UNAVAIL instead of TRYAGAIN for
> cases where the buffer isn't big enough.  I will probably try patching
> the Samba sources and see if that fixes the issue.  I couldn't find any
> documentation that says which is correct in the general case.  The
> only thing I could find was for glibc, which wants TRYAGAIN in this case.
>
> I don't know if there is any use for it, but the pstack is below.
> 2971:   ./a.out
>   fef04ae5 nanosleep (8047c28, 8047c20)
>   feef3244 sleep    (5, 2d7, fee104c8, 8047cb8, fee84523, fefa0b28) + 31
>   fee9b385 nss_search (fef74520, fee83eb0, 4, 8047cb8, 0, 1) + 1a5
>   fee845c0 getgrnam_r (8050f8b, 8047d10, 80611b0, 400, 80611b0, 80611c0) + 9d
>   08050e89 main     (1, 8047d60, 8047d68, 8050bf2, 8050f60, 0) + 59
>   08050c53 _start   (1, 8047e28, 0, 8047e30, 8047e44, 8047e58) + 83
>
> I have a core file, but I think I understand what is going on enough so
> it probably isn't necessary.
>
> On 2015-01-22 2:58 PM, Dan McDonald wrote:
>>
>>> On Jan 22, 2015, at 3:11 PM, Nathan Huff <nrhuff at umn.edu> wrote:
>>>
>>> I am running 151006 and we have some very large groups.  If the
>>> buffer passed to getgrnam_r is too small to fit the group entry it
>>> seems to just hang.  I think it is supposed to return NULL and set
>>> errno to ERANGE.  If the buffer is big enough it returns the
>>> information fine.
>>
>> When your process hangs (assuming it's easily reproducible) could you
>> utter:
>>
>>     pstack <PID-of-hung-process>
>>
>> and share the stack with the list, please?
>>
>> And for bonus points, take a core dump of it as well:
>>
>>     gcore <PID-of-hung-process>
>>
>> I *suspect* this affects all OmniOS versions.  The code in question is
>> quite old, with last-changes predating illumos itself.
>>
>> Thanks!
>> Dan
>>
>
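
For callers with very large groups, the usual way to cope with a
too-small buffer is to grow it and retry when ERANGE comes back.  Below is
a minimal sketch, assuming the POSIX five-argument form of getgrnam_r
(on illumos that form may need to be selected at compile time, for example
with _POSIX_PTHREAD_SEMANTICS) and an nss module that reports the
condition instead of hanging.  The helper name lookup_group, the starting
size, and the MAX_GROUP_BUF cap are illustrative choices, not part of any
library.

```c
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Arbitrary upper bound so a misbehaving backend cannot grow us forever. */
#define MAX_GROUP_BUF (1024 * 1024)

/*
 * Look up a group by name, doubling the scratch buffer on ERANGE.
 * Returns 0 on success, ENOENT if the group was not found, or another
 * errno value on failure.  On success *bufp holds the memory that the
 * pointers inside *grp refer to; the caller frees it.  On failure *bufp
 * is NULL.
 */
static int
lookup_group(const char *name, struct group *grp, char **bufp)
{
	size_t len = 1024;
	char *buf = NULL;

	*bufp = NULL;
	for (;;) {
		struct group *result = NULL;
		char *nbuf = realloc(buf, len);
		int rc;

		if (nbuf == NULL) {
			free(buf);
			return (ENOMEM);
		}
		buf = nbuf;

		rc = getgrnam_r(name, grp, buf, len, &result);
		if (rc == 0 && result != NULL) {
			*bufp = buf;
			return (0);
		}
		if (rc != ERANGE) {
			/* rc == 0 with a NULL result means "no such group". */
			free(buf);
			return (rc == 0 ? ENOENT : rc);
		}
		if (len >= MAX_GROUP_BUF) {
			free(buf);
			return (ERANGE);
		}
		len *= 2;	/* buffer too small: grow and try again */
	}
}
```

A caller would declare a struct group and a char pointer, call
lookup_group(name, &grp, &buf), read grp.gr_mem on a zero return, and then
free(buf).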

-- 
Nathan Huff
System Administrator
Academic Health Center Information Systems
University of Minnesota
612-626-9136

