<div dir="ltr"><div class="gmail-head" style="font-weight:bold;margin-top:1em;color:black;font-family:monospace;font-size:13.3333px;white-space:pre">This is our patch. It was applied 3 years ago so the line number could be different for the latest version of the file.</div><div class="gmail-head" style="font-weight:bold;margin-top:1em;color:black;font-family:monospace;font-size:13.3333px;white-space:pre">diff --git a/usr/src/uts/common/rpc/clnt_cots.c b/usr/src/uts/common/rpc/clnt_cots.c<br>index 4466e93..0a0951d 100644<br>--- a/<a href="http://bsegit/cgi-bin/cgit/smartos-repo/illumos-joyent.git/.git/tree/usr/src/uts/common/rpc/clnt_cots.c?h=tmw-3.5p1&id=01ab13000b73a72f417e287ae71a7e175b2da20a" style="color:blue;text-decoration-line:none">usr/src/uts/common/rpc/clnt_cots.c</a><br>+++ b/<a href="http://bsegit/cgi-bin/cgit/smartos-repo/illumos-joyent.git/.git/tree/usr/src/uts/common/rpc/clnt_cots.c?h=tmw-3.5p1&id=bab3bdcd1929211af2a220ea41cf946fafc521ed" style="color:blue;text-decoration-line:none">usr/src/uts/common/rpc/clnt_cots.c</a></div><div class="gmail-hunk" style="color:rgb(0,0,153);font-family:monospace;font-size:13.3333px;white-space:pre">@@ -2285,6 +2285,7 @@ start_retry_loop:</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">            if (rpcerr->re_status == RPC_SUCCESS)</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">                     rpcerr->re_status = RPC_XPRTFAILED;</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">               cm_entry->x_connected = FALSE;</div><div class="gmail-add" style="color:green;font-family:monospace;font-size:13.3333px;white-space:pre">+           cm_entry->x_dead = TRUE; </div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">         } else</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">               cm_entry->x_connected = connected;</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre"> </div><div class="gmail-hunk" style="color:rgb(0,0,153);font-family:monospace;font-size:13.3333px;white-space:pre">@@ -2403,6 +2404,7 @@ connmgr_wrapconnect(</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">                       if (rpcerr->re_status == RPC_SUCCESS)</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">                             rpcerr->re_status = RPC_XPRTFAILED;</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">                       cm_entry->x_connected = FALSE;</div><div class="gmail-add" style="color:green;font-family:monospace;font-size:13.3333px;white-space:pre">+                   cm_entry->x_dead = TRUE;</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">          } else</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre">                       cm_entry->x_connected = connected;</div><div class="gmail-ctx" style="color:rgb(51,51,51);font-family:monospace;font-size:13.3333px;white-space:pre"> </div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jan 8, 2018 at 11:21 AM, Dan McDonald <span dir="ltr"><<a href="mailto:danmcd@joyent.com" target="_blank">danmcd@joyent.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
<br>
> On Jan 7, 2018, at 3:15 PM, Youzhong Yang <<a href="mailto:youzhong@gmail.com">youzhong@gmail.com</a>> wrote:<br>
><br>
> Not sure if it's the same issue we reported 3 years ago. We applied our patch and haven't seen this issue ever since.<br>
><br>
> <a href="https://illumos.topicbox.com/groups/developer/Te5808458a5a5a14f-M74735db9aeccaa5d8c3a70a4" rel="noreferrer" target="_blank">https://illumos.topicbox.com/<wbr>groups/developer/<wbr>Te5808458a5a5a14f-<wbr>M74735db9aeccaa5d8c3a70a4</a><br>
<br>
</span>To quote that e-mail:<br>
<br>
> Hi Marcel:<br>
>       It looks like we're getting an "early disconnect". This is what is leading to the accumulation of bound reserved ports. The scenario for reproduction is a follows:<br>
><br>
>       1. Linux  DEBIAN7.4 client acquires and releases lock on file on server (via NFS).<br>
>       2. reboot Linux client (but do so<br>
> _before_<br>
>  MIR_CLNT_IDLE_TIMEOUT interval fires on server side).<br>
>       3. when Linux client comes back up, repeat step 1.<br>
><br>
>       At this point, a cm_entry with only the ORDREL flag set in x_state_word will remain in the cm_entry linked list list (cm_hd).<br>
>       It appears that without at least a DEAD flag set in x_state_word, this cm_entry will remain bound to a the port...and will never be garbage collected.<br>
><br>
>       To experiment, we added,<br>
>       "cm_entry->x_dead = TRUE;"<br>
>       at line 2272 and 2390 to here:<br>
> <a href="https://github.com/joyent/illumos-joyent/blob/master/usr/src/uts/common/rpc/clnt_cots.c" rel="noreferrer" target="_blank">https://github.com/joyent/<wbr>illumos-joyent/blob/master/<wbr>usr/src/uts/common/rpc/clnt_<wbr>cots.c</a><br>
><br>
>       Testing with the above reproduction scenario, we are taking the path of line 2272 -- and that with the DEAD flag set in x_state_word, these "zombie" cm_entries are being now cleaned up, and we're no longer accumulating/leaking reserved ports.<br>
><br>
>       Is there more to it? This seems too simple of a fix. Are there unintended consequences we should be looking/testing for? Does this seem like it might be #1616 as well?<br>
>       Thoughts?<br>
>       Thanks!<br>
>       -Ken & Youzhong<br>
<br>
<br>
And here's the patch in diff form for easier consumption:<br>
<br>
diff --git a/usr/src/uts/common/rpc/clnt_<wbr>cots.c b/usr/src/uts/common/rpc/clnt_<wbr>cots.c<br>
index 2e64ab0..f9b78ff 100644<br>
--- a/usr/src/uts/common/rpc/clnt_<wbr>cots.c<br>
+++ b/usr/src/uts/common/rpc/clnt_<wbr>cots.c<br>
@@ -2269,6 +2269,7 @@ start_retry_loop:<br>
                cm_entry->x_ordrel = FALSE;<br>
<br>
        cm_entry->x_tidu_size = tidu_size;<br>
+       cm_entry->x_dead = TRUE;<br>
<br>
        if (cm_entry->x_early_disc) {<br>
                /*<br>
@@ -2387,6 +2388,7 @@ connmgr_wrapconnect(<br>
<br>
                mutex_enter(&connmgr_lock);<br>
<br>
+               cm_entry->x_dead = TRUE;<br>
<br>
                if (cm_entry->x_early_disc) {<br>
                        /*<br>
<br>
Went back and checked my notes - I was traveling when that thread was going on, so I likely missed it altogether in the hustle/bustle of that.<br>
<br>
It seems at first glance you're being too aggressive in setting X_DEAD (note that this code gives you BOTH ways to set the flag, via a C bit fields OR the macro form... makes for very difficult reading, IMHO), but it my concern was valid you'd likely see far more outright failures.<br>
<br>
Maybe that patch is all we need?<br>
<br>
Dan<br>
<br>
<br>
------------------------------<wbr>------------<br>
illumos-zfs<br>
Archives: <a href="https://illumos.topicbox.com/groups/zfs/discussions/T8f10bde64dc0d5c5-M9dee7d96157a6ad5c11c472a" rel="noreferrer" target="_blank">https://illumos.topicbox.com/<wbr>groups/zfs/discussions/<wbr>T8f10bde64dc0d5c5-<wbr>M9dee7d96157a6ad5c11c472a</a><br>
<div class="HOEnZb"><div class="h5">Powered by Topicbox: <a href="https://topicbox.com" rel="noreferrer" target="_blank">https://topicbox.com</a><br>
</div></div></blockquote></div><br></div>