<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 18.02.16 at 22:56, Richard
      Elling wrote:<br>
    </div>
    <blockquote
      cite="mid:2D7D9A5E-FFB1-4923-88A2-486FE66C3341@richardelling.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      comments below...
      <div class=""><br class="">
        <div>
          <blockquote type="cite" class="">
            <div class="">On Feb 18, 2016, at 12:57 PM, Schweiss, Chip
              <<a moz-do-not-send="true"
                href="mailto:chip@innovates.com" class="">chip@innovates.com</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div dir="ltr" style="font-family: Helvetica; font-size:
                12px; font-style: normal; font-variant: normal;
                font-weight: normal; letter-spacing: normal; orphans:
                auto; text-align: start; text-indent: 0px;
                text-transform: none; white-space: normal; widows: auto;
                word-spacing: 0px; -webkit-text-stroke-width: 0px;"
                class="">
                <div class="gmail_extra"><br
                    class="Apple-interchange-newline">
                  <br class="">
                  <div class="gmail_quote">On Thu, Feb 18, 2016 at 5:14
                    AM, Michael Rasmussen<span
                      class="Apple-converted-space"> </span><span
                      dir="ltr" class=""><<a moz-do-not-send="true"
                        href="mailto:mir@miras.org" target="_blank"
                        class="">mir@miras.org</a>></span><span
                      class="Apple-converted-space"> </span>wrote:<br
                      class="">
                    <blockquote class="gmail_quote" style="margin: 0px
                      0px 0px 0.8ex; border-left-width: 1px;
                      border-left-color: rgb(204, 204, 204);
                      border-left-style: solid; padding-left: 1ex;"><span
                        class="">On Thu, 18 Feb 2016 07:13:36 +0100<br
                          class="">
                        Stephan Budach <<a moz-do-not-send="true"
                          href="mailto:stephan.budach@JVM.DE" class="">stephan.budach@JVM.DE</a>>
                        wrote:<br class="">
                        <br class="">
                        ><br class="">
                        > So, when I issue a simple ls -l on the
                        folder of the vdisks, while the switchover is
                        happening, the command sometimes concludes in 18
                        to 20 seconds, but sometimes ls will just sit
                        there for minutes.<br class="">
                        ><br class="">
                      </span>This is a known limitation in NFS. NFS was
                      never intended to be<br class="">
                      clustered, so what you experience is that the NFS
                      process on the client side<br class="">
                      keeps kernel locks for the now-unavailable NFS
                      server, and any request<br class="">
                      to the process hangs waiting for those locks to be
                      resolved. This can<br class="">
                      be compared to hot-swapping a
                      drive in the pool<br class="">
                      without notifying the pool.<br class="">
                      <br class="">
                      The only way to resolve this is to forcefully
                      kill all NFS client processes<br class="">
                      and then restart the NFS client.<br class="">
                    </blockquote>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
          <div><br class="">
          </div>
          <div>ugh. No, something else is wrong. I've been running such
            clusters for almost 20 years;</div>
          <div>it isn't a problem with the NFS server code.</div>
          <br class="">
          <blockquote type="cite" class="">
            <div class="">
              <div dir="ltr" style="font-family: Helvetica; font-size:
                12px; font-style: normal; font-variant: normal;
                font-weight: normal; letter-spacing: normal; orphans:
                auto; text-align: start; text-indent: 0px;
                text-transform: none; white-space: normal; widows: auto;
                word-spacing: 0px; -webkit-text-stroke-width: 0px;"
                class="">
                <div class="gmail_extra">
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin: 0px
                      0px 0px 0.8ex; border-left-width: 1px;
                      border-left-color: rgb(204, 204, 204);
                      border-left-style: solid; padding-left: 1ex;"><br
                        class="">
                    </blockquote>
                    <div class=""><br class="">
                    </div>
                    <div class="">I've been running RSF-1 on OmniOS
                      since about r151008.  All my clients have always
                      been NFSv3 and NFSv4.  <span
                        class="Apple-converted-space"> </span><br
                        class="">
                      <br class="">
                      My memory is a bit fuzzy, but when I first started
                      testing RSF-1, OmniOS still had the Sun lock
                      manager, which was later replaced with the BSD
                      lock manager. That replacement has had many
                      difficulties.<br class="">
                      <br class="">
                    </div>
                    <div class="">I do remember that failovers when I
                      first started with RSF-1 never had these stalls. I
                      believe this was because the lock state was stored
                      in the pool, so the server taking over the pool
                      would inherit that state too. That state is now
                      lost when a pool is imported with the BSD lock
                      manager.<span class="Apple-converted-space"> </span><br
                        class="">
                      <br class="">
                    </div>
                    <div class="">When I did testing I would do both
                      full-speed reading and writing to the pool and
                      force failovers, both from the command line and by
                      killing power on the active server. Never did I
                      have a failover take more than about 30 seconds
                      for NFS to fully resume data flow.<span
                        class="Apple-converted-space"> </span><br
                        class="">
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
          <div><br class="">
          </div>
          <div>Clients will back off, but the retry algorithm is not
            universal, so we do expect to</div>
          <div>see different retry intervals for different
            clients. For example, the retries can</div>
          <div>exceed 30 seconds for Solaris clients after a minute or
            two (alas, I don't have the</div>
          <div>detailed data at my fingertips anymore :-( ). Hence we
            work hard to make sure failovers</div>
          <div>occur as fast as feasible.</div>
          <br class="">
          <blockquote type="cite" class="">
            <div class="">
              <div dir="ltr" style="font-family: Helvetica; font-size:
                12px; font-style: normal; font-variant: normal;
                font-weight: normal; letter-spacing: normal; orphans:
                auto; text-align: start; text-indent: 0px;
                text-transform: none; white-space: normal; widows: auto;
                word-spacing: 0px; -webkit-text-stroke-width: 0px;"
                class="">
                <div class="gmail_extra">
                  <div class="gmail_quote">
                    <div class=""><br class="">
                    </div>
                    <div class="">Others who know more about the BSD
                      lock manager vs the old Sun lock manager may be
                      able to tell us more.  I'd also be curious if
                      Nexenta has addressed this.<br class="">
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
          <div><br class="">
          </div>
          <div>The lock manager itself is an issue, and though we're
            currently testing the BSD lock</div>
          <div>manager in anger, we haven't seen this behaviour.</div>
          <div><br class="">
          </div>
          <div>Related to the lock manager is name lookup. If you use
            name services, you add a latency</div>
          <div>dependency on name lookups to every failover, which is
            why we often disable DNS and other</div>
          <div>network name services on high-availability services as a
            best practice.</div>
          <div> -- richard</div>
        </div>
      </div>
    </blockquote>
    <br>
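    To illustrate the point about client retry intervals above (the
    host name, export path, and option values here are made-up
    examples, not taken from this thread): on a Linux client, the
    retry cadence of a hard NFS mount is governed by the timeo and
    retrans mount options, so how long a failover stall appears to
    last depends on where the client is in its back-off cycle.<br>
    <pre>
# Example only -- timeo is in tenths of a second; "hard" mounts retry
# indefinitely, backing off between attempts until the server returns.
mount -t nfs -o vers=3,hard,timeo=50,retrans=3 nfs-vip:/pool/vdisks /mnt/vdisks
    </pre>
    <br>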
    This is why I always put each host name involved in my cluster
    setups into /etc/hosts on each node.<br>
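    <br>
    For example (these node names and addresses are made up for
    illustration), each node's /etc/hosts can carry static entries for
    every cluster host and the floating service address:<br>
    <pre>
# /etc/hosts excerpt -- static entries so cluster traffic never waits on DNS
10.0.0.11   node-a    # first cluster head (example address)
10.0.0.12   node-b    # second cluster head (example address)
10.0.0.20   nfs-vip   # floating NFS service address
    </pre>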
    <br>
    Cheers,<br>
    Stephan<br>
  </body>
</html>