On Jan 26, 2015, at 5:16 PM, W Verb <wverb73@gmail.com> wrote:

Hello All,

I am mildly confused by something iostat does when displaying statistics for a
zpool. Before I begin rooting through the iostat source, does anyone have an
idea of why I am seeing high "wait" and "wsvc_t" values for "ppool" when my
devices apparently are not busy? I would have assumed that the stats for the
pool would be the sum of the stats for the vdevs....


Welcome to queuing theory! ;-)

First, iostat knows nothing about the devices being measured. It is really
just a processor for kstats of type KSTAT_TYPE_IO (see the kstat(3KSTAT) man
page for discussion). For that type, you get a two-queue set. In many cases
two queues is a fine model, but when there is only one interesting queue,
developers sometimes choose to put the less interesting information in the
"wait" queue.

Second, it is the responsibility of the developer to define the queues. In the
case of pools, the queues are defined as:

	wait = vdev_queue_io_add() until vdev_queue_io_remove()
	run  = vdev_queue_pending_add() until vdev_queue_pending_remove()

The run queue is closer to the actual measured I/O to the vdev (the juicy
performance bits). The wait queue is closer to the transaction engine and
includes time for aggregation. Thus we expect the wait queue to be higher,
especially for async workloads. But since I/Os can and do get aggregated
before being sent to the vdev, the wait queue is not a very useful measure of
overall performance. In other words, optimizing it away could actually hurt
performance.

In general, worry about the run queues and don't worry so much about the wait
queues.
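If you want to see the raw counters iostat is chewing on, the pool's io kstat
should be visible with kstat(1M). A minimal sketch, assuming the pool kstat is
published under the zfs module with the pool's name (zfs:0:ppool here; adjust
if your kernel names it differently):

# two parseable snapshots of the pool's KSTAT_TYPE_IO kstat, 5 seconds apart
kstat -p zfs:0:ppool 5 2

# iostat's columns fall out of the deltas between the two snapshots
# (wlentime/rlentime/wtime/rtime are nanosecond accumulators, snaptime is
# printed in seconds, and ops = delta(reads) + delta(writes)):
#
#   wait   = delta(wlentime) / (delta(snaptime) * 1e9)    avg wait-queue length
#   actv   = delta(rlentime) / (delta(snaptime) * 1e9)    avg run-queue length
#   wsvc_t = delta(wlentime) / ops / 1e6                  ms per I/O in the wait queue
#   asvc_t = delta(rlentime) / ops / 1e6                  ms per I/O in the run queue
#   %w     = delta(wtime) / (delta(snaptime) * 1e9) * 100
#   %b     = delta(rtime) / (delta(snaptime) * 1e9) * 100

Your numbers below are self-consistent, by the way: 837.4 transactions waiting
is (within rounding) 6,056 ops/sec x 0.1383 sec, which is just Little's law at
work.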
NB, iostat calls "run" queues "active" queues. You say Tomato, I say 'mater.
 -- richard


                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait actv wsvc_t asvc_t  %w  %b device
   10.0 9183.0   40.5 344942.0   0.0  1.8    0.0    0.2   0 178 c4
    1.0  187.0    4.0  19684.0   0.0  0.1    0.0    0.5   0   8 c4t5000C5006A597B93d0
    2.0  199.0   12.0  20908.0   0.0  0.1    0.0    0.6   0  12 c4t5000C500653DE049d0
    2.0  197.0    8.0  20788.0   0.0  0.2    0.0    0.8   0  15 c4t5000C5003607D87Bd0
    0.0  202.0    0.0  20908.0   0.0  0.1    0.0    0.6   0  11 c4t5000C5006A5903A2d0
    0.0  189.0    0.0  19684.0   0.0  0.1    0.0    0.5   0  10 c4t5000C500653DEE58d0
    5.0  957.0   16.5   1966.5   0.0  0.1    0.0    0.1   0   7 c4t50026B723A07AC78d0
    0.0  201.0    0.0  20787.9   0.0  0.1    0.0    0.7   0  14 c4t5000C5003604ED37d0
    0.0    0.0    0.0      0.0   0.0  0.0    0.0    0.0   0   0 c4t5000C500653E447Ad0
    0.0 3525.0    0.0 110107.7   0.0  0.5    0.0    0.2   0  51 c4t500253887000690Dd0
    0.0 3526.0    0.0 110107.7   0.0  0.5    0.0    0.1   1  50 c4t5002538870006917d0
   10.0 6046.0   40.5 344941.5 837.4  1.9  138.3    0.3  23  67 ppool

For those following the VAAI thread, this is the system I will be using as my
testbed.

Here is the structure of ppool (taken at a different time than above):

root@sanbox:/root# zpool iostat -v ppool
                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
ppool                       191G  7.97T     23    637   140K  15.0M
  mirror                   63.5G  2.66T      7    133  46.3K   840K
    c4t5000C5006A597B93d0      -      -      1     13  24.3K   844K
    c4t5000C500653DEE58d0      -      -      1     13  24.1K   844K
  mirror                   63.6G  2.66T      7    133  46.5K   839K
    c4t5000C5006A5903A2d0      -      -      1     13  24.0K   844K
    c4t5000C500653DE049d0      -      -      1     13  24.6K   844K
  mirror                   63.5G  2.66T      7    133  46.8K   839K
    c4t5000C5003607D87Bd0      -      -      1     13  24.5K   843K
    c4t5000C5003604ED37d0      -      -      1     13  24.4K   843K
logs                           -      -      -      -      -      -
  mirror                    301M   222G      0    236      0  12.5M
    c4t5002538870006917d0      -      -      0    236      5  12.5M
    c4t500253887000690Dd0      -      -      0    236      5  12.5M
cache                          -      -      -      -      -      -
  c4t50026B723A07AC78d0    62.3G  11.4G     19    113  83.0K  1.07M
-------------------------  -----  -----  -----  -----  -----  -----
root@sanbox:/root# zfs get all ppool
NAME   PROPERTY              VALUE                  SOURCE
ppool  type                  filesystem             -
ppool  creation              Sat Jan 24 18:37 2015  -
ppool  used                  5.16T                  -
ppool  available             2.74T                  -
ppool  referenced            96K                    -
ppool  compressratio         1.51x                  -
ppool  mounted               yes                    -
ppool  quota                 none                   default
ppool  reservation           none                   default
ppool  recordsize            128K                   default
ppool  mountpoint            /ppool                 default
ppool  sharenfs              off                    default
ppool  checksum              on                     default
ppool  compression           lz4                    local
ppool  atime                 on                     default
ppool  devices               on                     default
ppool  exec                  on                     default
ppool  setuid                on                     default
ppool  readonly              off                    default
ppool  zoned                 off                    default
ppool  snapdir               hidden                 default
ppool  aclmode               discard                default
ppool  aclinherit            restricted             default
ppool  canmount              on                     default
ppool  xattr                 on                     default
ppool  copies                1                      default
ppool  version               5                      -
ppool  utf8only              off                    -
ppool  normalization         none                   -
ppool  casesensitivity       sensitive              -
ppool  vscan                 off                    default
ppool  nbmand                off                    default
ppool  sharesmb              off                    default
ppool  refquota              none                   default
ppool  refreservation        none                   default
ppool  primarycache          all                    default
ppool  secondarycache        all                    default
ppool  usedbysnapshots       0                      -
ppool  usedbydataset         96K                    -
ppool  usedbychildren        5.16T                  -
ppool  usedbyrefreservation  0                      -
ppool  logbias               latency                default
ppool  dedup                 off                    default
ppool  mlslabel              none                   default
ppool  sync                  standard               local
ppool  refcompressratio      1.00x                  -
ppool  written               96K                    -
ppool  logicalused           445G                   -
ppool  logicalreferenced     9.50K                  -
ppool  filesystem_limit      none                   default
ppool  snapshot_limit        none                   default
ppool  filesystem_count      none                   default
ppool  snapshot_count        none                   default
ppool  redundant_metadata    all                    default

Currently, ppool contains a single 5TB zvol that I am hosting as an iSCSI
block device. At the vdev level, I have ensured that the ashift is 12 for all
devices, all physical devices are 4k-native SATA, and the cache/log SSDs are
also set for 4k. The block sizes are manually set in sd.conf and confirmed
with "echo ::sd_state | mdb -k | egrep '(^un|_blocksize)'". The zvol blocksize
is 4k, and the iSCSI block transfer size is 512B (not that it matters).
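For reference, the sd.conf override is of this general form; the inquiry
string below is only illustrative (the vendor ID is space-padded to eight
characters and the product string has to match what the drive actually
reports), and sd only rereads the file after a reboot or "update_drv -vf sd":

# /kernel/drv/sd.conf -- advertise a 4K physical block size for the data drives
sd-config-list =
    "ATA     ST3000DM001-1CH1", "physical-block-size:4096";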
All drives contain a single Solaris2 partition with an EFI label, and are
properly aligned:

format> verify

Volume name = <        >
ascii name  = <ATA-ST3000DM001-1CH1-CC27-2.73TB>
bytes/sector    =  512
sectors = 5860533167
accessible sectors = 5860533134
Part      Tag    Flag     First Sector          Size          Last Sector
  0        usr    wm               256         2.73TB           5860516750
  1 unassigned    wm                 0            0                0
  2 unassigned    wm                 0            0                0
  3 unassigned    wm                 0            0                0
  4 unassigned    wm                 0            0                0
  5 unassigned    wm                 0            0                0
  6 unassigned    wm                 0            0                0
  8   reserved    wm        5860516751         8.00MB           5860533134

I scrubbed the pool last night, which completed without error. From
"zdb ppool", I have extracted (with minor formatting):

                             capacity  operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
ppool                      339G 7.82T 26.6K     0  175M     0     0     0     5
  mirror                   113G 2.61T 8.87K     0 58.5M     0     0     0     2
    /dev/dsk/c4t5000C5006A597B93d0s0  3.15K     0 48.8M     0     0     0     2
    /dev/dsk/c4t5000C500653DEE58d0s0  3.10K     0 49.0M     0     0     0     2

  mirror                   113G 2.61T 8.86K     0 58.5M     0     0     0     8
    /dev/dsk/c4t5000C5006A5903A2d0s0  3.12K     0 48.7M     0     0     0     8
    /dev/dsk/c4t5000C500653DE049d0s0  3.08K     0 48.9M     0     0     0     8

  mirror                   113G 2.61T 8.86K     0 58.5M     0     0     0    10
    /dev/dsk/c4t5000C5003607D87Bd0s0  2.48K     0 48.8M     0     0     0    10
    /dev/dsk/c4t5000C5003604ED37d0s0  2.47K     0 48.9M     0     0     0    10

  log mirror              44.0K  222G     0     0    37     0     0     0     0
    /dev/dsk/c4t5002538870006917d0s0      0     0   290     0     0     0     0
    /dev/dsk/c4t500253887000690Dd0s0      0     0   290     0     0     0     0
  Cache
  /dev/dsk/c4t50026B723A07AC78d0s0
                              0 73.8G     0     0    35     0     0     0     0
  Spare
  /dev/dsk/c4t5000C500653E447Ad0s0        4     0  136K     0     0     0     0

This shows a few checksum errors, which is not consistent with the output of
"zpool status -v", and "iostat -eE" shows no physical error count. I again see
the discrepancy between the "ppool" value and what I would expect, which would
be the sum of the cksum errors for each vdev.

I also observed a ton of leaked space, which I expect from a live pool, as
well as a single:

db_blkptr_cb: Got error 50 reading <96, 1, 2, 3fc8> DVA[0]=<1:1dc4962000:1000> DVA[1]=<2:1dc4654000:1000> [L2 zvol object] fletcher4 lz4 LE contiguous unique double size=4000L/a00P birth=52386L/52386P fill=4825 cksum=c70e8a7765:f2adce34f59c:c8a289b51fe11d:7e0af40fe154aab4 -- skipping

By the way, I also found:

Uberblock:
        magic = 0000000000bab10c

Wow.
Just wow.

-Warren V
_______________________________________________
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss