[OmniOS-discuss] Strange ARC reads numbers

Jim Klimov jimklimov at cos.ru
Wed May 28 07:09:50 UTC 2014


On 26 May 2014 at 09:36:02 CEST, Filip Marvan <filip.marvan at aira.cz> wrote:
>Hello,
>
> 
>
>just for information, after two weeks the number of ARC accesses came
>back to the same high level as before the deletion of data (you can
>see that in the screenshot).
>
>And I tried deleting the same amount of data on a different storage
>server, and the accesses to the ARC dropped in the same way as on the
>first pool.
>
> 
>
>Interesting.
>
> 
>
>Filip Marvan
>
>From: Richard Elling [mailto:richard.elling at richardelling.com] 
>Sent: Thursday, May 08, 2014 12:47 AM
>To: Filip Marvan
>Cc: omnios-discuss at lists.omniti.com
>Subject: Re: [OmniOS-discuss] Strange ARC reads numbers
>
> 
>
>On May 7, 2014, at 1:44 AM, Filip Marvan <filip.marvan at aira.cz> wrote:
>
>Hi Richard,
>
> 
>
>thank you for your reply.
>
> 
>
>1. Workload is still the same or very similar. The zvols which we
>deleted from our pool had been disconnected from the KVM server a few
>days earlier, so the only change was that we deleted those zvols with
>all their snapshots.
>2. As you wrote, our customers are fine for now :) We have monitoring
>of all
>our virtual servers running from that storage server, and there is no
>noticeable change in workload or latencies.
>
> 
>
>good, then there might not be an actual problem, just a puzzle :-)
>
>3. That could be the reason, of course. But the graph contains only
>data from the arcstat.pl script. We can see that arcstat reports heavy
>read accesses every 5 seconds (probably some update of the ARC after
>ZFS writes data to disk from the ZIL? All of them are marked as "cache
>hits" by the arcstat script), with only a few ARC accesses in between
>those 5-second periods. Before we deleted those zvols (about 0.7 TB of
>data from a 10 TB pool, which has 5 TB of free space) there were about
>40k accesses every 5 seconds; now there are no more than 2k accesses
>every 5 seconds.
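As an aside, arcstat.pl can break those bursts down a bit further. A
sketch, assuming the illumos arcstat.pl with its -f field-selection
option (field names can differ between versions):

    # sample every 5 seconds; separate demand/metadata hits and L2ARC traffic
    arcstat.pl -f time,read,hits,miss,hit%,dhit,mhit,l2read,l2hits,l2miss 5

That would show whether the 5-second spikes are demand-data hits,
metadata hits, or reads served from the L2ARC.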
>
> 
>
>This is expected behaviour for older ZFS releases that used a
>txg_timeout of 5 seconds. You should see a burst of write activity
>around that timeout and it can include reads for zvols. Unfortunately,
>the zvol code is not very efficient and you will see a lot more reads
>than you expect.
>
> -- richard
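As a side note, on illumos the transaction-group timeout Richard
mentions lives in the zfs_txg_timeout kernel variable; a sketch of how
to check it with mdb (treat the variable name as an assumption for your
particular build, it has changed across ZFS versions):

    # print the current txg timeout, in seconds, from the running kernel
    echo zfs_txg_timeout/D | mdb -k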
>
>Most of our zvols have 8K volblocksize (including the deleted zvols);
>only a few have 64K. Unfortunately I have no data about the read sizes
>before that change. But we have two more storage servers with similarly
>high ARC read accesses every 5 seconds, as on the first pool before the
>deletion. Maybe I should try to delete some data on those pools and see
>what happens, with more detailed monitoring.
>
> 
>
>Thank you,
>
>Filip
>
>  _____  
>
>From: Richard Elling [mailto:richard.elling at richardelling.com] 
>Sent: Wednesday, May 07, 2014 3:56 AM
>To: Filip Marvan
>Cc: omnios-discuss at lists.omniti.com
>Subject: Re: [OmniOS-discuss] Strange ARC reads numbers
>
> 
>
>Hi Filip,
>
> 
>
>There are two primary reasons for reduction in the number of ARC reads.
>
>            1. the workload isn't reading as much as it used to
>
>            2. the latency of reads has increased
>
>            3. your measurement is b0rken
>
>there are three reasons...
>
> 
>
>The data you shared clearly shows a reduction in reads, but doesn't
>contain the answers to the cause. Usually, if #2 is the case, then the
>phone will be ringing with angry customers on the other end.
> 
>
>If the above 3 are not the case, then perhaps it is something more
>subtle. The arcstat read counts do not record the size of the reads.
>Getting the read size for zvols is a little tricky; you can infer it
>from the pool statistics in iostat. The subtleness here is that if the
>volblocksize is different between the old and new zvols, then the
>number of (block) reads will be different for the same workload.
> -- richard
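To illustrate the read-size point: the average read size per interval
is roughly the read bandwidth divided by the read operations, both of
which zpool iostat reports. A sketch, with 'tank' standing in for your
pool name:

    # per-vdev operations and bandwidth every 5 seconds;
    # avg read size ~= read bandwidth / read ops within an interval
    zpool iostat -v tank 5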
>
> 
>
>--
>
> 
>
>Richard.Elling at RichardElling.com
>+1-760-896-4422

Reads from the L2ARC suggest that this data is being read, but that it is not hot enough to stick in the main RAM ARC. Deleting the datasets seemingly caused this data to no longer be read, perhaps because those blocks are no longer referenced by the pool. IIRC in your first post you wrote that this happens with your older snapshots, and now it seems the situation repeats as your system has accumulated new snapshots after that mass deletion a few weeks back.
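One way to watch where those hits actually land is the raw arcstat
kstats; a sketch for illumos/OmniOS (the counters are cumulative since
boot, so sample them twice and diff):

    # cumulative ARC and L2ARC hit/miss counters
    kstat -p zfs:0:arcstat:hits zfs:0:arcstat:misses \
        zfs:0:arcstat:l2_hits zfs:0:arcstat:l2_misses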

To me it sums up to: "somebody mass-reads your available snapshots". Do you have 'snapdir=visible' set, so that the $dataset/.zfs directories are always visible (not only upon direct request), and do you have daemons or cron jobs or anything of that kind (possibly a slocate/mlocate updatedb job, or an rsync backup) that read your POSIX filesystem structure?
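A quick way to check both, with 'tank' as a placeholder pool name:

    # which datasets expose .zfs without an explicit lookup?
    zfs get -r -o name,value snapdir tank | grep visible

    # any scheduled jobs that might crawl the filesystem or its snapshots?
    crontab -l
    ls /var/spool/cron/crontabs/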

Since it does not seem that your whole datasets are being re-read (the exact guess depends on the amount of unique data relative to the L2ARC size, of course, and on a measurable presence of reads from the main pool devices), regular accesses to just the FS metadata might explain the symptoms. Though backups that do read the file data (perhaps "rsync -c", or tar, or a zfs send of any dataset type redone over and over for some reason), combined with sufficiently small unique data in the snapshots, might also fit this explanation.
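If you want to see who is doing the reading while a burst happens, a
simple DTrace one-liner can count read(2) calls per process; a sketch
that only catches syscall-level reads, not purely in-kernel activity
such as a zfs send:

    # count read syscalls per process for 30 seconds, then print the totals
    dtrace -n 'syscall::read:entry { @[execname] = count(); } tick-30s { exit(0); }'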

HTH,
//Jim Klimov
--
Typos courtesy of K-9 Mail on my Samsung Android

