<div dir="ltr">As far as we've been able to tell, zpool replace is a one-way street; once you start the replace, there doesn't seem to be a way to cancel it until it has completed. <div><br></div><div>Also, resilvers appear to start from scratch any time anything about the pool changes. Do you have a drive that is flapping offline and coming back, or something like that? Are you getting any messages in /var/adm/messages about disk devices?</div>
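<div><br></div><div>One rough way to check for a flapping drive is to save two "zpool status" outputs some time apart and compare the per-disk states. The Python sketch below does that. The function names are made up for illustration, it only recognizes cXtYdZ-style disk names like the ones in this thread, and it has not been run against a live pool.</div>

```python
import re

# States that can appear in the STATE column of "zpool status".
STATES = {"ONLINE", "DEGRADED", "UNAVAIL", "OFFLINE", "FAULTED", "REMOVED"}

# Assumption: leaf disks are named cXtYdZ, as in the output quoted in this thread.
DISK_LINE = re.compile(r"\s*(c\d+t\d+d\d+)\s+([A-Z]+)")


def device_states(status_text):
    """Return the set of (disk, state) pairs found in one "zpool status" output."""
    pairs = set()
    for line in status_text.splitlines():
        match = DISK_LINE.match(line)
        if match and match.group(2) in STATES:
            pairs.add((match.group(1), match.group(2)))
    return pairs


def changed_devices(before, after):
    """Return the sorted names of disks whose state differs between two snapshots."""
    # Pairs present in only one snapshot belong to disks that changed state,
    # appeared, or disappeared.
    differing = device_states(before) ^ device_states(after)
    return sorted({disk for disk, _state in differing})
```

<div>Any disk it reports has changed state between the two snapshots and is probably worth looking up in /var/adm/messages.</div>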
<div><br></div><div>Considering the dire appearance of that pool, you might consider trying to boost resilver priority. We found this:</div><div><a href="http://my2ndhead.blogspot.com/2011/03/adjusting-zfs-resilvering-speed.html">http://my2ndhead.blogspot.com/2011/03/adjusting-zfs-resilvering-speed.html</a><br>
</div><div>to work well to improve overall resilver performance (at the cost of pending I/O requests from clients), YMMV.</div><div> -nld</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
On Fri, Aug 30, 2013 at 1:26 PM, "Daniel D. Gonçalves" <span dir="ltr">&lt;<a href="mailto:daniel@dgnetwork.com.br" target="_blank">daniel@dgnetwork.com.br</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My ZFS pool has been resilvering in a loop for over a month: a few minutes after one resilver ends, another starts.<br>
The replace command never finishes. I replaced a device three days ago, and the operation still has not completed:<br>
mirror-3 DEGRADED 0 0 28<br>
c17t20d1 ONLINE 0 0 28<br>
replacing-1 DEGRADED 28 0 0<br>
c17t22d1 UNAVAIL 0 0 0 cannot open<br>
c17t13d1 ONLINE 0 0 28 (resilvering)<br>
<br>
<br>
In the mirror below, I would like to remove all devices with status UNAVAIL and run the replace again with a correct device, but the OFFLINE, REMOVE, and DETACH commands do not work:<br>
mirror-1 DEGRADED 28 0 0<br>
c17t24d1 ONLINE 0 0 28 (resilvering)<br>
replacing-1 UNAVAIL 0 0 0 insufficient replicas<br>
c17t22d1 UNAVAIL 0 0 0 cannot open<br>
c17t12d1 UNAVAIL 0 0 0 cannot open<br>
c17t21d1 UNAVAIL 0 0 0 cannot open<br>
<br>
My entire POOL:<br>
pool: STORAGE01<br>
state: DEGRADED<br>
status: One or more devices is currently being resilvered. The pool will<br>
continue to function, possibly in a degraded state.<br>
action: Wait for the resilver to complete.<br>
scan: resilver in progress since Fri Aug 30 14:42:42 2013<br>
530G scanned out of 18.4T at 227M/s, 23h1m to go<br>
62.1G resilvered, 2.80% done<br>
config:<br>
<br>
NAME STATE READ WRITE CKSUM<br>
STORAGE01 DEGRADED 14 0 16<br>
mirror-0 ONLINE 0 0 0<br>
c17t15d1 ONLINE 0 0 0<br>
c17t19d1 ONLINE 0 0 0<br>
mirror-1 DEGRADED 28 0 0<br>
c17t24d1 ONLINE 0 0 28 (resilvering)<br>
replacing-1 UNAVAIL 0 0 0 insufficient replicas<br>
c17t22d1 UNAVAIL 0 0 0 cannot open<br>
c17t12d1 UNAVAIL 0 0 0 cannot open<br>
c17t21d1 UNAVAIL 0 0 0 cannot open<br>
mirror-2 ONLINE 0 0 0<br>
c17t18d1 ONLINE 0 0 0 (resilvering)<br>
c17t17d1 ONLINE 0 0 0 (resilvering)<br>
mirror-3 DEGRADED 0 0 32<br>
c17t20d1 ONLINE 0 0 32<br>
replacing-1 DEGRADED 32 0 0<br>
c17t22d1 UNAVAIL 0 0 0 cannot open<br>
c17t13d1 ONLINE 0 0 32 (resilvering)<br>
mirror-5 ONLINE 0 0 0<br>
c17t25d1 ONLINE 0 0 0<br>
c17t27d1 ONLINE 0 0 0<br>
mirror-6 ONLINE 0 0 0<br>
c17t26d1 ONLINE 0 0 0<br>
c17t28d1 ONLINE 0 0 0<br>
mirror-7 ONLINE 0 0 0<br>
c17t29d1 ONLINE 0 0 0<br>
c17t31d1 ONLINE 0 0 0<br>
mirror-8 ONLINE 0 0 0<br>
c17t32d1 ONLINE 0 0 0<br>
c17t30d1 ONLINE 0 0 0<br>
mirror-9 ONLINE 0 0 0<br>
c17t23d1 ONLINE 0 0 0<br>
c17t14d1 ONLINE 0 0 0<br>
logs<br>
mirror-4 ONLINE 0 0 0<br>
c14t1d0 ONLINE 0 0 0<br>
c14t3d0 ONLINE 0 0 0<br>
cache<br>
c14t4d0 ONLINE 0 0 0<br>
<br>
<br>
I need urgent help to solve this. I believe it is a bug in ZFS.<br>
<br>
Thanks,<br>
<br>
Daniel<br>
<br>
On 22/08/2013 17:42, Saso Kiselkov wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 8/22/13 9:20 PM, "Daniel D. Gonçalves" wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Thanks Saso,<br>
<br>
To stop the resilver, which device should I set to OFFLINE?<br>
</blockquote>
The one that says 'resilvering'. But beware: that means the<br>
pool might not have full fault tolerance.<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I do not know how the device "c17t33d1" was placed in<br>
MIRROR-11/REPLACING-1. How do I remove it from there?<br>
</blockquote>
If you can, let it run to completion before attempting any further<br>
manipulation. The pool seems to be in quite an unhappy state anyway, so<br>
better not compound the situation by doing more changes. Let the thing<br>
resync back up, find the files that have the data errors in them ("zpool<br>
status -v" I think), restore them or delete them and then post a new<br>
"zpool status" to the list - then we'll see what can be done.<br>
<br>
Above all, be patient if you don't want to lose your data.<br>
<br>
Cheers,<br>
</blockquote>
<br>
_______________________________________________<br>
OmniOS-discuss mailing list<br>
<a href="mailto:OmniOS-discuss@lists.omniti.com" target="_blank">OmniOS-discuss@lists.omniti.com</a><br>
<a href="http://lists.omniti.com/mailman/listinfo/omnios-discuss" target="_blank">http://lists.omniti.com/mailman/listinfo/omnios-discuss</a><br>
</blockquote></div><br></div>