[OmniOS-discuss] [zfs] SSD alignment, EFI label rpool support

Jim Klimov jimklimov at cos.ru
Thu Aug 1 12:42:35 UTC 2013


On 2013-08-01 12:39, Richard Elling wrote:
> The ZFS label already reserves 8KB of space at the front so that it will
> not clobber an SMI label.
> The actual data use begins at a 4MB offset, past the ZFS labels and
> reserved space.
>
> In other words, why would you purposefully misalign?


Well, technically this is correct, except that it applies to
offsets within the "device" which you gave to ZFS as a leaf
component of the pool. This device may be a classic Solaris slice
in an SMI label table (possibly inside an MBR partition on x86), or
a whole partition (in either MBR or EFI definitions), or a file -
just to be complete.

What matters is that this container (slice/partition) usually does
not start at HDD sector 0 (and as history has shown, complex devices
such as "old-windows-compatible 4K AF drives" or RAID0-backed JBOD
LUNs or anything else may lead to the logical 0-offset not being
physically well aligned with hardware sectors either).

The same applies to cases where you give ZFS a "whole disk" and
it creates an EFI partition table according to its rules, and marks
"whole-disk usage" in the pool labels - but otherwise this is an
ordinary partition table (properly aligned by default, on the
assumption that physical 0 == logical 0).

I believe Paul's question, just like any question of this sort,
was about the possible need to realign his partitions - or perhaps
about a way to verify that they are aligned. In the case of an SSD,
there is a fresh twist regarding page size vs. erase-block size
(and, not yet asked, the question of a recommended ZFS minblocksize
for such devices).

Now, since 512k is divisible by 8k, an offset of 512k or 1024k
for the partition which should contain the rpool should be good
for both types of alignment in question (note the next paragraph
though). While it may indeed be problematic to carve disks with
such precision via fdisk/format, one can use the command-line
"parted" to manage disk partitions, including MBR-style ones.
When the MBR partition for the rpool with the desired offset
(and Solaris or maybe Solaris2 type) is made, it can be sliced
with "format" in order to designate a container for rpool.
I believe manually prepared partitions like this can also be
used in the Caiman installer, so you don't have to fuss with
"format" (the installer will overwrite your slicing anyway).
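
To make the divisibility argument concrete, here is a minimal sketch of such a check. The 8k page and 512k erase-block sizes are simply the figures used in this thread, not values read from any particular SSD, and the helper names are made up for illustration. The 63-sector offset is included only as a counterexample.

```python
KIB = 1024

# Assumed figures taken from this thread, not from any datasheet.
PAGE_SIZE = 8 * KIB
ERASE_BLOCK = 512 * KIB


def is_aligned(offset_bytes, unit_bytes):
    """True if offset_bytes is a whole multiple of unit_bytes."""
    return offset_bytes % unit_bytes == 0


def check_offset(offset_bytes):
    """Return (page_aligned, erase_block_aligned) for a partition offset."""
    return (is_aligned(offset_bytes, PAGE_SIZE),
            is_aligned(offset_bytes, ERASE_BLOCK))


if __name__ == "__main__":
    # 512k and 1024k starts, plus the classical 63-sector start for contrast.
    for offset in (512 * KIB, 1024 * KIB, 63 * 512):
        print(offset, check_offset(offset))
```

An offset that passes the erase-block check necessarily passes the page check too, as long as the erase block is itself a multiple of the page size.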

Note that on my sample box, which I glanced at while writing this,
the zeroth "cylinder" (16065 * 512-byte "blocks" or 7.84MB) is
reserved on x86 for "boot", and the rpool starts at cylinder
number 1. This may mean that for proper alignment of the rpool,
its MBR partition may have to start at, for example, 8MB - 7.84MB,
or 16384 - 16065 = 319 512-byte "blocks" (legacy "sectors", as still
used in partitioning terminology), give or take one ;) This way the
rpool's slice 0 would start at the physical device's logical
sector 16384, which is hopefully properly aligned for the IOs,
and ZFS's 4MB offset further into that would not contradict
anything.
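
The arithmetic above can be sketched as follows. This is only an illustration under the assumptions of my example (512-byte sectors, one 16065-sector "cylinder" reserved ahead of slice 0, an 8MB target); the function name is hypothetical, and the "give or take one" caveat still applies to a real disk.

```python
SECTOR_BYTES = 512
MIB = 1024 * 1024


def partition_start(target_slice_sector, reserved_sectors):
    """Sector where the MBR partition must begin so that slice 0, which
    follows reserved_sectors inside the partition, lands exactly on
    target_slice_sector of the disk."""
    if reserved_sectors > target_slice_sector:
        raise ValueError("target lies before the end of the reserved area")
    return target_slice_sector - reserved_sectors


# Values from the example: one reserved "cylinder", slice 0 wanted at 8MB.
RESERVED = 16065
TARGET = 8 * MIB // SECTOR_BYTES

start = partition_start(TARGET, RESERVED)

# Byte position where ZFS data would begin, 4MB into the slice.
zfs_data_byte = TARGET * SECTOR_BYTES + 4 * MIB

if __name__ == "__main__":
    print("partition starts at sector", start)
    print("slice 0 starts at byte", TARGET * SECTOR_BYTES)
    print("ZFS data starts at byte", zfs_data_byte)
```

Since both 8MB and 4MB are multiples of 512k, the resulting data offset stays aligned to that size as well.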

Note that I've picked 8MB rather arbitrarily, as the next multiple
of 1MB after this "cylinder" size. The classical MBR layout
reserves only 63 sectors (and yes, the "tracks" have odd
sizes) before the first partition, which is what bootloaders
should be able to cope with. In my example I give ample room -
over 300 sectors ;) Some software (e.g. for low-level disk
archiving) may complain about offsets which are not whole
tracks, but otherwise this is quite usable.
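
For the "whole tracks" remark, a small sketch of the legacy geometry check. The 63 sectors per track come from the text above; the 255 heads are my assumption of the conventional translated geometry, which happens to give the 16065-sector "cylinder" seen on the sample box.

```python
# Legacy CHS geometry. HEADS = 255 is an assumption (the usual translated
# geometry); 63 sectors per track is the figure from the classical layout.
HEADS = 255
SECTORS_PER_TRACK = 63
SECTORS_PER_CYLINDER = HEADS * SECTORS_PER_TRACK


def on_track_boundary(sector):
    """True if the sector offset is a whole number of tracks."""
    return sector % SECTORS_PER_TRACK == 0


def tracks_and_remainder(sector):
    """Split a sector offset into (whole tracks, leftover sectors)."""
    return divmod(sector, SECTORS_PER_TRACK)


if __name__ == "__main__":
    print("sectors per cylinder:", SECTORS_PER_CYLINDER)
    for sector in (63, 319):
        print(sector, on_track_boundary(sector), tracks_and_remainder(sector))
```

So the 319-sector start from the example is five tracks plus four sectors, which is the kind of offset such tools might grumble about.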

HTH,
//Jim Klimov


