February 7, 2022

The Foundation-sponsored feature reflows existing data, rewriting it onto a new arrangement of disks and thereby freeing space at the end of the logical RAID-Z group.

The FreeBSD Foundation funded the project to ensure the completion and release of an easy-to-use and practical application. The project came in under budget despite delays caused by the pandemic. The feature was developed by Matthew Ahrens and is now complete but not yet integrated.

The purpose of this overview is to introduce the feature and explain how it works.

How it works

The feature reflows existing data, essentially rewriting it onto a new arrangement of disks: the original group plus a newly added disk. In so doing, a contiguous chunk of free space is created at the end of the logical RAID-Z group, and thus at the end of each physical disk.

The reflowed data retains its original logical stripe width, meaning the ratio of data to parity stays the same. Newly written data uses the new, wider logical stripe width and therefore an improved data-to-parity ratio. The reflow happens online, while other zfs and zpool operations are in progress.
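The capacity effect of keeping the old stripe width for existing data can be sketched with a little arithmetic. The disk counts below are hypothetical (a 6-disk RAIDZ2 vdev grown to 7 disks); the point is that reflowed data keeps its 4:2 data-to-parity ratio while new writes get 5:2.

```shell
# Usable (data) fraction of a full logical stripe: (disks - parity) / disks.
# Hypothetical example: 6-disk RAIDZ2 expanded to 7 disks.
awk 'BEGIN {
  old = (6 - 2) / 6    # reflowed data keeps the original ratio
  new = (7 - 2) / 7    # data written after expansion uses the wider stripe
  printf "reflowed data: %.3f usable fraction\n", old
  printf "new writes:    %.3f usable fraction\n", new
}'
```

As old blocks are freed and rewritten in normal use, they are written at the new width, so the pool's effective data-to-parity ratio improves over time.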

A pool can be expanded multiple times. However, the pool must be healthy: if a disk dies during an expansion, the process pauses until the disk has been reconstructed. The feature works with RAIDZ-1, RAIDZ-2, and RAIDZ-3.

Below are two figures illustrating the difference between how traditional expansion of RAID 4/5/6 works (Figure 1) and how the new RAID-Z Expansion feature works (Figure 2).

Figure 1:

Reflowing data doesn’t change or read block pointers. Reads and writes are performed sequentially. Spacemaps reveal what data needs to be copied. Last but not least, each logical stripe is independent, making it unnecessary to know where parity is. Segments are still stored on different disks, which preserves redundancy.

Figure 2: 

How to use it

Start with the command:

zpool attach test raidz2-0 /var/tmp/6

In this example, “zpool attach” means attach a disk to the pool, and “test” is the name of the pool. The existing raidz vdev is named “raidz2-0”, and “/var/tmp/6” is the new disk. That looks a little different from disk names in production; the point here is to show where the actual disk name goes on the command line.
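For experimentation, the whole workflow can be sketched end to end with file-backed test vdevs. The paths, sizes, and pool name below are illustrative, and this assumes a ZFS build that includes the expansion feature:

```shell
# Create six 1 GB sparse files to act as test vdevs (illustrative paths).
truncate -s 1g /var/tmp/0 /var/tmp/1 /var/tmp/2 /var/tmp/3 /var/tmp/4 /var/tmp/5

# Build a RAIDZ2 pool named "test" from them.
zpool create test raidz2 /var/tmp/0 /var/tmp/1 /var/tmp/2 /var/tmp/3 /var/tmp/4 /var/tmp/5

# Create a seventh file and attach it to the existing raidz vdev,
# which triggers the expansion/reflow.
truncate -s 1g /var/tmp/6
zpool attach test raidz2-0 /var/tmp/6

# Check on the reflow.
zpool status test
```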

Figure 3

Expect it to take a while, as the data has to be moved around on the disks. This line reports the progress being made:

“raidz expand: Expansion of vdev 0 in progress since Wed Jun  9 16:36:19 2021

    444M copied out of 4.29G at 22.2M/s, 10.12% done, 0h2m to go”

It will also tell you when the action is completed: 

“raidz expand: Expansion of vdev 0 copied 4.27G in 0h3m, completed on Wed Jun  9 16:39:31 2021”

Now more space is available:

Figure 4

The status report

While all capabilities of this feature have been implemented and all tests so far have passed, there are still a few loose ends to tie up. Specifically, there is some code cleanup to do, some verbose logging to remove, some code documentation to write, and similar relatively minor tasks. We aim for this to be integrated by Q3.

But the biggest need is for additional help in further testing the feature and in reviewing the code before it can be integrated.