
Re: TODO list for mdadm

08 July 2007, 07:00 UTC

As someone who finds the software you've written to be one of the 'cooler' things about Linux, I just wanted to say thanks and ask a question or two about future capabilities of mdadm.

As of 2.6.21 (or was it 2.6.22?), raid5 and raid6 are supported with '--grow' functionality. I know raid5 -> raid6 migration isn't there yet, but I'm looking forward to it [having 2 drives 'fail' (or at least flake out) at once (out of 16) has happened to me before].

The question I haven't seen asked too often is what about raid5 / raid6 '--shrink' functionality?

Why bother, you ask? Well, right now, if I have (let's say) a pair of 16-way raid5 arrays (using mdadm! :) ) I'm kinda stuck with whatever file systems I chose originally. In the first case I chose ext3; in the second I chose ntfs-3g. The problem, as you can perhaps see, is that right now the machine is literally 'full'. I can't really change either array without deleting/recreating half the (9TB of) data, so I can't migrate in either direction. If Linux/mdadm supported a shrink operation, I could get one extra hdd of member-drive size, copy 1/15 of the data off the array to the standalone hdd, and shrink the array by one, recovering a drive. I could then use that drive to grow the other array and put the data onto it, repeating until I had the array layout I wanted without [necessarily] having to destroy/force-restore anything from backup.
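To make the arithmetic of that shuffle concrete, here is a rough Python sketch that only counts drives. It assumes the hypothetical shrink operation exists, that both arrays start out full, and that every member drive is the same size. The function name and the min_drives floor are my own inventions for illustration, not anything mdadm provides.

```python
def simulate_migration(src_drives, dst_drives, min_drives=3):
    """Simulate moving data from one full raid5 array to another.

    All sizes are in units of one member drive. A raid5 array of n
    drives holds n - 1 drives' worth of data (one is used for parity).
    Returns one tuple per pass:
    (src_drives, src_data, dst_drives, dst_data).
    """
    src_data = src_drives - 1
    dst_data = dst_drives - 1
    passes = []
    while src_drives > min_drives:
        # 1. copy one drive's worth of data from the source array
        #    onto the standalone spare drive
        # 2. shrink the source array by one drive (hypothetical step)
        src_drives -= 1
        src_data -= 1
        # 3. grow the destination array with the freed drive
        # 4. copy the data from the spare onto the destination array
        dst_drives += 1
        dst_data += 1
        passes.append((src_drives, src_data, dst_drives, dst_data))
    return passes


if __name__ == "__main__":
    for state in simulate_migration(16, 16):
        print("src: %2d drives / %2d data   dst: %2d drives / %2d data" % state)
```

Each pass costs one reshape on each array plus two copies of a drive's worth of data, so it would be slow, but nothing has to be destroyed along the way.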

The other thing I have been thinking about is raid version-1 superblocks.

mdadm (I'm a few versions out of date) doesn't seem to have a clean "generate me an mdadm.conf file that should work to mount the currently mounted drives on next bootup" option. It doesn't export the sub-version of the version-1 superblocks [it just says metadata=1], which Fedora Linux does not necessarily recognize as meaning "1.1" (in my case). I've seen articles that discuss using mdadm to recover / transfer a raid array from a failed PC to another, but the overly generic "1" vs "1.x" output of mdadm was a stumbling block to that type of operation when I did it. :-) I ended up using 1.1 superblocks since they were "at the beginning" of the drive, which mdadm --auto recognized okay.

However, that kinda bit me, too [a sad admission on my part]. I tried to script a backup operation for the superblocks of each member drive to individual files [I was attempting to restore an array that inexplicably failed on reboot due to two drives missing]. I ended up doing the lamentable 'mdadm --examine /dev/sdx1 >> /dev/sdx1' (instead of the intended 'mdadm --examine /dev/sdx1 >> file-called-sdx1'), and so I ended up with a textual representation of the superblock on the drive instead of the superblock itself.
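In hindsight, a small wrapper that refuses to write anywhere under /dev would have saved me. The following is only a sketch of that idea in Python. The function names and the ".examine.txt" suffix are made up for illustration, and the only mdadm feature it relies on is the same 'mdadm --examine <device>' call mentioned above.

```python
import os
import subprocess


def backup_target(device, backup_dir):
    """Return the file to write for a device, e.g. /dev/sdx1 -> <dir>/sdx1.examine.txt."""
    name = os.path.basename(device)
    target = os.path.realpath(os.path.join(backup_dir, name + ".examine.txt"))
    # Guard against the mistake described above: never write under /dev.
    if target == "/dev" or target.startswith("/dev/"):
        raise ValueError("refusing to write under /dev: " + target)
    return target


def run_examine(device):
    """Return the text printed by 'mdadm --examine <device>'."""
    result = subprocess.run(
        ["mdadm", "--examine", device],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def backup_examine(device, backup_dir, examine=run_examine):
    """Save the --examine output for one member drive into backup_dir."""
    target = backup_target(device, backup_dir)
    text = examine(device)  # run first, so a failure leaves no empty file behind
    # Mode "x" fails if the file already exists, so an older backup
    # is never silently overwritten.
    with open(target, "x") as out:
        out.write(text)
    return target
```

Note that this still only saves the textual report, not the raw superblock bytes, so it helps with working out what an array looked like rather than with restoring a superblock directly.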

So, the questions that came out of that sad mistake were:

*Can the existing 'way it's done' be amended to allow for version 1.3 superblocks? I'm thinking the '.3' could mean: Put copies of the superblock at *all three* locations (1.0, 1.1, and 1.2). That way, if the 1.1/1.2 ones are lost to user error, there's still a functional one hiding at 1.0's 'end of drive' location. I know lots of file systems keep 'multiple copies of the FAT' for such emergencies, and I can't think of a great reason not to do it for raid arrays, too, since they're, oh, say ... 16x as important as any other disk. :)

*The superblock formats are pretty easy to understand, after one reads the source code of mdadm a few times (although the documentation could definitely use a human-readable offset-means-this table). But what they don't store, and even linux sysfs doesn't seem to easily expose, is the associated 'serial number' of the member drives. udev has this info, at least for usb, ide, and sata devices. It would have come in handy to know when I was trying to figure out which array slot went with each drive. Granted, mine was a somewhat unfortunate series-of-events sort of problem, but having more info is seldom bad. :)
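For the first question, here is where the three copies would sit, as a small Python sketch. It reflects my reading of the mdadm source rather than any official table: 1.1 at the very start of the device, 1.2 at 4 KiB from the start, and 1.0 at least 8 KiB from the end, rounded down to a 4 KiB boundary. The magic number check assumes the version-1 superblock begins with the 32-bit little-endian value 0xa92b4efc. The function names are made up.

```python
import struct

SECTOR = 512            # bytes per sector, as used in the offset arithmetic
MD_MAGIC = 0xA92B4EFC   # assumed md superblock magic number


def v1_superblock_offsets(device_size_bytes):
    """Return the byte offset of each version-1 superblock location."""
    sectors = device_size_bytes // SECTOR
    # 1.0: back off 8 KiB (16 sectors) from the end, then round down
    # to a 4 KiB (8 sector) boundary.
    end_sector = (sectors - 16) & ~7
    return {
        "1.0": end_sector * SECTOR,
        "1.1": 0,
        "1.2": 4096,
    }


def has_md_magic(block):
    """Check whether a block of bytes starts with the md magic number."""
    if len(block) < 4:
        return False
    (magic,) = struct.unpack_from("<I", block, 0)
    return magic == MD_MAGIC


if __name__ == "__main__":
    for version, offset in sorted(v1_superblock_offsets(1024 ** 3).items()):
        print(version, offset)
```

A '1.3' layout as I imagine it would simply write the same superblock at all three offsets and accept whichever copy still carries the magic number and a valid checksum.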

Anyway, those are some things I've had running around in my head for a while now.