TODO list for mdadm

27 July 2005, 14:31 UTC

I have a TODO list not only for linux/md/raid, but also for mdadm -- the userspace md management tool.

It is mostly focussed on getting 2.0 ready for release, but there are some bits that can wait until after 2.0.

It includes a test-suite, a '--hostid' flag to tie arrays to a host and make automatic assembly more feasible, and improvements to support for version-1 superblocks.

# testsuite
# better error messages
# bitmap with version-1 superblock
# convert a single device into a linear
# hostid
# --assume-clean for --create
# have --add remove the device first if it is failed
# Version-1 superblocks
# check and remove old superblocks on create
# Comments...

testsuite

I really need a testsuite for mdadm. It does too many things and has broken too often for me to be confident in a new release without one.

This will probably involve creating some loop-back devices and building lots of different arrays with them.

It needs to test version-0.90 and version-1 superblocks; raid0, 1, 4, 5, 6, 10, linear, and multipath; and various flavours of these (particularly raid10).

It needs to auto-create names in /dev.

It needs to work with config files and without.

All different options in all different modes need to be tested.

Finding the right structure for it all will be the hard bit.
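
A single test might look something like the following sketch, using loop devices over scratch files (the paths, sizes, and array name are all made up for illustration):

    # create some small loop-back devices to act as array components
    for i in 0 1 2 3; do
        dd if=/dev/zero of=/tmp/mdtest$i bs=1M count=20 2> /dev/null
        losetup /dev/loop$i /tmp/mdtest$i
    done
    # build a raid5 with a version-1 superblock and check that it re-assembles
    mdadm --create /dev/md0 --metadata=1 --level=5 --raid-devices=4 /dev/loop[0-3]
    mdadm --stop /dev/md0
    mdadm --assemble /dev/md0 /dev/loop[0-3] || echo "FAIL: raid5/version-1 assemble"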

better error messages

mdadm should never simply report an error code from the kernel, as the codes were not designed for md. It should always interpret and try to explain what really is going wrong. With --verbose it should sometimes give even more detail.

bitmap with version-1 superblock

When creating a version-1 array, we should leave room for an internal bitmap, and allow it to be added afterwards.
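
The intended usage would look something like this (device names are hypothetical):

    # ask for an internal bitmap at creation time
    mdadm --create /dev/md0 --metadata=1 --bitmap=internal --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # or add one to an existing version-1 array afterwards
    mdadm --grow /dev/md0 --bitmap=internal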

convert a single device into a linear

When creating a linear, it would be good to preserve all the data on the first drive. As the superblock will overwrite some of it, that bit should be copied to the second drive. But the user should be able to override this.

hostid

This is a biggy.

The reason I don't like auto-discovery and auto-assembly is that if you pull drives from one machine and plug them into another, they might get autoassembled in the wrong place.

The fix for this is to encode something about the host machine into the superblock so that 'foreign' devices can be recognised. A 'hostname' would be ideal, if this were available.

--hostid=XXX would, on create, encode XXX, or a hash thereof, into the UUID or NAME field of the superblock, and on assemble would look only for those UUIDs.
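
In later mdadm releases this idea surfaced as the --homehost option (plus a HOMEHOST line in mdadm.conf); a rough sketch, with a hypothetical hostname and device names:

    # record the host in the superblock at creation time
    mdadm --create /dev/md0 --homehost=myserver --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # at assembly time, arrays that were not created for this host are treated as foreign
    mdadm --assemble --scan --homehost=myserver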

--assume-clean for --create

This currently works for --build. It is rather dangerous for --create, but some people want it for testing, so maybe....
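
If it were allowed, the usage would presumably look like this (hypothetical devices; the point of the flag is to skip the initial resync and trust that the parity is already consistent):

    # testing only: skip the initial resync
    mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=4 /dev/sd[b-e]1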

have --add remove the device first if it is failed

This is what people seem to expect. Either we do it, or we give a better error message.
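
What people currently have to do by hand is roughly this (hypothetical device names):

    # a failed device must be removed from the array before it can be re-added
    mdadm /dev/md0 --remove /dev/sdc1
    mdadm /dev/md0 --add /dev/sdc1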

Version-1 superblocks

There are currently some oddities when reporting on version-1 superblocks as spares are handled differently, and the position in the array is different from the role in the RAID. This all needs to be resolved.

check and remove old superblocks on create

When creating an array we need to check for old superblocks of all versions, and remove any that might confuse future assembly.
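
Until then, the manual equivalent is to zero any stale superblocks before creating (hypothetical device names):

    # wipe any old md superblock, of whatever version, from each component
    mdadm --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1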




Comments...

Re: TODO list for mdadm (21 February 2007, 12:10 UTC)

Another todo item for you... The documentation needs to be updated and/or reviewed.

I have come across two issues in 2.5.6, and have checked the 2.6 documentation and they are still there.

1. The documentation refers to --raid_devices as the option for grow/build/create, whereas mdadm actually accepts --raid-disks.

2. The documentation and `mdadm --grow --help` state that -n can only be used to change the number of devices on a RAID1 array, not on RAID 4/5/6. However, anecdotal evidence from the net seems to imply that it can be used for (at least) RAID 5. I hope that it can also be used for RAID 6 ;-)
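
(For what it's worth, the grow-by-one-device sequence people report using on raid5 looks roughly like this; the device and array names are hypothetical:)

    # add the new disk as a spare, then reshape the array across it
    mdadm /dev/md0 --add /dev/sde1
    mdadm --grow /dev/md0 --raid-devices=5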

In the documentation it states that --level is "Not yet supported with --grow." Is there any plan to allow that to work in the future? I am primarily interested in whether or not it will be possible to migrate from RAID5 to RAID6.


Re: TODO list for mdadm (22 February 2007, 01:40 UTC)

Yes, there is undoubtedly room for improving the documentation. Suggestions are always welcome.

Both --raid-disks and --raid-devices are accepted by mdadm. Maybe this should be made clearer?

I have updated the --help text for --grow. The new text will be in 2.6.1. Thanks.


Re: TODO list for mdadm (22 February 2007, 01:46 UTC)

I forgot to add:

Growing a raid6 (adding one or more disks) will be supported in linux 2.6.21 (if all goes as planned). Growing raid5->raid6 is somewhat more complex... probably possible within the next 6 months though.
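
(For reference, when that conversion did eventually land, it is driven from mdadm roughly like this; the device names are hypothetical and a spare to hold the extra parity must be added first:)

    # add a device for the extra parity, then request the level change
    mdadm /dev/md0 --add /dev/sdf1
    mdadm --grow /dev/md0 --level=6 --raid-devices=5
    # (mdadm may insist on a --backup-file=... to protect data during the reshape)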


Re: TODO list for mdadm (08 July 2007, 07:00 UTC)

As someone who finds the software you've written to be one of the 'cooler' things about linux, I just wanted to say thanks and put in a question or two about future capabilities of mdadm.

As of 2.6.21 (or was it 2.6.22?), raid5 and raid6 are supported with '--grow' functionality. I know raid5 -> raid6 migration isn't there yet, but I'm looking forward to it [having 2 drives 'fail' (or at least flake-out) at once (out of 16) has happened to me before].

The question I haven't seen asked too often is what about raid5 / raid6 '--shrink' functionality?

Why bother, you ask? Well, right now, if I have (let's say) a pair of 16-way raid5 arrays (using mdadm! :) ), I'm kinda stuck with whatever file-systems I chose originally. In the first case I chose ext3, in the second ntfs-3g. The problem, as you can perhaps see, is that right now the machine is literally 'full'. I can't really change either array without deleting and recreating half the (9TB of) data, so I can't migrate in either direction. If linux/mdadm supported a shrink operation, I could get one extra hdd of member-drive size, copy 1/15 of the data off the array to the standalone hdd, shrink the array by one (recovering a drive), then use that drive to grow the other array and move the data onto it, repeating until I had the array layout I wanted without [necessarily] having to destroy and force-restore anything from backup.

The other thing I have been thinking about is raid version-1 superblocks.

mdadm (I'm a few versions out of date) doesn't seem to have a clean "generate me an mdadm.conf file that should work to mount the currently mounted drives on next bootup" option. It doesn't export the sub-version of the version-1 superblocks [just says metadata=1], which Fedora Linux does not necessarily recognize as meaning "1.1" (in my case). I've seen articles that discuss using mdadm to recover / transfer a raid array from a failed PC to another, but the overly-generic "1" vs "1.x" output of mdadm was a stumbling block to that type of operation when I did it. :-) I ended up using 1.1 superblocks since they were "at the beginning" of the drive, which mdadm --auto recognized okay.

However, that kinda bit me, too [a sad admission on my part]. I tried to script a backup operation for the superblocks of each member drive to individual files [I was attempting to restore an array that inexplicably failed on reboot due to two drives missing]. I ended up doing the lamentable 'mdadm --examine /dev/sdx1 >> /dev/sdx1' (instead of the intended 'mdadm --examine /dev/sdx1 >> file-called-sdx1'), and so I ended up with a textual representation of the superblock instead of the superblock itself.

So, the questions that came out of that sad mistake were:

*Can the existing 'way it's done' be amended to allow for version 1.3 superblocks? I'm thinking the '.3' could mean: Put copies of the superblock at *all three* locations (1.0, 1.1, and 1.2). That way, if the 1.1/1.2 ones are lost to user error, there's still a functional one hiding at 1.0's 'end of drive' location. I know lots of file systems do a 'multiple copies of the FAT' for such emergencies, and I can't think of a great reason why not to do it for raid arrays, too, since they're, oh, say ... 16x as important as any other disk. :)

*The superblock formats are pretty easy to understand, after one reads the source code of mdadm a few times (although the documentation could definitely use a human-readable offset-means-this table). But what they don't store, and even linux sysfs doesn't seem to easily expose, is the associated 'serial number' of the member drives. udev has this info, at least for usb, ide, and sata devices. It would have come in handy to know when I was trying to figure out which array slot went with each drive. Granted, mine was a somewhat unfortunate series-of-events sort of problem, but having more info is seldom bad. :)

Anyway, those are some things I've had running around in my head for a while now.

Thanks!


Re: TODO list for mdadm (13 July 2007, 07:02 UTC)

  • shrink for raid4/5/6. Yes, it would be possible. You would need to process each device from the end to the beginning, so it would quite possibly be a lot slower. One difficulty is that there is no protocol for the device to check with the filesystem that smaller is OK, so you could accidentally shrink your array and lose lots of data. You can already shrink your array with "--grow --size=smallnumber", but if you do that by mistake it is reversible; with "--grow --raid-devices=smallnumber" it wouldn't be (a sketch of the safe sequence follows this list). I guess I would just put some smarts in mdadm to check for the common filesystem types.

  • version 1 superblocks and mdadm.conf. Yes, I received a patch recently to fix this. It is in current .git and will be in the next release.

  • version 1.3 superblock with metadata redundancy. Yes, metadata-redundancy is something I have had in the back of my mind for a while. Your suggestion is quite a good one. I might just action it...

  • Using the serial numbers of the drives. Maybe... I guess the per-device UUID could be initialised from this value if it were available. Would that be what you want?
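
(The sketch of the shrink-by-size sequence mentioned above; the filesystem type, sizes, and device name are hypothetical, and the filesystem must be shrunk first so the array cannot clip it:)

    # shrink the filesystem before reducing the per-device size of the array
    resize2fs /dev/md0 <new-filesystem-size>
    mdadm --grow /dev/md0 --size=<new-per-device-size-in-KiB>
    # --size takes Kibibytes per component; unlike removing whole devices,
    # the size can be grown back later if this was a mistake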

Thanks for the input.

NeilBrown


Re: TODO list for mdadm (20 November 2007, 22:42 UTC)

Hi again,

  • re: version 1.3 superblock with metadata redundancy
I still think this idea has merit. I've been thinking about playing around and seeing if I could hack something together, mostly as an introduction to kernel coding. Is there a source of good documentation (other than just looking at the .h/.c files) on the interplay between mdadm and the kernel raid code? Or on the superblocks themselves? I'm wondering how much of a faux pas it would be to publish the superblock formats, at least, to a place like wikipedia.

  • re: Using the serial numbers of the drives
Putting this info into the UUID seems like the simple/easy/no-reason-not-to way to handle it. Ideally, I think it would be a more first-class citizen and have its own dedicated place in the superblock, but the proposed method sounds like low-hanging fruit.

--- (I'm not sure if the stuff below is already implemented, but I thought I'd mention it)

  • New idea: a "sync-as/take-over-for" command
This command would be for use in place of fail/remove/add. Having a command that told a spare drive to begin syncing up to take over for a specific drive in an array (without degrading the array or taking the original drive offline first) would be handy. As a use case, several times I have noticed a single drive in an array starting to display instability. It doesn't necessarily have to have fallen off the chain yet, so to speak, to really need replacement/RMA. However, simple fail/remove semantics would require the array to become "degraded" and be resynced fresh, leaving a real window of opportunity for bad things to happen. If mdadm could sync the spare as diskX, then take diskX offline (or back to spare?) when synced, that would keep the safety net intact for the array without the exposure.

Thanks again!


Re: TODO list for mdadm (22 November 2007, 04:29 UTC)

  • Documentation of the superblock format: Sorry, but we kernel coders have trouble with long words like "Documentation". Little ones like "Code" and "bugs" and even "test" are fine. But "Doc-u-men-ta-tion"... too long :-)

    There is a linux-raid wiki somewhere. Maybe you could try to add stuff there. Or wikipedia if that is your preference. If you send me a pointer I'm happy to tell you which bits are wrong...

  • serial numbers: dunno - doesn't really excite me I guess. Maybe if I had a clear idea of the use-case.

  • sync-as: Yes, this is probably the most-asked-for feature for md. You'd need to be careful that the partially-synced drive never got accidentally mistaken for the real drive (e.g. a crash in the middle of resync). This should be safe enough with the metadata at the end of the device, but if the metadata is at the start and gets copied first...
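
(For reference, when this feature eventually arrived in mdadm it took the form of --replace, which keeps the old device in service until the spare has fully synced; the device names here are hypothetical:)

    # rebuild sdf1 as a replacement for the flaky sdc1 without degrading the array;
    # sdc1 is only marked faulty once the copy has completed
    mdadm /dev/md0 --add /dev/sdf1
    mdadm /dev/md0 --replace /dev/sdc1 --with /dev/sdf1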


Re: TODO list for mdadm (11 January 2009, 01:13 UTC)

If this is still being read, thanks a zillion for what is an incredibly powerful and yet pretty easy to use toolset.

One suggestion: I managed to whack one of my arrays today by issuing mdadm <device> --grow --size=<smallnumber>

I see in an earlier post that you mention this can be reversed -- how? I did this on my testbed, which has no real data, so it doesn't matter now, but I'd like to know just in case.

Also, what do you think about the idea of a size check for this command? i.e. if I have an array where I use 10,000 blocks from each component device, and then I issue:

mdadm /dev/md0 --grow --size=1000

the system could say something like: "grow size given is smaller than the current size of the array. This could be a very bad idea. Use the --force option if you really want to do this"

(or something like that)


Re: TODO list for mdadm (28 January 2010, 10:28 UTC)

Thank you so much for spending your time developing raid software, and tools like mdadm. We appreciate your role and participation in free software very very much!

=)

Hello from Canada! mdadm v3.0.2, 4x320GB - Raid5 --felipe1982




