I've just spent a while hunting through old emails, todo-lists, patches and my own brain to try to create a fairly complete TODO list for md/raid in Linux.
Rather than keeping it to myself, I thought I would let you, my loyal reader, see it too.
It mentions various enhancements, including not kicking drives on read errors, background check/repair, sysfs support and adding devices to linear arrays. It also covers fixes, particularly involving version-1 superblocks, along with improvements to read-only mode, making 'linear' cope with very large devices, and other things.
Readonly assembly
It should be possible to assemble an array in 'read-only' mode. In this mode the superblocks would not be updated, and resync would not be started. Read requests would be allowed, but not write requests.
This would allow a raid1 to be assembled as soon as hotplug found any device. As other devices were found they could be added. When either all devices were found, or write access was required, it could be switched to writeable and any required resync or reconstruction could then happen.
Similarly a raid5 could be assembled before the last device was found.
Possibly "mdadm --assemble", which currently will not start a degraded array without "--run" or "--scan", could be changed to start it in read-only mode, which is just as safe.
It might be nice for all assembly to be read-only, with the first write request normally switching the array to read-write automatically and starting any resync/recovery.
Don't kick drives on read errors
This is probably the most requested enhancement: as drives get bigger, the chance of individual block read errors seems to be increasing.
When a read error is detected, we flag the fact but don't fail the drive. The data is recovered from elsewhere and we attempt to re-write it (if the array is not read-only). If the re-write succeeds, everyone is happy, but it should be logged and possibly counted in the superblock.
If the re-write fails, we should probably kick the drive and start a reconstruction. We could possibly hold on to the drive and allow reads to succeed until reconstruction finishes (if ever). This means that errors on different blocks of different devices don't completely kill the array.
On a write error, we behave as with a re-write failure.
md already does background resync/recovery. It would be good to also do background read checks.
This would involve reading all blocks of all devices and checking that all redundancy information is consistent. For raid5/raid6 this means checking the parity (and syndrome) blocks are correct. For raid1/raid10 it means checking that all copies are the same.
This has two particular benefits. One is to check that all blocks can actually be read. The other is to check consistency. If errors are found, they can be reported and optionally corrected.
md currently has two tunables -- max and min resync speed -- and these apply equally across all devices. It would be nice to allow more tunables (e.g. cache size for raid 5/6) and have them per-array. This would be best done with entries in sysfs.
Linux seems to have lots of ways to report events to user-space. I want to report events like "a drive has failed" directly rather than having to re-read /proc/mdstat all the time. I need to find the best mechanism to do this.
online add-device to linear arrays
It is conceptually straightforward to add a drive to a linear array while the array is online. It needs to be made practically straightforward too.
raid1 write-mostly and write-behind
To support mirroring to a remote device (similar to DRBD), it would be good to add 'write-mostly' and 'write-behind' flags to devices in raid1.
'write-mostly' is easy and means that we should never read from this device unless there is no alternative.
'write-behind' is harder. It means "don't wait for the write to complete before returning", and requires the data to be copied before we submit that write request.
Different device-create interface
Currently, /dev/mdX needs to exist before mdX can be assembled. This is contrary to the hotplug/udev approach. A different interface that could create mdX through some other window into the kernel would be good.
more transparency
It would be nice if md were more integrated with dm. It would also be nice if a "normal" device (e.g. a SCSI drive) could be transparently turned into a raid1 with one disk, which could then be extended.
barriers
The block layer has a concept of 'barriers' and 'flushing' that md understands barely, if at all. It should be educated.
version-1 super
Probably a few fixes are needed here, but particularly:
- difference between devnum and raidnum should be apparent in /proc/mdstat
It might be good to flag spares more obviously.
bypass cache for some raid5/6 reads
Currently, reading a raid5 (or raid6) reads the data into the stripe-cache and then copies it into the client buffer. This is an unneeded copy in many cases. We should bypass the cache in the simple cases where there is no active write on the same stripe and the array isn't degraded.
read-only support
You can currently only switch an md array to read-only when no-one is using it. It would be better if the condition were 'no-one is writing to it', and a 'force read-only even if there are writers' option could be useful.
start active degraded raid5 at boot time
md will not currently start an 'active' (i.e. dirty) raid5 if it is degraded. mdadm can re-write the superblock to convince md otherwise, but this is no good for booting from raid5. A kernel parameter to force this is needed.
linear on v.large devices
Currently, on 32-bit systems, a linear array cannot have components bigger than 2 terabytes. This should be relaxed.
linear chunksize of 0 should be allowed
'linear' treats 'chunksize' as 'rounding' and reduces the effective size of the device to a multiple of 'chunksize'. We should allow a 'chunksize' of 0 meaning 'no rounding', but we currently don't.
allow quiescent spares
md currently updates the superblock quite frequently in some circumstances (occasional write traffic). This includes the superblocks on spares, which means that spares cannot spin down, should you want that.
We should allow spares to be incorporated into an array even if their event count is old, and then stop updating superblocks on spares when simply switching between active and inactive mode.