Probably the most wanted feature of mdadm is auto-assembly. People want it to just do-the-right-thing. They want to simply be able to assemble all of their arrays without having to worry about creating and maintaining config files or anything like that.
I've always been against blind auto-assembly as it can cause (and occasionally has caused) problems when the wrong thing gets assembled.
However, it is possible to find a middle ground that isn't completely blind but requires minimal configuration effort. I've finally figured out how I want to implement that and scheduled the time to do it, and so it should appear in mdadm-2.5.
The core idea is to record the host name in each RAID array. mdadm can then assemble every array that it can find, provided it is for 'this' host.
The chief problem with blind auto-assembly is that if you move the drives comprising a RAID array from one host to another, and if the destination host already has RAID arrays, then the auto-assembly could assemble the new drives rather than the old drives, resulting in a situation that is at least confusing, and may be detrimental.
The other problem, which is substantially less significant, is that if someone has been experimenting with RAID and created a few different arrays with different devices, then these old arrays might 'magically' reappear, and could get confused with arrays that should exist.
As mentioned, the solution is simply to store the host name in the superblock, and only to assemble arrays which contain the right host name. This removes the problem of incorrectly assembling arrays imported from another machine.
We also need to check the create time for the array (which is already stored in the superblock) and if there is any ambiguity (multiple arrays with the same name) assemble the most recently created array.
Storing the host name in a version-1 superblock is quite easy. We have a 32 character 'name' field which can be used for whatever mdadm wants. We decide to treat that as having an optional hostname prefix separated from the rest of the name by a ':'. So if the name contains a colon, everything before that is the host name.
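The naming convention above can be sketched in a few lines. This is only an illustration, not mdadm's code: the function name split_array_name is made up for this example, and it assumes the split happens at the first colon.

```python
from typing import Optional, Tuple


def split_array_name(name: str) -> Tuple[Optional[str], str]:
    """Split a version-1 'name' field into (host name, rest of name).

    Hypothetical helper: assumes the first ':' is the separator.
    Returns None for the host when the name has no colon.
    """
    host, sep, rest = name.partition(":")
    if not sep:
        # No colon at all, so the whole field is the array name.
        return None, name
    return host, rest
```

So a name of "alpha:home" would be read as array "home" belonging to host "alpha", while a plain "home" carries no host information.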
Storing the host name in a version-0.90 superblock isn't quite so easy. There isn't really anywhere to store it. While there is some unused space, squeezing a hostname in there is rather ugly.
So instead, we borrow some of the 16 byte uuid. This is normally chosen randomly to ensure that different arrays will have different uuids and make sure bits of different arrays don't get confused. If a hostname is known when an array is created, we will now use 8 bytes of random data, and 8 bytes taken from the SHA1 hash of the host name. This should provide a very similar guarantee of uniqueness, but allow us to tell if an array is intended for a particular machine or not.
Doing it this way does mean that we cannot easily tell what host an array is intended for, but that is a fairly small cost. If you want this functionality, use version-1 superblocks.
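A rough sketch of the uuid scheme follows. The text above does not say which 8 bytes of the hash are used or which half of the uuid they occupy, so this example assumes the first 8 bytes of the SHA1 digest go in the second half of the uuid. The function names are invented for illustration.

```python
import hashlib
import os


def host_half(hostname: str) -> bytes:
    # Assumption: take the first 8 bytes of the SHA1 digest of the host name.
    return hashlib.sha1(hostname.encode("utf-8")).digest()[:8]


def make_uuid(hostname: str) -> bytes:
    # 8 random bytes followed by 8 bytes derived from the host name.
    # Assumption: the host-derived bytes occupy the second half.
    return os.urandom(8) + host_half(hostname)


def uuid_is_for_host(uuid: bytes, hostname: str) -> bool:
    # We can only test whether a uuid matches a given host name;
    # the host name itself cannot be recovered from the hash.
    return len(uuid) == 16 and uuid[8:] == host_half(hostname)
```

This shows the trade-off directly: uuid_is_for_host can answer "is this array mine?", but nothing here can answer "whose array is this?".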
So assuming all of your arrays have been tagged by the host name, how is mdadm going to auto-assemble them?
We make multiple passes through the list of available devices. The first superblock we find for an array on 'this' host is put aside and all other superblocks for the same array are put with it. When we have found all available devices for the array, we try to assemble.
While looking, if we find a superblock for a different array (different uuid) but the same name (more on that later) then we check the create time and choose this superblock in place of the current set if this is newer.
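The selection rule can be sketched as below. This is a simplification under stated assumptions: it groups all superblocks in one sweep rather than making multiple passes over the devices, and the Superblock record and its field names are hypothetical stand-ins for whatever mdadm reads from disk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Superblock:
    # Hypothetical, simplified view of the fields used for selection.
    device: str
    uuid: str
    host: str
    name: str
    ctime: int  # array creation time


def choose_arrays(superblocks: List[Superblock],
                  this_host: str) -> Dict[str, List[Superblock]]:
    """Return the member superblocks to assemble, keyed by array name."""
    # Collect the members of each array (same uuid) that belongs to this host.
    by_uuid: Dict[str, List[Superblock]] = {}
    for sb in superblocks:
        if sb.host != this_host:
            continue
        by_uuid.setdefault(sb.uuid, []).append(sb)

    # If two different arrays share a name, keep the most recently created.
    chosen: Dict[str, List[Superblock]] = {}
    for members in by_uuid.values():
        first = members[0]
        current = chosen.get(first.name)
        if current is None or first.ctime > current[0].ctime:
            chosen[first.name] = members
    return chosen
```

With two arrays both named "home" on this host, only the devices of the one with the later creation time are returned, and arrays tagged for any other host are never considered.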
When we find a collection of superblocks that form an array, we need to decide what device name to use for assembling the array. For version-0.90 arrays, we use the minor number stored in the superblock to determine both the minor number and the device name.
For version-1 arrays, we take the remainder of the name field. If this is numeric, we treat it just like the minor number of version-0.90 arrays. If it is non-numeric, we choose a free minor number, and create a device with the given name under /dev/md/.
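For the version-1 case, the choice might look something like this. It is a sketch only: the function name is invented, the /dev/mdN form for numeric names is an assumption about how a minor number maps to a device name, and picking the lowest free minor number is just one possible policy.

```python
from typing import Set, Tuple


def device_for_v1(rest: str, used_minors: Set[int]) -> Tuple[int, str]:
    """Pick (minor number, device path) from the remainder of the name."""
    if rest.isascii() and rest.isdigit():
        # Numeric name: treat it like a version-0.90 minor number.
        # Assumption: the device node is /dev/md<minor>.
        minor = int(rest)
        return minor, "/dev/md%d" % minor

    # Non-numeric name: take a free minor number (here, the lowest unused
    # one, which is an assumption) and name the device under /dev/md/.
    minor = 0
    while minor in used_minors:
        minor += 1
    return minor, "/dev/md/" + rest
```

For example, a remainder of "3" gives minor 3, while a remainder of "home" with minors 0 and 1 already in use gives minor 2 and the path /dev/md/home.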
One problem with introducing this functionality is that people have pre-existing arrays that aren't tagged with the host name. To help with this we will have a new 'update' option of --assemble: --update=homehost which will update the host information in the superblock prior to assembly. This is usable fairly easily for everything except an array holding a root filesystem. In order to avoid needing to boot from different media, there will be an option that can safely be used from an initramfs which will do the right thing.
This will probably be called --auto-update-home-host.
This option is only meaningful when doing a hostname-based auto-assembly. If the auto-assembly process finds anything to assemble, the option is ignored. However if nothing is found with the right host name, then a second pass is made. On this pass, any md array that is found is updated to belong to the current host, and is automatically assembled.
Thus it should be safe to always run mdadm with --auto-update-home-host in initramfs. It will only do its magic once, and after that the arrays should always assemble properly.
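The two-pass behaviour can be summarised in a small decision function. This is only a model of the logic described above, not the planned implementation: arrays are reduced to hypothetical (name, host) pairs, and the function merely reports what would be assembled and what would be re-tagged.

```python
from typing import List, Tuple


def plan_assembly(arrays: List[Tuple[str, str]],
                  this_host: str,
                  auto_update_home_host: bool) -> Tuple[List[str], List[str]]:
    """Return (names to assemble, names to re-tag for this host)."""
    # First pass: only arrays already tagged for this host.
    mine = [name for name, host in arrays if host == this_host]
    if mine or not auto_update_home_host:
        # Something matched (so the option is ignored), or the option is off.
        return mine, []

    # Second pass: nothing matched, so adopt every array that was found.
    everything = [name for name, _ in arrays]
    return everything, everything
```

Once the second pass has run, the arrays carry the current host name, so later boots are satisfied by the first pass and nothing is re-tagged again.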
All that is left is to find the host name. Often the host name is stored in a file on the root filesystem. If this is on a raid array, then we won't be able to assemble the array until we know the host name, and cannot find the host name until we assemble the array. To break this deadlock it is recommended that the host name be passed as a kernel parameter via whichever boot loader is being used.
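As a sketch of how the host name might be picked out of the kernel command line: the parameter name "homehost" used here is purely hypothetical, as no name has been fixed above, and the function name is likewise made up.

```python
from typing import Optional


def host_from_cmdline(cmdline: str, key: str = "homehost") -> Optional[str]:
    """Find key=value in a kernel command line and return the value.

    The default key 'homehost' is a hypothetical parameter name.
    """
    for word in cmdline.split():
        k, sep, value = word.partition("=")
        if sep and k == key and value:
            return value
    return None
```

On Linux the command line can be read from /proc/cmdline, so an initramfs script could pass the contents of that file to a function like this before any array is assembled.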