<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#">
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/SoftRaid"/>

<title>SoftRaid</title>
<modified>2011-06-14T10:17:08Z</modified>
<author></author>
<entry>
<title>Closing the RAID5 write hole</title>
<issued>2011-06-14T10:17:08Z</issued>
<modified>2011-06-14T10:17:08Z</modified>
<id>http://neil.brown.name/blog/20110614101708</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20110614101708"/>
<content type="text/html" mode="escaped">
Over a year ago I wrote some thoughts about closing the RAID5 write hole
in an answer to a comment on a blog post:

&lt;p&gt;&lt;a href=&quot;http://neil.brown.name/blog/20090129234603&quot;&gt;http://neil.brown.name/blog/20090129234603&lt;/a&gt; and 
&lt;a href=&quot;http://neil.brown.name/blog/20090129234603-028&quot;&gt;http://neil.brown.name/blog/20090129234603-028&lt;/a&gt;.

&lt;p&gt;I recently had some interest shown in this so I thought it might be
useful to write up some thoughts more coherently and completely.
&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20110614101708&gt;read more...(No comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>Another mdadm release: 3.2.1</title>
<issued>2011-03-28T02:54:07Z</issued>
<modified>2011-03-28T02:54:07Z</modified>
<id>http://neil.brown.name/blog/20110328025407</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20110328025407"/>
<content type="text/html" mode="escaped">


&lt;p&gt;Hot on the heels of mdadm-3.1.5 I have just released 3.2.1.

&lt;p&gt;The 3.2 series contains two particular sets of new functionality.

&lt;p&gt;Firstly there is the &amp;quot;policy&amp;quot; framework.  This allows us to set policy for different devices based on where they are connected (e.g. which controller) so that, for example, when a device is hot-plugged it can immediately be made a hot-spare for an array without further operator intervention.  It also allows broader control of spare-migration between arrays.  It is likely that more functionality will be added to this framework over time.

&lt;p&gt;Secondly, the support for Intel Matrix Storage Manager (IMSM) arrays has been substantially enhanced.  Spare migration is now possible as is level migration and OLCE (OnLine Capacity Expansion).  This support is not quite complete yet and requires MDADM_EXPERIMENTAL=1 in the environment to ensure people only use it with care.  In particular if you start a reshape in Linux and then shut down and boot into Windows, the Windows driver may not correctly restart the reshape.  And vice-versa.

&lt;p&gt;If you don't want any of the new functionality then it is probably safest to stay with 3.1.5 as it has all recent bug fixes.  But if you are at all interested in the new functionality, then by all means give 3.2.1 a try.  It should work fine and is no more likely to eat your data than any other program out there.

&lt;p&gt;&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20110328025407&gt;(10 comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>Release of mdadm-3.1.5</title>
<issued>2011-03-23T04:59:10Z</issued>
<modified>2011-03-23T04:59:10Z</modified>
<id>http://neil.brown.name/blog/20110323045910</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20110323045910"/>
<content type="text/html" mode="escaped">


&lt;p&gt;The last release of mdadm that I mentioned in this blog was 2.6.1.  As I am now announcing 3.1.5 you can see that I missed a few.  That's OK though as I keep the release announcements in the source distribution so you can always go and read them there.

&lt;p&gt;3.1.5 is just bugfixes.  It is essentially 3.1.4 plus all the bug fixes found while working on 3.2 and 3.2.1.  The list from the release announcement is:

&lt;p&gt;&lt;ul&gt;&lt;li&gt;Fixes for v1.x metadata on big-endian machines.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;man page improvements&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Improve '--detail --export' when run on partitions of an md array.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Fix regression with removing 'failed' or 'detached' devices.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Fixes for &amp;quot;--assemble --force&amp;quot; in various unusual cases.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Allow '-Y' to mean --export.  This was documented but not implemented.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Various fixes for handling 'ddf' metadata.  This is now more reliable
    but could benefit from more interoperability testing.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Correctly list subarrays of a container in &amp;quot;--detail&amp;quot; output.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Improve checks on whether the requested number of devices is supported
    by the metadata - both for --create and --grow.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Don't remove partitions from a device that is being included in an
    array until we are fully committed to including it.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Allow &amp;quot;--assemble --update=no-bitmap&amp;quot; so an array with a corrupt
    bitmap can still be assembled.&lt;/li&gt;&lt;/ul&gt;
&lt;ul&gt;&lt;li&gt;Don't allow --add to succeed if it looks like a &amp;quot;--re-add&amp;quot; is probably
    wanted, but cannot succeed.  This avoids inadvertently turning
    devices into spares when an array is failed.&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;As you can see - lots of little bits and pieces.

&lt;p&gt;I hope to release 3.2.1 soon.  For people who want to use the Intel metadata format (Intel Matrix Storage Manager - IMSM) on Intel motherboards which have BIOS support and MS-Windows support, you should probably wait for 3.2.1.  For anyone else, 3.1.5 is what you want.

&lt;p&gt;3.2.1 should be released soonish.  I probably won't even start on 3.2.2 for a couple of months, though I already have a number of thoughts about what I want to include.  A lot of it will be cleaning up and re-organising the code:  stuff I wanted to do for 3.2 but ran out of time.

&lt;p&gt;As always, mdadm can be found via git at &lt;a href=&quot;git://neil.brown.name/mdadm/&quot;&gt;git://neil.brown.name/mdadm/&lt;/a&gt; or from
&lt;a href=&quot;http://www.kernel.org/pub/linux/utils/raid/mdadm/&quot;&gt;http://www.kernel.org/pub/linux/utils/raid/mdadm/&lt;/a&gt;.

&lt;p&gt;&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20110323045910&gt;(No comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>Off-the-road-map: Data checksums</title>
<issued>2011-02-27T11:42:01Z</issued>
<modified>2011-02-27T11:42:01Z</modified>
<id>http://neil.brown.name/blog/20110227114201</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20110227114201"/>
<content type="text/html" mode="escaped">

&lt;p&gt;Among the responses I received to my recent post of a development road-map for md/raid were some suggestions for features that I believe are wrong and should not be implemented.  So rather than being simple omissions, they are deliberate exclusions.  One of these suggestions is the idea of calculating, storing, and checking a checksum of each data block.

&lt;p&gt;Checksums are in general a good idea.  Whether it is a simple parity bit, an ECC, a CRC or a full cryptographic hash, a checksum can help detect single bit and some multi-bit errors and stop those errors propagating further into a system.  It is generally better to know that you have lost some data rather than believe that some wrong data is actually good, and checksums allow you to do that.

&lt;p&gt;So I am in favour of checksums in general, but I don't think it is appropriate to sprinkle them around everywhere and in particular I don't think that it is the role of md to manage checksums for all data blocks.

&lt;p&gt;To make this belief more concrete, I see that there are two classes of places where checksums are important.  I call these &amp;quot;link checksums&amp;quot; and &amp;quot;end-to-end checksums&amp;quot;.
&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20110227114201&gt;read more...(No comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>MD/RAID road-map 2011</title>
<issued>2011-02-16T04:40:02Z</issued>
<modified>2011-02-16T04:40:02Z</modified>
<id>http://neil.brown.name/blog/20110216044002</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20110216044002"/>
<content type="text/html" mode="escaped">

&lt;p&gt;It is about 2 years since I last published a
&lt;a href=&quot;http://neil.brown.name/blog/20090129234603&quot;&gt;road-map&lt;/a&gt;
for md/raid
so I thought it was time for another one.  Unfortunately quite a few
things on the previous list remain undone, but there has been some
progress.

&lt;p&gt;I think one of the problems with some to-do lists is that they aren't
detailed enough.  High-level design, low level design, implementation,
and testing are all very different sorts of tasks that seem to require
different styles of thinking and so are best done separately.  As
writing up a road-map is a high-level design task it makes sense to do
the full high-level design at that point so that the tasks are
detailed enough to be addressed individually with little reference to
the other tasks in the list (except what is explicit in the road map).

&lt;p&gt;A particular need I am finding for this road map is to make explicit
the required ordering and interdependence of certain tasks.  Hopefully
that will make it easier to address them in an appropriate order, and
mean that I waste less time saying &amp;quot;this is too hard, I might go read
some email instead&amp;quot;.

&lt;p&gt;So the following is a detailed road-map for md raid for the coming
months.
&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20110216044002&gt;read more...(8 comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>A talk on dm/md convergence</title>
<issued>2010-09-08T07:20:29Z</issued>
<modified>2010-09-08T07:20:29Z</modified>
<id>http://neil.brown.name/blog/20100908072029</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20100908072029"/>
<content type="text/html" mode="escaped">


&lt;p&gt;I know that slides from a talk tend to raise more questions than they answer as all the discussion is missing.  But maybe raising questions is good...

&lt;p&gt;Anyway, here are the slides of a talk I gave in July about possibilities of convergence between md and dm.

&lt;p&gt;Enjoy ... or not.

&lt;p&gt;&lt;a href=&quot;http://neil.brown.name/blog-files/201009/08072029/converge.odp&quot;&gt;converge.odp&lt;/a&gt;

&lt;p&gt;&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20100908072029&gt;(2 comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>Design notes for a bad-block list in md/raid</title>
<issued>2010-05-19T04:37:30Z</issued>
<modified>2010-05-19T04:37:30Z</modified>
<id>http://neil.brown.name/blog/20100519043730</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20100519043730"/>
<content type="text/html" mode="escaped">

&lt;p&gt;I'm in the middle of (finally) implementing a bad block list for Linux
md/raid, and I find that the motivation and the desired behaviour
aren't (or weren't) quite as obvious as I expected.  So now that I think
I have sorted it out, it seems sensible to write it up so that you, my
faithful reader, can point out any glaring problems.

&lt;p&gt;The bad block list is simply a list of blocks - one list for each
device - which are to be treated as 'bad'.  This does not include any
relocation of bad blocks to some good location.  That might be done by
the underlying device, but md doesn't do it.  md just tracks which
blocks are bad and which, by implication, are good.

&lt;p&gt;The difficulty comes in understanding exactly what &amp;quot;bad&amp;quot; means, why we
need to record badness, and what to do when we find that we might want
to perform IO against a recorded bad block.

&lt;p&gt;&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20100519043730&gt;read more...(23 comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>Smart or simple RAID recovery??</title>
<issued>2010-02-11T05:03:55Z</issued>
<modified>2010-02-11T05:03:55Z</modified>
<id>http://neil.brown.name/blog/20100211050355</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20100211050355"/>
<content type="text/html" mode="escaped">

&lt;p&gt;I frequently see comments, particularly on the linux-raid mailing list,
to the effect that md should be more clever when recovering from an
inconsistent stripe in an array.

&lt;p&gt;In particular, it is suggested that for a RAID1 with more than 2
devices, a vote should be held and if one content occurs more often
than the others (e.g. 2 devices have the same content, the third is
different) then the majority vote should rule and the most common
content be copied over the less common content.

&lt;p&gt;Similarly with RAID6 if the P and Q blocks don't match the data
blocks, it may be possible to find exactly one data block which can be
corrected so as to make both P and Q match - so we could change just
one data block instead of two &amp;quot;parity&amp;quot; blocks to achieve consistency.

&lt;p&gt;I will call this approach the &amp;quot;Smart recovery&amp;quot; approach.

&lt;p&gt;The assertion is that smart recovery will not only make the stripe
consistent, but will also make it &amp;quot;correct&amp;quot;.

&lt;p&gt;I do not agree with these comments.  It is my position that if there
is an inconsistency that needs to be corrected then it should be
corrected in a simple predictable way and that any extra complexity is
unjustified.  For RAID1, that means copying the first block over all
the others.  For RAID6, that means calculating new P and Q blocks
based on the data.  This is the &amp;quot;simple recovery&amp;quot; approach.

&lt;p&gt;This note is an attempt to justify this position, both to myself and
to you, my loyal reader.
&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20100211050355&gt;read more...(6 comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>Converting RAID5 to RAID6 and other shape changing in md/raid</title>
<issued>2009-08-17T00:09:31Z</issued>
<modified>2009-08-17T00:09:31Z</modified>
<id>http://neil.brown.name/blog/20090817000931</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20090817000931"/>
<content type="text/html" mode="escaped">
Back in early 2006 md/raid5 gained the ability to increase the number of devices in a RAID5,
thus making more space available.  As you can imagine, this is a slow process as every
block of data (except possibly those in the first stripe) needs to be relocated, i.e.
it needs to be read from one place and written to another.  md/raid5 allows this reshaping to
happen while the array is live.  It temporarily blocks access to a few stripes at a time while
those stripes are rearranged.  So instead of the whole array being unavailable for several hours,
little bits are unavailable for a fraction of a second each.

&lt;p&gt;Then in early 2007 we gained the same functionality for RAID6.  This was no more complex than
RAID5, it just involved a little more code and testing.

&lt;p&gt;Now, in mid 2009, we have most of the rest of the reshaping options that had been planned.
These include changing the stripe size, changing the layout (i.e. where the parity blocks get stored) 
and reducing the number of devices.

&lt;p&gt;Changing the layout provides valuable functionality as it is an important part of converting a RAID5 
to a RAID6.
&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20090817000931&gt;read more...(111 comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>Road map for md/raid driver - sort of</title>
<issued>2009-01-29T23:46:03Z</issued>
<modified>2009-01-29T23:46:03Z</modified>
<id>http://neil.brown.name/blog/20090129234603</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20090129234603"/>
<content type="text/html" mode="escaped">
In mid-December 2008 I wrote a bit of a &amp;quot;road-map&amp;quot; containing some of my thoughts about development work that could usefully be done on the MD/RAID driver in the Linux kernel.  Some of it might get done.  Some of it might not.  It is not a promise at all, more of a discussion starter in case people want to encourage features or suggest different features.

&lt;p&gt;But I really should put this stuff in my blog so, 6 weeks later, here it is.

&lt;p&gt;&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20090129234603&gt;read more...(33 comments)&lt;/a&gt;</content>
</entry>
<entry>
<title>mdadm 2.6.1 released</title>
<issued>2007-02-22T04:22:26Z</issued>
<modified>2007-02-22T04:22:26Z</modified>
<id>http://neil.brown.name/blog/20070222042226</id>
<link rel="alternate" type="text/html" href="http://neil.brown.name/blog/20070222042226"/>
<content type="text/html" mode="escaped">


&lt;p&gt;Yes, I forgot to announce 2.6 here, sorry about that.

&lt;p&gt;2.6.1 is just some minor bug fixes.  The release is motivated primarily by the fact that I have 
implemented raid6 reshape (i.e. add one or more devices to a raid6 while online).  For the moment
you need to collect patches from the linux-raid mailing list or wait for the next -mm release.
They will hopefully be in 2.6.21-rc2.  Earlier versions of mdadm can start a raid6 reshape with a new kernel,
but there is one small case where it didn't quite do the right thing so I wanted to get that fix out.

&lt;p&gt;2.6 introduced --incremental mode.  This is intended for interfacing with 'udev'.  When a new device is
discovered it is passed to &amp;quot;mdadm --incremental&amp;quot; and mdadm tries to include it in an md array if that is
appropriate.  As soon as all devices become available, the array is ready.  Of course if one device
is missing, we have a problem. Do we start the array degraded as soon as possible, or wait for the
missing device to appear, possibly waiting forever...  No good answers to this question yet.  mdadm allows
you to try either.

&lt;p&gt;&lt;p&gt;&lt;a href=http://neil.brown.name/blog/20070222042226&gt;(41 comments)&lt;/a&gt;</content>
</entry>

</feed>

