Neil Brown

19 February 2006, 17:58 UTC

Neil Brown (one of many...)

Hi, you seem to have found my home page. Like most things on the web, it is under construction. As of May 2006 I have a new blog system with room for user comments...

Note that this blog is served through an ADSL link, and as I live in Australia, it isn't blindingly fast - be patient :-)

You can look at my previous home-page if you like, or just browse around here and read the following random musings.





14 June 2011, 10:17 UTC

Closing the RAID5 write hole
Over a year ago I wrote some thoughts about closing the RAID5 write hole in an answer to a comment on a blog post:

http://neil.brown.name/blog/20090129234603 and http://neil.brown.name/blog/20090129234603-028.

Some interest has recently been shown in this, so I thought it might be useful to write up my thoughts more coherently and completely.



28 March 2011, 02:54 UTC

Another mdadm release: 3.2.1

Hot on the heels of mdadm-3.1.5, I have just released 3.2.1.

The 3.2 series contains two particular sets of new functionality.

Firstly, there is the "policy" framework. This allows us to set policy for different devices based on where they are connected (e.g. which controller) so that, for example, when a device is hot-plugged it can immediately be made a hot-spare for an array without further operator intervention. It also allows broader control of spare-migration between arrays. It is likely that more functionality will be added to this framework over time.

Secondly, the support for Intel Matrix Storage Manager (IMSM) arrays has been substantially enhanced. Spare migration is now possible, as are level migration and OLCE (OnLine Capacity Expansion). This support is not quite complete yet and requires MDADM_EXPERIMENTAL=1 in the environment to ensure people only use it with care. In particular, if you start a reshape in Linux and then shut down and boot into Windows, the Windows driver may not correctly restart the reshape, and vice versa.

If you don't want any of the new functionality then it is probably safest to stay with 3.1.5 as it has all recent bug fixes. But if you are at all interested in the new functionality, then by all means give 3.2.1 a try. It should work fine and is no more likely to eat your data than any other program out there.



23 March 2011, 04:59 UTC

Release of mdadm-3.1.5

The last release of mdadm that I mentioned in this blog was 2.6.1. As I am now announcing 3.1.5 you can see that I missed a few. That's OK though as I keep the release announcements in the source distribution so you can always go and read them there.

3.1.5 is just bugfixes. It is essentially 3.1.4 plus all the bug fixes found while working on 3.2 and 3.2.1. The list from the release announcement is:

As you can see - lots of little bits and pieces.

I hope to release 3.2.1 soon. For people who want to use the Intel metadata format (Intel Matrix Storage Manager - IMSM) on Intel motherboards which have BIOS support and MS-Windows support, you should probably wait for 3.2.1. For anyone else, 3.1.5 is what you want.

3.2.1 should be released soonish. I probably won't even start on 3.2.2 for a couple of months, though I already have a number of thoughts about what I want to include. A lot of it will be cleaning up and re-organising the code: stuff I wanted to do for 3.2 but ran out of time.

As always, mdadm can be found via git at git://neil.brown.name/mdadm/ or from http://www.kernel.org/pub/linux/utils/raid/mdadm/.



08 March 2011, 07:47 UTC

log segments and RAID6 reshaping

Part of the design approach of LaFS - and any other log structured filesystem - is to divide the device space into relatively large segments. Each segment is many megabytes in size so the time to write a whole segment is much more than the time to seek to a new segment. Writes happen sequentially through a segment, so write throughput should be as high as the device can manage.
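To put rough numbers on this, here is a back-of-the-envelope calculation. The throughput and seek time are made-up figures for an imaginary disk, not measurements, and the model simply assumes one seek per segment written.

```python
MiB = 1024 * 1024

def seek_overhead(segment_bytes: int, throughput_bytes_per_s: float, seek_s: float) -> float:
    """Fraction of elapsed time spent seeking if every segment write costs one seek."""
    write_s = segment_bytes / throughput_bytes_per_s  # time to stream the whole segment
    return seek_s / (seek_s + write_s)

# Hypothetical drive: 100 MB/s sequential writes, 10 ms average seek.
for seg_mib in (1, 8, 64):
    share = seek_overhead(seg_mib * MiB, 100e6, 0.010)
    print(f"{seg_mib:3d} MiB segment: {share:.1%} of the time is spent seeking")
```

With these invented figures a 1 MiB segment spends nearly half its time seeking, while a 64 MiB segment spends under 2%.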

(Obviously there needs to be a way to find or create segments with no live data so they can be written to. This is called cleaning and will not be discussed further here.)

One of the innovations of LaFS is to allow segments to be aligned with the stripes in a RAID5 or RAID6 array so that each segment is a whole number of stripes and so that LaFS knows the details of the layout including chunk size and width (number of data devices).
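As a sketch of the arithmetic involved, assuming the chunk size and the width (number of data devices) are known: a segment that is a whole number of stripes is simply a multiple of chunk size times width. The function name and the example geometries are illustrative, not taken from LaFS.

```python
KiB, MiB = 1024, 1024 * 1024

def aligned_segment_size(chunk_bytes: int, data_devices: int, target_bytes: int) -> int:
    """Round a desired segment size down to a whole number of stripes (minimum one).

    A stripe holds one chunk from each data device; parity chunks are not
    counted because they are not part of the filesystem's address space.
    """
    stripe_bytes = chunk_bytes * data_devices
    stripes = max(1, target_bytes // stripe_bytes)
    return stripes * stripe_bytes

# A hypothetical 6-device RAID6 (4 data devices) with 64 KiB chunks:
print(aligned_segment_size(64 * KiB, 4, 8 * MiB))  # 32 stripes of 256 KiB
# A 5-device RAID6 (3 data devices) cannot hold exactly 8 MiB of whole stripes:
print(aligned_segment_size(64 * KiB, 3, 8 * MiB))  # 42 stripes of 192 KiB
```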

This allows LaFS to always write in whole 'strips', where a 'strip' is one block from each device, chosen such that they all share the same parity. Blocks in a strip are not necessarily contiguous (they are only contiguous if the chunk size equals the block size), so one would not normally write a single strip. However, doing so is the most efficient way to write to RAID6, as no pre-reading is needed. As LaFS knows the precise geometry and is free to choose where to write, it can easily write just a strip if needed. It can also pad out the write with blocks of NULs to make sure a whole strip is written each time.
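The following sketch shows which array blocks make up one strip and how much NUL padding a write needs. It assumes a simplified layout in which data chunks are numbered across the data devices in order and parity placement is ignored (real RAID5/RAID6 layouts rotate parity across devices, so `data_index` here is the position among the data chunks of a stripe, not a physical device number). All names are hypothetical.

```python
def strip_members(stripe: int, offset: int, chunk_blocks: int, data_devices: int) -> list[int]:
    """Array block numbers of one strip: block `offset` of every data chunk in `stripe`."""
    if not 0 <= offset < chunk_blocks:
        raise ValueError("offset must lie within a chunk")
    base = stripe * chunk_blocks * data_devices  # first array block of the stripe
    return [base + d * chunk_blocks + offset for d in range(data_devices)]

def device_location(block: int, chunk_blocks: int, data_devices: int) -> tuple[int, int]:
    """(data-device index, block offset on that device) for an array block number."""
    chunk, offset = divmod(block, chunk_blocks)
    stripe, data_index = divmod(chunk, data_devices)
    return data_index, stripe * chunk_blocks + offset

def padding_needed(nblocks: int, data_devices: int) -> int:
    """Number of NUL blocks to append so a write of nblocks fills whole strips."""
    return -nblocks % data_devices
```

With 16-block chunks and 4 data devices, `strip_members(0, 3, 16, 4)` gives `[3, 19, 35, 51]`: the strip's blocks are a whole chunk apart in the array's address space, yet each sits at block 3 of its own device. Only with 1-block chunks are they adjacent.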

Normally one would hope that several strips would be written at once, ideally a whole stripe or more, but it is very valuable to be able to write whole strips at a time.

This is lovely in theory but in practice there is a problem. People like to make their RAID6 arrays bigger, often by adding one or two devices to the array and "restriping" or "reshaping" the array. When you do this the geometry changes significantly and the alignment of strips and stripes and segments will be quite different. Suddenly the efficient IO practice of LaFS becomes very inefficient.
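To see how a reshape disturbs the alignment, the sketch below checks which segment start addresses still fall on stripe boundaries after one data device is added. It is purely arithmetic: it assumes the filesystem's addresses are unchanged by the reshape and ignores where parity lives, and the segment and chunk sizes are invented for the example.

```python
KiB, MiB = 1024, 1024 * 1024

def offset_in_stripe(array_byte: int, chunk_bytes: int, data_devices: int) -> int:
    """Byte offset of array_byte within its stripe; 0 means it starts a stripe."""
    return array_byte % (chunk_bytes * data_devices)

def misaligned_segments(nsegments: int, segment_bytes: int,
                        chunk_bytes: int, data_devices: int) -> list[int]:
    """Indices of segments whose first byte is not on a stripe boundary."""
    return [n for n in range(nsegments)
            if offset_in_stripe(n * segment_bytes, chunk_bytes, data_devices) != 0]

# Hypothetical layout: 8 MiB segments chosen for 4 data devices with 64 KiB chunks.
before = misaligned_segments(8, 8 * MiB, 64 * KiB, 4)
after = misaligned_segments(8, 8 * MiB, 64 * KiB, 5)  # one data device added
print(before, after)
```

Before the reshape every segment starts on a stripe boundary; afterwards only every fifth one does, and no segment spans a whole number of stripes any more.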

There are two ways to address this: one which I have had in mind since the beginning, and one which only occurred to me recently.



27 February 2011, 11:42 UTC

Off-the-road-map: Data checksums

Among the responses I received to my recent post of a development road-map for md/raid were some suggestions for features that I believe are wrong and should not be implemented. So rather than being simple omissions, they are deliberate exclusions. One of these suggestions is the idea of calculating, storing, and checking a checksum of each data block.

Checksums are in general a good idea. Whether it is a simple parity bit, an ECC, a CRC or a full cryptographic hash, a checksum can help detect single-bit and some multi-bit errors and stop those errors from propagating further into a system. It is generally better to know that you have lost some data than to believe that some wrong data is actually good, and checksums allow you to do that.
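As a generic illustration of that detection property (not of anything md does), the sketch below uses Python's standard `zlib.crc32` as the checksum and confirms that flipping any single bit of a block changes the CRC. The helper names are hypothetical.

```python
import zlib

def checksum(block: bytes) -> int:
    """CRC-32 of a data block, as it might be stored alongside the block."""
    return zlib.crc32(block)

def detects_every_single_bit_flip(block: bytes) -> bool:
    """True if flipping any one bit of block changes its CRC-32."""
    good = checksum(block)
    for i in range(len(block) * 8):
        damaged = bytearray(block)
        damaged[i // 8] ^= 1 << (i % 8)  # flip bit i
        if checksum(bytes(damaged)) == good:
            return False
    return True

print(detects_every_single_bit_flip(b"some data block contents"))
```

CRC-32 is guaranteed to catch every single-bit error; many, but not all, multi-bit errors are caught too, which matches the "some multi-bit errors" above.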

So I am in favour of checksums in general, but I don't think it is appropriate to sprinkle them around everywhere and in particular I don't think that it is the role of md to manage checksums for all data blocks.

To make this belief more concrete, I see that there are two classes of places where checksums are important. I call these "link checksums" and "end-to-end checksums".



16 February 2011, 04:40 UTC

MD/RAID road-map 2011

It is about 2 years since I last published a road-map for md/raid so I thought it was time for another one. Unfortunately quite a few things on the previous list remain undone, but there has been some progress.

I think one of the problems with some to-do lists is that they aren't detailed enough. High-level design, low-level design, implementation, and testing are all very different sorts of tasks that seem to require different styles of thinking and so are best done separately. As writing up a road-map is a high-level design task, it makes sense to do the full high-level design at that point so that the tasks are detailed enough to be addressed individually with little reference to the other tasks in the list (except what is explicit in the road map).

A particular need I am finding for this road map is to make explicit the required ordering and interdependence of certain tasks. Hopefully that will make it easier to address them in an appropriate order, and mean that I waste less time saying "this is too hard, I might go read some email instead".

So the following is a detailed road-map for md raid for the coming months.



08 September 2010, 07:20 UTC

A talk on dm/md convergence

I know that slides from a talk tend to raise more questions than they answer as all the discussion is missing. But maybe raising questions is good...

Anyway, here are the slides of a talk I gave in July about possibilities for convergence between md and dm.

Enjoy ... or not.

converge.odp



19 May 2010, 04:37 UTC

Design notes for a bad-block list in md/raid

I'm in the middle of (finally) implementing a bad block list for Linux md/raid, and I find that the motivation and the desired behaviour aren't (or weren't) quite as obvious as I expected. So now that I think I have sorted it out, it seems sensible to write it up so that you, my faithful reader, can point out any glaring problems.

The bad block list is simply a list of blocks - one list for each device - which are to be treated as 'bad'. This does not include any relocation of bad blocks to some good location. That might be done by the underlying device, but md doesn't do it. md just tracks which blocks are bad and which, by implication, are good.
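A minimal sketch of such a per-device list, assuming bad regions are recorded in sectors as (start, length) ranges that are kept sorted and merged. The class and method names are hypothetical and say nothing about md's actual kernel data structures; as described above, the list only records badness and never relocates anything.

```python
import bisect

class BadBlockList:
    """Hypothetical per-device list of bad sector ranges; nothing is relocated.

    Ranges are half-open [start, end), kept sorted, non-overlapping and merged
    when they touch, in two parallel lists so bisect can search either edge.
    Lengths are assumed to be positive.
    """

    def __init__(self) -> None:
        self.starts: list[int] = []
        self.ends: list[int] = []

    def add(self, start: int, length: int) -> None:
        """Record sectors [start, start + length) as bad."""
        end = start + length
        i = bisect.bisect_left(self.ends, start)   # first range ending at/after start
        j = bisect.bisect_right(self.starts, end)  # first range starting after end
        if i < j:  # ranges i..j-1 overlap or touch the new one: merge them
            start = min(start, self.starts[i])
            end = max(end, self.ends[j - 1])
        self.starts[i:j] = [start]
        self.ends[i:j] = [end]

    def overlaps(self, start: int, length: int) -> bool:
        """True if an IO covering [start, start + length) touches a bad sector."""
        end = start + length
        i = bisect.bisect_right(self.ends, start)  # first range ending after start
        return i < len(self.starts) and self.starts[i] < end
```

Before issuing a read or write, a caller would check `overlaps()`; what it should then do is exactly the question the rest of this post addresses.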

The difficulty comes in understanding exactly what "bad" means, why we need to record badness, and what to do when we find that we might want to perform IO against a recorded bad block.



24 March 2010, 06:46 UTC

A new release of wiggle

A long time ago, while in a job far far away....

Back in 2003 I wrote a program called "wiggle". Like many interesting projects it was written to scratch an itch.

While developing code for the Linux kernel I would often need to apply patches made for earlier versions against later versions. Sometimes there would be trivial conflicts and the "patch" program would just give up and create a reject file. After the 50th time that I applied a patch like this by hand I decided that enough was enough, so I wrote "wiggle". It takes patches that don't quite apply properly and wiggles them into place. If there is a change in part of the code that the patch doesn't actually change, wiggle doesn't let that get in the way. If there is a change in part of the code that the patch also changes, wiggle reports that inline as a conflict in a way that makes it easy to resolve by hand.
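The rule in the last two sentences can be sketched as a per-line three-way merge. This is a deliberately simplified illustration, not wiggle's algorithm: it assumes the patch's "before" text, its "after" text and the target file are already aligned line for line, whereas wiggle has to find that alignment itself and can compare words rather than whole lines. The conflict marker format is made up for the example.

```python
def merge_lines(before: list[str], after: list[str], target: list[str]):
    """Apply the change before -> after to target, line by line.

    Returns (merged_lines, conflict_line_numbers).
    """
    if not len(before) == len(after) == len(target):
        raise ValueError("sketch only handles line-for-line replacements")
    merged, conflicts = [], []
    for n, (b, a, t) in enumerate(zip(before, after, target)):
        if b == a:
            merged.append(t)  # patch does not touch this line: keep target's version
        elif t in (b, a):
            merged.append(a)  # target unchanged here (or already patched): take patch
        else:
            # both the patch and the target changed this line: report it inline
            merged.append(f"<<<< {t} |||| {b} ==== {a} >>>>")
            conflicts.append(n)
    return merged, conflicts
```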

Since 2003 I have made a few improvements and fixed a few bugs. Just recently the Debian package of wiggle got a new maintainer who was very proactive in trying to get some patches upstream to me, and get some languishing bugs fixed.

Always keen to reward such friendly behaviour I applied the patches, fixed the bugs and finally made a new release of wiggle, the first in nearly 7 years.

Version 0.7 can be found in my git tree at git://neil.brown.name/wiggle or browsed at http://neil.brown.name/git?p=wiggle;a=summary or downloaded as a 'tar' archive from http://neil.brown.name/wiggle.

Feedback always welcome.

What I really want to know is how to get git to always use wiggle for merging conflicts. I can do it on a per-repository basis by setting the 'merge' attribute (I think) but I cannot make it automatically apply to all of my git trees...



11 February 2010, 05:03 UTC

Smart or simple RAID recovery??

I frequently see comments, particularly on the linux-raid mailing list, to the effect that md should be more clever when recovering from an inconsistent stripe in an array.

In particular, it is suggested that for a RAID1 with more than 2 devices, a vote should be held and if one content occurs more often than the others (e.g. 2 devices have the same content, the third is different) then the majority vote should rule and the most common content be copied over the less common content.
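To make the two RAID1 policies concrete, here is a sketch that takes the contents read from each mirror for one block and returns what every mirror would hold afterwards. The voting version follows the rule just described; what it should do when no content wins outright is not specified, so falling back to the simple rule in that case is an assumption. Function names are hypothetical.

```python
from collections import Counter

def simple_recovery(copies: list[bytes]) -> list[bytes]:
    """Copy the first device's block over all the others."""
    return [copies[0]] * len(copies)

def smart_recovery(copies: list[bytes]) -> list[bytes]:
    """Majority vote: if one content is strictly more common than any other, it wins."""
    ranked = Counter(copies).most_common(2)
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return [ranked[0][0]] * len(copies)
    return simple_recovery(copies)  # assumed fallback when there is no clear winner
```

With three mirrors holding `old`, `new`, `new`, the simple rule writes `old` everywhere and the vote writes `new` everywhere.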

Similarly with RAID6 if the P and Q blocks don't match the data blocks, it may be possible to find exactly one data block which can be corrected so as to make both P and Q match - so we could change just one data block instead of two "parity" blocks to achieve consistency.
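For RAID6, finding "exactly one data block" relies on the Q syndrome. The sketch below uses GF(2^8) with the polynomial 0x11d and generator 2 (the field used by Linux's RAID6 code), computes P as the XOR of the data blocks and Q as the sum of g^i times block i, and then looks for a single index z that explains the mismatch in every byte: if only block z is wrong by an error E, then P differs by E and Q differs by g^z E. Function names are hypothetical and the placement of blocks on devices is ignored.

```python
def _build_tables():
    """Exponent and log tables for GF(2^8) with polynomial 0x11d, generator 2."""
    exp = [0] * 512  # doubled so exp[a + b] never needs a modulo
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    for i in range(255, 512):
        exp[i] = exp[i - 255]
    return exp, log

EXP, LOG = _build_tables()

def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def syndromes(data: list[bytes]) -> tuple[bytes, bytes]:
    """P = XOR of all data blocks; Q = sum over i of g**i * data[i] (per byte)."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coeff = EXP[i]  # g**i
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def locate_bad_data_block(data: list[bytes], p: bytes, q: bytes):
    """Index z of the single data block whose correction makes P and Q match.

    Returns None if the stripe is already consistent or no single data block
    explains the mismatch.
    """
    p_new, q_new = syndromes(data)
    z = None
    for k in range(len(p)):
        pd, qd = p[k] ^ p_new[k], q[k] ^ q_new[k]
        if pd == 0 and qd == 0:
            continue  # this byte position is consistent
        if pd == 0 or qd == 0:
            return None  # only P or only Q differs: not a single data error
        zk = (LOG[qd] - LOG[pd]) % 255  # qd = g**z * pd, so z = log(qd) - log(pd)
        if z is None:
            z = zk
        elif zk != z:
            return None  # different bytes point at different blocks
    if z is None or z >= len(data):
        return None
    return z
```

Once z is known, data[z] would be repaired by XORing it with P xor P_new. The simple approach needs none of this locating logic: it just replaces the stored P and Q with `syndromes(data)`.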

I will call this approach the "Smart recovery" approach.

The assertion is that smart recovery will not only make the stripe consistent, but will also make it "correct".

I do not agree with these comments. It is my position that if there is an inconsistency that needs to be corrected then it should be corrected in a simple, predictable way and that any extra complexity is unjustified. For RAID1, that means copying the first block over all the others. For RAID6, that means calculating new P and Q blocks based on the data. This is the "simple recovery" approach.

This note is an attempt to justify this position, both to myself and to you, my loyal reader.




04 January 2010, 00:29 UTC - Returned my Dgtec HDPVR5009 to Dick Smith
17 August 2009, 00:09 UTC - Converting RAID5 to RAID6 and other shape changing in md/raid
28 February 2009, 12:37 UTC - The LaFS directory structure
24 February 2009, 19:53 UTC - Measuring Freerunner battery life [UPDATED]
15 February 2009, 22:42 UTC - tapinput: Yet another soft keyboard for the freerunner.
12 February 2009, 20:54 UTC - Moving to Debian on my Neo Freerunner
08 February 2009, 05:22 UTC - Why I wrote my own 'gsmd'
31 January 2009, 20:51 UTC - gsm0710muxd without DBUS or ptys
30 January 2009, 21:18 UTC - Next Freerunner toys - battery applet and runit
29 January 2009, 23:46 UTC - Road map for md/raid driver - sort of
29 January 2009, 03:03 UTC - Linux.conf.au 2009 - Hobart Tasmania
28 January 2009, 02:56 UTC - Screen Lock on the Freerunner
22 February 2007, 04:22 UTC - mdadm 2.6.1 released
08 September 2006, 05:25 UTC - Newer, faster, better
24 June 2006, 10:53 UTC - Improved support for comments
17 June 2006, 08:24 UTC - Another TODO list : nfsd
11 June 2006, 10:13 UTC - Metad - a daemon for controlling daemons
11 June 2006, 04:43 UTC - A good tongue to speak in.
02 June 2006, 04:54 UTC - A new lease of life for my network
27 May 2006, 10:53 UTC - A man after God's own heart
