A typical week in RAID-land

29 June 2012, 06:53 UTC

I should probably write more. I enjoy writing but don't do enough of it. I probably have Elizabeth Bennet's condition. She says to Darcy on the dance floor:

We are each of an unsocial, taciturn disposition, unwilling to speak, unless we expect to say something that will amaze the whole room, and be handed down to posterity with all the eclat of a proverb.

I sometimes feel unwilling to write unless I'll say something to amaze the whole Internet. But I think I should try to push through that.

So what has happened this week? I doubt it is really a typical week as no week is really typical - I imagine them all outside my window screaming in chorus "We are all individuals". But I can't really know unless I record it, then compare it with future weeks.

There is a continuing slow dribble of people who have been affected by the nasty bug I reported on recently. People do seem to be getting their data back, though it has turned into a struggle for some.

I'm very pleased that it hasn't resulted in any animosity. I realise that most people are sane and calm and deal with misfortune in a mature way, but when the pressure is on and a deadline is looming and one more thing goes wrong in a sequence of things none of which should have happened, I can well imagine sane and mature people letting off steam in less than mature ways. So I'm grateful that no-one has ... or maybe the people who do that inhabit other forums (fora?) and I'm protected from it.

I suspect the dribble will continue for a few more weeks, though there is probably enough information out there now that many people won't bother posting; they'll just find the required information and apply the fix. So I'll never know how many people were bitten by my carelessness. Probably better that way.

The other thing of interest is that I've been reviewing a bunch of patches for md. Having been on leave for the previous 2 weeks, and been somewhat busy before that, quite a lot had queued up - from two authors in particular.

I'm in two minds about patch review. It is very hard to get into the "flow" or the "zone" when reviewing patches - each is like a new interruption. So it seems to take longer than it should and I don't feel productive. But then if the patches fix bugs, then I really want them. Even if it takes me a while to review a patch, it would take a lot longer to find the bug - hunting through my own code for bugs is always difficult, because I *know* the code is right. Even when it isn't. So they feel like interrupts, but in reality they are usually good value.

majianpeng (maybe that is Majian Peng?) is amazing. He (?? they? one doesn't like to assume) seems to be reading the md code with a fine-tooth comb and finding all sorts of odd errors. Most of them are newly created, but a few have been around for several releases or even more. It is a little disappointing that there are so many errors to be found, but very pleasing to have them found and fixed with relatively little effort on my part.

I don't always agree with the actual patch that majianpeng submits, but that is a small matter really. The really important thing is finding the bug - that is the hard work. Creating the best patch is comparatively easy. So a big thankyou to majianpeng.

The other patches I've been looking at are from Shaohua Li. Shaohua works for FusionIO, which seems to have been absorbing Linux developers lately. I hope that's a good thing.

Shaohua Li has been trying to make md go faster, not something I've really looked at for years .... maybe not since I talked about it at LCA-2000. Yikes that is a long time ago.

My work on md has mostly been on adding functionality and improving reliability and hasn't really focussed on speed. Hardware has been changing and the tradeoffs change along with it.

Some of Shaohua's patches have been quite specific to SSDs (that is what FusionIO make as I understand it). Optimising for seek-free devices will clearly be different from optimising for devices where seeks have a cost, and md needs to be taught about the difference.

But other fixes he has come up with are more general. All the real RAID levels (i.e. not RAID0 or Linear) have a kernel thread to do some of the handling, and all write requests get handled by that thread. With RAID5, some of the read requests do too.

On a single processor machine, that isn't much of a problem. On a 2, or 4, or 8 processor machine, we can be wasting valuable resources by single-threading everything. When you have an SSD with very fast IO, that is really noticeable. What surprised me was that with boring old SATA drives on a dual processor it can still be noticed.

I didn't use Shaohua's patch itself as I wanted to try a different approach to the same problem. But encouraged and guided by his work I wrote a patch to avoid using the thread in common cases, and a simple "mkfs" benchmark ran 25% faster. I was impressed! And depressed at how poor the performance was previously.
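
As a rough illustration of the structural difference, here is a toy sketch in Python. It is not the md code and not either patch; the names handle, run_via_single_thread and run_inline are made up for this example, and handle is just a stand-in for per-request work. The first function funnels every request through one worker thread; the second lets each submitter do the work itself. Python's interpreter lock means this shows the shape of the two designs only, not the speed-up.

```python
import queue
import threading


def handle(request):
    # Hypothetical stand-in for the real per-request work.
    return request * 2


def _run_submitters(submit, requests, submitters):
    # Split the requests across several submitting threads and wait for them.
    threads = [
        threading.Thread(target=submit, args=(requests[i::submitters],))
        for i in range(submitters)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_via_single_thread(requests, submitters=4):
    """Every submitter hands its requests to one worker thread."""
    q = queue.Queue()
    results = []

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work is coming
                break
            results.append(handle(item))

    def submit(chunk):
        for r in chunk:
            q.put(r)

    w = threading.Thread(target=worker)
    w.start()
    _run_submitters(submit, requests, submitters)
    q.put(None)
    w.join()
    return sorted(results)


def run_inline(requests, submitters=4):
    """Each submitter handles its own requests, with no hand-off."""
    results = []
    lock = threading.Lock()

    def submit(chunk):
        for r in chunk:
            out = handle(r)  # the work happens in the submitter's own thread
            with lock:       # only the shared result list is serialised
                results.append(out)

    _run_submitters(submit, requests, submitters)
    return sorted(results)
```

Both functions return the same results for the same input; the difference is only in which thread does the work, which is where the extra processors can start to help.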

This definitely needs more testing and careful study as another simple benchmark (sequential write with 'dd') was a little slower - but I suspect that can be fixed with a little care.

So a big thankyou to Shaohua Li too. md is going to be better thanks to you.

And now I think I know why I don't blog so often. Once I get started I keep rambling for a while and it all takes time, which I don't always have much spare of. Maybe if I blog more often, I'll have less to say, and it'll impose less on my time. We'll have to wait and see.

Until next time...