I just read something that makes a lot of what I read about Hardware and Software RAID seem to be mostly conjecture... so I was hoping that I could ask your opinion.
Adaptec notes on their website:
'... RAID controllers are designed to timeout a given command if the disk drive becomes unresponsive within a given time frame. The result will be that the drive will appear off line or will be marked bad and an alert will be given to the customer. Enterprise class drives (or drives designed for RAID environments), have a retry limit before a sector is marked bad. This retry limit enables the drive to respond to the RAID controller within the expected time frame. While desktop drives may work with a RAID controller, the array will progressively go off line as the disk drive ages and may result in data loss.'
This certainly matches my experience running both kinds of RAID on large (500GB+) SATA desktop disks.
Could the vendors advising this really sell that many enterprise class disks at 6x the price just by nobbling the 'time-out' period in the controller firmware?
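To check that I understand the mechanism Adaptec describes, here is a toy model of it. It is only a sketch: the function names and all of the timings (an 8 second controller timeout, a 7 second retry cap, a 60 second recovery attempt) are my own illustrative assumptions, not figures from Adaptec or any drive vendor.

```python
from typing import Optional


def response_time(recovery_needed_s: float, retry_cap_s: Optional[float]) -> float:
    """Seconds before the drive answers the controller.

    A drive with a retry cap gives up (and reports an error) once the cap
    is reached; a drive without one keeps retrying until it is done.
    """
    if retry_cap_s is None:
        return recovery_needed_s
    return min(recovery_needed_s, retry_cap_s)


def drive_dropped(response_s: float, controller_timeout_s: float) -> bool:
    """True if the controller gives up waiting and marks the drive offline."""
    return response_s > controller_timeout_s


# Hypothetical numbers, for illustration only.
CONTROLLER_TIMEOUT_S = 8.0
BAD_SECTOR_RECOVERY_S = 60.0

desktop = response_time(BAD_SECTOR_RECOVERY_S, retry_cap_s=None)
enterprise = response_time(BAD_SECTOR_RECOVERY_S, retry_cap_s=7.0)

print("desktop dropped:", drive_dropped(desktop, CONTROLLER_TIMEOUT_S))
print("enterprise dropped:", drive_dropped(enterprise, CONTROLLER_TIMEOUT_S))
```

If that model is right, the same bad sector gets the uncapped drive kicked out of the array while the capped drive merely reports a read error, so the only difference is which side gives up first.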
I read that Google did a whitepaper (can't find it yet), showing there is no difference in drive reliability between these two 'classes' of drive. I doubt Google use hardware RAID controllers in their beige box fleet of course (maybe mdadm?), but they will surely know all about it.
Really my dream is to use mdadm to build a RAID 5 or 6 (though RAID 1+0 would be nicer ;) ZFS array with a hot spare, using five or more 2TB desktop SATA disks. That kind of setup has failed for me in the past, with symptoms similar to the ones Adaptec describes.
Does one have a 'practical size limit' when using mdadm and cheap disks? I remember reading that <320GB disks help to avoid I/O errors during array rebuilds.
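My rough attempt at the arithmetic behind that size claim is below. It is a back-of-envelope sketch that assumes read errors are independent and uses an unrecoverable read error rate of 1 per 10^14 bits, a figure I am taking as a typical desktop-drive spec rather than something measured. The function name and parameters are my own.

```python
import math


def p_read_error_during_rebuild(disk_bytes: float,
                                surviving_disks: int,
                                ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while rebuilding.

    Assumes a RAID 5 style rebuild that reads every bit of every surviving
    disk, with each bit failing independently at rate `ure_per_bit`
    (an assumed spec-sheet figure, not a measurement).
    """
    bits_read = disk_bytes * 8 * surviving_disks
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Five-disk RAID 5 with one failed disk: four survivors must be read in full.
small = p_read_error_during_rebuild(320e9, surviving_disks=4)
large = p_read_error_during_rebuild(2e12, surviving_disks=4)

print(f"320GB disks: {small:.3f}")
print(f"2TB disks:   {large:.3f}")
```

Under those assumptions the 320GB case comes out at about 0.1 and the 2TB case at just under 0.5, which would explain the advice if the assumed error rate is anywhere near reality.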
