i.e. in roll_valid. If the head isn't complete, we can still
use this to commit some previous checkpoints.
-15ba/ roll forward should not BUG on bad data like inodefile in
+DONE 15ba/ roll forward should not BUG on bad data like inodefile in
non-primary filesystem.
-15bb/ Do I need to sync something before copying an update over part
+DONE 15bb/ Do I need to sync something before copying an update over part
of an inode, then reloading the inode.
-15bc/ Handle DescHole in roll forward.
+DONE 15bc/ Handle DescHole in roll forward.
-15bd/ Call lafs_add_block_address from writeback rather than iolock
+DONE 15bd/ Call lafs_add_block_address from writeback rather than iolock
in roll forward, just for consistency.
-15be/ Confirm various files loaded at mount time (segusage, orphan ...)
+DONE 15be/ Confirm various files loaded at mount time (segusage, orphan ...)
are actually the correct type.
-15bf/ Avoid quadratics in lafs_seg_put_all - nothing else should be doing
+DONE 15bf/ Avoid quadratics in lafs_seg_put_all - nothing else should be doing
a lookup - or at least we can test for that.
lafs_seg_apply_all has similar problems and needs a good solution.
-15bg/ lafs_seg_ref_block is worried about losing implicit ref on parent
+DONE 15bg/ lafs_seg_ref_block is worried about losing implicit ref on parent
if parent splits. See what to do about that.
-15bh/ after roll-forward, check that free_blocks hasn't gone negative.
+DONE 15bh/ after roll-forward, check that free_blocks hasn't gone negative.
or handle if it has.
DONE 15bi/ Set EmergencyClean a bit later - need at least one checkpoint first.
to twostage.
-15bj/ Make sure .last link in segtracker is kepts uptodate, particularly in
+DONE 15bj/ Make sure .last link in segtracker is kept uptodate, particularly in
segdelete.
-15bk/ make sure get_cleanable doesn't lose a race before calling add_clean
+DONE 15bk/ make sure get_cleanable doesn't lose a race before calling add_clean
-15bl/ better checks for 'valid state block address' in valid_devblock
+DONE 15bl/ better checks for 'valid state block address' in valid_devblock
include that segment_count is credible
also in valid_stateblock
15bq/ check readonly status in lafs_get_sb
-15br/ sync_fs should probably wait for something if 'wait'.
+DONE 15br/ sync_fs should probably wait for something if 'wait'.
-15bs/ set f_fsid properly in lafs_statfs
+DONE 15bs/ set f_fsid properly in lafs_statfs
- - use new write_begin / write_end
- - review how we ensure that credit remain with block.
+DONE - use new write_begin / write_end
+
+15bt/ - review how we ensure that credits remain with the block.
15ca/ When pin inode data block, pin it as well as index block I think
It is still kept of the leaf list until the index block is done with
15cc/ free any stray B_ASync block found in destroy_inode
15cd/ Some code assumes a cluster header does not exceed 1 page.
- Is this safe? Is in true? Is it enforced?
+ Is this safe? Is it true? Is it enforced?
+ roll-forward now handles large cluster_head.
+ Need cleaner to handle it, and need to possibly write large
+ cluster head when making new clusters.
15ce/ classify BUGs as
- internal logic errors
than just iolock, for consistency...
DONE 36f/ What to do if table becomes full when add_block_address in
roll_block ??
-36g/ Write roll_mini for directories.
+DONE 36g/ Write roll_mini for directories.
DONE 36h/ In roll_one, use the cluster counting code to find block number and
make sure we don't exceed the segment.
DONE 36i/ add more general error checking to lafs_mount -
Need owner/group/perm for device file, but not for symlink.
Can we create unique inode numbers?
hard links for dev-files would be problematic.
- What do we gain? Maybe something for sort symlinks.
- 40 seems a ood length to et 70% of symlinks.
+ What do we gain? Maybe something for short symlinks.
+ 40 seems a good length to get 70% of symlinks.
59/ Fix NeedFlush handling so we don't drop-then-retake
a mutex as that isn't sensible.
If a cross-directory rename happens care is needed: either flush updates
first or ensure that a flush does happen before the cross-directory
update is flushed.
+ Note that if the target of a rename is a directory, it must also be fully
+ flushed before the rename can proceed.
26June2010
Investigating 5a
It is OK to delay the write-out of these until an fsync, and not bother
if a checkpoint happens.
So add that to the TODO list - item 66.
+
+28feb2010
+ - roll forward directory updates ... I wonder if I got it right :-)(untested).
+
+
+ I don't seem to have easy-access notes about the various meanings of
+ 'width' and 'stride'.
+
+ width: The number of independent devices across which the (virtual) device
+ is placed. The normal goal is to write 'width' blocks on every single write.
+ On a RAID4/5/6 this will avoid the need to pre-read for parity calculations,
+ and it will keep all devices equally busy with writes.
+ The 'width' blocks probably aren't consecutive.
+
+ There are two different layouts - one with width*stride <= segment_size
+ and one with width*stride > segment_size.
+
+ width*stride <= segment_size
+ This is a traditional striped layout like RAID0/4/5/6.
+ The 'stride' is the chunk size, so 'width*stride' is the stripe size,
+ and segment_size must be a multiple of this.
+ In this case all addresses in a single segment are contiguous. We don't
+ necessarily write them in order if we want to write less than one stripe.
+ segment_offset will normally be a multiple of width*stride, though this isn't
+ enforced, as one could have a partition with a non-aligned start.
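For the striped case, a minimal sketch of the usual RAID0-style mapping from a virtual block address to a (device, block-on-device) pair. It assumes addresses are counted from the start of the array and ignores any segment_offset. struct layout, is_striped() and striped_map() are hypothetical names for illustration, not LaFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout descriptor; field names are illustrative only. */
struct layout {
	uint64_t width;        /* number of independent devices */
	uint64_t stride;       /* chunk size in blocks */
	uint64_t segment_size; /* blocks per segment */
};

/* Non-zero for the traditional striped layout (width*stride <= segment_size),
 * zero for the concatenated layout. */
static int is_striped(const struct layout *l)
{
	return l->width * l->stride <= l->segment_size;
}

/* RAID0-style mapping: consecutive 'stride'-block chunks of the virtual
 * address space rotate across the 'width' devices. */
static void striped_map(const struct layout *l, uint64_t vaddr,
			uint64_t *dev, uint64_t *dblock)
{
	uint64_t chunk, row;

	assert(l->width > 0 && l->stride > 0);
	/* segment_size must be a whole number of stripes */
	assert(l->segment_size % (l->width * l->stride) == 0);
	chunk = vaddr / l->stride;  /* which chunk overall */
	row = chunk / l->width;     /* which stripe the chunk belongs to */
	*dev = chunk % l->width;
	*dblock = row * l->stride + vaddr % l->stride;
}
```

Take width=4, stride=16 and segment_size=256. Virtual block 100 lies in chunk 6, which is stripe 1 on device 2, so it maps to block 20 of that device.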
+
+ width*stride > segment_size
+ This implies a concatenated layout. If parity redundancy is in use, then the
+ blocks which combine to form a stripe are 'stride' blocks apart.
+ The benefit of this layout is that an extra drive can be added by simply
+ zeroing it and joining it to the array - no re-stripe needed.
+ This will make all stripes slightly larger, so at first the new space will not
+ be available. As cleaning happens, the space will gradually become
+ available. This still requires restriping, but unlike a normal
+ RAID5 restripe, the space starts becoming available in small amounts immediately,
+ and when there is no demand for more space, the restriping (cleaning) can happen
+ at a very low priority with no cost.
+
+ In this case the blocks in a segment are not contiguous. Only runs of
+ 'segment_size/width' blocks are; after each run there is a large gap (in
+ virtual address space) to the next chunk.
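For the concatenated case, a minimal sketch of where the runs of one segment might land. It rests on assumptions that are not in these notes:
- Device k covers virtual addresses k*stride .. (k+1)*stride-1.
- segment_offset (the free space at the start of each device) is skipped.
- Segment n's run on every device starts n*(segment_size/width) blocks after that reserved area.

All names (struct layout, concat_run_len(), concat_segments(), concat_run_start()) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout descriptor (names are illustrative only). */
struct layout {
	uint64_t width;          /* number of devices */
	uint64_t stride;         /* virtual-address span of each device */
	uint64_t segment_size;   /* blocks per segment */
	uint64_t segment_offset; /* reserved blocks at the start of each device */
};

/* Length of each contiguous run of a segment: one run per device. */
static uint64_t concat_run_len(const struct layout *l)
{
	assert(l->width > 0 && l->segment_size % l->width == 0);
	return l->segment_size / l->width;
}

/* Number of whole segments that fit on each device after the reserved area. */
static uint64_t concat_segments(const struct layout *l)
{
	assert(l->segment_offset <= l->stride);
	return (l->stride - l->segment_offset) / concat_run_len(l);
}

/* Virtual address of the first block of segment 'seg' on device 'dev'.
 * Runs of the same segment on adjacent devices are exactly 'stride' apart,
 * so blocks at the same offset in each run form one parity stripe. */
static uint64_t concat_run_start(const struct layout *l, uint64_t seg,
				 uint64_t dev)
{
	assert(l->width * l->stride > l->segment_size); /* concatenated case */
	assert(dev < l->width && seg < concat_segments(l));
	return dev * l->stride + l->segment_offset + seg * concat_run_len(l);
}
```

Take width=3, stride=1000, segment_size=30 and segment_offset=10. Each segment is then three runs of 10 blocks. Segment 2 occupies 30..39, 1030..1039 and 2030..2039.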
+
+ The segment_offset is an amount of space which is free at the start of
+ each device. 0..segment_offset and stride..stride+segment_offset etc
+ do not contain data and can be used for metadata.
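A minimal sketch of the reserved-area test implied by the paragraph above. It assumes each device spans 'stride' blocks of virtual address space, so the reserved ranges repeat every 'stride' blocks. in_metadata_area() is a hypothetical helper, not LaFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if virtual address 'vaddr' falls in the reserved metadata area at
 * the start of some device, i.e. in [k*stride, k*stride + segment_offset)
 * for some k. */
static bool in_metadata_area(uint64_t vaddr, uint64_t stride,
			     uint64_t segment_offset)
{
	assert(stride > 0 && segment_offset <= stride);
	return vaddr % stride < segment_offset;
}
```

Take stride=1000 and segment_offset=10. Addresses 0..9, 1000..1009 and so on are reserved for metadata, while 10..999 can hold data.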
+
+ When width > 1 it makes sense to replicate each state block across
+ every device - as we want to write the whole stripe anyway.
+ For now we only write and read the first two copies at the beginning, and
+ the last two at the end...
+
+ Question: what do we want to do about metadata on flash devices? We really
+ don't want a small number of locations to store the metadata, but a large
+ number that we search through - possibly a binary search.
+ These could be all at start/end or scattered throughout the device.
+ The latter would make it impossible to find efficiently - there is no way to
+ create useful linkage without writing something else at the start or end.
+ As many devices optimise for random writes where the FAT table would be,
+ it makes sense to just put the metadata there and not at the end.
+ We should allow one 'page' for each metadatum, which probably means
+ 32K.
+ So we should allow all state blocks to be near the start.
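The question above only raises binary search as a possibility. Here is a minimal sketch of one way it could work, under hypothetical assumptions:
- State blocks are written round-robin into n fixed 32K slots near the start of the device.
- Sequence numbers strictly increase from one write to the next.
- Every slot has been written at least once.

Under these assumptions the slots' sequence numbers form a rotated sorted array, so the newest can be found after inspecting only O(log n) slots. SLOT_BYTES, slot_offset() and newest_slot() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_BYTES 32768u /* one 32K 'page' per metadatum (assumed) */

/* Byte offset of metadata slot 'i' from the start of the device. */
static uint64_t slot_offset(size_t i)
{
	return (uint64_t)i * SLOT_BYTES;
}

/* seq[i] is the sequence number stored in slot i.  Because slots are
 * written round-robin with increasing sequence numbers, the array is a
 * rotated sorted array: binary-search for the oldest entry (the minimum);
 * the newest entry is the one just before it, cyclically. */
static size_t newest_slot(const uint64_t *seq, size_t n)
{
	size_t lo = 0, hi;

	assert(n > 0);
	hi = n - 1;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (seq[mid] > seq[hi])
			lo = mid + 1; /* minimum lies to the right of mid */
		else
			hi = mid;     /* minimum is at mid or to its left */
	}
	return (lo + n - 1) % n;
}
```

Take slots holding sequence numbers {7, 8, 9, 4, 5, 6}. The oldest entry is slot 3, so the newest is slot 2, at byte offset 65536. A freshly created device with unwritten slots would need an extra "empty slot" check, which this sketch does not attempt.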