DONE 15i/ separate thread management from 'cleaner' name.
-15j/ review rules in getref_locked - and document them
+DONE 15j/ review rules in getref_locked - and document them
- - fix accesses to iblock
+DONE - fix accesses to iblock
DONE 15k/ newblocks should probably be a count of segments. Review that.
DONE 15ad/ review lafs_phase_flip's call to lafs_add_block_address and wonder
if more is needed.
-15ae/ refile wonders about a race with cluster_allocate which gets IOLock
+DONE 15ae/ refile wonders about a race with cluster_allocate which gets IOLock
before removing from lru.
-15af/ Review all locking in lafs_refile
+DONE 15af/ Review all locking in lafs_refile
-15ag/ Don't allocate data part of InoIdx block.
+DONE 15ag/ Don't allocate data part of InoIdx block.
-15ah/ Is there a problem with lafs_allocated_block putting an
+DONE 15ah/ Is there a problem with lafs_allocated_block putting an
about-to-be-truncated block on an uninc list?
-15ai/ When allocating a new segment during checkpoint, delay the
+DONE 15ai/ When allocating a new segment during checkpoint, delay the
youth-block update until after the checkpoint
-15aj/ When roll-forward finds a new segment, make sure youth number is
+DONE 15aj/ When roll-forward finds a new segment, make sure youth number is
updated.
15ak/ Load orphan file during roll-forward and make every block an
15al/ set filesystem update_time somewhere.
-15am/ filesystem 'name' needs to be handle uniformly.
+15am/ filesystem 'name' needs to be handled uniformly.
-15an/ can we be sure 'b' will be non-null in delete_inode?
+DONE 15an/ can we be sure 'b' will be non-null in delete_inode?
15ao/ determine what locking is needed to walk the children list
in lafs_inode_handle_orphans. Probably the address_space private lock.
-15ap/ Make sure write_inode has been cleaned up. See if this apply to
+15ap/ Make sure write_inode has been cleaned up. See if this applies to
rollforward of a symlink (see FIXME)
15aq/ change inode map to be little-endian, not host-endian
15bh/ after roll-forward, check that free_blocks hasn't gone negative.
or handle if it has.
-15bi/ Set EmergencyClean a bit later - need at least one checkpoint first.
+DONE 15bi/ Set EmergencyClean a bit later - need at least one checkpoint first.
to twostage.
15bj/ Make sure .last link in segtracker is kepts uptodate, particularly in
15bm/ make sure everything gets free properly on error during mount / lafs_load
-15bn/ How does refcountsing of 'struct fs' work with multiple filesets?
+15bn/ How does refcounting of 'struct fs' work with multiple filesets?
15bo/ use put_super to drop last refer to superblocks
-15bp/ review all superblock - maybe use more anon??
+15bp/ review all superblocks - maybe use more anon??
15bq/ check readonly status in lafs_get_sb
lafs_allocated_block, when this happens, checks that the parent is dirty/realloc as appropriate.
Inf this case, realloc isn't an issue, only dirty. lafs_incorporate must have made it dirty and
it won't get written while it has these in-phase children, so all is happy.
+
+ 15ae - refile race? Someone might set B_IOLock before removing from lru, so
+ onlru is 0 and refcnt is elevated so it doesn't seem to be unused.
+ But then whoever has the refer will refile again when dropping it and
+ so the right thing will be done.
+ But more generally, do we really want the lru etc to own a counted reference?
+ If it didn't:
+ - we would need to refile when removing from any list
+ - we would need to get a ref when removing from list.
+ uhmmm..
+
+ lafs_refile does:
+ clear PinPending if refcnt is low
+ unpin if not PinPending, or dirty etc and data or refcnt is low
+ place on leaf list - if pinned etc - this can be earlier
+ drop parent linkm if refcnt is low, and not pinned etc
+ handle dblock issues
+
+ if lru was not refcounted, then the only things we might do when refcnt isn't zero are:
+ unpin a dblock once it is not dirty
+ add to lru
+
+ But if we don't count lru, then we can lose the refcount on dblock
+
+ Hmmm - we cannot leave things on the leaf list forever as they thus hold a reference and
+ don't get freed.
+
+ I think I want things on 'leafs' list to not hold a counted reference.
+ Things *only* get removed while walking the list.
+ InoIdx blocks hold a ref on the dblock both when counted and some other time. Possibly
+ when pinned. This ensure they are held InoIdx is while a real leaf.
+ But: When we take that first ref, how do we know the dblock even exists?
+
+ What is the lifetime of ->dblock?
+ removed when page is released
+ set by lafs_import_inode
+ set by lafs_inode_dblock
+ removed by clear_inode
+ So if I don't hold a ref, I always need to be ready to call lafs_inode_dblock
+ This is currently callers of getiref_locked
+ - erase_dblock_locked ?? shouldn't need a lock
+ - ihash_lookup - never on InoIdx
+ - lafs_make_iblock - already have dblock
+ So none of those really need lafs_inode_dblock
+ What about when we set Pinned
+ only really from set_phase ... messy.
+ What about when we set ->parent
+ grow index tree - not relevant
+ ditto do_incorporate_*
+ block_adopt
+ Can be called on InoIdx from:
+ lafs_make_iblock only!!
+
+15sep2010
+
+ I have tidied lafs_refile up a lot but I need to make locking a lot cleaner.
+ In particular I want a single lock I can take when the refcnt hits zero which will ensure no ref
+ is taken until I have finished my cleanup. I suspect the inode private_lock is the one to use.
+ I also need to clean up getiref_locked and getref_locked - having both is awkward.
+
+ So: when are they called?
+
+ getref_locked:
+ lafs_get_flushable - hold fs->lock
+ first_in_seg - holds private_lock, but shouldn't need _locked as hold a ref through child.
+ (getiref_locked)
+ pin_all_children - hold private_lock
+ find_better - private_lock
+ getdref_locked
+ lafs_invalidate_page - to get a ref on each block to either erase or invalidate it
+ presumably page is locked
+ lafs_get_block - holds private_lock - plus once with only page_lock
+ lafs_release_page - holds private_lock
+ (getiref_locked on dblock) - no locking
+ lafs_inode_dblock - private_lock of my_inode...
+ lafs_delete_inode - private_lock of my_inode
+ lafs_destroy_inode - ditto
+ lafs_drop_inode - ditto
+ getiref_locked
+ erase_dblock_locked - private_lock
+ lafs_get_flushable - fs->lock
+ ihash_lookup - lafs_hash_lock
+ lafs_make_iblock - private_lock
+
+ So private_lock looks like a good choice. Issues are:
+ - what is the story with dblock on my_inode->private_lock
+ - what is the lock ordering
+ - what can refile negate that we need to be careful of.
+ i.e. we want to keep things stable while refile does its tests, but what do we need to keep
+ stable for others?
+ + we break the parent link?? and so the siblings link
+ + move things to freelist
+ + can put_page
+ + free dblock if not page_private
+
+ Lock_ordering. private_lock, then fs->lock, then lafs_hash_lock
+ So if we have to hold lafs_hash_lock, we increment refcnt, drop the lock, get/drop private_lock
+
+ This is getting messy - I need something nice and clear.
+ So:
+ Index Blocks.
+ If Pinned, either has references or is on a leaf list - possibly both
+ If no references and not pinned then not on leaf list, so can be on free list
+
+ Pinned can only be set when there are references, and can only be cleared under private_lock
+ This is violated by phase_flip, which badly reads refcnt
+ If refcnt is zero and not pinned, then can be moved to free_list
+ If on freelist and refcnt is zero under hash_lock, can be freed
+
+ So if lafs_get_flushable finds a block that is not pinned, then we can delete and ignore.
+ Someone else must hold a ref and will put it and it will refile. but that is pointless as
+ it could immediately be cleared after we test Pinned.
+
+ lafs_get_flushable should get a reference before deleting from list. This ensure it won't be freed
+ by lafs_shrinker, though it could be on the free list. If it is, then it isn't pinned so it is not
+ interestin to us.
+
+
+ Data Blocks:
+ These are removed from lru when freed - we just need the extra refcnt check after removing from list.
+ No we don't - these are only pinned while refcnt or dirty and can only loose dirty while refcnt
+ so they cannot disappear
+
+ What is the story with my_inode->private_lock though? This is used to protect ->dblock accesses.
+ I guess we need to get or hold the other lock .... look at what the race is - what else is checked when dblock is cleared?
+ dblock is cleared in refile for the dblock,
+ or in clear_inode under the inode rivate lock.
+
+ So:
+ There are various places that hold a non-counted reference to a block.
+ These include
+ - index hash table lafs_hash_lock
+ - index free list lafs_hash_lock
+ - phase_leafs / clean_leafs fs->lock only if pinned
+ - inode->iblock lafs_hash_lock
+ - inode->dblock inode->i_data.private_lock
+
+ Each of these is protected by its own lock, but not all the same lock.
+ When we turn one of these into a counted reference, we increment refcnt under the local lock,
+ then after dropping that lock we take and drop b->inode->i_data.private_lock to ensure refile has
+ finished. This must be done before changing/using the block in any way.
+ To free an index block it must first be removed from _leafs list. Then if the refcount is still
+ zero it can be freed - or put on freelist and subsequently freed.
+ An InoIdx block - we need to hold hash_lock as well as private_lock to take a reference.
+ To free a data block we similarly need to recheck refcnt after removing from leaf list.
+ If it is in an inode file we also take that inode's private_lock to clear dblock.
+ We use rcu to get the inode, the lock it, then clear dblock if refcnt is still zero.
+
+17sep2010
+ review lafs_refile - are some of those tests redundant? - yes, one is gone.
+
+ So:
+ 15ah - What about truncated blocks sitting on an uninc chain?
+ I don't see the problem. It will eventually get incorporated and do the right thing...
+
+ 15ai - We don't want to touch the youth block during a checkpoint else it is awkward to write it out in
+ a stable way.....
+ No, I don't think that is really a problem. It only gets written out in the tail of the checkpoint after
+ the root. I guess it could then get a youth number for a segment that it has no count for, if the root is
+ written at the end of one segment and the segusage/youth written at the start of the next.
+
+ But I think roll-forward is missing something. Blocks in the next phase need to be counted into segusage.
+ Are they? oh, yes - they are. - cleaned and index blocks are ignored so they might be some wasted space,
+ but the important blocks picked up by the roll-forward are handled.
+
+ So....
+
+ A checkpoint could cover multiple segments. We need to be sure these each get a valid youth number.
+ Probably most of them will, but we need a consistent approach to be sure.
+ They don't need to be added to the segtracker, except the last needs to be active, and it already is.
+ So as we find a new segment we want to do much like was lafs_free_get does youth_update.
+ But the data block - isn't that youthblk? When it that set?
+ segsum_find sets if it ssnum == 0