umount
I need to stop the cleaner and flush everything before trying to
clean up.
+
+ This is awkward though.
+ The 'sync' of umount is done by kill_block_super, but I call
+ that rather late, after checking that the tree is empty.
+ There are pinned/dirty bits left after sync that we want to magically
+ clean.
+ We have:
+ - segusage/youth blocks. Maybe if we don't seg_apply_all...
+ - orphan block. Maybe don't mark it dirty when we remove things?
+ - inode map?? Why is that dirty?
+
+ - root directory is dirty still?? But it has been erased.
+ InoIdx is valid-but-empty. Inode Data is dirty
+ Data block 0 is Dirty at block 0.
+
+ ......
+ Ahh... I need to mark the page dirty when the block is marked dirty!!
+
+ The seg usage blocks are now flushed out but not incorporated.
+ I feel that might be correct - we don't want to care about
+ incorporation as we will never use it.
+ For this, segusage and quota are very special cases.
+
+ Inode map is no longer dirty, but is pinned
+ Orphan does have a dirty block still
+ The orphan table contains the root directory.
+ root is now clean and gone
+
+ Segusage doesn't get incorporated after last checkpoint now
+ so that is better.
+ But now we have a circular reference for SegRef. This should not
+ be surprising given the circular problems we had setting SegRef.
+ I guess we just erase the references in the segsum table...
+
+22nd July 2009
+ Hurray!!! I can unmount without crashing!
+ Now I need to sort through all the fixes required to achieve that
+ and make discrete patches, and be sure it is all OK.
+
+DONE - (block.c) lafs_get_block should not have to lock that page just to do a lookup.
+DONE - (block.c) Mark page dirty when block becomes dirty
+DONE - (checkpoint.c) print orphan_slot with Orphan flag
+DONE - Don't incorporate segcount etc after final checkpoint
+DONE - Don't apply seg changes after final checkpoint.
+DONE - Don't start opportunistic checkpoint after final.
+DONE - (checkpoint) if InoIdx isn't dirty but InodeData is, then still allocate
+DONE - (checkpoint) when waiting, wait for checkpointneeded to get cleared
+DONE - (cluster) be more flexible about credit usage when flushing InoIdx
+DONE - (dir) do add_orphan when we abort as well as on success
+DONE - use inode_dec_link_count, not i_nlink--
+DONE - (file.c) lafs_writepage: remove from leafs when we cluster_allocate
+DONE - change %d/%d to strblk
+DONE - (index.c) refile: if B_IOLOCK, then it isn't on LRU
+DONE - (index) refile: when unpinning, remove from lru
+ - lafs_refile: ->iblock can be non-null for inode 0.
+DONE - Make sure I_Deleting gets cleared when deleting finished.
+DONE - phase_flip should have something separate to call, not lafs_allocated_block
+ - inode.c: lafs_dirty_inode: getref_lock used to get dblock
+NONO - ?? getref_locked allowed if PagePrivate
+DONE - segment: lafs_seg_put_all needed at unmount
+DONE - segdelete_all: need to put intable references
+DONE - lafs_free_get: put the intable references
+DONE - lafs_get_cleanable: put the intable references
+DONE - fix sort splitting in add_cleanable
+DONE - add lafs_empty_segment_table for unmount
+DONE - lafs_release: flush all dirty blocks
+DONE - lafs_release: force a final checkpoint
+DONE - lafs_release: move kill_block_super before final check
+DONE - lafs_put_super: release orphans and segsum files.
+DONE - lafs_destroy_inode: putref should be 'iblock'
+ - lafs_destroy_inode: allow for iblock to be present but no ref held....
+DONE - can roll forward call lafs_allocated_block without dirty???
+
+27th July 2009.
+ - I've re-arranged lafs_release so that the flush is all done in
+ generic_shutdown_super. However it calls invalidate_inodes, and that has
+ problems with pinned inodes. So we need fsync_super to checkpoint
+ out all inodes that we don't hold our own reference to.
+ If we do hold a reference, then invalidate_inodes will skip them,
+ and ->put_super can be used to drop the references and perform the final
+ checkpoint.
+ fsync_super calls ->sync_fs after syncing all files. Maybe I can
+ do some sort of checkpoint there...
+ There is almost a checkpoint in there... but only when called without
+ 'wait'...
+ I need to understand 's_dirt'.
+ This is controlled entirely by the filesystem; common code only examines it.
+ If it is set:
+ file_fsync (the generic 'fsync' method) will call ->write_super
+ fsync_super will call write_super
+ generic_shutdown_super will call write_super
+ sync_supers will call write_super
+ sync_filesystems(0) will call ->sync_fs
+ sync_fs is called:
+ twice from 'sync', once with '0', once with '1' for 'wait'.
+ (though in emergency_sync, both are '0').
+ once from unmount and remount with 'wait' set to '1'.
+ We don't want two checkpoints for a 'sync', but we want to start
+ on 'wait=0'.
+ Maybe if we get called with '0', we set a flag and treat the '1'
+ differently. There is no locking to make this really safe, but
+ it will probably be OK... I could take a process_id, but then
+ parallel 'sync's could race.
+ write_super is called before the syncs. So it could start the checkpoint,
+ and sync could wait for it.
+ write_super is called multiple times at shutdown. We really need
+ to utilise s_dirt to avoid some of these.
+ We set s_dirt to 0 when we set CheckpointNeeded, and set it to 1:
+ - when we pin a dblock or dirty an iblock in the current phase.
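The wait=0/wait=1 idea above can be sketched as a tiny user-space model. Everything here (struct sync_model, start_checkpoint, mark_dirty, model_write_super, model_sync_fs) is hypothetical and only mirrors the plan in these notes; it is not the VFS or LaFS code. It assumes sync_fs(wait=0) always starts a checkpoint and records that fact so the following wait=1 call only waits, while s_dirt gates write_super so the repeated calls at shutdown do nothing after the first.

```c
#include <stdbool.h>

/* Hypothetical model of the relevant superblock state; this is not
 * struct super_block or any LaFS structure. */
struct sync_model {
	bool s_dirt;            /* set by the fs when a checkpoint is needed */
	bool started_by_nowait; /* sync_fs(wait=0) already started one */
	int  checkpoints;       /* number of checkpoints actually started */
};

/* Stand-in for kicking off an asynchronous checkpoint. */
static void start_checkpoint(struct sync_model *m)
{
	m->checkpoints++;
	m->s_dirt = false;      /* cleared when CheckpointNeeded is set */
}

/* Called when we pin a dblock or dirty an iblock in this phase. */
static void mark_dirty(struct sync_model *m)
{
	m->s_dirt = true;
}

/* ->write_super: only act if s_dirt says there is something to do. */
static void model_write_super(struct sync_model *m)
{
	if (m->s_dirt)
		start_checkpoint(m);
}

/* ->sync_fs: start on wait=0; on wait=1 just wait for the checkpoint
 * that wait=0 started instead of starting a second one.
 * NOTE: started_by_nowait is unlocked, so parallel syncs can race. */
static void model_sync_fs(struct sync_model *m, int wait)
{
	if (!wait) {
		start_checkpoint(m);
		m->started_by_nowait = true;
		return;
	}
	if (!m->started_by_nowait)
		start_checkpoint(m);    /* e.g. unmount/remount: only wait=1 */
	/* ... wait for the checkpoint to complete here ... */
	m->started_by_nowait = false;
}
```

With this model a plain 'sync' (wait=0 then wait=1) yields one checkpoint, unmount's single wait=1 call still yields one, and three write_super calls after one mark_dirty yield one.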
+
+29jul2009
+ At unmount, we iput the root inode, which de-references the dblock
+ before clearing ->iblock, and that fails an assertion... why?
+ Apart from the shrinker, ->iblock is only set to NULL in refile
+ when we find an I_Destroyed inode... I guess the root block isn't
+ getting Destroyed...
+ The protocol for freeing iblocks is bad. Should be:
+ - it only gets freed by the shrinker
+ - when inode dies, set ->inode to NULL
+ - when InoIdx iblock dies, set ->iblock to NULL
+ ...???
+30Jul2009
+ So, what exactly is the protocol?
+ - index blocks live either in the parent/sibling tree, or
+ on the inode's free_index list
+ - when refcnt is 0, they live on 'freelist.lru'. When refcount
+ is elevated they stay on lru until they need to be
+ added to some other lru (leafs or cluster)
+ - when shrinker finds block on freelist.lru with non-zero refcnt,
+ it just removes from lru
+ - when shrinker finds free block, it removes from free_index and discards
+ the block FIXME can refcnt=0 still have Pinned,Uninc,Realloc,Dirty ??
+ I think not, as such a block would either have children or be on an lru
+ - When we destroy an inode, all index blocks get disconnected from the
+ inode and freed. This must include the ->iblock
+ - When an index block becomes free due to index tree shrinkage,
+ we set the ->depth to -1 so that it cannot be found by mistake,
+ and leave it for shrinker or inode destruction.
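A minimal sketch of the shrinker side of this protocol, assuming a simplified block with only a refcount, a depth and list-membership flags. struct iblk, iblk_detach_from_tree, iblk_findable and shrink_one are hypothetical names, and list membership is modelled as booleans rather than real list_heads.

```c
#include <stdbool.h>

/* Hypothetical, simplified index block; not the real LaFS struct. */
struct iblk {
	int  refcnt;
	int  depth;            /* -1 once detached by tree shrinkage */
	bool on_freelist_lru;  /* on freelist.lru */
	bool on_free_index;    /* on the owning inode's free_index list */
	bool discarded;
};

/* Index tree shrank: make the block unfindable and park it on
 * free_index for the shrinker or inode destruction. */
static void iblk_detach_from_tree(struct iblk *b)
{
	b->depth = -1;
	b->on_free_index = true;
}

/* A lookup must never return a detached or discarded block. */
static bool iblk_findable(const struct iblk *b)
{
	return !b->discarded && b->depth >= 0;
}

/* One shrinker step on a block taken from freelist.lru.
 * Returns true if the block was discarded. */
static bool shrink_one(struct iblk *b)
{
	b->on_freelist_lru = false;  /* it leaves the lru either way */
	if (b->refcnt > 0)
		return false;        /* still referenced: just drop from lru */
	b->on_free_index = false;    /* free: unlink and discard */
	b->discarded = true;
	return true;
}
```

The depth of -1 is what keeps a detached block from being found by mistake between tree shrinkage and the shrinker actually discarding it.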
+
+ Confused about inode<->dblock dependence.
+ We don't want the inode to refcnt the dblock as that wastes space.
+ We don't want the dblock to refcnt the inode as that stops it from being freed.
+ So each must disconnect from other when freed.
+ What locking?
+ inode takes private_lock, then checks dblock
+ dblock cannot take private_lock before checking ->my_inode...
+ Maybe: destroy_inode takes ref on dblock, then sets I_Destroyed, then
+ drops ref
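The "Maybe" protocol above (take a ref on the dblock, set I_Destroyed, then drop the ref), together with the rule that each side disconnects from the other when freed, might look roughly like this. All names are hypothetical, and it assumes a single lock guards both pointers, which sidesteps the private_lock ordering question rather than answering it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_inode;

/* Hypothetical data block: non-counted back pointer to its inode. */
struct model_dblock {
	atomic_int refcnt;
	struct model_inode *my_inode;
};

/* Hypothetical inode: non-counted pointer to its dblock. */
struct model_inode {
	struct model_dblock *dblock;
	bool destroyed;              /* stands in for I_Destroyed */
};

/* One spinlock standing in for private_lock; assumed to guard both
 * the ->dblock and ->my_inode pointers. */
static atomic_flag private_lock = ATOMIC_FLAG_INIT;

static void lock_private(void)
{
	while (atomic_flag_test_and_set(&private_lock))
		;                    /* spin */
}

static void unlock_private(void)
{
	atomic_flag_clear(&private_lock);
}

/* destroy_inode: pin the dblock, mark destroyed, break both links,
 * then drop the temporary ref. */
static void model_destroy_inode(struct model_inode *ino)
{
	struct model_dblock *db;

	lock_private();
	db = ino->dblock;
	if (db) {
		atomic_fetch_add(&db->refcnt, 1);  /* cannot vanish under us */
		db->my_inode = NULL;
	}
	ino->destroyed = true;
	ino->dblock = NULL;
	unlock_private();

	if (db)
		atomic_fetch_sub(&db->refcnt, 1);  /* drop the temporary ref */
}

/* Freeing a dblock: disconnect from the inode if still linked. */
static void model_free_dblock(struct model_dblock *db)
{
	lock_private();
	if (db->my_inode)
		db->my_inode->dblock = NULL;
	db->my_inode = NULL;
	unlock_private();
}
```

Neither side holds a counted reference on the other, which matches the space and lifetime concerns above; the cost is that every teardown path has to clear both pointers under the lock.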
+
+1Aug2009.
+ Tracking down the 'credit' count and making sure it stays correct.
+ It seems that I have a Dirty InoIdx block which is not pinned.
+ Because of this it has no refcount, so the data block disappears and
+ the InoIdx block is no longer visible in the tree. This isn't a definite bug,
+ but it means I cannot count credits properly.
+ And surely Dirty index blocks must always be pinned!!??
+
+ When a small file is flushed to the inode we were dirtying the
+ iblock. That seems wrong; should it dirty the dblock instead? Need to
+ check that is valid.
+
+ I got a hang in 'rm adir/4'.
+ rm is in lafs_cluster_update_commit_both
+ getting a mutex.
+ cleaner is in lafs_do_checkpoint+0xe4
+ pdflush is in writepage/lafs_cluster_flush waiting on a lock
+ so I guess cleaner is holding a mutex and waiting for something
+ that won't happen?
+
+
+ Hang again at 'seq 1 200' in 'cd /mnt/1/adir'.
+ cleaner is at some point, holding a mutex to stop 'sh'.
+ 0xe4 == 228
+
+ Ahh... prepare checkpoint holds wc[0].lock while waiting for checkpoint
+ to be allowed.
+ So when something locks the checkpoint and needs to flush, we have problems....
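One conventional way to avoid sleeping with wc[0].lock held is to wait on a condition variable, which drops the mutex while asleep and re-takes it on wake-up. This is a user-space pthread sketch with hypothetical names (wc_lock, allowed_cond, flusher, prepare_checkpoint, run_checkpoint_with_flusher); the notes do not say this is how the hang was actually fixed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: wc_lock plays wc[0].lock, allowed_cond
 * signals "checkpoint is now allowed". */
static pthread_mutex_t wc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t allowed_cond = PTHREAD_COND_INITIALIZER;
static bool checkpoint_allowed;
static int flushes;

/* The "something" that blocks the checkpoint: it needs wc_lock to
 * flush, and only then lets the checkpoint proceed. */
static void *flusher(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&wc_lock);
	flushes++;
	checkpoint_allowed = true;
	pthread_cond_signal(&allowed_cond);
	pthread_mutex_unlock(&wc_lock);
	return NULL;
}

/* pthread_cond_wait() releases wc_lock while sleeping, so the flusher
 * can get in. Sleeping with the lock held is the hang seen above. */
static void prepare_checkpoint(void)
{
	pthread_mutex_lock(&wc_lock);
	while (!checkpoint_allowed)
		pthread_cond_wait(&allowed_cond, &wc_lock);
	/* ... wc_lock is held again here; proceed with the checkpoint ... */
	pthread_mutex_unlock(&wc_lock);
}

/* Run both sides; returns the number of flushes, or -1 on error. */
static int run_checkpoint_with_flusher(void)
{
	pthread_t t;

	checkpoint_allowed = false;
	flushes = 0;
	if (pthread_create(&t, NULL, flusher, NULL) != 0)
		return -1;
	prepare_checkpoint();
	pthread_join(t, NULL);
	return flushes;
}
```

Whichever thread runs first, this terminates: if the flusher wins, the predicate is already true and no wait happens; otherwise the waiter sleeps without the lock until signalled.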
+
+
+ I seem to have fixed the above. Now:
+ Free space is a real problem. When I remount after the successful unmount,
+ we find a usage pattern like:
+CLEANABLE: 0/0 y=10 u=34179
+CLEANABLE: 0/1 y=0 u=65144
+CLEANABLE: 0/2 y=0 u=65535
+CLEANABLE: 0/3 y=32773 u=32910
+CLEANABLE: 0/4 y=32772 u=149
+CLEANABLE: 0/5 y=0 u=0
+CLEANABLE: 0/6 y=32770 u=16529
+CLEANABLE: 0/7 y=32769 u=35084
+CLEANABLE: 0/8 y=32768 u=31877
+
+ Which is ridiculous.
+ Better fix up what I have first...