but it gets messy for directories (on first attempt anyway).
For directories we can just use i_mutex.
Maybe i_mutex for files as well?
+
+27Aug2009
+ Orphan handling is going well... but not perfect.
+ I'm using IOLock to ensure exclusion for orphan handling.
+ However:
+ I'm not really implementing that on directories
+ Inodes go bad because lafs_erase_dblock needs the lock too.
+ The call from rmdir will always fail because we hold i_mutex.
+
+ Bigger problem: I'm IOLocking inodes across checkpoints to preserve
+ orphan status, but that might stop the checkpoint proceeding.
+ .. so use i_mutex, not IOLock - fine.
+
+ Now... it seems I've confused myself. Orphans don't get handled
+ immediately. In particular, inodes should not be handled until
+ their final delete_inode. So setting the B_Orphan flag and putting
+ on the list are two separate events. The flag must come first,
+ but the list may come much later. So some of that mucking around
+ with i_mutex is pointless.
+ So:
+ make_orphan makes sure it is in the orphan file, sets the bit, and
+ removes it from the list (if present).
+ add_orphan puts it on the list for handling.
+
+ For inodes: lafs_new_inode sets the bit and delete_inode puts on queue,
+ as does any unlink/rmdir/rename that fails.
+
+ For directories: put it on list in commit/abort.
+
+
+ And...
+ I hit the BUG where find_leaf wants an address of 0.
+ If an index block gets cleaned out it doesn't disappear
+ immediately.. there is no leaf to find in that direction.
+ We probably need to avoid non-Valid blocks or something...
+ And...
+ Orphans 0/299 to 0/329 and 0/280 are still on the list
+ but are not orphans.
+ Maybe I need to catch mutex_unlock to run the orphans??
+ And...
+ We underflow a segment through orphans at unmount.
+ We are cleaning and truncating at the same time.
+ The same block gets allocated to 0 and to 1225
+ in quick succession.
+ Problem is that we apply new address while in writeback
+ so a new lafs_allocated_block
+
+29Aug2009
+
+ Review of inodes in orphan list:
+ lafs_new_inode makes an orphan for a non-existent inode.
+ If the inode cannot be created, orphan_release is called.
+ If it can, a 'struct inode' is filled in with valid type
+ and nlink==1 (!!) and attached. The inode will only be
+ detached when the refcnt hits 0, and the orphan list implies
+ a refcount, so if we ever find something on the orphan list
+ with a NULL my_inode, it must be very new and can be ignored.
+
+ When we find an inode block with a my_inode there are a few options:
+ if I_Trunc is set, we must progress truncation provided we can
+ get the i_mutex
+ else if I_Deleting we must delete the inode
+ else if nlink is 0, we remove from the list
+ else nlink > 0 and we must remove orphan status.
+ This means that if nlink is elevated, we need to be holding the mutex...
+ So don't elevate nlink any more...
+
+ When nlink becomes non-zero the block needs to be put back on the
+ orphan list (it must already be an orphan). Also when we set
+ I_Deleting or I_Trunc it must go on the list.
+ .. OK, I think I have all of that.
+
+
+30Aug2009.
+ I have some weirdness that seems to be caused by the orphan stuff,
+ probably due to it all being async now.
+ - A deleted inode clears I_Trunc and then sets it again. The only
+ explanation seems to be that delete_inode is being called again,
+ so I must be igrab-ing it again, maybe from cleaning.
+ - bits of directories aren't getting deleted. Sometimes single
+ blocks, though the referred files are deleted. Sometimes
+ the whole directory... More interestingly, those blocks then
+ don't get cleaned, so something about them means that they
+ don't get deleted and don't get cleaned either.
+
+ Even weirder... I just had a case where file 331 had a different
+ index block for every 4 data blocks...
+
+
+ FIXME:
+ - What stops pinned blocks from being flushed by bdflush in the middle
+ of an operation, thereby losing the allocation? Must make sure to set
+ them dirty very late.
+ - orphan_release can fail, so must make sure we can always call
+ it, even if my_inode is NULL.... but how?
+
+
+ - make_orphan could fail due to lack of space, which is not OK.
+ I made it loop, but I'm not 100% sure that is right... it isn't.
+ I need to pass down the 'I'm freeing space' flag, and I need to
+ not require Credit or Dirty to be set, etc.
+
+
+ - I seem to have a deadlock at unmount.
+ umount is waiting for lafs_checkpoint_lock_wait in
+ lafs_put_super
+ pdflush is in down_read in sync_supers
+ lafs_cleaner is iget_locked/ifind_fast/inode_wait
+ This is waiting for I_LOCK to be clear.
+
+
+31Aug2009
+ - When a file shrinks and becomes level-0, make sure
+ old addresses get deallocated. I seem to have
+ a directory where they didn't.
+
+ - Due to the fact that we over-preallocate, we really shouldn't
+ return ENOSPC until we have flushed dirty data and performed
+ a checkpoint??
+
+
+ - When I removed the last index from an inode
+ (Indirect type) it seems that I didn't write
+ out the corrected block..??
+
+1sep2009
+ I ran my simple test run repeatedly overnight.
+ It ran 208 times before I stopped it.
+ There are 3 possible failure modes:
+ 1/ didn't complete within 500 seconds
+ 2/ triggered a BUG
+ 3/ appeared to complete, but the number of blocks
+ in use was not the correct '7'.
+
+ 74 (35%) did not fail!
+ 31 (15%) did not complete
+ 40 (19%) triggered a BUG
+ 2 did not complete but did not trigger a bug
+
+ 94 of those that failed did not have a BUG
+ 92 actually completed. Of these:
+ 1 final blocks 1
+ 1 final blocks 110
+ 1 final blocks 23
+ 2 final blocks 12
+ 5 final blocks 0
+ 6 final blocks 10
+ 11 final blocks 8
+ 21 final blocks 11
+ 44 final blocks 9
+
+ of the BUGs,
+ 1 BUG: sleeping function called from invalid context at kernel/nsproxy.c:217
+ 1 BUG: spinlock lockup on CPU#0, rm/1330, cfb2dae4
+ 1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:485!
+ 1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/cluster.c:1219!
+ 1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/inode.c:821!
+ 2 BUG: soft lockup - CPU#0 stuck for 61s! [lafs_cleaner:1177]
+ 3 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:1028!
+ 3 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:351!
+ 5 kernel BUG at /home/neilb/work/nfsbrick/fs/module/lafs.h:276!
+ 6 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:529!
+ 7 BUG: unable to handle kernel paging request at 6b6b6bfb
+ 11 kernel BUG at /home/neilb/work/nfsbrick/fs/module/super.c:655!
+
+
+ super.c:655 is "block is still pinned" at unmount time.
+ The block was always an InoIdx with a child.
+ Either inode 0 or 16.
+ child is held by various things:
+ [cfb555cc]16/1(2098)r131E:Valid,Async,SegRef,CN,CNI,UninCredit,PhysValid async(1) clean2(130)
+ [cfb554f0]16/0(1050)r25E:Valid,SegRef,CN,CNI,PhysValid clean2(25)
+ [cfa57c58]0/2(3676)r0E:Valid,Dirty,UninCredit,PhysValid
+ [cfa5bc58]0/2(3110)r0E:Valid,Dirty,UninCredit,PhysValid
+ [ce5b94f0]16/0(519)r1E:Valid,Async,SegRef,CN,CNI,PhysValid async(1)
+ [cfb4d4f0]16/0(4249)r1E:Valid,Async,SegRef,CN,CNI,PhysValid async(1)
+ [ce5ad4f0]16/0(612)r1E:Valid,Async,SegRef,CN,CNI,PhysValid async(1)
+ [ce5c2fc8]0/74(0)r129E:SegRef,C,Claimed,PhysValid clean2(129)
+ [cfa57c58]0/2(1895)r0E:Valid,Dirty,UninCredit,PhysValid
+ [cfb4d5cc]16/1(4543)r105E:Valid,SegRef,CN,CNI,UninCredit,PhysValid clean2(105)
+ [ce5754f0]16/0(1290)r178E:Valid,SegRef,CN,CNI,PhysValid clean2(178)
+
+ The "unable to handle kernel paging request" is always in
+ umount.
+ invalidate_inode_buffers(26/46)/lock_acquire
+
+
+ block.c:529
+ This fires when the iblock is still valid while erasing a block.
+ The block we are erasing is always 0/327 or 0/328. It is
+ an orphan we are handling, iolocked but not always pinned
+
+ lafs.h:276
+ Map an iblock which is not IOLocked
+ always in lafs_clear_index for the InoIdx block for a directory
+ which is in Writeback.
+ Call is in lafs_allocated_block from cluster_flush.
+
+ segments.c:351
+ seg_inc reduces seg usage below 0
+ - lots of blocks (inode 327) that were cleaned were then erased twice.
+ - 2 blocks (inode 328) were erased twice, both from prune
+ - ditto
+
+ segments.c:1028
+ The free list is empty.... odd, as only the first segment is currently
+ in use.
+
+ soft lockup:
+ Still orphan: 0/328 Index(1) is in Writeback and Dirty
+ again inode_handle_orphan2 is in Writeback
+
+ inode.c:821
+ At the end of inode_handle_orphan, the child list is not empty.
+ The children seem to be in Realloc - the cleaner needs to let go.
+
+ cluster.c:1219
+ my_inode is NULL while cluster_flush-ing an inode and wanting to set
+ WritePhase.
+
+
+ block.c:485
+ no ICredit for unincredit in dirty_dblock from dir_delete_commit
+ from lafs_unlock.
+
+
+ The spinlock lockup is subsequent to a real bug;
+ ditto for the sleeping function.
+
+ Of the '44' which claimed final blocks of 9, 14 really had 7, and 4
+ appear to have other strange values....
+
+ A selected '9' has two extra blocks for the directory '74'.
+ But that directory is long gone.
+ These dir blocks are currently fully populated with numbers.
+ This seems to be the pattern with all non-7 blocks.
+