README update

author NeilBrown <neilb@suse.de>

Thu, 3 Sep 2009 05:28:21 +0000 (15:28 +1000)

committer NeilBrown <neilb@suse.de>

Thu, 3 Sep 2009 05:28:21 +0000 (15:28 +1000)
diff --git a/README b/README

index 65aa005284572e41f00b9439ec02b6eb5dbebe6f..97c055787af8fa99bfddc4a27ca94b2d391e630a 100644 (file)
--- a/README
+++ b/README
@@ -3363,3 +3363,240 @@ FIXED orphans don't get cleaned up.  It seems a 'create' fails and leaves
    but it gets messy for directories (on first attempt anyway).
    For directories we can just use i_mutex.
    Maybe i_mutex for files as well?
+
+27Aug2009
+  Orphan handling is going well... but not perfect.
+  I'm using IOLock to ensure exclusion for orphan handling.
+  However:
+    I'm not really implementing that on directories
+    Inodes go bad because lafs_erase_dblock needs the lock too.
+    The call from rmdir will always fail because we hold i_mutex.
+
+  Bigger problem.  I'm IOLocking inodes across checkpoints to preserve
+   Orphan status.  But that might stop the checkpoint proceeding.
+   .. so use i_mutex, not IOLock - fine.
+
+  Now... it seems I've confused myself.  Orphans don't get handled
+  immediately.  In particular, inodes should not be handled until
+  the final delete_inode.  So setting the B_Orphan flag and putting
+  on the list are two separate events.  The flag must come first,
+  but the list may come much later.  So some of that mucking around
+  with i_mutex is pointless.
+  So:
+    make_orphan makes sure it is in orphan file, sets bit, and removes
+      from list (if present).
+    add_orphan puts it on the list for handling.
+ 
+    For inodes: lafs_new_inode sets the bit and delete_inode puts on queue,
+        as does any unlink/rmdir/rename that fails.
+
+    For directories: put it on list in commit/abort.
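The split described above (flag first, list possibly much later) can be sketched as two separate operations. This is a minimal model only: struct blk, struct orphan_list and list_remove are hypothetical stand-ins, not the real LaFS structures, and the orphan-file bookkeeping is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a LaFS block. */
struct blk {
    bool orphan_flag;   /* models the B_Orphan flag */
    bool on_list;       /* queued for orphan handling */
    struct blk *next;
};

struct orphan_list {
    struct blk *head;
};

/* Unlink b from the list if it is there. */
static void list_remove(struct orphan_list *l, struct blk *b)
{
    struct blk **pp = &l->head;

    while (*pp && *pp != b)
        pp = &(*pp)->next;
    if (*pp)
        *pp = b->next;
    b->next = NULL;
    b->on_list = false;
}

/* Event 1: record orphan status.  Handling is deferred. */
static void make_orphan(struct orphan_list *l, struct blk *b)
{
    /* The real code would also make sure the orphan file has an entry. */
    b->orphan_flag = true;
    if (b->on_list)
        list_remove(l, b);
}

/* Event 2: queue the block for handling.  The flag must already be set. */
static void add_orphan(struct orphan_list *l, struct blk *b)
{
    assert(b->orphan_flag);
    if (b->on_list)
        return;             /* already queued */
    b->next = l->head;
    l->head = b;
    b->on_list = true;
}
```

In this model lafs_new_inode would correspond to a make_orphan call and delete_inode (or a failed unlink/rmdir/rename) to a later add_orphan call.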
+
+
+  And...
+    I hit the BUG where find_leaf wants an address of 0.
+      If an index block gets cleaned out it doesn't disappear
+      immediately.. there is no leaf to find in that direction.
+      We probably need to avoid non-Valid blocks or something...
+  And...
+    Orphans 0/299 to 0/329 and  0/280 are still on the list
+     but are not orphans.
+     Maybe I need to catch mutex_unlock to run the orphans??
+  And...
+    We underflow a segment through orphans at unmount.
+      We are cleaning and truncating at the same time.
+      The same block gets allocated to 0 and to 1225
+      in quick succession.
+      Problem is that we apply new address while in writeback
+      so a new lafs_allocated_block
+
+29Aug2009
+
+  Review of inodes in orphan list:
+    lafs_new_inode makes an orphan for a non-existent inode.
+    If the inode cannot be created, orphan_release is called.
+    If it can, a 'struct inode' is filled in with valid type
+    and nlink==1 (!!) and attached.  The inode will only be
+    detached when the refcnt hits 0, and the orphan list implies
+    a refcount, so if we ever find something on the orphan list
+    with a NULL my_inode, it must be very new and can be ignored.
+
+    When we find an inode block with a my_inode there are a few options:
+      if I_Trunc is set, we must progress truncation providing we can
+            get the i_mutex
+      else if I_Deleting we must delete the inode
+      else if nlink is 0, we remove from the list
+      else nlink > 0 and we must remove orphan status.
+    This means that if nlink is elevated, we need to be holding the mutex...
+    So don't elevate nlink any more...
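The option list above amounts to a small decision function. The sketch below is only a model of that ordering: struct inode_view, the enum names and classify_orphan are hypothetical, and the "retry later" outcome for an unavailable i_mutex is an assumption, since the notes only say truncation progresses "providing we can get the i_mutex".

```c
#include <stdbool.h>

/* Hypothetical outcomes for one pass over an orphaned inode block. */
enum orphan_action {
    ORPHAN_IGNORE,          /* no my_inode yet: very new, skip it */
    ORPHAN_TRUNCATE,        /* I_Trunc set and i_mutex obtained */
    ORPHAN_RETRY_LATER,     /* I_Trunc set but i_mutex busy (assumed) */
    ORPHAN_DELETE,          /* I_Deleting set */
    ORPHAN_DROP_FROM_LIST,  /* nlink == 0: nothing to do right now */
    ORPHAN_CLEAR_STATUS     /* nlink > 0: no longer an orphan */
};

/* Hypothetical snapshot of the fields the decision depends on. */
struct inode_view {
    bool attached;          /* my_inode is non-NULL */
    bool i_trunc;
    bool i_deleting;
    unsigned int nlink;
};

/* Apply the checks in the same order as the notes list them. */
enum orphan_action classify_orphan(const struct inode_view *v, bool got_i_mutex)
{
    if (!v->attached)
        return ORPHAN_IGNORE;
    if (v->i_trunc)
        return got_i_mutex ? ORPHAN_TRUNCATE : ORPHAN_RETRY_LATER;
    if (v->i_deleting)
        return ORPHAN_DELETE;
    if (v->nlink == 0)
        return ORPHAN_DROP_FROM_LIST;
    return ORPHAN_CLEAR_STATUS;
}
```

The last branch is the one that motivates "don't elevate nlink": a temporarily raised nlink would be read as "no longer an orphan" unless the caller held the mutex.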
+
+    When nlink becomes non-zero the block needs to be put back on the
+    orphan list (it must already be an orphan).  Also when we set
+    I_Deleting or I_Trunc it must go on the list.
+   .. OK, I think I have all of that.
+
+
+30Aug2009.
+   I have some weirdness that seems to be caused by the orphan stuff,
+   probably due to it all being async now.
+   - A deleted inode clears I_Trunc and then sets it again.  The only
+     explanation seems to be that delete_inode is being called again,
+     so I must be igrab-ing it again, maybe from cleaning.
+   - bits of directories aren't getting deleted.  Sometimes single
+     blocks, though the referred files are deleted.  Sometimes
+     the whole directory... More interestingly, those blocks then
+     don't get cleaned, so something about them means that they
+     don't get deleted and don't get cleaned either.
+
+   Even weirder... I just had a case where file 331 had a different
+   index block for every 4 data blocks...
+
+
+   FIXME:
+    - What stops pinned blocks from being flushed by bdflush in middle
+      of operation and so losing allocation?  Must make sure to set
+      them dirty very late.
+    - orphan_release can fail, so must make sure we can always call
+      it, even if my_inode is NULL.... but how?
+
+
+    - make_orphan could fail due to lack of space, which is not OK.
+      I made it loop, but I'm not 100% sure that is right... it isn't.
+      I need to pass down the 'I'm freeing space' flag, and I need to
+      not require that Credit or Dirty is set, etc.
+
+
+    - I seem to have a deadlock at unmount.
+       umount is waiting for lafs_checkpoint_lock_wait in
+          lafs_put_super
+       pdflush is in down_read in sync_supers
+       lafs_cleaner is iget_locked/ifind_fast/inode_wait
+                This is waiting for I_LOCK to be clear.
+      
+
+31Aug2009
+  - When a file shrinks and becomes level-0, make sure
+    old addresses get deallocated.  I seem to have
+    a directory where they didn't.
+
+  - Due to the fact that we over-preallocate, we really shouldn't
+    return ENOSPC until we have flushed dirty data and performed
+    a checkpoint??
+
+
+  - When I removed the last index from an inode
+    (Indirect type) it seems that I didn't write
+    out the corrected block..??
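The ENOSPC question in the list above (only fail after flushing dirty data and checkpointing) can be modelled as a retry around the reservation. This is a sketch under loud assumptions: struct fs_model, flush_and_checkpoint and reserve_blocks are hypothetical names, and the model simply assumes a checkpoint hands back whatever was over-preallocated.

```c
#include <errno.h>

/* Hypothetical model of free-space accounting. */
struct fs_model {
    long free_blocks;   /* blocks that can be reserved right now */
    long overreserved;  /* preallocated but unneeded; assumed to be
                         * released by a flush plus checkpoint */
};

/* Model of "flush dirty data and perform a checkpoint". */
static void flush_and_checkpoint(struct fs_model *fs)
{
    fs->free_blocks += fs->overreserved;
    fs->overreserved = 0;
}

/* Reserve 'want' blocks; report ENOSPC only after one flush+checkpoint. */
int reserve_blocks(struct fs_model *fs, long want)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        if (fs->free_blocks >= want) {
            fs->free_blocks -= want;
            return 0;
        }
        if (attempt == 0)
            flush_and_checkpoint(fs);   /* reclaim slack, then retry once */
    }
    return -ENOSPC;
}
```

With free_blocks = 2 and overreserved = 5, a request for 4 blocks fails the first check, succeeds after the checkpoint, and leaves 3 free; a later request for 10 still returns -ENOSPC.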
+
+1sep2009
+ I ran my simple test run repeatedly overnight.
+ It ran 208 times before I stopped it.
+ There are 3 possible failure modes:
+   1/ didn't complete within 500 seconds
+   2/ triggered a BUG
+   3/ appeared to complete, but the number of blocks
+      in use was not the correct '7'.
+
+ 74 (35%) did not fail!
+ 31 () did not complete
+ 40 () triggered a BUG
+ 2 did not complete but did not trigger a bug
+
+ 94 of those that failed did not have a BUG
+ 92 actually completed.  Of these:
+      1 final blocks 1
+      1 final blocks 110
+      1 final blocks 23
+      2 final blocks 12
+      5 final blocks 0
+      6 final blocks 10
+     11 final blocks 8
+     21 final blocks 11
+     44 final blocks 9
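The tallies above can be cross-checked with a little arithmetic. The sketch below only re-adds numbers already given in the notes; the names (RUNS, struct tally and the helper functions) are made up for the illustration.

```c
#include <stddef.h>

/* Figures taken directly from the notes. */
enum {
    RUNS = 208,             /* total test runs */
    NO_FAILURE = 74,        /* runs that did not fail */
    HIT_BUG = 40,           /* runs that triggered a BUG */
    HUNG_WITHOUT_BUG = 2    /* did not complete, but no BUG */
};

struct tally {
    int runs;
    int final_blocks;
};

/* "final blocks" histogram of runs that completed with a wrong count. */
static const struct tally completed[] = {
    { 1, 1 }, { 1, 110 }, { 1, 23 }, { 2, 12 }, { 5, 0 },
    { 6, 10 }, { 11, 8 }, { 21, 11 }, { 44, 9 },
};

/* Failed runs that did not hit a BUG: 208 - 74 - 40. */
int failed_without_bug(void)
{
    return RUNS - NO_FAILURE - HIT_BUG;
}

/* Of those, the ones that actually ran to completion. */
int completed_wrong_count(void)
{
    return failed_without_bug() - HUNG_WITHOUT_BUG;
}

/* Sum the run counts in a histogram. */
int tally_total(const struct tally *t, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += t[i].runs;
    return total;
}
```

The three figures agree: 208 - 74 - 40 = 94, 94 - 2 = 92, and the histogram rows also add up to 92.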
+
+ of the BUGs,
+      1 BUG: sleeping function called from invalid context at kernel/nsproxy.c:217
+      1 BUG: spinlock lockup on CPU#0, rm/1330, cfb2dae4
+      1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:485!
+      1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/cluster.c:1219!
+      1 kernel BUG at /home/neilb/work/nfsbrick/fs/module/inode.c:821!
+      2 BUG: soft lockup - CPU#0 stuck for 61s! [lafs_cleaner:1177]
+      3 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:1028!
+      3 kernel BUG at /home/neilb/work/nfsbrick/fs/module/segments.c:351!
+      5 kernel BUG at /home/neilb/work/nfsbrick/fs/module/lafs.h:276!
+      6 kernel BUG at /home/neilb/work/nfsbrick/fs/module/block.c:529!
+      7 BUG: unable to handle kernel paging request at 6b6b6bfb
+     11 kernel BUG at /home/neilb/work/nfsbrick/fs/module/super.c:655!
+
+
+ super.c:655 is "block is still pinned" at unmount time.
+  The block was always an InoIdx with a child.
+  Either inode 0 or 16.
+  child is held by various things:
+      [cfb555cc]16/1(2098)r131E:Valid,Async,SegRef,CN,CNI,UninCredit,PhysValid async(1) clean2(130)
+      [cfb554f0]16/0(1050)r25E:Valid,SegRef,CN,CNI,PhysValid clean2(25)
+      [cfa57c58]0/2(3676)r0E:Valid,Dirty,UninCredit,PhysValid
+      [cfa5bc58]0/2(3110)r0E:Valid,Dirty,UninCredit,PhysValid
+      [ce5b94f0]16/0(519)r1E:Valid,Async,SegRef,CN,CNI,PhysValid async(1)
+      [cfb4d4f0]16/0(4249)r1E:Valid,Async,SegRef,CN,CNI,PhysValid async(1)
+      [ce5ad4f0]16/0(612)r1E:Valid,Async,SegRef,CN,CNI,PhysValid async(1)
+      [ce5c2fc8]0/74(0)r129E:SegRef,C,Claimed,PhysValid clean2(129)
+      [cfa57c58]0/2(1895)r0E:Valid,Dirty,UninCredit,PhysValid
+      [cfb4d5cc]16/1(4543)r105E:Valid,SegRef,CN,CNI,UninCredit,PhysValid clean2(105)
+      [ce5754f0]16/0(1290)r178E:Valid,SegRef,CN,CNI,PhysValid clean2(178)
+
+ The "unable to handle kernel paging request" is always in
+ umount.
+     invalidate_inode_buffers(26/46)/lock_acquire
+
+
+ block.c:529
+    This is iblock valid when erasing a block
+    The block we are erasing is always 0/327 or 0/328.  It is
+    an orphan we are handling, iolocked but not always pinned
+
+ lafs.h:276
+    Map an iblock which is not IOLocked
+       always in lafs_clear_index for the InoIdx block for a directory
+       which is in Writeback.
+       Call is in lafs_allocated_block from cluster_flush.
+
+ segments.c:351
+    seg_inc reduces seg usage below 0
+      - lots of blocks (inode 327) that were cleaned, were then erased twice.
+      - 2 blocks (inode 328) were erased twice, both from prune
+      - ditto
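Why a double erase drives segment usage below zero, and one way an erase could be made idempotent, can be shown with a toy model. Nothing here is the actual LaFS code or its fix: struct seg_model, struct blk_model, seg_adjust and erase_block are hypothetical, and the per-block "accounted" flag is an assumption introduced for the illustration.

```c
#include <stdbool.h>

/* Hypothetical per-segment usage counter. */
struct seg_model {
    int usage;
};

/* Hypothetical block that remembers whether it is counted in its segment. */
struct blk_model {
    struct seg_model *seg;
    bool accounted;
};

/* Apply delta to the usage count, refusing any change that would go negative. */
bool seg_adjust(struct seg_model *s, int delta)
{
    if (s->usage + delta < 0)
        return false;   /* this is the case that trips the BUG */
    s->usage += delta;
    return true;
}

/* Erase a block; a second erase of the same block is a no-op. */
bool erase_block(struct blk_model *b)
{
    if (!b->accounted)
        return true;    /* already released its segment usage */
    if (!seg_adjust(b->seg, -1))
        return false;
    b->accounted = false;
    return true;
}
```

Without the accounted check, erasing the only block in a segment twice would call seg_adjust(-1) on a usage of 0, which is the underflow reported at segments.c:351.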
+
+ segments.c:1028
+     The free list is empty.... odd as only first segment is currently
+     in use.
+
+ soft lockup:
+     Still orphan: 0/328  Index(1) is in Writeback and Dirty
+       again inode_handle_orphan2 is in Writeback
+
+ inode.c:821
+     inode_handle_orphan at end, child list is not empty.
+       The children seem to be in Realloc - cleaner needs to let go.
+
+ cluster.c:1219
+     my_inode is NULL while cluster_flush handles an inode and wants to set
+        WritePhase.
+
+
+ block.c:485
+     no ICredit for unincredit in dirty_dblock from dir_delete_commit
+     from lafs_unlock.
+
+
+ spinlock lockup is subsequent to the real bug
+ ditto for sleeping function.
+
+ Of the '44' which claimed final blocks of 9, 14 really had 7, and 4
+ appear to have other strange values....
+
+ A selected '9' has two extra blocks for the directory '74'.
+ But that directory is long gone.
+ These dir blocks are currently fully populated with numbers.
+ This seems to be the pattern with all non-7 blocks.
+