README update and minor cosmetic changes.

author NeilBrown <neilb@suse.de>

Sun, 2 Aug 2009 11:09:49 +0000 (21:09 +1000)

committer NeilBrown <neilb@suse.de>

Sun, 2 Aug 2009 11:09:49 +0000 (21:09 +1000)
diff --git a/README b/README
index 7d3d34682005cbce03590d46d51428f1ae10d27c..db8ffeda166eeffacc97269989b201eb5da14bfc 100644
--- a/README
+++ b/README
@@ -2530,3 +2530,197 @@ This is a list from 18 months ago, with updates
   umount
   I need to stop the cleaner and flush everything before trying to
   clean up.
+
+ This is awkward though.
+ The 'sync' of umount is done by kill_block_super, but I call
+ that rather late, after checking that the tree is empty.
+ There are pinned/dirty bits left after sync that we want to magically
+  clean.
+ We have:
+   - segusage/youth blocks.  Maybe if we don't seg_apply_all...
+   - orphan block.  Maybe don't mark it dirty when we remove things?
+   - inode map?? why is that dirty
+
+   - root directory is dirty still??  But it has been erased.
+     InoIdx is valid-but-empty.  Inode Data is dirty
+        Data block 0 is Dirty at block 0.
+
+  ......
+ Ahh... need to mark page dirty when block is marked dirty !!
+
+ The seg usage blocks are now flushed out but not incorporated.
+ I feel that might be correct - we don't want to care about
+ incorporation as we will never use it.
+ For this, segusage and quota are very special cases.
+
+ Inode map is no longer dirty, but is pinned
+ Orphan does have a dirty block still
+    The orphan table contains the root directory.
+ root is now clean and gone
+
+ Segusage doesn't get incorporated after last checkpoint now
+ so that is better.
+ But now we have a circular reference for SegRef.  This should not
+ be surprising given the circular problems we had setting SegRef.
+ I guess we just erase the references in the segsum table...
+
+22nd July 2009
+ Hurray!!! I can unmount without crashing!
+ Now I need to sort through all the fixes required to achieve that
+ and make discrete patches, and be sure it is all OK.
+
+DONE - (block.c) lafs_get_block should not have to lock that page just to do a lookup.
+DONE - (block.c) Mark page dirty when block becomes dirty
+DONE - (checkpoint.c) print orphan_slot with Orphan flag
+DONE - Don't incorporate segcount etc after final checkpoint
+DONE - Don't apply seg changes after final checkpoint.
+DONE - Don't start opportunistic checkpoint after final.
+DONE - (checkpoint) if InoIdx isn't dirty but InodeData is, then still allocate
+DONE - (checkpoint) when waiting, wait for checkpointneeded to get cleared
+DONE - (cluster) be more flexible about credit usage when flushing InoIdx
+DONE - (dir) do add_orphan when we abort as well as on success
+DONE - use inode_dec_link_count, not i_nlink--
+DONE - (file.c) lafs_writepage: remove from leafs when we cluster_allocate
+DONE - change %d/%d to strblk
+DONE - (index.c) refile: if B_IOLOCK, then it isn't on LRU
+DONE - (index) refile: when unpinning, remove from lru
+ - lafs_refile: ->iblock can be non-null for inode 0.
+DONE - Make sure I_Deleting gets cleared when deleting finished.
+DONE - phase_flip should have something separate to call, not lafs_allocated_block
+ - inode.c: lafs_dirty_inode: getref_lock used to get dblock
+NONO - ?? getref_locked allowed if PagePrivate
+DONE - segment: lafs_seg_put_all needed at unmount
+DONE - segdelete_all: need to put intable references
+DONE - lafs_free_get: put the intable references
+DONE - lafs_get_cleanable: put the intable references
+DONE - fix sort splitting in add_cleanable
+DONE - add lafs_empty_segment_table for unmount
+DONE - lafs_release: flush all dirty blocks
+DONE - lafs_release: force a final checkpoint
+DONE - lafs_release: move kill_block_super before final check
+DONE - lafs_put_super: release orphans and segsum files.
+DONE - lafs_destroy_inode: putref should be 'iblock'
+ - lafs_destroy_inode: allow for iblock to be present but no ref held....
+DONE - can roll forward call lafs_allocated_block without dirty???
+
+27th July 2009.
+ - I've re-arranged lafs_release so that the flush is all done in
+   generic_shutdown_super.  However it calls invalidate_inodes, and that has
+   problems with pinned inodes.  So we need for fsync_super to checkpoint
+   out all inodes that we don't hold our own reference to.  
+   If we do hold a reference, then invalidate_inodes will skip them,
+   and ->put_super can be used to drop the references and perform the final
+   checkpoint.
+   fsync_super calls ->sync_fs after syncing all files.  Maybe I can
+   do some sort of checkpoint there...
+   There almost is a checkpoint in there.... But only when called without
+   'wait'....
+   I need to understand 's_dirt'.
+   This is controlled entirely by the filesystem, common code only examines it.
+   If it is set:
+          file_fsync (the generic 'fsync' method) will call ->write_super
+          fsync_super will call write_super
+          generic_shutdown_super will call write_super
+          sync_supers will call write_super
+          sync_filesystems(0) will call ->sync_fs
+   sync_fs is called:
+        twice from 'sync', once with '0', once with '1' for 'wait'.
+             (though in emergency_sync, both are '0').
+        once from unmount and remount with 'wait' set to '1'.
+        We don't want two checkpoints for a 'sync', but we want to start
+        on 'wait=0'.
+        Maybe if we get called with '0', we set a flag and treat the '1'
+        differently..  There is no locking to make this really safe, but
+        it will probably be OK...  I could take a process_id, but then
+        parallel 'sync's could race.
+        write_super is called before the syncs.  So it could start the checkpoint,
+        and sync could wait for it.
+        write_super is called multiple times at shutdown.  We really need
+        to utilise s_dirt to avoid some of these.
+        We set s_dirt to 0 when we set CheckpointNeeded, and set it to 1:
+            - when we pin a dblock or dirty a this-phase iblock.
+
+29jul2009
+  at unmount, we iput the root inode which de-references the dblock
+  before clearing ->iblock, which fails an assertion ... why?
+   Apart from the shrinker, ->iblock is only set to NULL in refile
+   when we find an I_Destroyed inode... I guess the root block isn't
+   getting Destroyed...
+ The protocol for freeing iblocks is bad.  Should be:
+   - it only gets freed by the shrinker
+   - when inode dies, set ->inode to NULL
+   - when InoIdx iblock dies, set ->iblock to NULL
+   ...???
+30Jul2009
+  So, what exactly is the protocol?
+    - index blocks live either in the parent/sibling tree, or
+      on the inode's free_index list
+    - when refcnt is 0, they live on 'freelist.lru'.  When refcount
+      is elevated they stay on lru until they need to be 
+      added to some other lru (leafs or cluster)
+    - when shrinker finds block on freelist.lru with non-zero refcnt,
+      it just removes from lru
+    - when shrinker finds free block, it removes from free_index and discards
+      the block FIXME can refcnt=0 still have Pinned,Uninc,Realloc,Dirty ??
+        I think not as such would either have children or be on an lru
+    - When we destroy an inode, all index blocks get disconnected from the
+      inode and freed.  This must include the ->iblock
+    - When an index block becomes free due to index tree shrinkage,
+      we set the ->depth to -1 so that it cannot be found by mistake,
+      and leave it for shrinker or inode destruction.
+
+   Confused about inode<->dblock dependence.
+   We don't want the inode to refcnt the dblock as that wastes space.
+   We don't want the dblock to refcnt the inode as that stops it from being freed.
+   So each must disconnect from other when freed.
+   What locking?
+   inode takes private_lock, then checks dblock
+   dblock cannot take private_lock before checking ->my_inode..
+   Maybe: destroy_inode takes ref on dblock, then sets I_Destroyed, then
+     drops ref
+
+1Aug2009.
+  Tracking down the 'credit' count and making sure it stays correct.
+  It seems that I have a Dirty InoIdx block which is not pinned.
+  Due to this it has no refcount and so the data block disappears so
+  the InoIdx block is not visible in the tree.  This isn't a definite bug
+  but it means I cannot count credits properly.
+  And surely Dirty index blocks must always be pinned!!??
+
+  When a small file is flushed to the inode we were dirtying the
+  iblock.  That seems wrong - should dirty the dblock?  Need to 
+  check that is valid
+
+  I got a hang in 'rm adir/4'.
+  rm is in lafs_cluster_update_commit_both
+       getting a mutex.
+  cleaner is in lafs_do_checkpoint+0xe4
+  pdflush is in writepage/lafs_cluster_flush waiting on a lock
+  so I guess cleaner is holding a mutex and waiting for something
+   that won't happen?
+
+
+  Hang again at 'seq 1 200' in 'cd /mnt/1/adir'.
+   cleaner is at some point, holding a mutex to stop 'sh'.
+  0e4 == 228
+
+  ahh.. prepare checkpoint holds wc[0].lock while waiting for checkpoint
+   to be allowed.
+  So when something locks the checkpoint and needs to flush, we have problems....
+
+
+  I seem to have fixed the above.  Now:
+    Free space is a real problem.  When I remount after the successful unmount,
+    we find a usage pattern like:
+CLEANABLE: 0/0 y=10 u=34179
+CLEANABLE: 0/1 y=0 u=65144
+CLEANABLE: 0/2 y=0 u=65535
+CLEANABLE: 0/3 y=32773 u=32910
+CLEANABLE: 0/4 y=32772 u=149
+CLEANABLE: 0/5 y=0 u=0
+CLEANABLE: 0/6 y=32770 u=16529
+CLEANABLE: 0/7 y=32769 u=35084
+CLEANABLE: 0/8 y=32768 u=31877
+
+    Which is ridiculous. 
+   Better fix up what I have first...
diff --git a/checkpoint.c b/checkpoint.c
index dd6fb8e75293a831e8c20427ff5009b7caeff13c..baad30e3ea888dfc6fe9a4cf393ec9564c944ac9 100644
--- a/checkpoint.c
+++ b/checkpoint.c
@@ -488,6 +488,7 @@ unsigned long lafs_do_checkpoint(struct fs *fs)
  unsigned long long lafs_checkpoint_start(struct fs *fs)
  {
         unsigned long long cp = fs->wc[0].cluster_seq;
+       WARN_ON(test_bit(FinalCheckpoint, &fs->fsstate));
         set_bit(CheckpointNeeded, &fs->fsstate);
         fs->prime_sb->s_dirt = 0;
         lafs_wake_cleaner(fs);
diff --git a/dir.c b/dir.c
index 431b2acb1e8d13b6847a61ab2fba654b3b8f078e..b1a35a84bb0cb82ee3b717c8b740ce15b6c72225 100644
--- a/dir.c
+++ b/dir.c
@@ -755,8 +755,7 @@ lafs_unlink(struct inode *dir, struct dentry *de)
         struct datablock *inodb;
         int err;
  
-       dprintk("enter unlink: refcnt = %d\n",
-               atomic_read(&LAFSI(inode)->dblock->b.refcnt));
+       dprintk("unlink %s\n", de->d_name.name);
  
         err = dir_delete_prepare(fs, dir, de->d_name.name, de->d_name.len,
                                  &doh);
@@ -770,8 +769,6 @@ lafs_unlink(struct inode *dir, struct dentry *de)
         lafs_checkpoint_lock(fs);
  
         err = dir_delete_pin(&doh);
-       if (err)
-               printk("E err=%d\n", err);
         err = err ?: lafs_cluster_update_pin(&uh);
         err = err ?: lafs_pin_dblock(inodb);
         if (err == 0 && last)
@@ -823,6 +820,8 @@ lafs_rmdir(struct inode *dir, struct dentry *de)
         if (inode->i_size || inode->i_nlink > 2)
                 return -ENOTEMPTY;
  
+       dprintk("rmdir %s\n", de->d_name.name);
+
         err = dir_delete_prepare(fs, dir, de->d_name.name, de->d_name.len,
                                  &doh);
         err = dir_log_prepare(&uh, fs, &de->d_name) ?: err;
@@ -1119,6 +1118,8 @@ lafs_rename(struct inode *old_dir, struct dentry *old_dentry,
                                 return -EMLINK;
                 }
         }
+       dprintk("rename %s %s\n", old_dentry->d_name.name,
+              new_dentry->d_name.name);
  
         /* old entry gets deleted, new entry gets created or updated. */
         err = dir_delete_prepare(fs, old_dir,
diff --git a/orphan.c b/orphan.c
index 7cd68f5911994d0a3a86e3727d54c7c2f645fbe0..a1889b44cadd042d3b57888a26e76321b5c9e156 100644
--- a/orphan.c
+++ b/orphan.c
@@ -58,8 +58,8 @@ void lafs_dump_orphans(void)
         mutex_lock_nested(&dfs->orphans->i_mutex, I_MUTEX_QUOTA);
  
         om = &LAFSI(dfs->orphans)->md.orphan;
-       printk("nextfree = %u", (unsigned)om->nextfree);
-       printk("reserved = %u", (unsigned)om->reserved);
+       printk("nextfree = %u\n", (unsigned)om->nextfree);
+       printk("reserved = %u\n", (unsigned)om->reserved);
  
         for (slot = 0; slot < om->nextfree; slot++) {
                 struct orphan *or;
diff --git a/segments.c b/segments.c
index c4585e336738b1cbd173825729840c3162248783..dc6e2e4115f7e613fc21d7bab3e2ec4c4704706c 100644
--- a/segments.c
+++ b/segments.c
@@ -1064,6 +1064,8 @@ void lafs_free_get(struct fs *fs, unsigned int *dev, u32 *seg,
                 ssum = segsum_find(fs, ss->segment, ss->dev, ssnum);
                 if (IS_ERR(ss))
                         /* ?? what do I need to release etc */
+                       /* Maybe this cannot fail because we own references
+                        * to the two blocks !! */
                         BUG();
                 lafs_checkpoint_lock(fs);
                 (void)lafs_reserve_block(&ssum->ssblk->b, CleanSpace);