git.neil.brown.name Git - LaFS.git/log
13 years agoFix calculation of table_size
NeilBrown [Fri, 1 Oct 2010 12:40:13 +0000 (22:40 +1000)]
Fix calculation of table_size

I was confused about which table I was sizing.
This is the table of which there are several in the segusage files.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMinor formatting improvements.
NeilBrown [Fri, 1 Oct 2010 12:38:48 +0000 (22:38 +1000)]
Minor formatting improvements.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix bug in sort_block
NeilBrown [Fri, 24 Sep 2010 01:43:42 +0000 (11:43 +1000)]
Fix bug in sort_block

It doesn't handle 2 blocks the same - which isn't a big deal, but it
is best to have the code 'right'

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUpdate filesys mtime
NeilBrown [Sun, 19 Sep 2010 12:09:00 +0000 (22:09 +1000)]
Update filesys mtime

Do this when inode is dirtied.  Maybe this isn't the perfect time,
but it is fairly good for now.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agouse i_mtime for file-set update time.
NeilBrown [Sun, 19 Sep 2010 11:59:35 +0000 (21:59 +1000)]
use i_mtime for file-set update time.

No need to have a separate field in the inode.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoProcess orphan file during mount.
NeilBrown [Sun, 19 Sep 2010 11:46:46 +0000 (21:46 +1000)]
Process orphan file during mount.

We need to read the orphan file to set nextfree, and then to
add all the blocks to the orphan list.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Sun, 19 Sep 2010 04:42:24 +0000 (14:42 +1000)]
README update

13 years agoroll-forward: update youth block for new segments.
NeilBrown [Sun, 19 Sep 2010 04:39:31 +0000 (14:39 +1000)]
roll-forward: update youth block for new segments.

Any new segments found during roll-forward need their youth value set.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUpdate youth during seg_apply_all
NeilBrown [Sun, 19 Sep 2010 04:23:18 +0000 (14:23 +1000)]
Update youth during seg_apply_all

If we added blocks to a segment in seg_apply_all, make sure the youth
has been updated properly.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agosplit out set_youth.
NeilBrown [Sun, 19 Sep 2010 04:16:00 +0000 (14:16 +1000)]
split out set_youth.

Setting of the youth value is now a separate function.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoGeneralise segunused to seg_pop
NeilBrown [Sun, 19 Sep 2010 04:05:41 +0000 (14:05 +1000)]
Generalise segunused to seg_pop

And use seg_pop more broadly,
and re-arrange some code.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoIntroduce DelayYouth
NeilBrown [Sat, 18 Sep 2010 13:00:48 +0000 (23:00 +1000)]
Introduce DelayYouth

When writing out the accounting blocks we need to not update the youth
block if we happen to start a new segment.
We already do that at unmount time, so generalise it with a new flag.

Signed-Off-By: NeilBrown <neilb@suse.de>
13 years agoUse segsum rather than separate dblock for youth in lafs_free_get
NeilBrown [Sat, 18 Sep 2010 12:40:54 +0000 (22:40 +1000)]
Use segsum rather than separate dblock for youth in lafs_free_get

As segsum now always has a youthblk for ss==0, we don't need to
separately find a data block.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agosegment.c: allow the else branch to just fall-through
NeilBrown [Sat, 18 Sep 2010 12:25:42 +0000 (22:25 +1000)]
segment.c: allow the else branch to just fall-through

The 'then' branch does a goto, so we don't need the else.
This patch is mostly just re-indenting.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agosegments: swap then/else branches.
NeilBrown [Sat, 18 Sep 2010 12:24:36 +0000 (22:24 +1000)]
segments: swap then/else branches.

Swap 'then' and 'else' branches, inverting condition

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFactor out seg_add_new
NeilBrown [Sat, 18 Sep 2010 06:38:46 +0000 (16:38 +1000)]
Factor out seg_add_new

We had multiple places that add entries to the segtracker.  Now we
only have one.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoadd_block_address: be careful with physaddr == 0
NeilBrown [Fri, 17 Sep 2010 06:00:15 +0000 (16:00 +1000)]
add_block_address: be careful with physaddr == 0

An address of '0' is not consecutive with an address of '1' and while
it is very unlikely to ever be a problem, make sure we don't try to
combine those addresses into a range in the uninc table.

Also remove a comment about a possible problem that doesn't seem to be
a real problem.

Signed-off-by: NeilBrown <neilb@suse.de>
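The range check described in this commit can be sketched in plain C. This is only an illustration, not the LaFS code: `can_extend_range` and its parameters are hypothetical names, and the sketch assumes that a physical address of 0 is a "no address" marker that must never be merged into a run.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the LaFS source): decide whether a block at
 * next_phys may be appended to a run that starts at range_start and covers
 * range_len blocks.
 */
static bool can_extend_range(uint64_t range_start, uint32_t range_len,
                             uint64_t next_phys)
{
    /* Assumption: 0 means "no address", so it never starts or joins a run. */
    if (range_start == 0 || next_phys == 0)
        return false;

    /* Otherwise the new block must sit directly after the existing run. */
    return next_phys == range_start + range_len;
}
```

Without the explicit zero test, a run "starting" at 0 with length 1 would accept address 1 as its continuation, which is the case the commit guards against.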
13 years agoSimplify test in lafs_refile
NeilBrown [Fri, 17 Sep 2010 05:47:34 +0000 (15:47 +1000)]
Simplify test in lafs_refile

If index block has non-zero pending_cnt, then it must be
dirty or realloc, so we can drop a test here.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_refile: lots of locking and refcount changes.
NeilBrown [Fri, 17 Sep 2010 05:38:10 +0000 (15:38 +1000)]
lafs_refile: lots of locking and refcount changes.

See the comments added to lafs.h for more detail.

There are really several changes and fixes in here but I don't think
it is worth the effort to separate them all out.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agore-indent lafs_file.
NeilBrown [Wed, 15 Sep 2010 05:20:36 +0000 (15:20 +1000)]
re-indent lafs_file.

This patch is simply a re-indent.  No code change.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_refile: group all functionality that requires refcnt==0
NeilBrown [Wed, 15 Sep 2010 05:19:50 +0000 (15:19 +1000)]
lafs_refile: group all functionality that requires refcnt==0

There are multiple silly (racy) tests on refcnt==dec.  Combine all
these tests into a single 'if' which is run only when the refcnt
reaches zero.

This patch just changes the code without fixing the indentation.  That
makes it easier to review.

Signed-off-by: NeilBrown <neilb@suse.de>
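The "single 'if' which is run only when the refcnt reaches zero" pattern can be shown with a userspace sketch. It uses C11 atomics as a stand-in for the kernel's atomic helpers; `struct block`, `put_block` and the `freed` flag are hypothetical and do not reflect the real lafs_refile.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical block with a reference count. */
struct block {
    atomic_int refcnt;
    bool freed;            /* stand-in for the "last reference dropped" work */
};

/* Drop one reference; do the zero-refcount work in exactly one place. */
static void put_block(struct block *b)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&b->refcnt, 1) == 1) {
        /* Only the caller that dropped the last reference gets here. */
        b->freed = true;
    }
}
```

Testing the result of the decrement itself, rather than re-reading the counter afterwards, is what avoids the racy repeated comparisons the commit message mentions.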
13 years agolafs_refile: factor out check for unpinnable data blocks.
NeilBrown [Wed, 15 Sep 2010 04:57:02 +0000 (14:57 +1000)]
lafs_refile: factor out check for unpinnable data blocks.

Data blocks can be unpinned while refcounts are still present.

Factor that case out into a stand-alone test.

We still test for datablocks being unpinnable in the 'refcnt==0'
case as PinPending might have only just been cleared.

This leaves a large bulk of tests all of which depend on
refcnt reaching zero which can be improved in a subsequent patch.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoSmall lafs_refile clean up
NeilBrown [Wed, 15 Sep 2010 04:34:23 +0000 (14:34 +1000)]
Small lafs_refile clean up

We don't need the 'onlru' variable any more.
And we shouldn't really delete data blocks from the lru at that
point, as they could be on a cluster or io-pending list.
Index blocks are safe as their refcount is zero, so they can only
be on the leaf lru.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoDon't hold counted reference while on leafs list.
NeilBrown [Wed, 15 Sep 2010 03:58:39 +0000 (13:58 +1000)]
Don't hold counted reference while on leafs list.

Doing so makes the refcount calculations in lafs_refile messy.

We don't drop a block while it is dirty etc anyway so there is no loss

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoHold reference while erasing blocks in lafs_invalidate_page
NeilBrown [Wed, 15 Sep 2010 03:56:02 +0000 (13:56 +1000)]
Hold reference while erasing blocks in lafs_invalidate_page

else we sometimes violate assertions about having non-zero refcount

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoTake a ref on dblock whenever the InoIdx block has a ->parent link.
NeilBrown [Wed, 15 Sep 2010 02:57:07 +0000 (12:57 +1000)]
Take a ref on dblock whenever the InoIdx block has a ->parent link.

This will be needed to ensure the dblock stays referenced when we stop
holding a reference to blocks on the 'leafs' lists.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agogetiref_locked fixes
NeilBrown [Tue, 14 Sep 2010 07:01:21 +0000 (17:01 +1000)]
getiref_locked fixes

The #defines were a bit wrong here, so it would not have compiled
properly without DEBUG_REF

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoChange readpage to use ->chain instead of ->lru
NeilBrown [Tue, 14 Sep 2010 05:44:41 +0000 (15:44 +1000)]
Change readpage to use ->chain instead of ->lru

->lru is rather overused, and ->chain is certainly never used on
non-B_Valid blocks, so convert readpage to use chain instead.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_refile - split out set_lru
NeilBrown [Tue, 14 Sep 2010 04:22:31 +0000 (14:22 +1000)]
lafs_refile - split out set_lru

setting the lru is not dependent on other changes so it can be moved to
the top and outside the lock.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_refile: separate out consistency checking
NeilBrown [Tue, 14 Sep 2010 03:00:20 +0000 (13:00 +1000)]
lafs_refile: separate out consistency checking

lafs_refile is too big and clumsy.  Time to tidy up.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update and some comment fixes.
NeilBrown [Tue, 14 Sep 2010 01:50:34 +0000 (11:50 +1000)]
README update and some comment fixes.

Still working through the TODO list.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMake sure checkpoint happens after a timeout.
NeilBrown [Tue, 14 Sep 2010 01:10:25 +0000 (11:10 +1000)]
Make sure checkpoint happens after a timeout.

If anything has been written since last check point, ensure another
checkpoint happens within 30 seconds.

This is partly to ensure that index blocks don't remain dirty forever.

Signed-off-by: NeilBrown <neilb@suse.de>
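A minimal sketch of the timeout rule, assuming the 30 seconds are counted from the first write after the previous checkpoint (the commit message does not say which write starts the clock). All names here are hypothetical, and times are plain second counts rather than kernel jiffies.

```c
#include <stdbool.h>

#define CHECKPOINT_TIMEOUT_SECS 30

/* Hypothetical state tracked between checkpoints. */
struct cp_state {
    bool dirty_since_cp;   /* anything written since the last checkpoint? */
    long first_write;      /* time (seconds) of the first such write */
};

/* Record a write; only the first write after a checkpoint starts the clock. */
static void note_write(struct cp_state *s, long now)
{
    if (!s->dirty_since_cp) {
        s->dirty_since_cp = true;
        s->first_write = now;
    }
}

/* True once a checkpoint should be forced. */
static bool checkpoint_due(const struct cp_state *s, long now)
{
    return s->dirty_since_cp &&
           now - s->first_write >= CHECKPOINT_TIMEOUT_SECS;
}
```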
13 years agotidy lafs_get_flushable a bit.
NeilBrown [Mon, 13 Sep 2010 08:15:46 +0000 (18:15 +1000)]
tidy lafs_get_flushable a bit.

There is some code in the wrong place - probably a hangover from a
previous arrangement before we made lafs_is_leaf a function.
No function change here.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoRemove old comment
NeilBrown [Mon, 13 Sep 2010 07:58:13 +0000 (17:58 +1000)]
Remove old comment

that issue is already resolved - commit f90959e6f492b6

13 years agoLog symlink creation
NeilBrown [Mon, 13 Sep 2010 07:33:54 +0000 (17:33 +1000)]
Log symlink creation

We cannot include it in an update, so just make sure it goes in the
next write cluster.  This will be before any sync or fsync and
roll-forward should pick it up, so all is OK

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoRemove lafs_space_use
NeilBrown [Mon, 13 Sep 2010 07:13:49 +0000 (17:13 +1000)]
Remove lafs_space_use

it is identical to lafs_space_return, and no-one uses it
anyway.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agocluster_allocate - minor code rearrangement.
NeilBrown [Mon, 13 Sep 2010 05:44:05 +0000 (15:44 +1000)]
cluster_allocate - minor code rearrangement.

extract common code and general clean up

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoGet SegRef on parent of phys==0 blocks.
NeilBrown [Mon, 13 Sep 2010 05:28:08 +0000 (15:28 +1000)]
Get SegRef on parent of phys==0 blocks.

We still need those parents to be segrefed!

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoRemove pinning of iblock in place of dblock
NeilBrown [Mon, 13 Sep 2010 05:07:27 +0000 (15:07 +1000)]
Remove pinning of iblock in place of dblock

We now allow both iblock and dblock to be pinned at the same time.
So when pinning the inode dblock, just do it and don't go bothering
the inode iblock.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoRefinements for triggering checkpoint when we are low on space.
NeilBrown [Mon, 13 Sep 2010 04:39:18 +0000 (14:39 +1000)]
Refinements for triggering checkpoint when we are low on space.

This is getting messy but seems to work.

Not sure now on the difference between CleanerBlocks and
EmergencyPending.
I guess the one makes sure the cleaner does what it can and then
triggers a checkpoint.
The other prepares for EmergencyClean to be set after the next
checkpoint.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoClean up error returns in lafs_reserve_block
NeilBrown [Sun, 12 Sep 2010 23:46:36 +0000 (09:46 +1000)]
Clean up error returns in lafs_reserve_block

We only want to consider EAGAIN if lafs_prealloc returns an error.
When other calls return an error, we want to pass exactly that error
back.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoNever set SegRef on an InoIdx block
NeilBrown [Sun, 15 Aug 2010 08:39:28 +0000 (18:39 +1000)]
Never set SegRef on an InoIdx block

An InoIdx block doesn't have an up-to-date ->physaddr; that is only
updated in the data block.  So setting SegRef on it is pointless.

Instead, when we find an InoIdx block while setting SegRef, use the
data block instead.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoWait for a checkpoint before returning ENOSPC
NeilBrown [Sun, 15 Aug 2010 08:30:36 +0000 (18:30 +1000)]
Wait for a checkpoint before returning ENOSPC

If we seem to run out of space, it is worth waiting for
a checkpoint as that might free up some space.  So add
an extra step to the sequence leading from 'no space' to 'ENOSPC'.

Signed-off-by: NeilBrown <neilb@suse.de>
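The extra step between 'no space' and 'ENOSPC' might look roughly like the following. This is a sketch under assumptions: the real sequence has other steps that the commit message does not list, and `struct space_req`, its flag and `reserve` are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-request state. */
struct space_req {
    bool waited_for_checkpoint;
};

/*
 * Returns 0 on success, -EAGAIN when the caller should wait for a
 * checkpoint and retry, and -ENOSPC only after that wait did not help.
 */
static int reserve(struct space_req *req, long free_blocks, long wanted)
{
    if (free_blocks >= wanted)
        return 0;

    if (!req->waited_for_checkpoint) {
        /* First failure: a checkpoint might free space, so ask for a retry. */
        req->waited_for_checkpoint = true;
        return -EAGAIN;
    }

    /* Still no space after the checkpoint: give up. */
    return -ENOSPC;
}
```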
13 years agoclean.c - assorted tidy-ups
NeilBrown [Sun, 15 Aug 2010 05:08:49 +0000 (15:08 +1000)]
clean.c - assorted tidy-ups

Change some magic constants into named constants, and
improve some comments.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoCheck if dirblock can be orphan before making it one.
NeilBrown [Sat, 14 Aug 2010 12:25:38 +0000 (22:25 +1000)]
Check if dirblock can be orphan before making it one.

Only certain sorts of deletions can make a directory block
into an orphan - check them out before committing resources.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoflush orphans when renaming to empty directory just like rmdir
NeilBrown [Sat, 14 Aug 2010 11:50:14 +0000 (21:50 +1000)]
flush orphans when renaming to empty directory just like rmdir

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoCleaner: add a memory barrier to ensure we see i_size promptly.
NeilBrown [Sat, 14 Aug 2010 11:37:56 +0000 (21:37 +1000)]
Cleaner: add a memory barrier to ensure we see i_size promptly.

There is a possible race that we need a barrier to protect against.
I think... or hope.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoCleaner: remove pointless signed compare.
NeilBrown [Sat, 14 Aug 2010 11:11:12 +0000 (21:11 +1000)]
Cleaner: remove pointless signed compare.

bcnt can never be negative, so don't pretend.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAllow cleaner to skip new filesystems.
NeilBrown [Sat, 14 Aug 2010 11:09:40 +0000 (21:09 +1000)]
Allow cleaner to skip new filesystems.

If we can tell that a subset-filesystem is too new to
match what we find in the write-cluster, we can skip it
quickly.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoWhen cleaner finds a block beyond EOF, ignore whole descriptor.
NeilBrown [Sat, 14 Aug 2010 11:05:55 +0000 (21:05 +1000)]
When cleaner finds a block beyond EOF, ignore whole descriptor.

All other blocks in descriptor must also be beyond EOF,
so ignore them all at once.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUpdate trunc_gen whenever we truncate a file to zero.
NeilBrown [Sat, 14 Aug 2010 11:04:08 +0000 (21:04 +1000)]
Update trunc_gen whenever we truncate a file to zero.

That allows some minor optimisations in the cleaner to work.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agocleaner_parse: use truncate number to avoid looking at old inodes.
NeilBrown [Sat, 14 Aug 2010 10:55:23 +0000 (20:55 +1000)]
cleaner_parse: use truncate number to avoid looking at old inodes.

That is why we have the truncnumber in the cluster head after all.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAllow cleaner_parse to request multiple inodes at once.
NeilBrown [Sat, 14 Aug 2010 10:52:11 +0000 (20:52 +1000)]
Allow cleaner_parse to request multiple inodes at once.

Currently cleaner_parse stops when it hits an inode that it cannot
load immediately.  This reduces the opportunities for parallelism.

Instead allow up to 16 -EAGAINs from inode lookups.
This requires that we mark headers for inodes which failed, and
always start again from the beginning of the cluster head.
We already reduce the bcnt to 0, so for inodes that can be
found, we won't lookup the blocks twice.

Signed-off-by: NeilBrown <neilb@suse.de>
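The restart-from-the-beginning scheme can be illustrated with a simplified pass over an array of headers. This is a sketch, not cleaner_parse itself: `struct group_head`, `lookup_err` (a stand-in for the result of an inode lookup) and `parse_pass` are hypothetical, and a header is treated as "marked failed" simply by leaving its bcnt non-zero.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_PENDING 16   /* limit on outstanding -EAGAIN lookups */

/* Hypothetical, simplified header for one inode's group of blocks. */
struct group_head {
    int bcnt;        /* blocks still to look up; 0 means already handled */
    int lookup_err;  /* stand-in for what an inode lookup would return */
};

/*
 * One pass from the start of the cluster head.
 * Returns the number of headers still waiting on an inode.
 */
static int parse_pass(struct group_head *gh, size_t n)
{
    int pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (gh[i].bcnt == 0)
            continue;                 /* handled on an earlier pass */
        if (gh[i].lookup_err == -EAGAIN) {
            if (++pending >= MAX_PENDING)
                break;                /* enough lookups in flight */
            continue;                 /* keep going past this header */
        }
        gh[i].bcnt = 0;               /* inode found: blocks are processed */
    }
    return pending;
}
```

Because processed headers have bcnt reduced to 0, a later pass that starts again from the first header skips them cheaply and only retries the ones that were waiting.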
13 years agoRefactor try_clean
NeilBrown [Sat, 14 Aug 2010 10:34:45 +0000 (20:34 +1000)]
Refactor try_clean

It is a very big function - change it to 3 moderate sized functions.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoGive to_clean.ss a meaningful name.
NeilBrown [Sat, 14 Aug 2010 10:08:53 +0000 (20:08 +1000)]
Give to_clean.ss a meaningful name.

It is a flag set when we have a valid segment address.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoKeep track of 'seq' number in cleaner
NeilBrown [Sat, 14 Aug 2010 08:17:48 +0000 (18:17 +1000)]
Keep track of 'seq' number in cleaner

When reading cluster-heads, track the seq number, both for validation
and, later, for optimisation.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Sat, 14 Aug 2010 08:04:47 +0000 (18:04 +1000)]
README update

13 years agoClean up interaction between cleaner and checkpoint.
NeilBrown [Sat, 14 Aug 2010 06:48:39 +0000 (16:48 +1000)]
Clean up interaction between cleaner and checkpoint.

If a checkpoint is wanted, the cleaner shouldn't start any more work.
If the cleaner or segscan is active a checkpoint cannot start, but
when they complete they should wake the checkpoint process.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoRelease stray B_Async blocks if we find them.
NeilBrown [Sat, 14 Aug 2010 06:38:14 +0000 (16:38 +1000)]
Release stray B_Async blocks if we find them.

There could still be some stray index blocks...
maybe fix that later.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoCombine cleaning and orphan list_heads.
NeilBrown [Sat, 14 Aug 2010 05:41:59 +0000 (15:41 +1000)]
Combine cleaning and orphan list_heads.

A datablock is very rarely both an orphan and requiring cleaning, so
having two list_heads is a waste.

If it is an orphan it will have full parent linkage and addresses already
so it will be handled promptly and removed from the cleaning list.

So arrange that if a block wants to be both, it is preferentially on
the cleaning list, and when removed from the cleaning list it gets
added back to the pending_orphan list in case it needs processing.

Note that only directory and inode blocks can ever be orphans so some
optimisation of spinlocks is possible.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoChange when orphan blocks are refcounted.
NeilBrown [Sat, 14 Aug 2010 05:16:37 +0000 (15:16 +1000)]
Change when orphan blocks are refcounted.

Count while B_Orphan is set, rather than while on a list.

This gives us some freedom to do different things with the list,
and ensures that we never lose the flag by the block disappearing.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUse a new flag to identify blocks being processed by the cleaner.
NeilBrown [Sat, 14 Aug 2010 05:01:34 +0000 (15:01 +1000)]
Use a new flag to identify blocks being processed by the cleaner.

This will help a future patch which will unify cleaning and orphans
list_heads, and make it clear when a refcount is being held.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoApply single-exit pattern in try_clean
NeilBrown [Sat, 14 Aug 2010 04:43:11 +0000 (14:43 +1000)]
Apply single-exit pattern in try_clean

This removes a lot of duplications

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoImplement youth decay.
NeilBrown [Sat, 14 Aug 2010 03:36:36 +0000 (13:36 +1000)]
Implement youth decay.

 - After a checkpoint, check if we are close enough to the end
   of youth space to need a decay.
 - when we record a new youth number, un-decay it if the block hasn't
   been decayed yet (and convert endian properly)
 - Change scan_seg to update free_block/free_dev atomically in just
   one place, and do a block worth of decay at that point.
   As part of this, the youth block is only released at one place now.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUse the right value of creation_age of subsets.
NeilBrown [Sat, 14 Aug 2010 02:08:38 +0000 (12:08 +1000)]
Use the right value of creation_age of subsets.

It should be cluster seq number.  This never wraps and
is used to compare against write cluster to trim searches of new
filesets early.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoEnsure lafs_orphan_release doesn't block too much
NeilBrown [Sat, 14 Aug 2010 01:16:04 +0000 (11:16 +1000)]
Ensure lafs_orphan_release doesn't block too much

Make sure orphan->i_mutex isn't held for long
periods, and ensure that orphan_abort doesn't block in
erase_dblock.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Fri, 13 Aug 2010 11:58:57 +0000 (21:58 +1000)]
README update

13 years agoTune checkpoint freq by segments, not blocks.
NeilBrown [Fri, 13 Aug 2010 11:56:49 +0000 (21:56 +1000)]
Tune checkpoint freq by segments, not blocks.

If nothing else does, we should force a checkpoint
every few segments rather than every so-many blocks.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoSeparate thread management from the cleaning.
NeilBrown [Fri, 13 Aug 2010 11:48:36 +0000 (21:48 +1000)]
Separate thread management from the cleaning.

The thread does a lot more than just 'clean' so don't call it the
'cleaner' any more - just the 'thread' or 'lafsd'.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll: don't update index if block address hasn't changed.
NeilBrown [Fri, 13 Aug 2010 11:32:11 +0000 (21:32 +1000)]
roll: don't update index if block address hasn't changed.

This is quite possible if the block was pushed out in the previous
phase, and could save some work

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoCreate a backing_dev_info for a lafs filesystem.
NeilBrown [Mon, 9 Aug 2010 04:06:58 +0000 (14:06 +1000)]
Create a backing_dev_info for a lafs filesystem.

As a lafs filesystem can span multiple devices, we need our own
bdi to handle congestion notification and unplugging.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoGive subset objects their own operations
NeilBrown [Fri, 13 Aug 2010 09:36:06 +0000 (19:36 +1000)]
Give subset objects their own operations

And make sure they 'stat' like the sort of directory
that can be used to create them.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAdd missing set_anon_super for subset mounts
NeilBrown [Fri, 13 Aug 2010 06:43:37 +0000 (16:43 +1000)]
Add missing set_anon_super for subset mounts

oops - missed that.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAdd test code for subset mounts.
NeilBrown [Fri, 13 Aug 2010 06:26:55 +0000 (16:26 +1000)]
Add test code for subset mounts.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMaybe add a bug-on in dirty_dblock
NeilBrown [Fri, 13 Aug 2010 06:26:30 +0000 (16:26 +1000)]
Maybe add a bug-on in dirty_dblock

I think we want this bug_on, but it doesn't quite work
yet - leave it as a reminder of '15ca/' in README

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoBetter handling of changing a Directory into an InodeFile
NeilBrown [Fri, 13 Aug 2010 06:23:28 +0000 (16:23 +1000)]
Better handling of changing a Directory into an InodeFile

- actually change the type !!!
- make sure the on-disk block gets a proper index update.

Note that the checkpointing before creating things in the FS is
important for this to be correct.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoVarious fixes for lafs_get_subset
NeilBrown [Fri, 13 Aug 2010 06:14:27 +0000 (16:14 +1000)]
Various fixes for lafs_get_subset

- make sure root directory is created if it doesn't exist
- also create inode usage map.
- hold a ref on the inode while the fs is mounted.
- free the sb_key at unmount.
- set s_bdi from the prime_sb

13 years agolafs_get_subset: balance locks properly
NeilBrown [Fri, 13 Aug 2010 06:09:53 +0000 (16:09 +1000)]
lafs_get_subset: balance locks properly

We drop the mutex outside the 'if' so we must take it outside
the 'if' too - which is safer as well.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix lafs_put_super for subset mounts.
NeilBrown [Fri, 13 Aug 2010 06:06:06 +0000 (16:06 +1000)]
Fix lafs_put_super for subset mounts.

- we still need a checkpoint - though not a final one - to ensure
  that all dirty blocks from the fileset are written.

- When it isn't the root for a snapshot, we don't want to put
  the root inode - the root inode will be in the main filesystem,
  not in this one.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoImprove lafs_iget_fs
NeilBrown [Fri, 13 Aug 2010 04:00:56 +0000 (14:00 +1000)]
Improve lafs_iget_fs

Allow getting inodes in other filesystem.

This isn't quite perfect yet though.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoSet PinPending in flush_data_to_inode.
NeilBrown [Fri, 13 Aug 2010 03:59:10 +0000 (13:59 +1000)]
Set PinPending in flush_data_to_inode.

Should always have this set when we pin a block.
It keeps the block pinned until it is dirtied.
As lafs_pin_block does a refile at the end, it can drop the Pinned
state as soon as it is set.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoiget_my_inode - fix for case of ino == NULL
NeilBrown [Fri, 13 Aug 2010 03:56:57 +0000 (13:56 +1000)]
iget_my_inode - fix for case of ino == NULL

igrab doesn't handle NULL inodes, so we must.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_write_end: set new file size correctly.
NeilBrown [Fri, 13 Aug 2010 03:55:46 +0000 (13:55 +1000)]
lafs_write_end: set new file size correctly.

We were setting the size to the start of the write, not the end!!

Signed-off-by: NeilBrown <neilb@suse.de>
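The arithmetic behind this fix is small enough to show directly. The helper below is a hypothetical illustration, not the lafs_write_end code: the new size is the end of the write (`pos + copied`), and it only replaces the old size when the write extends the file.

```c
#include <stdint.h>

/* Hypothetical helper: file size after writing `copied` bytes at `pos`. */
static int64_t size_after_write(int64_t old_size, int64_t pos,
                                unsigned copied)
{
    int64_t end = pos + copied;   /* end of the write, not its start */

    /* A write inside the existing file must not shrink it. */
    return end > old_size ? end : old_size;
}
```

Using `pos` alone, as the bug did, would leave a file extended by a write at offset 100 of 50 bytes at size 100 instead of 150.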
13 years agoDiscard filesys field from lafs_inode
NeilBrown [Wed, 11 Aug 2010 23:42:56 +0000 (09:42 +1000)]
Discard filesys field from lafs_inode

i_sb can be used just as well.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoChange filesys arg of lafs_new_inode to struct super_block
NeilBrown [Wed, 11 Aug 2010 22:57:55 +0000 (08:57 +1000)]
Change filesys arg of lafs_new_inode to struct super_block

It is more direct in most cases to use a super_block rather
than a filesys inode.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMake lafs_new_inode work when given an explicit inode number.
NeilBrown [Tue, 10 Aug 2010 05:31:12 +0000 (15:31 +1000)]
Make lafs_new_inode work when given an explicit inode number.

In this case imni->mb isn't set, so we have to cope with that.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoChoose_free_inum: never return number below 16
NeilBrown [Tue, 10 Aug 2010 05:16:03 +0000 (15:16 +1000)]
Choose_free_inum: never return number below 16

They are for internal use.

Also fix a missing B_PinPending setting.

Signed-off-by: NeilBrown <neilb@suse.de>
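A sketch of a free-number search that skips the reserved range. The bitmap layout (a set bit means "in use", least significant bit first) and the names `FIRST_USER_INUM` and `first_free_inum` are assumptions made for illustration; the real inode usage map may be organised differently.

```c
#include <stdint.h>

#define FIRST_USER_INUM 16   /* numbers below this are for internal use */

/*
 * Hypothetical search over an in-use bitmap.
 * Returns the first free inode number >= FIRST_USER_INUM,
 * or nbits if none is free.
 */
static uint32_t first_free_inum(const uint8_t *bitmap, uint32_t nbits)
{
    for (uint32_t i = FIRST_USER_INUM; i < nbits; i++) {
        /* Bit i lives in byte i/8 at position i%8. */
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            return i;
    }
    return nbits;
}
```

Starting the scan at 16 means the reserved numbers are never handed out even when their bits happen to be clear.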
13 years agoAdd 'filesys' arg to lafs_new_inode
NeilBrown [Mon, 9 Aug 2010 11:17:11 +0000 (21:17 +1000)]
Add 'filesys' arg to lafs_new_inode

This allows it to be called with dir == NULL - when creating an inode
that isn't in a directory.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMake dir arg to lafs_new_inode optional.
NeilBrown [Mon, 9 Aug 2010 11:06:34 +0000 (21:06 +1000)]
Make dir arg to lafs_new_inode optional.

After all, some inodes will be created without a directory (root and
other special inodes).

Make inodbp optional too.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoRevise handling of filesystem inconsistency: nlink == 0
NeilBrown [Mon, 9 Aug 2010 10:43:15 +0000 (20:43 +1000)]
Revise handling of filesystem inconsistency: nlink == 0

Our handling wasn't really correct, and made the less-safe
assumption.
So change it to simply increment the linkcount.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agocheckpoint must wait for both dblock and iblock of root to change phase.
NeilBrown [Mon, 9 Aug 2010 10:26:25 +0000 (20:26 +1000)]
checkpoint must wait for both dblock and iblock of root to change phase.

Only waiting for iblock isn't enough - dblock might still be in the
old phase, which gets rather confusing.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUnpin data blocks from previous phase before allowing them to be dirty.
NeilBrown [Mon, 9 Aug 2010 10:24:20 +0000 (20:24 +1000)]
Unpin data blocks from previous phase before allowing them to be dirty.

While checkpointing will unpin PinPending blocks, it might not
manage to do it before the block gets Dirtied again.
So before we Pin the block - which is a required precursor to dirtying
it - unpin the block.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoBe more careful about waking cleaner in cluster_end_io.
NeilBrown [Mon, 9 Aug 2010 10:20:16 +0000 (20:20 +1000)]
Be more careful about waking cleaner in cluster_end_io.

If done was set as well as wake, we didn't wake the cleaner
so *FlushNeeded wouldn't necessarily be effective.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoDon't release an orphan just because an inode cannot be found.
NeilBrown [Mon, 9 Aug 2010 10:17:09 +0000 (20:17 +1000)]
Don't release an orphan just because an inode cannot be found.

This is over-reacting.  We could be between last_iput and setting
I_Delete for example, so orphan_release would be premature and wrong.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Mon, 9 Aug 2010 02:41:42 +0000 (12:41 +1000)]
README update

13 years agoUse async erase_block in inode orphan handling
NeilBrown [Mon, 9 Aug 2010 02:34:47 +0000 (12:34 +1000)]
Use async erase_block in inode orphan handling

otherwise we could deadlock.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAdd tracing for when we actually wait for writeback.
NeilBrown [Mon, 9 Aug 2010 02:06:26 +0000 (12:06 +1000)]
Add tracing for when we actually wait for writeback.

This helps track deadlock bugs, just like the similar code
in lafs_iolock_block.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agorcu locking protection for ->my_inode
NeilBrown [Sun, 1 Aug 2010 03:53:23 +0000 (13:53 +1000)]
rcu locking protection for ->my_inode

We use rcu to free inodes, and use rcu locking to protect
access to ->my_inode.

Part of this required that once I_Deleting is set it stays set,
so remove the pointless clearing of it.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUse lafs_iget_fs rather than multiple get_blocks in orphan lookup.
NeilBrown [Sun, 1 Aug 2010 03:38:06 +0000 (13:38 +1000)]
Use lafs_iget_fs rather than multiple get_blocks in orphan lookup.

When compacting the orphan table and so changing the orphan
slot for a block, use lafs_iget_fs to help find the orphan block.
This avoids allocating blocks if the inodes exist (which they
should).

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agohold ref to inode for directory orphans.
NeilBrown [Sun, 1 Aug 2010 03:01:10 +0000 (13:01 +1000)]
hold ref to inode for directory orphans.

When a directory block is an orphan, make sure we hold
a reference on the inode so it cannot disappear on us.

Signed-off-by: NeilBrown <neilb@suse.de>