NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Avoid a race in lafs_get_cleanable.
If we find a cleanable segment that is actually clean, then we
drop the lock and try to add it. If something else removed it at
just this time we end up with a refcount issue as we are meant to
take references to youth_db and usage0_db when adding things to the
table, and we don't have them any more.
So simply allow the 'add_clean' to fail as that is safe.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Allow for the possibility of 'free_blocks' going negative.
While this shouldn't happen, the value of free_blocks isn't really
valid until the first segment scan completes. If something gets
subtracted before that it could go negative temporarily.
We really want to handle that sort of situation gracefully.
So make it a 'signed' value.
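A minimal sketch of why the signed type helps, not the LaFS code itself. Only 'free_blocks' comes from the message above; 'space_state', 'space_use', 'space_add' and 'space_available' are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the filesystem's space accounting. */
struct space_state {
    int64_t free_blocks; /* signed: may dip below zero before the first scan completes */
};

/* Record that 'blocks' blocks have been consumed. */
static void space_use(struct space_state *s, int64_t blocks)
{
    assert(blocks >= 0);
    s->free_blocks -= blocks;
}

/* Record that a segment scan found 'blocks' free blocks. */
static void space_add(struct space_state *s, int64_t blocks)
{
    assert(blocks >= 0);
    s->free_blocks += blocks;
}

/* A negative count simply means "nothing available". With an unsigned
 * counter the same subtraction would wrap to a huge positive value. */
static bool space_available(const struct space_state *s, int64_t wanted)
{
    return s->free_blocks >= wanted;
}
```

With a signed counter, a subtraction that arrives before the scan just leaves a temporary deficit that the scan later repays.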
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Avoid quadratic searches in segment table.
There are a couple of times that we need to iterate over the segment
table (stable) and need to drop the lock each time.
Rather than always restarting after we find something of interest,
keep track of when we make a change to the table (thus possibly
breaking an iterator) and only restart when that change is seen.
Both the iterators are completely serialised so resetting
stable_changed when we retry cannot confuse a different iterator.
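The restart-on-change idea can be sketched as below. This is a single-threaded illustration under stated assumptions, not the LaFS implementation: 'seg_table', 'table_remove', 'table_process_all' and 'demo_visit' are hypothetical, the 'changed' field plays the role of stable_changed, and the points where the real code would drop and retake the lock are marked only by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_MAX 16

struct seg_entry { int segnum; bool done; };

struct seg_table {
    struct seg_entry segs[TABLE_MAX];
    size_t count;
    bool changed; /* set by anything that alters the table */
};

typedef void (*visit_fn)(struct seg_table *t, int segnum);

/* Remove entry 'idx' and flag the change so iterators know to restart. */
static void table_remove(struct seg_table *t, size_t idx)
{
    assert(idx < t->count);
    for (size_t i = idx + 1; i < t->count; i++)
        t->segs[i - 1] = t->segs[i];
    t->count--;
    t->changed = true;
}

/* Visit every entry not yet done. Returns how many restarts were needed. */
static unsigned table_process_all(struct seg_table *t, visit_fn visit)
{
    unsigned restarts = 0;
retry:
    t->changed = false; /* safe only because iterators are serialised */
    for (size_t i = 0; i < t->count; i++) {
        if (t->segs[i].done)
            continue;
        t->segs[i].done = true;
        int segnum = t->segs[i].segnum;
        /* the real code would drop the table lock here ... */
        visit(t, segnum);
        /* ... and retake it here */
        if (t->changed) { /* index 'i' may no longer be valid */
            restarts++;
            goto retry;
        }
    }
    return restarts;
}

/* Demo callback: visiting segment 2 removes the last entry,
 * standing in for a concurrent change made while the lock was dropped. */
static int visits;
static void demo_visit(struct seg_table *t, int segnum)
{
    visits++;
    if (segnum == 2 && t->count > 0)
        table_remove(t, t->count - 1);
}
```

In the demo, only the one visit that actually changes the table causes a restart; every other visit continues from where it was.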
NeilBrown [Fri, 4 Mar 2011 01:47:29 +0000 (12:47 +1100)]
roll: change handling of distinction between rename to existing and non-existing name
Having DIROP_REN_TARGET_OLD and DIROP_REN_TARGET_NEW is not needed as
the miniblock also holds the inode number from the target which can be
zero or non-zero.
So remove that distinction and just use the inode number.
NeilBrown [Fri, 4 Mar 2011 01:47:26 +0000 (12:47 +1100)]
roll: handle cluster headers larger than one page.
If roll-forward finds cluster headers larger than one page, it now
tries to allocate that much space and if the allocation succeeds the
roll-forward will now succeed, rather than always failing.
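A userspace sketch of the "use the preallocated page, else try a larger allocation" pattern. The names 'header_buf_get' and 'header_buf_put' and the 4096-byte page size are assumptions for illustration; the kernel code would use its own allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u /* assumed page size for this sketch */

/* Return a buffer able to hold a header of 'header_len' bytes.
 * Headers that fit in one page reuse the caller's page buffer;
 * larger ones get a fresh allocation. Returns NULL only when that
 * allocation fails, in which case the caller fails as before. */
static void *header_buf_get(void *page_buf, size_t header_len)
{
    assert(page_buf != NULL);
    if (header_len <= PAGE_BYTES)
        return page_buf;
    return malloc(header_len);
}

/* Release a buffer obtained from header_buf_get(). */
static void header_buf_put(void *page_buf, void *buf)
{
    if (buf != page_buf)
        free(buf);
}
```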
NeilBrown [Fri, 15 Oct 2010 04:05:37 +0000 (15:05 +1100)]
roll-forward: don't iput any nlink==0 inode before roll-forward finishes.
Such inodes might still need to participate in roll-forward, and might
yet get linked into the directory tree. So don't risk them being
deleted yet.
So any inode with nlink==0 is put on the list to be dealt with at the
end.
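A sketch of the deferral, assuming each inode is handed to the put routine at most once. 'inode_ref', 'roll_state', 'roll_put_inode' and 'roll_finish' are hypothetical names, and the 'released' flag stands in for the real iput.

```c
#include <assert.h>
#include <stddef.h>

struct inode_ref {
    unsigned nlink;
    int released;                   /* stands in for "iput has been called" */
    struct inode_ref *next_pending; /* link in the deferred list */
};

struct roll_state {
    struct inode_ref *pending; /* nlink==0 inodes held until the end */
};

/* Called during roll-forward when we are finished with an inode for now. */
static void roll_put_inode(struct roll_state *rs, struct inode_ref *ino)
{
    assert(ino != NULL);
    if (ino->nlink == 0) {
        /* It might yet be linked into the tree, so keep our reference. */
        ino->next_pending = rs->pending;
        rs->pending = ino;
        return;
    }
    ino->released = 1;
}

/* Called once roll-forward has finished. Returns how many were deferred. */
static size_t roll_finish(struct roll_state *rs)
{
    size_t n = 0;
    while (rs->pending) {
        struct inode_ref *ino = rs->pending;
        rs->pending = ino->next_pending;
        ino->next_pending = NULL;
        ino->released = 1; /* only now may an unlinked inode be deleted */
        n++;
    }
    return n;
}
```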
NeilBrown [Sun, 10 Oct 2010 23:47:33 +0000 (10:47 +1100)]
While truncating, hold ref on superblock as well as inode.
An unmount of a subset filesystem could happen while still truncating
a file on the filesystem. So hold a ref to the superblock so that it
doesn't go away.
When deleting, we cannot grab the inode, so we need to modify
lafs_iput_fs to not do an iput in that case.
NeilBrown [Sun, 10 Oct 2010 22:58:36 +0000 (09:58 +1100)]
Use igrab_fs to hold refcounts on the inodes of orphans.
We currently hold a refcount on the inodes of dir orphans.
We need the filesystem (super_block) as well.
Also, while we don't really need a similar refcount for inode orphans,
it doesn't hurt.
So simplify the tracking of whether we need to take such a refcount,
use iget_fs to grab the super_block as well, and also take a ref in
lafs_add_orphans, which was missing.
NeilBrown [Sat, 2 Oct 2010 05:13:25 +0000 (15:13 +1000)]
Combine dirty_iblock with setting Realloc on iblock.
This makes it easier to do the right thing on iblocks that
we have just split ... not that we would expect that when cleaning,
but lots of things are possible, and elegant code is good.
NeilBrown [Sat, 2 Oct 2010 04:31:21 +0000 (14:31 +1000)]
Don't use I_ICredit for UnincCredit when cleaning.
ICredit is only to be used when dirtying a block, so any setting of
UnincCredit for cleaning must get the credit from elsewhere. If no
such credit is available, fall back on dirtying the block.
NeilBrown [Sat, 2 Oct 2010 01:46:04 +0000 (11:46 +1000)]
use little-endian bit operations for inode usage map.
Must not use host-endian here, so use generic 'le' operations.
Also protect all operations with i_mutex. Even if the bitops
were atomic, we need the locking when punching a hole in the file,
or adding a new block.
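To illustrate what "little-endian bit operations" means for an on-disk map, here is a userspace sketch that does not use the kernel helpers. The function names 'le_set_bit', 'le_clear_bit' and 'le_test_bit' are hypothetical. Bit n always lives in byte n/8 at position n%8, whatever the host byte order; the i_mutex locking mentioned above is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-addressed bit operations: the layout is the same on every host,
 * unlike operations on 'unsigned long' words, which depend on endianness. */
static void le_set_bit(uint8_t *map, size_t n)
{
    assert(map != NULL);
    map[n >> 3] |= (uint8_t)(1u << (n & 7));
}

static void le_clear_bit(uint8_t *map, size_t n)
{
    assert(map != NULL);
    map[n >> 3] &= (uint8_t)~(1u << (n & 7));
}

static bool le_test_bit(const uint8_t *map, size_t n)
{
    assert(map != NULL);
    return (map[n >> 3] >> (n & 7)) & 1u;
}
```

For example, setting bit 9 sets bit 1 of the second byte (value 0x02) on any machine.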
NeilBrown [Sat, 2 Oct 2010 00:42:34 +0000 (10:42 +1000)]
Improve handling of snapshot name.
Name stored in fileset inode is now variable length and empty
on subordinate filesets. Snapshots have space for a name depending
on how much space was allocated when fs was created.
When writing out the accounting blocks we need to not update the youth
block if we happen to start a new segment.
We already do that at unmount time, so generalise it with a new flag.
An address of '0' is not consecutive with an address of '1' and while
it is very unlikely to ever be a problem, make sure we don't try to
combine those addresses into a range in the uninc table.
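A sketch of the guard, assuming that a physical address of 0 means "no block" and so must never be merged into a range. 'addr_range' and 'range_try_extend' are hypothetical and are not the uninc table's real layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range record: 'len' consecutive blocks starting at
 * file address 'fileaddr' and physical address 'physaddr'. */
struct addr_range {
    uint32_t fileaddr;
    uint64_t physaddr;
    uint32_t len;
};

/* Try to grow 'r' by one block. Returns true if the block was absorbed. */
static bool range_try_extend(struct addr_range *r,
                             uint32_t fileaddr, uint64_t physaddr)
{
    assert(r->len > 0);
    /* Assumption: 0 means "unallocated", so 0 followed by 1 is not a run. */
    if (physaddr == 0 || r->physaddr == 0)
        return false;
    if (fileaddr != r->fileaddr + r->len)
        return false;
    if (physaddr != r->physaddr + r->len)
        return false;
    r->len++;
    return true;
}
```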
Also remove a comment about a possible problem that doesn't seem to be
a real problem.
We don't need the 'onlru' variable any more.
And we shouldn't really delete data blocks from the lru at that
point, as they could be on a cluster or io-pending list.
Index blocks are safe as their refcount is zero, so they can only
be on the leaf lru.
There is some code in the wrong place - probably a holdover from a
previous arrangement before we made lafs_is_leaf a function.
No functional change here.
We cannot include it in an update, so just make sure it goes in the
next write cluster. This will be before any sync or fsync and
roll-forward should pick it up, so all is OK.
We now allow both iblock and dblock to be pinned at the same time.
So when pinning the inode dblock, just do it and don't go bothering
the inode iblock.
Refinements for triggering checkpoint when we are low on space.
This is getting messy but seems to work.
Not sure now on the difference between CleanerBlocks and
EmergencyPending.
I guess the one makes sure the cleaner does what it can and then
triggers a checkpoint.
The other prepares for EmergencyClean to be set after the next
checkpoint.
NeilBrown [Sun, 15 Aug 2010 08:30:36 +0000 (18:30 +1000)]
Wait for a checkpoint before returning ENOSPC
If we seem to run out of space, it is worth waiting for
a checkpoint as that might free up some space. So add
an extra step to the sequence leading from 'no space' to 'ENOSPC'.
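The extra step can be sketched as follows. This is an illustration under assumptions, not the LaFS allocation path: 'fs_space', 'wait_for_checkpoint' and 'reserve_blocks' are hypothetical, and the checkpoint is modelled as simply releasing a count of reclaimable blocks.

```c
#include <assert.h>
#include <errno.h>

struct fs_space {
    long free_blocks;
    long reclaimable; /* blocks that the next checkpoint would release */
};

/* Stand-in for blocking until a checkpoint completes. */
static void wait_for_checkpoint(struct fs_space *fs)
{
    fs->free_blocks += fs->reclaimable;
    fs->reclaimable = 0;
}

/* Reserve 'wanted' blocks. Returns 0 on success, or -ENOSPC only after
 * one checkpoint has been waited for and space is still short. */
static int reserve_blocks(struct fs_space *fs, long wanted)
{
    int waited = 0;

    assert(wanted > 0);
    for (;;) {
        if (fs->free_blocks >= wanted) {
            fs->free_blocks -= wanted;
            return 0;
        }
        if (waited)
            return -ENOSPC;
        wait_for_checkpoint(fs); /* the extra step before giving up */
        waited = 1;
    }
}
```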
NeilBrown [Sat, 14 Aug 2010 10:52:11 +0000 (20:52 +1000)]
Allow cleaner_parse to request multiple inodes at once.
Currently cleaner_parse stops when it hits an inode that it cannot
load immediately. This reduces the opportunities for parallelism.
Instead allow up to 16 -EAGAINs from inode lookups.
This requires that we mark headers for inodes which failed, and
always start again from the beginning of the cluster head.
We already reduce the bcnt to 0, so for inodes that can be
found, we won't lookup the blocks twice.
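A sketch of the multi-pass parse described above, with hypothetical types and names ('group_hdr', 'parse_cluster', 'demo_lookup'). The limit of 16 and the use of bcnt == 0 to mean "already handled" come from the message; everything else is assumed for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PENDING 16 /* -EAGAIN results tolerated per pass */

struct group_hdr {
    unsigned inum;
    unsigned bcnt; /* blocks still to look up; 0 once handled */
    int retry;     /* marked when the inode lookup returned -EAGAIN */
};

typedef int (*lookup_fn)(unsigned inum);

/* One pass over the cluster header, always starting from the beginning.
 * Returns the number of -EAGAIN results seen; non-zero means the caller
 * should run another pass once the inodes have been loaded. */
static size_t parse_cluster(struct group_hdr *h, size_t n, lookup_fn lookup)
{
    size_t pending = 0;

    assert(h != NULL || n == 0);
    for (size_t i = 0; i < n && pending < MAX_PENDING; i++) {
        if (h[i].bcnt == 0)
            continue; /* handled on an earlier pass */
        if (lookup(h[i].inum) == -EAGAIN) {
            h[i].retry = 1; /* load has been requested; revisit later */
            pending++;
            continue;
        }
        h[i].retry = 0;
        h[i].bcnt = 0; /* blocks processed, never looked up twice */
    }
    return pending;
}

/* Demo lookup: odd-numbered inodes are "not loaded yet" until the flag is set. */
static int demo_all_loaded;
static int demo_lookup(unsigned inum)
{
    return ((inum % 2 == 0) || demo_all_loaded) ? 0 : -EAGAIN;
}
```

Because a failed lookup no longer stops the pass, several inode loads can be in flight at once, and the second pass skips everything that was already handled.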
NeilBrown [Sat, 14 Aug 2010 06:48:39 +0000 (16:48 +1000)]
Clean up interaction between cleaner and checkpoint.
If a checkpoint is wanted, the cleaner shouldn't start any more work.
If the cleaner or segscan is active a checkpoint cannot start, but
when they complete they should wake the checkpoint process.