NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Avoid quadratic searches in segment table.
There are a couple of times that we need to iterate over the segment
table (stable) and need to drop the lock each time.
Rather than always restarting after we find something of interest,
keep track of when we make a change to the table (thus possibly
breaking an iterator) and only restart when that change is seen.
Both the iterators are completely serialised so resetting
stable_changed when we retry cannot confuse a different iterator.
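The restart-on-change idea described above can be sketched in user-space C. This is only an illustration, not the LaFS code: struct table, table_add(), table_for_each() and the "changed" field are hypothetical stand-ins for the segment table and stable_changed, and the lock is represented only by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_MAX 16

/* Hypothetical stand-in for the segment table. */
struct table {
    int ids[TABLE_MAX];
    bool seen[TABLE_MAX];   /* entry already handled by the iterator */
    size_t count;
    bool changed;           /* set whenever the table is modified */
};

/* Every modification records that a running iterator may now be stale. */
static void table_add(struct table *t, int id)
{
    assert(t->count < TABLE_MAX);
    t->ids[t->count] = id;
    t->seen[t->count] = false;
    t->count++;
    t->changed = true;
}

typedef void (*visit_fn)(struct table *t, int id);

/* Visit every entry once; returns how many times the scan restarted. */
static int table_for_each(struct table *t, visit_fn visit)
{
    int restarts = 0;
    size_t i;

retry:
    t->changed = false;
    for (i = 0; i < t->count; i++) {
        if (t->seen[i])
            continue;
        t->seen[i] = true;
        visit(t, t->ids[i]);    /* the real code would drop the lock here */
        if (t->changed) {       /* restart only if something changed */
            restarts++;
            goto retry;
        }
    }
    return restarts;
}

static int visits;

/* Example visitor: handling id 1 adds one new entry to the table. */
static void visit_add_once(struct table *t, int id)
{
    visits++;
    if (id == 1)
        table_add(t, 100);
}

/* Three initial entries, one of which triggers an insertion. */
static int demo_restarts(void)
{
    struct table t = { .count = 0 };

    table_add(&t, 1);
    table_add(&t, 2);
    table_add(&t, 3);
    visits = 0;
    return table_for_each(&t, visit_add_once);
}
```

Resetting the flag at the top of each retry is safe in this sketch for the same reason given above: only one iterator runs at a time.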
NeilBrown [Fri, 4 Mar 2011 01:47:29 +0000 (12:47 +1100)]
roll: change handling of distinction between rename to existing and non-existing name
Having DIROP_REN_TARGET_OLD and DIROP_REN_TARGET_NEW is not needed as
the miniblock also holds the inode number from the target which can be
zero or non-zero.
So remove that distinction and just use the inode number.
NeilBrown [Fri, 4 Mar 2011 01:47:26 +0000 (12:47 +1100)]
roll: handle cluster headers larger than one page.
If roll-forward finds cluster headers larger than one page, it now
tries to allocate that much space. If the allocation succeeds,
roll-forward will now succeed rather than always failing.
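A minimal user-space sketch of that fallback, assuming a 4096-byte page and using malloc() in place of whatever kernel allocator the real code uses. header_buf_get(), header_buf_put() and HDR_PAGE_SIZE are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HDR_PAGE_SIZE 4096  /* assumed page size for this sketch */

/*
 * Return a buffer able to hold a header of header_len bytes.
 * Headers that fit use the caller's page buffer; larger ones get a
 * dedicated allocation.  Returns NULL only when that allocation fails.
 * *allocated tells the caller whether the buffer must be freed.
 */
static void *header_buf_get(void *page_buf, size_t header_len, int *allocated)
{
    void *big;

    assert(page_buf != NULL && allocated != NULL);
    *allocated = 0;
    if (header_len <= HDR_PAGE_SIZE)
        return page_buf;
    big = malloc(header_len);
    if (big != NULL)
        *allocated = 1;
    return big;
}

/* Release a buffer obtained from header_buf_get(). */
static void header_buf_put(void *buf, int allocated)
{
    if (allocated)
        free(buf);
}

/* Returns 1 when small headers reuse the page and large ones are allocated. */
static int demo_header_buf(void)
{
    static char page[HDR_PAGE_SIZE];
    int a_small, a_big, ok;
    void *small = header_buf_get(page, 512, &a_small);
    void *big = header_buf_get(page, 3 * HDR_PAGE_SIZE, &a_big);

    ok = small == (void *)page && !a_small &&
         big != NULL && big != (void *)page && a_big;
    header_buf_put(small, a_small);
    header_buf_put(big, a_big);
    return ok;
}
```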
NeilBrown [Fri, 15 Oct 2010 04:05:37 +0000 (15:05 +1100)]
roll-forward: don't iput any nlink==0 inode before roll-forward finishes.
Such inodes might still need to participate in roll-forward, and might
yet get linked into the directory tree. So don't risk them being
deleted yet.
So any inode with nlink==0 is put on the list to be dealt with at the
end.
NeilBrown [Sun, 10 Oct 2010 23:47:33 +0000 (10:47 +1100)]
While truncating, hold ref on superblock as well as inode.
An unmount of a subset filesystem could happen while still truncating
a file on the filesystem. So hold a ref to the superblock so that it
doesn't go away.
When deleting, we cannot grab the inode, so we need to modify
lafs_iput_fs to not do an iput in that case.
NeilBrown [Sun, 10 Oct 2010 22:58:36 +0000 (09:58 +1100)]
Use igrab_fs to hold refcounts on the inodes of orphans.
We currently hold a refcount on the inodes of dir orphans.
We need the filesystem (super_block) as well.
Also, while we don't really need a similar refcount for inode orphans,
it doesn't hurt.
So simplify the tracking of whether we need to take such a refcount,
use iget_fs to grab the super_block as well, and also take a ref in
lafs_add_orphans, which was missing.
NeilBrown [Sat, 2 Oct 2010 05:13:25 +0000 (15:13 +1000)]
Combine dirty_iblock with setting Realloc on iblock.
This makes it easier to do the right thing on iblocks that
we have just split ... not that we would expect that when cleaning,
but lots of things are possible, and elegant code is good.
NeilBrown [Sat, 2 Oct 2010 04:31:21 +0000 (14:31 +1000)]
Don't use I_ICredit for UnincCredit when cleaning.
ICredit is only to be used when dirtying a block, so any setting of
UnincCredit for cleaning must get the credit from elsewhere. If no
such credit is available, fall back on dirtying the block.
NeilBrown [Sat, 2 Oct 2010 01:46:04 +0000 (11:46 +1000)]
use little-endian bit operations for inode usage map.
Must not use host-endian here, so use generic 'le' operations.
Also protect all operations with i_mutex. Even if the bitops
were atomic, we need the locking when punching a hole in the file,
or adding a new block.
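To illustrate why the byte order matters for an on-disk bitmap, here is a user-space sketch of little-endian bit addressing. The helper names are made up for this example and are not the kernel's bitops; locking (i_mutex in the text above) is not shown.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Little-endian bit numbering: bit n always lives in byte n / 8 at
 * position n % 8, whatever the host word size or byte order.  Bitops
 * that work on host-order longs would place the bit differently on a
 * big-endian machine, which is wrong for an on-disk format.
 */
static void le_set_bit(unsigned char *map, size_t n)
{
    assert(map != NULL);
    map[n / 8] |= (unsigned char)(1u << (n % 8));
}

static void le_clear_bit(unsigned char *map, size_t n)
{
    assert(map != NULL);
    map[n / 8] &= (unsigned char)~(1u << (n % 8));
}

static int le_test_bit(const unsigned char *map, size_t n)
{
    assert(map != NULL);
    return (map[n / 8] >> (n % 8)) & 1;
}

/* Set bit 9 in an empty map and report the byte that holds it. */
static int demo_le_byte(void)
{
    unsigned char map[4] = { 0, 0, 0, 0 };

    le_set_bit(map, 9);
    return map[1];
}

/* Set then clear a bit; the bit must read back as 0. */
static int demo_le_clear(void)
{
    unsigned char map[4] = { 0, 0, 0, 0 };

    le_set_bit(map, 17);
    le_clear_bit(map, 17);
    return le_test_bit(map, 17);
}
```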
NeilBrown [Sat, 2 Oct 2010 00:42:34 +0000 (10:42 +1000)]
Improve handling of snapshot name.
Name stored in fileset inode is now variable length and empty
on subordinate filesets. Snapshots have space for a name depending
on how much space was allocated when fs was created.
When writing out the accounting blocks we need to not update the youth
block if we happen to start a new segment.
We already do that at unmount time, so generalise it with a new flag.
An address of '0' is not consecutive with an address of '1' and while
it is very unlikely to ever be a problem, make sure we don't try to
combine those addresses into a range in the uninc table.
Also remove a comment about a possible problem that doesn't seem to be
a real problem.
We don't need the 'onlru' variable any more.
And we shouldn't really delete data blocks from the lru at that
point, as they could be on a cluster or io-pending list.
Index blocks are safe as their refcount is zero, so they can only
be on the leaf lru.
There is some code in the wrong place - probably a hangover from a
previous arrangement before we made lafs_is_leaf a function.
No functional change here.
We cannot include it in an update, so just make sure it goes in the
next write cluster. This will be before any sync or fsync and
roll-forward should pick it up, so all is OK.
We now allow both iblock and dblock to be pinned at the same time.
So when pinning the inode dblock, just do it and don't go bothering
the inode iblock.
Refinements for triggering checkpoint when we are low on space.
This is getting messy but seems to work.
Not sure now on the difference between CleanerBlocks and
EmergencyPending.
I guess the one makes sure the cleaner does what it can and then
triggers a checkpoint.
The other prepares for EmergencyClean to be set after the next
checkpoint.
NeilBrown [Sun, 15 Aug 2010 08:30:36 +0000 (18:30 +1000)]
Wait for a checkpoint before returning ENOSPC
If we seem to run out of space, it is worth waiting for
a checkpoint as that might free up some space. So add
an extra step to the sequence leading from 'no space' to 'ENOSPC'.
NeilBrown [Sat, 14 Aug 2010 10:52:11 +0000 (20:52 +1000)]
Allow cleaner_parse to request multiple inodes at once.
Currently cleaner_parse stops when it hits an inode that it cannot
load immediately. This reduces the opportunities for parallelism.
Instead allow up to 16 -EAGAINs from inode lookups.
This requires that we mark headers for inodes which failed, and
always start again from the beginning of the cluster head.
We already reduce the bcnt to 0, so for inodes that can be
found, we won't lookup the blocks twice.
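The parsing loop described above can be sketched as follows. The structure and names (struct group_head, parse_cluster(), the lookup callback) are invented for illustration and do not match the real cluster header layout; only the limit of 16 and the "bcnt set to 0 when done" convention come from the text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EAGAIN 16   /* limit on outstanding lookups per pass */

/* Hypothetical per-inode header inside a cluster head. */
struct group_head {
    int inum;
    int bcnt;       /* blocks still to process; 0 means already handled */
    bool failed;    /* inode lookup returned -EAGAIN on the last pass */
};

typedef int (*lookup_fn)(int inum);

/*
 * One pass over the cluster head, always starting from the beginning.
 * Returns the number of inodes that still need another pass.
 */
static int parse_cluster(struct group_head *h, size_t n, lookup_fn lookup)
{
    int pending = 0;
    size_t i;

    assert(h != NULL && lookup != NULL);
    for (i = 0; i < n && pending < MAX_EAGAIN; i++) {
        if (h[i].bcnt == 0)
            continue;               /* handled on an earlier pass */
        if (lookup(h[i].inum) == -EAGAIN) {
            h[i].failed = true;     /* mark it and keep going */
            pending++;
            continue;
        }
        h[i].failed = false;
        h[i].bcnt = 0;              /* blocks are not looked up twice */
    }
    return pending;
}

static bool odd_inodes_busy;

/* Example lookup: odd inode numbers are not yet loaded while busy. */
static int lookup_odd_busy(int inum)
{
    return (odd_inodes_busy && (inum % 2)) ? -EAGAIN : 0;
}

/* Returns first-pass pending count * 10 + second-pass pending count. */
static int demo_parse(void)
{
    struct group_head h[4] = {
        { 1, 2, false }, { 2, 2, false }, { 3, 2, false }, { 4, 2, false },
    };
    int first, second;

    odd_inodes_busy = true;
    first = parse_cluster(h, 4, lookup_odd_busy);   /* inodes 1 and 3 fail */
    odd_inodes_busy = false;
    second = parse_cluster(h, 4, lookup_odd_busy);  /* everything loads */
    return first * 10 + second;
}
```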
NeilBrown [Sat, 14 Aug 2010 06:48:39 +0000 (16:48 +1000)]
Clean up interaction between cleaner and checkpoint.
If a checkpoint is wanted, the cleaner shouldn't start any more work.
If the cleaner or segscan is active a checkpoint cannot start, but
when they complete they should wake the checkpoint process.
NeilBrown [Sat, 14 Aug 2010 05:41:59 +0000 (15:41 +1000)]
Combine cleaning and orphan list_heads.
A datablock is very rarely both an orphan and requiring cleaning, so
having two list_heads is a waste.
If it is an orphan it will have full parent linkage and addresses already
so it will be handled promptly and removed from the cleaning list.
So arrange that if a block wants to be both, it is preferentially on
the cleaning list, and when removed from the cleaning list it gets
added back to the pending_orphan list in case it needs processing.
Note that only directory and inode blocks can ever be orphans so some
optimisation of spinlocks is possible.