NeilBrown [Wed, 4 May 2011 05:17:40 +0000 (15:17 +1000)]
Prepare for more parallelism in write clusters.
We want to allow write clusters to be written in parallel to
multiple devices. That means we need a different Verify mechanism.
Introduce - but don't yet implement - VerifyDevNext{,2}, which allow
a cluster to be verified by the next header on the same device.
NeilBrown [Tue, 3 May 2011 04:16:11 +0000 (14:16 +1000)]
cleaner: revise rules for emergency clean.
Normally we only clean when there is free space available for
dedicated cleaning segments.
However if space is tight we might need to clean to the main segment.
Do this only if there are no 'clean' segments (otherwise force a
checkpoint so those clean segments become free).
clean_reserved doesn't really factor into this decision.
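
The decision described above can be sketched as a small pure function.
This is only an illustration in userspace C: the enum, the function name
and the three inputs are hypothetical and are not taken from the LaFS
source. The "not tight, nothing free" case is an assumption as well.

```c
/* Hypothetical outcomes for one pass of the cleaner. */
enum sketch_clean_action {
    SKETCH_CLEAN_DEDICATED,   /* normal case: clean to cleaning segments */
    SKETCH_NO_CLEAN,          /* nothing free, but no pressure either */
    SKETCH_FORCE_CHECKPOINT,  /* let 'clean' segments become free */
    SKETCH_CLEAN_TO_MAIN      /* emergency: clean to the main segment */
};

/*
 * free_for_cleaning: non-zero if dedicated cleaning segments can be had.
 * space_tight:       non-zero if the filesystem is short of space.
 * clean_segments:    number of segments that are 'clean' but not yet free.
 * Note that nothing like clean_reserved is consulted here.
 */
static enum sketch_clean_action
sketch_choose_clean(int free_for_cleaning, int space_tight,
                    unsigned int clean_segments)
{
    if (free_for_cleaning)
        return SKETCH_CLEAN_DEDICATED;
    if (!space_tight)
        return SKETCH_NO_CLEAN;          /* assumption: just wait */
    if (clean_segments > 0)
        return SKETCH_FORCE_CHECKPOINT;  /* clean segments will be freed */
    return SKETCH_CLEAN_TO_MAIN;
}
```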
NeilBrown [Mon, 2 May 2011 01:33:23 +0000 (11:33 +1000)]
atime: make sure block is reserved before updating.
We need to 'reserve' space for the atime file blocks before updating
them. This could of course fail. If it does we could lose the
update, but as long as that is very rare it shouldn't be a problem.
When we fix things so the block hardly ever gets written, we should
also fix it to reserve the block when we first take the reference.
Then also refresh the reservation after writing.
NeilBrown [Mon, 28 Mar 2011 10:20:32 +0000 (21:20 +1100)]
roll: make sure unlinked inodes become orphans.
If we find an unlinked inode during roll-forward it should end up
being an orphan.
But we don't look at the orphan file soon enough, and if it isn't in
the orphan file then we can BUG, which is bad.
So if we do find one, make it an orphan. If it turns out to be in the
orphan file that entry will just be discarded.
NeilBrown [Mon, 7 Mar 2011 05:58:56 +0000 (16:58 +1100)]
lafs_iget takes a filesystem-inode rather than a super_block.
On the path to reducing the number of superblocks.... each superblock
can hold multiple subset filesystems so we need more than the
inode number to identify a file - we need the filesystem inode too.
So change lafs_iget to take a filesystem inode rather than a
superblock.
We still need to get at the superblock. In a few patches we can just
use ->i_sb, but for now it might not be the same so stash it in
i_private.
NeilBrown [Mon, 7 Mar 2011 05:48:37 +0000 (16:48 +1100)]
Add 'filesys' pointer to lafs_inode
This is the first patch in a series to switch from having one
struct super_block for each subset filesystem, to only having one
for each snapshot. i.e. one which is writable and some number
which are read-only snapshots.
The struct super_block described the primary filesystem and
all of the subset filesystems.
NeilBrown [Sat, 5 Mar 2011 00:25:20 +0000 (11:25 +1100)]
Fix error-return value from seg_addr
Returning '0' as an error indicator from seg_addr is bad
because 0 can be a valid address - the first block in the first
segment (used as a cluster header).
So return "-1" instead (as an unsigned of course).
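
A minimal userspace sketch of the idea, assuming a flat layout of
fixed-size segments. The struct, the macro and the function name are
hypothetical; only the "-1 as an unsigned" convention comes from the
message above.

```c
#include <stdint.h>

/* Hypothetical sentinel: -1 converted to unsigned, i.e. all bits set. */
#define SKETCH_NO_ADDR ((uint64_t)-1)

/* Hypothetical layout: seg_count segments of seg_blocks blocks each. */
struct sketch_layout {
    uint64_t seg_count;
    uint64_t seg_blocks;
};

/*
 * Return the address of block 'blk' in segment 'seg'.
 * Address 0 is a valid result (first block of the first segment),
 * so failure is reported as SKETCH_NO_ADDR rather than 0.
 */
static uint64_t sketch_seg_addr(const struct sketch_layout *l,
                                uint64_t seg, uint64_t blk)
{
    if (seg >= l->seg_count || blk >= l->seg_blocks)
        return SKETCH_NO_ADDR;
    return seg * l->seg_blocks + blk;
}
```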
NeilBrown [Sat, 5 Mar 2011 00:25:20 +0000 (11:25 +1100)]
Small fix for 'df' on subset filesystems
We should never report more available space than free space.
'free' space might be limited by the blocks_allowed, while
the 'available' space calculation is for the total filesystem.
So if free is actually less than the calculated 'available', we must
reduce the 'available'.
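
The rule amounts to a clamp. A sketch, with a hypothetical struct
standing in for whatever the statfs code really fills in:

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields that 'df' reports. */
struct sketch_statfs {
    uint64_t bfree;   /* free blocks, possibly limited by blocks_allowed */
    uint64_t bavail;  /* available blocks, computed for the whole fs */
};

/* Never report more available space than free space. */
static void sketch_clamp_avail(struct sketch_statfs *st)
{
    if (st->bavail > st->bfree)
        st->bavail = st->bfree;
}
```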
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
fix a BUG assertion in lafs_dirty_iblock.
The InoIdx block for an inode with depth==0 is not Valid,
as it contains no index information.
However it still can be appropriate to mark it dirty when a data block
is allocated, to ensure incorporation happens correctly.
So it is not a bug to dirty a non-Valid iblock - at least if the depth
is 0.
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
checkpin: don't hold references on primary superblock.
There isn't really a need to hold a reference on the primary
superblock: everything else does, and when the last reference goes
it will stop the cleaner, so these references won't be needed any more.
This is only an interim solution - we will be removing the
multiple superblocks soon and all this will go away.
NeilBrown [Fri, 4 Mar 2011 23:44:22 +0000 (10:44 +1100)]
Store atime safely when dirty_inode is called.
When dirty_inode is called the atime might have been the only thing
updated.
If it was, then we want to record it in the atime file and not mark
the inode dirty.
If it wasn't (or if the inode is easily marked dirty) we want to
simply mark the inode dirty and make sure the correct atime is
recorded when the inode is flushed.
So detect the cases based on whether the inode dblock is available and
pinned. When it isn't, just update the atime-delta in the atime file.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Avoid a race in lafs_get_cleanable.
If we find a cleanable segment that is actually clean, then we
drop the lock and try to add it. If something else removed it at
just this time we end up with a refcount issue as we are meant to
take references to youth_db and usage0_db when adding things to the
table, and we don't have them any more.
So simply allow the 'add_clean' to fail as that is safe.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Allow for the possibility of 'free_blocks' going negative.
While this shouldn't happen, the value of free_blocks isn't really
valid until the first segment scan completes. If something gets
subtracted before that it could go negative temporarily.
We really want to handle that sort of situation gracefully.
So make it a 'signed' value.
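
To illustrate why the type matters, here is a userspace sketch with
hypothetical helper names. With an unsigned counter an early subtraction
wraps to a huge value; with a signed one it goes briefly negative and
recovers once the real count is added in.

```c
#include <stdint.h>

/* Unsigned counter: subtracting before the scan completes wraps around. */
static uint64_t sketch_sub_unsigned(uint64_t free_blocks, uint64_t n)
{
    return free_blocks - n;   /* well-defined, but wraps modulo 2^64 */
}

/* Signed counter: the same subtraction just goes negative. */
static int64_t sketch_sub_signed(int64_t free_blocks, int64_t n)
{
    return free_blocks - n;
}

/* An allocation check stays sane while the signed counter is negative. */
static int sketch_can_alloc(int64_t free_blocks, int64_t want)
{
    return free_blocks >= want;
}
```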
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Avoid quadratic searches in segment table.
There are a couple of times that we need to iterate over the segment
table (stable) and need to drop the lock each time.
Rather than always restart after we find something of interest,
keep track of when we make a change to the table (thus possibly
breaking an iterator) and only restart when that change is seen.
Both the iterators are completely serialised so resetting
stable_changed when we retry cannot confuse a different iterator.
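
The pattern can be sketched in userspace C as below. Everything here is
hypothetical (a small array instead of the real table, no locking, and
made-up handling of entries); only the idea of restarting the scan when
a 'changed' flag has been set comes from the message above.

```c
#include <stddef.h>

/* Hypothetical table; 'changed' plays the role of stable_changed. */
struct sketch_table {
    int items[8];
    size_t len;
    int changed;   /* set whenever the table is structurally modified */
};

/* Remove entry i, shifting later entries down: breaks any iterator. */
static void sketch_remove(struct sketch_table *t, size_t i)
{
    for (; i + 1 < t->len; i++)
        t->items[i] = t->items[i + 1];
    t->len--;
    t->changed = 1;
}

/*
 * Handle every non-zero entry.  Odd entries are cleared in place (the
 * iterator stays valid); even entries are removed (it does not).
 * Returns how many times the scan had to restart.
 */
static int sketch_scan(struct sketch_table *t)
{
    int restarts = 0;
    size_t i;

retry:
    t->changed = 0;
    for (i = 0; i < t->len; i++) {
        if (t->items[i] == 0)
            continue;
        /* real code would drop the lock here ... */
        if (t->items[i] % 2)
            t->items[i] = 0;
        else
            sketch_remove(t, i);
        /* ... and retake it here */
        if (t->changed) {
            restarts++;
            goto retry;   /* only restart when the table changed */
        }
    }
    return restarts;
}
```

With six entries of which two are even, the scan restarts twice rather
than once per entry handled.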
NeilBrown [Fri, 4 Mar 2011 01:47:29 +0000 (12:47 +1100)]
roll: change handling of distinction between rename to existing and non-existing name
Having DIROP_REN_TARGET_OLD and DIROP_REN_TARGET_NEW is not needed as
the miniblock also holds the inode number from the target which can be
zero or non-zero.
So remove that distinction and just use the inode number.
NeilBrown [Fri, 4 Mar 2011 01:47:26 +0000 (12:47 +1100)]
roll: handle cluster headers larger than one page.
If roll-forward finds cluster headers larger than one page, it now
tries to allocate that much space and if the allocation succeeds the
roll-forward will now succeed, rather than always failing.
NeilBrown [Fri, 15 Oct 2010 04:05:37 +0000 (15:05 +1100)]
roll-forward: don't iput any nlink==0 inode before roll-forward finishes.
Such inodes might still need to participate in roll-forward, and might
yet get linked into the directory tree. So don't risk them being
deleted yet.
So any inode with nlink==0 is put on the list to be dealt with at the
end.
NeilBrown [Sun, 10 Oct 2010 23:47:33 +0000 (10:47 +1100)]
While truncating, hold ref on superblock as well as inode.
An unmount of a subset filesystem could happen while still truncating
a file on the filesystem. So hold a ref to the superblock so that it
doesn't go away.
When deleting, we cannot grab the inode, so we need to modify
lafs_iput_fs to not do an iput in that case.
NeilBrown [Sun, 10 Oct 2010 22:58:36 +0000 (09:58 +1100)]
Use igrab_fs to hold refcounts on the inodes of orphans.
We currently hold a refcount on the inodes of dir orphans.
We need the filesystem (super_block) as well.
Also, while we don't really need a similar refcount for inode orphans,
it doesn't hurt.
So simplify the tracking of whether we need to take such a refcount,
use iget_fs to grab the super_block as well, and also take a ref in
lafs_add_orphans, which was missing.
NeilBrown [Sat, 2 Oct 2010 05:13:25 +0000 (15:13 +1000)]
Combine dirty_iblock with setting Realloc on iblock.
This makes it easier to do the right thing on iblocks that
we have just split ... not that we would expect that when cleaning,
but lots of things are possible, and elegant code is good.
NeilBrown [Sat, 2 Oct 2010 04:31:21 +0000 (14:31 +1000)]
Don't use I_ICredit for UnincCredit when cleaning.
ICredit is only to be used when dirtying a block, so any setting of
UnincCredit for cleaning must get the credit from elsewhere. If no
such credit is available, fall back on dirtying the block.
NeilBrown [Sat, 2 Oct 2010 01:46:04 +0000 (11:46 +1000)]
use little-endian bit operations for inode usage map.
Must not use host-endian here, so use generic 'le' operations.
Also protect all operations with i_mutex. Even if the bitops
were atomic, we need the locking when punching a hole in the file,
or adding a new block.
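
For reference, little-endian bit numbering on a byte buffer looks like
the sketch below. This is plain userspace C with hypothetical names, not
the kernel helpers, and it leaves out the i_mutex locking the message
calls for. Bit nr always lives in byte nr/8 at position nr%8, whatever
the host byte order, which is what makes the on-disk map portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Set bit 'nr' in a little-endian bitmap. */
static void sketch_set_bit_le(size_t nr, uint8_t *map)
{
    map[nr >> 3] |= (uint8_t)(1u << (nr & 7));
}

/* Clear bit 'nr' in a little-endian bitmap. */
static void sketch_clear_bit_le(size_t nr, uint8_t *map)
{
    map[nr >> 3] &= (uint8_t)~(1u << (nr & 7));
}

/* Return 1 if bit 'nr' is set, 0 otherwise. */
static int sketch_test_bit_le(size_t nr, const uint8_t *map)
{
    return (map[nr >> 3] >> (nr & 7)) & 1;
}
```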
NeilBrown [Sat, 2 Oct 2010 00:42:34 +0000 (10:42 +1000)]
Improve handling of snapshot name.
Name stored in fileset inode is now variable length and empty
on subordinate filesets. Snapshots have space for a name depending
on how much space was allocated when fs was created.
When writing out the accounting blocks we need to not update the youth
block if we happen to start a new segment.
We already do that at unmount time, so generalise it with a new flag.