git.neil.brown.name Git - LaFS.git/log
LaFS.git
13 years agoREADME update master
NeilBrown [Wed, 4 May 2011 07:07:12 +0000 (17:07 +1000)]
README update

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFORMAT CHANGE Various changes to 'superblock' format.
NeilBrown [Wed, 4 May 2011 07:04:36 +0000 (17:04 +1000)]
FORMAT CHANGE Various changes to 'superblock' format.

Simplify version number.
Allow reference to readonly base.
Allow arbitrary config options.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoPrepare for more parallelism in write clusters.
NeilBrown [Wed, 4 May 2011 05:17:40 +0000 (15:17 +1000)]
Prepare for more parallelism in write clusters.

We want to allow write clusters to be written in parallel to
multiple devices.  That means we need a different Verify mechanism.
Introduce - but don't yet implement - VerifyDevNext{,2} which allows
a cluster to be verified by the next header on the same device.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFORMAT CHANGE add timestamp to group head in write cluster
NeilBrown [Wed, 4 May 2011 03:47:01 +0000 (13:47 +1000)]
FORMAT CHANGE add timestamp to group head in write cluster

This allows accurate preservation of timestamps during
roll forward.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFORMAT CHANGE use 32bit block counts in segusage file.
NeilBrown [Wed, 4 May 2011 03:15:00 +0000 (13:15 +1000)]
FORMAT CHANGE use 32bit block counts in segusage file.

This allows much bigger segments which can be useful.

Continue to use 16bit youth numbers.  It isn't clear that this is
needed, but condensing the range for old segments seems to make sense.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFORMAT CHANGE use crc32 instead of toy checksum for cluster head
NeilBrown [Tue, 3 May 2011 05:47:09 +0000 (15:47 +1000)]
FORMAT CHANGE use crc32 instead of toy checksum for cluster head

Signed-off-by: NeilBrown <neilb@suse.de>
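
For readers unfamiliar with the difference, here is a minimal user-space sketch of why a CRC is a stronger cluster-head check than a simple sum. The commit does not say what the "toy checksum" was or which CRC-32 parameters LaFS uses; `toy_sum` and `crc32_sketch` are hypothetical names, and the CRC shown is the common reflected variant (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF), which may differ from the seed conventions used in the kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320, with an initial
 * value and final XOR of 0xFFFFFFFF (the zlib/IEEE 802.3 convention).
 * This is an illustration, not the LaFS implementation. */
uint32_t crc32_sketch(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical "toy" checksum for comparison: a plain byte sum.
 * It cannot detect reordered bytes. */
uint32_t toy_sum(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}
```

A byte sum gives the same result for "ab" and "ba", while the CRC distinguishes them, which is the kind of corruption a cluster header check should catch.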
13 years agoFORMAT CHANGE: add parent field to InodeFile
NeilBrown [Tue, 3 May 2011 05:29:22 +0000 (15:29 +1000)]
FORMAT CHANGE: add parent field to InodeFile

As an InodeFile for a subset looks like a directory,
it needs a 'parent' pointer.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll: add a possibly-useful assertion.
NeilBrown [Tue, 3 May 2011 05:13:46 +0000 (15:13 +1000)]
roll: add a possibly-useful assertion.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agomisc comment fixes
NeilBrown [Tue, 3 May 2011 05:12:43 +0000 (15:12 +1000)]
misc comment fixes

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoerase_dblock - add a random clear_bit
NeilBrown [Tue, 3 May 2011 05:09:30 +0000 (15:09 +1000)]
erase_dblock - add a random clear_bit

It shouldn't be set, and it is safe if it definitely isn't.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoCleaner: force checkpoint after an emergency clean.
NeilBrown [Tue, 3 May 2011 05:08:45 +0000 (15:08 +1000)]
Cleaner: force checkpoint after an emergency clean.

The check on clean.cnt isn't enough to stop consecutive
emergency cleans, so force a checkpoint.  The checkpoint
won't happen until the cleaner stops.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_do_clean - reindent and reformat a bit
NeilBrown [Tue, 3 May 2011 04:19:44 +0000 (14:19 +1000)]
lafs_do_clean - reindent and reformat a bit

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agocleaner: revise rules for emergency clean.
NeilBrown [Tue, 3 May 2011 04:16:11 +0000 (14:16 +1000)]
cleaner: revise rules for emergency clean.

Normally we only clean when there is free space available for
dedicated cleaning segments.
However if space is tight we might need to clean to the main segment.
Do this only if there are no 'clean' segments (otherwise force a
checkpoint so those clean segments become free).

clean_reserved doesn't really factor into this decision.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_free_get: fix refcounting problem.
NeilBrown [Tue, 3 May 2011 03:55:39 +0000 (13:55 +1000)]
lafs_free_get: fix refcounting problem.

We are re-using 'ssum' here and can lose a reference.
So use a new ssum2 instead, and fix the BUG trigger.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agosubset: add dump lookup and readdir routines.
NeilBrown [Mon, 2 May 2011 02:03:16 +0000 (12:03 +1000)]
subset: add dump lookup and readdir routines.

For a subset filesystem object to look convincingly like an
empty directory it needs lookup and readdir routines which
do nothing useful.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Mon, 2 May 2011 01:41:14 +0000 (11:41 +1000)]
README update

13 years agoatime: make sure block is reserved before updating.
NeilBrown [Mon, 2 May 2011 01:33:23 +0000 (11:33 +1000)]
atime: make sure block is reserved before updating.

We need to 'reserve' space for the atime file blocks before updating
them.  This could of course fail.  If it does we could lose the
update, but as long as that is very rare it shouldn't be a problem.

When we fix things so the block hardly ever gets written, we should
also fix it to reserve the block when we first take the reference.
Then also refresh the reservation after writing.

Also fix up some comments and stuff.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll: make sure unlinked inodes before orphans.
NeilBrown [Mon, 28 Mar 2011 10:20:32 +0000 (21:20 +1100)]
roll: make sure unlinked inodes before orphans.

If we find an unlinked inode during roll-forward it should end up
being an orphan.
But we don't look at the orphan file soon enough, and if it isn't in
the orphan file then we can BUG, which is bad.
So if we do find one, make it an orphan.  If it turns out to be in the
orphan file that entry will just be discarded.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoChange to self-test scheme
NeilBrown [Sun, 27 Mar 2011 21:59:40 +0000 (08:59 +1100)]
Change to self-test scheme

Not ready yet.

13 years agoUpdate to 2.6.38
NeilBrown [Sun, 20 Mar 2011 22:40:07 +0000 (09:40 +1100)]
Update to 2.6.38

- evict inode replaces delete_inode/clear_inode
    and don't need drop_inode
- setattr changes for truncate sequence
- changes to sync_file
- new shrinker signature
- REQ_UNPLUG replaces BIO_RW_UNPLUG
- open_bdev_exclusive replaced by blkdev_get_by_path

and various bug fixes.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMount: need to return the superblock with s_umount held.
NeilBrown [Sun, 20 Mar 2011 21:22:34 +0000 (08:22 +1100)]
Mount: need to return the superblock with s_umount held.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoStop creating a super_block for each subset filesystem.
NeilBrown [Mon, 7 Mar 2011 06:00:03 +0000 (17:00 +1100)]
Stop creating a super_block for each subset filesystem.

Rather, we use just one super_block and differentiate based on
->filesys.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoChange various other functions to take an inode rather than a superblock.
NeilBrown [Mon, 7 Mar 2011 05:58:57 +0000 (16:58 +1100)]
Change various other functions to take an inode rather than a superblock.

This is a natural follow-on from previous patch.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_iget takes a filesystem-inode rather than a super_block.
NeilBrown [Mon, 7 Mar 2011 05:58:56 +0000 (16:58 +1100)]
lafs_iget takes a filesystem-inode rather than a super_block.

On the path to reducing the number of superblocks.... each superblock
can hold multiple subset filesystems so we need more than the
inode number to identify a file - we need the filesystem inode too.

So change lafs_iget to take a filesystem inode rather than a
superblock.

We still need to get at the superblock.  In a few patches we can just
use ->i_sb, but for now it might not be the same so stash it in
i_private.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoReplace lots of ino_from_sb uses.
NeilBrown [Mon, 7 Mar 2011 05:58:53 +0000 (16:58 +1100)]
Replace lots of ino_from_sb uses.

Most of these can now simply dereference ->filesys.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAdd 'filesys' pointer to lafs_inode
NeilBrown [Mon, 7 Mar 2011 05:48:37 +0000 (16:48 +1100)]
Add 'filesys' pointer to lafs_inode

This is the first patch in a series to switch from having one
struct super_block for each subset filesystem, to only having one
for each snapshot.  i.e. one which is writable and some number
which are read-only snapshots.
The struct super_block described the primary filesystem and
all of the subset filesystems.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoImprove test script to catch some errors.
NeilBrown [Sat, 5 Mar 2011 00:25:20 +0000 (11:25 +1100)]
Improve test script to catch some errors.

Make sure failures in the subset filesystem don't go unnoticed.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAllow seg_addr to report the address at the end of the cluster
NeilBrown [Sat, 5 Mar 2011 00:25:20 +0000 (11:25 +1100)]
Allow seg_addr to report the address at the end of the cluster

This is needed as the first address of the next cluster.
We don't check for errors in very many places anyway so this
won't cause any problems.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix error-return value from seg_addr
NeilBrown [Sat, 5 Mar 2011 00:25:20 +0000 (11:25 +1100)]
Fix error-return value from seg_addr

Returning '0' as an error indicator from seg_addr is bad
because 0 can be a valid address - the first block in the first
segment (used as a cluster header).

So return "-1" instead (as an unsigned of course).

Signed-off-by: NeilBrown <neilb@suse.de>
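
The error-return change above can be illustrated with a small user-space sketch. The real seg_addr signature and segment geometry are not shown in this log, so `seg_geom`, `seg_addr_sketch` and `SEG_ADDR_INVALID` are hypothetical; the only point taken from the commit is that 0 is a valid address and "-1 as an unsigned" is used as the error indicator.

```c
#include <stdint.h>

/* "-1 as an unsigned": all bits set, which no real block address uses
 * in this sketch.  The name is hypothetical. */
#define SEG_ADDR_INVALID ((uint64_t)-1)

/* Hypothetical geometry: equal-sized segments laid out back to back. */
struct seg_geom {
	uint64_t seg_blocks;   /* blocks per segment */
	uint64_t seg_count;    /* number of segments */
};

/* Map (segment, offset) to a block address.  Address 0 is a valid
 * result (first block of the first segment), so it cannot double as
 * the error value. */
uint64_t seg_addr_sketch(const struct seg_geom *g, uint64_t seg, uint64_t offset)
{
	if (seg >= g->seg_count || offset >= g->seg_blocks)
		return SEG_ADDR_INVALID;
	return seg * g->seg_blocks + offset;
}
```

Callers then compare against the sentinel rather than testing for zero.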
13 years agoSmall fix for 'df' on subset filesystems
NeilBrown [Sat, 5 Mar 2011 00:25:20 +0000 (11:25 +1100)]
Small fix for 'df' on subset filesystems

We should never report more available space than free space.

'free' space might be limited by the blocks_allowed, while
the 'available' space calculation is for the total filesystem.
So if free is actually less than the calculated 'available', we must
reduce the 'available'.

Signed-off-by: NeilBrown <neilb@suse.de>
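
The rule described in the 'df' fix above amounts to a clamp. This is a sketch only: the function names are hypothetical, and the way blocks_allowed limits free space is an assumption made for illustration, not taken from the LaFS statfs code.

```c
#include <stdint.h>

/* Free space for a subset filesystem: the whole-filesystem figure,
 * limited by whatever remains of this subset's blocks_allowed quota
 * (assumed semantics). */
uint64_t subset_free(uint64_t fs_free, uint64_t blocks_allowed, uint64_t blocks_used)
{
	uint64_t quota_left =
		blocks_allowed > blocks_used ? blocks_allowed - blocks_used : 0;

	return quota_left < fs_free ? quota_left : fs_free;
}

/* 'available' is computed for the total filesystem, so it is reduced
 * whenever it would exceed the subset's 'free' figure. */
uint64_t clamp_avail(uint64_t free_blocks, uint64_t avail_blocks)
{
	return avail_blocks > free_blocks ? free_blocks : avail_blocks;
}
```

With these two helpers the reported available space can never exceed the reported free space.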
13 years agoDon't try to set access time on objects that don't record access time.
NeilBrown [Sat, 5 Mar 2011 00:25:04 +0000 (11:25 +1100)]
Don't try to set access time on objects that don't record access time.

e.g. TypeInodeFile, TypeSegmentMap etc.

Doing so can corrupt other data.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agofix a BUG assertion in lafs_dirty_iblock.
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
fix a BUG assertion in lafs_dirty_iblock.

The InoIdx block for an inode with depth==0 is not Valid,
as it contains no index information.
However it still can be appropriate to mark it dirty when a data block
is allocated, to ensure incorporation happens correctly.

So it is not a bug to dirty a non-Valid iblock - at least if the depth
is 0.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix block number selection in roll-forward
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
Fix block number selection in roll-forward

When I changed this to use lafs_seg_next I broke it, and didn't
test properly.
So enhance the testing to catch this.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoHandle mount-time errors better.
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
Handle mount-time errors better.

If we get an error during mount/roll-forward, we need to be careful
during shutdown that we don't assume the filesystem was active.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agocheckpin: don't hold references on primary superblock.
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
checkpin: don't hold references on primary superblock.

There isn't really a need to hold a reference on the primary
superblock, as everything else does, and when the last reference goes
it will stop the cleaner, so these references won't be needed any more.

This is only an interim solution - we will be removing the
multiple superblocks soon and all this will go away.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAvoid unbalanced refcount on fs if lafs_iget fails.
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
Avoid unbalanced refcount on fs if lafs_iget fails.

If lafs_iget_fs is about to fail, we don't want to hold any
references, so be more careful.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMake sure the inode's bdi matches the superblock.
NeilBrown [Fri, 4 Mar 2011 23:44:23 +0000 (10:44 +1100)]
Make sure the inode's bdi matches the superblock.

inode_init_always defaults i_data.backing_dev_info to
the bdi for sb->s_bdev if that exists, otherwise
default_backing_dev_info.

Why it doesn't just use sb->s_bdi is not clear.
But in any case it doesn't so we must.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agogetattr: make sure atime reported is repeatable.
NeilBrown [Fri, 4 Mar 2011 23:44:22 +0000 (10:44 +1100)]
getattr: make sure atime reported is repeatable.

We cannot allow getattr to simply report i_atime as we may not
store it with sufficient granularity.

So always recompute from the stored values, which are kept in
the in-mem inode.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoStore atime safely when dirty_inode is called.
NeilBrown [Fri, 4 Mar 2011 23:44:22 +0000 (10:44 +1100)]
Store atime safely when dirty_inode is called.

When dirty_inode is called the atime might have been the only thing
updated.
If it was, then we want to record it in the atime file and not mark
the inode dirty.
If it wasn't (or if the inode is easily marked dirty) we want to
simply mark the inode dirty and make sure the correct atime is
recorded when the inode is flushed.

So detect the cases based on whether the inode dblock is available and
pinned.  When it isn't just update the atime-delta in the atime file.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoaccesstime: load delta from atime file and apply when loading inode.
NeilBrown [Fri, 4 Mar 2011 23:44:22 +0000 (10:44 +1100)]
accesstime: load delta from atime file and apply when loading inode.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoStart implementation of separate access-time file.
NeilBrown [Fri, 4 Mar 2011 23:44:02 +0000 (10:44 +1100)]
Start implementation of separate access-time file.

Add README documentation,
load file at mount time, release at unmount.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Fri, 4 Mar 2011 01:47:31 +0000 (12:47 +1100)]
README update

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agosync: wait for checkpoint to actually finish on a sync
NeilBrown [Fri, 4 Mar 2011 01:47:31 +0000 (12:47 +1100)]
sync: wait for checkpoint to actually finish on a sync

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agostatfs fixes.
NeilBrown [Fri, 4 Mar 2011 01:47:31 +0000 (12:47 +1100)]
statfs fixes.

1/ use space for 'this' filesystem, not primary filesystem
2/ include more entropy in fsuid
3/ make different filesets have different fsuid.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agostateblock: fix range check on maxsnapshot
NeilBrown [Fri, 4 Mar 2011 01:47:31 +0000 (12:47 +1100)]
stateblock: fix range check on maxsnapshot

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agodevblock: include size of device in checks on validity of devblock.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
devblock: include size of device in checks on validity of devblock.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agodevblock: validate the addresses of the state blocks a bit better.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
devblock: validate the addresses of the state blocks a bit better.

Allow all to be at start or end, but ensure they don't overlap data.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAvoid a race in lafs_get_cleanable.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Avoid a race in lafs_get_cleanable.

If we find a cleanable segment that is actually clean, then we
drop the lock and try to add it.  If something else removed it at
just this time we end up with a refcount issue as we are meant to
take references to youth_db and usage0_db when adding things to the
table, and we don't have them any more.

So simply allow the 'add_clean' to fail as that is safe.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agosegments: make sure the 'last' pointer is always correct.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
segments: make sure the 'last' pointer is always correct.

A couple of places forgot to handle this properly.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAllow for the possibility of 'free_blocks' going negative.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Allow for the possibility of 'free_blocks' going negative.

While this shouldn't happen, the value of free_blocks isn't really
valid until the first segment scan completes.  If something gets
subtracted before that it could go negative temporarily.
We really want to handle that sort of situation gracefully.
So make it a 'signed' value.

Signed-off-by: NeilBrown <neilb@suse.de>
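
A small user-space sketch of why the signed type matters here. The helper names are hypothetical and the 64-bit width is an assumption; the commit only says that free_blocks becomes a 'signed' value.

```c
#include <stdint.h>

/* With an unsigned counter, subtracting before the first segment scan
 * has filled in the real value wraps around to a huge number. */
uint64_t unsigned_after(uint64_t free_blocks, uint64_t n)
{
	return free_blocks - n;
}

/* With a signed counter the value simply goes negative for a while
 * and recovers once the scan adds the real free space. */
int64_t signed_after(int64_t free_blocks, int64_t n)
{
	return free_blocks - n;
}

/* A space check on the signed value fails safely while it is negative. */
int has_space(int64_t free_blocks, int64_t wanted)
{
	return free_blocks >= wanted;
}
```

The unsigned variant would make a temporarily over-subtracted counter look like an almost empty filesystem, which is the ungraceful case the commit avoids.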
13 years agoHold an extra block reference in lafs_seg_ref_block.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Hold an extra block reference in lafs_seg_ref_block.

See the comment in the code to understand why.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoAvoid quadratic searches in segment table.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
Avoid quadratic searches in segment table.

There are a couple of times that we need to iterate over the segment
table (stable) and need to drop the lock each time.
Rather than always restart after we find something of interest,
keep track of when we make a change to the table (thus possibly
breaking an iterator) and only restart when that change is seen.

Both the iterators are completely serialised so resetting
stable_changed when we retry cannot confuse a different iterator.

Signed-off-by: NeilBrown <neilb@suse.de>
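
The restart-only-on-change idea above can be sketched in user space as follows. This is a single-threaded illustration with hypothetical names (`seg_table`, `scan_table`); the real segment table, its locking and the stable_changed flag are not shown in this log, so the comments only mark where the lock would be dropped and retaken.

```c
#include <stddef.h>

#define TABLE_MAX 16

struct seg_table {
	int      entry[TABLE_MAX];
	size_t   count;
	int      changed;     /* set by any modification of the table */
	unsigned restarts;    /* how often the scan started over */
	unsigned inspected;   /* entries handled without changing the table */
};

/* Remove entry i and record that the table changed. */
static void table_remove(struct seg_table *t, size_t i)
{
	for (size_t j = i + 1; j < t->count; j++)
		t->entry[j - 1] = t->entry[j];
	t->count--;
	t->changed = 1;
}

/* Scan the table.  Negative entries are removed (a table change);
 * multiples of ten are merely inspected.  In both cases the lock
 * would be dropped, but only a real change forces a restart.
 * Entries before the change point are revisited after a restart. */
void scan_table(struct seg_table *t)
{
	size_t i = 0;

	t->changed = 0;
	while (i < t->count) {
		int v = t->entry[i];

		if (v < 0)
			table_remove(t, i);     /* lock dropped, table modified */
		else if (v % 10 == 0)
			t->inspected++;         /* lock dropped, table untouched */

		/* Lock retaken: restart only if the iterator may be stale. */
		if (t->changed) {
			t->changed = 0;
			t->restarts++;
			i = 0;
		} else {
			i++;
		}
	}
}
```

Restarting after every interesting entry would rescan the prefix each time, which is where the quadratic cost came from; here a scan over an unchanged table is a single pass.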
13 years agoroll: permit minimal mini-block handling for regular files.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
roll: permit minimal mini-block handling for regular files.

If the miniblock is at the start of the file, write it out
to the file.  This allows inodes which contain data to commit
that data.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll: handle directory updates.
NeilBrown [Fri, 4 Mar 2011 01:47:30 +0000 (12:47 +1100)]
roll: handle directory updates.

Handle directory updates found during roll-forward by making the
relevant change to the directory and inode.

This has not been tested yet!!

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll: change handling of distinction between rename to existing and non-existing...
NeilBrown [Fri, 4 Mar 2011 01:47:29 +0000 (12:47 +1100)]
roll: change handling of distinction between rename to existing and non-existing name

Having DIROP_REN_TARGET_OLD and DIROP_REN_TARGET_NEW is not needed as
the miniblock also holds the inode number from the target which can be
zero or non-zero.
So remove that distinction and just use the inode number.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Fri, 4 Mar 2011 01:47:29 +0000 (12:47 +1100)]
README update

After a break of 4.5 months.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll: handle cluster headers larger than one page.
NeilBrown [Fri, 4 Mar 2011 01:47:26 +0000 (12:47 +1100)]
roll: handle cluster headers larger than one page.

If roll-forward finds cluster headers larger than one page, it now
tries to allocate that much space and if the allocation succeeds the
roll-forward will now succeed, rather than always failing.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoChange 'kvm' to 'qemu-kvm'.
NeilBrown [Sun, 27 Feb 2011 22:28:32 +0000 (09:28 +1100)]
Change 'kvm' to 'qemu-kvm'.

openSUSE seems to use a different name to Debian.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agomore error tests in lafs_mount
NeilBrown [Mon, 18 Oct 2010 01:00:41 +0000 (12:00 +1100)]
more error tests in lafs_mount

Nothing particularly interesting, just catching all possible errors
and handling gracefully.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix lafs_seg_next to walk write-cluster properly.
NeilBrown [Mon, 18 Oct 2010 00:48:22 +0000 (11:48 +1100)]
Fix lafs_seg_next to walk write-cluster properly.

When we have a 2D layout, we want to walk down columns before
across rows, as that encourages adjacent blocks.

So fix lafs_seg_next to do the right thing.

Also check for exceeding the declared size of the segment and
react accordingly.

Signed-off-by: NeilBrown <neilb@suse.de>
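
As a rough illustration of "columns before rows", here is a user-space sketch of stepping through a segment viewed as a grid. Everything in it is hypothetical (`segpos_sketch`, `segpos_next`, the field names, and the idea that the declared size is a simple block count); the actual lafs_seg_next and segpos structures are not shown in this log and may differ substantially.

```c
#include <stdint.h>

/* Hypothetical position within a segment viewed as a rows x cols grid. */
struct segpos_sketch {
	uint32_t row, col;     /* current position */
	uint32_t rows, cols;   /* grid shape */
	uint32_t nblocks;      /* declared size of the segment, in blocks */
	uint32_t index;        /* how many steps have been taken so far */
};

/* Advance to the next block, walking down a column before moving
 * across to the next one.  Returns 1 on success, 0 when either the
 * grid or the declared segment size is exhausted. */
int segpos_next(struct segpos_sketch *p)
{
	if (p->index + 1 >= p->nblocks)
		return 0;               /* would exceed the declared size */

	if (p->row + 1 < p->rows) {
		p->row++;               /* keep going down this column */
	} else if (p->col + 1 < p->cols) {
		p->row = 0;             /* top of the next column */
		p->col++;
	} else {
		return 0;               /* end of the grid */
	}
	p->index++;
	return 1;
}
```

For a 2x3 grid with a declared size of 5 blocks, the walk visits (0,0), (1,0), (0,1), (1,1), (0,2) and then stops.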
13 years agorollforward: use segpos functions to walk the addresses in the write cluster.
NeilBrown [Mon, 18 Oct 2010 00:32:02 +0000 (11:32 +1100)]
rollforward: use segpos functions to walk the addresses in the write cluster.

This ensures we get the right addresses in a 2-D striped layout.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agorollforward: don't get upset about lafs_add_block_address returning 0.
NeilBrown [Sun, 17 Oct 2010 23:49:58 +0000 (10:49 +1100)]
rollforward: don't get upset about lafs_add_block_address returning 0.

This is (now) quite acceptable, we simply retry.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agorollforward: use writeback during add_block_address
NeilBrown [Sun, 17 Oct 2010 23:46:18 +0000 (10:46 +1100)]
rollforward: use writeback during add_block_address

For consistency with other callers of lafs_add_block_address,
hold writeback rather than iolock over this call.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agorollforward: finish DescHole handling.
NeilBrown [Sun, 17 Oct 2010 23:43:33 +0000 (10:43 +1100)]
rollforward: finish DescHole handling.

This should cause DescHole to be handled correctly.  But as we don't
generate it yet, it is hard to be sure.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoHandle new inodes during roll-forward.
NeilBrown [Sun, 17 Oct 2010 23:32:33 +0000 (10:32 +1100)]
Handle new inodes during roll-forward.

If we find an update for a new inode, we need to allow that,
and also clear the relevant bit in the inode map.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agolafs_get_block: be consistent in interpreting return value.
NeilBrown [Sun, 17 Oct 2010 22:52:40 +0000 (09:52 +1100)]
lafs_get_block: be consistent in interpreting return value.

This returns NULL on failure, not an ERR_PTR, so check for that,
not for IS_ERR.

Signed-off-by: NeilBrown <neilb@suse.de>
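
The reason this matters is that an IS_ERR-style test does not treat NULL as an error. The sketch below is a user-space imitation of the kernel's ERR_PTR/IS_ERR convention (error codes encoded in the top 4095 pointer values); the `_sketch` names, `get_block_sketch` and `use_block` are hypothetical stand-ins, not the LaFS functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value as a pointer, as ERR_PTR does. */
void *err_ptr_sketch(long err)
{
	return (void *)(intptr_t)err;
}

/* True only for pointers in the top MAX_ERRNO_SKETCH values of the
 * address space.  NULL is NOT in that range. */
int is_err_sketch(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

static int block_storage;

/* Stand-in for a function that reports failure by returning NULL. */
void *get_block_sketch(int fail)
{
	return fail ? NULL : &block_storage;
}

/* Correct caller: tests for NULL.  Testing is_err_sketch() here would
 * let the NULL through and lead to a NULL dereference later. */
int use_block(int fail)
{
	void *b = get_block_sketch(fail);

	if (!b)
		return -ENOMEM;
	return 0;
}
```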
13 years agoroll-forward: don't iput any nlink==0 inode before roll-forward finishes.
NeilBrown [Fri, 15 Oct 2010 04:05:37 +0000 (15:05 +1100)]
roll-forward: don't iput any nlink==0 inode before roll-forward finishes.

Such inodes might still need to participate in roll-forward, and might
yet get linked into the directory tree.  So don't risk them being
deleted yet.
So any inode with nlink==0 is put on the list to be dealt with at the
end.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoRoll-forward: change handling of 'desc' type.
NeilBrown [Fri, 15 Oct 2010 02:50:37 +0000 (13:50 +1100)]
Roll-forward: change handling of 'desc' type.

Move handling of DescHole/DescIndex and flg out of
roll_mini/roll_block and into roll_one where they are easier to
handle.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll-forwards : review and minor fixes / cleanups.
NeilBrown [Mon, 11 Oct 2010 01:47:26 +0000 (12:47 +1100)]
roll-forwards : review and minor fixes / cleanups.

Now have a nice list of issues to address.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agocluster.c - various tidy-ups.
Neil Brown [Mon, 11 Oct 2010 07:12:15 +0000 (18:12 +1100)]
cluster.c - various tidy-ups.

factor some common code,
remove some 'inline' definitions
general cleanup

13 years agoREADME update
NeilBrown [Sun, 10 Oct 2010 23:57:41 +0000 (10:57 +1100)]
README update

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoWhile truncating, hold ref on superblock as well as inode.
NeilBrown [Sun, 10 Oct 2010 23:47:33 +0000 (10:47 +1100)]
While truncating, hold ref on superblock as well as inode.

An unmount of a subset filesystem could happen while still truncating
a file on the filesystem.  So hold a ref to the superblock so that it
doesn't go away.
When deleting, we cannot grab the inode, so we need to modify
lafs_iput_fs to not do an iput in that case.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUse igrab_fs for I_Pinned handling.
NeilBrown [Sun, 10 Oct 2010 23:05:12 +0000 (10:05 +1100)]
Use igrab_fs for I_Pinned handling.

We hold references on inodes when the InoIdx block is pinned.
This is needed for cleaning to make sure the inode doesn't disappear.

But we also need the superblock to be held in this context.
So use lafs_igrab_fs

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUse igrab_fs to hold refcounts on the inodes of orphans.
NeilBrown [Sun, 10 Oct 2010 22:58:36 +0000 (09:58 +1100)]
Use igrab_fs to hold refcounts on the inodes of orphans.

We currently hold a refcount on the inodes of dir orphans.
We need the filesystem (super_block) as well.

Also, while we don't really need a similar refcount for inode orphans,
it doesn't hurt.

So simplify the tracking of whether we need to take such a refcount,
use iget_fs to grab the super_block as well, and also take a ref in
lafs_add_orphans, which was missing.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix lafs_iget_fs for subset filesystems.
NeilBrown [Sun, 3 Oct 2010 09:33:16 +0000 (20:33 +1100)]
Fix lafs_iget_fs for subset filesystems.

This requires splitting code out from the s_get function so
we can just get a super_block given the parent inode.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoReturn ref on sb as well as ino from lafs_iget_fs
NeilBrown [Sun, 3 Oct 2010 09:04:27 +0000 (20:04 +1100)]
Return ref on sb as well as ino from lafs_iget_fs

As the sb might not be mounted, we need to hold a reference.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoroll forward clean up
NeilBrown [Sat, 2 Oct 2010 11:56:29 +0000 (21:56 +1000)]
roll forward clean up

More validation
improved mem allocation
general clean up

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME/comment update
NeilBrown [Sat, 2 Oct 2010 11:23:41 +0000 (21:23 +1000)]
README/comment update

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoBe more robust in face of read errors on orphan list.
NeilBrown [Sat, 2 Oct 2010 11:15:39 +0000 (21:15 +1000)]
Be more robust in face of read errors on orphan list.

This isn't *very* robust, but we shouldn't BUG now.  Worst case
is we lose some orphans and some orphan slots.  fsck will have
to deal with that.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoorphan: replace some pointless tests with BUGs.
NeilBrown [Sat, 2 Oct 2010 06:56:55 +0000 (16:56 +1000)]
orphan: replace some pointless tests with BUGs.

All orphan file blocks always have a reference and are Valid,
so lots of testing is not needed.

Also, make sure this really is true when reading the orphan file
at mount time.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoCombine dirty_iblock with setting Realloc on iblock.
NeilBrown [Sat, 2 Oct 2010 05:13:25 +0000 (15:13 +1000)]
Combine dirty_iblock with setting Realloc on iblock.

This makes it easier to do the right thing on iblocks that
we have just split ... not that we would expect that when cleaning,
but lots of things are possible, and elegant code is good.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoDon't use I_ICredit for UnincCredit when cleaning.
NeilBrown [Sat, 2 Oct 2010 04:31:21 +0000 (14:31 +1000)]
Don't use I_ICredit for UnincCredit when cleaning.

ICredit is only to be used when dirtying a block, so any setting of
UnincCredit for cleaning must get the credit from elsewhere.  If no
such credit is available, fall back on dirtying the block.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMinor comments etc
NeilBrown [Sat, 2 Oct 2010 02:51:50 +0000 (12:51 +1000)]
Minor comments etc

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUse wait_on_bit / wake_bit to get more wait_queues
NeilBrown [Sat, 2 Oct 2010 02:41:00 +0000 (12:41 +1000)]
Use wait_on_bit / wake_bit to get more wait_queues

... rather than having just one wait queue for all IO.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update and comment fix.
NeilBrown [Sat, 2 Oct 2010 01:52:16 +0000 (11:52 +1000)]
README update and comment fix.

Indeed, there is nothing we can do about errors during truncate,
except ignore them.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agouse little-endian bit operations for inode usage map.
NeilBrown [Sat, 2 Oct 2010 01:46:04 +0000 (11:46 +1000)]
use little-endian bit operations for inode usage map.

Must not use host-endian here, so use generic 'le' operations.
Also protect all operations with i_mutex.  Even if the bitops
were atomic, we need the locking when punching a hole in the file,
or adding a new block.

Signed-off-by: NeilBrown <neilb@suse.de>
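
To make the endianness point concrete, here is a user-space sketch of little-endian bitmap operations on a byte array. The function names are hypothetical; the kernel's generic 'le' bit operations are the real interface. The property illustrated is that bit N always lives in byte N/8 at bit position N%8, regardless of the host's word size or byte order, so the on-disk map reads the same on every machine. As the commit notes, the operations still need external locking; nothing here is atomic.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit nr lives in byte nr/8, at bit position nr%8, on every host. */
int le_test_bit_sketch(size_t nr, const uint8_t *map)
{
	return (map[nr >> 3] >> (nr & 7)) & 1;
}

void le_set_bit_sketch(size_t nr, uint8_t *map)
{
	map[nr >> 3] |= (uint8_t)(1u << (nr & 7));
}

void le_clear_bit_sketch(size_t nr, uint8_t *map)
{
	map[nr >> 3] &= (uint8_t)~(1u << (nr & 7));
}
```

A host-endian implementation that operates on native words would place bit 0 in a different byte on a big-endian machine, which is what makes it unsuitable for an on-disk format.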
13 years agoAdd proper locking to inode_handle_orphan
NeilBrown [Sat, 2 Oct 2010 00:59:32 +0000 (10:59 +1000)]
Add proper locking to inode_handle_orphan

When walking the indexblock looking for things to purge
we need to hold the inode private_lock.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoImprove handling of snapshot name.
NeilBrown [Sat, 2 Oct 2010 00:42:34 +0000 (10:42 +1000)]
Improve handling of snapshot name.

Name stored in fileset inode is now variable length and empty
on subordinate filesets.  Snapshots have space for a name depending
on how much space was allocated when fs was created.

Name is only used for snapshots.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix calculation of table_size
NeilBrown [Fri, 1 Oct 2010 12:40:13 +0000 (22:40 +1000)]
Fix calculation of table_size

I was confused about which table I was sizing.
This is the table of which there are several in the segusage files.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoMinor formatting improvements.
NeilBrown [Fri, 1 Oct 2010 12:38:48 +0000 (22:38 +1000)]
Minor formatting improvements.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoFix bug in sort_block
NeilBrown [Fri, 24 Sep 2010 01:43:42 +0000 (11:43 +1000)]
Fix bug in sort_block

It doesn't handle 2 blocks the same - which isn't a big deal, but it
is best to have the code 'right'

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUpdate filesys mtime
NeilBrown [Sun, 19 Sep 2010 12:09:00 +0000 (22:09 +1000)]
Update filesys mtime

Do this when inode is dirtied.  Maybe this isn't the perfect time,
but it is fairly good for now.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agouse i_mtime for file-set update time.
NeilBrown [Sun, 19 Sep 2010 11:59:35 +0000 (21:59 +1000)]
use i_mtime for file-set update time.

No need to have a separate field in the inode.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoProcess orphan file during mount.
NeilBrown [Sun, 19 Sep 2010 11:46:46 +0000 (21:46 +1000)]
Process orphan file during mount.

We need to read the orphan file to set nextfree, and then to
add all the blocks to the orphan list.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoREADME update
NeilBrown [Sun, 19 Sep 2010 04:42:24 +0000 (14:42 +1000)]
README update

13 years agoroll-forward: update youth block for new segments.
NeilBrown [Sun, 19 Sep 2010 04:39:31 +0000 (14:39 +1000)]
roll-forward: update youth block for new segments.

Any new segments found during roll-forward need their youth value set.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoUpdate youth during seg_apply_all
NeilBrown [Sun, 19 Sep 2010 04:23:18 +0000 (14:23 +1000)]
Update youth during seg_apply_all

If we added blocks to a segment in seg_apply_all, make sure the youth
has been updated properly.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agosplit out set_youth.
NeilBrown [Sun, 19 Sep 2010 04:16:00 +0000 (14:16 +1000)]
split out set_youth.

Setting of the youth value is now a separate function.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoGeneralise segunused to seg_pop
NeilBrown [Sun, 19 Sep 2010 04:05:41 +0000 (14:05 +1000)]
Generalise segunused to seg_pop

And use seg_pop more broadly,
and re-arrange some code.

Signed-off-by: NeilBrown <neilb@suse.de>
13 years agoIntroduce DelayYouth
NeilBrown [Sat, 18 Sep 2010 13:00:48 +0000 (23:00 +1000)]
Introduce DelayYouth

When writing out the accounting blocks we need to not update the youth
block if we happen to start a new segment.
We already do that at unmount time, so generalise it with a new flag.

Signed-Off-By: NeilBrown <neilb@suse.de>