writeout.doc


Thinking through reservation for writeout ... again.

We would like to be able to deny or delay all
updates in case space is not available.
In general, 'deny' is appropriate if the block in the
file hasn't been allocated yet, and 'delay' is appropriate
if the block has been allocated, as writing will
eventually free something up.
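
A minimal sketch of the deny/delay rule in C. The names (reserve_decision, enum reserve_result) are hypothetical, and 'space is available' is reduced to a single boolean. A real filesystem would derive it from its free-space accounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of asking for space to update one file block. */
enum reserve_result {
    RESERVE_OK,     /* space is available: go ahead */
    RESERVE_DENY,   /* block never allocated: fail the update now */
    RESERVE_DELAY,  /* block already allocated: wait, writing will free space */
};

/*
 * block_allocated: this file block already has storage on disk.
 * space_available: the free-space pool can cover this update.
 */
static enum reserve_result reserve_decision(bool block_allocated,
                                            bool space_available)
{
    if (space_available)
        return RESERVE_OK;
    return block_allocated ? RESERVE_DELAY : RESERVE_DENY;
}
```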

Sometimes we need to reserve the right to write
a block quite a lot later.  We might reserve that
right several times.  Each time might happen in
a different checkpoint, and the space is used multiple
times.  Further, index blocks may need to be written
in order to write out a block.  So for each reservation
we take against a block, we really don't know
how many storage blocks might be needed.  This
is awkward.
So:
 Every write-reservation we take on a block
 preallocates 4 storage blocks, 2 each for
 this phase and the next phase: one for the block
 and one to contribute towards indexes.
 Note that this covers only index growth.  Pre-existing
 index blocks need some preallocation of their
 own.
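
The 4-block rule can be pictured as plain arithmetic on a free-space pool: each reservation moves 4 blocks (2 per phase, each pair being 1 data block plus 1 towards index growth) from 'free' to 'reserved', all or nothing. This is only a sketch; struct space_pool and both functions are hypothetical, and it ignores the separate preallocation for pre-existing index blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* 2 blocks per phase: 1 for the data block, 1 towards index growth. */
#define BLOCKS_PER_PHASE        2
/* This phase plus next phase. */
#define BLOCKS_PER_RESERVATION  (2 * BLOCKS_PER_PHASE)

/* Hypothetical free-space pool; names are illustrative only. */
struct space_pool {
    long free;      /* storage blocks not yet promised to anyone */
    long reserved;  /* storage blocks promised to write-reservations */
};

/* Take one write-reservation on a block: all 4 blocks or nothing. */
static bool take_write_reservation(struct space_pool *p)
{
    if (p->free < BLOCKS_PER_RESERVATION)
        return false;  /* caller must deny or delay */
    p->free -= BLOCKS_PER_RESERVATION;
    p->reserved += BLOCKS_PER_RESERVATION;
    return true;
}

/* Drop an unused reservation, returning its blocks to the pool. */
static void release_write_reservation(struct space_pool *p)
{
    assert(p->reserved >= BLOCKS_PER_RESERVATION);
    p->reserved -= BLOCKS_PER_RESERVATION;
    p->free += BLOCKS_PER_RESERVATION;
}
```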

 When a block passes through a phase change and is written
 out, we know there is space to write it out once more,
 but we might need to write it out multiple times.
 We force allocation and essentially block all further
 new data writes until some cleaning has happened.
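
A sketch of one reading of the phase-change rule: the write consumes this phase's pair of blocks, the next-phase pair becomes current (so one more write is already covered), and we then try to set aside a fresh next-phase pair. If that fails, a flag stops new data writes until cleaning runs. Everything here (struct block_resv, struct fs_space, writeout_at_phase_change, the writes_blocked flag) is hypothetical, and treating 'force allocation' as 'refill or block new writes' is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_PHASE 2  /* 1 data + 1 index growth (assumed) */

/* Hypothetical per-block reservation, split across the two phases. */
struct block_resv {
    int this_phase;  /* blocks set aside for a write in this phase */
    int next_phase;  /* blocks set aside for a write in the next phase */
};

struct fs_space {
    long free;            /* unpromised storage blocks */
    bool writes_blocked;  /* new data writes must wait for cleaning */
};

/* Called when a reserved block is written out across a phase change. */
static void writeout_at_phase_change(struct fs_space *fs, struct block_resv *r)
{
    assert(r->this_phase == BLOCKS_PER_PHASE);
    r->this_phase = r->next_phase;  /* spent pair replaced by the next one */
    r->next_phase = 0;

    if (fs->free >= BLOCKS_PER_PHASE) {
        fs->free -= BLOCKS_PER_PHASE;  /* top up the next-phase share */
        r->next_phase = BLOCKS_PER_PHASE;
    } else {
        fs->writes_blocked = true;     /* wait for cleaning before new writes */
    }
}

/* New data writes check this before taking any reservation. */
static bool may_start_new_write(const struct fs_space *fs)
{
    return !fs->writes_blocked;
}
```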

Writes that happen through system calls, such as directory updates and
'write', can easily be blocked until enough space is available
(except for index block space).

Writes due to dirty mmapped memory are a little harder.
Maybe:
  When a write request is made for a block that cannot
  immediately be allocated more space, we unmap it,
  thus requiring nopage to bring it back in.
  We block in the nopage operation, so when the filesystem
  is very full we might pause a lot, waiting for
  checkpoints and cleaning to happen.
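
The mmap idea can be modelled in a few lines: a write request that cannot refill the reservation unmaps the page, and the nopage path then waits for cleaning before mapping it back. This is a single-threaded simulation under stated assumptions: wait_for_cleaning() is a stand-in that reclaims blocks from a 'cleanable' counter, where a kernel would sleep until a checkpoint and the cleaner finish. None of these names are real kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_PHASE 2  /* assumed size of one refill */

/* Hypothetical space accounting for the simulation. */
struct space {
    long free;       /* unpromised storage blocks */
    long cleanable;  /* blocks cleaning could reclaim (simulation only) */
};

/* Hypothetical state of one mmapped file block. */
struct mapped_block {
    bool mapped;    /* page is mapped, so user stores go straight in */
    bool has_resv;  /* block holds a current write-reservation */
};

static bool refill(struct space *s)
{
    if (s->free < BLOCKS_PER_PHASE)
        return false;
    s->free -= BLOCKS_PER_PHASE;
    return true;
}

/* Stand-in for "wait for a checkpoint and some cleaning". */
static void wait_for_cleaning(struct space *s)
{
    long got = s->cleanable < BLOCKS_PER_PHASE ? s->cleanable : BLOCKS_PER_PHASE;
    s->cleanable -= got;
    s->free += got;
}

/* Write request for a dirty mmapped block: refresh or unmap. */
static void mmap_write_request(struct space *s, struct mapped_block *b)
{
    b->has_resv = refill(s);
    if (!b->has_resv)
        b->mapped = false;  /* next access must go through nopage */
}

/* nopage path: block (here, loop) until space can be reserved. */
static int mmap_nopage(struct space *s, struct mapped_block *b)
{
    int waits = 0;
    while (!refill(s)) {
        assert(s->cleanable > 0);  /* simulation must be able to progress */
        wait_for_cleaning(s);
        waits++;
    }
    b->has_resv = true;
    b->mapped = true;
    return waits;
}
```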


Question: how much waiting is allowed between allocating
and using?
 We might have to read from disk.
 We might kmalloc, which could mean writing some things to disk.
If we could limit this to one phase change, we could be more
comfortable about all the preallocation.
But how would we enforce that?  The second phase change
would have to wait for allocations made in the previous phase to be used.
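
If reservations were limited to one phase change, enforcement could be a per-phase count of unused reservations, with a phase change refused until the previous phase's count reaches zero. A hypothetical sketch (struct phase_tracker and its functions are invented here): since at most two phases are live at once, two counters indexed by phase % 2 suffice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracker: a reservation may span at most one phase change. */
struct phase_tracker {
    unsigned phase;       /* current phase number */
    long outstanding[2];  /* unused reservations, indexed by phase % 2 */
};

static void reserve(struct phase_tracker *t)
{
    t->outstanding[t->phase % 2]++;
}

/* A reservation taken in phase 'taken' has now been used. */
static void use_reservation(struct phase_tracker *t, unsigned taken)
{
    assert(t->phase - taken <= 1);  /* never older than one phase change */
    assert(t->outstanding[taken % 2] > 0);
    t->outstanding[taken % 2]--;
}

/* The next phase change may proceed only when every reservation from
 * the previous phase has been used; otherwise the caller must wait. */
static bool try_phase_change(struct phase_tracker *t)
{
    unsigned prev = t->phase - 1;  /* in phase 0 this wraps to slot 1 */
    if (t->outstanding[prev % 2] != 0)
        return false;
    t->phase++;  /* the new phase reuses the now-empty slot */
    return true;
}
```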

Hmmmm...
The biggest problem seems to be index blocks, which might need to be
written at every checkpoint.

I guess we do just block everything until we can allocate
enough space.

So actually doing a write either refreshes the allocation,
or unmaps the block and makes sure any future attempt
to map or write it blocks.
But some reservations might have already happened.
We need to allow them to commit.  When?
We need to know precisely what is being waited for,
and ensure that once we hit problems, things start to fail
so that we back out or commit quickly.

--------------------------------
We don't have multiple locks on a datablock. It is either locked for
writing or not.

If we want a datablock to be written in a particular checkpoint,
we:
     checkpointlock
     try to lock
     If that fails, either give up, or
         unlock,
         wait for a checkpoint,
         retry.

We give up if there are any new-block allocations.  We only retry
if all blocks we try to write have been allocated space previously.
(*)
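
The lock-in-a-checkpoint steps, as a single-threaded model in C. The boolean flags stand in for real locks, and wait_for_checkpoint() simply advances a counter and releases the block once the holder's checkpoint has passed; all names are hypothetical. It gives up when the block has no previously allocated space, and otherwise retries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one datablock: a single write lock, no sharing. */
struct dblock {
    bool locked;       /* locked for writing by someone */
    bool has_space;    /* storage was allocated to it previously */
    int unlock_at_cp;  /* checkpoint at which the current holder releases it */
};

struct fs_state {
    bool cp_locked;  /* checkpointlock held */
    int checkpoint;  /* current checkpoint number */
};

enum cp_write { CPW_LOCKED, CPW_GAVE_UP };

static void wait_for_checkpoint(struct fs_state *fs, struct dblock *b)
{
    fs->checkpoint++;
    if (b->locked && fs->checkpoint >= b->unlock_at_cp)
        b->locked = false;  /* holder's write went out with that checkpoint */
}

/* On CPW_LOCKED the caller still holds checkpointlock. */
static enum cp_write lock_in_checkpoint(struct fs_state *fs, struct dblock *b,
                                        int *retries)
{
    *retries = 0;
    for (;;) {
        fs->cp_locked = true;        /* checkpointlock */
        if (!b->locked) {            /* try to lock */
            b->locked = true;
            return CPW_LOCKED;
        }
        fs->cp_locked = false;       /* unlock */
        if (!b->has_space)           /* new-block allocation: give up */
            return CPW_GAVE_UP;
        wait_for_checkpoint(fs, b);  /* wait for a checkpoint */
        (*retries)++;                /* retry */
    }
}
```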

If we don't care when a datablock is written, we:
 Try to lock outside the checkpoint, blocking if appropriate.
 On success, we allow writes to happen, either via syscall
   or mmap.
 When a flush or whatever finally writes the block, we either
   relock or unmap the block, thus blocking future writes.
As 'locking' reserves enough space for 2 writes, we don't have to
unmap before writing unless a previous refill failed.
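
A sketch of locking outside the checkpoint: 'locking' tops the block's reservation up to 2 writes; each flush spends one and then relocks, or unmaps the block if the pool is short, so later syscall or mmap writes block. The pool, struct dblock and both functions are hypothetical, and BLOCKS_PER_WRITE = 2 (data plus index growth) is an assumption carried over from the 4-blocks-per-reservation rule.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_WRITE 2  /* assumed: 1 data + 1 index growth */

struct pool { long free; };

/* Hypothetical state of a datablock locked outside any checkpoint. */
struct dblock {
    int writes_covered;  /* writes the reservation still covers (0..2) */
    bool mapped;         /* user writes (syscall or mmap) may proceed */
    bool refill_failed;  /* last attempt to top up the reservation failed */
};

/* 'Locking' reserves space for 2 writes; false means the caller blocks. */
static bool lock_outside_checkpoint(struct pool *p, struct dblock *b)
{
    long need = (2 - b->writes_covered) * BLOCKS_PER_WRITE;
    if (p->free < need)
        return false;
    p->free -= need;
    b->writes_covered = 2;
    b->mapped = true;
    b->refill_failed = false;
    return true;
}

/* A flush writes the block: spend one write, then relock or unmap. */
static void flush_block(struct pool *p, struct dblock *b)
{
    /* At least one write is always covered; a failed refill has
     * already unmapped the block, so no further user writes arrive. */
    assert(b->writes_covered >= 1);
    b->writes_covered--;
    if (!lock_outside_checkpoint(p, b)) {
        b->refill_failed = true;
        b->mapped = false;  /* future writes block until space returns */
    }
}
```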

(*) If we take a snapshot, then updating existing blocks may not be
 possible - no amount of cleaning will free up space until a snapshot
 is dropped.  I guess that is primarily a sysadmin problem.  If space
 runs out that badly, snapshots must be dropped before progress is
 possible.