2 Credits allow us to be sure that we can write out all dirty blocks
3 without exhausting the log.
5 Before a block can be marked as 'Dirty' it must hold two credits, one
6 for itself and one to assist with writing index blocks (which may
8 Further, every ancestor must also have the same two credits.
10 If a data block is not Pinned to the current phase, then it and
11 every ancestor must also have a pair of credits for the next phase.
12 If these credits are hard to come by, we can still Dirty a data block
13 by pinning it. This requires a checkpoint lock.
15 Each block has 5 credit flags.
16 2 for this phase (Credit and ICredit)
17 2 for next phase (NCredit and NICredit)
18 1 to hold the ICredit between allocation and incorporation.
20 When we might want to dirty a data block (e.g. PinPending or just
21 before actually dirtying it), we allocate the needed credits. If
22 credits aren't available (i.e. the filesystem is nearly full, or needs
23 cleaning) that might block (if the update is expected to free up space
24 and we are eating into the first reserve) or it might fail (e.g. if we
25 are allocating new space).
27 When we release a data block, we free all the credits. Keeping proper
28 track of all credits is important.
30 When we write a block, the Credit is used to pay for that writeout.
31 The ICredit is moved to UnincCredit until incorporation.
32 If the block is dirtied again, before incorporation, two new credits
33 will be needed. If another writeout happens while UnincCredit is
34 still set, the ICredit is simply returned.
36 When incorporation happens, if no splitting is needed, the UnincCredits
37 from all new blocks are simply returned.
38 If splitting is needed, the UnincCredits are used to fill the Credit
39 requirements of any new block. However there might not be enough.
41 In the worst case, a split is required with just one UnincCredit.
42 This can provide the required Credit, but no ICredit.
43 In this case we need to steal Next-phase credits from lower levels in
44 the tree by pinning data blocks to the current phase....
46 If everything below is pinned to the current phase, we can take our
47 own next-phase credits.... unless we are on a phase-change.
49 Note: Old-phase index blocks are never incorporated if they have
52 Hmmm.. ICredit only needed on index blocks which do not have room for
54 Thus if there are 3 or more children, we get enough credits from them.
55 If there is only 1 or 2, we won't split so don't need ICredit...
59 Index block marked dirty when first child allocated.
60 So it needs to have Credit when not dirty/realloc.
64 1/ All incorporations work.
66 Inherited credits propagate up filling gaps.
69 2/ Incorporation cause an h-split
71 worst case: We have an index block which is full of large extents.
72 One block in the middle of one extent is relocated. This one
73 block caries an ICredit.
75 The index block has an ICredit too, so there are two ICredits.
76 One dirties the new block. One is made available to parent.
77 This pattern propagates up.
79 So: when a set of children are incorporated, at least half of them
80 must have ICredits. These are used to split a block.
81 If a double-split is needed, there must be plenty of credits.
84 3/ Incorporation requires a v-split
86 This is when the inode is full and we need to create an
88 This block is dirtied by the ICredit if the inode block.
89 Which will be filled in again when incorporation happens below.
91 Worst case is that inode is full of large extents. A single block is
92 added. The vsplit happens.
93 The ICredit from the single block becomes the ICredit for the inode.
97 If we incorporate a given index block multiple times in one checkpoint
98 it will lose all it's credits, as we only refresh them when new blocks are
99 dirtied, or on phase change..... no, that's not right.
101 When we allocate any block, we pin all the parents.
102 When we unpin a parent (or re-pin to next phase) we claim the N* credits.
103 If we cannot refill the N* credits, we pin children to the current phase.
105 1/ Steal their N* credits which may give us enough credit to continue, or
106 2/ ensure that after this block is written, there will be no
110 The prepare / pin / commit / abort pattern.
111 ==========================================
112 A number of places in the code we need to pre-allocate a number
113 of different things before we can commit to an operation.
114 Often this will require updating a block, for which we may be
115 required to wait for space to become available (via the cleaner).
116 If different resources need to wait for space, we need to do all the
117 waiting at once. For this purpose, each resource provides 4 methods,
118 prepare, pin, commit, abort
119 prepare does anything that can be done immediately and checks if
120 the operation could credibly continue.
121 pin is run with a checkpoint lock and pins relevant blocks to the
122 current phase. It tries to reserve space, which might fail
123 commit is run (still under the checkpoint lock) if all the pin
124 operations succeeded. It may not fail.
125 abort is run if any pin fails permanently. The prepare will have
126 been run, but pin may not even have been tried.
129 The other time when we wait for space is when doing an unpinned
130 file write. If space isn't available, we wait for the next checkpoint
131 and try again. Much easier.
133 We have lafs_pin_dblock, lafs_reserve_block, lafs_prealloc.
134 lafs_reserve calls prealloc and also does quota reservations.
135 pin_dblock calls reserve and also pin.
136 We really want the reserve separate from the prealloc.
137 The prealloc must happen in a phase-lock. The reserve doesn't.
139 lafs_reserve (sets ->parent and does quota stuff)
141 lafs_prealloc or lafs_pin
146 Ok, maybe I have it now... Let's be as detailed as possible.
148 Every data block has 2 bits signifying pre-allocation. Each bit
149 reflects one credit (block of space). When the block is dirtied
150 or locked, one bit is cleared to reflect the space about to be
151 used. The other is moved to 'Uninccredit'.
153 When Uninc is cleared by incorporation, Unincredit is available
154 for index splitting etc. Note that by this time the block could
155 be pre-allocated or dirty again, which means more blocks
158 Every index block has 4 bits signifying pre-allocation credits.
159 There are two for each phase. One is to write out the block. One is to
160 allow for splitting/incorporation. The second is not required
161 when the index block has room for 3 new allocations (30 bytes of
164 The primary rule is that if a datablock is locked or dirty, then
165 all index blocks up to the root must have at least preallocation
166 in the current phase, and two if there may not be room for 3 more
170 Before a datablocks can be locked or dirtied, it must have
171 preallocation, and all indexblocks above it must have
172 preallocation in both phases. If not, either fail or block.
174 Before an out-of-phase leaf index block can have incorporation,
175 either it must have no dirty child, or it and all index blocks
176 above it must be have all prealloc bits set. i.e. if their
177 is a failure setting some unset bit, then flush all children.
178 We know new children cannot become dirty until the bits are set.
180 Before any in-phase index block can have incorporation, it and
181 all parents must have both bits set. If there is a failure
182 setting any bits, we wait for a phase-change (which probably
185 When incorporation happens there is at least one block provided by
186 every two being incorporated, and each index block being
187 incorporated into has at least one block, or two if it doesn't have
188 room for 4 more indicies.
190 If all indicies fit, then the one that the block owns is used to
191 write it, and the one from the at-least-one child is made available
192 to the parent. Any others are discarded (in the out-of-phase case).
194 If a split is required the next-phase bits are shared. If there
195 aren't 2, then there are no children, and none are needed. If
196 there are, then they go one-each as both blocks will have some
199 If a split is required and there are 4 or more children, then there
200 is one block that the iblock owns, and 2 or more from the children.
201 These are used to write out the two blocks and pass one up. If
202 there are more children incorporated, their bits are given to the
203 next phase as they might then be needed, and possibly passed up as
206 If a split is required and there are 3 or fewer children, then
207 there are two blocks of space that the iblock owns, and 1 or more
208 from the children. These are enough to write out the two blocks and
209 pass 1 set bits on allset bits on allup.
211 When an inode 'splits' (vertically), we leave all bits with the
212 lower block as the inode will be written using the space allocated
213 to the data block. However when there is excess credits after an
214 incorporation, they should be given to the parent incase another
215 depth increase is needed.
217 So, what bits do we need:
218 - 1 - does Uninc carry a credit - sometimes it won't. - B_UnincCredit
219 - 2 - this-phase credits B_Credit B_ICredit
220 - 2 - next-phase credits B_NCredit B_NICredit
222 When we dirty or lock a block, the credits from the appropriate
223 phase move into the dirty/lock bit and the uninc_credit bit.
224 On writing, the dirty bit is cleared
225 On incorporation, the uninc credit bit is used.
227 The next-phase credit bits are never used for data blocks.
229 --------------------------------
231 What causes us to flush everything at a phase change when short of
234 When an index block changes phase, if we cannot allocate credits for
235 the new NextPhase, we Pin all data blocks below, so at the next
236 phase change everything gets written out.
237 --------------------------------
239 Credit and ICredit are set before block is dirty.
240 They are moved to Dirty and UnincCredit when block is dirtied
241 These are then cleared when block is written/incorporated.
242 NCredit/NICredit are only needed for Index blocks.
243 They are transferred to Credit and ICredit at phase change.
244 They then need to be refreshed. If that fails, we need to pin
245 all descendents to the next phase.
247 An InoIdx block doesn't get written out or incorporated. Only the
248 matching data block does.
249 So on phase-flip of an InoIdx, we move Dirty/UnincCredit to the
250 data block and Pin it to the old phase.
251 Once Pinned, metadata updates stay in the inode and don't flush
252 through to the data block. When unpinned, if I_Dirty is set, we
253 flush the inode to the data block