I thought I understood pinning...

When a block is 'Pinned', Phase1 is meaningful and the block is pinned
This means that it must be considered for writeout in that phase.
(If it isn't dirty, it doesn't need to be written out.)
A block must have a parent to be pinned.
A block must have a parent to be a parent.

Once a block gets a parent, it keeps the parent until
it is no longer pinned or dirty (Hole, Realloc) and has no references.

So a parent link hangs around for a while.
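The parent-link rules above can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: struct blk, the F_* flag names, pin() and may_drop_parent() are hypothetical and not taken from the real code, and treating Hole and Realloc as dirty-like states is one reading of the parenthesis above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, named after the states the notes mention. */
enum {
	F_PINNED  = 1 << 0,
	F_DIRTY   = 1 << 1,
	F_REALLOC = 1 << 2,
	F_HOLE    = 1 << 3,
};

struct blk {
	struct blk *parent;	/* NULL while the block has no parent link */
	unsigned flags;
	int refcnt;
};

/* A block must have a parent before it can be pinned.
 * (How the root of the tree is handled is outside this sketch.) */
void pin(struct blk *b)
{
	assert(b->parent != NULL);
	b->flags |= F_PINNED;
}

/* The parent link stays until the block is neither pinned nor in a
 * dirty-like state (assumption: Dirty, Realloc, Hole) and nobody
 * holds a reference to it. */
bool may_drop_parent(const struct blk *b)
{
	unsigned busy = F_PINNED | F_DIRTY | F_REALLOC | F_HOLE;

	return (b->flags & busy) == 0 && b->refcnt == 0;
}
```

Keeping the test in one predicate means whatever plays the role of 'refile' has a single place to ask whether the link can go.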
Once a block is pinned it remains pinned while it is:
it has pinned children
it has unincorporated changes (which means it is potentially dirty)
** it has dirty (unpinned) children **

it has PinPending set (which is only for a short time before possibly
Purposes of pinning and related things:

1/ Keep the blocks in memory so all indexing information is available
without reading from storage.
This is achieved by the parent link and a refcount or dirty bit on leafs.

2/ Tracking which blocks need to be written at a checkpoint. Given the parent
linkage for all possible blocks, this can easily be moved around by setting
bits and list manipulations.

3/ Reserving space to write out all the blocks with changes.
This is achieved by setting various Credit flags up to the root.
When a pinned block has a phase change, we need to decide whether to pin
it to the next phase. This happens if it has pinned children or unincorporated
changes in the new phase. For data blocks, it doesn't happen.

We also need to be concerned about allocated space. Primarily we move
the NCredits to regular Credits. We then need to allocate new NCredits.
If we cannot, then we need to steal NCredits from a child. Any children will
either be locked to this phase, or will have NCredits. We can steal the credits
and lock. If all children have insufficient NCredits, they must be locked
to this phase, and we lock the current block too.
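The credit shuffle at a phase change could look roughly like the following. It is a sketch only: struct blk, its fields, the fixed-size child array, the 'pool' counter standing in for free space, and the name phase_change_credits() are all hypothetical, and 'locked' is my reading of "locked to this phase".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

struct blk {
	bool credit;		/* space reserved for a write in this phase */
	bool ncredit;		/* space reserved for a write in the next phase */
	bool locked;		/* locked to this phase */
	struct blk *child[MAX_CHILDREN];
	int nchild;
};

/*
 * Hypothetical phase-change step for one pinned block.  *pool is the
 * number of credits still available from free space.
 * Returns true if the block ends up holding an NCredit.
 */
bool phase_change_credits(struct blk *b, int *pool)
{
	int i;

	assert(b != NULL && pool != NULL);

	/* The next-phase credit becomes the current-phase credit. */
	if (b->ncredit) {
		b->credit = true;
		b->ncredit = false;
	}

	/* Preferred: allocate a fresh NCredit. */
	if (*pool > 0) {
		(*pool)--;
		b->ncredit = true;
		return true;
	}

	/* Otherwise steal one from a child, and lock that child. */
	for (i = 0; i < b->nchild; i++) {
		struct blk *c = b->child[i];

		if (!c->locked && c->ncredit) {
			c->ncredit = false;
			c->locked = true;
			b->ncredit = true;
			return true;
		}
	}

	/* No child had an NCredit to give, so by the rule above they
	 * are all locked already: lock this block too. */
	b->locked = true;
	return false;
}
```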
set_parent sets the ->parent link all the way up
pin_block sets the pinned bit all the way up
refile removes pinned or ->parent as needed
reserve sets the Credits
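A toy version of three of these helpers shows the "all the way up" walks. The function names come from the list above, but the bodies, struct blk, the 'owner' field (the block that would index this one), the F_* flags and the 'pool' counter are assumptions made for illustration. refile is left out because its rules are not spelled out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { F_PINNED = 1 << 0, F_CREDIT = 1 << 1 };

struct blk {
	struct blk *owner;	/* hypothetical: block that indexes this one */
	struct blk *parent;	/* live ->parent link, NULL until set */
	unsigned flags;
};

/* Link b to its ancestors, stopping where a link already exists
 * (an already-linked ancestor is assumed to be linked to the root). */
void set_parent(struct blk *b)
{
	while (b->owner != NULL && b->parent == NULL) {
		b->parent = b->owner;
		b = b->parent;
	}
}

/* Pin b and every ancestor that is not pinned yet.  Stopping at the
 * first pinned ancestor assumes its own ancestors are pinned too. */
void pin_block(struct blk *b)
{
	assert(b->parent != NULL);	/* must have a parent to be pinned */
	for (; b != NULL && !(b->flags & F_PINNED); b = b->parent)
		b->flags |= F_PINNED;
}

/* Take one credit for every uncredited block from b up to the root,
 * or take nothing at all if the pool cannot cover the whole path. */
bool reserve(struct blk *b, int *pool)
{
	struct blk *p;
	int need = 0;

	for (p = b; p != NULL; p = p->parent)
		if (!(p->flags & F_CREDIT))
			need++;
	if (need > *pool)
		return false;
	for (p = b; p != NULL; p = p->parent)
		p->flags |= F_CREDIT;
	*pool -= need;
	return true;
}
```

Counting first and committing second keeps a failed reserve from leaving credits on only part of the path.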
Inodes and their data/index blocks.

Normally only one of the index and data blocks for an inode should be pinned

If the index block exists, it might be pinned to ensure an update.
If the index block doesn't exist, the data block can be pinned.
When we create an iblock, if the dblock is pinned we must move the

The exception to the above is during a checkpoint.
When the index block has dirty/realloc cleared, we transfer the pinning
and credits to the data block. It is then promptly written out and
There are several different 'dirty' states for an inode.
1/ I_Dirty is set if metadata has been changed but has not yet been
copied into the datablock. We usually try to avoid this and
copy directly into the inode, but phase state might make this impossible.
2/ Metadata might be changed which needs to be written out by sync_inode.
This is typically mode/owner stuff. We want to write this out even
if not writing the full inode, so we write a log entry.
If this is needed, we set B_Dirty on the data block for the inode.
3/ Indexing information is dirty when a child gets incorporated.
This only gets written due to tree flushing when all children are
clean. For this we use the B_Dirty flag on the index block.
4/ The file lives in the inode. This is like indexing information,
and we use B_Dirty on the index block when we copy the file data
from the file data block 0.

So dirty_inode sets I_Dirty or copies into the datablock, clears I_Dirty
and sets datablock->B_Dirty and indexblock->B_Dirty.
write_inode (called by e.g. fsync) just writes a log entry if the datablock
is dirty, and 'cleans' the datablock. Alternatively if the index block
is not pinned (or has no pinned children) it can be flushed.
incorporation will set B_Dirty on the index block (only).
flush and checkpoint will write the block (after incorporation) and
will clear both B_Dirty flags.
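The flag transitions just described can be modelled with three booleans. This is a toy model, not the real functions: struct ino, the can_copy parameter, the two counters and the name flush_inode() are hypothetical, and the "index block can be flushed" alternative of write_inode is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ino {
	bool i_dirty;		/* I_Dirty: change not yet copied to the data block */
	bool dblock_dirty;	/* B_Dirty on the inode's data block */
	bool iblock_dirty;	/* B_Dirty on the inode's index block */
	int log_entries;	/* hypothetical counter: log entries written */
	int block_writes;	/* hypothetical counter: full block writes */
};

/* can_copy is false when phase state prevents copying right now. */
void dirty_inode(struct ino *i, bool can_copy)
{
	assert(i != NULL);
	if (!can_copy) {
		i->i_dirty = true;
		return;
	}
	i->i_dirty = false;
	i->dblock_dirty = true;
	i->iblock_dirty = true;
}

/* e.g. fsync: only a log entry, and only if the data block is dirty. */
void write_inode(struct ino *i)
{
	assert(i != NULL);
	if (i->dblock_dirty) {
		i->log_entries++;
		i->dblock_dirty = false;
	}
}

/* Incorporating a child address dirties the index block only. */
void incorporate(struct ino *i)
{
	assert(i != NULL);
	i->iblock_dirty = true;
}

/* Flush or checkpoint: write the block and clear both B_Dirty flags. */
void flush_inode(struct ino *i)
{
	assert(i != NULL);
	i->block_writes++;
	i->dblock_dirty = false;
	i->iblock_dirty = false;
}
```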

An index block gets pinned when a child gets allocated, i.e. there is
an address to be incorporated. It stays pinned until everything is
incorporated and it is written out.

When we write to a data block, we set up the parent relationship
and reserve space to the top of the tree. We don't pin anything though.
When we allocate the block to a cluster, we pin the block and hence
the parent to the current phase.
Metadata updates associated with writing data, such as size and mtime,
get applied lazily... somehow.
The 'struct inode' is updated promptly of course.
The I_Dirty bit is set so a flush will trigger a writeback.
We don't journal these changes. Roll-forward sets them based on data found.

Space allocations should be dropped when ->parent is dropped.

If we have a number of dirty blocks beneath an index block, the index block
will have credits for 'this' and 'next' phase.
When some (but not all) blocks are written, the index block will be pinned
and the 'this' credits used.
It will then be unpinned rather than phase-flipped, so the 'next' credits

When the index block is unpinned, we need to allocate new credits.
If we cannot, we must pin all descendant data blocks to this phase.
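One possible shape for the unpin step, as a sketch under several assumptions: struct blk and its fields, the 'pool' counter, and the names unpin_index() and pin_dirty_descendants() are hypothetical. Only dirty data blocks are pinned here, which is a guess at what "all descendant data blocks" is meant to cover, and the index block is re-pinned afterwards because a block stays pinned while it has pinned children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

struct blk {
	bool is_data;		/* data block (leaf) rather than index block */
	bool dirty;
	bool pinned;
	int pin_phase;		/* phase the block is pinned to, when pinned */
	bool credit;		/* credit for the current phase */
	struct blk *child[MAX_CHILDREN];
	int nchild;
};

/* Pin every dirty data block below b to 'phase'; return how many. */
int pin_dirty_descendants(struct blk *b, int phase)
{
	int i, n = 0;

	for (i = 0; i < b->nchild; i++) {
		struct blk *c = b->child[i];

		if (!c->is_data) {
			n += pin_dirty_descendants(c, phase);
		} else if (c->dirty) {
			c->pinned = true;
			c->pin_phase = phase;
			n++;
		}
	}
	return n;
}

/* Returns true if a replacement credit could be allocated. */
bool unpin_index(struct blk *idx, int phase, int *pool)
{
	assert(idx != NULL && !idx->is_data);

	idx->pinned = false;
	idx->credit = false;	/* the old credit was used by the write */
	if (*pool > 0) {
		(*pool)--;
		idx->credit = true;
		return true;
	}
	/* No space: force the remaining dirty data into this phase.
	 * Pinned children keep the index block pinned as well. */
	if (pin_dirty_descendants(idx, phase) > 0) {
		idx->pinned = true;
		idx->pin_phase = phase;
	}
	return false;
}
```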

------------------------------------------------
refcounts: how much does refile care about them?

1/ if > 0, remove from freelist or ....
2/ if 0 for inode data, maybe destroy inode
3/ drop parent link when refcount hits zero - maybe
4/ move to freelist when refcount hits zero - maybe

That is all. Doesn't seem worth it...
We can remove from the free list lazily, and

More interesting is the pincnt.
Maybe we want 'pin' and 'unpin' functions which do the
deed and propagate the effect.
But some changes of other flags affect pinning. So we still need a 'refile'

Punching holes should be done separately. Why did we have the refcount thing?

Inodes and refcounts to blocks.
If the inode holds a refcount on the blocks, they never go away. Sad.
->dblock and ->iblock should not be refcounted... or maybe if ->iblock
is present, that puts a refcount on dblock. If iblock disappears, we
drop the dblock refcount.... That sounds sensiblish.
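The "iblock holds a reference on dblock" idea is small enough to sketch directly. The struct layouts and the names attach_iblock() and detach_iblock() are hypothetical; only ->dblock and ->iblock come from the notes.

```c
#include <assert.h>
#include <stddef.h>

struct blk {
	int refcnt;
};

struct ino {
	struct blk *dblock;	/* not refcounted by the inode itself */
	struct blk *iblock;	/* while set, holds one reference on dblock */
};

/* Creating the index block takes a reference on the data block. */
void attach_iblock(struct ino *i, struct blk *ib)
{
	assert(i->dblock != NULL && i->iblock == NULL);
	i->iblock = ib;
	i->dblock->refcnt++;
}

/* When the index block disappears, that reference is dropped again. */
void detach_iblock(struct ino *i)
{
	assert(i->iblock != NULL);
	i->iblock = NULL;
	i->dblock->refcnt--;
}
```

With this arrangement the inode alone never keeps either block alive; the data block lives at least as long as the index block does.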

Only ever on leafs, or internal thing, or free.
And we know where it is (Alloc -> internal, Pinned -> leafs, nothing -> free).
So can unlink from free using global spinlock