2 I want to use "find next block" rather than "find previous block".
3 It has the advantage that we could up until we hit the block number.
4 But I want to start in '0'.
6 Entry with hash X is stored in first block at or after X+1.
7 If there is no such block, it is stored in '0'.
8 Thus when we walk for readdir, we start at block 1, and finish at
14 POSIX (and NFS) requires that we can find an entry by name or by a
17 So, we treat the directory file like a hash table with internal
18 chaining. The chain function is simple sequencing. i.e. if the
19 hash key we want is in use, increment and try again.
21 We store a random secret in the directory inode so that knowledge of
22 the hash values cannot be used to generate a large number of colliding
23 entries (except for testing).
25 The hash for a name is then generated from the secret and the name,
26 probably using 'tea' or whatever ext3 does.
28 The directory file is sparse. The entry with a given key will be
29 stored in the allocated block (non-hole) with the highest index that
30 is not greater than the key.
32 A block can contain multiple entries. They are arranged in an AVL
33 tree. Each entry is aligned on 1/256 of the block size and contains:
35 fore/back pointers (1 byte each)
37 hash-chain info (2bits +).
38 AVL 'longer' flags (2 bits).
43 The hash value for an entry is computed by calculating the hash for
44 the name and adding some number from the hash-chain info:
45 If this is 0-1 - that number is added
46 If 2 - a single byte follows which is added
47 If 3 - 4 bytes follow to be added.
49 An entry may be kept for a deleted name to ensure that internal
50 chains are not broken. In this case the inode will be 0 and possibly
51 the len will be zero and the hash offset will be 4 bytes.
53 Each directory block is divided into 256 'pieces'. An entry starts
54 at the start of a piece, This one byte can index an entry. The first
55 piece is used for a header, so indexes are always non-zero.
57 - head of AVL tree (one byte)
58 - last piece in use (one byte)
59 - number of pieces used by removed entries (one byte)
63 We use a 31bit hash. This still allows for 2billion entries,
64 and means we can add '.' and '..' with indexes without pushing
68 To find a name is a directory we calculate the hash and probe for
70 To "probe" means to find the last allocated block with an index not
71 greater than the value, then search within that block.
72 If the value isn't found, the name does not exist.
73 If the value is found and the entry has the right name, the name
75 Otherwise we increment that hash value and probe again.
78 Readdir walks the blocks of the file in order, and returns
79 (non-deleted) entries in hash order. '.' is 0, '..' is 1, others
83 To add an entry, repeat the probe pattern of lookup, stopping when
84 a missing hash or a deleted entry is found. Use that hash number,
85 removing the 'deleted' entry if found.
86 If there is room in the block, add the new entry in the same block.
87 If not, we try repacking the directory block first. If there
88 is still not enough space, move half the entries into a new block
89 and write it at an appropriate index. This will involve copying all
90 entries which may free up quite a bit of space.
93 To remove any entry we first find it, then find the entry with a
94 hash one less. If this entry does not exist, we can remove the
95 target entry, otherwise we simply mark it as deleted.
96 If 'one less' is in a separate block, we do not go looking for it,
97 but instead schedule the block for orphan handling.
98 If the entry is removed, we look up the next index. If it is marked
99 as deleted, we remove it and try again. If the 'next' index is
100 in a separate block, we similarly schedule orphan handling.
103 If a block is processed as an orphan, there are two tasks required:
105 1/ lookup the index one less than the block number. If that
106 does not exist, then remove any leading 'deleted' entries in
108 2/ load the 'next' block and if the leading entry is 'deleted',
109 schedule orphan processing on it.
111 ----------------------------------------------------
114 Any update is committed by a transaction that records the operation.
115 Each transaction has a name, an inode number, a type, and optionally
116 a blocknumber (well, it is always there but often ignored).
119 LINK - create the name as a new link to the inode
120 UNLINK - remove the name. If it is a directory, take care of
121 the linkcount in the parent as well.
122 REN_SOURCE - The name is about to be moved. Record the name
123 and record the bnum to track 'this' rename
124 REN_NEW_TARGET - This name didn't exist before but is collecting
125 a previos REN_SOURCE. Effect the rename at this point.
126 REN_OLD_TARGET - remove this name, and replace it with the matching
127 REN_SOURCE which gets unlinked.
131 While a directory update is happening, the directory must be locked
132 against other updates (sorry).
134 Perform a lookup to find where changes might happen,
135 preallocate all memory and reserve space in the filesystem.
137 If that works for all updates we:
138 lock against a checkpoint
140 complete the require commit steps
141 including registering orphans
142 unlock the checkpoint
143 schedule the orphan handling
145 If that doesn't work we
146 return any free blocks
148 Then unlock the directory.