I suggested in my other article on AVL insertion that deletion would probably be fairly symetric to insertion. It isn't, though there are similarities.
Deletion is somewhat more complicated than insertion, but not greatly so. Like insertion, it can be done neatly without recursion and using constant storage (no stack necessary). The cost of constant storage is that some comparisons need to be done twice (though on average, it is less than one per deletion).
These extra comparisons can be avoided by using one bit of storage for each level of the tree, which is less than twice the base-2 logarithm of the number of elements in the tree. If the tree is stored in memory, then the storage equivalent of two pointers will provide enough bits. This is small enough to be considered constant.
I have found other descriptions of non-recursive AVL deletion elsewhere on the web, specifically at http://benpfaff.org/avl/algorithm.ps and http://www.stanford.edu/~blp/avl/libavl.html/Deleting-from-an-AVL-Tree.html.
I don't find either of these treatments particularly easy to read, but that might be my problem, not their's. If you don't like what I have written, please try one of those.
Enough introduction - on to the code.
Deletion from an AVL tree
To delete a node from an AVL tree, we obviously need to find it, and then unlink it. A first attempt might look like: int avl_delete_NOT(node *treep, value_t target) { /* delete the target from the tree, returning 1 on success * or 0 if it wasn't found */ node tree = *treep; while (tree && tree->value != target) { direction dir = (target > value); treep = &tree->next[dir]; tree = *treep; } if (!tree) return 0; if (tree->next[LEFT] == NULL) *treep = tree->next[RIGHT]; else if (tree->next[RIGHT] == NULL) *treep = tree->next[LEFT]; else MAGIC(); free(tree); return 1; }
It should not take long to see that this cannot work, except in a few cases. Firstly, it relys on MAGIC if the node to be deleted has two children, and secondly it doesn't give any consideration to preserving the balance properties of the AVL tree.
We will deal with these two problems in turn.
Deleting an interior node
It is easy to remove a node when it only has one child, as that child can replace the deleted node as the child of the grandparent. But what should we do when the target node as two children?The easiest approach is to swap the node with a node that does have at most one child. This can work providing we can find such a node that is adjacent to the target node in the sort-order. A few moments consideration will confirm that if a node has two children, then there must be a node both immediately above and below it in the sort order, and that both of these must be further down the tree than the target node, and that neither of these can have two children (as if they did, then one of the children would lie between that node and the target, which contradicts the selection of that node).
Some approaches to coding this swap and deletion consider a number of cases. There is the case where the target has a NULL pointer on the left, the case where there is a NULL on the right, the case whether the left child has a NULL on the right, and then the case there the node to be swapped is much further down the tree. I prefer to avoid such a breakdown into cases if at all possible. If we can produce code where a swap always happens, but where the two items being swapped may sometimes happen to be the same item, then we can manage to swap without any conditionals and this produces cleaner code, and given the pipelineing in modern processors and the effects that branches have on those pipelines, it may well produce faster code.
This goal is quite achieveable as we will see. The key is to describe the general case in the right way, so that all the special cases mentioned above are included in that general case.
We start by determining to search not for the target item itself, but for an hypothetical item that would lie between the target and it's immeidate predecessor in the sort order. This can be done with a minor change on the loop given in the first example:
while (tree) { direction dir = (target > value); treep = &tree->next[dir]; tree = *treep; } This will effectively find the point between the target and the element immediately below it. The node that holds this point is the one that we will swap with the target. The above code does not actually provide that node, nor that target, so we need to modify it slightly to keep these values.
direction dir; node *targetp; while(tree) { dir = (target > value); if (target == value) targetp = treep; if (tree->next[dir] == NULL) break; treep = &tree->next[dir]; tree = *treep; } if (targetp == NULL) return 0;
When this loop completes, *targetp will point to the target node, and *treep will point the last node we found on the search through the tree. dir will show which side of treep is known to be empty (NULL). Note that targetp and treep could quite possibly be the same.
Now we can unlink the bottom node (tree) and swap it in place of the target node (*targetp) with a few simple assignments. void avl_swap_del(node *targetp, node *treep, direction dir) { node targetn = *targetp; node tree = *treep; *targetp = tree; *treep = tree->next[1-dir]; tree->next[LEFT] = targetn->next[LEFT]; tree->next[RIGHT] = targetn->next[RIGHT]; tree->longer = targetn->longer; free(targetn); }
A careful consideration of this code in each of the interesting cases shows that it does the right thing. Sometimes some of the assignments have no effect (they assign a value that is already present). Sometimes they might change a value only to change it back again. But in each case the net result is to remove the target value and the leave a well-defined tree with values in the correct order. It does not however preserve the balance properties of an AVL tree. These must now be dealt with next after presenting the whole routine as it now stands.
int avl_delete_closer(node *treep, value_t target) { /* delete the target from the tree, returning 1 on success * or 0 if it wasn't found */ node tree = *treep; direction dir; node *targetp, targetn; while(tree) { dir = (target > value); if (target == value) targetp = treep; if (tree->next[dir] == NULL) break; treep = &tree->next[dir]; tree = *treep; } if (targetp == NULL) return 0; avl_swap_del(targetp, treep, dir); retun 1; }
Restoring Balance
When studying insertion, we found that tree growth below some nodes would not cause tree growth at those nodes, and so nothing further up the tree would need to be rebalanced, while other nodes were such that tree growth beneath them would propagate up as tree growth at that node itself. Considering the path from the root down to the insertion point, some prefix of this path would not need any changes, while the remainder of the path would. Of this remainder, the top node might need to undergo a rotation, and the remaining nodes would just need their balance value (longer) updated.
The case with delete has distinct similarities, yet substantial differences. Of the path down from the root to the node which is swapped with the target, some prefix does not see any tree shrinkage at all and can be left untouch. The remaining nodes need attention. There is a greater variety of attention that is needed than with insert. While insert only needed one rotation, delete may need rotation at every level.
To discover what attention is needed, and which nodes can be ignored, we need to discover the effect on several sorts of nodes of the shrinkage of a subtree. If the shrinkage of a subtree causes the node's tree to shrink, the change must propagate up. If it doesn't, then all higher nodes in the tree will not be affected.
Shrinkage beneath a balanced node
In each case we will consider shrinkage of the right subtree. Shrinkage of the left subtree has identical effects, in a mirror image. The code will be identical providing it is parameterised by dir. In the pictures we draw, dir will be assumed to be RIGHT or 1, but the code will use dir explicitly.
The balanced node case (node->longer == NEITHER) is the simplest. When a subtree shrinks, the tree stays the same height, but the balance changes. The longer subtree will be the one in the other direction. So if we see a balanced node on the way down the tree, we know we can ignore everything above it, and can correct it's balance with
node->longer = 1-dir;
Shrinkage on the longer path
The case of a node where the longer path shrinks (i.e. node->longer == dir) is nearly as simple. The balance can be corrected simply by recording that the node is now balanced:
node->longer == NEITHER;
However in this case, the tree from this node shrinks, so the parent must be considered for updates as well.
Shrinkage on the shorter path - case 1
If the shorter path from the node sees shrinkage (node->longer == 1-dir) then we obviously cannot simply update the balance record - we need to perform a rotation. There are three cases we need to consider here depending on the balance of the first node down that longer path (note that we can be sure there will be a first node).
In that case where that first node is balanced, we perform a 2-point rotation like this: D(x+2) B(x+2) / \ / \ B(x+1) E(x) ===> A(x) D(x+1) / \ / \ A(x) C(x) C(x) E(x-1) where the number in parentheses is the depth of the tree at each node (noting that E has shrunk from x to x-1). After this rotation, B is longer on the direction of shrinkage (B->longer = dir), D is longer in the other direction (D->longer = dir-1) and the tree has not changed height (so the parent of this tree does not see any change).
Shrinkage on the shorter path - case 2
The second case is where the first node (down the longer path from a node where the shorter path sees shrinkage) is unbalanced in the same way as the node in question. This case can also be handled with a 2-point rotation. D(x+2) B(x+1) / \ / \ B(x+1) E(x) ===> A(x) D(x) / \ / \ A(x) C(x-1) C(x-1) E(x-1) In this case, the tree shrinks (from x+2 to x+1) and the two rotated nodes (B and D) finish balanced.
Shrinkage on the shorter path - case 3
The third case for shinkage on the shorter path, and the last case over all, is where the first node down the longer path is balanced in the other direction to the top node. i.e. node->next[1-dir]->balance == dir. This case requires a 3-point rotation to produce an AVL tree. F(x+2) ___D(x+1)__ / \ / \ B(x+1) G(x) B(x) F(x) / \ ===> / \ / \ A(x-1) D(x) A(x-1) C(x-?) E(x-?) G(x-1) / \ C(x-?) E(x-?) The nodes C and E are either at depth x-1 or x-2 depending on the balance of D. At most one of them can be at x-2.
This rotation reduces the height of the tree and creates a balanced node D at the top. B and F may be balanced, or may have a longer path leading outwards, depending on the length of C and E (which depend inturn on the balance of D).
Refining the Code
Now that we (hopefully) have an understanding of how the rebalancing has to happen we can start refining the code to achieve this. As we walk down the tree searching for the deletion point, we keep track of the lowest node that will not shrink, knowing that all nodes above this node do not require rebalancing at all. The nodes which do not shrink are those which are balanced (node->longer == NEITHER) and those which fit case 1 above (node->longer == 1-dir && node->next[1-dir]->longer == NEITHER). Thus the searching loop become: node *path_top = treep; while(tree) { dir = (target > value); if (target == value) targetp = treep; if (tree->next[dir] == NULL) break; if (tree->longer == NEITHER || (tree->longer == 1-dir && tree->next[1-dir]->longer == NEITHER)) path_top = treep; treep = &tree->next[dir]; tree = *treep; }
Now we need to rebalance each node from *tree_top down to the node that will be swapped out (though we don't actually rebalance that last node). Some rebalancing only requires updating the ->longer field and can be done in-line, but other nodes need rotation. This rotation is very similar to that used in the insertion case, but is sufficiently different to require new functions. Cases 1 and 2 of 2-point rotation can be handled with the same function as the only different is the resulting balance values. void avl_rotate_2_shrink(node *path_top, direction dir) { /* The path on the "dir" of *path_top is about to shrink. * Perform a 2-point rotation to preserve balance. */ node D, B,C; D = *path_top; B = D->next[1-dir]; C = B->next[dir]; *path_top = B; B->next[dir] = D; B->next[1-dir] = C; if (Balanced(B)) { B->longer = dir; D->longer = 1-dir; } else { B->longer = NEITHER; D->longer = NEITHER; } } It is worth noting that the conditional for setting the new ->longer values at the end of that function could be written more succintly as
D->longer = -dir - B->longer; B->longer = (1 - D->longer) | (D->longer >> 1);
however it is not clear that this would be an improvement, given how obscure the code becomes.
The 3-point balance is again very similar to the insert case. void avl_rotate_3_shrink(node *path_top, direction dir) { node B, C, D, E, F; F = *path_top; B = F->next[1-dir]; D = B->next[dir]; C = D->next[1-dir]; E = D->next[dir]; *path_top = D; D->next[1-dir] = B; D->next[dir] = F; B->next[dir] = C; F->next[1-dir] = E; B->longer = NEITHER; F->longer = NEITHER; if (D->longer == dir) B->longer = 1-dir; if (D->logner == 1-dir) F->longer = dir; D->longer = NEITHER; }
Putting it all together
We now have code to find the target node, a node that can be swapped in place of it, and the top of the sub-path that needs rebalancing, and we have code to rebalance each node. We now need to put it all together. However that isn't completely straight forward.Both the node swapping, and the rebalancing need to track where certain nodes are linked into their parents, and this linkage can be changed by both swapping and rebalancing. So we need to decide which to do first, and we need to watch out for awkward changes while doing the first.
It turns out that it is easiest if the rebalance is performed first and the swap is performed last. One reason for this is that a swap will change the result of comparison as the swapped node, so we cannot walk down the rebalancing path so easily. Another reason is that there are fewer nodes involved in the swap, so there is less to keep track of.
The swap involves two nodes, and upper node (the target) and a lower node (the end of the path), and the pointers to them. The lowest node to be rebalanced is the node above the last node in the path (the last node doesn't need rebalancing as it is about to be removed) and as rebalancing a node doesn't change it's pointer to the next node in the path (it primarily affects things above and on the other side) we know that the pointer to the lowest node in the path does not change. This means we only need to keep track of the pointer to upper of the two nodes that will be swapped. i.e. the pointer to targetn which is stored at targetp.
Thus the code of rebalancing should be called between the search and the swap, and should keep targetp up to date, as follows. node *avl_rebalance_del(node *treep, value_t target, node *targetp) { node targetn = *targetp; while(1) { node tree = *treep; direction dir = (target > tree->value); if (tree->next[dir] == NULL) break; if (Balanced(tree)) tree->longer = 1-dir; else if (tree->longer == dir) tree->longer = NEITHER; else { /* a rotation is needed, and targetp might change */ if (tree->next[1-dir]->longer == dir) avl_rotate_3_shrink(treep, dir); else avl_rotate_2_shrink(treep, dir); if (tree == targetn) *targetp = &(*treep)->next[dir]; } treep = &tree->next[dir]; } return targetp; }
Putting this together with the search and swap code we get int avl_delete(node *treep, value_t target) { /* delete the target from the tree, returning 1 on success * or 0 if it wasn't found */ node tree = *treep; direction dir; node *targetp, targetn; while(tree) { dir = (target > value); if (target == value) targetp = treep; if (tree->next[dir] == NULL) break; if (tree->longer == NEITHER || (tree->longer == 1-dir && tree->next[1-dir]->longer == NEITHER)) path_top = treep; treep = &tree->next[dir]; tree = *treep; } if (targetp == NULL) return 0; targetp = avl_rebalance_del(path_top, target, targetp); avl_swap_del(targetp, treep, dir); return 1; } and there you have a non-recursive AVL tree deletion using constant storage space.
Code
The code for this, and the insertion code is available in avl2.c. It is inside a test program which accepts integers on the command line and will insert each integer that isn't already present, and will delete each integer which is present. The resulting tree is printed out in an ASCII-art format.
Comments...
Re: Non-recursive algorithm for AVL tree deletion (23 October 2009, 06:58 UTC)
cool write up and code. i have two different compilers that generate different results using this code and wonder how it could have happened. The bug being that in the rebalance_del code the check is for tree->next[dir]==NULL yet in my crash case on one compiler tree->next[1_dir] is NULL while tree->next[dir] isnt so trying to use that pointer (1-dir) obviously fails. Ever seen something like that?
Re: Non-recursive algorithm for AVL tree deletion (24 October 2009, 08:45 UTC)
turns out that making the variable "longer" a 2 bit bitfield is what was causing the other compiler to have issues, but only if using the Balanced(node) call (didnt matter if it was inline or not).

Re: Non-recursive algorithm for AVL tree deletion (03 June 2007, 01:17 UTC)
thanks for the code .keep up the good work.it was a tremendous help indeed*
[permalink][hide]