Non-recursive algorithm for AVL tree insertion

24 November 2004, 10:18 UTC

I recently wanted to write code for insertion into an AVL tree. Because this code will eventually (hopefully) live inside the Linux kernel, I wanted it to be non-recursive (as recursive code doesn't sit will in the context of the kernel).

I found that taking a non-recursive approach leads to code that is (arguably) cleaner and more transparent than the traditional recursive approach. In fact, I liked what I came up with so much that I thought I would share it.

AVL overview

An AVL tree, for those who don't know or have forgotten, is a semi-balanced binary search tree. i.e. it is a datastructure composed of nodes where every node has a datum and two pointers to other nodes. It is not perfectly balanced, but there is a good limit on how unbalanced it can be.

This limit is that the two subtrees below any given node can differ in height by at most one level. i.e. either they will be the same height, or the one of the left will be one level higher, or the one on the right will be one level higher. After a new node is added it is possible that the tree will no longer obey this property. However the propery can be restored by performing a "rotation", which is a simple manipulation of 2 or 3 nodes somewhere in the tree.

It is an important and useful propertly that only one rotation - though it could be a so-called double rotation - is needed to restore balance after a single insertion.

Problems with the traditional recursive approach

All the example code that I have seen for AVL has been in a recursive style. The general case inserts into a subtree, and then checks if a rotation is needed at this level. This check happens at every level in the recursion, however we know that it will only be needed at one level.

This fact, that the need to rotate only once is not made apparent in the algorithm, is the particular problem that I see with the recursive approach. It feels like some optimisation could be made based on this fact, and this optimisation opportunity is lost.

The other problem I have with the traditional approach is the use of "left" and "right" pointers and the code duplication this results in. There are often pairs of very similar functions, one that handles a "left" case and one that handles a "right" case. To my eye, abstracting out the direction, and having an array of pointers instead of two explicit pointers makes for more elegant code.

Introduction to my algorithm

Before presenting the code, we need some data structures. As mentioned, I like the subtree pointers to be in an array. The index of this array is a 'direction', whether that is 'fore' and 'back', or 'left' and 'right'. It is important to be able to find the 'other' direction given a direction in a variable. This makes a simple enumerated type like enum {left, right} inappropriate. One possibility is to use a boolean: direction = new Boolean but being a C coder by nature, I prefer to use a small number. So I will just use 0 for 'back' or 'left' and 1 for 'fore' or 'right'. Then the direction that isn't dir must be 1 - dir.

As well as a direction, we need a way to store the current 'balance' at any node in the tree. This 'balance' has one of three values. The possibilties are: that both subtrees are the same height, that the left subtree is longer, or that the right subtree is longer. It is convenient to use the same value for 'left subtree is longer' as for the direction 'left', and for 'right subtree is longer' as for the direction 'right', thus these will be 0 and 1. The third value can be anything else. -1 is possibly a good choice, though 2 or 3 would be equally good (they all fit in 2 bits). We will assume a signed bitfield and use '-1' to say 'neither subtree is longer'.

So, we can now present the basic data structure - in C.

#define LEFT 0 #define RIGHT 1 #define NEITHER -1 typedef int direction; typedef struct node_s { value_t value; struct node_s *next[2]; int longer:2; } *node; #define Balanced(n) (n->longer < 0)

With this structure, and assuming that value_t can be compared with normal comparison operators, a search in an AVL tree would look like this. node avl_find(node tree, value_t target) { while (tree && target != tree->value) { direction next_step = (target > tree->value); tree = tree->next[next_step]; } return tree; }

A simplistic insertion routine that doesn't do any rebalancing (and hence is wrong) could be written. int avl_insert_broken(node *treep, value_t target) { /* insert the target into the tree, returning 1 on success or 0 if it * already existed */ node tree = *treep; while (tree && target != tree->value) { direction next_step = (target > tree->value); treep = &tree->next[next_step]; tree = *treep; } if (tree) return 0; tree = malloc(sizeof(*tree)); tree->next[0] = tree->next[1] = NULL; tree->longer = NEITHER; tree->value = target; *treep = tree; return 1; } Note that the tree is passed as a pointer to a place to store a tree. This is because the root could change, and we need to be able to change it.

Towards AVL Insertion

We know the result that at most one rotation (possibly double) is needed after an insertion, we want to find out where that rotation must take place. To understand this, it helps to imagine the path that is taken down the tree while searching for the insertion point.

Each node on this path has two properties that we need to be aware of. There is the balance value which we call "longest" and can be LEFT, RIGHT, or NEITHER, and there is the direction that we stepped at that point, which is either LEFT or RIGHT.

At each point we need to consider three possibilities concerning which step we actually took. They are as follows.

Either path from a NEITHER node;

If we step down from a NEITHER node, we must be aware that if the subtree grows we will need to change the balance information for this node, but there will be no need for a rotation at this point. The growth is perfectly legal, it just needs to be recorded. Any growth of this subtree will also cause the parent subtree to grow, so the parent node might need attention as well.
The path that isn't longer - the RIGHT path if longer is LEFT, and the LEFT path if longer is RIGHT.
If we step down a path to a shorter subtree, we know that nothing we do will change the height of this tree. The tree may become balanced if the subtree grows, but it will not get larger. This means that we can completely forget about all parents and ancestors of the current node. They won't be affected at all. They will not need their balance value to change, as neither subtree will grow, and they definitely won't need rotating. The most that will be needed at this point is that the node that we stepped down from might change from being unbalanced to being balanced, so we need to remember it to possibly apply that change, but that is all.
The path that is longer - the RIGHT path if longer is RIGHT, or the LEFT path if longer is LEFT.
This is the more interesting case as this is the node at which a rotation might be needed.

It is worth noting first that there is a similarity with the previous case in that when we step down from a node to the longest subtree, that node's tree will not get any larger. If the subtree we descend into grows, a rotation will be needed, but this will result in this tree being the same height as it was before. The only difference at this level is that it will be balanced whereas it wasn't before. This means that when we step down a longer path, we can forget about all the nodes above the one we step down from. They cannot be affected as neither of their subtrees will grow.

This observation, that when we step down a longer path we can forget everything before that node tells us where the rotation has to happen if at all. It has to happen at the last node on the path to the insertion point where we stepped down to a longer subtree. As we also know that we can forget all parents when we step down to a shorter subtree, we can further assert that the place that a rotation happens is the last node where we stepped to a longer subtree, providing that we never subsequently step to a shorter subtree. Thus we only need to rotate if we step to a longer subtree, and then all subsequent nodes are balanced, without longer or shorter subtrees.

The actual rotation that happens (if needed) will depend on what happens immediately after stepping down to a longer subtree. This is where you really need to start drawing pictures, but I will assert without proof that if the next step is in the same direction as the step into a larger subtree, then a single rotation will be needed. If the next step is in the other direction, then a double rotation will be needed.

Now that we have some understanding (I hope) about what we will be doing, we can work out what needs to be recorded on the way down so that rotations and balance changes can be effected properly.

The important data that we need is that part of the path from the last unbalanced node to the insertion point. All nodes in this list other than the first will be balanced, including the newly inserted node. All of these nodes will either take part in a rotation, or will need to have their balance value changed (though the new node won't need it's balance changed. It could take part in a rotation though).

This path may not actually start with an unbalanced node. If every node in the path from the root is balanced, then that whole path needs to be recorded, and every node will have it's balanced changed.

There are many ways that this path could be recorded. One way is to only store the top of the interesting path and to re-calculate the path on the way down, performing the comparison again at each step. Depending on how expensive the comparison function is, this may be unacceptable, or it may be optimal. If we want to record the path separately we could have an array of directions and record each step, or we could add an extra bit to the node data structure and record the steps in there. Which is best will depend on the particular context of the implementation. For the example code here, I will not store the path but will recompute it from the top, which will be recorded.

AVL insertion

Using the above results we can extend the simple (and broken) avl_insert_broken to record the top of the interesting path (the node where, if any, a rotate might happen), and to call out for help to rebalance.

int avl_insert(node *treep, value_t target) { /* insert the target into the tree, returning 1 on success or 0 if it * already existed */ node tree = *treep; node *path_top = treep; while (tree && target != tree->value) { direction next_step = (target > tree->value); if (!Balanced(tree)) path_top = treep; treep = &tree->next[next_step]; tree = *treep; } if (tree) return 0; tree = malloc(sizeof(*tree)); tree->next[0] = tree->next[1] = NULL; tree->longer = NEITHER; tree->value = target; *treep = tree; avl_rebalance(path_top, target); return 1; }

Note that we store the top of the path as a node * as a rotation may need to change the top node of the subtree.

Now we need to implement avl_rebalance. If this top node is balanced, we walk down to where the target was inserted marking the path that we take as longer. If this top node is unbalanced and we take the shortest path, we need to make the top node balanced, and unbalance all the children down to the insert point.

If the top node is unbalanced, we need to perform a rotation (or a double rotation) and from some node beneath that rotation, walk down to the target and modify the balance along the way. Exactly what we do in this case will depend on the direction of the first 1, 2, or 3 steps.

I'll just take a moment at this point to say that I don't like the term "double rotation" either (maybe I was born grumpy...). There are two sorts of rotations that could be needed. One is a rotation of two nodes and is sometimes called a single rotation. The other is a rotation of three nodes and is sometimes called a double rotation, though you could argue that it is only one-and-a-half times the first. I will call them 2-point rotation and 3-point rotation.

void avl_rebalance_path(node path, value_t target) { /* Each node in path is currently balanced. * Until we find target, mark each node as longer * in the direction of target because we know we have * inserted target there */ while (path && target != path->value) { direction next_step = (target > path->value); path->longer = next_step; path = path->next[next_step]; } } void avl_rebalance(node *path_top, value_t target) { node path = *path_top; direction first, second, third; if (Balanced(path)) { avl_rebalance_path(path, target); return; } first = (target > path->value); if (path->longer != first) { /* took the shorter path */ path->longer = NEITHER; avl_rebalance_path(path->next[first], target); return; } /* took the longer path, need to rotate */ second = (target > path->next[first]->value); if (first == second) { /* just a two-point rotate */ path = avl_rotate_2(path_top, first); avl_rebalance_path(path, target); return; } /* fine details of the 3 point rotate depend on the third step. * However there may not be a third step, if the third point of the * rotation is the newly inserted point. In that case we record * the third step as NEITHER */ path = path->next[first]->next[second]; if (target == path->value) third = NEITHER; else third = (target > path->value); path = avl_rotate_3(path_top, first, third); avl_rebalance_path(path, target); }

All that remains is to define the two rotations. You really need to draw some pictures for this. I'll just do some ascii-art for now.

A 2-point rotation looks like this, where B is the path_top where the rotation must happen, and E or below is where the new node was inserted. B D / \ / \ A D ==> B E / \ / \ C E A C

Note that B and D have swapped places, and C has moved from D to B. We know that the D tree was longer than A, and is now even longer due to E growing. So if A has height x, C is x, and E was x but is now (x+1). The new B is also (x+1), so the new D is balanced, the new B is balanced, and we need to update the balance info of E and beyond, so: node avl_rotate_2(node *path_top, direction dir) { node B, C, D, E; B = *path_top; D = B->next[dir]; C = D->next[1-dir]; E = D->next[dir]; *path_top = D; D->next[1-dir] = B; B->next[dir] = C; B->longer = NEITHER; D->longer = NEITHER; return E; }

The 3-point rotation looks like this, where B is the top of the path needing rotation, and the insertion point is D, or at-or-below one for C and E, depending on the value of "third". B _D_ / \ / \ A F B F / \ ===> / \ / \ D G A C E G / \ C E Note that B, F, and D (the first three nodes on the path) have been rotated, and the path to the insertion is through either C or E, unless D was the inserted node. Looking at tree heights, A was and is x, F was x+1 and balanced, so D was x as was and is G. C and E were x-1 and the one that grew is now x.

In the new tree, B is height x+1 and is balanced only if C holds the insertion. F is height x+1 and is balanced only if E holds the insertion. Thus D is of height x+2 (so the tree hasn't changed height) and is balanced. Note that in the special case of D being the insertion, and C and E not existing, 'x' is 0, so B and F are both balanced.

Thus: node avl_rotate_3(node *path_top, direction dir, direction third) { node B, F, D, C, E; B = *path_top; F = B->next[dir]; D = F->next[1-dir]; /* node: C and E can be NULL */ C = D->next[1-dir]; E = D->next[dir]; *path_top = D; D->next[1-dir] = B; D->next[dir] = F; B->next[dir] = C; F->next[1-dir] = E; D->longer = NEITHER; /* assume both trees are balanced */ B->longer = F->longer = NEITHER; if (third == NEITHER) return NULL; if (third == dir) { /* E holds the insertion so B is unbalanced */ B->longer = 1-dir; return E; } else { /* C holds the insertion so F is unbalanced */ F->longer = dir; return C; } } and there you have an AVL insertion, in non-recursive style, using only constant space, but sometimes performing more comparisons that absolutely needed.

Ofcourse, deletions would be useful too. I suspect that the deletion code would be very symetrical to this code, but I haven't worked through it yet, so I cannot say for sure. May when I do, I'll write another article. For now, you can find working C code in the file avl.c.

Licence

The code contained in this article, I place in the Public Domain. Do with it what you will. If you find it helpful, I would like to hear about it though.

The description of the algorthim I don't place in the Public Domain. If you want to publish it elsewhere, please ask....




Comments...

Re: Non-recursive algorithm for AVL tree insertion (14 August 2007, 18:55 UTC)
There are a lot of syntax errors.

[permalink][hide]

Re: Non-recursive algorithm for AVL tree insertion (12 November 2007, 17:48 UTC)
Thank you very much :) There are a couple of problems with the deletion code you provide, but these are fixed in the avl2.c that you provide.

[permalink][hide]

Re: Non-recursive algorithm for AVL tree insertion (15 November 2007, 03:09 UTC)

I have iterative AVL algorithms in C# and C++. I was reading your article and it indicated that information needs to be recorded on the search for the insertion point. The algorithms that I use record no information when searching for the insertion point. In fact, an insertion point can be given from a different search or via some other means and the insertion can still be effected. It is possible to start at the bottom of a tree, insert a node and balance upwards with no previous state information.

Yours Sincerely, Ben McNamara.

[permalink][hide]

Re: Non-recursive algorithm for AVL tree insertion (15 November 2007, 03:56 UTC)

Starting at the bottom and balancing upward suggests that you up upward links, which otherwise are not required.

The information I store on the way down is precisely the path that was used - a series of "left" or "right" steps. If you have enough linkage information in the tree to determine the path given any node in the tree, then you would not need to store any information.

But isn't having all that extra linkage information a bit of a waste of space?

[permalink][hide]

Comment (09 September 2009, 09:32 UTC)
Dear Neil,

I am in need of your AVL tree complete implementation in C. I am requsting you to please do send that to my email address 21.madhu@gmail.com and also I am requesting you to give me some suggestion on how to implelement in the code where on sending a node as a parameter to a function, the function should return an array which contains all the ancistors of that particular node. I am really in need of your help.Please help me out as soon as you can. I will be waiting for your response.

with regards Madhusudhan

[permalink][hide]




[æ]