### Working With Recursive Data Part II: Tree Traversal

In the first post in this series on working with recursive data I talked about several different ways to store the information in a database. Some of them were promising, but they all had complications of some kind or another. I was using the phpBB Doctor Project Manager database design as an example, but there are quite a few different scenarios where recursive data will be found. Since SQL is not a recursive language, I am trying to find the best way to model the data so that I can access it with minimal fuss.

As an example, in my project management system I need to be able to quickly and easily identify the parent task, if the task has any sub-tasks (child records), and which tasks are at the same level (siblings). I would like to be able to traverse the tree in either direction (up to the parent or down to the child) without using recursion. In order to do that, I need a model that is different from anything presented in the prior post.

I am going to return to the initial table design and add a few columns that will look very familiar to phpBB3 MOD authors. The new columns are called “left” and “right” which means this is the new structure for my task table:

+-----------------+-----------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------------+-----------------------+------+-----+---------+----------------+ | task_id | int(11) unsigned | | PRI | NULL | auto_increment | | task_number | mediumint(8) unsigned | | | 0 | | | parent_task_id | int(11) unsigned | | MUL | 0 | | | project_id | mediumint(8) unsigned | | MUL | 0 | | | task_lvl | tinyint(2) unsigned | YES | | 0 | | | task_left | mediumint(8) unsigned | | | 0 | | | task_right | mediumint(8) unsigned | | | 0 | | | task_status | tinyint(2) unsigned | YES | | 0 | | | task_name | varchar(64) | | | | | | estimated_hours | decimal(8,2) | YES | | 0.00 | | +-----------------+-----------------------+------+-----+---------+----------------+

There are three new columns in my task table now, and I will detail each of them next. The `task_left`

and `task_right`

columns are going to give me the structure that I need to do what is called a preorder tree traversal. Don’t worry, I will explain what that means shortly. With pictures. The net of it is I can recreate the task tree with a single query and therefore recursion is not required.

The task level (`task_lvl`

in my example) can be calculated on the fly but I prefer to generate this number and store it as it saves time. The level starts at zero for a top-level task, and is incremented by one for each “step” away from the parent. So if the WBS for a task (see prior post for a definition) is 1.2.1 then that task is at level two. Count the dots and that’s the assigned level for the task. That one is easy.

### Preorder Tree Traversal

I haven’t yet explained how the left and right values are assigned or used. It’s fairly simple, and the best way to describe it is to convert my task data into a tree, as shown here:

For simplicity I have only shown one branch of the task tree. This is a quite pretty picture but I can’t store the picture in the database and use it for any sort of structure. What I need is a way to convert the picture into information, and that’s easily done with what is sometimes called a “depth first” or “preorder tree traversal” process. Without going into all of the math (there is a wiki link at the end of the post for those that are interested) here is how I traverse the tree:

- Pick up a pencil (or a crayon, in this case )
- Put my crayon on the left side of the top node
- Draw around the tree starting down the left side, going around each node, and back up and around until I end up on the right side of the parent node. Don’t lift the crayon, and don’t cross any lines while drawing.

Here’s what I end up with after that process:

Once I have walked the tree (following the crayon marks in the image above), I will go back over the line and each time I pass a node (task) on either the left or the right I add a number. The left side of the parent node starts at one, and I increment by one each time I pass a node. Here is the tree that has been traversed and numbered. In this case I used blue for the left side of the node, and orange for the right side of the node, for reasons which will become clear later in the post.

Now that I have made a mess of things, what do I do next? It turns out that the left and right values are very interesting numbers because by looking at the these values for a node I can determine all sorts of things with some simple observations.

First, any node where left+1 = right has to be a “leaf” or terminal node. This means with this model I can look at any node in the tree and immediately determine if it’s a leaf. That is not possible with a recursive model because I don’t know if some other node points “up” to that node as a parent without running a query first. For example, the task node 1.1 has a left value of 2 and a right value of 3, so it is a leaf. I can easily verify the rest of the leaf nodes fit this same pattern.

Second, I can determine how many children a node has with a very simple formula. (This formula works for leaf nodes as well, it’s just not needed there.) The formula is `(Right - Left - 1) / 2`

. If I look at node 1.2.1 I see that the left value is 5, the right value is 10, and there are two child nodes. If I apply the formula I can see that `(10 - 5 - 1) / 2`

gives me a result of 2, which is exactly the number of children for that node. If I apply the formula to the very top node I get `(16 - 1 - 1) / 2`

or 7.

### Conclusion

At this point I have shown how to create the tree, how the left and right values are assigned, and a few tricks that I can do with simple math. For my next post in this series I plan to start from my tree structure and show how easy it is to manage the data during operations such as adding a node, deleting a node, or moving a node. I also plan to show how easy it is to use this model to efficiently process and display any sort of hierarchical data without having to resort to recursive programming.

**Related Links**

My brain hurts. Funnily enough, I may just have to make use of this in one of my projects. Not looking forward to it. At least now I get why phpBB3 does this.

Comment by Micheal — August 2, 2008 @ 3:36 pm

Hi, Micheal, it seems we’re both making the “blog rounds” today.

The next post I’m working on in this series is going to show (in pseudo-code) how I manage the tree structure. I have functions called add_node() and delete_node() and move_node() and so on. By setting up these functions it becomes very easy to manage the tree data. No doubt there is something very similar in phpBB3.

Comment by Dave Rathbun — August 2, 2008 @ 3:52 pm

Hi

So to workout the left and right values I have to first draw it out and then run a line around it. How does this work in the real world? Assigning the left and right values in this way just isn’t practical.

Thanks

Comment by Aftab — November 21, 2008 @ 5:40 am

Hi, Aftab, no… you don’t have to draw it out. A tree starts with only a root node. The left and right are 1 and 2. As you add a node to the tree, you determine where it is to be inserted and run a fairly simple insert routine. The insert routine consists of an update statement and an insert statement. The update is to increment the left and right values for all nodes that will be beyond the new node in the tree.

There are other maintenance routines run when you delete a node. Moving a node is a call to the delete and then the insert process, so there is no special code for that.

I will create another blog post to answer this question.

Comment by Dave Rathbun — November 21, 2008 @ 9:50 am