In the first post in this series on working with recursive data I talked about several different ways to store the information in a database. Some of them were promising, but they all had complications of some kind or another. I was using the phpBB Doctor Project Manager database design as an example, but there are quite a few different scenarios where recursive data will be found. Since SQL is not a recursive language, I am trying to find the best way to model the data so that I can access it with minimal fuss.
As an example, in my project management system I need to be able to quickly and easily identify the parent task, if the task has any sub-tasks (child records), and which tasks are at the same level (siblings). I would like to be able to traverse the tree in either direction (up to the parent or down to the child) without using recursion. In order to do that, I need a model that is different from anything presented in the prior post.
I am going to return to the initial table design and add a few columns that will look very familiar to phpBB3 MOD authors. The new columns are called “left” and “right” which means this is the new structure for my task table:
+-----------------+-----------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-----------------+-----------------------+------+-----+---------+----------------+ | task_id | int(11) unsigned | | PRI | NULL | auto_increment | | task_number | mediumint(8) unsigned | | | 0 | | | parent_task_id | int(11) unsigned | | MUL | 0 | | | project_id | mediumint(8) unsigned | | MUL | 0 | | | task_lvl | tinyint(2) unsigned | YES | | 0 | | | task_left | mediumint(8) unsigned | | | 0 | | | task_right | mediumint(8) unsigned | | | 0 | | | task_status | tinyint(2) unsigned | YES | | 0 | | | task_name | varchar(64) | | | | | | estimated_hours | decimal(8,2) | YES | | 0.00 | | +-----------------+-----------------------+------+-----+---------+----------------+
There are three new columns in my task table now, and I will detail each of them next. The
task_right columns are going to give me the structure that I need to do what is called a preorder tree traversal. Don’t worry, I will explain what that means shortly. With pictures. The net of it is I can recreate the task tree with a single query and therefore recursion is not required.
The task level (
task_lvl in my example) can be calculated on the fly but I prefer to generate this number and store it as it saves time. The level starts at zero for a top-level task, and is incremented by one for each “step” away from the parent. So if the WBS for a task (see prior post for a definition) is 1.2.1 then that task is at level two. Count the dots and that’s the assigned level for the task. That one is easy.
Preorder Tree Traversal
I haven’t yet explained how the left and right values are assigned or used. It’s fairly simple, and the best way to describe it is to convert my task data into a tree, as shown here:
For simplicity I have only shown one branch of the task tree. This is a quite pretty picture but I can’t store the picture in the database and use it for any sort of structure. What I need is a way to convert the picture into information, and that’s easily done with what is sometimes called a “depth first” or “preorder tree traversal” process. Without going into all of the math (there is a wiki link at the end of the post for those that are interested) here is how I traverse the tree:
- Pick up a pencil (or a crayon, in this case )
- Put my crayon on the left side of the top node
- Draw around the tree starting down the left side, going around each node, and back up and around until I end up on the right side of the parent node. Don’t lift the crayon, and don’t cross any lines while drawing.
Here’s what I end up with after that process:
Once I have walked the tree (following the crayon marks in the image above), I will go back over the line and each time I pass a node (task) on either the left or the right I add a number. The left side of the parent node starts at one, and I increment by one each time I pass a node. Here is the tree that has been traversed and numbered. In this case I used blue for the left side of the node, and orange for the right side of the node, for reasons which will become clear later in the post.
Now that I have made a mess of things, what do I do next? It turns out that the left and right values are very interesting numbers because by looking at the these values for a node I can determine all sorts of things with some simple observations.
First, any node where left+1 = right has to be a “leaf” or terminal node. This means with this model I can look at any node in the tree and immediately determine if it’s a leaf. That is not possible with a recursive model because I don’t know if some other node points “up” to that node as a parent without running a query first. For example, the task node 1.1 has a left value of 2 and a right value of 3, so it is a leaf. I can easily verify the rest of the leaf nodes fit this same pattern.
Second, I can determine how many children a node has with a very simple formula. (This formula works for leaf nodes as well, it’s just not needed there.) The formula is
(Right - Left - 1) / 2. If I look at node 1.2.1 I see that the left value is 5, the right value is 10, and there are two child nodes. If I apply the formula I can see that
(10 - 5 - 1) / 2 gives me a result of 2, which is exactly the number of children for that node. If I apply the formula to the very top node I get
(16 - 1 - 1) / 2 or 7.
At this point I have shown how to create the tree, how the left and right values are assigned, and a few tricks that I can do with simple math. For my next post in this series I plan to start from my tree structure and show how easy it is to manage the data during operations such as adding a node, deleting a node, or moving a node. I also plan to show how easy it is to use this model to efficiently process and display any sort of hierarchical data without having to resort to recursive programming.