The Adjacency Model vs. Nested Sets
Most developers have had to manage hierarchical data at some point, often in the form of categories on a website. The problem is, sql offers no real solution to handling hierarchy… not without a little creativity that is.
The traditional solution is called, in geek vernacular, the adjacency model. Each category references its parent category by id or name (it helps me to think of a child’s foreign key referencing the parent’s primary key). While this is fine for smaller sets of data, it begins to slow down the server something fierce when our categories start going five, six levels deep.
So then what’s the solution? Nested sets. An almost simple idea when you first hear about it, nested sets provide some many more possibilities than the adjacency model ever could. Consider the following diagram representing a simple category tree using the nested set model.
Every category is given a left and a right value. Look at `Electronics`. Notice how its left value is 1 and its right value is 20? The top-most category encompasses the entire tree, and all subcategories have left and right values within this range. In fact this is always the same no matter where you are in the tree; the children of a parent category are always within the parents left and right value. Look at a more traditional tree diagram.
Visualize a strangely mathematically-adept caterpillar starting at the top and moving to the left; it adds one at each step as it crawls all the way around.
“So what?” you ask. That’s the beauty of nested set simplicity. This simple fact allows for the creation of queries that can reference the relationships between our categories in a myriad of ways. And oh, is it fast in a properly indexed database. Consider the following, common, function on most websites.
Here is how we might retrieve a breadcrumbing path from our current location in the tree all the way back to the top most level using the adjacency model.
select t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4 from category AS t1 LEFT JOIN category AS t2 ON t2.parent = t1.category_id LEFT JOIN category AS t3 ON t3.parent = t2.category_id LEFT JOIN category AS t4 ON t4.parent = t3.category_id WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';
And now here is the same result obtained with the nested set model.
SELECT parent.name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft;
OK, so that’s four aliased tables mysql has to load into memory for the adjacency model vs two for nested sets. Resources aside, the single biggest advantage of the nested set model is the management of hierarchical data;that is, deleting and adding branches or leafs (seriously, the tree analogy could not be more useful).
What happens to the rest of your categories’ relationships to each other when our tree is pruned, or new branches are added. Are some categories left orphaned, or do they just fall of the tree entirely?
That is a danger in both models, but I have found nested sets to be much more useful as the technology to drive a category management system. I could give a content manager drag-and-drop functionality for placing the category where he/she pleased. Moving sub-trees around is easy and sorting a snap.
Hopefully I’ve given you enough to whet your appetite and you will continue reading about the awesomeness of nested sets.