Representing Hierarchies in Relational Databases

Jacob Rief

Playlists: 'djangocon2018' videos starting here / audio / related events

In this talk, I’ll explain the fundamental problem representing deep hierarchies in relational databases. To address this problem, we can use a database design pattern, named Materialized Path Trees.

Many data structures require a representation, where one parent node can have any arbitrary number of children. Inside relational databases, this typically is represented by a foreign key onto its own table. In Django’s ORM, we use ``models.ForeignKey('self', ...)``, to create this kind of recursive relationship.
The major problem with this kind of representation is, that it doesn’t scale for deep trees. Whenever we have to traverse the tree from a given starting node, our code has to perform one database query per hierarchy level.
To circumvent this, some database vendors implemented SQL dialects, to fetch a whole subtree with one query. Long time ago, Oracle for instance implemented 'CONNECT BY', which is proprietary and not part of the SQL standard. Nowadays, newer releases of most major database vendors implemented the 'WITH RECURSIVE' clause, which has been added to the SQL-99 standard. This allows us to build recursive queries.

Fortunately there is a clever recipe to represent hierarchies in relational databases using standard SQL techniques, but without the mentioned scaling problem: Materialized Path Trees, discovered by Vadim Tropashko. Django’s ecosystem offers two libraries, which implement this design pattern: **django-mptt** and **django-treebeard**. I also would like to mention **django-tree**, which only works on Postgres, using their SQL extension mentioned before.
In this talk I’ll explain the design patterns for Materialized Path Trees. Furthermore I’ll show the pros and cons of both libraries.