Over the past few days, I spent my time working on recursive SQL queries. These queries help in querying hierarchical or transitive data. We can implement these queries using Common Table Expressions (CTEs). A common table expression is a temporary named result set which can store result of a query inside it while the query executes. Here is how a CTE looks like.
WITH first_query AS (
SELECT 'Learn' as column1
),
second_query AS (
SELECT 'SQL' as column2
)
SELECT * FROM first_query, second_query
With recursive query, the first select statement will initialise with some base result and store it in result set. When second select runs, it will query against the rows already stored in result set and store its result back in result set. This continues recursively against rows returned from last result and stops only when the second select return empty result set. The UNION ALL merges all rows into the final result set. Below is an example on how recursive query works.
-- first "select" initialize n with 10 for the result to be used in next iteration
-- second "select" is a recursive query which read rows from previous iteration to produce a sub result
-- second "select" continue recursion until no more rows are found or condition is not satisfied
-- now union all the result from each iteration and store in cte named print_numbers
WITH RECURSIVE print_numbers AS (
-- initialize n with 10
SELECT 10 AS n
UNION ALL
-- continue recursion using previous result until no more rows are found
SELECT n - 1 FROM print_numbers WHERE n > 1
)
SELECT * FROM print_numbers;
This recursive query will print the following output
Now let’s create a table which will store relationship between a child and its parent. So creating table with the following structure.
CREATE TABLE groups (
id int PRIMARY KEY,
parent_group text,
child_group text
)
After inserting some data and querying for the child-parent relationship
Result representing hierarchical data
Basically, it looks likes this
This is basically top-down approach. When we need to find all children of a tree starting at particular group in sub tree and keep moving down until all children having been found. Here the first Select will get the initial rows. Then second Select will run the query from the last iteration result and continue doing recursion until no more rows are found. Then it union all the results to produce a final result set. This is one great example to query hierarchical data and recursive SQL query can be a great boon.
WITH RECURSIVE find_child_groups AS (
SELECT child_group FROM groups
WHERE parent_group = 'ParentGroupA'
UNION ALL
SELECT g.child_group FROM groups g, find_child_groups cg
WHERE cg.child_group = g.parent_group
)
SELECT * FROM find_child_groups;
The Result from above recursive query is
This query will follow bottom-up approach. This will help us find all the parents of a group. Again, the first Select will get the initial rows. Then second Select will run the query from the last iteration and continue doing recursion until no more rows are found and union the final result set.
WITH RECURSIVE find_parent_groups AS (
SELECT parent_group FROM groups
WHERE child_group = 'GrandChildGroupA21'
UNION ALL
SELECT g.parent_group FROM groups g, find_parent_groups pg
WHERE pg.parent_group = g.child_group
)
SELECT * from find_parent_groups;
The Result from above recursive query is
So, this way a recursive query can prove so much helpful in querying hierarchical data. We don’t need to introduce different tables to keep a direct relationship between every table data. Thus, it reduces so much overhead to keep denormalized data in correct state.