TechTip: DB2's CONNECT BY Simplifies Recursive Processing

DB2
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

At first glance, recursion appears to be a computer science method that's relevant only in the world of academia.

 

Recursion is a great for traversing linked lists or generating the numbers of a Fibonacci sequence, but recursion can't be applied to business problems…or can it? 


In reality, there are several industries where data elements have inherent relationships with other elements, and the best way to connect these data elements is with recursive (or hierarchical) processing. The classic example is the generation of a bill of materials explosion in the manufacturing industry. A bill of materials explosion breaks apart each assembly or sub-assembly into its component parts.

 

Now that you understand that recursion has applicability to the business world, why would a database like DB2 need to support recursion? The reason is that these data elements with inherent relationships are often stored in tables. Based on this fact, it makes sense for DB2 to provide functionality to make it easy to navigate these data relationships, as opposed to forcing programmers to write a program that manually navigates data hierarchies.

 

DB2 for i has actually provided this capability since the V5R4 release with an SQL feature known as "recursive common table expressions." The following code contains an example of a query using the recursive common table expression to navigate an organizational hierarchy. While recursive common table expressions have helped IBM i developers solve business problems, you can see from the example query below that the solution is complex.

 

  WITH emp_list (level, empid, name) AS

         SELECT 1, empid, name  FROM emp

             WHERE name = 'Carfino'

          UNION ALL

         SELECT o.level+1, next_layer.empid, next_layer.name

             FROM emp as next_layer, emp_list o

             WHERE o.empid = next_layer.mgrid )

   SELECT level, name FROM emp_list

 

The good news is that the DB2 for i 7.1 release addresses the complexity issue with a new CONNECT BY clause. The CONNECT BY clause enables a query to process recursive relationships and hierarchical data with much simpler syntax. Usage of the CONNECT BY support requires level 9 or greater of the IBM i 7.1 Database Group PTF to be installed. This requirement is documented in the DB2 for i Technology Updates wiki.

 

Let's walk through an example of traversing the organizational chart in Figure 1 to better understand the complexity of the CONNECT BY support. For this example, a program must return a list of the employees that are in the organization led by a manager named Carfino.

 

041312Milligan_CB_Fig2

Figure 1: This sample org chart helps to explain CONNECT BY support.

 

The organizational data from Figure 1 is stored in a DB2 table named EMP. The code below contains the table definition for the EMP table.

 

CREATE TABLE emp(

                   empid INTEGER PRIMARY KEY,

                   name  VARCHAR(10),

                   salary DECIMAL(9, 2),

                   mgrid INTEGER)

 

The first step when creating a recursive or hierarchical query is to identify the columns that logically link together two rows of a data. In this example, the columns are the empid and mgrid columns. The direct reports of Carfino will be found by finding those employees that have a mgrid value that is equal to the employee id value for Carfino. Once those linking or connecting columns are identified, those are the column names specified on the CONNECT BY clause, as you can see in this SELECT statement: 

 

SELECT LEVEL, name

   FROM emp

 START WITH name = 'Carfino'

 CONNECT BY mgrid = PRIOR empid

 

This CONNECT BY clause directs DB2 to find an employee row whose mgrid column matches the employee id value of the prior employee. The first prior employee is Carfino due to the search predicate specified on the START BY clause. Thus, the first pass of this recursive process would identify the employees named Davis and Payne. The second recursive iteration would then find the employees that work for Davis and Payne: Lohaus, Brookins, Horton, Bullard, Settles. The recursive processing would end with the list of employees that are managed by Davis and Payne because those employees are not managers. The employee id value for Lohaus would never show up in the mgrid column for any row in the EMP table.

 

The results produced by the SELECT statement are contained in Figure 2. The recursive query successfully returned the name of every employee who works in Carfino's section of the organizational chart. The Level value in the result set is produced by the LEVEL pseudo-column in the SELECT statement. The LEVEL pseudo-column returns the recursive step that produced the row. The rows produced by the START BY clause will always have a level value of 1, and then each recursive iteration increments the level value. The LEVEL pseudo-column can only be used in conjunction with the CONNECT BY clause. As you can see in the first bit of code in this TechTip, the level value has to be manually generated when using the recursive common table expression query.

 

041312Milligan_CB_Fig3 

Figure 2: This is the output of the recursive query.

 

The CONNECT BY support in IBM i 7.1 includes additional syntax and options that aren't discussed here. Consult the DB2 for i 7.1 SQL Reference for additional details.

 

Hopefully, this simple organizational chart example has demonstrated the relevance of recursion and shown how easy it is to process data hierarchies and relationships with the CONNECT BY clause.

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$