Practical SQL: OLAP, Part 2

SQL
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

In this next installment on SQL's Online Analytical Processing (OLAP) functions, you'll see how OLAP continues to provide SQL users the business functions we've enjoyed in RPG for so long.

 

In my previous OLAP article, I used the OLAP function ROW_NUMBER to provide the concept of "next record" that we take for granted in RPG's native access (contrast RPG's native ISAM access with SQL's relational access). In this article, I'll show you the magic of GROUPING SETS and how they can be used to provide the same functionality as control-level breaks.

 

Did You Say Control-Level Breaks?

 

Why yes, yes I did. You recall those lovely little L1 and L2 indicators, don't you? Control-level breaks (also known as level breaks or control breaks) have been around for a very, very long time. In fact, we can reach back nearly two decades into the MC Press archives to read an article on control breaks written by my favorite technical writer of all time, Ted Holt. And of course the concept reaches back decades before then. Control breaks are just an indicator to the programmer that a key field has changed. The simplest use of this concept is when accumulating totals; a change in a key field indicates that a line should be printed and an accumulator should be cleared.

This is an easy concept in RPG. In fact, judicious use of level breaks and blank-after field controls allowed RPG programmers to write reports that tabulated incoming data with very few lines of actual code. The mainline code simply accumulated each record into total fields, and then the level break indicators were used to write the data to a report. The very act of writing the report line cleared the accumulators and the program continued on. I refuse to replicate the code; as wonderful as the cycle was for its time, I don't wish to revisit the days of I-specs, O-specs, and fixed-format C-specs. But trust me when I say it didn't take a lot of programming to do this:

SALES REP   CUSTOMER     Quantity           Amount

---------   --------     --------         --------

123      1001         200.00         2,464.00

123      1002         343.00         5,155.00

123      2001       1,500.00           780.00

123                 2,043.00         8,399.00

555      1001          50.00         1,000.00

555      2001         150.00         1,805.00

555                   200.00         2,805.00

                    2,243.00        11,204.00

The file processed is the order file, which has among its fields Sales Rep (salesrep), Customer (custnum), Quantity, and Amount. The program simply reads through the file in salesrep/custnum order and accumulates quantity and amount at the three levels of the report: salesrep/custnum, salesrep, and grand totals. It prints out a line at the total break for each level. A good RPG programmer knocks this out in maybe 30 lines of code, including F-, I-, C-, and O-specs.

GROUP BY Doesn't Quite Cut It

In SQL, it's not quite so simple. Yes, you can use GROUP BY to accumulate things in SQL, but it doesn't lend itself very well to interspersing totals and details, or to expressing totals at different levels. In practice, to do the simple list above, you have to do three separate selects with GROUP BY clauses: one for the sales rep and customer totals, one for the sales rep totals, and one for the grand totals. You need to insert dummy values into the customer and sales rep fields in the higher-level accumulations and then finally perform a union between those three. It's not always an easy task. In this case, it's doable, but the results are far from pretty and the code is not particularly intuitive. Here's the SQL:

with t1 as

   (select salesrep, custnum, sum(quantity), sum(amount)

   from salesord group by salesrep, custnum),                      

t2 as

   (select salesrep, 999999 custnum, sum(quantity), sum(amount)

   from salesord group by salesrep),                                

t3 as

   (select 999 salesrep, 999999 custnum, sum(quantity),      

   sum(amount) from salesord)                                      

select * from t1 union                                          

select * from t2 union                                          

select * from t3 order by salesrep, custnum                      

You'll see three common table expressions (CTEs): those are the sub-selects with identifiers T1, T2, and T3. T1 accumulates quantity and sales amount by salesrep and custnum, T2 accumulates by salesrep using a dummy custnum of 999999, and finally T3 accumulates the grand totals using dummy values of 999 for salesrep and 999999 for custnum. The last part of the statement then ties those three subselects together with a UNION and orders them. The result is functional, if not particularly pretty (for brevity's sake, I'm not going to bother with column headings for the rest of the article):

123       1001         200.00         2,464.00

123       1002         343.00         5,155.00

123       2001       1,500.00           780.00

123     999999       2,043.00         8,399.00

555       1001          50.00         1,000.00

555       2001         150.00         1,805.00

555     999999         200.00         2,805.00

999     999999       2,243.00        11,204.00

You end up with the right numbers, certainly, as long as you recognize the 999 and 999999 values as special values that indicate a total line. But I hope you'd agree that the SQL statement isn't exactly the most intuitive thing you've ever seen. Not to mention the problem you might have if you actually have a salesrep number 999!

Enter GROUPING SETS

And that brings us to the subject of this article, the concept of grouping sets (be careful with this link; the IBM page goes into detail not only on grouping sets but also on the related but somewhat more complex and in my opinion less flexible CUBE and ROLLUP functions). Grouping sets perform for SQL the same function that level breaks and accumulators perform for our older RPG programs, and they do it in a very intuitive fashion. The SQL statement is quite simple:

select salesrep, custnum, sum(quantity), sum(amount) from salesord                                            

group by grouping sets ( (salesrep, custnum), (salesrep), () )

order by salesrep, custnum                              

Notice that the statement requires no CTEs and no dummy values. Summing is handled by the GROUPING SETS clause, which specifies that you want totals at three different levels: salesrep and custnum, salesrep only, and then grand totals, indicated by the empty set "()". Here's what you get:

123     1001       200.00         2,464.00

123     1002       343.00         5,155.00

123     2001     1,500.00           780.00

123        -     2,043.00         8,399.00

555     1001        50.00         1,000.00

555     2001       150.00         1,805.00

555        -       200.00         2,805.00

  -        -     2,243.00        11,204.00

Nice! The dashes indicate null values; you can check for the nulls in your program or you can cast them to a special value using IFNULL in your select statement. Either way, the SQL code is very simple. More important to me is the fact that it's really easy to change the grouping. Here's how I run the same query, grouping by salesrep and item instead of salesrep and custnum:

select custnum, item, sum(quantity), sum(amount) from salesord      

group by grouping sets ( (custnum, item), (custnum), () )

order by custnum, item

1001 ABC123                 200.00         2,464.00

1001 DEF333                  50.00         1,000.00

1001 -                      250.00         3,464.00

1002 ABC123                 220.00         2,695.00

1002 DEF333                 123.00         2,460.00

1002 -                      343.00         5,155.00

2001 ABC123                 150.00         1,805.00

2001 GHI987               1,500.00           780.00

2001 -                    1,650.00         2,585.00

   - -                    2,243.00        11,204.00

Boom! The report is reordered. And you know the totals are correct because the grand totals match the grand totals from the other report. I don't even want to go through the changes required for the other report. They're not horrible by any means, but in my opinion they're not nearly as intuitive as the simple changes required here. So, the next time you need to do some accumulation in your SQL, please take a look at the GRUOPING SETS clause and see how it can help you. I've got more OLAP functions in the works, so stay tuned!

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$