SQL 101: DML Recap—Joining Tables, Part 1

SQL
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

They say no man is an island. We’re social creatures and (most of us) need to live in a community. In a properly structured database, no table is an island either, so we need to know how to join them in our queries.

Listing the contents of a table, even with comparisons (or search conditions) that constrain the output, as shown in the previous TechTip, is a bit limited in a real-life situation. Typically, our queries are built using information from multiple tables. I’ll go through almost all the ways you can join tables in SQL; I’ll leave a special type of join, called EXCEPTION JOIN, for later. Let’s start with the simplest of them all: the INNER JOIN.

The JOIN You’ve Been Using Without Realizing: The INNER JOIN

The INNER JOIN is arguably the most-used type of join. Actually, you’ve probably been using it without realizing it, because it can be hidden in the WHERE clause, in what is commonly called an implicit join.

For instance, let’s say you want to list all the students who have enrolled at least one class. This is a complete intersection between the Students and Classes tables, a classic INNER JOIN (even though it could be simply a SELECT over the Classes table because it contains the student name, but let’s ignore that for the moment, because it will be useful to illustrate another type of join). We know, from two articles ago, that the link between the PFSTM (Students table) and the PFCLM (Classes table) is the student name. It’s true that it’s not a brilliant solution, but we’ll have to live with it…for now. An SQL statement that lists all the students that have, at any given point, enrolled in a class is often written using an implicit join, like this:

SELECT      STNM

            , CLNM

            , CLCN

   FROM     UMADB_CHP2.PFSTM ST, UMADB_CHP2.PFCLM CL

WHERE      CL.CLSN = ST.STNM

;

However, if there are more than two tables in the SELECT statement, things might get a little hazy. That’s why I think it’s much clearer to write the same statement using an INNER JOIN:

SELECT      STNM

            , CLNM

            , CLCN

   FROM     UMADB_CHP2.PFSTM ST

   INNER JOIN     UMADB_CHP2.PFCLM CL ON CL.CLSN = ST.STNM

;

If you’re familiar with the INNER JOIN syntax, there’s really nothing new for you here, except perhaps the use of aliases for the tables (ST and CL), which make the statement slightly more readable. However, if you’re used to the implicit join instead, there are a couple of things worth mentioning. First, notice how the line following the FROM clause starts with the join type: in this case it’s an INNER JOIN, but there are others, as you’ll see in a moment. The type of join identifier is then followed by the table name (and optionally, an alias), which in turn is followed by the ON keyword. This keyword is used to specify how the connection between the tables is supposed to work. In this case, the link between the Students and Classes tables is achieved via the student name, but it could be a more complex condition, resorting to multiple columns.

Let me close this article with a more complete example that uses multiple INNER JOINs to link all the tables of our sample database. The objective is to obtain a bit more information about a student’s grades and the classes he or she attended. Here’s the statement:

SELECT      STNM AS STUDENT_NAME

            , CONM AS COURSE_NAME

            , CODE AS COURSE_DIRECTOR

            , TETR AS TEACHER_RANK

            , CLNM AS CLASS_NAME

            , CLCY AS CLASS_YEAR

            , GRGR AS GRADE

   FROM     UMADB_CHP2.PFSTM ST

   INNER JOIN UMADB_CHP2.PFCLM CL ON      ST.STNM = CL.CLSN

   INNER JOIN UMADB_CHP2.PFGRM GR ON      GR.GRCN = CL.CLNM

                                    AND GR.GRCY= CL.CLCY

                                    AND GR.GRSN= ST.STNM

   INNER JOIN UMADB_CHP2.PFCOM CO ON      CO.CONM = CL.CLCN

   INNER JOIN UMADB_CHP2.PFTEM TE ON      TE.TENM = CO.CODE

   WHERE    GRSN = 'Anthony, Mark'

;

Even though the statement is a bit longer than the previous examples, it’s a simple SELECT. The difference is that it uses a lot more tables, which can make it confusing really quickly. I’d like to emphasize the importance of indentation to improve the statement’s readability and the use of user-friendly column names. Also note that I’ve used aliases for all the tables. At this time, the aliases are not critically important, because the column names all identify the name of the table to which they belong. In a future article, I’ll show you how to “get the best of both worlds” by providing long, human-readable names and keeping the short “RPG-standard” cryptic names that currently exist. Then you’ll see that making a habit of using aliases for the tables and using them when referring to columns in your statements is of paramount importance to the readability and maintainability of your code.

That’s all for now. I hope this gave you something to think about, especially how to apply these tricks to your own code!

 

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$