SQL 101: Introducing the UMADB Database, Part 2

SQL
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

Last time around, I started introducing the UMADB database. Let’s proceed with that discussion now.

I discussed the Students table in the previous article, so now let’s go over the rest of the tables of our fictitious database, starting with the Courses table.

The Courses Table

Let me take a moment to explain the structure of the information in this database. The students take classes, which are taught by teachers and are part of courses. At the end of each semester, the teachers grade the students on each of the classes they attended. This may sound obvious and redundant, but it’s important to keep the structure in mind from this point on. In a way, the Students and Courses tables are the center of the database, because all the other tables are somehow linked to one (or both) of these tables.

Now let’s take a look at the Courses table structure, shown in Table 1.

Table Name

Column Name

Data Type

Length

Dec. Pos.

Description

PFCOM

CONM

Char

60

Course name

PFCOM

CODS

Char

100

Course description

PFCOM

CODN

Char

50

Department name

PFCOM

CODE

Char

60

Course director name

PFCOM

COTA

Char

60

Course teaching assistant name

PFCOM

COSC

Char

1

Status

Table 1: The Courses table structure

Again, the table is pretty typical: the same cryptic names and the lack of a unique record identifier that characterized the Students table. By the way, I imagine that you’re curious about the hidden flaw in the Students table, mentioned earlier. Don’t worry, it’s going to become obvious in a moment, when we look at the Teachers table.

Meanwhile, there’s something common to all the tables in this database: status. As the name implies, it indicates the status of the record. The convention used here is the following:

  • 0: Created but not active record
  • 1: Active record
  • 9: Inactive (deleted) record

This is something that has to be taken into account when querying the database, and it has been the source of many misunderstandings. Sometimes the users forget to include a condition in their queries and end up mixing active and inactive records, which leads to inconsistent or just plain wrong information.

The Teachers Table

The teachers are a very important part of any teaching system. They’re also a very important part of the application our database supports, even though the table that keeps their records is not very “polished.” You’ll see what I mean when we analyze the Teachers table, shown in Table 2.

Table Name

Column Name

Data Type

Length

Dec. Pos.

Description

PFTEM

TENM

Char

60

Teacher name

PFTEM

TETR

Char

20

Teacher rank

PFTEM

TEDB

Decimal

8

 0

Date of birth

PFTEM

TEAD

Char

60

Home address

PFTEM

TEPN

Char

15

Home phone number

PFTEM

TEMN

Char

15

Mobile number

PFTEM

TEEM

Char

60

Email address

PFTEM

TEDL

Char

20

Driver’s license

PFTEM

TESN

Char

11

Social security number

PFTEM

TEST

Char

200

Subjects taught

PFTEM

TESA

Decimal

11

2

Salary

PFTEM

TESC

Char

1

Status

Table 2: The Teachers table structure

This table follows the same line as the previous ones, but it introduces the first sensitive piece of information of the database: the teacher’s salary. As things stand, anyone with access to the table can see how much each teacher earns, which might not be a very good idea. I’ll get back to this later, when I discuss how to hide a column’s data from prying eyes.

Notice the similarities between this and the Students table: the personal information (addresses and IDs) is the same. Even though this makes sense—both teachers and students are people and share the same type of information—it brings up a question: what if a student becomes a teacher or vice versa? There will be duplicate and possibly inconsistent information in the database. I’ll address this issue in a later article, in the discussion about database normalization and how that translates to SQL.

Having said that, let me take a moment to explain the other columns in the table. Besides the obvious teacher name and the aforementioned personal information, this table also includes “teacher rank” (which can be something like Assistant Professor, Professor, and so on) and “subjects taught.” The latter is supposed to link to the Classes table, presented in the next section, but the connection is kept by humans, not the database. Because the same person can teach multiple classes in the same school year, the application’s manager thought it would be simpler to manually track the link between teachers and classes—yet another shortcoming we’ll need to solve later.

It’s now time to move on to the next section and review the Classes table.

The Classes Table

Here’s where things start to get interesting: finally, a table with links to other tables. The Classes table contains information about the students that form a class of a given subject during a given year and the course to which the class belongs. As I said before, the teacher is not part of the setup, at least not at database level. Table 3 shows the complete Classes table structure.

Table Name

Column Name

Data Type

Length

Dec. Pos.

Description

PFCLM

CLNM

Char

60

0

Class name

PFCLM

CLCY

Decimal

4

0

Class year

PFCLM

CLCN

Char

60

0

Course name

PFCLM

CLSN

Char

60

Student name

PFCLM

CLSA

Char

60

Student home address

PFCLM

CLSE

Char

60

Student email address

PFCLM

CLSC

Char

1

Status

Table 3: The Classes table structure

As you can see from this table, the links I mentioned before are based on the names of the student and the course, which might cause some problems. The ideal situation would be having record identifiers in each of the tables and keep those IDs, instead of the respective names, on the Classes table records. The next issue is the duplicate student information. The application manager thinks this duplication makes sense, because the student information might change from school year to school year, and keeping the information here allows the teacher to contact the student using the most current address. We’ll also have to deal with this situation later.

Finally, the last table of the downsized version of the UMADB database is the Grades table. Let’s analyze it in the next section.

The Grades Table

After the end of the semester, the students are graded on their performance in each of the classes they attended. The results are stored in the Grades table, shown in detail in Table 4.

Table Name

Column Name

Data Type

Length

Dec. Pos.

Description

PFGRM

GRSN

Char

60

Student name

PFGRM

GRCN

Char

60

0

Class name

PFGRM

GRCY

Decimal

4

0

Class year

PFGRM

GRGR

Char

2

Grade

Table 4: The Grades table structure

Just like the Classes table, this one also depends on another table’s information to form its unique key. In this case, that key is formed by the student name, class name, and class year. Of these three, two are names stored in character strings. This makes them prone to error (character fields usually make awful keys because of the possible mismatches caused by different character cases—“John” is not the same as “john,” for example) and slower to work with (because it takes longer to process a string of characters than a numeric value). The other problem with this table is the Grade column: there’s no validation on the database to prevent inconsistent values, such as invalid grades. It’s assumed that the letters A, B, C, D, and F will be used, optionally followed by a plus or minus sign, but there’s no actual check for a valid grade at the database level. Just like the student’s date of birth validation, this one also exists at the application level, buried in some RPG program.

Just a Few Tables, and So Many Problems

From what you’ve read so far, you probably concluded this (exaggerated) scenario has some similarities with real-life issues on IBM i databases you’ve seen. Probably not all at the same time, but you know what I mean. It’s true that some of the issues are very basic and easy to solve, while others require some database redesign and ingenuity. I’ll address all these issues and a few more, which are related to the non-implemented functionalities that are currently handled outside the application’s scope, over the next subseries of the SQL 101 TechTip series.

You can skip a few articles, but keep in mind that the database will evolve and each subseries will build upon the foundations laid by its predecessor. If you’re comfortable with the topics discussed in a subseries, you can simply have a quick look at the SQL code samples to keep track of the changes to the database.

And that’s all for now. In the next few articles, I’ll start by reviewing some SQL data manipulation language (DML) statements and sharing a few tricks I’ve learned over the years that can, hopefully, help you get more productive when it comes to manipulating data using SQL. Later, we’ll start to fix this broken database!

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$