AS/400 Basics: Database Files

IBM i (OS/400, i5/OS)
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

Database Files

Next to CL programming, one of the greatest traumas for the new AS/400 programmer is learning about database files. RPG II programmers from the System/36 (and earlier machines) suffer most. Ironically, a PC programmer familiar with products such as dBase will feel right at home on the AS/400, particularly with the advent of C/400 and the BASIC compiler (which, by the way, is not too shabby).

It seems surprising that the greatest potential pool of programmers, those from the S/36, should have the most problems. If you read between the lines a bit, you will realize that a job advertisement for a programmer specifying that he or she "must have 1-2 years RPG/400" really means that they want a programmer who knows about database files. The other differences between RPG II and RPG/400 can be understood in a matter of days, but this business of creating and using a database requires some seasoning.

What are Database Files?

At this point, calling files "database files" is getting tiresome. I am reminded of RFPs (Requests for Proposals) and other vendor qualification forms, on which the question, "Does this system use database files?" is sure to appear. Will somebody please supply some definitions here?

Of course, we can get more specifically imprecise, and ask "Does this system use a relational database?" Wise vendors will always say yes, knowing that the functionality that comprises a relational database can always be implemented in programming, if it is not provided by the operating system.

Now the AS/400 is described as a "relational database platform," with some hemming and hawing about not implementing all of Codd's Rules. I believe there were a dozen or so rules, but have recently read that good Dr. Codd has decided that there must be more rules. If you invent a language, then you can criticize everybody else when they do not use it correctly.

The point of this discussion is that you will encounter these controversies, especially about the AS/400. What you probably will not encounter is simple advice about how to proceed. We will try to supply that in this article, avoiding the "how it should be" debates. The facilities provided with the AS/400 database, such as they are, are sufficient to provide an enormous boost in productivity.

We will not specifically review how to convert your current files to database files, but rather talk about the background that you need to know to proceed.

Files, the Old Way

In an RPG II program, you can reference a disk file on the F-, E-, I-, C- and O-specs. The F spec tells the name of the file, the record length, and indicates whether there are keys used. Table files can be defined on the E-spec. The I- and O-specs, for most programs, are where the "definition" of the file is located. The I-specs are the most complete definition, since each field is defined in terms of from/through positions, in addition to indicating the number of decimal positions. The O-specs are not really useful for definition, since they provide the least information.

Other than primary and secondary files, the "action" associated with a file happens in the C- and O-specs. For example, READ, CHAIN, and on O-specs, add, update and delete.

All in all, this is not the best situation. It seems that using a file in RPG II can be quite involved, and can be done anywhere in a program.

An external file can reduce the number of references in a program to two specification types: the F-spec and the C-spec.

External Files

An external file is defined "externally" to the program. By this, we mean that the description of the file, including its record length, key lengths and positions, and field definitions, are not given in a program, but rather, are kept "out there." Where out there?

It turns out that on the AS/400, the definition of the file is kept within the file itself. How this definition is kept is not readily apparent to us, although there are some system commands that let us retrieve the definitions. Our main responsibility is to define and create the file. Defining a file is done in terms of source statements that are kept in a member in a source file. Creating a file is done with system commands, similar to the commands used on the S/36.

To define a file, you use a spec called the "A-spec". I don't know why they chose the letter "A", but that is not important. On the A-spec, you define a record format name and field names and descriptions. You can define the various attributes of fields, such as lengths, decimal positions, numeric type (zoned, packed, binary or float). You can also give a brief text description of each field. Following the field definitions, you can list the keys if there are any.

What is a "Record Format?"

To understand what a record format is, we must look first at some other issues. The primary concern is how you store different but related data on your system. An example of this situation might be a sales order entry application, where you have an order header record, one or more order detail records, and possibly one or more text records describing the order. Usually each of these records will be related to the other, using the order number as a key, probably followed by some record identification field and sequence number. Many, many S/36 systems were programmed so that these three record types were contained within the same file.

The AS/400 imposes a rather harsh rule about this: each physical file can have only one type of record format. Now that rule also contains a way out: notice that the rule says each "physical" file. A physical file is where data is actually stored, and when you create any type of system, you must create your physical files first. And even though you can have only one record format in a physical file, you must supply a name for that record format. RPG is even more picky, and insists that the record format name be different from the file name. There are ways to override this, but it is really simpler to simply supply a different name when you define the file in your A-specs.

Having taken apart your file, creating several files where there was once one, the AS/400 lets you combine those files again with a "logical file." This is where the record format is useful. When you combine two or more different physical files, you need some way of knowing which record type you have accessed. For example, if you are reading the orders logical file sequentially by key, you need to know if you have just read a header, detail or text record. Because the system supplies the name of the record format to your program, and because you can associate a record identifying indicator with each record format, you can duplicate exactly the function of the RPG II program, with the added bonus(!) that you no longer need to define a field for a "record identifier."

Defining Keys

I mentioned earlier that you can define keys when you create your file definition. Defining keys on the AS/400 is a good deal simpler and much more versatile than the S/36 method. To review, you define a S/36 key in terms of from/through positions, and with alternate index files, you can specify up to three key fields (not "fields" in the AS/400 sense, but rather combinations of from/through positions). This is somewhat limiting, especially if the original file design was not carefully thought through; you can easily exceed the three-key limit. This happened to me recently when I stored a name as first, then middle, then last. It turned out that I needed two other keys first, then a key by name. Obviously, my first definition was faulty, since I couldn't get what I wanted by defining the third key over the first/middle/last names. Fortunately, I was not far into the project, so I was able to redefine the file with the name in last/first/middle order, and create the needed third key.

This type of situation is not a problem with the AS/400, since you specify keys in terms of the fields that are in a record format, rather than in terms of where in the format that field appears. You can define up to 120 bytes total for a key, which, strictly speaking, is not 120 bytes of fields, since you need to allow some bytes to glue things together. In any case, I have never approached that limit. To define a key, you simply list the fields that are in the key, in order, with the "K" indication to denote the key fields.

One of the issues that always comes up with key fields is the following: should you define a key within a physical file, or should all keys be in logical files? There are arguments favoring either approach, and I tend to use each about half the time. I like the idea of keeping physical files "keyless," although I usually slip and put the "primary key" on the physical file. For example, for the customer master, I use the customer number as the key. The idea of keeping all data in one place and all keys in another appeals to me, and I will probably be strict about that in the future.

Logical Files, Again

Defining logical files is where it starts to get interesting, and where you start to see how the AS/400 differs from the S/36. At their simplest, logical files can duplicate exactly the functionality of alternate index files. You get more benefit from logical files because you can define more than three noncontiguous fields as keys, because you can specify ascending and descending ordering by each field within the key, and because you can select and omit records.

Selecting and omitting is not a function of the "key," but it does affect what your program gets when it accesses the file. As far as your program is concerned, if a record is not selected, then it is not in the file. (Except, of course, if you try to add a record with a duplicate key. Watch out for this one since it is a difficult bug: your program chains to the file, tells you the record is not there, then complains if you try to add it. I cannot tell you how many times we encountered this "compiler error" until we learned to check the files we were using.)

Back to RPG (and Other Languages)

So now you have defined physical and possibly logical files, and entered the specifications into source files. You create those files with CRTPF (Create Physical File) and CRTLF (Create Logical File) commands. Other articles in Midrange Computing have dealt extensively with the mechanics of these commands, so I will not go into that here. You need to have successfully run the CRTPF and CRTLF commands before compiling programs that reference those files.

This is another, possibly confusing difference between RPG II and RPG/400. With RPG II, it did not matter if the file actually existed on the system when you compiled the program, since all of the knowledge the program needed about a file was contained within the program. With an externally defined file, there is no "knowledge" of the file until the file is created. I suppose you could argue that the compiler should be able to use the A-spec definitions of a nonexistent file, but that is not how it works. You must be aware of this because you may try to compile programs that use work files. It may be that the work files are created only for the duration of the job that uses them, so they are never permanent residents on the system. Before you compile the program, you must verify that the files referenced within the program exist. There is really no good solution to this, although there are a number of commercially available programming utilities.

Within an RPG program, all you must do to define a file is code an F-spec. Depending upon the usage of the file, the compiler will generate I- and O-specs as needed. Again, the generated definitions are derived from the existing physical or logical file, not from the source specs for the file. You can see pretty much the same information that the compiler sees with the DSPFFD (Display File Field Description) command. That command lists the field names, attributes, and from and through positions. So there is nothing too mysterious going on here; the compiler merely refers to the external (to the program) definition and through simple arithmetic determines the record layout. Records are still of certain lengths, and fields still appear in certain locations, although we no longer must concern ourselves with those mundane details. Which is pretty much as it should be. After all, if we can get the data, why worry about where in the record it is kept?

But I do not mean to trivialize the advantages here. The primary benefit of the external file is that your job as a programmer is much simpler when the record layout changes. For example, one of the worst jobs in programming is to add another field to a record or to change the size of a field. The severity of this problem is related to where in the record the change occurs: if it is between other fields, it is dreadful. If you merely have to add a field, you might be able to squeeze it into the ubiquitous "filler" found in S/36 files. If you are out of filler, then you probably have to expand the record length, and on and on.

With an external file, you can simply make a backup of the current file, redefine and recreate the file, and then use the CPYF (Copy File) command to load the new version from the old. Then you simply recompile all of the programs that reference the file. This process can be explained, understood and programmed within a few minutes. Depending upon how long the recompiles take, you can probably be done within a few hours. This is in comparison to some of the thankless S/36 "projects" that took weeks to complete.

As an aside, if you are still on the S/36 and will be there for some time or intend to stay, you must consider putting your file definitions into /COPY modules. Doing that will give you many of the advantages of externally defined files, and will also be very helpful in the future, no matter what machine you convert to. Past Midrange Computing articles explain this, so look into this if you have not already done so.

So, you have learned that external files, physical and logical, provide equivalents to S/36 files and alternate indexes. As for RPG, external files are somewhat easier to use then /COPY modules, with the slight drawback that you must verify the existence of the file before compiling.

Is That All There Is?

No. There are many other features and functions of externally described files that I will briefly describe here. Hopefully this will get you in the mood to investigate further, and to start thinking of your data in terms of external definitions.

In addition to logical files, one of the more exciting developments on the System/38, upon which the AS/400 is modeled, was the "join" logical file. This literally lets you join data from two or more files into one (virtual) record format. For example, if you read an order header record, it probably contains the customer number. In traditional programming, to get at the customer name and address, you then use the customer number to chain to the customer file. With a join logical file, you define the file such that when you retrieve the order header record, the system also supplies the customer data as part of the "record." This is advantageous since it moves a great deal of programming out of the program, putting the burden instead on the system. Supposedly the system is more attuned at the lower levels to this type of retrieval than at the higher level of a program, so there might be some performance advantage. The main benefit is that the programs can be simpler, but creating join logical files is by no means the easiest thing to do. Chances are you will have to study the examples in the manuals and try quite a few test cases. Since my time limit for trying tricky techniques seems to be about fifteen minutes, I frankly have not created many join logical files. Sometimes I wish I had, but now that I am involved with projects on the S/36 and AS/400, it is more important to me to have source code compatibility, so I still use the old read-one-record, chain-to-the-next method.

One of the most exotic usages of external files is the open query file, which has nothing to do with query, the utility. An open query file can combine data from many different, related files, but unlike a join file, it can also summarize, reformat and perform mathematical operations upon the data. As always, your program receives its input in the form of a record. The advantage of the open query file is that a great deal of programming can again be "pushed out" of the program, and given to the system to handle. This type of technique can be useful in a "division of labor" situation; for example, experienced AS/400 programmers can create the definitions that other, less-experienced AS/400 programers will use.

The big disadvantage of the join logical file and open query file is that the data can only be used as input. That is, you cannot update data or write new records through the join or open query file. Of course, you can write or update the underlying files, but this sort of sours the situation, since for an update you must retrieve the records that were just used. This is an example of a vendor, in this case IBM, providing what we all think is a fantastic feature, only to learn that it is not quite everything we might have thought. I personally do not judge them harshly, however, since the problem of allowing updates and writes is really tremendous. For example, if the join logical file is retrieving data from five underlying files and you update or write to it, such that a duplicate key is created, how do you allow your program to recover? Obviously, IBM is aware of the dissatisfaction this has caused, and perhaps someday they will have a solution. But this is not trivial, and not quite as easy to solve as it may seem at first glance.

So there you have two powerful features that you can use, as long as you understand the limitations. Chances are, you probably will not use these much when you first get started, but if you are fortunate enough to be in a "pure" AS/400 environment, you should keep these in mind. You could save yourself a lot of programming at some point.

Getting Ready

To get ready to use database files on the AS/400, you must have command of your data. Unfortunately, the S/36 allowed very sloppy file definitions. By now you have probably heard of the AS/400's intolerance of sloppiness. You simply must come up with a clean definition and clean file if you want the AS/400 to accept it.

But like your eighth-grade English teacher, the stern demands of the machine are really for your long-term benefit. By being unyielding on certain fundamentals, you are creating a sound foundation to build upon. And if you are in a conversion situation, count yourself lucky. The entire computer world is going to database-type systems, and every modern operating system that you will encounter from this point on has some definition system similar to the AS/400. If nothing else, you are insuring your own programming future by learning how to set up things in this fashion. So if all of this is new to you, take your time to really investigate and learn about these things. Do not rush a conversion because of a false sense of urgency imposed by bosses and budgets and such (ha!). Instead, work on some examples with your data, and determine what you will do to position yourself for the future, with a database machine.

BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$