Database Performance by the Numbers Feedback

System Administration
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

My article “Database Performance by the Numbers” (in the January 2000 issue of MC) generated a lot of feedback. As I had hoped, many readers downloaded the code and ran the tests themselves and were kind enough to send me their results. This article addresses some of the criticisms raised about the benchmark and also gives a fresh perspective on the problems with AS/400 date and time stamp data types.

Your Test Table Is Too Small

I’ll admit that the WEBTEMP2 table used in the test did not have a lot of records, but that does not matter. I did the tests with 6,000 records, 12,000 records, 50,000 records, and 100,000 records; the records-per-second measurements were the same no matter how the table scaled. I was looking at the overhead of the programming environment and access method. Yes, the object was placed into memory, so I was looking at the overhead of the programming language mechanism and access method, not how fast the AS/400 read data from a disk drive. I did not care about disk I/O; I wanted to know how expensive the language and access method was.

RPG Can Do Calculated Fields Using OPNQRYF

Yes, that is true. However, Open Query File (OPNQRYF) does not support some of the calculations and scalar functions available in SQL. OPNQRYF is also a dead end: IBM will not allow access to views using new SQL features with OPNQRYF beyond what is available in V4R5. Also, in the test looking at the overhead of the access mechanism and programming language, calculated fields are going to be overhead of the database manger, not the programming language and access method. I would, however, like to test OPNQRYF against SQL in the future and welcome ideas from readers on the types of queries and functions they would like to see tested.

You Should Have Used an Access Path in Your Query

OK. But this was not a test of how fast the AS/400 can read data using an access path. This was a test of the overhead of a programming language and the access method used.


You Didn’t Read the REQTS Field in the RPG Program

One reader pointed out that the RPG SQL program was not reading the REQTS field into his result sets like all of the other programs in the benchmark. Yikes, this is true! I will be the first to admit that I am not an RPG programmer, but I should have caught that one. What’s funny is that several people at MC, IBM Rochester, and IBM Toronto looked at the code for this article, and no one caught that difference.

Both RPG programs were tasked to read the fields from the WEBTEMP2 file and load them into an array. Both programs were loading all fields except the REQTS time stamp field, so their numbers were not comparable to those of the other programs. So I modified the programs to retrieve the results: RPG using SQL, 24,377 records per second; RPG using record-level access (RLA), 13,244 records per second.

I found a lot of interesting things when I changed the program to grab the time stamp data into the array. The records-per-second times did fall, but SQL was still almost twice as fast as RLA. So I did additional tests in which I converted the time stamp field into a character representation of a time stamp. Guess what? SQL and RLA ran at almost the same speed. So there seems to be a problem with RLA reading time stamp fields, since it’s half the speed of SQL when doing so.

To explore these performance differences, I made several copies of the WEBTEMP2 table. In one copy, I changed the data type of the REQTS field from time stamp to char(27). In another copy, I changed the REQTS field from a time stamp to a date. Finally, I changed the REQTS field from a date into a char(10) column to hold the date value. I then made multiple copies of my SQL and RLA programs and modified them to read the various table copies, to test the performance of converting date and time stamp values. Figure 1 shows the results.

What’s interesting is that RLA is faster than SQL when the data is not a date or time stamp, and SQL is twice as fast as RLA when the field is a date or time stamp. Also, note that there is significant overhead associated with reading the time stamp and date columns in SQL. An SQL program reading a time stamp as a char(26) can read 50,000 records in 652 milliseconds. The same program reading the data as a time stamp field requires 1,910 milliseconds to do the same operation. It takes 2.9 times as long to read a time stamp as a character representation of a time stamp.

Now look at the record-level access times. Reading a char(26) representation of the time stamp, RLA reads 50,000 records in 487 milliseconds. It takes 7.5 times longer (using RLA) to read the same number of records if the data is stored in a time stamp column. This is “patheticsad,” which means it is both pathetic and sad at the same time.

Why does it take so much longer to read the time stamp and date columns? It has to be something in the way the AS/400 is converting the date and time stamp from the stored representation to the program representation of the data type. You see, the time stamp and date columns store the date portion as a 4-byte integer, called a Scaliger number. This type of date reckoning, called Julian Period, was invented by Joseph Justus Scaliger (1540-
1609). Scaliger named his dating scheme after his father, Julius, who was an Italian physician and scholar. Scaliger’s date system has day number one beginning at noon on January 1, 4713 B.C. This date represents when three major cycles began on the same day: the 28-year solar cycle, the 19-year lunar cycle, and the 15-year Roman tax calendar (called the indiction cycle).

Since the AS/400 uses the Scaliger dating system to store dates in the database, to format a Scaliger date into a string representation of the date requires several division operations. How well you write this conversion can significantly change the speed at which you can manipulate date and time stamp columns. Also affecting this is how well your machine can do integer arithmetic, since the AS/400 most likely stores this data as a 4-byte integer.

Figure 2 shows some simple programs I wrote to test the performance of packed, zoned, binary integer, and floating-point division on my AS/400 in RPG. Notice that there


is a huge difference between the division of floating-point numbers and decimal and binary integer numbers. To further complicate this, I did the test in C using a long, which is a 32- bit integer, and achieved a time of 636 milliseconds on 1,000,000 calculations. Then I tested casting the 32-bit integer into a double-precision number before performing the calculation, and the processor did the calculation in only 524 milliseconds. This points to two things. First, the RPG compiler is generating very slow integer division code. Second, even with the overhead of casting to floating point in the C program, the processor is very efficient for integer operations. With most chips, you would see a small performance penalty, not a gain, for the conversion I did in the C program. This points to the excellent architecture of the PowerPC chip and how well the C compiler exploits it. It also points that IBM has a lot of work to do on the RPG compiler and the way it handles division operations. This test might also be indicative of the problem in converting the time stamp data type in the RLA programs.

The only conclusion I can draw at this point is that I am not working with a processor limitation on the time required to format time stamp and date fields (as evidenced by the fine times turned in by the C programs and Java programs). I am working with non- optimized code generated by the RPG compiler. This needs to be addressed immediately by IBM. The PowerPC chip is an awesome chip; let’s get some of that performance into our RPG programs.

Another observation you might make from this information is that, even if you are not referencing a date or time stamp field in your code, if you access a file that contains these data types in its field collection, you still have to convert the time stamp and date data types once. This is a byproduct of reading the record using the RLA functions. RLA always buffers the entire record when reading, so conversion must be done at least once. You should consider SQL if you need to access large recordsets of information in a table that contains these data types, as SQL is faster at converting the time stamp and date columns; also, if your program is not accessing these columns in its read, you will avoid the overhead entirely.

Keep Those Letters Coming!

Some have complained that this is not a true benchmark, which is most definitely the case. I’m not IBM Rochester, and the folks at Rochester aren’t sending me any money to design the be-all end-all of performance tests. This is a simple set of programs I cooked up to test one aspect of the AS/400 and its operations: the overhead of the programming language and access method. I will be the first to admit that I am not an RPG programmer, nor a Java expert. I am a simple client/server C/VB/SQL/ODBC guy who wants to find the best way to do operations on the AS/400, because this is the question that I am asked by my customers and readers. This is the second in a series of articles on performance in which
I’m going to look for that information, and I’m going to put the code I use on the Web so that you can test it yourself and send in your tips, tricks, kudos, and admonitions. I don’t claim to have all of the answers, but I know that, if we continue to design simple tests and write them in multiple languages and put the code out to the midrange community, we can all find the answers—and have a lot of fun in the process.

REFERENCES AND RELATED MATERIALS

• The Julian Period information page: http://booksrv2.raleigh.ibm.com/cgi-bin/bookmgr/ bookmgr.exe/NOFRAMES/SCIENCE4.25.3


Method Data Type Seconds

RPG RLA time stamp 3.658 RPG RLA char(26) .487 RPG SQL time stamp 1.910 RPG SQL char(26) .652 RPG RLA date 1.777 RPG RLA char(10) .475 RPG SQL date .881 RPG SQL char(10) .648

Figure 1: These are the results of RPG RLA and SQL versus various data types, reading 50,000 records.

Type Compiler Seconds Ops/Second

Zoned RPG DIV 5.912 169,147 Packed RPG DIV 5.038 198,491 Integer RPG DIV 18.056 5,383 Float RPG DIV .406 2,463,052 Zoned RPG EVAL 37.502 26,665 Packed RPG EVAL 36.827 27,153 Integer RPG EVAL 46.453 21,527 Float RPG EVAL .405 2,469,133 Float C .136 7,352,934 Int C .636 1,572,352 Packed C .867 1,153,401 Float Java .374 2,673,794 Int Java .674 1,483,678

Figure 2: These are the results of RPG, C, and Java doing division with various data types 1,000,000 times.


BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$