Performance Implications of Date Storage Solutions

System Administration
Typography
  • Smaller Small Medium Big Bigger
  • Default Helvetica Segoe Georgia Times

Beginning with V2R1M1 of OS/400, date data types can be defined with DDS. Date, time and timestamp data types are included in this support. In this article, we'll limit our discussion to the date data type which does not include a time element. Dates can be defined in DDS by specifying a data type of L.

The date data type facilitates date operations for functions such as SQL/400, Query/400, Query Management/400 and the Open Query File (OPNQRYF) command. But, they will not necessarily help you in RPG. In fact, the database date type may hurt you from a performance viewpoint. We'll look at some test results which should allow you to make an intelligent decision.

Date Format Support

The DDS DATFMT keyword allows you to specify the format for date fields in several different ways, including a format which includes a four-digit year. The types that are allowed conform to the SQL standard. They are a subset of the FROMFMT and TOFMT values you can specify on the Convert Date (CVTDAT) command. See 1 for the possible formats.

The DDS DATFMT keyword allows you to specify the format for date fields in several different ways, including a format which includes a four-digit year. The types that are allowed conform to the SQL standard. They are a subset of the FROMFMT and TOFMT values you can specify on the Convert Date (CVTDAT) command. See Figure 1 for the possible formats.

As the table illustrates, DDS, RPG and CVTDAT provide different levels of support. Here are some examples.

o If you use the DATFMT keyword in DDS or reference a date field in RPG, the dates always appear with separator values. The CVTDAT command provides the TOSEP parameter which allows you to specify which separator character, if any, you want to use. The query tools also provide attribute settings for date separators.

o If you use the TIME operation code with a 12-digit field in RPG, the date is returned as six-characters in job format.

o If you use the UDATE special name (or the *DATE special value), the format is determined by the job format.

o There is a DDS DATE keyword for display and printer files. It provides a six- character date in the job format.

o At this point, the system determines which century it is by the little algorithm:

 Years 40-99 20th century Years 00-39 21st century 

o The basic system support does not work with four-digit years. CVTDAT only lets you convert dates between 1940 and 2039.

All fields with a data type of L are stored internally in the database in a four-byte format that represents an offset from some arbitrary date. Keeping all date fields in the same internal format expedites comparisons of different date fields or the calculation of days between two dates.

When the system reads a record that contains a date field, the licensed internal code converts the four-byte field into the format you have specified. The length of the field you see in the database record is six, eight or ten bytes, depending on which date format you specify. For example, if you use the Display Physical File Member (DSPPFM) command, date data type values will be displayed as six, eight or ten bytes of character data, not the four-byte internal representation you might expect to see.

From a performance point of view, it's important to remember that this conversion occurs regardless of whether you actually use the field. The conversion occurs even before your program indicates that you are or are not interested in the field. RPG will not move the data from the record to a field unless you are using the field in your program, but the conversion has already taken place.

The Performance Results

I ran a test to show how date fields impact performance. I conducted the performance tests on a model D02-one of the slower AS/400 models. What is important is not the actual times, but the relative difference between performance times generated by the various approaches. The performance overhead is insignificant unless a substantial number of records is processed. This test reads a file that has 50,000 records of 200 bytes each. Three different files are used. The first file does not have any date fields (data type L). The second file has one date field, and the third file has two date fields. None of the date fields were used in the test programs. The results of the test are shown in 2.

I ran a test to show how date fields impact performance. I conducted the performance tests on a model D02-one of the slower AS/400 models. What is important is not the actual times, but the relative difference between performance times generated by the various approaches. The performance overhead is insignificant unless a substantial number of records is processed. This test reads a file that has 50,000 records of 200 bytes each. Three different files are used. The first file does not have any date fields (data type L). The second file has one date field, and the third file has two date fields. None of the date fields were used in the test programs. The results of the test are shown in Figure 2.

Using the date data type produces significant overhead if you process a large number of records. The total job time increases by approximately the additional CPU overhead. This overhead occurs whether you use the fields in your program or not. Roughly speaking, the conversion takes one millisecond per field on a D02 (one thousand conversions per second). This is roughly equivalent to the time it takes to read another record in the file.

You should carefully consider the use of the date data type unless:

o you need the function in one of the previously mentioned query tools.

o you only process a small number of records.

o you process records through a logical definition that does not contain the

date field.

Format Considerations

One thing the DDS date support keyword does accomplish is to get the century information into your database. This information will become more important for most applications in just a few years. However, you should also explore other alternatives for handling century information.

My favorite date solution is the CYYMMDD format in a seven-digit, packed field. This uses a minimum amount of space (four bytes) and makes it simple to compare two dates. You can still use the RPG date edit code (Y) on a seven-digit date. You would see values like 95/01/31 for January 31, 1995, or 101/01/31 for January 31, 2001.

The CYYMMDD format takes up a minimum amount of room when you print a column of dates and makes it easy to see which dates are more recent. It also makes sense for query tool users who just want to compare dates. Yet, the CYYMMDD format would require the user to enter a specific date (e.g., he cannot make a request for all records less than 30 days old).

Assume your file has dates in CYYMMDD packed format and you want to process all the records that are older than 90 days from today. For the best performance, you want to make a simple comparison for every record. This means that you want to convert the date that is 90 days from today to the CYYMMDD format and use the new value to compare against every record.

With the new ILE RPG support, you'll be able to subtract 90 days prior to the current date and then convert the date to a packed field in the CYYMMDD format. Based on preliminary information about the ILE RPG compiler, it does not appear to support the CYYMMDD format; but you can write a bit of code to convert a four-digit year to a one-digit century. The important thing is that you should perform this conversion once-not for every record.

Another alternative is to use the Add Date (ADDDAT) command in QUSRTOOL to front-end your HLL program with an OPNQRYF command. This procedure enables you to select just those records that are equal to or less than 90 days old. Assume your date field name for the CYYMMDD format is CYMD. 3 contains the code to set up the selection.

Another alternative is to use the Add Date (ADDDAT) command in QUSRTOOL to front-end your HLL program with an OPNQRYF command. This procedure enables you to select just those records that are equal to or less than 90 days old. Assume your date field name for the CYYMMDD format is CYMD. Figure 3 contains the code to set up the selection.

The ADDDAT command returns a 10-character field for any of the new date formats (including CYYMMDD). Because the date field in the database is packed, you don't need the double quote characters surrounding the &DATE10 field in the QRYSLT parameter. The return value from ADDDAT is already in character format and therefore ready to use. You can't specify a decimal variable in the QRYSLT statement. If you had passed a decimal variable to your program, you would have to convert it to a character variable to concatenate into the QRYSLT statement. Your program only sees the records that are equal to or less than 90 days from the current date.

Suppose you are trying to delete all of the records that are older than some number of days in the past. The OPNQRYF solution is not only easier to code, but it provides much better performance than reading every record in your HLL program.

You probably have users who like the MMDDYY format. I recommend that you try to convince them to change now in preparation for the 21st century. If they won't buy that, you will have to convert before printing or displaying. A simple data structure or MOVE and MOVEL solution that rearranges the subfields of a date is much faster than storing dates using the new date data type. What's more, you only have to convert values being displayed or printed.

The CVTDAT command supports the CYYMMDD format, but the DATFMT keyword does not. It's unlikely that DATFMT will ever support the CYYMMDD because it is not a standard SQL data format. If you use CYYMMDD, you commit yourself to a strategy that does not use system support.

The other weakness of the CYYMMDD format solution is that it is only good until the year 2899. Check with me then and I'll try to think of an alternative.

Jim Sloan is president of Jim Sloan, Inc., a consulting company. Now a retired IBMer, Sloan was a software planner on the S/38 when it began as a piece of paper. He also worked on the planning and early releases of AS/400. In addition, Jim wrote the TAA tools that exist in QUSRTOOL. He has been a speaker at COMMON and the AS/400 Technical Conferences for many years.


Performance Implications of Date Storage Solutions

Figure 1 Support for Date Formats

 The following chart shows what is supported by CVTDAT, the DATFMT keyword in DDS, and the new ILE RPG date formats (based on preliminary information). A date of January 31, 1994 is used to show an example value for each of the formats. Format Example CVTDAT DDS DATFMT ILE RPG *MDY 01/31/94 Yes Yes Yes *DMY 31/01/94 Yes Yes Yes *YMD 94/01/31 Yes Yes Yes *JUL 94031 Yes Yes Yes *ISO 1994-01-31 Yes Yes Yes *USA 01/31/1994 Yes Yes Yes *EUR 31.01.1994 Yes Yes Yes *JIS 1994-01-31 Yes Yes Yes *CYMD 0940131 Yes No No The system and job dates only allow *MDY, *DMY, *YMD, or *JUL formats. 
Performance Implications of Date Storage Solutions

Figure 2 DATFMT Test Results (50,000 Records)

 Test Description CPU Total Job Seconds Seconds 1 File with no date fields 52.5 69 2 File with one date field 104.7 120 3 File with two date fields 148.6 164 
Performance Implications of Date Storage Solutions

Figure 3 CL Program to Select Records 90 Days Older Than a

 PGM DCL VAR(&DATE10) TYPE(*CHAR) LEN(10) ADDDAT DAYS(-90) DATE(*TODAY) TOVARFMT(*CYMD) TOVAR2(&DATE10) OVRDBF FILE(file_name) SHARE(*YES) OPNQRYF FILE(file_name) QRYSLT('CYMD *LE ' *CAT &DATE10) CALL PGM(pgm_name) CLOF OPNID(file_name) DLTOVR FILE(file_name) ENDPGM 
BLOG COMMENTS POWERED BY DISQUS

LATEST COMMENTS

Support MC Press Online

$0.00 Raised:
$