When CCSID Constants Vary, Part III


Learn some tips and tricks to correctly handle literal constants.

 

My first article described how a CCSID variation can drive a program crazy. My second article described the solution, which requires correctly handling literal constants. In this last article, I share some tips and tricks I found while solving the bug with the @.

INZSR

The first place I put the call to ConvCCSID was the INZSR subroutine. In many cases, that is not a good idea. I first updated the INZSR of the MAIL program, which ends with *INLR=*ON, and saw no issue. But if I move the code of the PortableChar DS into a service program, or if I end my MAIL program with SETON RT so that it stays dormant in memory until the next call, the INZSR is not called again. And between two calls to the program, the job's CCSID may have changed. In these cases, code the call to ConvCCSID at the beginning of the main procedure rather than in the INZSR.
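The effect of a one-time initialization can be modeled outside RPG. The following is only a language-neutral sketch in Python: `conv_ccsid`, `ResidentProgram`, and its methods are hypothetical names, not the real ConvCCSID interface, and the two byte values for the @ are the ones quoted in this series for CCSIDs 37 and 297.

```python
# Hypothetical stand-in for the article's ConvCCSID call: it returns the byte
# that encodes '@' under a given job CCSID (values quoted in this series).
AT_SIGN_BYTE = {37: 0x7C, 297: 0x44}


def conv_ccsid(job_ccsid: int) -> int:
    return AT_SIGN_BYTE[job_ccsid]


class ResidentProgram:
    """Models a program that stays activated between calls."""

    def __init__(self, job_ccsid: int) -> None:
        # Plays the role of INZSR: runs once, on first activation only.
        self.at_from_init = conv_ccsid(job_ccsid)

    def call_trusting_init(self, job_ccsid: int) -> int:
        # Reuses the value computed at activation, even if the job CCSID changed.
        return self.at_from_init

    def call_converting_each_time(self, job_ccsid: int) -> int:
        # Converts at the start of every call, as the text recommends.
        return conv_ccsid(job_ccsid)


program = ResidentProgram(job_ccsid=37)          # first call, job in CCSID 37
stale = program.call_trusting_init(297)          # job CCSID has changed to 297
fresh = program.call_converting_each_time(297)
print(f"stale=x'{stale:02X}' fresh=x'{fresh:02X}'")
```

The value kept from activation still describes CCSID 37, while the per-call conversion follows the job.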

 

Why do the fields of the DS PortableChar have an INZ? For debugging. If the field PortableChar.Dollar is still equal to '_' instead of '$', then the program did not use ConvCCSID (bad idea).

*CMD and *PNLGRP

This is a similar issue because *CMD and *PNLGRP are two objects that are compiled and that can naturally contain a lot of text sensitive to CCSID variation.


I'm not covering the *CMD and *PNLGRP objects here; it's a subject for another article.


The case study must include…

  • Variations between i5/OS versions
  • Variations between hard-coded text and text in message files
  • Default values for command PARMs

 

No Worries! My Code Is Unicode-Ready!

Unicode is a technology that assigns a fixed hexadecimal value to each possible character. If I had already converted my application to Unicode, would I have the issue?

 

On IBM i, Unicode is just another CCSID. More precisely, there are three Unicode CCSIDs:

  • CCSID(13488) is UCS2
  • CCSID(1200) is UTF16
  • CCSID(1208) is UTF8

 

In UCS2 and UTF16, the @ is coded ux'0040'. In UTF8, the @ is coded x'40'. Since, in Unicode, the code for @ is truly constant, can we assume the problem won't appear? It depends.
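These values are easy to check on any workstation. The short Python sketch below assumes that big-endian UTF-16 is the right stand-in for CCSIDs 13488 and 1200, and uses Python's bundled `cp037` codec as the EBCDIC comparison point.

```python
at = "@"

utf16 = at.encode("utf-16-be")   # stand-in for CCSID 13488 / 1200: two bytes
utf8 = at.encode("utf-8")        # stand-in for CCSID 1208: one byte here
ebcdic37 = at.encode("cp037")    # EBCDIC code page 37, for comparison

print(utf16.hex(), utf8.hex(), ebcdic37.hex())   # prints: 0040 40 7c
```

The Unicode values never move, whereas the EBCDIC value is only valid for one code page.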

 

The Unicode CCSIDs are reserved for fields (see message CPD322C, last line). They cannot be used at the object level: you cannot do a CHGJOB CCSID(1208). Therefore, having converted your application to Unicode does not mean you have solved the problem. It all depends on hard-coded literals. Do you have a hard-coded @? Is it in a graphic field? With the appropriate CCSID? Look at the field's hexadecimal value with the debugger; it will show you the value the compiler has set for the hard-coded field.

 

This reminds me of a collateral dormant disaster:

The Collateral Dormant Disaster

You now know why a *PGM is an object tagged CCSID(*HEX): to prevent the system from translating any characters. You also know how a complete application can go haywire because of just one character. Another kind of object may behave the same way: *FILE objects. Yes, your database.

 

Each time a program gets a record, a sort of copy (from disk to memory) occurs. If the *FILE doesn't have the same CCSID as the job that runs the program, i5/OS automatically applies character conversion to preserve data integrity. Generally, it does that successfully. It is i5/OS after all. Moreover, if it does not succeed, it sends an error message to the program so you can handle the problem. This is true for every CHAR field of all your *FILEs.

 

Here's an example of preserving database integrity. Imagine that my database field, tagged CCSID(37) (American), contains an email address, that my job is tagged CCSID(297) (French), and that my program reads the field. My program receives the same email address. The @ is automatically converted from x'7c' to x'44' because the read process has successfully converted the American @ to a French @. Conversely, when my program rewrites the record, the French @ coming from the program's memory is converted back to an American @, from x'44' to x'7c'. The job works perfectly, and the database integrity is preserved.
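The round trip can be imitated with a small Python sketch. Two things here are assumptions: the address `user@host` is a made-up placeholder, and `CCSID297_PARTIAL` is only a partial model of CCSID 297, built from Python's `cp037` table by swapping the two code points this example mentions (x'44' and x'7c').

```python
# Byte -> character table for code page 37, from Python's bundled cp037 codec.
CP037 = bytes(range(256)).decode("cp037")

# Partial model of CCSID 297 (an assumption for this sketch): identical to 37
# except for the two code points the example talks about.
_table = list(CP037)
_table[0x44], _table[0x7C] = "@", "à"
CCSID297_PARTIAL = "".join(_table)


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode each byte so that it names the same character in dst."""
    return bytes(dst.index(src[b]) for b in data)


stored = "user@host".encode("cp037")                   # field content, CCSID 37
in_job = convert(stored, CP037, CCSID297_PARTIAL)      # read by a CCSID 297 job
rewritten = convert(in_job, CCSID297_PARTIAL, CP037)   # written back to the file

print(f"stored x'{stored[4]:02x}', in job x'{in_job[4]:02x}', "
      f"rewritten x'{rewritten[4]:02x}'")
```

The character stays an @ at every step; only its byte value changes with the CCSID of whoever holds it.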

 

There is a special case in which i5/OS neither translates characters nor sends an error message: when the database file itself is tagged as a binary object. If any of your database files is tagged CCSID(65535), you are in binary mode, and automatic data-integrity translation is forbidden. Your application has been broken for years (since 1994 if I remember correctly, the year OS/400 introduced the CCSID notion in V3R0-V3R1), but the issue is hidden because the application has behaved as expected all that time. The day the problem appears, be prepared to explain to the CIO that you need months to solve the issue with these dirty characters...but not too many months if you want to keep your job.

 

A database integrity issue appears only when you use more than one CCSID. Client Access File Transfer, a PC application using CCSID(1252), is potentially the first one that makes you use more than one CCSID. Do you use file transfer? With Client Access, FTP, any PC application, or any Web application? You potentially have a CCSID issue to solve.

 

Preserving database integrity is a difficult problem. The solution requires a lot of preparation and a lot of qualification. If you have a *FILE with CCSID(*HEX) and you want to restore the automatic database integrity mechanisms, be very careful; you can do this only once. If you get the conversion wrong, your data, which was previously just dirty, becomes unmanageable. I'll stop here. This subject, by itself, requires a long article.

 

To enable you to see the problem, use the code below. With this code, you will build a very short file containing the 192 printable characters of the SBCS encoding (from X'40' to X'FF'). I encourage you to play with its CCSID and with file transfer tools. The goal is to obtain in a PC file—for example, under MS Excel—the same characters as a RUNQRY or a DSPPFM on your green-screen.

 

Note: Beware of hidden double-opposite errors.

 

That is to say, when you run RUNQRY, DSPPFM, or any 5250 application to display data, two settings can each switch off a conversion. If your job has CCSID(65535), automatic integrity mechanisms are disabled between the *FILE PF and the job. And if your job CHRIDCTL is *DEVD, automatic integrity mechanisms are disabled between the *FILE DSPF and the job. This is the most common double-opposite error: the integrity errors between file and job can be hidden by the error between the job and the screen.

Consider my previous example, in which my file contains an email address. Say my file is tagged CCSID(65535). Then my job receives exactly what is in the database; the @ in the program's memory is x'7c'. This value is correct in only one case: when x'7c' is the right code for the @ in the job CCSID. Let me continue. Say my job is tagged CCSID(297), but my job CHRIDCTL is *DEVD. Then the program sends to the screen exactly what is in memory. And my emulator is customized with CCSID(37). What I see is a perfectly valid email address. I think my job (and my program) works perfectly. That is wrong.
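The two errors cancelling each other out can be reproduced in a few lines of Python. This is a sketch under the same kind of assumptions as the scenario itself: `user@host` is a placeholder address, and `CCSID297_PARTIAL` is a partial model of CCSID 297 that differs from Python's `cp037` table only at x'44' and x'7c'.

```python
CP037 = bytes(range(256)).decode("cp037")

_table = list(CP037)
_table[0x44], _table[0x7C] = "@", "à"   # partial CCSID 297 model (assumption)
CCSID297_PARTIAL = "".join(_table)


def show(data: bytes, table: str) -> str:
    """Interpret raw bytes with the given byte -> character table."""
    return "".join(table[b] for b in data)


stored = "user@host".encode("cp037")    # what is physically in the file

# Error 1: the file is tagged 65535, so the CCSID 297 job gets the raw bytes.
in_job = stored
# Error 2: CHRIDCTL is *DEVD, so the CCSID 37 emulator gets the raw bytes too.
on_screen = show(in_job, CP037)
# What the CCSID 297 job itself believes the field contains.
as_seen_by_job = show(in_job, CCSID297_PARTIAL)

print(on_screen)        # looks fine on the screen
print(as_seen_by_job)   # but the program is working with a different character
```

The screen shows a valid address, while any logic in the job that searches for an @ would not find one.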

 

All the system DSPFs are already set to CHRID(*CHRIDCTL). Check yours if necessary. To begin properly, manually set up your job CCSID to match the CCSID of your 5250 emulator and set CHRIDCTL to *JOBCCSID.

Here's the DDS source for building the EBCDICP file:

     A          R EBCDICF
     A            EBCDIC        16A         COLHDG('EBCDIC')

 

 

This is the SQL code for loading data in the file (to be pasted in iNavigator Run SQL Script, for example):

 

INSERT INTO EBCDICP VALUES(x'404142434445464748494A4B4C4D4E4F');
INSERT INTO EBCDICP VALUES(x'505152535455565758595A5B5C5D5E5F');
INSERT INTO EBCDICP VALUES(x'606162636465666768696A6B6C6D6E6F');
INSERT INTO EBCDICP VALUES(x'707172737475767778797A7B7C7D7E7F');
INSERT INTO EBCDICP VALUES(x'808182838485868788898A8B8C8D8E8F');
INSERT INTO EBCDICP VALUES(x'909192939495969798999A9B9C9D9E9F');
INSERT INTO EBCDICP VALUES(x'A0A1A2A3A4A5A6A7A8A9AAABACADAEAF');
INSERT INTO EBCDICP VALUES(x'B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF');
INSERT INTO EBCDICP VALUES(x'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF');
INSERT INTO EBCDICP VALUES(x'D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF');
INSERT INTO EBCDICP VALUES(x'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF');
INSERT INTO EBCDICP VALUES(x'F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF');
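If you would rather generate these twelve statements than paste them, a minimal Python sketch follows. The function name `ebcdic_insert_statements` is hypothetical; the table name and the statement layout are copied from the SQL above.

```python
def ebcdic_insert_statements(table: str = "EBCDICP") -> list[str]:
    """Build one INSERT per row of 16 consecutive bytes, from x'40' to x'FF'."""
    statements = []
    for high in range(0x40, 0x100, 0x10):
        row = bytes(range(high, high + 0x10))
        statements.append(f"INSERT INTO {table} VALUES(x'{row.hex().upper()}');")
    return statements


for statement in ebcdic_insert_statements():
    print(statement)
```

Twelve rows of 16 bytes give the 192 values from x'40' to x'FF'.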

 

Use these commands to map this hexadecimal file to some CCSIDs:

CRTDUPOBJ OBJ(EBCDICP) FROMLIB(*LIBL) OBJTYPE(*FILE) NEWOBJ(EBCDIC037) DATA(*YES)
CRTDUPOBJ OBJ(EBCDICP) FROMLIB(*LIBL) OBJTYPE(*FILE) NEWOBJ(EBCDIC297) DATA(*YES)
CRTDUPOBJ OBJ(EBCDICP) FROMLIB(*LIBL) OBJTYPE(*FILE) NEWOBJ(EBCDIC875) DATA(*YES)
CHGPF FILE(EBCDIC037) CCSID(37)
CHGPF FILE(EBCDIC297) CCSID(297)
CHGPF FILE(EBCDIC875) CCSID(875)

 

 

For the purpose of this demo, the CHGPF must be done after the data are copied to the file. If you CPYF EBCDICP to these new files after having changed the file's CCSID, you will trigger the automatic integrity mechanisms. If you try, check the results with DSPPFM+F10. On your green-screen, properly configured, RUNQRY will return the following.
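The difference between retagging and converting can be shown with a small Python sketch. One loud assumption: it uses the pair `cp037` and `cp500` only because both codecs ship with Python; code page 500 is not one of the CCSIDs used in this demo, and the four sample bytes are arbitrary.

```python
row = bytes([0x4A, 0xC1, 0xC2, 0xC3])   # bytes copied as-is, as with DATA(*YES)

# Retagging (the CHGPF step of this demo): same bytes, new interpretation.
as_37 = row.decode("cp037")
as_500 = row.decode("cp500")

# Converting (what a copy between differently tagged files triggers):
# same characters, possibly different bytes.
converted = as_37.encode("cp500")

print(repr(as_37), repr(as_500), converted.hex())
```

After retagging, byte x'4A' simply reads as another character; after converting, the character is kept and its byte moves.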

 

 

Display Report

Report width . . . . . : 16

Position to line . . . . . Shift to column . . . . . .

Line ....+....1....+.

EBCDIC

000001 âäàáãåçñ¢.<(+|

000002 &éêëèíîïìß!$*);¬

000003 -/ÂÄÀÁÃÅÇѦ,%_>?

000004 øÉÊËÈÍÎÏÌ`:#@'="

000005 Øabcdefghi«»ðýþ±

000006 °jklmnopqrªºæ¸Æ?

000007 µ~stuvwxyz¡¿ÐÝÞ®

000008 ^£¥•©§¶¼½¾[]¯¨´×

000009 {ABCDEFGHI¬ôöòóõ

000010 }JKLMNOPQR¹ûüùúÿ

000011 \÷STUVWXYZ²ÔÖÒÓÕ

000012 0123456789³ÛÜÙÚŸ

****** ******** End of report ********

 

 

Figure 1: You'll get this for EBCDIC037, with emulator host page 37 and job CCSID 37.

 

 

Display Report

Report width . . . . . : 16

Position to line . . . . . Shift to column . . . . . .

Line ....+....1....+.

EBCDIC

000001 âä@áãå\ñ°.<(+!

000002 &{êë}íîïìߧ$*);^

000003 -/ÂÄÀÁÃÅÇÑù,%_>?

000004 øÉÊËÈÍÎÏ̵:£à'="

000005 Øabcdefghi«»ðýþ±

000006 [jklmnopqrªºæ¸Æ?

000007 `¨stuvwxyz¡¿ÐÝÞ®

000008 ¢#¥•©]¶¼½¾¬|¯~´×

000009 éABCDEFGHI¬ôöòóõ

000010 èJKLMNOPQR¹ûü¦úÿ

000011 ç÷STUVWXYZ²ÔÖÒÓÕ

000012 0123456789³ÛÜÙÚŸ

****** ******** End of report ********

 

 

Figure 2: You'll get this for EBCDIC297, with emulator host page 297 and job CCSID 297.

 


Figure 3: You'll get this for EBCDIC875, with emulator host page 875 and job CCSID 875.
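To cross-check such a report away from the host, the same 12 rows can be decoded on a workstation. This is a minimal sketch that relies on Python's bundled `cp037` codec; the function name `report` is hypothetical, and other codec names can be passed only if your Python build provides them.

```python
def report(codec: str) -> list[str]:
    """Decode the 12 rows x'40'..x'FF' with the given codec, one line per row."""
    lines = []
    for number, high in enumerate(range(0x40, 0x100, 0x10), start=1):
        row = bytes(range(high, high + 0x10))
        lines.append(f"{number:06d} {row.decode(codec, errors='replace')}")
    return lines


for line in report("cp037"):
    print(line)
```

Unlike a pasted screen copy, the first line here keeps the leading space (x'40') and no-break space (x'41'), so it is two characters longer.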

 

Now it is time to try to transfer the files. If you are using MS Excel, the transfer function is here:

 


Figure 4: Transfer files in Excel.

 

You may get something like this:

 


Figure 5: This is the result of your file transfer to Excel.

 

You will know you were successful when the Excel data is the same as the data in the corresponding PDF here. This page is an index of one PDF per code page.

 

If your Excel file transfer does not produce these files as shown, the first place to check is the CCSID settings of your ODBC connection (it is in the Conversion tab).

CCSID or Code Page?

Excellent question!

 

A CCSID contains one or more code pages, and some CCSIDs are made up of other CCSIDs. None of the people I have worked with needed to worry, because their CCSID number was equal to their code page number. That is not true for everybody. Look at this chart. You'll notice, for example, that CCSID 930 appears on this page, but there is no code page 930. What is going on? Have a look at CCSID 290: the two rows have many values in common, and code page 290 does exist. My guess is that CCSID 930 uses CCSID 290. Here's an excerpt from the chart on the IBM Web page:

 

Requested Encoding Scheme

Input
CCSID   1100  1200  1301  2100  2200  2300  4100  4105  4403  5100  5404
   37     37   835   937   437   947   950   819  1252     0   367     0
  290    290  4396  5026  1041   301   942     0     0     0     0     0
  930    290   300   930  1041   301   942     0     0     0     0  5052
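The excerpt can be read as a lookup from an input CCSID and a requested encoding scheme to a related CCSID. The Python sketch below holds only the three rows shown here; the names `RELATED` and `related_ccsid` are hypothetical, and treating 0 as "no CCSID listed" is my reading of the chart, not something the excerpt states.

```python
# Column headings of the excerpt, kept as strings exactly as printed.
SCHEMES = ["1100", "1200", "1301", "2100", "2200", "2300",
           "4100", "4105", "4403", "5100", "5404"]

# One list of values per input CCSID, in column order.
ROWS = {
    37:  [37, 835, 937, 437, 947, 950, 819, 1252, 0, 367, 0],
    290: [290, 4396, 5026, 1041, 301, 942, 0, 0, 0, 0, 0],
    930: [290, 300, 930, 1041, 301, 942, 0, 0, 0, 0, 5052],
}

RELATED = {ccsid: dict(zip(SCHEMES, values)) for ccsid, values in ROWS.items()}


def related_ccsid(input_ccsid: int, scheme: str):
    """Return the CCSID listed for this scheme, or None when the chart shows 0."""
    value = RELATED[input_ccsid][scheme]
    return value or None   # assumption: 0 means "nothing listed"


print(related_ccsid(930, "1100"))   # 290
print(related_ccsid(930, "1200"))   # 300
print(related_ccsid(290, "5404"))   # None
```

Under scheme 1100, CCSID 930 resolves to 290, which is consistent with the guess that CCSID 930 builds on CCSID 290.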

 

For further information, read about CCSIDs on Wikipedia and about globalization on IBM's Web site.
