XML, FOP, and PDF: The Three Tools of Printing


Since I last wrote an article for MC, I’ve focused on the problems surrounding the modernization of existing applications. By the time you read this, E-deployment: The Fastest Path to the Web will be available at www.mc-store.com. This book is my guide to putting green-screen applications onto the Web. Another version of the software, one that handles subfiles and display attributes, will be available on my Web site at www.java400.net/edeployment. However, while I was addressing that particular problem, a new issue arose: Once these applications run on the Web, how do you get their printed output to the user?

This article introduces three tools (XML, PDF, and FOP, the Apache Formatting Objects Processor) that, when combined, allow you to render your existing data into a format that can be transmitted over the Web. It focuses on a small example that generates a simple PDF document from a comma-delimited template, as an introduction to the PDF concept. I also touch on the other technologies and let you know where to get more information. And if there’s interest, I can follow up with more in-depth coverage of this architecture.

What’s a PDF, and Why Do I Want One?

HTML is good enough for small amounts of data, but for reports you really need something more robust, especially when dealing with many fonts and graphics. This is where Adobe Systems Incorporated’s Portable Document Format (PDF) comes into play. PDF is the industry standard for platform-independent reporting: A PDF document created on an Apple Computer Macintosh will print exactly the same on a Sun Microsystems Solaris workstation or on a Wintel network. PDF easily handles a graphic document like the one in Figure 1, which I used as a prototype for an insurance company.

Note the nice box, the centering, the different fonts, and the background colors. PDF is excellent for this, and this document will print with the same format no matter what kind of computer I use to print it. However, PDF documents are notoriously difficult to generate. The internal format is cryptic, difficult to understand, and even more difficult to follow. Here are the first three lines of a typical PDF document:

%PDF-1.3
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>


Trust me, it only gets worse from there. With PDF, the entire document is treated as a collection of objects, and the objects are placed on a page. Unlike traditional print reports where you simply print lines of text and then overflow to the next page, with PDF you lay out the appearance of each page in detail, assigning X and Y coordinates to each individual object.
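To show why hand-coding PDF is so tedious, here is a minimal sketch in Java (the language of the com.pbd.cdfo package) that writes a one-page PDF with no library at all. The class name MinimalPdf, the sample text, and the output file name hello.pdf are illustrative assumptions; the sketch is not how FOP works internally. It uses only standard PDF objects (catalog, page tree, page, font, content stream), must place the text at explicit coordinates, and has to record the byte offset of every object for the cross-reference table.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: writes a one-page PDF containing a single line of text. */
final class MinimalPdf {

    static byte[] build(String text) {
        // BT/ET bracket a text object; Tf selects font F1 at 12 points; Td moves to
        // x=72, y=720 (points, origin at the bottom-left corner); Tj shows the string.
        // Assumes plain ASCII text with no parentheses or backslashes (those need escaping).
        String content = "BT /F1 12 Tf 72 720 Td (" + text + ") Tj ET";

        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                + "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream"
        };

        StringBuilder pdf = new StringBuilder("%PDF-1.3\n");
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < objects.length; i++) {
            offsets.add(pdf.length()); // char offset equals byte offset for ASCII output
            pdf.append(i + 1).append(" 0 obj\n").append(objects[i]).append("\nendobj\n");
        }

        // Cross-reference table: a free entry for object 0, then one 20-byte entry per object.
        int xrefStart = pdf.length();
        pdf.append("xref\n0 ").append(objects.length + 1).append('\n');
        pdf.append("0000000000 65535 f \n");
        for (int off : offsets) {
            pdf.append(String.format("%010d 00000 n \n", off));
        }
        pdf.append("trailer\n<< /Size ").append(objects.length + 1).append(" /Root 1 0 R >>\n");
        pdf.append("startxref\n").append(xrefStart).append("\n%%EOF\n");
        return pdf.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws Exception {
        Files.write(Paths.get("hello.pdf"), build("Hello from Java"));
    }
}
```

Even this trivial page needs five objects and a byte-accurate cross-reference table, and every piece of text must be positioned by hand. That bookkeeping is exactly what a formatter such as FOP takes off your hands.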

So, do you have to become a PDF expert to create PDF documents? Until recently, the answer was “Yes.” Adobe provides software to generate PDF documents from most major desktop applications, such as Microsoft’s Word and Excel. But if you wanted to create a document programmatically from, say, an RPG program, it wasn’t all that easy. In fact, the work involved far outweighed the benefits. The preferred technique today is to print the report to a spooled file and use the CPYSPLF (Copy Spooled File) command to convert it into something that can be transmitted via email.

The First Step: FOP

Thus, I entered the world of the FOP formatter. This project is completely open source and is run under the auspices of the Apache Software Foundation. Apache is at www.apache.org, and FOP is at xml.apache.org/fop. FOP takes an XML-based language called Formatting Objects (FO) and converts it into a PDF document. Like PDF, FO is not an easy language to use. Here are the first few lines of an FO document:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master margin-left="1cm" margin-bottom="0.5cm"
        margin-top="0.75cm" page-width="21cm"
        page-height="29.7cm" page-master-name="first">

Not pretty, is it? Since I’m not particularly good at reading the angle brackets of an XML-style language, I designed my own simple comma-delimited language instead. The entire source for the insurance card in Figure 1 appears in Figure 2.

Without going into too much detail: The lines define the master document’s size and margins, then a block within that document, and finally the text within that block. The parameters on each line set things like size and location; for text elements, they set attributes such as font, space before and after, and colors. Now, how do you convert Figure 2 to a PDF document? I’ve written a simple Java package, com.pbd.cdfo, which includes an application called CdfoToFo. CdfoToFo converts the comma-delimited source of Figure 2 into the FO syntax that FOP requires. Then you run the free FOP converter from Apache, and it generates the PDF document.
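To make the conversion step concrete, here is a hypothetical sketch of a CDFO-to-FO translator in Java. It is not the author’s CdfoToFo, and the meaning of each comma-separated field is a guess inferred from Figure 2: master is taken as top, bottom, left, and right margins, then page height and width; block as top, left, height, width, border width, and padding; text as font family, size, alignment, weight, line height, space before, space after, color, and background color, with the following line holding the text itself. Empty fields are simply omitted. The class name CdfoSketch is also an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical CDFO-to-FO converter; field meanings are assumptions inferred from Figure 2. */
final class CdfoSketch {

    static String convert(List<String> source) {
        StringBuilder fo = new StringBuilder("<?xml version=\"1.0\"?>\n");
        boolean expectText = false;   // true when the previous directive was "text"
        String pendingAttrs = "";

        for (String raw : source) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (expectText) {         // the line after "text, ..." is literal content
                fo.append("<fo:block").append(pendingAttrs).append('>')
                  .append(escape(line)).append("</fo:block>\n");
                expectText = false;
                continue;
            }
            String[] f = line.split(",", -1);   // -1 keeps empty fields
            for (int i = 0; i < f.length; i++) f[i] = f[i].trim();
            switch (f[0]) {
                case "master":  // assumed: top, bottom, left, right, height, width
                    fo.append("<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n")
                      .append("<fo:layout-master-set><fo:simple-page-master master-name=\"first\"")
                      .append(attrs(f, "margin-top", "margin-bottom", "margin-left",
                                    "margin-right", "page-height", "page-width"))
                      .append("><fo:region-body/></fo:simple-page-master></fo:layout-master-set>\n")
                      .append("<fo:page-sequence master-reference=\"first\">")
                      .append("<fo:flow flow-name=\"xsl-region-body\">\n");
                    break;
                case "block":   // assumed: top, left, height, width, border width, padding
                    fo.append("<fo:block-container absolute-position=\"absolute\" border-style=\"solid\"")
                      .append(attrs(f, "top", "left", "height", "width", "border-width", "padding"))
                      .append(">\n");
                    break;
                case "text":    // assumed field order, see the lead-in
                    if (f.length > 3 && f[3].equals("centered")) f[3] = "center";
                    pendingAttrs = attrs(f, "font-family", "font-size", "text-align", "font-weight",
                                         "line-height", "space-before", "space-after",
                                         "color", "background-color");
                    expectText = true;
                    break;
                case "/block":
                    fo.append("</fo:block-container>\n");
                    break;
                case "/master":
                    fo.append("</fo:flow></fo:page-sequence></fo:root>\n");
                    break;
                default:
                    throw new IllegalArgumentException("Unknown CDFO directive: " + line);
            }
        }
        return fo.toString();
    }

    /** Pairs f[1], f[2], ... with the given FO property names, skipping empty fields. */
    private static String attrs(String[] f, String... names) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length && i + 1 < f.length; i++) {
            if (!f[i + 1].isEmpty()) {
                sb.append(' ').append(names[i]).append("=\"").append(escape(f[i + 1])).append('"');
            }
        }
        return sb.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) throws IOException {
        System.out.print(convert(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

Fed the source of Figure 2, the sketch emits a complete FO document that could then be handed to FOP. It uses the XSL-FO 1.0 names master-name and master-reference, which may differ from the attribute names in the FO snippet shown earlier (page-master-name). Keeping CDFO line-oriented is what makes a converter this small possible: each directive maps to one FO element, and the attribute positions do the rest.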

Why All the Trouble?

Well, I told you at the beginning of this article that I would talk about XML. You may have noticed that the data in Figure 2 is all constant. That’s great for generating forms, but not so good for inserting live data from a database application. One solution is to extend my comma-delimited formatting object (CDFO) syntax to support variable data, and I will probably do that in the coming months. However, the better long-term solution is to use XML.
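The article only proposes adding variable data to CDFO, so the following is purely a hypothetical sketch of one way it might work: content lines carry placeholders such as &{policy}, and a small Java routine fills them from a map of live values before conversion. The &{name} syntax, the class CdfoVariables, and the sample values are all assumptions, not part of CDFO as described.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical CDFO extension: replaces &{name} placeholders with live values. */
final class CdfoVariables {
    // The &{name} placeholder syntax is an illustrative assumption.
    private static final Pattern PLACEHOLDER = Pattern.compile("&\\{(\\w+)\\}");

    static String fill(String line, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in the data from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Failing loudly on a missing value, rather than printing an empty field, seems the safer choice for a document such as an insurance card.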

XML is the newest technology for machine-to-machine communications. With XML, your data is completely packaged in a self-defining tag language, so a computer program can easily extract the information. More important, however, is the Extensible Stylesheet Language (XSL). XSL supports the concept of a style sheet that converts the data in an XML document into another format.
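As a sketch of how easily a program can pull data out of self-defining XML, here is a Java example using only the DOM parser built into the JDK. The element names card, heading, warning, and instruction, and the class name CardReader, are hypothetical; the text comes from the insurance card, but the actual tags of Figure 3 are not reproduced here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Reads self-describing card XML and extracts every element with a given tag name. */
final class CardReader {

    // Hypothetical element names chosen for illustration.
    static final String SAMPLE =
        "<card>"
      + "<heading>MISREPRESENTATION OF INSURANCE</heading>"
      + "<heading>IS A FIRST DEGREE MISDEMEANOR</heading>"
      + "<warning>IN CASE OF ACCIDENT: NOTIFY POLICE</warning>"
      + "<instruction>GET NAMES AND ADDRESSES OF ALL WITNESSES.</instruction>"
      + "<instruction>GET ALL FACTS AND REPORT ACCIDENT AT ONCE.</instruction>"
      + "</card>";

    static List<String> textOf(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName(tag);   // all matching elements, in document order
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String step : textOf(SAMPLE, "instruction")) {
            System.out.println(step);
        }
    }
}
```

Running main prints the two instruction lines. The program never needs to know where an element sits on the page, only what it is called, which is the point of keeping formatting out of the data.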


An XSL style sheet allows the same XML document to be translated easily into another format, including HTML or the FO language that FOP reads. Since the data in the XML document is self-defining, the XSL processor can extract the relevant information and format it into the target syntax. Thus, you can display the same XML document in a browser, on a wireless device, or as a PDF document. Not only that, but different users can use different style sheets to customize the display to their own needs.

The XML for the card might look like the code in Figure 3. The XSL style sheet determines the fonts and sizes for the various types of data, such as headings and warnings.
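The idea of one XML document and several style sheets can be tried with nothing more than the XSLT processor built into the JDK (javax.xml.transform). In this sketch the element names (card, heading, warning), the class name CardStyler, and both style sheets are assumptions made for illustration: one style sheet renders the card data as HTML for a browser, and the other renders the same data as FO blocks for FOP.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies an XSLT style sheet to an XML document with the JDK's built-in processor. */
final class CardStyler {

    // Hypothetical card data; element names are assumptions.
    static final String CARD =
        "<card><heading>MISREPRESENTATION OF INSURANCE</heading>"
      + "<warning>IN CASE OF ACCIDENT: NOTIFY POLICE</warning></card>";

    // Style sheet 1: the card as HTML for a browser.
    static final String TO_HTML =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/card\"><html><body><xsl:apply-templates/></body></html></xsl:template>"
      + "<xsl:template match=\"heading\"><h2><xsl:value-of select=\".\"/></h2></xsl:template>"
      + "<xsl:template match=\"warning\"><p style=\"color:white;background:black\">"
      + "<xsl:value-of select=\".\"/></p></xsl:template>"
      + "</xsl:stylesheet>";

    // Style sheet 2: the same data as FO blocks (page setup omitted for brevity).
    static final String TO_FO =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<xsl:template match=\"/card\"><fo:block-container><xsl:apply-templates/></fo:block-container></xsl:template>"
      + "<xsl:template match=\"heading\"><fo:block font-family=\"sans-serif\" font-size=\"8pt\""
      + " font-weight=\"bold\" text-align=\"center\"><xsl:value-of select=\".\"/></fo:block></xsl:template>"
      + "<xsl:template match=\"warning\"><fo:block font-size=\"8pt\" color=\"white\""
      + " background-color=\"black\"><xsl:value-of select=\".\"/></fo:block></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Calling transform(CARD, TO_HTML) and transform(CARD, TO_FO) yields two renderings of identical data; only the style sheet changes. The FO output would still need page setup (a layout-master-set and page-sequence) before FOP could turn it into a PDF.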

The Final Word on the Printed Word

Printed documents are still a mainstay of the business world, and it seems that PDF is becoming the standard way to distribute them. However, PDF is cumbersome and difficult to generate, so you can use FOP as a gateway between regular database data and the intricate syntax of PDF. At the same time, I hope FOP will help you learn the basics of XML. Then, once you’ve become comfortable with the XML concepts, you can move on to define your data as XML messages and use XSL to translate to the appropriate output format, whether it is HTML, PDF, or whatever else comes along.

REFERENCES AND RELATED MATERIALS

• Adobe Systems home page: www.adobe.com

• Apache Software Foundation: www.apache.org

• Apache Software Foundation’s FOP page: xml.apache.org/fop

• E-deployment: The Fastest Path to the Web. Joe Pluta. Carlsbad, California: MC Publishing Co., 2000

• W3C XML page: www.w3.org/xml

Figure 1: PDF-generated documents go well beyond simple text, as this insurance card shows.



master, 0.75cm, 0.5cm, 1cm, 0.5cm, 29.7cm, 21cm
block, 0cm, 0cm, 6.5cm, 10.5cm, 1.5pt, 2pt
text, sans-serif, 8pt, centered, bold, 10pt
MISREPRESENTATION OF INSURANCE
text, sans-serif, 8pt, centered, bold, 10pt
IS A FIRST DEGREE MISDEMEANOR
text, sans-serif, 8pt, centered, , 8pt, , 8pt
Your cooperation is necessary for your protection.
text, sans-serif, 8pt, centered, bold, 12pt, , , white, black
IN CASE OF ACCIDENT: NOTIFY POLICE
text, sans-serif, 7pt, start, , 7pt, 8pt, 3pt
1. GET NAMES AND ADDRESSES OF ALL WITNESSES.
text, sans-serif, 7pt, start, , 7pt, 3pt
2. GET NAMES AND ADDRESSES OF DRIVERS AND OCCUPANTS OF OTHER CARS AND INJURED.
text, sans-serif, 7pt, start, , 7pt, 3pt
3. GET NAME(S) OF INSURANCE CARRIER(S) OF DRIVER(S) INVOLVED AND POLICY NUMBER(S).
text, sans-serif, 7pt, start, , 7pt, 3pt
4. BE COURTEOUS, DISCUSS ACCIDENT ONLY WITH POLICE OR COMPANY ADJUSTER, DO NOT ADMIT RESPONSIBILITY OR AGREE TO PAY ANYTHING OR SIGN ANYTHING EXCEPT FOR COMPANY ADJUSTER.
text, sans-serif, 7pt, start, , 7pt, 3pt
5. GET ALL FACTS AND REPORT ACCIDENT AT ONCE.
/block
/master

Figure 2: The beauty of CDFO is its simplicity; this short source generates the insurance card.

MISREPRESENTATION OF INSURANCE

IS A FIRST DEGREE MISDEMEANOR

Your cooperation is necessary for your protection.

IN CASE OF ACCIDENT: NOTIFY POLICE

GET NAMES AND ADDRESSES OF ALL WITNESSES.

GET NAMES AND ADDRESSES OF DRIVERS AND OCCUPANTS OF OTHER CARS AND INJURED.

GET NAME(S) OF INSURANCE CARRIER(S) OF DRIVER(S) INVOLVED AND POLICY NUMBER(S).

BE COURTEOUS, DISCUSS ACCIDENT ONLY WITH POLICE OR COMPANY ADJUSTER, DO NOT ADMIT
RESPONSIBILITY OR AGREE TO PAY ANYTHING OR SIGN ANYTHING EXCEPT FOR COMPANY ADJUSTER.

GET ALL FACTS AND REPORT ACCIDENT AT ONCE.

Figure 3: In XML, the formatting is left until later. The source simply identifies the elements of the card.

