Writing a UPS Monitoring Program

Brief: If your system is equipped with an Uninterruptible Power Supply, you should take advantage of the benefits a UPS monitoring program has to offer. Using the techniques described in this article, your critical applications will be better prepared in the event of a power outage.

If you have ever experienced a power failure during peak production hours, you know the time it takes to restart your system and to rebuild indexes. For protection, many have installed Uninterruptible Power Supply (UPS) systems. However, having a UPS system is only the first step in protecting your AS/400 during a power failure.

The idea behind a UPS system is to provide enough electrical power to allow an AS/400 to survive a short power outage or, in the case of a long outage, to shut down in an orderly fashion. The AS/400 provides a number of system values which control actions in the event of a power failure (see "When the Power Fails..." elsewhere in this issue). In many cases, the system values provide adequate support for an automatic shutdown of the system.

However, there are some limitations to letting the system power itself down using the system values alone. (See the Advanced Backup and Recovery Guide for more details.) To really take control of the situation, you should consider implementing a UPS monitoring program.

This article covers one such monitoring program that automatically monitors for a power outage (24 hours a day), powers down the computer and restarts the system once normal power has been restored.

If you decide to use the UPS monitoring program presented here or one of your own, be sure to test the program before you begin relying on it to handle a power outage.

Application Considerations

The system values which relate to UPS systems work very well in most cases. However, what if some critical applications are operational and cannot be terminated abruptly? By using a UPS monitoring program, you can bring applications down in a controlled manner when a power failure occurs. CL and RPG both provide an easy method of detecting whether a controlled shutdown request has been initiated. This gives your critical application programs time to perform any last-minute cleanup (e.g., making sure database files are at a transaction boundary) before ending.

The RPG SHTDN op code turns on an indicator if controlled termination of the program has been initiated (i.e., specifying the *CNTRLD option on the following commands: ENDJOB, ENDSBS, ENDSYS or PWRDWNSYS). Your application must constantly check the SHTDN indicator and take appropriate actions to shut down in an orderly fashion. To complete the cycle, you could set up a procedure that would allow the UPS monitoring program to restart applications when power is restored. How this works depends entirely upon your application design.

In a CL program, you can use the ENDSTS parameter of the Retrieve Job Attributes (RTVJOBA) command. A CL variable receives a value that indicates whether a shutdown has been requested. As with RPG, you would have to program a method of terminating and restarting the application.
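The CL approach can be sketched as a loop that checks the end status between units of work. This is only an illustration, not part of the monitoring program in this article: WORKLIB/NEXTUNIT is a hypothetical program that processes one unit of work and leaves the database files at a transaction boundary.

```cl
/* Sketch only: WORKLIB/NEXTUNIT is a hypothetical program. */
PGM
DCL VAR(&ENDSTS) TYPE(*CHAR) LEN(1)

/* Process one unit of work, then check for a controlled end */
NEXT: CALL PGM(WORKLIB/NEXTUNIT)
      RTVJOBA ENDSTS(&ENDSTS)
      /* '1' means a controlled end has been requested */
      IF COND(&ENDSTS *EQ '1') THEN(GOTO CMDLBL(DONE))
      GOTO CMDLBL(NEXT)

/* Any last-minute cleanup would be called from here */
DONE: RETURN
ENDPGM
```

The shorter each unit of work, the sooner the job notices the shutdown request, which matters when the delay time on the shutdown command is limited.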

Physical Requirements

Deciding whether to implement a UPS monitoring program depends on the type of UPS you have installed on your system. If your AS/400 is equipped with an optional battery feature, you should use the default system values rather than implementing a UPS monitoring program. If you have a vendor-supplied UPS system, then several conditions must exist between the UPS and the AS/400 before you implement a UPS monitoring program.

First, the UPS must be capable of sending out a power-loss signal to the AS/400, over a nine-pin cable. (See Appendix E of the Physical Planning Guide and Reference.) You should contact your UPS vendor to determine whether this is possible. If not, implementing a UPS monitoring program will not work. If you are looking for a UPS system, make sure the vendor can provide the necessary link.

Second, every rack that affects the operation of your AS/400 must have its power supplied via the UPS. The reason for this is obvious. If the AS/400 cannot sense a loss of power to critical units (e.g., disk drives), the monitoring program cannot conduct an orderly shutdown. Components such as printers, terminals, modems and separate tape units (items which are not powered by a rack) do not have to be connected to the UPS. It might be a good idea, though, to have the console powered by the UPS as well as your modems, which would prevent remote users from losing their connections immediately.

Lastly, you must have enough battery power to handle an orderly shutdown. This issue is critical since it determines what the monitoring program can accomplish. Make sure the UPS system you have chosen gives your AS/400 enough time to power down, plus some additional time for safety factors. For example, if it takes 15 minutes to power down, you might want your battery power to last 30 minutes. Work out this figure with your UPS vendor.

Since the amount of time it takes to power down is affected by the number of active jobs in the system, you must know how long it takes to power down during peak loads. Most installations cannot afford to power down during the day with users active just to come up with such an estimate. One approach, to be implemented after hours, is to start enough jobs in each subsystem to correspond to your maximum load. These jobs don't have to be doing any work; they just need to be active. Now when you issue the following command to power down immediately, you can see how long it takes.

 PWRDWNSYS OPTION(*IMMED) 

You can also choose to power down the system in a controlled manner:

 PWRDWNSYS OPTION(*CNTRLD) DELAY(seconds)

If you do so, be sure to add your delay time to the estimate.

Another approach is to use a calculation based on the amount of main storage your system has. Allow five minutes for the first 16MB and add one minute for each 16MB thereafter. For example, if your system has 64MB, it would take approximately eight minutes (5 + 3) to power down the system. Keep in mind that this is not an exact number. It is only an approximation based on the average work load of a system of that size.

Be sure to add a buffer to this number to account for a time when your system may be running a higher than average work load.

Whatever method you use to determine the time it takes to power down your system, consider the additional time required for any special shutdown code (e.g., RPG SHTDN operations) you may have in your application.

Another piece of information that you need to know is how long the UPS system takes to recharge the batteries. The monitoring program uses this data when multiple power outages occur.

For the monitoring program presented in this article, we are going to use the following time constraints:

 1. Power-down time = 15 minutes
 2. Battery power   = 30 minutes
 3. Recharge time   = 10 hours

Signal Processing

Once the UPS and AS/400 are communicating with each other, you must know how the AS/400 will process UPS power signals.

When a power-loss signal is received, the AS/400 checks for an active UPS monitoring program and sends the signal to that program for processing. The program then performs any and all work. If a monitoring program is not running, the AS/400 determines when to power down. In either case, if the AS/400 is operating on battery power and a weak battery signal is received (if the UPS is set up to send this signal), the computer automatically saves all memory to disk and powers down.

A power-restored signal is handled in exactly the same manner as the power-loss signal. If no UPS monitoring program is active and the computer has not already initiated a power-down, the AS/400 will reset itself and continue as normal. When the AS/400 is powered down while on battery power, it does not reset itself until utility power is restored.

System Values

In order to establish a UPS monitoring program, you must set or use several system values. Do not set these system values until you are ready to install your monitoring program into a production environment.

QIPLSTS (IPL Status Value): You do not set this system value; the system maintains it. The UPS monitoring program uses it to determine if the IPL being performed is due to a power restore. A value of "1" indicates that a power restore caused the IPL.

QUPSDLYTIM (UPS Delay Time): This system value is used to determine when to power down the computer when a UPS monitoring program is not in use. Set QUPSDLYTIM to *NOMAX for the UPS monitoring program UPS001CL, presented in Figure 1.

QPWRRSTIPL (Automatic IPL after Power Restored): The system uses this value to determine whether the computer should automatically restart once utility power has been restored. For the UPS monitoring program included in this article, this value should be set to 1 (auto-IPL is allowed). This makes it possible to bring the computer back up unattended.

QUPSMSGQ (UPS Message Queue): The AS/400 uses QUPSMSGQ to determine if you are using a program to handle power failures. If a valid message queue is specified for this system value, it must be allocated to a program at all times. If a signal is received and the message queue is not allocated, the AS/400 assumes that you are not using a UPS monitoring program. Set QUPSMSGQ to QGPL/UPSMSGQ when implementing the UPS001CL program.
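The settings described above could be applied with commands along the following lines. Treat this as a sketch: the value format shown for QUPSMSGQ (queue name followed by library name in one quoted string) is an assumption that you should confirm against the system value documentation for your release, and none of these commands should be run until the monitoring program is ready for production.

```cl
/* Let the monitoring program, not the system, decide when to power down */
CHGSYSVAL SYSVAL(QUPSDLYTIM) VALUE(*NOMAX)

/* Allow an automatic IPL once utility power is restored */
CHGSYSVAL SYSVAL(QPWRRSTIPL) VALUE('1')

/* Route power signals to the UPS message queue          */
/* (assumed format: queue name, then library name)       */
CHGSYSVAL SYSVAL(QUPSMSGQ) VALUE('UPSMSGQ QGPL')
```

QIPLSTS does not appear here because it is only retrieved, never changed.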

Objects Required

In addition to the monitoring program, four additional objects must exist. You must create these objects before attempting to install the UPS monitoring program.

1. UPS Message Queue.

 CRTMSGQ MSGQ(QGPL/UPSMSGQ) +
   TEXT('UPS Monitoring Message Queue')

2. UPS Data Area.

 CRTDTAARA DTAARA(QGPL/UPSDATE) TYPE(*CHAR) +
   LEN(12) VALUE('930101000000') +
   TEXT('UPS Date/Time of Last Power Restore')

3. CLCDATDIF command from QUSRTOOL. Follow directions in QUSRTOOL for creating this command.

4. CLCTIMDIF command from QUSRTOOL. Follow directions in QUSRTOOL for creating this command.

Monitoring Program

With the basic information provided, you are now ready to take a look at the actual UPS monitoring program. For ease of understanding, I have incorporated all the logic in one program. However, it would be best to incorporate the functions into separate programs to facilitate maintenance and changes.

The first item that requires discussion is the monitoring program's operating environment. This program must be operational 24 hours a day. It must start automatically with every IPL of your system. You can accomplish this by making the program an autostart job in your controlling subsystem, as explained in the Work Management Guide. How you get the job operational is up to you, but it must start up without any operator intervention and, as a result, cannot be started from the command line.
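One way to set up the autostart job is sketched below. The names UPSJOBD, UPSMON and UPSUSER are hypothetical, QGPL/UPS001CL assumes that is where you compiled the program, and QCTL is assumed to be your controlling subsystem; substitute your own values.

```cl
/* Hypothetical job description whose request data calls the monitor. */
/* UPSUSER is a placeholder for a profile with the needed authority.  */
CRTJOBD JOBD(QGPL/UPSJOBD) USER(UPSUSER) +
          RQSDTA('CALL PGM(QGPL/UPS001CL)') +
          TEXT('UPS monitoring autostart job')

/* Add the autostart job entry to the controlling subsystem */
/* (QCTL is an assumption; yours may differ)                */
ADDAJE SBSD(QSYS/QCTL) JOB(UPSMON) JOBD(QGPL/UPSJOBD)
```

An autostart job entry is acted on when the subsystem starts, so the monitoring job would first appear at the next start of the controlling subsystem, typically the next IPL.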

As the UPS monitoring program starts up, it first determines whether the last IPL was an automatic restart after a power failure. If so, you set the data area (UPSDATE) to remember when the power restore took place. Next you restart all the subsystems and any pertinent jobs. (Determining which jobs are pertinent is your call.) If you have a start-up program that executes automatically via the QSTRUPPGM system value, you can place this logic in your start-up program.

Next, the program allocates and monitors the UPS message queue (UPSMSGQ). Every five minutes, if no message is received, the program checks to see if a terminate program request has occurred, which means that the job has been canceled. If so, the program terminates.

If any power signals are received, the program continues to read all messages received until the last message is processed. The reason for this is that the UPS might send out several messages at once. You are interested in only the last message received.

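If the message received is for a power loss (CPF1816), the program first checks how long ago power was last restored, using the date and time saved in the UPSDATE data area. If fewer than 10 hours have passed, the batteries may not be fully recharged, so the program powers down the system immediately with no restart. Otherwise, the program puts all batch job queues on hold, preventing any new jobs from starting.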

The program continues to monitor the UPS message queue for the next minute to see if a power-restored signal (CPF1817) is received. If it is, the job queues are released and the monitoring program resumes its normal operations.

If one minute elapses without a power-restored signal, the program begins the shutdown process. The program presented here first sends out a broadcast message, just in case any users are still active. You could tie this into a paging system or any other type of notification application you may have. Next, it ends the interactive subsystems with a delay of five minutes and the batch subsystems with a delay of 10 minutes, and then issues a power-down with a delay of 15 minutes. Remember, however, that each task you include in your shutdown procedure affects the amount of time you have available to power down your computer. You will want to include a safety factor because of the limited amount of battery power available.

Testing the Monitoring Program

To test the UPS monitoring program, you need to send the appropriate power-failure and power-restored messages to the UPS message queue. You can create and send the messages yourself or design a program to do the work for you. This way you can test different scenarios for power failures.
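A small CL program along these lines could send the test messages. It is a sketch with a hypothetical name (UPSTESTCL) and it sends the message without any message data, which is enough for a monitoring program that looks only at the message ID.

```cl
/* UPSTESTCL (hypothetical name): send a simulated power signal. */
/* &MSGID is expected to be CPF1816 (power loss) or              */
/* CPF1817 (power restored).                                     */
PGM PARM(&MSGID)
DCL VAR(&MSGID) TYPE(*CHAR) LEN(7)

SNDPGMMSG MSGID(&MSGID) MSGF(QSYS/QCPFMSG) +
            TOMSGQ(QGPL/UPSMSGQ) MSGTYPE(*INFO)
ENDPGM
```

For example, CALL PGM(UPSTESTCL) PARM('CPF1816') simulates a power loss, and a second call with 'CPF1817' within a minute simulates the power coming back. Keep in mind that UPS001CL as listed issues a real PWRDWNSYS if no power-restored message arrives, so you may want to replace that command with a harmless message while testing.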

Once you are satisfied with the UPS monitoring program, you are ready to place it in production. Follow the steps listed:

1. Create UPS monitoring program(s).

2. Set up autostart of UPS monitoring program(s).

3. Change system values.

4. Start the monitoring program(s).

Once you've installed the program, you should test a live power outage before one actually occurs unexpectedly. Most UPS systems allow you to control how power flows to the computer so that you can place the AS/400 on battery power as if a power failure has occurred. Your UPS vendor can tell you how to do this.

After the Install

After you have installed the UPS monitoring program, you should keep several points in mind.

First, if your vendor comes in to work on the UPS system, disconnect the nine-pin connector at the AS/400 end. If the vendor runs any type of test, you will probably receive a power-loss signal. This is important since the UPS monitoring program is completely automated. Unplugging and plugging the UPS cable into the AS/400 does not cause any problems with the monitoring program. The AS/400 simply notifies you that the UPS is no longer sending any signals.

As you make changes to your operating environment (e.g., new subsystems, job queues and so forth), don't forget to change the UPS monitoring programs. If you don't make these changes, you may get erroneous results on the power-down and restart procedures. This is one reason it is better to create subprograms to perform specific tasks instead of having one big monitoring program.

If you make changes to the UPS monitoring program, you must remember to cancel and restart the job currently running. If you don't, your changes do not take effect since the program is active at all times. The job continues to point to the "saved" program object or terminates if it can't find that saved object name.

Finally, there may be times when your UPS monitoring program will not be active, for example when you place your system in a restricted state in order to perform a Save System (SAVSYS) or Reclaim Storage (RCLSTG) procedure. During these times, you should prepare your system for a possible power failure without the use of a UPS monitoring program. You can accomplish this in several ways. For example, you could change the system value QUPSDLYTIM to something other than *NOMAX. This would instruct the system to automatically power itself down after a specified amount of time following a power failure. For other ideas on how to deal with this situation, see "When the Power Fails..." in this issue of MC.
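The QUPSDLYTIM change could be wrapped around the restricted-state work roughly as follows. The 600-second delay is an arbitrary illustration, not a recommendation; pick a value that fits your own battery and power-down times.

```cl
/* Before entering restricted state: let the system power itself */
/* down 600 seconds (example value) after a power failure        */
CHGSYSVAL SYSVAL(QUPSDLYTIM) VALUE(600)

/* ... perform the SAVSYS or RCLSTG procedure here ... */

/* Afterward: hand control back to the UPS monitoring program */
CHGSYSVAL SYSVAL(QUPSDLYTIM) VALUE(*NOMAX)
```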

Be Prepared

A UPS monitoring program will be of great help in achieving 24-hour operations in an unattended environment. With such a program, you can safely expect your computer to follow the procedures necessary to handle power outages gracefully. While it does not solve every problem you may encounter, you can at least rest assured that if and when power outages occur, your AS/400 will continue to function properly.

James Coolbaugh is a senior systems analyst in Cleveland, Ohio.

REFERENCES

Advanced Backup and Recovery Guide (SC41-8079, CD-ROM QBKA9101).

Physical Planning Guide and Reference (GA41-9571, CD-ROM QBKA3601).

Work Management Guide (SC41-8078, CD-ROM QBKA9J01).


Figure 1 UPS Monitoring Program UPS001CL

 /*===============================================================*/
 /* To compile:                                                   */
 /*                                                               */
 /*   CRTCLPGM PGM(XXX/UPS001CL) SRCFILE(XXX/QCLSRC) +            */
 /*            USRPRF(*OWNER)                                     */
 /*                                                               */
 /*===============================================================*/
 UPS001CL: PGM

   DCL VAR(&AUTOIMPL) TYPE(*CHAR) LEN(1)
   DCL VAR(&DATE)     TYPE(*CHAR) LEN(6)
   DCL VAR(&DAYS)     TYPE(*DEC)  LEN(7)
   DCL VAR(&HOURS)    TYPE(*DEC)  LEN(2)
   DCL VAR(&MSGID)    TYPE(*CHAR) LEN(7)
   DCL VAR(&QDATE)    TYPE(*CHAR) LEN(6)
   DCL VAR(&QTIME)    TYPE(*CHAR) LEN(6)
   DCL VAR(&SAVMSGID) TYPE(*CHAR) LEN(7)
   DCL VAR(&SECONDSA) TYPE(*CHAR) LEN(5)
   DCL VAR(&SECONDS)  TYPE(*DEC)  LEN(5)
   DCL VAR(&TIME)     TYPE(*CHAR) LEN(6)
   DCL VAR(&ENDSTS)   TYPE(*CHAR) LEN(1)

 /* STEP 1: Program initialization                                */
 /*                                                               */
 /* If this program is starting after an IPL after power failure, */
 /*   A) Save date and time of restart into data area.            */
 /*   B) Start up the computer.                                   */
   RTVSYSVAL SYSVAL(QIPLSTS) RTNVAR(&AUTOIMPL)
   IF COND(&AUTOIMPL *EQ '1') THEN(DO)
     /* Save Date/Time of Restart */
     RTVSYSVAL SYSVAL(QDATE) RTNVAR(&QDATE)
     RTVSYSVAL SYSVAL(QTIME) RTNVAR(&QTIME)
     CHGDTAARA DTAARA(QGPL/UPSDATE (1 12)) VALUE(&QDATE || &QTIME)
     /* Restart system */
     STRSBS SBSD(QINTER)
     STRSBS SBSD(QBATCH)
     STRSBS SBSD(QPGMR)
     STRPRTWTR DEV(PRT01) OUTQ(QGPL/QPRINT)
     /*.*/
     /*.*/
     /*.*/
   ENDDO

 /* Allocate the UPS message queue to this job */
   ALCOBJ OBJ((QGPL/UPSMSGQ *MSGQ *SHRRD))

 /* Monitor for Power signals                                     */
 /*                                                               */
 /* Set up loop to process the UPS Message Queue                  */
 /* 1. Wait on message for 5 minutes. If no message received      */
 /*    check to see if a TERMINATION has been requested. If so    */
 /*    shutdown the program.                                      */
 /* 2. If a POWER Status Change message is received we must       */
 /*    first receive all of the messages in the Queue. This is    */
 /*    in order to see the true status of the Power. (See UPS     */
 /*    Manual Page 2-23)                                          */
 LOOP: RCVMSG MSGQ(QGPL/UPSMSGQ) WAIT(300) MSGID(&MSGID)

 /* Check for TERMINATION request */
   IF COND(&MSGID *EQ ' ') THEN(DO)
     RTVJOBA ENDSTS(&ENDSTS)
     IF COND(&ENDSTS *EQ '1') THEN(DO)
       GOTO CMDLBL(ENDPGM)
     ENDDO
     GOTO CMDLBL(LOOP)
   ENDDO

 /* Find the last power warning message received */
 LOOP1: IF COND((&MSGID *EQ 'CPF1816') *OR +
                (&MSGID *EQ 'CPF1817')) THEN(DO)
     CHGVAR VAR(&SAVMSGID) VALUE(&MSGID)
   ENDDO
   RCVMSG MSGQ(QGPL/UPSMSGQ) WAIT(0) MSGID(&MSGID)
   IF COND(&MSGID *NE ' ') THEN(GOTO CMDLBL(LOOP1))

 /* Check power warning error code, if not power loss, continue */
   IF COND(&SAVMSGID *NE 'CPF1816') THEN(GOTO CMDLBL(LOOP))

 /* STEP 2: Power Loss Signal Processing                          */
 /*                                                               */
 /* First check to see if 10 hours has elapsed since power was    */
 /* restored. If not, batteries have not been fully recharged.    */
 /* Power down the computer with no restart.                      */
 /*                                                               */
 /* Place all batch job queues on hold to prevent new jobs from   */
 /* starting.                                                     */
 /*                                                               */
 /* Wait one minute to see if power is restored, if so, release   */
 /* batch job queues and continue processing.                     */
 /*                                                               */
 /* If one minute expires, power down the computer.               */
   RTVDTAARA DTAARA(UPSDATE (1 6)) RTNVAR(&DATE)
   RTVDTAARA DTAARA(UPSDATE (7 6)) RTNVAR(&TIME)
   RTVSYSVAL SYSVAL(QDATE) RTNVAR(&QDATE)
   RTVSYSVAL SYSVAL(QTIME) RTNVAR(&QTIME)

 /* Calculate difference between power restore and current failure */
   CLCDATDIF FROMDATE(&DATE) TODATE(&QDATE) NBROFDAYS(&DAYS)
   IF COND(&QTIME *LT &TIME) THEN(DO)
     CHGVAR VAR(&DAYS) VALUE(&DAYS - 1)
     CHGVAR VAR(&HOURS) VALUE(%SST(&QTIME 1 2))
     CHGVAR VAR(&HOURS) VALUE(&HOURS + 24)
     CHGVAR VAR(%SST(&QTIME 1 2)) VALUE(&HOURS)
   ENDDO
   CLCTIMDIF FROMTIME(&TIME) TOTIME(&QTIME) SECONDS(&SECONDSA)
   CHGVAR VAR(&SECONDS) VALUE(&SECONDSA)

 /* Within 10 hours of power restore: batteries not recharged */
   IF COND((&DAYS *EQ 0) *AND (&SECONDS *LT 36000)) THEN(DO)
     PWRDWNSYS OPTION(*IMMED) RESTART(*NO)
     GOTO CMDLBL(ENDPGM)
   ENDDO

 /* Hold ALL Batch Job Queues */
   HLDJOBQ JOBQ(QBATCH)
   MONMSG MSGID(CPF0000)
   HLDJOBQ JOBQ(QPGMR)
   MONMSG MSGID(CPF0000)

 /* Check to see if the power has been restored within 1 minute */
   RTVSYSVAL SYSVAL(QDATE) RTNVAR(&DATE)
   RTVSYSVAL SYSVAL(QTIME) RTNVAR(&TIME)
 LOOP2: CHGVAR VAR(&SAVMSGID) VALUE(' ')
   RCVMSG MSGQ(QGPL/UPSMSGQ) WAIT(0) MSGID(&MSGID)
 LOOP3: IF COND(&MSGID *EQ 'CPF1817') THEN(DO)
     CHGVAR VAR(&SAVMSGID) VALUE(&MSGID)
   ENDDO
   RCVMSG MSGQ(QGPL/UPSMSGQ) WAIT(5) MSGID(&MSGID)
   IF COND(&MSGID *NE ' ') THEN(GOTO CMDLBL(LOOP3))

 /* Power Restored, Release ALL Batch Job Queues */
   IF COND(&SAVMSGID *EQ 'CPF1817') THEN(DO)
     RLSJOBQ JOBQ(QBATCH)
     MONMSG MSGID(CPF0000)
     RLSJOBQ JOBQ(QPGMR)
     MONMSG MSGID(CPF0000)
     GOTO CMDLBL(LOOP)
   ENDDO

 /* Calculate time lapse since power failure */
   RTVSYSVAL SYSVAL(QDATE) RTNVAR(&QDATE)
   RTVSYSVAL SYSVAL(QTIME) RTNVAR(&QTIME)
   CLCDATDIF FROMDATE(&DATE) TODATE(&QDATE) NBROFDAYS(&DAYS)
   IF COND(&QTIME *LT &TIME) THEN(DO)
     CHGVAR VAR(&DAYS) VALUE(&DAYS - 1)
     CHGVAR VAR(&HOURS) VALUE(%SST(&QTIME 1 2))
     CHGVAR VAR(&HOURS) VALUE(&HOURS + 24)
     CHGVAR VAR(%SST(&QTIME 1 2)) VALUE(&HOURS)
   ENDDO
   CLCTIMDIF FROMTIME(&TIME) TOTIME(&QTIME) SECONDS(&SECONDSA)
   CHGVAR VAR(&SECONDS) VALUE(&SECONDSA)

 /* 1 minute has expired, take down the system */
   IF COND(&SECONDS *GE 60) THEN(DO)
     /* Send Message to any active users to signoff */
     SNDBRKMSG MSG('Power Failure occurred for more than one +
                 minute....System is powering down.') +
                 TOMSGQ(QSYSOPR)
     SNDMSG MSG('A power failure has occurred at the Lakewood +
              Office....Please sign off IMMEDIATELY. Thank you.') +
              TOUSR(*ALLACT)
     /* Cancel Interactive Subsystems with a delay of 5 minutes */
     ENDSBS SBS(QINTER) DELAY(300)
     ENDSBS SBS(QCMN) DELAY(300)
     ENDSBS SBS(QSPL) DELAY(300)
     /* Cancel Batch Subsystems with a delay of 10 minutes */
     ENDSBS SBS(QBATCH) DELAY(600)
     ENDSBS SBS(QPGMR) DELAY(600)
     /* Issue POWERDOWN Allow for 15 minutes */
     PWRDWNSYS DELAY(900) RESTART(*YES)
     /* Cancel all writers (helps out QSPL) */
     ENDWTR WTR(*ALL)
     GOTO CMDLBL(ENDPGM)
   ENDDO
   GOTO CMDLBL(LOOP2)

 ENDPGM: ENDPGM