Autonomic Computing


"Autonomic" as defined by Webster:
1. Of, relating to, or controlled by the autonomic nervous system.
2. Acting or occurring involuntarily.

"Autonomic" as defined by IBM:
An approach to self-managed computing systems with a minimum of human interference. The term derives from the body's autonomic nervous system, which controls key functions without conscious awareness or involvement.

Autonomic computing is a component of IBM's greater On Demand initiative, and despite what its marketing suggests, on-demand is not something IBM invented. It's what the entire computing world has been moving toward for years, and it will likely be what computers truly provide in the not-so-distant future. All IBM did was crystallize this concept and center its marketing messages around it. "On demand" has all kinds of meanings in computing systems and components, but if the literature and the spin get too confusing, simply go back to the most fundamental definition of on-demand: "I want what I want when I want it." Vendors are endeavoring to provide systems and components that deliver to people and enterprises whatever they need whenever they need it.

Leaps in technology have allowed us to get a glimpse of what true on-demand can mean. It is likely that in our lifetime complex computing systems will to a great degree manage themselves. In our children's lifetime, it is possible that they will be 100% self-managing. But for the present, what on-demand promises to bring to the average IT shop is a tangible speeding up of the delivery of computing needs to stakeholders, as well as increasing levels of reliability.

Again, a key component of on-demand is autonomics: the ability of computing systems to essentially maintain themselves and work optimally with little or no operator input. Just like the autonomic functions of the body, such as breathing, heart rate, metabolism, and immunity, autonomic functions in computing remove the need for operators to consciously monitor and maintain processes, usually because manual oversight is less efficient. As with the body, some processes work more efficiently and reliably if they don't have to wait for and rely upon manual input. Think about it. If we had to be constantly making decisions about our heart rate, breathing rate, digestion processes, blood pressure, immune functions, and the myriad other systems of the body, we would have no time for anything else. Plus, we would regularly get it wrong, and we would definitely become incredibly neurotic.

Frankly, within computing environments, human error is a significant cause of problems. A real benefit of automating operations tasks is that operators and administrators can shift their time and attention to higher-value tasks. Yes, it's possible that autonomic functions may put people out of jobs, but it's more likely that these skills would be used for more important activities--like finding ways to get the most out of the company's expensive ERP system.

In 2001, Paul Horn, Senior Vice President of IBM Research, issued an "autonomic manifesto." It is prefaced on IBM's research Web site as follows: "The growing complexity of the IT infrastructure threatens to undermine the very benefits information technology aims to provide. Up until now, we've relied mainly on human intervention and administration to manage this complexity. Unfortunately, we are starting to gunk up the works.... The information technology boom can only explode for so long before it collapses on itself in a jumble of wires, buttons and knobs. IBM knows that increasing processor might, storage capacity and network connectivity must report to some kind of systemic authority if we expect to take advantage of its potential. The human body's self-regulating nervous system presents an excellent model for creating the next generation of computing, autonomic computing."

On-demand systems must consistently accomplish two key goals in order to deliver their potential: prevention and efficiency. Prevention proactively detects and corrects situations that could cause a computing component to no longer be available. Efficiency ensures that available computing resources are always put to the best use. Autonomics is the critical ingredient or "grease" that makes on-demand possible, because it enables the rapid execution of on-demand capabilities.

As we explore the functions of autonomic computing, you will see that many autonomic functions have long been included in computing components. Things like redundant parts, firewalls, and virus scanners are some of the better-known computing functions with autonomic capabilities. But many others that are less well-known, like virtualization, provisioning, and capacity on demand (COD), are becoming pivotal functions in autonomic computing architectures. More about these shortly.

Prevention and Efficiency

IBM's expositions on autonomic computing characteristics present the concept in four distinct categories or quadrants: self-configuring, self-optimizing, self-protecting, and self-healing (Figure 1).

http://www.mcpressonline.com/articles/images/2002/Autonomic%20article%20061704%20V400.png

Figure 1: IBM divides autonomic computing concepts into quadrants.

To best explain autonomics in this short space, however, I'd like to group these categories in the context of the two key goals of on-demand computing: prevention and efficiency. Keep in mind that my descriptions of autonomic functions within each category don't correlate exactly with IBM's; I have only shifted things around a bit for the purpose of clarity and brevity.

When examining autonomic functions, it is sometimes easier to first think about prevention, then efficiency.

Prevention

It goes without saying that the ability of computing systems to be efficient is considerably undermined if they are unexpectedly taken offline. So first and foremost, the job of autonomics is prevention--a strong defense. It is the same within the human body. There are so many amazing things that the human body does to self-regulate systems, but they are all compromised if the immune system, or self-protection layer, is poor.

Within the category of prevention are IBM's "self-protecting" and "self-healing" autonomic characteristics. These, of course, generate great interest among computing managers because of their direct relationship to the prevention or reduction of downtime, which makes for happy managers, users, and customers.

Self-Protecting
The first line of defense is to have technology that prevents problems from happening in the first place. It's the well-worn adage: "An ounce of prevention is worth a pound of cure." Of course, in the human body, the immune system is the self-protecting layer; it's what guards the body from the barrage of germs, viruses, and microbes that are encountered every day.

The most obvious and well-known self-protecting functions in computing systems are security-related. These include firewalls and anti-virus software as well as functions such as cross-system authentication, VPNs, and digital certificates.

Other examples of self-protecting technologies found in hardware components are redundant power that prevents an outage due to power failure; redundant cooling that prevents an outage due to a cooling failure; dual power cords, which enable dual source power to the server; hot-plug power, which allows the replacement of failing power supplies during normal system operation; hot-plug cooling, which allows the replacement of failing fans/chillers during normal system operation; and mirrored disk drives, which ensure that systems continue to run even if a drive fails.

Emerging technologies can also deal with potentially overwhelming bursts of activity. For instance, say an article runs in Time magazine, extolling your small company and its recent IPO, and a million people hit your relatively small-capacity Web site all at once. Functions are available to manage or deflect this barrage of activity, thus preventing your site from crashing.
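One common way to absorb or deflect such a burst is rate limiting. The sketch below uses a token bucket, which is my choice of illustration and not a technique the products above are known to use; the class name, capacity, and refill rate are all hypothetical.

```python
class TokenBucket:
    """Admit requests at a sustained rate while absorbing short bursts."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity                  # hypothetical: largest burst admitted at once
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)             # start with a full bucket
        self.last_time = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (in seconds) is admitted."""
        elapsed = max(0.0, now - self.last_time)
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_time = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller can queue or politely reject the request


bucket = TokenBucket(capacity=3, refill_per_second=1.0)
burst = [bucket.allow(0.0) for _ in range(5)]   # five hits in the same instant
later = bucket.allow(2.0)                       # one more hit two seconds later
```

Requests that are refused can be queued or answered with a "try again shortly" page, which keeps the site up while the spike passes.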

Excellent examples of the use of self-protecting technologies appear in high availability software, which uses a second server to keep copies of applications synchronized with the production server in real time. That redundancy of data on its own provides self-protecting capabilities; however, when IBM's clustering technologies are integrated, the high availability software can automatically trigger a failover to the mirrored system if error messages are received that indicate a high probability of a component failure. This functionality can automatically switch processes and users to the second machine, thus preventing or minimizing downtime. In some circumstances, it can even bring users back to the transactions they were updating prior to the failure.
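The failover trigger described above can be sketched roughly as follows. This is a minimal illustration, not how any particular high availability product or IBM's clustering works; the severity labels, the threshold, and the class name are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    threshold: int = 3          # hypothetical: critical errors tolerated before switching
    active: str = "production"
    critical_errors: int = 0
    log: list = field(default_factory=list)

    def report(self, severity):
        """Record one error message and switch to the backup once the threshold is reached."""
        if severity == "critical":
            self.critical_errors += 1
        if self.active == "production" and self.critical_errors >= self.threshold:
            self.active = "backup"
            self.log.append("failover: production -> backup")
        return self.active


monitor = FailoverMonitor(threshold=2)
states = [monitor.report(s) for s in ("warning", "critical", "critical", "critical")]
```

A real product would weigh the type of error message instead of simply counting, but the shape is the same: watch the messages, and switch users before the component actually fails.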

Self-Healing
It is nothing new to have computing components detect failures, errors, or other problems, but it is particularly beneficial to have intelligence built into components that can automatically correct a problem when it occurs. Better yet is the ability for potential problem situations to be detected and corrected even before the problem happens.

A good example of a self-healing computing component would be memory modules that detect and remove faulty sectors. For instance, error correcting code (ECC) memory and caches can detect and fix soft or hard failures. Furthermore, "Chipkill" memory technology automatically and transparently removes a failing dual in-line memory module (DIMM) from the configuration and substitutes a spare DIMM in its place.

For years, certain types of disk drives have displayed self-healing abilities through RAID technology, which allows data to be automatically reconstructed on alternate drives if an individual disk failure occurs.
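As an illustration of how that reconstruction can work, here is a small sketch of parity-based rebuilding, the idea behind RAID levels such as RAID 5. It assumes equal-length blocks and a single failed drive, and the function names are mine, not part of any RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_of(data_blocks):
    """Compute the parity block stored alongside the data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors and the parity."""
    return xor_blocks(surviving_blocks + [parity])


drives = [b"AUTO", b"NOMI", b"C!!!"]
parity = parity_of(drives)
recovered = rebuild([drives[0], drives[2]], parity)   # pretend the second drive failed
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the block that was lost.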

Another good example of self-healing capabilities appears in some high availability software products that are able to detect that an object on the backup system has gotten out of synchronization with the production system. Instead of just notifying an operator that a problem exists, self-healing functions automatically resynchronize the object.
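A rough sketch of that detect-and-resynchronize behavior follows. It assumes objects can be compared by a SHA-256 digest and that both systems are modeled as in-memory dictionaries; real products track changes very differently, and the names here are hypothetical.

```python
import hashlib


def digest(data):
    """Return a fingerprint used to compare two copies of an object."""
    return hashlib.sha256(data).hexdigest()


def resynchronize(production, backup):
    """Copy any object whose backup copy is missing or differs; return the repaired names."""
    repaired = []
    for name, data in production.items():
        if name not in backup or digest(backup[name]) != digest(data):
            backup[name] = data          # heal the copy instead of only raising an alert
            repaired.append(name)
    return repaired


production = {"orders": b"v2", "customers": b"v1"}
backup = {"orders": b"v1", "customers": b"v1"}
first_pass = resynchronize(production, backup)
second_pass = resynchronize(production, backup)
```

The second pass finds nothing to repair, which is the point: the operator never has to be involved.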

Efficiency

Once computing components can reliably protect themselves and quickly heal themselves (or be quickly and easily fixed by technicians), the next step in the hierarchy of autonomic computing needs is to have these components automatically adjust themselves to work as efficiently as possible. IBM's autonomic characteristics as they relate to efficiency are "self-configuring" and "self-optimizing" functions.

It is much the same in the human world. Once your own basic needs of sustenance and protection are covered (i.e., food, clothing, shelter), then you can start focusing on making yourself more comfortable by working to create a life that's easier and more efficient. Within the autonomic systems of the human body, the same applies: If the immune system is strong (self-protecting, self-healing), then other autonomic systems in the body can adjust themselves in order for the body to work more efficiently (self-configuring, self-optimizing). If the immune system is compromised, however, the body is continually assaulted by disease, and autonomic systems "hunker down" in a sort of survival mode. In that mode, the body gains little by devoting resources to working more efficiently.

Self-Configuring
Just as the body automatically adapts to the introduction of new things, from foods and climates to drugs and transplanted organs, computing systems must automatically adapt to the introduction of new components. This is particularly critical in complex computing environments. The sheer number of variables in these complex environments demands self-configuration. But self-configuration of computing components is really nothing new. Take personal computers: When you install a new piece of software, usually all you need to do is click "Install Now," and off it goes. Behind the scenes, the software automatically installs registry entries, drivers, plug-ins, etc. in order for the software to integrate as seamlessly as possible into the computing environment. In fact, some components, such as anti-virus software or even operating systems, not only self-configure new updates, they also automatically go out to a Web site and download updates--all without any user intervention.

Self-configuring abilities don't apply just to the introduction of new components; they also mean that environments adapt as security needs and workloads change or when components fail.

Self-Optimizing
This category is where efficiency starts to pay off in spades. When computing systems automatically adjust the allocation of resources, efficiency can dramatically improve.

A key driver for self-optimization is a predefined policy that specifies how resources are to be allocated and under what circumstances. This can be either operator-defined or "learned" by the computing resource(s). For instance, complex systems have myriad parameters that must be tuned correctly for the systems to perform at their peak. Autonomic processes monitor the systems and "learn" the most appropriate choices before beginning to tune parameters.

One technology that is getting a good deal of attention is "virtualization," which takes a pool of resources and dynamically allocates them based on the greatest need. Virtualization is sometimes the best way to get the most out of resources, particularly in complex environments. For instance, disk virtualization divides a large disk resource into separate virtual drives, each having capacities of space that can be dynamically adjusted to accommodate fluctuations in demand. It can also do the opposite: take a whole bunch of disk drives and put them together as a single virtual drive. (See "The Power of Storage Virtualization.")

Another self-optimizing capability is "provisioning," which automatically allocates additional increments of available resources as needed--again, based on policies that are predefined or "learned" by the systems.
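Policy-driven provisioning of this kind might look something like the sketch below. The high-water mark, the increment size, and the function name are assumptions chosen for illustration; they are not taken from any IBM product.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    high_water: float = 0.80   # hypothetical: provision when utilization exceeds this
    increment: int = 10        # hypothetical: units added per provisioning step


def provision(allocated, demand, pool_free, policy):
    """Add increments from the free pool until utilization falls to the policy limit.

    Assumes `allocated` is greater than zero. Returns (allocated, pool_free).
    """
    while pool_free >= policy.increment and demand / allocated > policy.high_water:
        allocated += policy.increment
        pool_free -= policy.increment
    return allocated, pool_free


busy = provision(allocated=100, demand=95, pool_free=30, policy=Policy())
quiet = provision(allocated=100, demand=50, pool_free=30, policy=Policy())
starved = provision(allocated=100, demand=200, pool_free=10, policy=Policy())
```

In the busy case, two increments are added before utilization drops below 80 percent; in the starved case, the pool runs dry first, which is the kind of situation that should still alert an operator.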

Among IBM midrange servers, the Capacity Upgrade on Demand (CUoD) function has gotten a good deal of attention in the last few years. This self-optimizing feature takes additional processors that are shipped with the system and enables them (either automatically or manually) when additional horsepower is needed. Of course, there is a cost each time the auxiliary processors are engaged, but having this capacity in reserve has proven to be a tremendous asset for companies that experience significant workload spikes.

Other self-optimizing features include dynamic partition creation, self-learning databases, automatic sending of alerts to wireless devices, dynamic adjustment of job priorities, scheduling of jobs based on predetermined events, and much more.

The Dark Side of Autonomics

As systems get more complex, automated functions and artificial intelligence must take over to keep systems online and reduce vulnerability to the caprices of manual intervention. But as it is with any technology, there is a dark side to automating processes.

Arguably, the largest fear factor of autonomics is relinquishing control, not knowing whether the automation is going to cause something to go completely wrong and make a worse mess than could ever be caused by manual operations. Therefore, it is vital that you incorporate only proven autonomic technologies into your systems, and even then, these should probably be added gradually so that you can achieve a level of confidence before bringing on the next layer of automation.

Another real problem is the loss of hands-on experience with hardware and software. This rears its head when complex system management tasks need to be done. Software and hardware engineers typically automate the easiest tasks first, which means that the more complex ones are saved for operators. Because operators lose the practice and familiarity that comes from repeatedly performing the easier tasks, the complex tasks become more difficult to grasp and more prone to error.

Of course, a very real negative is that operations jobs are inevitably lost to automation. But as mentioned earlier, it is more likely that the skills and talents of these operators will be reallocated to higher-value IT tasks in organizations.

Think It. Have It. Not Yet.

One thing is for sure: Autonomic and on-demand computing capabilities are going to continue to increase exponentially in the coming years. It could be that sometime in the future, on-demand computing will truly become as simple as always having what you want when you want it. But don't start naming your computer "Hal" quite yet.

Bill Rice is a freelance technology writer and marketing consultant.
