In a recent article, Chris Walker introduced the IBM Service Management Suite for z/OS and showed how its products can be used to achieve operational excellence. Today, I’d like to take a closer look at IBM System Automation for z/OS, which is part of the Service Management Suite for z/OS. In this blog, I want to give you a brief overview of the product, share some great news, and offer a personal view of how IBM will evolve the product in the future.
What is System Automation?
IBM System Automation for z/OS is IBM’s premier product for automated operations and high availability in z/OS environments. It automates hardware and software tasks, from complete system IPL and shutdown to operating individual applications. System Automation exploits IBM Tivoli NetView for z/OS, which provides a powerful automation framework, especially for message management. When System Automation detects an application failure, it can quickly restart the application in place or on another system in the sysplex. These capabilities are also the basis for the disaster recovery solutions provided by IBM Geographically Dispersed Parallel Sysplex (GDPS).
A remarkable characteristic is that System Automation does not require user scripts to automate the resources on a system or within a sysplex. Instead, it uses a policy that defines the resources, their attributes, and their relationships to other resources in the policy. There is no limit on the size of this policy; some customers manage tens of thousands of resources in it.
System Automation drives the resources toward their desired status. This is also referred to as goal-oriented automation. Whenever the observed status of a resource differs from its desired status, System Automation takes corrective action. To decide exactly what to do, it considers the resource’s dependencies and the desired status of related resources.
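To make the idea of goal-oriented automation more concrete, here is a minimal Python sketch of a reconciliation loop. It is a conceptual model under simplifying assumptions, not how System Automation is implemented or configured: the `Resource` class, the `reconcile` function, and the two-value UP/DOWN status are hypothetical. The loop starts a resource only once its prerequisites are UP and stops it only once its dependents are DOWN.

```python
# Conceptual model of goal-oriented automation. Hypothetical names only;
# this is not a System Automation for z/OS API.
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    desired: str                 # goal status: "UP" or "DOWN"
    observed: str = "DOWN"       # status currently observed
    depends_on: list = field(default_factory=list)  # must be UP before this starts


def reconcile(resources):
    """Return the START/STOP actions that move observed status toward desired status.

    A resource is started only after all resources it depends on are UP,
    and stopped only after all resources that depend on it are DOWN.
    Observed status is updated as if every action succeeded immediately.
    """
    by_name = {r.name: r for r in resources}
    actions = []
    progress = True
    while progress:              # repeat until no further action is possible
        progress = False
        for r in resources:
            if r.observed == r.desired:
                continue
            if r.desired == "UP":
                ready = all(by_name[d].observed == "UP" for d in r.depends_on)
            else:
                dependents = [o for o in resources if r.name in o.depends_on]
                ready = all(o.observed == "DOWN" for o in dependents)
            if ready:
                actions.append(("START" if r.desired == "UP" else "STOP", r.name))
                r.observed = r.desired
                progress = True
    return actions


if __name__ == "__main__":
    db2 = Resource("DB2", desired="UP")
    cics = Resource("CICS", desired="UP", depends_on=["DB2"])
    print(reconcile([cics, db2]))  # DB2 is started before CICS
```

Setting both hypothetical resources back to DOWN makes the same loop stop CICS before DB2. No start or stop sequence is written down anywhere; the order falls out of the goal and the dependencies, which is the essence of the policy-based approach.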
What makes up System Automation?
To answer that question, I would like to use the metaphor of a house. Over the past releases, System Automation has evolved in a way that balances the different needs of its customers. This balance is necessary to keep the architecture stable and symmetric.
The foundation consists of core automation capabilities that enable System Automation to run on today’s mainframe servers. Recent examples are day-1 support for the latest IBM Z hardware and z/OS software, including middleware. In addition, continuous incremental enhancements in NetView keep this foundation stable and prepared for the future.
Staying with the metaphor, one side of the house is built on key productivity capabilities that are unique in the market and that distinguish System Automation from other automation products. Two recent examples are run modes and pacing gates.
- With run modes, you can define different sets of resources that are permitted to be active at a given time, and operators can switch between run modes with a single command. A typical example is a run mode BASIC and a run mode COMPLETE. The run mode BASIC encompasses only the resources needed for basic activities, such as maintenance. The run mode COMPLETE, on the other hand, encompasses the resources in BASIC plus those that represent production work, such as CICS or Db2. By using the INGRUN command to set the run mode to BASIC, the operator can easily switch a system from production to maintenance level.
- Pacing gates let you limit the number of resources that are started or stopped at the same time. If certain resources consume a lot of CPU while they start or stop, throttling them slightly leaves enough CPU for other resources, which can reduce overall IPL and shutdown times. At first sight, this seems to contradict the goal of handling all resources as fast as possible. In fact, it is a very powerful means of orchestrating these busy phases, because it avoids artificial workarounds to achieve the same result.
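Both capabilities above can be illustrated with a small simulation. The Python sketch below is a conceptual model only: `RUN_MODES`, `desired_status`, `paced_start_times`, and all resource names and durations are hypothetical, and System Automation does not expose a Python interface. It shows a run mode as the set of resources allowed to be UP, and a pacing gate as a cap on concurrent starts where the next start begins as soon as a slot frees up.

```python
# Conceptual model of run modes and pacing gates. All resource names,
# RUN_MODES, and both functions are hypothetical illustrations; in
# System Automation these concepts live in the automation policy.
import heapq

# Hypothetical run modes: COMPLETE contains everything in BASIC plus production work.
RUN_MODES = {"BASIC": {"JES2", "VTAM", "TCPIP"}}
RUN_MODES["COMPLETE"] = RUN_MODES["BASIC"] | {"CICS1", "CICS2", "DB2A"}


def desired_status(all_resources, run_mode):
    """Resources in the active run mode should be UP; all others DOWN."""
    allowed = RUN_MODES[run_mode]
    return {r: ("UP" if r in allowed else "DOWN") for r in all_resources}


def paced_start_times(durations, gate_limit):
    """Simulate a pacing gate that lets at most gate_limit resources start at once.

    durations maps resource name -> seconds it needs to come UP.
    The next resource begins as soon as one of the running starts completes.
    Returns resource name -> simulated start time in seconds.
    """
    in_progress = []             # min-heap of finish times of starts in progress
    start_times = {}
    clock = 0
    for name in sorted(durations):
        if len(in_progress) == gate_limit:
            # Gate is full: wait until the earliest start in progress completes.
            clock = max(clock, heapq.heappop(in_progress))
        start_times[name] = clock
        heapq.heappush(in_progress, clock + durations[name])
    return start_times


if __name__ == "__main__":
    # Switching a system to BASIC: production resources get desired status DOWN.
    print(desired_status(RUN_MODES["COMPLETE"], "BASIC"))
    # Three starts through a gate of size 2: the third waits for a free slot.
    print(paced_start_times({"CICS1": 30, "CICS2": 30, "DB2A": 60}, gate_limit=2))
```

With a gate limit of 2, the hypothetical DB2A start waits until one of the two CICS starts has finished, so at most two resources compete for CPU at any time.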
The other side of the house consists of capabilities that I would categorize as Enterprise Usage. These are functions that address the specific needs of large customers but can benefit all customers. Examples from the past are System Automation’s contribution to GDPS and its automation support for IMSplex environments. System Automation’s concept of resource-level security also belongs to Enterprise Usage. Banks and other companies in the financial sector in particular have a strong demand to restrict operators’ access to only the resources that they are permitted to operate. These days, however, security affects every client, small and large.
Finally, the roof of the house is Modernization: the effort to simplify the use of the product. The motto of the Service Management Suite for z/OS applies to System Automation as well: Easier for the Beginner, Faster for the Expert. An example in this category is the development team’s ongoing work to extend the set of best practice policies. From the base operating system stack, to middleware like Db2, to individual products like IBM z/OS Connect EE or IBM OMEGAMON for JVM, you can find sample policies for most typical software products. You can import them into your own policy and adapt them to the naming conventions of your environment.
What is new in System Automation?
In March 2017, a new version of System Automation, V4.1.0, was released. It again includes many unique capabilities that can lead to a higher and more robust degree of automation while greatly enhancing the productivity of its users. Further enhancements, delivered in September 2017, round out these new functions. So, what are the highlights?
Core capabilities – New Hardware Support and Faster Problem Isolation
2017 is the year of the IBM z14™, the newest member of the IBM Z® family. As in the past, System Automation provides day-1 toleration support for the new server. This support includes an updated hardware API for customers who use the System Automation BCP internal interface for GDPS or the SNMP-based Processor Operations component.
Occasionally, automated resources are in a status that operators cannot explain. In the past, such situations often required expert skills. Now, System Automation takes a different approach: the new problem isolation function INGWHY explains the situation, its causes, and possible courses of action at a glance. This makes INGWHY a major leap toward ease of use for System Automation. It is a foundational utility for small and enterprise customers alike, and even experienced operators can use it to quickly understand why a resource is not in its expected status. Combined with the ability to create tailored, installation-specific actions, this can make INGWHY a good fit for your environment and considerably increase operator productivity.
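One ingredient of this kind of problem isolation is walking a resource’s dependencies to find the prerequisite that blocks it. The Python sketch below illustrates only that idea; it is a hypothetical model and not INGWHY’s actual algorithm or output format, and it ignores the many other factors a real analysis would consider. The function name `why_not_up` and the resource names are invented for illustration.

```python
# Conceptual model of dependency-based problem isolation. This is NOT
# INGWHY's algorithm or output format; names are hypothetical.


def why_not_up(name, observed, depends_on):
    """Walk the start dependencies of a resource and collect reasons why it
    cannot reach UP: every prerequisite that is not UP is reported."""
    reasons = []
    seen = set()

    def check(res):
        if res in seen:          # guard against cycles and shared prerequisites
            return
        seen.add(res)
        for dep in depends_on.get(res, []):
            status = observed.get(dep, "UNKNOWN")
            if status != "UP":
                reasons.append(f"{res} waits for {dep}, which is {status}")
            check(dep)

    check(name)
    return reasons


if __name__ == "__main__":
    observed = {"CICS1": "DOWN", "DB2A": "DOWN", "IRLM": "UP"}
    depends_on = {"CICS1": ["DB2A"], "DB2A": ["IRLM"]}
    for reason in why_not_up("CICS1", observed, depends_on):
        print(reason)            # CICS1 waits for DB2A, which is DOWN
```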
Key Productivity Capabilities – Suspend / Resume
The demand for automation keeps growing. At times, however, you want to keep automation out of the picture. Take planned maintenance windows, for instance, where subject matter experts (SMEs) want to manage CICS regions manually. Or consider a new IMS subsystem that you want to prepare in the policy now, although it is not scheduled for production until a month from now.
Wouldn’t it be great if you could simply suspend automation for your CICS environment without impacting the automation of any other workload and without alerting the operations bridge? Wouldn’t it be even better if you could plan the automated resources for IMS in advance and prepare everything within System Automation, so that you only need to bring these new resources under automation when the time comes?
This is now possible with System Automation’s Suspend/Resume capability. An operator or SME can suspend automation with the new INGSUSPD command, either for a single resource or for a set of resources including their dependent resources. While resources are suspended, they are not started or stopped automatically, and messages they issue do not trigger any automation. Operators are always presented with a satisfactory resource status, so they are not alerted. When a suspended resource is started or stopped manually from the console, System Automation tracks the resource’s status but does not react to it.
To prepare automation for one or more new resources, administrators define these resources as they do today and, in addition, list them in a special data set. When a new automation configuration is loaded, the resources listed in this data set are suspended and remain suspended until an operator explicitly resumes them.
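The behavior described above can be summarized in a small state model. The following Python sketch is a hypothetical illustration, not how INGSUSPD works internally: `AutomationModel`, its methods, the resource names, and the assumption that every resource should be UP are all invented for this example. It shows the three rules from the text: suspension can cover dependents, suspended resources are still tracked but never acted on, and a list of resources can start out suspended when a configuration is loaded.

```python
# Conceptual model of suspend/resume semantics. Class and method names
# are hypothetical; this is not how INGSUSPD is implemented.


class AutomationModel:
    def __init__(self, dependents, initially_suspended=()):
        # dependents: resource -> resources that depend on it
        self.dependents = dependents
        # Stands in for the special data set: resources listed there are
        # suspended when a new automation configuration is loaded.
        self.suspended = set(initially_suspended)
        self.observed = {}

    def suspend(self, name, include_dependents=False):
        """Suspend one resource, optionally with everything that depends on it."""
        pending, seen = [name], set()
        while pending:
            res = pending.pop()
            if res in seen:
                continue
            seen.add(res)
            self.suspended.add(res)
            if include_dependents:
                pending.extend(self.dependents.get(res, []))

    def resume(self, name):
        self.suspended.discard(name)

    def on_status_change(self, name, status):
        """Record the new status and return the automation reaction, if any.

        Assumes every resource has desired status UP, so an unexpected DOWN
        would normally trigger a restart.
        """
        self.observed[name] = status     # status is always tracked
        if name in self.suspended:
            return None                  # suspended: follow the status, never react
        if status == "DOWN":
            return ("RESTART", name)
        return None


if __name__ == "__main__":
    model = AutomationModel({"CICSTOR": ["CICSAOR1", "CICSAOR2"]},
                            initially_suspended=["IMSNEW"])
    model.suspend("CICSTOR", include_dependents=True)
    print(model.on_status_change("CICSAOR1", "DOWN"))   # None: suspended
    model.resume("CICSAOR1")
    print(model.on_status_change("CICSAOR1", "DOWN"))   # ('RESTART', 'CICSAOR1')
```

Keeping the observed status up to date even while a resource is suspended matters: when it is resumed, automation can immediately compare the real status with the goal instead of acting on stale information.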
In the picture below, suspended resources are shown in turquoise. Suspension can be toggled using the line commands S (Suspend) and R (Resume).
Enterprise Usage – Cross Sysplex Automation
Most enterprise customers have multiple sysplexes, often with multi-tiered applications spread across them that have mutual dependencies. Such applications are subject to the same rules as many started tasks in a single z/OS system: start and stop sequences matter. Violating these rules can lead to errors and to alerts at the operations bridge. At a minimum, this causes confusion, but often manual service restoration is required as well.
To support smooth operations across your mainframe environment, even across sysplex boundaries, System Automation now provides a cross-sysplex automation capability. It allows you to define domains in your automation policy that represent the (remote) sysplexes within the span of control of the local, managing sysplex. In addition, you specify resource references, which represent ordinary System Automation resources in a remote sysplex, and associate them with a domain. The same relationship and grouping concepts apply to resource references, so you can exploit all the capabilities that you already know.
With such a policy in effect, System Automation can honor start and stop dependencies between resources automated in different sysplexes. A typical example is an ISV log management product that consists of a single master and many client components spread across all z/OS systems in the enterprise. Stopping the master for planned maintenance produces alarms on the client systems when they attempt to contact it, because they assume the master is running. With cross-sysplex automation, the clients can be defined as child resources of the master, so stopping the master automatically stops the clients first. This eliminates false alarms and avoids manual coordination of such activities across possibly different organizations in your enterprise.
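The stop ordering in this log management example amounts to a post-order traversal of the parent/child relationships, regardless of which sysplex each child lives in. The Python sketch below is a conceptual model under that assumption; the domain names PLEXA to PLEXC, the resource names, `CHILDREN`, and `stop_sequence` are hypothetical and do not correspond to any System Automation interface.

```python
# Conceptual model of cross-sysplex stop ordering. The domain names
# (PLEXA..PLEXC) and resource names are hypothetical examples.

# Resources are qualified by the sysplex domain that automates them.
# children: parent -> child resources that must be stopped before the parent.
CHILDREN = {
    ("PLEXA", "LOGMASTER"): [
        ("PLEXA", "LOGCLNT1"),
        ("PLEXB", "LOGCLNT2"),   # resource reference in a remote sysplex
        ("PLEXC", "LOGCLNT3"),   # resource reference in a remote sysplex
    ],
}


def stop_sequence(target, children):
    """Return a stop order in which every child, local or remote, stops
    before its parent (post-order depth-first traversal)."""
    order, seen = [], set()

    def visit(res):
        if res in seen:
            return
        seen.add(res)
        for child in children.get(res, []):
            visit(child)
        order.append(res)        # appended only after all its children

    visit(target)
    return order


if __name__ == "__main__":
    for domain, resource in stop_sequence(("PLEXA", "LOGMASTER"), CHILDREN):
        print(f"stop {resource} in {domain}")
```

Because remote clients are modeled exactly like local ones, the master always comes last in the sequence, which is what prevents the clients from raising false alarms while it is down.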
Operators work with resource references in the same way as with any other resource. They can either use the traditional panels within the NetView environment or, much more easily, log on to Service Management Unite, which is explained next.
Modernization – Service Management Unite
Finally, System Automation customers are now also entitled to use Service Management Unite (SMU) for Automation as their new user interface for operators. SMU provides a graphical, web-based user experience that is simple, intuitive, and modern. It allows operators to easily determine the health of automated resources and to restore service for the whole enterprise from a single screen. There are individual dashboards for different tasks, and users can easily navigate between them. On each dashboard, users find all information relevant to that task on a single pane, in tabular or graphical views (for example, topology views). Using the context menu, operators can pick an action for the selected resource, for example, to stop it, to view NetView CANZLOG data, or to define a new planned schedule.
Another characteristic of SMU is that its dashboards can be customized. You can create dashboards for managers that present aggregated information in colored status gauges. Operators and subject matter experts, on the other hand, can see the information in a table or as a topology chart. For other users, information can be filtered so that they see only what they need to be concerned with. Dashboard content is not limited to System Automation data; dashboards can also include information from other IBM or non-IBM products or from custom data sources, such as a database.
All of this is possible because the underlying framework, JazzSM and DASH, already provides these capabilities for free. If you have IBM Netcool OMNIbus or IBM Workload Scheduler, you can even add SMU to the very same server that you use for those products.
To give you an impression of what a typical SMU dashboard looks like, have a look at the picture below. It shows INGWHY’s problem isolation capabilities, described above, rendered in a dashboard.
What are we going to focus on next?
In the future, we will continue on this course and provide focused capabilities that address the needs of all our customers. This way, we will make sure that the automation house continues to stand on solid ground and is ready to face tomorrow’s challenges.
Upcoming development efforts continue to modernize the product, focusing on application groups and message management. If you want to learn more, I would like to invite you to join our Early Access Program. In this program, you can get first-hand insights from the developers and give feedback. If you like, you can even become a sponsor user to influence the journey of System Automation more deeply.
Be sure to subscribe to our bi-monthly zITSM newsletter to stay informed of the latest announcements, upcoming events, educational material, and product introductions and updates. Subscribe at IBM.biz/zSubscribe.
Also, check out Doc Buddy v2.0, which now includes top industry news, thought leadership articles, and more.