RMF LPAR CPU Management

The z/OS Intelligent Resource Director (IRD) helps data centers move resources to where the workloads need them (even in different logical partitions). IRD combines the powers of three elements: Workload Manager (WLM), Parallel Sysplex, and PR/SM. It consists of three separate functions: dynamic channel path management (DCM), channel subsystem I/O priority queuing, and WLM LPAR CPU management.
The rest of this article focuses on this last function of the IRD, as it is the only one that directly affects RMF measurements related to CPU utilization in a PR/SM context.

Introduction to LPAR CPU Management

LPAR CPU management, available for LPARs with shared processors, consists of two parts:
  • WLM LPAR weight management, which dynamically adjusts the relative weights of logical partitions to help workloads that are missing their goals.
  • WLM LPAR Vary CPU management, which dynamically adjusts the number of online (actually assigned) logical processors in a partition, so that the CPU resource available to the LPAR matches the capacity required.
For an LPAR to be a candidate for WLM LPAR CPU management, it must:
  • be running z/OS in z/Architecture (64-bit) mode
  • be running in LPAR mode on an IBM System z
  • be using shared standard CP processors
  • not be hard capped
  • be in WLM goal mode
  • be in a Parallel Sysplex
WLM LPAR CPU management functions operate within an LPAR cluster, which is a set of logical partitions running on the same CPC and belonging to the same Parallel Sysplex. Other LPARs (outside of the cluster) are not affected.
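The eligibility criteria and the LPAR cluster definition above can be sketched as a simple check. This is only an illustration: the `Lpar` class, its field names, and the helper functions are hypothetical and do not correspond to any RMF, WLM, or MICS interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lpar:
    # Hypothetical description of a partition; field names are assumptions.
    name: str
    cpc: str                      # CPC the partition runs on
    sysplex: Optional[str]        # Parallel Sysplex name, or None if not in one
    zos_64bit: bool               # z/OS in z/Architecture (64-bit) mode
    lpar_mode_system_z: bool      # LPAR mode on an IBM System z
    shared_standard_cps: bool     # uses shared standard CP processors
    hard_capped: bool
    wlm_goal_mode: bool


def is_candidate(lpar: Lpar) -> bool:
    """Return True when every listed criterion is satisfied."""
    return (
        lpar.zos_64bit
        and lpar.lpar_mode_system_z
        and lpar.shared_standard_cps
        and not lpar.hard_capped
        and lpar.wlm_goal_mode
        and lpar.sysplex is not None
    )


def lpar_clusters(lpars):
    """Group candidate LPARs by (CPC, sysplex), i.e. by LPAR cluster."""
    clusters = defaultdict(list)
    for lpar in lpars:
        if is_candidate(lpar):
            clusters[(lpar.cpc, lpar.sysplex)].append(lpar.name)
    return dict(clusters)


lpars = [
    Lpar("SYSA", "CPC1", "PLEX1", True, True, True, False, True),
    Lpar("SYSB", "CPC1", "PLEX1", True, True, True, False, True),
    Lpar("SYSC", "CPC1", "PLEX1", True, True, True, True, True),   # hard capped
    Lpar("SYSD", "CPC2", "PLEX1", True, True, True, False, True),  # other CPC
]
print(lpar_clusters(lpars))
```

In this sketch SYSA and SYSB form one cluster, SYSD falls into a separate cluster because it runs on another CPC, and SYSC is excluded because it is hard capped.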

WLM LPAR Weight Management

In an LPAR environment, access to central processors is based on the active partitions' relative weights. When an LPAR is not managed by WLM, its processing weight is set when the partition is defined and can be modified at any time by the console operator. The problem is that, if the workload mix changes, the weights must be adjusted manually to optimize CPU resource utilization. This would require round-the-clock human monitoring of workload performance and an outstanding ability to anticipate workload variations, which is not an easy task in the new eBusiness world. Alternatively, you could configure enough capacity to handle peak periods, but outside of these periods the excess capacity goes unused and is therefore lost.
With WLM LPAR weight management, the relative shares of the participating partitions will be automatically adjusted, so that the LPARs hosting the most important workloads, based on WLM policy, get the necessary CPU resources.
This process occurs within an LPAR cluster, that is, a set of partitions explicitly defined as being managed by WLM, and the total weight of the cluster remains constant. This way, LPARs outside of the cluster are not affected.
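The constant-total property can be illustrated with a small sketch. It does not reproduce the WLM algorithm, which decides when and by how much to shift weight based on the service policy; the function names and the weight values below are assumptions made up for the example.

```python
def relative_shares(weights):
    """Share of each partition = its weight / sum of all weights."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def shift_weight(weights, donor, receiver, amount):
    """Move `amount` of weight from donor to receiver (illustrative only).

    The total weight of the cluster is unchanged, so partitions outside
    the cluster keep the same share of the CPC.
    """
    if amount < 0 or amount > weights[donor]:
        raise ValueError("amount must be between 0 and the donor's weight")
    adjusted = dict(weights)
    adjusted[donor] -= amount
    adjusted[receiver] += amount
    return adjusted


# Hypothetical cluster of two LPARs with a total weight of 500.
cluster = {"SYSA": 300, "SYSB": 200}
adjusted = shift_weight(cluster, donor="SYSA", receiver="SYSB", amount=50)
print(adjusted, sum(adjusted.values()))
print(relative_shares(adjusted))
```

After the shift, SYSA and SYSB each hold a weight of 250 and the cluster total is still 500.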

WLM LPAR Vary CPU Management

This function allows WLM to dynamically vary unneeded standard CP processors offline. From the RMF perspective, this means that a standard CP processor can be online to a partition for only a portion of the recording interval, instead of the complete interval as before. The immediate impact is that, for WLM-managed logical partitions, the reported number of standard CP processors for an LPAR is no longer the number of defined processors, but the number of online (in fact, actually assigned by WLM) standard CP processors. This value can be fractional.
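One way to see why the value can be fractional is to express the processor count as total online time divided by the interval duration. This is a sketch under that assumption; the function name and the sample numbers are hypothetical and are not taken from RMF or MICS.

```python
def average_online_cps(online_seconds, interval_seconds):
    """Average number of online CPs over one recording interval.

    Assumption: each entry of `online_seconds` is the time one logical
    processor was online during the interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return sum(online_seconds) / interval_seconds


# Hypothetical 900-second interval: two CPs online throughout,
# a third CP varied offline halfway through.
print(average_online_cps([900, 900, 450], 900))
```

With these sample values the LPAR has three defined processors, but the average number online during the interval is 2.5.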
The most important change, however, is in the definition of "CPU Time" and, hence, in the way CPU busy calculations are performed:
Before IRD:
  CPU Time = Interval Time - Wait Time
  CPU (%)  = CPU Time / (Interval Time * No. Processors) * 100
After IRD:
  CPU Time = Online Time - Wait Time
  CPU (%)  = CPU Time / Online Time * 100
The SMF type 70 subtype 1 CPU Activity record provides the time within the RMF interval during which a processor was online to the logical partition (SMF70ONT). From the MICS perspective, all data elements representing online times come from this raw field, which is always available, whether the LPAR is managed by WLM or not.
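The two CPU busy definitions can be compared with a short sketch. It assumes that the times are summed across the logical processors of the LPAR and that the online time of each processor is taken from a field such as SMF70ONT. The function names and sample values are hypothetical.

```python
def cpu_busy_before_ird(wait_times, interval_time):
    """CPU (%) = CPU Time / (Interval Time * No. Processors) * 100,
    with CPU Time = Interval Time - Wait Time for each processor."""
    n_processors = len(wait_times)
    cpu_time = sum(interval_time - wait for wait in wait_times)
    return cpu_time / (interval_time * n_processors) * 100


def cpu_busy_after_ird(online_times, wait_times):
    """CPU (%) = CPU Time / Online Time * 100,
    with CPU Time = Online Time - Wait Time for each processor."""
    if len(online_times) != len(wait_times):
        raise ValueError("one online time and one wait time per processor")
    total_online = sum(online_times)
    cpu_time = sum(o - w for o, w in zip(online_times, wait_times))
    return cpu_time / total_online * 100


# Hypothetical 900-second interval with two logical CPs.
# The second CP was online for only half of the interval.
online = [900.0, 450.0]   # seconds online (assumed to come from SMF70ONT)
wait = [300.0, 150.0]     # seconds waiting while online

print(round(cpu_busy_after_ird(online, wait), 2))
print(round(cpu_busy_before_ird(wait, 900.0), 2))
```

The first figure uses online time as the denominator. The second applies the older interval-based definition to the same wait times, which charges the second processor with 450 seconds of CPU time it could not have used while offline.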