The SLM View
(From 20.3.1) In the Service-Level Management (SLM) view, you create service-level agreements (SLAs) and their component service-level objectives (SLOs) and quality of service (QoS) constraints. With this tool, you can build powerful, extensible, and measurable agreements with clients. Once you define SLAs in the SLM view, data is recorded and compliance is computed automatically.
You must have the SLM Admin permission set in the Access Control List (ACL) to view the contents of the SLM interface. If you do not have this permission, or if you are an account user (regardless of permissions), a "Permission Denied" message appears when you try to open the SLM view.
The following illustration provides an overview of the information covered in this article:
About SLM and SLAs
SLM is an industry-standard framework for managing network and application services. SLM uses a hierarchical set of measurable criteria to monitor and ensure the validity of SLAs between customers and service providers.
The components of SLM form the following hierarchy:
- Service level agreements (SLAs). SLAs typically define a service provider's hours of operation, maintenance windows, uptime guarantees, timeliness in responding to issues, recovery aspects, and service performance. Operational aspects of SLAs are defined in one or more SLOs.
- Service level objectives (SLOs). SLOs are specific measurable characteristics of the SLA, such as availability, throughput, frequency, response time, or quality. System component measurements that support SLOs are defined in one or more quality of service (QoS) constraints.
- Quality of Service (QoS) constraints. QoS constraints specify source, target, threshold, and operating period settings, and are combined to produce the SLO achievement value.
Compliance Percentage and Compliance Period
A compliance percentage is the percentage of time that QoS constraints meet the defined thresholds for the QoS object and its SLA. SLM checks each data sample for a defined QoS object, compares the value to the defined threshold, classifies the sample as failed or successful, and calculates the percentage of samples that exceed ("breach") the threshold. The compliance percentage is 100% minus that breach percentage.
Compliance percentage is calculated according to the threshold appropriate to the QoS object: Some QoS constraints require a minimum performance measure (such as speed), a maximum performance measure (such as capacity), or a numerical performance measure (such as queue length). Thresholds can be calculated according to different calculation methods: best value, worst value, mean value, or number.
Compliance is tracked over time in two ways: the compliance period and the operating period. The compliance period is the overall contract period for the SLA, measured in days, weeks, or months. The operating period is the business-critical period within the compliance period, such as active business hours, and is defined in hours during calendar days. Operating periods are defined at the QoS object level. If no operating period is defined, the operating period and the compliance period are the same.
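The following Python sketch illustrates how an operating period narrows the samples that count toward compliance. The Monday-to-Friday, 08:00-17:00 window, the helper names `in_operating_period` and `filter_samples`, and the choice to treat the end time as exclusive are all assumptions for illustration, not SLM settings or APIs.

```python
from datetime import datetime, time

# Hypothetical operating period: Monday-Friday, 08:00-17:00.
# datetime.weekday() returns 0 for Monday through 6 for Sunday.
BUSINESS_DAYS = {0, 1, 2, 3, 4}
START, END = time(8, 0), time(17, 0)


def in_operating_period(ts: datetime) -> bool:
    """True if the timestamp falls inside the assumed operating period (end exclusive)."""
    return ts.weekday() in BUSINESS_DAYS and START <= ts.time() < END


def filter_samples(samples, operating_period=None):
    """Keep only (timestamp, value) samples inside the operating period.

    With no operating period (None), the operating period is the same as the
    compliance period, so every sample is kept.
    """
    if operating_period is None:
        return list(samples)
    return [(ts, value) for ts, value in samples if operating_period(ts)]


samples = [
    (datetime(2024, 3, 4, 9, 30), 42.0),  # Monday 09:30: inside
    (datetime(2024, 3, 4, 20, 0), 97.0),  # Monday 20:00: outside
    (datetime(2024, 3, 9, 10, 0), 99.0),  # Saturday 10:00: outside
]
kept = filter_samples(samples, in_operating_period)
print(len(kept))  # 1
```

Only the Monday 09:30 sample would count toward compliance; the two out-of-hours samples are ignored even if they breach a threshold.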
SLM creates a graph for each QoS metric defined in the SLA, including the sampled data, compliance threshold, and compliance period. In the following example, the red line represents the threshold value, the blue line represents the actual sample values, and the green line represents the average value of the data samples throughout the compliance and operating periods.
In this example, none of the samples breach the threshold line within the operating periods, so compliance is 100%. The samples that exceed the threshold value fall outside the operating periods and do not affect compliance. Samples can also be explicitly excluded from the calculation, for example during system maintenance or other planned downtime.
When a QoS metric breaches the object threshold, the compliance percentage is reduced according to the percentage of time that the threshold is breached. For example, if the total number of samples within the operating period is 129 and 9 samples breach the threshold, 6.98% (9 * 100/129) of the samples would be out of compliance.
Compliance values for multiple QoS objects are combined into the compliance value of their assigned SLO, and compliance values for multiple SLOs are combined into the compliance value of their assigned SLA, using the calculation method selected at each level. In this example, if this QoS object is the only one defined in the SLA and the SLA requires 98.50% or better compliance, the SLA is breached because the QoS compliance percentage is 93.02% (100% - 6.98%).
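As a minimal sketch, the per-QoS calculation above can be expressed in Python. The function name `qos_compliance` and the `higher_is_better` flag are hypothetical; the sketch assumes that a sample equal to the threshold meets it and that the samples were already limited to the operating period. The sample values are invented so that 9 of 129 samples breach the threshold, matching the example.

```python
def qos_compliance(values, threshold, higher_is_better=False):
    """Compliance percentage for one QoS object: 100% minus the breach percentage.

    higher_is_better=False: the threshold is a maximum; a sample breaches
    when it is above the threshold.
    higher_is_better=True: the threshold is a minimum; a sample breaches
    when it is below the threshold.
    A sample equal to the threshold is treated as meeting it (assumption).
    """
    if not values:
        raise ValueError("no samples in the operating period")
    if higher_is_better:
        breaches = sum(1 for v in values if v < threshold)
    else:
        breaches = sum(1 for v in values if v > threshold)
    return 100.0 - breaches * 100.0 / len(values)


# Hypothetical data: 129 samples in the operating period, 9 of them above 90.
values = [50.0] * 120 + [95.0] * 9
compliance = qos_compliance(values, threshold=90.0)
print(round(compliance, 2))  # 93.02
print(compliance >= 98.50)   # False, so the SLA target of 98.50% is breached
```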
Calculation Terms and Conditions for the QoS Object
The QoS reflects the data series that is measured by monitoring probes. The compliance percentage is calculated for each QoS object, and the results are presented to the SLO.
The compliance percentage for a QoS object is calculated based on the following settings:
- Threshold value- A threshold defines a maximum or minimum value for each QoS object. Each sample in the data series that a probe collects is evaluated to determine whether it meets or breaches the threshold.
- Operating period- The operating period defines the time interval for a compliance percentage. Only data samples from within the operating period influence the compliance percentage.
- Calculation method- The calculation method is the way the compliance percentage is calculated for the QoS object.
These settings are configured in the Quality of Service Constraints dialog.
Calculation Terms and Conditions for the SLO
The SLO receives the compliance calculations from the associated QoS objects. The compliance percentage is calculated on each SLO, and the result is presented to the SLA.
The compliance percentage on the SLO is calculated based on the following parameters:
- Excluded period- Data collected within an excluded period is not considered when compliance is calculated for an SLO. For example, excluded periods might be days and times when the monitored system is shut down for maintenance.
- Calculation method- The calculation method that you select determines how the compliance percentage is calculated. Select one of two types of calculation methods, Formula or Profile:
  - Formula- Select a mathematical formula to calculate the compliance percentage from the compliance values of the QoS objects:
    - Average- Calculates the average of the compliance values from the QoS objects.
    - Best- Looks for the QoS object with the best result and selects this result.
    - Sequential- The difference between 100% and the achieved compliance for each QoS object is summed and then subtracted from 100%. Example: The SLO receives the compliance calculations from two QoS objects with compliance of 70% and 90%. Calculated compliance: 100% - ((100% - 70%) + (100% - 90%)) = 60%.
    - Weight- Calculates a weighted average of the QoS compliance values, based on the relative importance (weight) assigned to each QoS object.
    - Worst- Looks for the QoS object with the worst result and selects this result.
  - Profile- Select one or more conditions to determine compliance:
    - AND- The values of all samples in all QoS objects must meet or be better than the QoS threshold values for the SLO to be in compliance. Example using AND: Both data series must be equal to or better than the expected value. In the example, this condition is met except during the period marked red.
    - OR- The values of all samples in any single QoS object must meet or be better than its threshold value for the SLO to be in compliance. Example using OR: At least one of the data series must be equal to or better than the expected value. In the example, this condition is met except during the period marked red.
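The Profile conditions can be sketched in Python as follows. This is an illustrative model, not the sla_engine implementation: it assumes that the QoS series share common timestamps, that compliance is the percentage of non-excluded time points that satisfy the condition, and that excluded periods are half-open intervals. The names `profile_compliance`, `cpu`, and `memory` and all data values are hypothetical.

```python
from datetime import datetime


def profile_compliance(series, meets, excluded, mode="AND"):
    """Profile-based SLO compliance over time-aligned QoS series.

    series:   {qos_name: {timestamp: value}}, all sharing the same timestamps
    meets:    {qos_name: predicate}, True if a value meets or is better than
              that QoS object's threshold
    excluded: list of (start, end) excluded periods; samples inside are ignored
    mode:     "AND" (every QoS object must meet) or "OR" (at least one must)
    """
    if mode not in ("AND", "OR"):
        raise ValueError(f"unknown profile condition: {mode}")
    combine = all if mode == "AND" else any
    timestamps = sorted(next(iter(series.values())))
    considered = [
        ts for ts in timestamps
        if not any(start <= ts < end for start, end in excluded)
    ]
    if not considered:
        raise ValueError("every sample falls in an excluded period")
    ok = sum(
        1 for ts in considered
        if combine(meets[name](values[ts]) for name, values in series.items())
    )
    return 100.0 * ok / len(considered)


base = datetime(2024, 3, 4)
stamps = [base.replace(hour=h) for h in range(10, 15)]  # 10:00 to 14:00
series = {
    "cpu": dict(zip(stamps, [50, 90, 60, 95, 99])),
    "memory": dict(zip(stamps, [40, 60, 50, 85, 99])),
}
meets = {"cpu": lambda v: v <= 80, "memory": lambda v: v <= 70}
excluded = [(base.replace(hour=14), base.replace(hour=15))]  # maintenance window

and_result = profile_compliance(series, meets, excluded, "AND")
or_result = profile_compliance(series, meets, excluded, "OR")
print(and_result)  # 50.0
print(or_result)   # 75.0
```

With this data, AND fails at 11:00 and 13:00, while OR fails only at 13:00; the 14:00 sample is ignored because it falls in the excluded period.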
Calculation Terms and Conditions for the SLA
The SLA receives the compliance calculations from the associated SLOs and calculates the total compliance percentage based on three different parameters:
- Operating period- The operating period defines the critical days and times that compliance is measured (for example, Monday to Friday from 08:00 - 17:00). Only data series gathered within this period determine compliance percentages.
- Weight- Weight is the relative importance of the different SLOs to SLA compliance.
- Calculation method- The calculation method is the mathematical formula for calculating the SLA compliance percentage from SLOs:
  - Average- Calculates the average value of the input from the SLOs.
  - Best- Looks for the SLO with the best result and selects this result.
  - Sequential- The difference between 100% and the achieved compliance for each SLO is summed and then subtracted from 100%. Example: The SLA receives the compliance calculations from two SLOs with compliance of 70% and 80%. Calculated compliance: 100% - ((100% - 70%) + (100% - 80%)) = 50%.
  - Weight- Calculates a weighted average of the SLO compliance values, based on the weight assigned to each SLO.
  - Worst- Looks for the SLO with the worst result and selects this result.
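The five formula methods, which are described for both the SLO and the SLA levels, can be sketched as a single Python function. The function name `combine` is hypothetical, and normalizing Weight by the total weight is an assumption; it agrees with the 40/60 weighting in Example 4, where the weights sum to 100. The usage lines reproduce the two Sequential examples.

```python
def combine(values, method, weights=None):
    """Combine child compliance percentages (QoS -> SLO, or SLO -> SLA)."""
    if not values:
        raise ValueError("at least one compliance value is required")
    if method == "Average":
        return sum(values) / len(values)
    if method == "Best":
        return max(values)
    if method == "Worst":
        return min(values)
    if method == "Sequential":
        # Sum each child's shortfall from 100%, then subtract the total from 100%.
        return 100.0 - sum(100.0 - v for v in values)
    if method == "Weight":
        # Assumption: weighted average normalized by the total weight.
        if weights is None or len(weights) != len(values):
            raise ValueError("Weight needs one weight per compliance value")
        return sum(v * w for v, w in zip(values, weights)) / sum(weights)
    raise ValueError(f"unknown calculation method: {method}")


print(combine([70, 90], "Sequential"))  # SLO example: 60.0
print(combine([70, 80], "Sequential"))  # SLA example: 50.0
```

Sequential penalizes every shortfall cumulatively, so it is always at or below Worst when values are at most 100%.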
Data Collection and Compliance Calculation
QoS-enabled probes monitor and report changes and threshold breaches. QoS-enabled probes, such as cdm (the CPU, Disk, and Memory monitoring probe), generate messages for QoS objects that contain sampled data.
The data_engine probe subscribes to the primary hub to receive messages from QoS-enabled probes. At startup, QoS-enabled probes announce themselves by sending a QOS_DEFINITION message. The data_engine probe picks up and decodes this message, and then inserts it into the database.
The sla_engine probe retrieves the data that the data_engine probe inserts into the database. The sla_engine probe performs calculations according to the SLA settings and writes the results back into the database. Calculation jobs are automatically started and run on a schedule that is specified in the sla_engine probe UI.
Calculation jobs can also be started manually.
The high-level process for calculating SLA compliance includes calculations at each level in the hierarchy, from bottom to top.
- Each of the QoS constraints compares the collected data values from the probes with the defined threshold value and calculates the compliance percentage.
- The SLO collects the compliance values from the QoS constraints and computes the compliance percentage based on a selected calculation method (selects the best value, the worst value, the average value, etc.).
- The SLA collects the compliance values from the SLOs and calculates the total compliance value, also based on a selected calculation method.
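The bottom-up flow can be sketched in Python with three small classes, one per level of the hierarchy. The class names, the sample data, and the restriction to the Average, Best, and Worst methods are assumptions made to keep the sketch short; samples are assumed to be already filtered to the operating period.

```python
from dataclasses import dataclass
from typing import List

# Only three of the documented formula methods, to keep the sketch short.
METHODS = {
    "Average": lambda xs: sum(xs) / len(xs),
    "Best": max,
    "Worst": min,
}


@dataclass
class QoSConstraint:
    samples: List[float]      # values already limited to the operating period
    threshold: float
    higher_is_better: bool = False

    def compliance(self) -> float:
        # Level 1: compare each sample with the threshold.
        if self.higher_is_better:
            breaches = sum(1 for v in self.samples if v < self.threshold)
        else:
            breaches = sum(1 for v in self.samples if v > self.threshold)
        return 100.0 - 100.0 * breaches / len(self.samples)


@dataclass
class SLO:
    constraints: List[QoSConstraint]
    method: str

    def compliance(self) -> float:
        # Level 2: combine the QoS compliance values.
        return METHODS[self.method]([q.compliance() for q in self.constraints])


@dataclass
class SLA:
    objectives: List[SLO]
    method: str

    def compliance(self) -> float:
        # Level 3: combine the SLO compliance values.
        return METHODS[self.method]([o.compliance() for o in self.objectives])


# Hypothetical data.
qos_a = QoSConstraint([100, 120, 300, 150], threshold=200)  # 1 of 4 above the maximum
qos_b = QoSConstraint([10, 12, 8, 11, 9], threshold=9, higher_is_better=True)  # 1 of 5 below the minimum
slo_response = SLO([qos_a, qos_b], "Worst")
slo_queue = SLO([QoSConstraint([1, 2, 3, 4], threshold=10)], "Average")
sla = SLA([slo_response, slo_queue], "Average")

print(slo_response.compliance(), slo_queue.compliance(), sla.compliance())
# 75.0 100.0 87.5
```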
Example 1: One QoS and One SLO
Example 2: Two QoSs and One SLO
This example uses a calculation method other than Default for the QoS.
Example 3: Two QoSs and One SLO, Using Calculation Method AND or OR
Example 4: Two SLOs, Each with Three QoS
This example uses a calculation method other than Default for the QoS.
In the following figure:
- SLO 1: Calculating the compliance percentage from QoS 1, 2, and 3 using calculation method Worst yields a compliance percentage of 70%.
- SLO 2: Calculating the compliance percentage from QoS 4, 5, and 6 using calculation method Average yields a compliance percentage of 90%.
The following table shows the total SLA compliance percentage for this example, using different calculation methods for the SLA:

| SLA calculation method | Calculation | SLA compliance |
|---|---|---|
| Average | The average value of the two SLOs: (70% + 90%) / 2 | 80% |
| Best | The best value of the two SLOs (70% and 90%) | 90% |
| Worst | The worst value of the two SLOs (70% and 90%) | 70% |
| Sequential | The difference between 100% and the achieved compliance for each SLO is summed and subtracted from 100%: 100% - ((100% - 70%) + (100% - 90%)) | 60% |
| Weight | Assuming that the weight distribution between SLO 1 and SLO 2 is set to 40/60 for the SLA: (70% * 40/100) + (90% * 60/100) | 82% |
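The following Python sketch reproduces Example 4 end to end. The individual QoS compliance values are hypothetical, chosen only so that Worst yields 70% for SLO 1 and Average yields 90% for SLO 2, because the example states only the SLO results. Weight is computed as a weighted average normalized by the total weight, which matches the 40/60 calculation.

```python
qos_slo1 = [70.0, 85.0, 95.0]  # hypothetical compliance values for QoS 1-3
qos_slo2 = [85.0, 90.0, 95.0]  # hypothetical compliance values for QoS 4-6

slo1 = min(qos_slo1)                  # Worst   -> 70.0
slo2 = sum(qos_slo2) / len(qos_slo2)  # Average -> 90.0
slos, weights = [slo1, slo2], [40, 60]

# SLA compliance for each calculation method, one row of the table each.
sla = {
    "Average": sum(slos) / len(slos),
    "Best": max(slos),
    "Worst": min(slos),
    "Sequential": 100.0 - sum(100.0 - s for s in slos),
    "Weight": sum(s * w for s, w in zip(slos, weights)) / sum(weights),
}
for method, value in sla.items():
    print(f"{method:<10} {value:.1f}%")
```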