Port Fault Correlation

CA Spectrum lets you customize its fault isolation algorithm to resolve the root cause of a network outage to the port level. This is most useful when a single physical port, such as a Frame Relay interface, supports multiple logical connections to remote devices. If the physical port goes down, CA Spectrum can suppress the alarms on all downstream devices in favor of a single red alarm on the physical interface, which significantly reduces the number of alarms that need attention. The impact severity and scope of the red alarm on the physical interface include all downstream devices as well as the physical interface itself.
Port Fault Correlation Options
Use the Port Fault Correlation setting in the VNM model’s Fault Isolation subview to configure port fault correlation.
  • Disabled
    Disables port fault correlation. The root cause of a network outage will remain a red alarm on a device model. However, Fault Isolation will still examine all of the device’s connected ports to see if they are all in Maintenance Mode. If so, the alarm on the device model will be suppressed.
  • All Connected Ports
    Enables port fault correlation, examining all ports on “up” neighbors that are connected to the down device as possible root causes of the outage. No additional manual configuration is required.
  • Management Neighbors Only
    Enables port fault correlation, but examines only the ports that were manually configured as management neighbors beforehand as possible root causes of the outage.
  • All Connected Ports -- Multiple Devices Only
    Enables port fault correlation, examining all ports on “up” neighbors that are connected to the down device as possible root causes of the outage. However, CA Spectrum resolves the outage to the port level only if more than one device model would have a red alarm that can be correlated to the port alarm. If only one connected device alarm can be correlated to the port alarm, CA Spectrum does not suppress the device alarm. Instead, both the port alarm and the device alarm are generated.
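As a rough illustration of how the setting narrows the set of ports that are examined, the following Python sketch maps each option to its candidate ports. The names `CorrelationMode` and `candidate_ports` are hypothetical and do not correspond to any CA Spectrum API or attribute; the Maintenance Mode check that still runs when the feature is disabled is not modeled.

```python
from enum import Enum


class CorrelationMode(Enum):
    """Hypothetical labels for the four Port Fault Correlation settings."""
    DISABLED = "Disabled"
    ALL_CONNECTED_PORTS = "All Connected Ports"
    MANAGEMENT_NEIGHBORS_ONLY = "Management Neighbors Only"
    ALL_CONNECTED_PORTS_MULTI = "All Connected Ports -- Multiple Devices Only"


def candidate_ports(mode, connected_ports, management_neighbor_ports):
    """Return the upstream ports examined as possible root causes."""
    if mode is CorrelationMode.DISABLED:
        # The root cause stays on the device model; no port is a candidate.
        return []
    if mode is CorrelationMode.MANAGEMENT_NEIGHBORS_ONLY:
        # Only ports configured manually before the outage are considered.
        return list(management_neighbor_ports)
    # Both "All Connected Ports" variants examine every connected port.
    return list(connected_ports)
```

The two "All Connected Ports" variants differ only in whether the device alarm is suppressed afterwards, not in which ports are examined.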
Port Fault Correlation Criteria
The following criteria must be met for the root cause of an outage to be resolved to the port level:
  • The down device must have only one “up” neighbor. If the down device has more than one “up” neighbor, port fault correlation is not performed. This restriction reduces the number of alarms for a single problem: if multiple “up” neighbors were allowed and all connected ports were down, multiple red alarms would exist, all with the same impact severity and scope. If a device has more than one “up” neighbor, CA Spectrum assumes the problem lies with the device, not the upstream ports, and creates a single red alarm on the device.
  • The down device must have at least one connected port (or management neighbor port) on an “up” neighbor that is down.
  • If multiple ports on the “up” neighbor connect to the down device (such as link aggregation), all of the ports must be down.
  • A port is considered “down” if it is either operationally down, or the port model has been put into Maintenance Mode.
  • There must be an alarm on at least one of the down ports. Otherwise, there would be no alarm to which CA Spectrum could resolve the outage.
  • If Port Fault Correlation is set to Management Neighbors Only, management neighbors for the down device must have been configured before the outage occurred.
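The criteria above can be read as a single predicate over the “up” neighbors of a down device. The following Python sketch is one possible reading, not CA Spectrum's implementation; the `Port` and `UpNeighbor` types and the `resolves_to_port_level` function are hypothetical, and the port lists are assumed to be already limited to management neighbor ports when that setting is in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    """Hypothetical port on an "up" neighbor that connects to the down device."""
    name: str
    oper_down: bool = False
    in_maintenance: bool = False
    has_alarm: bool = False

    @property
    def is_down(self) -> bool:
        # A port counts as "down" if operationally down or in Maintenance Mode.
        return self.oper_down or self.in_maintenance


@dataclass
class UpNeighbor:
    """Hypothetical reachable neighbor and its ports facing the down device."""
    name: str
    ports_to_down_device: List[Port]


def resolves_to_port_level(up_neighbors: List[UpNeighbor]) -> bool:
    """Apply the listed criteria to the "up" neighbors of one down device."""
    if len(up_neighbors) != 1:
        return False  # more than one "up" neighbor: the device is blamed
    ports = up_neighbors[0].ports_to_down_device
    if not ports:
        return False  # no connected (or management neighbor) port to examine
    if not all(p.is_down for p in ports):
        return False  # e.g. one member of an aggregated link is still up
    return any(p.has_alarm for p in ports)  # an alarm must exist to resolve to
```

For example, a single neighbor whose only connecting port is operationally down and alarmed satisfies the predicate, while the same port seen from two different “up” neighbors does not.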
Port Fault Correlation Caveats
Port Fault Correlation overrides the Suppress Linked Port Alarms setting in the Live Pipes subview. When set to TRUE, this setting suppresses the alarm on an upstream port if it is connected to an unreachable device. If Port Fault Correlation is enabled, and the upstream port is the root cause of an outage, CA Spectrum forces the upstream port to alarm.
Only the Criticality of the alarmed port will be used in the impact severity and scope calculation of the root cause alarm. The Criticality of any sub-interfaces (such as DLCI ports) will not be included.
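The following Python sketch illustrates which Criticality values contribute to the root cause alarm. It assumes, for illustration only, that impact severity is a simple sum of the contributing Criticality values; the function name and parameters are hypothetical.

```python
def root_cause_impact_severity(port_criticality,
                               subinterface_criticalities,
                               downstream_criticalities):
    """Sum the Criticality values that contribute to the port's root cause alarm.

    Assumption: impact severity is a plain sum of Criticality values.
    """
    # Sub-interface values (such as DLCI ports) are accepted so the caller can
    # pass a full picture of the port, but they are deliberately ignored.
    del subinterface_criticalities
    return port_criticality + sum(downstream_criticalities)
```

With a port Criticality of 5, two DLCI sub-interfaces of Criticality 3 each, and downstream devices of Criticality 1 and 2, this sketch yields 8 rather than 14.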
Port Fault Correlation is supported by Device models only. Models such as Fanouts and Unplaced do not support this feature. WA_Link models have their own mechanism for supporting port fault correlation, Link Fault Disposition, which is explained in Wide Area Link Monitoring.
If multiple ports on the “up” neighbor connect to the down device (e.g. link aggregation), and all of the ports are down, multiple red alarms will exist as the root causes of the outage. Each red alarm will contain the same impact severity and scope. The root cause of the outage in this case is all of the ports, not just one of them.
Example Port Fault Correlation Scenario 1
spec--faultscenario1_OTH
The previous diagram assumes that CA Spectrum must communicate through Router A to reach Routers 1 through N, and that this is the only means by which CA Spectrum can reach them. Each remote router is connected to Router A using a frame relay link. In CA Spectrum, this is modeled by connecting each DLCI port model to the other device.
If the physical frame relay interface (FR A) goes down in this scenario, all virtual circuits provisioned on the interface will go down as well. With Port Fault Correlation disabled, the alarms shown in the following diagram will occur.
Fault Scenario 1: Alarms without Port Fault Correlation
spec--faultscenario1a_OTH
If a trap is received for FR A going down (or a live pipe is configured to be on), the physical frame relay interface will have a red alarm on it. In addition, all routers connected to the frame relay interface will have a red alarm on them. As a result, CA Spectrum could generate multiple red alarms for a single problem.
Port Fault Correlation reduces the number of alarms generated for this problem to a single alarm without requiring any manual configuration beforehand. The following diagram shows the results with Port Fault Correlation enabled.
Fault Scenario 1: Alarms with Port Fault Correlation
spec--faultscenario1b_OTH
A single red “Bad Link” alarm will be seen in the Alarms tab. The Impact Scope and Severity of that alarm will include the following models: FR A, Routers 1 through N, and all unreachable devices that are downstream from Routers 1 through N.
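The difference between the two outcomes in this scenario can be sketched in a few lines of Python. The function below is illustrative only; it assumes FR A is already alarmed (a trap was received or a live pipe is on), and the function name is hypothetical.

```python
def scenario1_red_alarms(remote_routers, port_fault_correlation):
    """Return the models that carry a red alarm after FR A goes down."""
    if port_fault_correlation:
        # Router alarms are suppressed and folded into the impact of FR A.
        return ["FR A"]
    # Without correlation, the port and every remote router alarm separately.
    return ["FR A", *remote_routers]


routers = [f"Router {i}" for i in range(1, 5)]  # N = 4, chosen arbitrarily
without_pfc = scenario1_red_alarms(routers, port_fault_correlation=False)
with_pfc = scenario1_red_alarms(routers, port_fault_correlation=True)
```

Here `without_pfc` holds five entries (FR A plus four routers), while `with_pfc` holds only FR A.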
Example Port Fault Correlation Scenario 2
This fault scenario illustrates the benefits of setting the Suppress Linked Port Alarms and Port Fault Correlation attributes as recommended in Suggested Port Fault Settings for Optimal Fault Notification.
The following diagram assumes that the VNM must communicate through Routers A and B to reach Routers C, D, and E, and that this is the only means by which the VNM can reach them. In CA Spectrum, port-level connectivity is modeled as shown.
Fault Scenario 2: Multiple “Up” Neighbors
spec--faultscenario2_1_OTH
In this scenario, Router C goes down, which causes CA Spectrum to lose contact with Routers C, D, and E, and makes Ports A and B go down as well. If Suppress Linked Port Alarms is set to TRUE, and Port Fault Correlation is set to All Connected Ports, only a single red alarm on Router C will result, as shown in the following diagram:
Fault Scenario 2: Multiple “Up” Neighbors
spec--faultscenario2_2_OTH
The upstream ports (Ports A and B) have their alarms suppressed because Suppress Linked Port Alarms is set to TRUE. Even though Port Fault Correlation is enabled, Router C has multiple “up” neighbors, so the fault is not resolved to the port level. When this occurs, CA Spectrum assumes the fault is with the device itself, not the connected ports.
If you set Suppress Linked Port Alarms to FALSE, and Port Fault Correlation is still set to All Connected Ports, Router C and the upstream ports will be alarmed (if the status of the ports is being polled, or CA Spectrum receives a LinkDown trap), as shown in the following diagram:
Fault Scenario 2: Multiple “Up” Neighbors
spec--faultscenario2_3_OTH
Once again, the fault was not resolved to the port level because Router C has multiple “up” neighbors. Since Suppress Linked Port Alarms is disabled, CA Spectrum will alarm the upstream ports.
If Router C had only one “up” neighbor, as shown in the following diagram, CA Spectrum would resolve the fault down to the port level even if Suppress Linked Port Alarms were set to TRUE (assuming Port Fault Correlation is still set to All Connected Ports). Port Fault Correlation would force the upstream port to be alarmed, and the alarm on Router C would be suppressed.
Fault Scenario 2: Single “Up” Neighbor
spec--faultscenario2_4_OTH
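The three outcomes of this scenario can be summarized in a short Python sketch. It assumes Port Fault Correlation is set to All Connected Ports and that the status of every upstream port is known (polled, or reported by a LinkDown trap); the function name and argument shapes are hypothetical.

```python
def scenario2_alarms(upstream_ports, suppress_linked_port_alarms):
    """Return the alarmed models when Router C goes down.

    upstream_ports lists one down port per "up" neighbor of Router C.
    """
    if len(upstream_ports) == 1:
        # Single "up" neighbor: the fault resolves to the port, which is
        # forced to alarm regardless of Suppress Linked Port Alarms.
        return list(upstream_ports)
    # Multiple "up" neighbors: the device itself is treated as the fault.
    alarms = ["Router C"]
    if not suppress_linked_port_alarms:
        alarms.extend(upstream_ports)
    return alarms
```

Calling it with `["Port A", "Port B"]` and TRUE yields only Router C, with FALSE yields Router C plus both ports, and with a single port yields only that port for either setting.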
Example Port Fault Correlation Scenario 3
This fault scenario demonstrates what happens when Port Fault Correlation is set to All Connected Ports--Multiple Devices Only. It assumes that CA Spectrum must communicate through Router A to reach Routers 1 through N, and that this is the only means by which CA Spectrum can reach them. Each remote router is connected to Router A using a frame relay link. This is modeled by connecting each DLCI port model to the other device, as shown in the following diagram:
spec--faultscenario1_OTH
Assume the physical frame relay interface (FR A) goes down. This means that all virtual circuits provisioned on the interface will go down as well. With Port Fault Correlation disabled, the alarms shown in the following diagram will occur:
Fault Scenario 3: Alarms without Port Fault Correlation
spec--faultscenario1a_OTH
With Port Fault Correlation set to All Connected Ports--Multiple Devices Only, the alarms in the following diagram will occur because multiple devices can be correlated to the frame relay interface.
Fault Scenario 3: Port Fault Correlation Set To “All Connected Ports--Multiple Devices Only”
spec--faultscenario1b_OTH
With Port Fault Correlation set to All Connected Ports--Multiple Devices Only, if there is only one router lost because of a down link, then the alarm on the remote router will not be suppressed.
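The “Multiple Devices Only” rule in this scenario reduces to a count of the routers lost behind the down link. The Python sketch below is illustrative only, with a hypothetical function name, and assumes FR A is down and alarmed.

```python
def scenario3_alarms(lost_routers):
    """Return the alarmed models with All Connected Ports--Multiple Devices Only."""
    if len(lost_routers) > 1:
        # Several device alarms correlate to the port: keep only the port alarm.
        return ["FR A"]
    # A single lost router keeps its own alarm alongside the port alarm.
    return ["FR A", *lost_routers]
```

With two or more lost routers only FR A is alarmed; with exactly one, both FR A and that router are alarmed.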
Port Fault Correlation Anomalies
If a red alarm is generated on a port model as the root cause of an outage, you may then choose to put that port model into Maintenance Mode. In that case, the red alarm is replaced with a brown alarm. The brown alarm still has the same impact severity and scope, except that the port in Maintenance Mode no longer contributes to the impact. If you then take the port out of Maintenance Mode, the red alarm reappears. In this scenario, the impact scope and severity of the red alarm can be lost.