About SpectroSERVER Fault Tolerance

Fault tolerance requires more than one SpectroSERVER to manage a given landscape. A copy of the database for that landscape is loaded on each SpectroSERVER. However, only a single copy is active at any time. The SpectroSERVER with the active database is known as the primary SpectroSERVER. The inactive database resides on a standby SpectroSERVER, which is the secondary SpectroSERVER. You can also install another inactive copy of the database on a tertiary SpectroSERVER.
If the primary SpectroSERVER fails, the database on the secondary SpectroSERVER becomes active, and the secondary SpectroSERVER starts managing the network. Applications that are connected to the primary SpectroSERVER are automatically switched to the secondary SpectroSERVER. When the primary SpectroSERVER returns to service, the applications automatically switch back to the primary SpectroSERVER, and the secondary SpectroSERVER becomes inactive again.
Not all applications can exercise the full range of their capabilities when they are run from a secondary SpectroSERVER. The main reason to set up a fault-tolerant environment is to ensure continuous monitoring of the network, not to create a full copy of CA Spectrum.
Precedence in a Fault Tolerant Environment
Primary, secondary, and tertiary SpectroSERVERs that manage the same landscape must all have the same landscape handle and the same modeling catalog. The servers are distinguished from one another with a numeric precedence value. The lowest number indicates the primary SpectroSERVER. SpectroSERVERs are installed with a default precedence value of 10. To designate a SpectroSERVER as a secondary server, assign it a higher precedence number, such as 20. Likewise, a tertiary SpectroSERVER would have a higher precedence number than the secondary, for example, 30.
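The precedence rule can be illustrated with a minimal sketch. It assumes that the running server with the lowest precedence value is the one that manages the landscape. The Server class, the host labels, and the active_server function are hypothetical names used only for this illustration; they are not part of CA Spectrum.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Server:
    name: str        # hypothetical label for the host
    precedence: int  # lower value means higher priority (10 is the install default)
    running: bool


def active_server(servers: List[Server]) -> Optional[Server]:
    """Return the running server with the lowest precedence value, or None."""
    candidates = [s for s in servers if s.running]
    return min(candidates, key=lambda s: s.precedence, default=None)


servers = [
    Server("primary", 10, running=False),   # primary has failed
    Server("secondary", 20, running=True),
    Server("tertiary", 30, running=True),
]
print(active_server(servers).name)  # secondary
```

In this model, restarting the primary makes it the lowest-precedence running server again, which mirrors the automatic switch back that is described earlier.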
When you first set up a fault tolerant environment, you can assign precedence values when you load database copies on any standby SpectroSERVERs using the SSdbload utility.
To change precedence values later, you can use the Loaded Landscapes subview. Access this subview by selecting a local landscape in the Navigation panel, and then selecting the Information tab in the Component Detail panel.
The Loaded Landscapes subview is different from the SpectroSERVER Control subview. Access the SpectroSERVER Control subview by selecting the VNM in the Navigation panel and then selecting the Information tab in the Component Detail panel.
Data Synchronization
A single database is active at any given time in a fault tolerant CA Spectrum environment. Therefore, the other databases must be updated periodically to reflect new models and changes to attribute values in the active database. This synchronization of data is accomplished through the CA Spectrum Online Backup feature. You can run Online Backup on demand or at regularly scheduled intervals. When you run Online Backup against the primary SpectroSERVER, it creates a backup copy of the current database. Online Backup automatically loads the copy onto each designated secondary SpectroSERVER.
As in any DSS environment, each of the SpectroSERVERs in a fault tolerant environment must have the same modeling catalog installed. Online Backup copies the current modeling catalog. However, it does not copy all the .i files or other elements that are associated with individual management modules. Therefore, if you install any new management modules on your primary SpectroSERVER, also install the same new management modules on any secondary SpectroSERVER.
The EventDisp and Alertmap files that are defined in the <>/custom/Events directory are propagated to fault-tolerant servers when the secondary SpectroSERVER polls the primary SpectroSERVER for status information.
Support for Fault-Tolerant Archive Manager
You can run the Archive Manager on the secondary SpectroSERVER host in a fault-tolerant environment. This secondary Archive Manager provides visibility to events in OneClick when the primary Archive Manager is down.
The primary or secondary SpectroSERVER locally stores events in the following two scenarios:
  • When the primary Archive Manager is down and the primary SpectroSERVER is running. In this case, the primary SpectroSERVER locally stores events as they are created until the primary Archive Manager is up.
  • When the primary SpectroSERVER host itself is down. In this case, the secondary SpectroSERVER locally stores events as they are created until the primary Archive Manager is up.
You can start the secondary Archive Manager on the secondary SpectroSERVER host to provide visibility not only to events that are created while the primary Archive Manager is down, but also to historical events.
When you start the secondary Archive Manager, it acts as a client to the primary SpectroSERVER to receive and log events as they are created. This behavior does not affect the normal connection between the primary SpectroSERVER and the primary Archive Manager. When the primary Archive Manager goes down, OneClick fails over to the secondary Archive Manager to provide event data.
When the primary SpectroSERVER host itself goes down, the secondary SpectroSERVER locally stores events and also forwards them to the secondary Archive Manager. When the primary Archive Manager comes up, the secondary SpectroSERVER transfers all the locally stored events to it.
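The storage scenarios in this section can be summarized in a small decision sketch. The function name, its parameters, and the destination labels are hypothetical and exist only for this illustration. The sketch also assumes that when the primary SpectroSERVER host is down, the primary Archive Manager is unavailable too, so the primary_am_up argument is ignored in that case.

```python
def event_destinations(primary_server_up, primary_am_up, secondary_am_up):
    """Return the set of places where a newly created event is written.

    Illustrative model only; it does not describe internal product behavior
    beyond the scenarios listed in this section.
    """
    destinations = set()
    if primary_server_up:
        if primary_am_up:
            # Normal operation: events go to the primary Archive Manager.
            destinations.add("primary Archive Manager")
        else:
            # The primary SpectroSERVER holds events until its Archive Manager returns.
            destinations.add("primary SpectroSERVER local store")
    else:
        # Primary host is down: the secondary SpectroSERVER holds events
        # for later transfer to the primary Archive Manager.
        destinations.add("secondary SpectroSERVER local store")
    if secondary_am_up:
        # A running secondary Archive Manager receives a copy of every event.
        destinations.add("secondary Archive Manager")
    return destinations


print(sorted(event_destinations(False, False, True)))
# ['secondary Archive Manager', 'secondary SpectroSERVER local store']
```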
Archive Manager Data Synchronization
The secondary Archive Manager provides a best-effort synchronization of events; no event synchronization occurs between the primary Archive Manager and the secondary Archive Manager. When the secondary Archive Manager is running and connected to a SpectroSERVER, it receives a copy of all events as they are generated. Whenever the secondary Archive Manager is down, events are not stored on the secondary. This behavior is distinctly different from that of the primary Archive Manager, where the SpectroSERVER stores the events for later transfer to the primary Archive Manager.
This means that when the secondary Archive Manager is started for the first time, its DDM database does not contain any events, and no attempt is made to synchronize with the primary. Once the secondary Archive Manager has been running for the number of days set by MAX_EVENT_DAYS in the .configrc file, it is generally in sync with the primary Archive Manager database.
Generate an Alarm If the Secondary SpectroSERVER Is Not Restarted
When a primary SpectroSERVER synchronizes its database with the secondary SpectroSERVER, a Contact Lost to Secondary Server (0x00010c0e) event and alarm are generated. This occurs because the secondary SpectroSERVER has been brought down to load the new database from the primary SpectroSERVER.
You can set up a rule to process this alarm so that the alarm is generated only if the secondary SpectroSERVER is not restarted.
The EventPair rule lets you specify that a new event is generated if the Contact Lost to Secondary Server event occurs and a Contact Established to Secondary Server (0x00010c0f) event does not follow within a specified time period. You can then specify that this new event creates an event and an alarm indicating that the secondary SpectroSERVER is still down.
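The pairing logic that the rule expresses can be sketched as follows. This is only an illustration of the idea, not how the EventPair rule is implemented: it scans a recorded list of events after the fact, and the function name, the tuple layout, and the sample timestamps are all assumptions made for this example. Only the two event codes come from this document.

```python
CONTACT_LOST = 0x00010C0E         # Contact Lost to Secondary Server
CONTACT_ESTABLISHED = 0x00010C0F  # Contact Established to Secondary Server


def unpaired_contact_lost(events, window_seconds):
    """Return timestamps of Contact Lost events with no matching follow-up.

    events is a list of (timestamp_seconds, event_code) tuples sorted by time.
    A Contact Lost event is considered paired when a Contact Established
    event occurs no later than window_seconds after it.
    """
    unpaired = []
    for i, (ts, code) in enumerate(events):
        if code != CONTACT_LOST:
            continue
        followed = any(
            later_code == CONTACT_ESTABLISHED and later_ts <= ts + window_seconds
            for later_ts, later_code in events[i + 1:]
        )
        if not followed:
            unpaired.append(ts)
    return unpaired


events = [
    (0, CONTACT_LOST),            # secondary brought down to load the database
    (120, CONTACT_ESTABLISHED),   # secondary restarted two minutes later
    (1000, CONTACT_LOST),         # secondary never comes back
]
print(unpaired_contact_lost(events, 300))  # [1000]
```

Each timestamp returned corresponds to a case where the new event, and therefore the alarm, would be generated.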
Follow these steps:
  1. Open the EventDisp file with a text editor.
    The EventDisp file is located in the
    /SS/CsVendor/Cabletron directory.
  2. Find the line that reads 0x00010c0e E 50 A 2, 0x00010c0e and change this line to the following:
    0x00010c0e R Aprisma.EventPair, 0x00010c0f,
    • <generatedeventcode>
      Is the event code to generate if the secondary SpectroSERVER does not come up within the specified time period.
  3. Add the following line to the EventDisp file:
    <generatedeventcode> E 50 A 2, <generatedalarmcode>
    • <generatedeventcode>
      Is the event code generated in Step 2 if the secondary SpectroSERVER did not come up. E 50 indicates that the event is logged and has a severity value of 50. A 2 indicates that a major alarm is created.
    • <generatedalarmcode>
      Is the alarm code to generate based on this event.
  4. Create a Probable Cause file for this alarm that indicates that contact with the secondary SpectroSERVER has not been reestablished after data synchronization.
Secondary SpectroSERVER Readiness Levels
A secondary SpectroSERVER is considered to be at one of three different levels of readiness. Readiness depends on server configuration and status. The readiness levels are defined as follows:
  • Hot
    The secondary SpectroSERVER is running and is available to take over immediately upon failure of the primary SpectroSERVER because it is already polling. To configure a secondary SpectroSERVER for this level of readiness, add the following line to the .vnmrc file: secondary_polling=yes. This statement causes the standby to commence polling and processing traps whenever it starts, regardless of its connection status with the primary SpectroSERVER.
  • Warm
    The secondary SpectroSERVER is running, but the server can take a short time to become fully available. The secondary SpectroSERVER has not been configured to start polling until it loses contact with the primary SpectroSERVER. For example, it has no secondary_polling entry in the .vnmrc file, or the entry is set to no.
    If the secondary_polling entry is not in the .vnmrc file or the entry is set to no, the secondary SpectroSERVER does not process traps while in standby mode.
  • Cold
    The secondary SpectroSERVER is not running and must be started when there is a failure of the primary SpectroSERVER. In this case, it is irrelevant whether the secondary SpectroSERVER is configured for secondary polling.
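The three readiness levels can be expressed as a short classification sketch. It assumes that the relevant part of the .vnmrc file consists of simple key=value lines; the parse_vnmrc and readiness functions are hypothetical helpers written for this illustration and are not CA Spectrum utilities.

```python
def parse_vnmrc(text):
    """Collect key=value lines into a dict, skipping blank or malformed lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def readiness(is_running, vnmrc_text):
    """Classify a secondary server as 'hot', 'warm', or 'cold'."""
    if not is_running:
        # A stopped server is cold regardless of its polling configuration.
        return "cold"
    polling = parse_vnmrc(vnmrc_text).get("secondary_polling", "no").lower()
    return "hot" if polling == "yes" else "warm"


print(readiness(True, "secondary_polling=yes"))   # hot
print(readiness(True, "secondary_polling=no"))    # warm
print(readiness(False, "secondary_polling=yes"))  # cold
```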