Establish Fault Tolerance

You can set up a fault-tolerant environment when you first install DX NetOps Spectrum, before any models have been created. Or you can set up a fault-tolerant environment after you install DX NetOps Spectrum.
Establish Fault Tolerance
The following procedure describes how to set up two SpectroSERVERs: a primary and a secondary. You can also set up a tertiary SpectroSERVER by taking the same steps. However, assign the tertiary SpectroSERVER a higher precedence number than the secondary SpectroSERVER.
To establish fault tolerance in an environment with a Southbound Gateway integration, see the Southbound Gateway Toolkit.
Follow these steps:
  1. Install the same version of DX NetOps Spectrum with the same modeling catalog on both the primary SpectroSERVER and the secondary SpectroSERVER. Each server requires the same landscape handle.
  2. Verify that both the primary and secondary SpectroSERVERs have entries in their .hostrc files that give the SpectroSERVERs mutual access permissions.
    If you are specifying secure users for the secondary SpectroSERVER in the .hostrc file on the primary SpectroSERVER, and the secondary SpectroSERVER is running in the Windows environment, include the user SYSTEM in the secure user list.
  3. Verify that the MAIN_LOCATION_HOST_NAME parameter in the .locrc file on the secondary SpectroSERVER points to the same system name as the .locrc file on the primary SpectroSERVER. Otherwise, synchronization fails.
  4. Configure the primary and secondary SpectroSERVERs so that the user running each SpectroSERVER is the same. If the users are not the same, the secondary SpectroSERVER fails or does not run properly after an Online Backup.
  5. Make a copy of the primary SpectroSERVER database by running Online Backup. Or, if the SpectroSERVER is shut down, use the SSdbsave utility with the -cm argument (to save the modeling catalog and any new models).
    For more information, see Database Management.
  6. Verify that the save file that you created is available to the server that hosts the secondary SpectroSERVER. Copy the file to the server if necessary.
  7. On the secondary server, with the SpectroSERVER shut down, navigate to the DX NetOps Spectrum SS directory and load the save file using the following command:
    ../SS-Tools/SSdbload -il -add precedence savefile
    • precedence
      Specifies a numeric value greater than the primary server default value of 10 (20 is recommended).
    • savefile
      Specifies the name of the save file that you created previously.
  8. (Optional) Add the line 'secondary_polling=yes' to the .vnmrc file to let the secondary SpectroSERVER function as a hot backup.
  9. Start the primary SpectroSERVER, if it is not already running.
  10. Start the secondary SpectroSERVER.
  11. To verify the setup, use the MapUpdate command with the view argument to display the current landscape map.
For more information, see the
.
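The precedence rules in this procedure can be checked before you load the save file. The following Python sketch is illustrative only and is not part of DX NetOps Spectrum: the function name check_precedence and the tertiary value 30 are assumptions. The sketch encodes only what the procedure states, namely that the primary default is 10, the secondary value must be greater than 10, and a tertiary value must be greater than the secondary value.

```python
# Illustrative helper, not a DX NetOps Spectrum utility.
PRIMARY_PRECEDENCE = 10  # default primary value stated in the procedure


def check_precedence(secondary, tertiary=None):
    """Return a list of problems with the chosen values (empty if none)."""
    problems = []
    if secondary <= PRIMARY_PRECEDENCE:
        problems.append(
            f"secondary precedence {secondary} must be greater than "
            f"{PRIMARY_PRECEDENCE}"
        )
    if tertiary is not None and tertiary <= secondary:
        problems.append(
            f"tertiary precedence {tertiary} must be greater than "
            f"secondary precedence {secondary}"
        )
    return problems


# 20 is the recommended secondary value; 30 is a hypothetical tertiary value.
print(check_precedence(20))
print(check_precedence(20, 30))
print(check_precedence(10))  # same as the primary default, so it is reported
```

An empty list means that the values follow the ordering that the procedure requires.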
The secondary SpectroSERVER is now available to take over automatically if the primary SpectroSERVER fails. If you previously activated secondary polling, the secondary SpectroSERVER is available immediately. Otherwise, polling begins when the server detects that it has lost contact with the primary SpectroSERVER.
When service switches from the primary SpectroSERVER to the secondary SpectroSERVER, the Connection Status icon displays yellow. To view the connection status of all servers in a landscape, click the Connection Status icon. In the Connection Status dialog, the Connection Status icon for each server in the landscape displays yellow to indicate the “switched” condition.
When the primary SpectroSERVER comes back online, the secondary SpectroSERVER stops polling (unless you have set secondary_polling to 'yes'). All the applications switch back to the primary SpectroSERVER. However, any edits that you make to the secondary SpectroSERVER while it is active are not automatically replicated to the primary SpectroSERVER. Manually recreate these modifications on the primary SpectroSERVER.
When you restart the primary SpectroSERVER, connections are accepted when all models are loaded, but before all models are activated. The models can take some time to activate. Because the secondary SpectroSERVER stops polling when the primary SpectroSERVER is restarted, a gap in your network management coverage can result.
To avoid this situation, edit the .vnmrc file on the primary SpectroSERVER so that the wait_active resource is set to 'yes'. This parameter causes the server to wait until all of the models are activated before accepting any connections. The message area in the DX NetOps Spectrum Control Panel also dynamically displays the percentage of models that are activated. The SpectroSERVER can appear to take longer to come up. However, when all the models are activated, the SpectroSERVER is ready to manage the network.
You can also set the wait_active resource to 'yes' on the secondary SpectroSERVER. During a planned shutdown of the primary SpectroSERVER, you can then verify in the DX NetOps Spectrum Control Panel that the secondary SpectroSERVER is ready to take over.
For more information, see Database Management.
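To review which of these resources are set on a server, you can inspect the text of its .vnmrc file. The following Python sketch is a minimal example under stated assumptions: it assumes that settings appear as name=value lines (as in 'secondary_polling=yes') and that lines starting with '#' can be ignored. The function names are hypothetical, and the "hot" and "warm" labels reflect how this article uses those terms for a secondary SpectroSERVER with and without secondary polling.

```python
# Illustrative sketch; assumes simple name=value lines in .vnmrc.
def read_settings(text):
    """Parse name=value lines into a dict; other lines are ignored."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' lines carry no settings.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def standby_mode(settings):
    """Return 'hot' if secondary_polling is 'yes', otherwise 'warm'."""
    if settings.get("secondary_polling", "").lower() == "yes":
        return "hot"
    return "warm"


def waits_for_activation(settings):
    """Return True if wait_active is set to 'yes'."""
    return settings.get("wait_active", "").lower() == "yes"


# Hypothetical file content for demonstration.
sample = "secondary_polling=yes\nwait_active=yes\n"
parsed = read_settings(sample)
print(standby_mode(parsed), waits_for_activation(parsed))
```

To check a real server, read the file content into a string first and pass it to read_settings.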
Validate Fault Tolerance Configuration
After you have set up fault tolerance in a distributed SpectroSERVER deployment, verify that the OneClick server has access to both the primary and secondary SpectroSERVERs. Without connectivity to both servers, the OneClick server cannot fail over to the secondary SpectroSERVER.
Follow these steps:
  1. Access the OneClick Administration, Landscapes web page.
  2. Check the ‘Secondary Status’ column. Verify that OneClick has established contact with the secondary SpectroSERVER.
    The status also indicates whether Fault Tolerance is ready for failover.
    The Fault Tolerance configuration is validated.
Test Fault Tolerance
During initial installation, the secondary SpectroSERVER might not have access to all the devices to which the primary SpectroSERVER has access. This situation causes the secondary SpectroSERVER to generate false alarms. To avoid false alarms, verify that the secondary SpectroSERVER can manage your network devices by testing fault tolerance.
Test fault tolerance whenever new devices are added to the primary SpectroSERVER.
Follow these steps:
  1. With both the primary and secondary SpectroSERVERs up and running, bring down the primary SpectroSERVER.
    The Connection Status icon is yellow to indicate the "switched" condition.
    A red connector indicates that the OneClick server was not able to contact the secondary SpectroSERVER.
  2. Wait 15 to 20 minutes for the secondary SpectroSERVER to run.
  3. Verify the following conditions:
    • The Connection Status icon does not display red.
    • All device models and pingable models maintain SNMP or ICMP contact.
      If this contact is lost, verify that the secondary SpectroSERVER has access to your devices. Contact a Network Administrator to resolve this problem, if applicable.
    • DX NetOps Spectrum is managing all devices that have an established contact state. Verify the status by checking for device contact or management contact loss alarms from any of the device models.
  4. Restart the primary SpectroSERVER.
    The Connection Status icon displays green to indicate a normal contact state.
Fault-Tolerant Recovery
Following are the two possible failure scenarios:
  • The primary SpectroSERVER stops. The secondary SpectroSERVER then forwards event and statistical information to the primary Archive Manager that is running on the server that hosts the primary SpectroSERVER. When the primary SpectroSERVER restarts, no event and statistical data have been lost.
  • The computer where the primary SpectroSERVER and the primary Archive Manager are running stops operating completely. The secondary SpectroSERVER then caches event and statistical data in its database until the primary SpectroSERVER computer comes back online. If a secondary Archive Manager is running, historical and real-time information is available in OneClick, but the information is still cached for transfer to the primary Archive Manager.
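The difference between the two scenarios comes down to whether the primary Archive Manager is still running. The following Python sketch is only a summary of that decision in code form. The function name and the return strings are hypothetical and do not correspond to any DX NetOps Spectrum interface.

```python
# Illustrative summary of the two failure scenarios described above.
def secondary_event_handling(primary_archive_manager_up):
    """Return what the secondary SpectroSERVER does with event data."""
    if primary_archive_manager_up:
        # Scenario 1: only the primary SpectroSERVER stopped.
        return "forward to primary Archive Manager"
    # Scenario 2: the whole primary computer stopped.
    return "cache in secondary database until primary host returns"


print(secondary_event_handling(True))
print(secondary_event_handling(False))
```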
Restart both the primary Archive Manager and the primary SpectroSERVER if their server goes down, or if the primary SpectroSERVER stops operating.
It is no longer necessary to start the Archive Manager before the SpectroSERVER. The cached events from the secondary SpectroSERVER can be transferred at any time, even after the primary SpectroSERVER has started logging new events.
Follow these steps:
  1. Start the Spectrum Control Panel on the primary SpectroSERVER host.
  2. To start the SpectroSERVER, click Start SpectroSERVER on the Spectrum Control Panel.
    When the primary Archive Manager is again operational, the secondary SpectroSERVER connects and transfers its cached event data to the primary Archive Manager.
Change the Host Names of the Primary and Secondary SpectroSERVERs
SpectroSERVERs in a fault-tolerant environment use a precedence value that is associated with their host names to recognize their relationship to one another. Therefore, to preserve the fault-tolerant relationship, use SSdbsave and SSdbload to change the host name of your primary SpectroSERVER.
Follow these steps:
  1. Save the database using SSdbsave with the -cm option.
  2. Change the host name.
  3. Reload the database with the save file that you created in the first step. Run SSdbload with the -il option and the -replace option:
    SSdbload -il -replace precedence savefile
    This command causes the database to associate the new host name with the precedence value (10) that designates a primary SpectroSERVER.
    The change in the host name is communicated to any warm or hot standby SpectroSERVERs the next time that the databases are synchronized as a result of Online Backup being run.
    In the meantime, however, the host name change prevents the standby SpectroSERVERs from detecting that the primary SpectroSERVER is running. As a result, any SpectroSERVER that is configured as a warm standby starts polling.
  4. Load the save file on the warm standby using SSdbload with the -il and -replace options, and specify a higher precedence value (for example, 20) that designates it as a standby.
Now you can change the host name of the secondary SpectroSERVER.
Follow these steps:
  1. Save the database using SSdbsave with the -cm option.
  2. Make the change to the host name.
  3. Reload the database with the save file that you created in the first step. Run SSdbload with the -il option and the -replace option:
    SSdbload -il -replace precedence savefile
    This command causes the database to associate the new host name with the precedence value (20) that designates a secondary SpectroSERVER.
    When you restart the secondary SpectroSERVER, the server communicates the new host name and precedence to the primary SpectroSERVER.
For more information, see the
.
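Both host name procedures use the same SSdbload form and differ only in the precedence value. The following Python sketch shows how the argument list could be assembled for each role. It is an illustration, not a supported tool: the function name, the role labels, and the save file name are hypothetical. The value 10 is the primary value given above, and 20 is the example secondary value used above. The sketch only builds the argument list and does not run SSdbload.

```python
# Illustrative sketch; builds the SSdbload arguments without running them.
ROLE_PRECEDENCE = {"primary": 10, "secondary": 20}


def ssdbload_replace_args(role, savefile):
    """Return the 'SSdbload -il -replace precedence savefile' arguments."""
    if role not in ROLE_PRECEDENCE:
        raise ValueError(f"unknown role: {role!r}")
    precedence = ROLE_PRECEDENCE[role]
    return ["SSdbload", "-il", "-replace", str(precedence), savefile]


# "my_savefile" is a placeholder for the file saved in step 1.
print(" ".join(ssdbload_replace_args("primary", "my_savefile")))
print(" ".join(ssdbload_replace_args("secondary", "my_savefile")))
```

If your secondary SpectroSERVER uses a precedence other than 20, adjust the mapping to match your environment.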