Establish Fault Tolerance

Establishing Fault Tolerance
You can set up a fault-tolerant environment when you first install DX NetOps Spectrum, before any models have been created. Alternatively, you can set up a fault-tolerant environment after you install DX NetOps Spectrum.
The following procedure describes how to set up two SpectroSERVERs: a primary and a secondary. You can also set up a tertiary SpectroSERVER by taking the same steps. However, assign the tertiary SpectroSERVER a higher precedence number than the secondary SpectroSERVER.
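The role of the precedence numbers can be illustrated with a small shell sketch. It assumes that the reachable server with the lowest precedence number is the one that serves the landscape, which matches the primary (10), secondary (20), tertiary (higher) ordering described here. The function name `pick_active_server` and the three-column input format are hypothetical and are not part of DX NetOps Spectrum.

```shell
#!/bin/sh
# Hypothetical illustration only: not a DX NetOps Spectrum utility.
# Reads lines of the form "<host> <precedence> <up|down>" on stdin and
# prints the host that is up and has the lowest precedence number.
pick_active_server() {
    awk '
        $3 == "up" && (best == "" || $2 + 0 < best) {
            best = $2 + 0   # lowest precedence number seen so far
            host = $1
        }
        END { if (host != "") print host }
    '
}
```

For example, feeding it `pri 10 down`, `sec 20 up`, and `ter 30 up` selects `sec`, because the tertiary server has a higher precedence number than the secondary.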
To establish fault tolerance in an environment with a Southbound Gateway integration, see the Southbound Gateway Toolkit documentation.
 
Follow these steps:
 
  1. Install the same version of DX NetOps Spectrum with the same modeling catalog on both the primary SpectroSERVER and the secondary SpectroSERVER. Each server requires the same landscape handle.
  2. Verify that both the primary and secondary SpectroSERVERs have entries in their .hostrc files that give the SpectroSERVERs mutual access permissions.
     If you are specifying secure users for the secondary SpectroSERVER in the .hostrc file on the primary SpectroSERVER, and the secondary SpectroSERVER is running in the Windows environment, include the user SYSTEM in the secure user list.
  3. Verify that the MAIN_LOCATION_HOST_NAME parameter in the .locrc file on the secondary SpectroSERVER points to the same system name as the .locrc file on the primary SpectroSERVER. Otherwise, synchronization fails.
  4. Configure the primary and secondary SpectroSERVERs so that the user running each SpectroSERVER is the same. If the users are not the same, the secondary SpectroSERVER fails or does not run properly after an Online Backup.
  5. Make a copy of the primary SpectroSERVER database by running Online Backup. Or, if the SpectroSERVER is shut down, use the SSdbsave utility with the -cm argument (to save the modeling catalog and any new models).
 
  6. Verify that the save file that you created is available to the server that hosts the secondary SpectroSERVER. Copy the file to the server if necessary.
  7. On the secondary server, with the SpectroSERVER shut down, navigate to the DX NetOps Spectrum SS directory and load the save file using the following command:
    ../SS-Tools/SSdbload -il -add precedence savefile
    • precedence
      Specifies a numeric value greater than the primary server default value of 10 (20 is recommended).
    • savefile
      Specifies the name of the save file that you created previously.
  8. (Optional) Add the line 'secondary_polling=yes' to the .vnmrc file to let the secondary SpectroSERVER function as a hot backup.
  9. Start the primary SpectroSERVER, if it is not already running.
  10. Start the secondary SpectroSERVER.
  11. To verify the setup, use the MapUpdate command with the view argument to display the current landscape map.
 
The secondary SpectroSERVER is now available to take over automatically if the primary SpectroSERVER fails. If you previously activated secondary polling, the secondary SpectroSERVER is available immediately. Otherwise, polling begins when the server detects that it has lost contact with the primary SpectroSERVER.
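The .locrc check from the setup procedure can be scripted. The following is a minimal sketch that assumes the parameter is stored as a `MAIN_LOCATION_HOST_NAME=value` line and that you have copies of both .locrc files on one machine. The function names are hypothetical helpers, not DX NetOps Spectrum commands.

```shell
#!/bin/sh
# Hypothetical helpers; assumes a KEY=value layout in the .locrc file.

# Print the MAIN_LOCATION_HOST_NAME value from the file given as $1.
main_location_host() {
    sed -n '/^[[:space:]]*MAIN_LOCATION_HOST_NAME[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*$//;p;}' "$1" | head -n 1
}

# Succeed only if both files name the same, non-empty main location host.
same_main_location() {
    primary_value=$(main_location_host "$1")
    secondary_value=$(main_location_host "$2")
    [ -n "$primary_value" ] && [ "$primary_value" = "$secondary_value" ]
}
```

A call such as `same_main_location primary.locrc secondary.locrc || echo "mismatch"` (the file names are placeholders) flags the condition that causes synchronization to fail.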
When service switches from the primary SpectroSERVER to the secondary SpectroSERVER, the Connection Status icon displays yellow. To view the connection status of all servers in a landscape, click the Connection Status icon. In the Connection Status dialog, the Connection Status icon for each server in the landscape displays yellow to indicate the “switched” condition.
When the primary SpectroSERVER comes back online, the secondary SpectroSERVER stops polling (unless you have set secondary_polling to 'yes'). All the applications switch back to the primary SpectroSERVER. However, any edits that you make to the secondary SpectroSERVER while it is active are not automatically replicated to the primary SpectroSERVER. Manually recreate these modifications on the primary SpectroSERVER.
When you restart the primary SpectroSERVER, connections are accepted when all models are loaded, but before all models are activated. The models can take some time to activate. Because the secondary SpectroSERVER stops polling when the primary SpectroSERVER is restarted, a gap in your network management coverage can result.
To avoid this situation, edit the .vnmrc file on the primary SpectroSERVER so that the wait_active resource is set to 'yes'. This parameter causes the server to wait until all of the models are activated before accepting any connections. The message area in the DX NetOps Spectrum Control Panel also dynamically displays the percentage of models that are activated. The SpectroSERVER can appear to take longer to come up. However, when all the models are activated, the SpectroSERVER is ready to manage the network.
You can also set the wait_active resource to 'yes' on the secondary SpectroSERVER. During a planned shutdown of the primary SpectroSERVER, you can then verify in the DX NetOps Spectrum Control Panel that the secondary SpectroSERVER is ready to take over.
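Editing the .vnmrc file by hand works, but a small helper can make the change repeatable. The sketch below assumes that .vnmrc resources are stored one per line as `name=value`, as in the 'secondary_polling=yes' line mentioned earlier, with no spaces around the equals sign. The function name `set_vnmrc_resource` is hypothetical. Try it on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper; assumes one "name=value" resource per line.
# Usage: set_vnmrc_resource <file> <name> <value>
# Replaces an existing "name=..." line or appends one if none exists.
set_vnmrc_resource() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if awk -v key="$key" -v value="$value" '
        BEGIN { FS = "=" }
        $1 == key { print key "=" value; found = 1; next }
        { print }
        END { if (!found) print key "=" value }
    ' "$file" > "$tmp"; then
        # Write back through the original file to keep its permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

For example, `set_vnmrc_resource .vnmrc wait_active yes` sets the resource described above, and running it a second time leaves the file unchanged.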
 
Validate Fault Tolerance Configuration
After you have set up fault tolerance in a distributed SpectroSERVER deployment, verify that the OneClick server has access to both the primary and secondary SpectroSERVERs. Without connectivity to both servers, the OneClick server cannot fail over to the secondary SpectroSERVER.
 
Follow these steps:
 
  1. Access the OneClick Administration, Landscapes web page.
  2. Check the ‘Secondary Status’ column. Verify that OneClick has established contact with the secondary SpectroSERVER.
    The status also indicates whether Fault Tolerance is ready for failover.
    The Fault Tolerance configuration is validated.
Test Fault Tolerance
During initial installation, the secondary SpectroSERVER might not have access to all the devices to which the primary SpectroSERVER has access. This situation causes the secondary SpectroSERVER to generate false alarms. To avoid false alarms, verify that the secondary SpectroSERVER can manage your network devices by testing fault tolerance.
Test fault tolerance whenever new devices are added to the primary SpectroSERVER.
 
Follow these steps:
 
  1. With both the primary and secondary SpectroSERVERs up and running, bring down the primary SpectroSERVER.
    The Connection Status icon is yellow to indicate the "switched" condition.
    A red connector indicates that the OneClick server was not able to contact the secondary SpectroSERVER.
  2. Wait 15 to 20 minutes for the secondary SpectroSERVER to run.
  3. Verify the following conditions:
    • The Connection Status icon does not display red.
    • All device models and pingable models maintain SNMP or ICMP contact.
      If this contact is lost, verify that the secondary SpectroSERVER has access to your devices. Contact a Network Administrator to resolve this problem, if applicable.
    • DX NetOps Spectrum is managing all devices that have an established contact state. Verify the status by checking for device contact or management contact loss alarms from any of the device models.
  4. Restart the primary SpectroSERVER.
    The Connection Status icon displays green to indicate a normal contact state.
Fault-Tolerant Recovery
The two possible failure scenarios are as follows:
  • The primary SpectroSERVER stops. The secondary SpectroSERVER then forwards event and statistical information to the primary Archive Manager that is running on the server that hosts the primary SpectroSERVER. When the primary SpectroSERVER restarts, no event or statistical data has been lost.
  • The computer where the primary SpectroSERVER and the primary Archive Manager are running stops operating completely. The secondary SpectroSERVER then caches event and statistical data in its database until the primary SpectroSERVER computer comes back online. If a secondary Archive Manager is running, historical and real-time information is available in OneClick, but the information is still cached for transfer to the primary Archive Manager.
Restart both the primary Archive Manager and the primary SpectroSERVER if their server goes down, or if the primary SpectroSERVER stops operating.
It is no longer necessary to start the Archive Manager before the SpectroSERVER. The cached events from the secondary SpectroSERVER can be transferred at any time, even after the primary SpectroSERVER has started logging new events.
 
Follow these steps:
 
  1. Start the Spectrum Control Panel on the primary SpectroSERVER host.
  2. To start the SpectroSERVER, click Start SpectroSERVER on the Spectrum Control Panel.
    When the primary Archive Manager is again operational, the secondary SpectroSERVER connects and transfers its cached event data to the primary Archive Manager.
Change the Host Names of the Primary and Secondary SpectroSERVERs
 
SpectroSERVERs in a fault-tolerant environment use a precedence value that is associated with their host names to recognize their relationship to one another. Therefore, to preserve the fault-tolerant relationship, use SSdbsave and SSdbload to change the host name of your primary SpectroSERVER.
 
Follow these steps:
 
  1. Save the database using SSdbsave with the -cm option.
  2. Change the host name.
  3. Reload the database with the save file that you created in the first step. Run SSdbload with the -il option and the -replace option:
    SSdbload -il -replace precedence savefile
    This command causes the database to associate the new host name with the precedence value (10) that designates a primary SpectroSERVER.
    The change in the host name is communicated to any warm or hot standby SpectroSERVERs the next time that the databases are synchronized as a result of Online Backup being run.
    In the meantime, however, the host name change prevents the standby SpectroSERVERs from detecting that the primary SpectroSERVER is running. As a result, any SpectroSERVER that is configured as a warm standby starts polling.
  4. Load the save file on the warm standby using SSdbload with the -il and -replace options, and specify a higher precedence value (for example, 20) that designates it as a standby.
Now you can change the host name of the secondary SpectroSERVER.
 
Follow these steps:
 
  1. Save the database using SSdbsave with the -cm option.
  2. Make the change to the host name.
  3. Reload the database with the save file that you created in the first step. Run SSdbload with the -il option and the -replace option:
    SSdbload -il -replace precedence savefile
    This command causes the database to associate the new host name with the precedence value (20) that designates a secondary SpectroSERVER.
    When you restart the secondary SpectroSERVER, the server communicates the new host name and precedence to the primary SpectroSERVER.
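The two host name procedures differ only in the precedence value passed to SSdbload. As a planning aid, the following sketch prints the command sequence for a given role without running anything. The precedence values 10 and 20 come from the steps above. Passing the save file name as a positional argument to SSdbsave is an assumption, and the function names are hypothetical. Confirm the exact SSdbsave syntax for your release before you run the printed commands.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints commands, executes none of them.

# Map a server role to the precedence value used in the procedures above.
precedence_for_role() {
    case "$1" in
        primary) echo 10 ;;
        secondary) echo 20 ;;
        *) return 1 ;;
    esac
}

# Usage: print_rename_plan <primary|secondary> <savefile>
print_rename_plan() {
    role=$1
    savefile=$2
    precedence=$(precedence_for_role "$role") || {
        echo "unknown role: $role" >&2
        return 1
    }
    # Assumption: SSdbsave takes the save file name as its last argument.
    echo "SSdbsave -cm $savefile"
    echo "# change the host name of the $role server here"
    echo "SSdbload -il -replace $precedence $savefile"
}
```

Printing the plan first makes it easier to review the precedence value before the database is reloaded, because the wrong value would change the server's role in the fault-tolerant pair.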
 
 