Back Up the Data Repository

To protect your data, back up the data repository.
Back up the data repository using the process outlined in this article. Do not back up the data repository by taking a virtual machine snapshot.
The first backup is a full backup of all historical data. Subsequent backups are incremental and include database activity that occurred since the snapshot at the start of the previous backup.
About Data Repository Backups
  • The data repository and data aggregator continue to run during a data repository backup.
  • Backup processing can be resource-intensive, but DX NetOps Performance Management gives it a lower priority than other processing. To let backups proceed more quickly and to minimize the impact on other processing, perform backups during non-peak hours.
  • You can back up the data repository to a remote host or to the same host. If you back up to the same host, save the backup to a different disk than the one that is used by the catalog and data directories.
  • Perform full backups weekly. Perform incremental backups daily.
  • Full backups occur only when the backup location is a new directory. The snapshotName can be the same as in previous backups.
  • The incremental snapshots store new files and hard links to unchanged files from the previous backup. Restoring to any incremental snapshot depends on the integrity of the files that are linked to in previous snapshots.
  • For information about the size of the backup files, see the DX NetOps Performance Management Sizing Tool.
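The dependency between incremental snapshots and earlier files can be illustrated with ordinary hard links. The following shell sketch does not call Vertica or vbr and is not how vbr is implemented; it only mimics the idea with plain files. The function names link_snapshot and same_file are hypothetical.

```shell
#!/bin/sh
# Illustration only: mimics hard-link-based snapshots with plain files.

# link_snapshot PREV NEW
# Creates directory NEW and fills it with hard links to the regular files
# in PREV. No file data is copied. PREV and NEW must be on the same filesystem.
link_snapshot() {
    prev=$1
    new=$2
    mkdir -p "$new" || return 1
    for f in "$prev"/*; do
        if [ -f "$f" ]; then
            ln "$f" "$new/$(basename "$f")" || return 1
        fi
    done
}

# same_file A B
# Succeeds when A and B are two names for the same file on disk.
same_file() {
    [ "$1" -ef "$2" ]
}
```

For example, after link_snapshot /tmp/snap1 /tmp/snap2, the call same_file /tmp/snap1/data_a /tmp/snap2/data_a succeeds for an unchanged file data_a. Overwriting the file through one name changes what the other name shows, which is why a restore point is only as reliable as the earlier files that it links to.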
Verify the Prerequisites
Before backing up the data repository, ensure that you have completed the following prerequisites:
  • To ensure data integrity, back up each node of the data repository to a dedicated backup host.
  • Verify the following information about the data repository host and the remote backup host:
    • Neither host is connected to LDAP.
    • Neither host is connected to Network Information Service (NIS), and both hosts have the same Vertica Linux database administrator user.
    • Port 50000 is open on any firewalls so that the data repository host can access the custom rsync/ssh port 50000 on the backup host.
    If you do not have a backup host, you can back up the data repository locally.
    For more information, see Configure the Data Repository Host for a Local Backup.
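To confirm the port prerequisite before you start, a reachability check along the following lines may help. This is a minimal sketch that assumes bash and the coreutils timeout command are installed on the data repository host; the function name port_open is hypothetical. It tests only that a TCP connection can be opened, not that rsync or ssh works on that port.

```shell
#!/bin/sh
# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened within 5 seconds.
# Relies on the /dev/tcp pseudo-device that bash provides for redirections.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (backuphost is a placeholder for your backup host):
#   port_open backuphost 50000 && echo "port 50000 is reachable"
```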
Create a Data Repository Backup
Perform the following procedure for each data repository node.
Follow these steps:
  1. Log in to the backup host as the root user.
  2. Create the Vertica Linux database administrator user on the remote backup host by issuing the following command:
    useradd db_admin -s /bin/bash
    db_admin is the same Vertica Linux database administrator user that exists on the data repository hosts.
  3. Set the Vertica Linux database administrator user password by issuing the following command:
    passwd db_admin
  4. Create the Vertica directories on the remote backup host by issuing the following commands. The -p option also creates the parent directory /opt/vertica if it does not exist:
    mkdir -p /opt/vertica/bin
    mkdir -p /opt/vertica/oss
  5. Change the owner of the Vertica directories by issuing the following command:
    chown -R db_admin /opt/vertica
  6. Log out from the remote backup host.
  7. Set up passwordless ssh on the data repository host for the remote backup host:
    1. Log in to the data repository host as the Vertica Linux database administrator user.
    2. Generate the keys for passwordless ssh by issuing the following commands:
      ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
      cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys2
      chmod 644 ~/.ssh/authorized_keys2
    3. Copy the Vertica Linux database administrator user public key into the list of authorized keys on the remote backup host by issuing the following command:
      ssh-copy-id -i db_admin@backuphost
    4. Log in to the remote backup host as the Vertica Linux database administrator user.
    5. Copy the Vertica rsync and python tools from the data repository host to the remote backup host:
      scp db_admin@drhost:/opt/vertica/bin/rsync /opt/vertica/bin
      scp -r db_admin@drhost:/opt/vertica/oss/python /opt/vertica/oss
  8. Verify that the remote backup host has the following paths:
    • /opt/vertica/bin/rsync
    • /opt/vertica/oss/python
  9. Create the backup directory by issuing the following command:
    mkdir backup_directory
    backup_directory
    Specifies the directory where you want to save the backup files. Select a backup directory that is on a disk partition with a large amount of free space. If this directory is not writable by the database administrator user, give this user access to it.
    The remote host is ready for the backup configuration file to be created and for the backup directory to be initialized.
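Before you continue, you can check the result of the preceding steps on the backup host. The following is a minimal sketch, not part of the product; the function name check_backup_host and its optional ROOT prefix are hypothetical, and ROOT exists only so that the check can be tried against a scratch directory.

```shell
#!/bin/sh
# check_backup_host BACKUP_DIR [ROOT]
# Reports whether the rsync and python paths from the procedure exist and
# whether BACKUP_DIR is a directory that the current user can write to.
# Run it as the Vertica Linux database administrator user.
# Returns 0 when every check passes, 1 otherwise.
check_backup_host() {
    backup_dir=$1
    root=${2:-}
    status=0
    for path in "$root/opt/vertica/bin/rsync" "$root/opt/vertica/oss/python"; do
        if [ -e "$path" ]; then
            echo "ok: $path"
        else
            echo "missing: $path"
            status=1
        fi
    done
    if [ -d "$backup_dir" ] && [ -w "$backup_dir" ]; then
        echo "ok: $backup_dir is writable"
    else
        echo "not writable: $backup_dir"
        status=1
    fi
    return $status
}

# Example (backup_directory is the directory that you created in step 9):
#   check_backup_host backup_directory
```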
Configure the Data Repository Backup
Configuring the data repository backup involves creating a configuration file for the backup. Vertica performs a full backup during the first backup into a new backup directory. Subsequent backups to the same directory are incremental backups, even if the snapshot name changes.
The node where you perform this procedure initiates the backup.
Follow these steps:
  1. Log in to the data repository host as the database administrator user.
  2. Create a password file:
    Example: /opt/vertica/config/password.txt
    You can choose a different location for the password file. The file has the following content:
    [Passwords]
    ; Specifies the password for the db admin account
    dbPassword = DBpassword
    ; Specifies password for rsync user account – if different than DB admin
    ; serviceAccessPass = rsyncpwd
    ; Specifies password for the dest_dbuser Vertica account. Used only for restoring to alternate cluster.
    ; dest_dbPassword = DestinationPwd
  3. Go to the directory that contains the Vertica vbr utility sample configuration files:
    /opt/vertica/share/vbr/example_configs
    The database administrator user requires write privileges for the directory.
    Vertica automatically installs sample configuration files in this directory.
    For more information about these files, see the Vertica documentation.
  4. Copy, edit, and deploy a configuration file for backup. See the following examples from Vertica.
    Example 1:
    backup_restore_full_local.ini
    Back up the data repository to a local area on the same machine. The backup location can be a mount from an external shared drive or a local disk. You cannot use the same disk as the data and catalog directories:
    [Mapping]
    ; node_name = backup_host:backup_dir
    ; [] indicates backup to localhost
    v_drdata_node0001 = []:/backups
    v_drdata_node0002 = []:/backups
    v_drdata_node0003 = []:/backups
    [Misc]
    ; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
    ; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
    ; Valid values: a-z A-Z 0-9 - _
    snapshotName = backup_snapshot
    ; The temp directory location on all database hosts.
    ; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
    ; tempDir = /tmp/vbr
    ; Specifies the number of historical backups to retain in addition to the most recent backup.
    ; 1 current + n historical backups
    restorePointLimit = 7
    ; Full path to the password configuration file
    ; Store this file in directory readable only by the dbadmin.
    passwordFile = /opt/vertica/config/password.txt
    ; When enabled, Vertica confirms that the specified backup locations contain
    ; sufficient free space and inodes to allow a successful backup. If a backup
    ; location has insufficient resources, Vertica displays an error message explaining the shortage and
    ; cancels the backup. If Vertica cannot determine the amount of available space
    ; or number of inodes in the backupDir, it displays a warning and continues
    ; with the backup.
    ; enableFreeSpaceCheck = True
    ; When performing a backup, replication, or copycluster, specifies the maximum
    ; acceptable difference, in seconds, between the current epoch and the backup epoch.
    ; If the time between the current epoch and the backup epoch exceeds the value
    ; specified in this parameter, Vertica displays an error message.
    ; SnapshotEpochLagFailureThreshold = 3600
    Example 2:
    backup_restore_full_external.ini
    Back up the data repository to a different machine. Replace the IP addresses with the IP addresses of your backup hosts:
    [Mapping]
    ; node_name = backup_host:backup_dir
    ; In this "parallel backup" configuration, each node backs up to a distinct external host.
    ; To backup all database nodes to a single external host, use that single hostname/IP address in each entry below.
    v_drdata_node0001 = 1.1.1.1:/backups
    v_drdata_node0002 = 2.2.2.2:/backups
    v_drdata_node0003 = 3.3.3.3:/backups
    [Misc]
    ; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
    ; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
    ; Valid characters: a-z A-Z 0-9 - _
    snapshotName = backup_snapshot
    ; The temp directory location on all database hosts.
    ; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
    ; tempDir = /tmp/vbr
    ; Specifies the number of historical backups to retain in addition to the most recent backup.
    ; 1 current + n historical backups
    restorePointLimit = 7
    ; Full path to the password configuration file
    ; Store this file in directory readable only by the dbadmin
    ; (no default)
    passwordFile = /opt/vertica/config/password.txt
    ; When enabled, Vertica confirms that the specified backup locations contain
    ; sufficient free space and inodes to allow a successful backup. If a backup
    ; location has insufficient resources, Vertica displays an error message explaining the shortage and
    ; cancels the backup. If Vertica cannot determine the amount of available space
    ; or number of inodes in the backupDir, it displays a warning and continues
    ; with the backup.
    ; enableFreeSpaceCheck = True
    ; When performing a backup, replication, or copycluster, specifies the maximum
    ; acceptable difference, in seconds, between the current epoch and the backup epoch.
    ; If the time between the current epoch and the backup epoch exceeds the value
    ; specified in this parameter, Vertica displays an error message.
    ; SnapshotEpochLagFailureThreshold = 3600
  5. (First Time Only) Before you run the first backup, initialize the backup directory by issuing the following command:
    /opt/vertica/bin/vbr.py --task init --config-file configuration_directory_path_filename
    configuration_directory_path_filename
    Indicates the directory path and filename of the configuration file that you will reference when you restore. This file is located where you ran the backup utility.
    After the directory is initialized, multiple configuration files can use it, provided that they specify the same backup directory location.
  6. Back up the data repository by issuing the following command:
    /opt/vertica/bin/vbr.py --task backup --config-file configuration_directory_path_filename
    For example:
    /opt/vertica/bin/vbr.py --task backup --config-file /home/vertica/vert-db-production.ini
  7. If you are prompted about the authenticity of the host, answer yes.
    The data repository starts the backup. This process can take a long time, especially for a full backup.
  8. (Optional) If you do not want to retain the data repository password in clear text for future manual backups, complete the following steps:
    The generated configuration file contains a clear text password. Automated backups require this password, so removing it prevents automated backups that use this configuration file.
    1. Verify that the following line exists under the [Database] section:
      dbPromptForPassword = True
    2. Remove the following line from the [Database] section:
      dbPassword = password
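The password file and the [Mapping] section from the preceding steps can also be generated from a script. The following is a minimal sketch under stated assumptions: the function names write_password_file and write_mapping are hypothetical, the password is read from standard input, and the output covers only the dbPassword line and the mapping entries shown in the examples. The script restricts the password file to its owner, in line with the comment in the sample configuration about keeping the file readable only by the database administrator.

```shell
#!/bin/sh
# write_password_file PATH
# Reads the database password from standard input and writes a vbr password
# file to PATH with mode 600 (readable and writable by the owner only).
write_password_file() {
    IFS= read -r db_password || [ -n "$db_password" ] || return 1
    (
        umask 077   # a newly created file is not accessible to other users
        printf '[Passwords]\ndbPassword = %s\n' "$db_password" > "$1"
    ) || return 1
    chmod 600 "$1"  # also tightens the mode if the file already existed
}

# write_mapping BACKUP_HOST BACKUP_DIR NODE...
# Prints a [Mapping] section that sends every listed node to the same
# backup host and directory. Pass '[]' as BACKUP_HOST for a local backup.
write_mapping() {
    host=$1
    dir=$2
    shift 2
    echo "[Mapping]"
    for node in "$@"; do
        echo "$node = $host:$dir"
    done
}

# Example:
#   write_password_file /opt/vertica/config/password.txt
#   write_mapping '[]' /backups v_drdata_node0001 v_drdata_node0002
```

The second example call prints a mapping equivalent to the first two entries of Example 1. You still need to add the [Misc] settings from the sample file before you run the init and backup tasks.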
Set Up an Automatic Backup
To ensure regular backups of the data repository, create a cron job to schedule automatic backups. The first backup is a full backup, and the following backups are incremental. Run a full backup weekly or biweekly. Vertica performs a full backup only when you use a new backup directory.
If disk space is limited, retain only two to three weeks of data, and delete the oldest backup at the beginning of each week. Use the vbr utility remove task to delete old backups. Vertica does not support removing backups through the file system.
Follow these steps:
  1. Create a wrapper shell script that contains the following line:
    /opt/vertica/bin/vbr.py --task backup --config-file configuration_directory_path_filename
    configuration_directory_path_filename
    Indicates the directory path and filename of the backup configuration file. You reference the same file when you restore.
  2. Save the contents to a new file named backup_script.sh in a location of your choice, for example: /home/vertica/backup_script.sh.
  3. Change permissions for running the script by issuing the following command:
    chmod 777 location_backup_script.sh/backup_script.sh
    For example:
    chmod 777 /home/vertica/backup_script.sh
    chmod 777 makes the file readable, writable, and executable by everyone. If you want only the script owner to read, modify, and run the file, use chmod 700. If you want everyone to be able to read and run the file but only the owner to modify it, use chmod 755.
  4. As the database administrator user, open the crontab to define a cron job:
    crontab -e
  5. Add a cron job that runs the backup script.
    Create a cron job to run the script daily at an off-peak time.
    For example:
    00 02 * * * /home/vertica/backup_script.sh >/tmp/backup.log 2>&1
    This example cron job runs the backup script every day at 2:00 AM.
    The cron job runs a daily incremental backup.
  6. To start a new full backup, add a script that copies the configuration file and changes the snapshot name in the copy. Also specify a new backup directory in the copy, because Vertica performs a full backup only when the backup directory is new.
    Do not delete the previous configuration file. The original configuration file is required to remove a backup or restore from an older series of backups.
  7. (Optional) Remove older backup sequences as required by running the vbr utility remove task with the configuration file that was used to create them:
    /opt/vertica/bin/vbr.py --task remove --archive=[<date>_<time>|"all"] --config-file configuration_directory_path_filename
    The remove command is destructive: it deletes the backup data and frees space on the disk. Use --archive to specify a single restore point, a comma-separated list of restore points, or "all". To display the list of backups, issue --task listbackup.
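The configuration copy that starts a new full backup can be sketched as follows. This is one possible approach, not a script that ships with the product. The function name new_full_backup_config is hypothetical, and the sketch assumes a configuration file laid out like the examples in this article: unindented lines, a [Mapping] section with node = host:directory entries, and snapshotName in the [Misc] section. It also assumes that the new directory path contains no backslash or & characters.

```shell
#!/bin/sh
# new_full_backup_config OLD_INI NEW_INI NEW_SNAPSHOT NEW_DIR
# Writes a copy of OLD_INI to NEW_INI in which snapshotName is NEW_SNAPSHOT
# and every [Mapping] entry points at NEW_DIR. OLD_INI is not modified,
# because it is still required to restore or remove the earlier backups.
new_full_backup_config() {
    old=$1
    new=$2
    snap=$3
    dir=$4
    awk -v snap="$snap" -v dir="$dir" '
        # Remember which section the current line belongs to.
        /^\[/ { section = $0 }
        # Mapping entries: replace the directory after the last colon.
        section == "[Mapping]" && /^[^;[].*=.*:/ { sub(/:[^:]*$/, ":" dir) }
        # Snapshot name: replace the whole line.
        section == "[Misc]" && /^snapshotName[ \t]*=/ { $0 = "snapshotName = " snap }
        { print }
    ' "$old" > "$new"
}

# Example (file names, snapshot name, and directory are placeholders):
#   new_full_backup_config /home/vertica/week1.ini /home/vertica/week2.ini \
#       backup_week2 /backups_week2
```

Because the new configuration file points at a new backup directory, create that directory, run the init task once, and then update the wrapper script to use the new configuration file.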
Back Up Using the VBR Utility
You can back up and restore either the full database, or one or more schema and table objects of interest, using the vbr utility. You can also copy a cluster and list backups that you created previously.
For more information about how to use this utility, see the Vertica Documentation.
Recover Data from the iRep
If you must recover your data, you can restore the iRep. You can export the iRep data from an existing system and import it into a different schema using the /opt/CA/IMDataRepository*/caVerticaUtility.sh script. You can then query the iRep data to determine how the original system was configured.
You can also export self-monitoring data or poll the data for debugging purposes using this script.