Back Up the Data Repository

To protect your data, back up the Data Repository.
The following procedure is the supported method for backing up the Data Repository. Taking a virtual machine snapshot is not a supported method for backing up the Data Repository.
The first backup is a full backup of all historical data. Subsequent backups are incremental and include all database activity that occurred since the snapshot at the start of the previous backup.
About Data Repository Backups
  • The Data Repository and Data Aggregator continue to run during a Data Repository backup.
  • Backup processing can be resource-intensive, but is prioritized below other processing. To let backups proceed more quickly and to minimize the impact on other processing, perform backups during non-peak hours.
  • You can back up the Data Repository to a remote host, or you can back it up to the same host. If you back up to the same host, save the backup to a different disk than the one that is used by the catalog and data directories.
  • Perform full backups weekly. Perform incremental backups daily.
  • Full backups occur only when the backup location is a new directory. The snapshotName can be the same as previous backups.
  • Incremental snapshots store new files and hard links to unchanged files from the previous backup. Restoring from any incremental snapshot therefore depends on the integrity of the linked files in previous snapshots.
  • For information about the size of the backup files, see the CA Performance Management Sizing Tool.
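The hard-link behavior of incremental snapshots can be illustrated with ordinary files. The following is a minimal sketch and not the vbr implementation; the directory and file names are hypothetical. It shows that a hard-linked file in a second "snapshot" is the same file on disk as in the first one. This is why damage to a file in an earlier snapshot also affects later snapshots that link to it.

```shell
#!/bin/sh
# Illustration only: mimic how an incremental snapshot can reuse unchanged
# files through hard links. All names here are hypothetical.
work=$(mktemp -d)
mkdir "$work/snap1" "$work/snap2"

# "Full" snapshot: both data files are written in full.
echo "old rows" > "$work/snap1/a.dat"
echo "more rows" > "$work/snap1/b.dat"

# "Incremental" snapshot: a.dat is unchanged, so link it instead of copying.
ln "$work/snap1/a.dat" "$work/snap2/a.dat"
# b.dat changed, so the new snapshot stores a new file.
echo "more rows, updated" > "$work/snap2/b.dat"

# -ef is true when both names refer to the same file (same device and inode).
if [ "$work/snap1/a.dat" -ef "$work/snap2/a.dat" ]; then
  shared=yes
else
  shared=no
fi
echo "a.dat shared between snapshots: $shared"
```

Because the link and the original are one file, hard links only work within a single file system. Keep each series of backups in one backup directory.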
To ensure data integrity, back up each node of the Data Repository to a dedicated backup host. To prepare for the backup, perform the following procedure for each Data Repository node.
Before you begin, verify the following information about the Data Repository host and the remote backup host:
  • Neither host is connected to LDAP.
  • Neither host is connected to Network Information Service (NIS).
  • Both hosts have the same Vertica Linux database administrator user.
  • Port 50000 is open on any firewalls so that the Data Repository host can access the custom rsync/ssh port 50000 on the backup host.
If you do not have a backup host, you can back up the Data Repository locally. For more information, see Configure the Data Repository Host for a Local Backup.
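Some of the prerequisites above can be spot-checked from the Data Repository host before you start. The following sketch is one possible approach, not a supported tool. The function names are hypothetical, and the port probe assumes that the nc (netcat) utility is installed.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; user and host names are placeholders.

# Succeeds when the named Linux user exists on this host.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# Succeeds when a TCP connection to host:port can be opened.
# Assumes nc is installed; -z only probes the port, -w 3 sets a 3-second timeout.
port_open() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Prints one status line per check; returns non-zero if any check fails.
preflight() {
  db_admin="$1"; backup_host="$2"; status=0
  if user_exists "$db_admin"; then
    echo "ok: user $db_admin exists on this host"
  else
    echo "FAIL: user $db_admin is missing on this host"; status=1
  fi
  if port_open "$backup_host" 50000; then
    echo "ok: $backup_host:50000 is reachable"
  else
    echo "FAIL: cannot reach $backup_host:50000"; status=1
  fi
  return "$status"
}
```

For example, run `preflight db_admin backuphost` on the Data Repository host after the backup host is prepared. The sketch does not check for LDAP or NIS membership.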
Follow these steps:
  1. Log in to the backup host as the root user.
  2. Create the Vertica Linux database administrator user on the remote backup host:
    useradd db_admin -s /bin/bash
    db_admin is the same Vertica Linux database administrator user that exists on the Data Repository hosts.
  3. Set the Vertica Linux database administrator user password:
    passwd db_admin
  4. Create the Vertica directories on the remote backup host:
    mkdir -p /opt/vertica/bin
    mkdir -p /opt/vertica/oss
  5. Change the owner of the Vertica directories:
    chown -R db_admin /opt/vertica
  6. Log out from the remote backup host.
  7. Set up passwordless ssh on the Data Repository host for the remote backup host:
    1. Log in to the Data Repository host as the Vertica Linux database administrator user.
    2. Generate the keys for passwordless ssh:
      ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
      cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
      chmod 644 ~/.ssh/authorized_keys2
    3. Copy the Vertica Linux database administrator user public key into the list of authorized keys on the remote backup host:
      ssh-copy-id -i db_admin@backuphost
    4. Log in to the remote backup host as the Vertica Linux database administrator user.
    5. Copy the Vertica rsync and python tools from the Data Repository host to the remote backup host:
      scp db_admin@drhost:/opt/vertica/bin/rsync /opt/vertica/bin
      scp -r db_admin@drhost:/opt/vertica/oss/python /opt/vertica/oss
  8. Verify that the remote backup host has the following paths:
    • /opt/vertica/bin/rsync
    • /opt/vertica/oss/python
  9. Create the backup directory:
    mkdir backup_directory
    backup_directory specifies the directory where you want to save the backup files. Select a backup directory that is on a disk partition with a large amount of free space. If the directory is not writable by the database administrator user, give this user access to it.
    The remote host is ready for the backup configuration file to be created and for the backup directory to be initialized.
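The result of the preparation steps can be checked with a short script. The following is a minimal sketch; the function name check_backup_host and its arguments are hypothetical. The demonstration runs against a throwaway directory tree so that it does not depend on a real backup host.

```shell
#!/bin/sh
# Hypothetical check for the backup host layout described above.
# $1 = file system root to inspect ("/" on a real host), $2 = backup directory.
check_backup_host() {
  root="${1%/}"
  backup_dir="$2"
  missing=0
  for path in "$root/opt/vertica/bin/rsync" "$root/opt/vertica/oss/python"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  if [ ! -d "$backup_dir" ] || [ ! -w "$backup_dir" ]; then
    echo "not a writable directory: $backup_dir"
    missing=1
  fi
  return "$missing"
}

# Demonstration against a temporary tree instead of a real host.
demo=$(mktemp -d)
mkdir -p "$demo/opt/vertica/bin" "$demo/opt/vertica/oss/python" "$demo/backups"
: > "$demo/opt/vertica/bin/rsync"

if check_backup_host "$demo" "$demo/backups"; then
  echo "backup host layout looks complete"
fi
```

On the backup host itself, the equivalent call would be `check_backup_host / backup_directory`, run as the database administrator user so that the write test reflects that user's permissions.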
Configure the Data Repository Backup
To back up the Data Repository and configure automatic backups, create a configuration file for the backup. Vertica performs a full backup during the first backup into a new backup directory. All subsequent backups to the same directory are incremental backups, even if the snapshot name changes.
The node where you perform this procedure initiates the backup.
To help you configure the vbr utility, Vertica automatically installs sample configuration files at the following location:
/opt/vertica/share/vbr/example_configs
For more information, see the Vertica documentation.
Follow these steps:
  1. Log in to the Data Repository host as the database administrator user.
  2. Create a password file, for example:
    /opt/vertica/config/password.txt
    You can choose a different location for the password file. Add the following contents:
    [Passwords]
    ; Specifies password for db admin account
    dbPassword = DBpassword
    ; Specifies password for rsync user account – if different than DB admin
    ; serviceAccessPass = rsyncpwd
    ; Specifies password for the dest_dbuser Vertica account. Used only for restoring to alternate cluster.
    ; dest_dbPassword = DestinationPwd
  3. Go to the sample configuration files:
    /opt/vertica/share/vbr/example_configs
    The database administrator user requires write privileges for the directory.
  4. Copy, edit, and deploy a configuration file for backup. See the following examples from Vertica.
    Example 1: backup_restore_full_local.ini
    Back up the Data Repository to a local area on the same machine. The backup location can be a mount from an external shared drive or a local disk. You cannot use the same disk as the data and catalog directories.
    [Mapping]
    ; node_name = backup_host:backup_dir
    ; [] indicates backup to localhost
    v_drdata_node0001 = []:/backups
    v_drdata_node0002 = []:/backups
    v_drdata_node0003 = []:/backups
    [Misc]
    ; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
    ; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
    ; Valid values: a-z A-Z 0-9 - _
    snapshotName = backup_snapshot
    [Misc]
    ; The temp directory location on all database hosts.
    ; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
    ; tempDir = /tmp/vbr
    ; Specifies the number of historical backups to retain in addition to the most recent backup.
    ; 1 current + n historical backups
    restorePointLimit = 7
    ; Full path to the password configuration file
    ; Store this file in directory readable only by the dbadmin.
    passwordFile = /opt/vertica/config/password.txt
    ; When enabled, Vertica confirms that the specified backup locations contain
    ; sufficient free space and inodes to allow a successful backup. If a backup
    ; location has insufficient resources, Vertica displays an error message explaining the shortage and
    ; cancels the backup. If Vertica cannot determine the amount of available space
    ; or number of inodes in the backupDir, it displays a warning and continues
    ; with the backup.
    ; enableFreeSpaceCheck = True
    ; When performing a backup, replication, or copycluster, specifies the maximum
    ; acceptable difference, in seconds, between the current epoch and the backup epoch.
    ; If the time between the current epoch and the backup epoch exceeds the value
    ; specified in this parameter, Vertica displays an error message.
    ; SnapshotEpochLagFailureThreshold = 3600
    Example 2: backup_restore_full_external.ini
    Back up the Data Repository to a different machine. Replace the IP addresses with the IP address of each backup host.
    [Mapping]
    ; node_name = backup_host:backup_dir
    ; In this "parallel backup" configuration, each node backs up to a distinct external host.
    ; To back up all database nodes to a single external host, use that single hostname/IP address in each entry below.
    v_drdata_node0001 = 1.1.1.1:/backups
    v_drdata_node0002 = 2.2.2.2:/backups
    v_drdata_node0003 = 3.3.3.3:/backups
    [Misc]
    ; Backups with the same snapshotName form a time sequence limited by restorePointLimit.
    ; SnapshotName is used for naming archives in the backup directory, and for monitoring and troubleshooting.
    ; Valid characters: a-z A-Z 0-9 - _
    snapshotName = backup_snapshot
    [Misc]
    ; The temp directory location on all database hosts.
    ; The directory must be readable and writeable by the dbadmin, and must implement POSIX style fcntl lockf locking.
    ; tempDir = /tmp/vbr
    ; Specifies the number of historical backups to retain in addition to the most recent backup.
    ; 1 current + n historical backups
    restorePointLimit = 7
    ; Full path to the password configuration file
    ; Store this file in directory readable only by the dbadmin
    ; (no default)
    passwordFile = /opt/vertica/config/password.txt
    ; When enabled, Vertica confirms that the specified backup locations contain
    ; sufficient free space and inodes to allow a successful backup. If a backup
    ; location has insufficient resources, Vertica displays an error message explaining the shortage and
    ; cancels the backup. If Vertica cannot determine the amount of available space
    ; or number of inodes in the backupDir, it displays a warning and continues
    ; with the backup.
    ; enableFreeSpaceCheck = True
    ; When performing a backup, replication, or copycluster, specifies the maximum
    ; acceptable difference, in seconds, between the current epoch and the backup epoch.
    ; If the time between the current epoch and the backup epoch exceeds the value
    ; specified in this parameter, Vertica displays an error message.
    ; SnapshotEpochLagFailureThreshold = 3600
  5. (First time only) Initialize the backup directory before you run the first backup:
    /opt/vertica/bin/vbr.py --task init --config-file configuration_directory_path_filename
    • configuration_directory_path_filename indicates the directory path and filename of the configuration file that you will reference when you restore. This file is located where you ran the backup utility.
    After the directory is initialized, multiple configuration files can use it if they share the same backup directory location.
  6. Back up the Data Repository:
    /opt/vertica/bin/vbr.py --task backup --config-file configuration_directory_path_filename
    For example:
    /opt/vertica/bin/vbr.py --task backup --config-file /home/vertica/vert-db-production.ini
  7. If you are prompted about the authenticity of the host, answer yes.
    The Data Repository starts the backup. This process can take a long time, especially for a full backup.
  8. (Optional) If you do not want to retain the Data Repository password in clear text, edit the configuration file so that future manual backups prompt for the password.
    The configuration file that is generated contains a clear text password. Automated backups require the password, so this procedure prevents automated backups from using this configuration file.
    1. Verify that the following line exists under the [Database] section:
      dbPromptForPassword = True
    2. Remove the following line from the [Database] section:
      dbPassword = password
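The password file and the [Mapping] section can also be generated with a script, which helps when the cluster has many nodes. The following is a minimal sketch under stated assumptions: the helper names write_password_file and print_mapping are hypothetical, and only the sections and keys that appear in the examples above are used. A working configuration file still needs the remaining settings from the Vertica sample file.

```shell
#!/bin/sh
# Sketch only: helper names and arguments are hypothetical; the section and
# key names are the ones used in the examples above.

# Write the password file so that only its owner can read or modify it.
write_password_file() {
  file="$1"; password="$2"
  ( umask 077
    printf '[Passwords]\ndbPassword = %s\n' "$password" > "$file" )
  chmod 600 "$file"
}

# Print a [Mapping] section that sends every listed node to one backup host.
print_mapping() {
  backup_host="$1"; backup_dir="$2"
  shift 2
  echo "[Mapping]"
  for node in "$@"; do
    echo "$node = $backup_host:$backup_dir"
  done
}

# Demonstration in a temporary directory with the placeholder values from
# the examples above.
demo=$(mktemp -d)
write_password_file "$demo/password.txt" "DBpassword"
print_mapping 1.1.1.1 /backups \
  v_drdata_node0001 v_drdata_node0002 v_drdata_node0003 > "$demo/mapping.ini"
cat "$demo/mapping.ini"
```

The restrictive file mode is a design choice in this sketch. It follows the comment in the sample configuration that the password file should be readable only by the database administrator.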
Set Up an Automatic Backup
To ensure regular backups of the Data Repository, create a cron job to schedule automatic backups. The first backup is a full backup, and the following backups are incremental. Vertica performs a full backup only when you use a new backup directory.
Run a full backup weekly, or at least biweekly. If disk space is limited, retain only two to three weeks of data and delete the oldest backup at the beginning of each week. Use the remove task of the vbr utility to delete old backups. Vertica does not support removing backups through the file system.
Follow these steps:
  1. Create a wrapper shell script that contains the following line:
    /opt/vertica/bin/vbr.py --task backup --config-file configuration_directory_path_filename
    configuration_directory_path_filename indicates the directory path and filename of the configuration file that you will reference when you restore.
  2. Save the contents to a new file named backup_script.sh in a location of your choice.
    For example:
    /home/vertica/backup_script.sh
  3. Change permissions for running the script:
    chmod 777 location_backup_script.sh/backup_script.sh
    For example:
    chmod 777 /home/vertica/backup_script.sh
    chmod 777 makes the file readable, writable, and executable by everyone. If you want only the script owner to read, modify, and run the file, use chmod 700. If you want everyone to read and run the file but only the owner to modify it, use chmod 755.
  4. As the database administrator user, open the crontab to define a cron job:
    crontab -e
  5. Add a cron job that runs the backup script.
    Create a cron job to run the script daily at an off-peak time.
    For example:
    00 02 * * * /home/vertica/backup_script.sh >/tmp/backup.log 2>&1
    This example cron job runs the backup script every day at 2:00 AM.
    The cron job runs a daily incremental backup.
  6. To start a new full backup, add a script that copies the configuration file and changes the snapshot name in the copy. Also specify a new backup directory in the copied configuration file so that Vertica performs a full backup.
    Do not delete the previous configuration file. The original configuration file is required to remove a backup or restore from an older series of backups.
  7. (Optional) Remove older backup sequences as required by running the vbr utility remove task with the configuration file that was used to create them:
    /opt/vertica/bin/vbr.py --task remove --archive=[<date>_<time>|"all"] --config-file configuration_directory_path_filename
    The remove command is destructive: it removes the data and frees the space on the disk. Specify the archive as a single restore point, a comma-separated list of restore points, or "all". To display the list of backups, run the vbr utility with --task listbackup.
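The copy step that starts a new series of backups can be scripted with sed. The following is a minimal sketch; the function name new_series_config, its arguments, and the weekly naming scheme are hypothetical. It writes a new configuration file with a different snapshot name and backup directory and leaves the original file untouched, because the original is still required to restore from or remove the older series.

```shell
#!/bin/sh
# Sketch of the "copy and retarget" step: derive a new configuration file whose
# snapshot name and backup directory differ, so the next run is a full backup.
# new_series_config and its arguments are hypothetical names.
new_series_config() {
  old_cfg="$1"; new_cfg="$2"; old_dir="$3"; new_dir="$4"; new_name="$5"
  # The first expression rewrites the snapshotName line; the second rewrites
  # the backup directory at the end of each mapping line.
  sed -e "s|^snapshotName *=.*|snapshotName = $new_name|" \
      -e "s|:$old_dir\$|:$new_dir|" \
      "$old_cfg" > "$new_cfg"
}

# Demonstration with a reduced configuration file in a temporary directory.
demo=$(mktemp -d)
cat > "$demo/weekly.ini" <<'EOF'
[Mapping]
v_drdata_node0001 = 1.1.1.1:/backups
v_drdata_node0002 = 1.1.1.1:/backups
[Misc]
snapshotName = backup_snapshot
restorePointLimit = 7
EOF

new_series_config "$demo/weekly.ini" "$demo/weekly_w02.ini" \
  /backups /backups_w02 backup_snapshot_w02
cat "$demo/weekly_w02.ini"
```

The new backup directory must exist on the backup host and be initialized with the init task before the first backup that uses the new configuration file.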
VBR Utility Reference
The Vertica vbr utility lets you back up and restore either the full database, or one or more schema and table objects of interest. You can also copy a cluster and list backups you created previously.
For a full reference for the vbr utility, see the Vertica Documentation.