Back Up the Data Repository

To protect your data, back up the Data Repository.
The following procedure is the supported method for backing up the Data Repository. Taking a virtual machine snapshot is not a supported method for backing up the Data Repository.
The first backup is a full backup of all historical data. Subsequent backups are incremental and include all database activity that occurred since the snapshot at the start of the previous backup.
About Data Repository Backups
  • The Data Repository and Data Aggregator continue to run during a Data Repository backup.
  • Backup processing can be resource-intensive, but it has a lower priority than other processing. To let backups proceed more quickly and to minimize the impact on other processing, perform backups during off-peak hours.
  • You can back up the Data Repository to a remote host or to the same host. If you back up to the same host, save the backup to a different disk than the one that is used by the catalog and data directories.
  • Perform full backups weekly. Perform incremental backups daily.
  • Full backups occur only when the backup location is a new directory. The snapshotName value can be the same as in previous backups.
  • The incremental snapshots store new files and hard links to unchanged files from the previous backup. Restoring to any incremental snapshot depends on the integrity of the files that are linked to in previous snapshots.
  • For information about the size of the backup files, see the capm Sizing Tool.
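The hard-link behavior described above can be illustrated with ordinary files. The following bash sketch does not call vbr; it builds two hypothetical snapshot directories by hand to show that an unchanged file in a newer snapshot is the same file on disk as in the older snapshot, so damage to that file affects both snapshots. The function and directory names are illustrative only.

```bash
#!/usr/bin/env bash
# Illustration only: hand-built snapshot directories, not vbr output.
set -euo pipefail

demo_hardlink_snapshots() {
  local root
  root=$(mktemp -d)
  mkdir "$root/snap1" "$root/snap2"

  # Hypothetical first (full) snapshot: two data files.
  echo "unchanged data" > "$root/snap1/a.dat"
  echo "old data" > "$root/snap1/b.dat"

  # Hypothetical second (incremental) snapshot: the unchanged file is a
  # hard link to the first snapshot's copy; the changed file is stored anew.
  ln "$root/snap1/a.dat" "$root/snap2/a.dat"
  echo "new data" > "$root/snap2/b.dat"

  # -ef is true when both names refer to the same device and inode,
  # that is, a single copy of the data on disk.
  if [ "$root/snap1/a.dat" -ef "$root/snap2/a.dat" ]; then
    echo "same_inode=yes"
  else
    echo "same_inode=no"
  fi

  # Overwriting the file in place through the older snapshot also changes
  # what the newer snapshot contains.
  echo "corrupted" > "$root/snap1/a.dat"
  echo "snap2_a=$(cat "$root/snap2/a.dat")"

  rm -rf "$root"
}
```

Running demo_hardlink_snapshots prints same_inode=yes followed by snap2_a=corrupted, which is why an incremental restore point is only as sound as the older files it links to.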
To ensure data integrity, back up each node of the Data Repository to a dedicated backup host. To prepare for the backup, perform the following procedure for each Data Repository node.
Before you begin, verify the following information about the Data Repository host and the remote backup host:
  • Neither host is connected to LDAP.
  • Neither host is connected to Network Information Service (NIS).
  • Both hosts use the same Vertica Linux database administrator user name.
  • Port 50000 is open on any firewalls, so that the Data Repository host can reach the custom rsync/ssh port 50000 on the backup host.
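Checking the LDAP and NIS items by hand means reading the name-service configuration on each host. The following bash sketch is a heuristic that assumes directory services are configured through /etc/nsswitch.conf; the function names are hypothetical, and the result is a hint to investigate rather than proof.

```bash
#!/usr/bin/env bash
# Heuristic pre-check for the LDAP/NIS prerequisites. Run it on both hosts.
# Assumption: directory services are configured through /etc/nsswitch.conf.
# Output is a hint, not proof either way (for example, "compat" mode can
# still pull NIS entries through +/- lines in /etc/passwd).
set -euo pipefail

# Print any directory services (ldap, nis, nisplus, sss) that the passwd
# database consults. sss is listed because sssd commonly fronts LDAP.
directory_services_for_passwd() {
  local nsswitch=${1:-/etc/nsswitch.conf}
  awk '$1 == "passwd:" {
         for (i = 2; i <= NF; i++)
           if ($i ~ /^(ldap|nis|nisplus|sss)$/) print $i
       }' "$nsswitch"
}

# Succeed if the user is defined locally in the passwd file
# (rather than only in a directory service).
local_user_exists() {
  local user=$1 passwd_file=${2:-/etc/passwd}
  grep -q "^${user}:" "$passwd_file"
}
```

On each host, any output from directory_services_for_passwd (for example, ldap) means the passwd database consults a directory service, and local_user_exists db_admin confirms that the database administrator user is defined in /etc/passwd.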
If you do not have a backup host, you can back up the Data Repository locally. For more information, see Configure the Data Repository Host for a Local Backup.
Follow these steps:
  1. Log in to the backup host as the root user.
  2. Create the Vertica Linux database administrator user on the remote backup host:
    useradd -s /bin/bash db_admin
    db_admin is the same Vertica Linux database administrator user that exists on the Data Repository hosts.
  3. Set the Vertica Linux database administrator user password:
    passwd db_admin
  4. Create the Vertica directories on the remote backup host:
    mkdir -p /opt/vertica/bin
    mkdir -p /opt/vertica/oss
    The -p option also creates /opt/vertica if it does not exist yet.
  5. Change the owner of the Vertica directories:
    chown -R db_admin /opt/vertica
  6. Log out from the remote backup host.
  7. Set up passwordless ssh on the Data Repository host for the remote backup host:
    1. Log in to the Data Repository host as the Vertica Linux database administrator user.
    2. Generate the keys for passwordless ssh:
      ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
      cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys2
      chmod 644 ~/.ssh/authorized_keys2
    3. Copy the Vertica Linux database administrator user public key into the list of authorized keys on the remote backup host:
      ssh-copy-id -i ~/.ssh/id_rsa.pub db_admin@backuphost
      backuphost is the host name of the remote backup host.
    4. Log in to the remote backup host as the Vertica Linux database administrator user.
    5. Copy the Vertica rsync and python tools from the Data Repository host to the remote backup host:
      scp db_admin@drhost:/opt/vertica/bin/rsync /opt/vertica/bin
      scp -r db_admin@drhost:/opt/vertica/oss/python /opt/vertica/oss
      drhost is the host name of the Data Repository host.
  8. Verify that the remote backup host has the following file and directory:
    • /opt/vertica/bin/rsync
    • /opt/vertica/oss/python
  9. Create the backup directory:
    mkdir backup_directory
    backup_directory specifies the directory where you want to save the backup files. Select a backup directory on a disk partition with a large amount of free space. If the directory is not writable by the database administrator user, give this user write access to it.
    The remote host is ready for the backup configuration file to be created and for the backup directory to be initialized.
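Before you create the configuration file, you can check what steps 4 through 9 should have left on the backup host. The following bash sketch assumes the default /opt/vertica paths; the function name and its optional root-prefix argument (useful only for testing) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check the state that the backup host preparation should leave.
# Run it on the backup host as the database administrator user.
# Assumption: the optional second argument is a root prefix for testing.
set -euo pipefail

verify_backup_host() {
  local backup_dir=$1 root=${2:-}
  local status=0

  # Step 8: the rsync binary and the python directory copied from the DR host.
  if [ ! -x "$root/opt/vertica/bin/rsync" ]; then
    echo "missing or not executable: $root/opt/vertica/bin/rsync"
    status=1
  fi
  if [ ! -d "$root/opt/vertica/oss/python" ]; then
    echo "missing directory: $root/opt/vertica/oss/python"
    status=1
  fi

  # Step 9: the backup directory must exist and be writable by this user.
  if [ ! -d "$backup_dir" ] || [ ! -w "$backup_dir" ]; then
    echo "backup directory missing or not writable: $backup_dir"
    status=1
  fi
  return "$status"
}
```

For example, with a hypothetical backup directory /data/vertica_backup, verify_backup_host /data/vertica_backup prints each problem it finds and returns a nonzero status if any check fails. To test passwordless ssh, run ssh -o BatchMode=yes db_admin@backuphost true on the Data Repository host: with BatchMode enabled, ssh fails instead of prompting for a password.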
Configure the Data Repository Backup
To back up the Data Repository and configure automatic backups, create a configuration file for the backup. Vertica performs a full backup the first time you back up into a new backup directory. All subsequent backups to the same directory are incremental, even if the snapshot name changes.
The node where you perform this procedure initiates the backup.
Follow these steps:
  1. Log in to the Data Repository host as the database administrator user.
  2. Create the configuration file for the backup:
    /opt/vertica/bin/vbr.py --setupconfig
    Run this command from the directory where you want to save the configuration file. The database administrator user requires write privileges for that directory.
  3. Provide answers to the prompts. The following list provides information about the prompts and the recommended responses:
    • Snapshot name:
      Specify a name for the snapshot file.
      Default: backup_snapshot
    • Destination Vertica DB bin directory:
      The location of the Vertica tools on the remote host. The default is correct unless you have customized the remote host.
      Default: /opt/vertica/bin
    • Number of restore points (1):
      7
      A restore point limit of 7 enables the Data Repository to be restored to the most recent backup or to any of the previous 7 incremental backups. For example, with daily backups you can restore to the most recent backup or to any of the previous 7 days; with weekly backups, to any of the previous 7 weeks. Each incremental backup copies only the objects that changed since the previous backup. If the restore point limit is set to 1, you can restore the Data Repository only to the most recent backup or to the previous incremental backup. When the restore point limit is reached, the oldest backup in the backup directory is removed.
    • Specify objects (no default):
      To inspect all objects in the Data Repository for changes, do not specify a value, and press Return.
    • Object restore mode (coexist, createOrReplace or create) (createOrReplace):
      Accept the default value.
    • Vertica user name (dradmin):
      Accept the default value.
    • Save password to avoid runtime prompt ? (n) [y/n]:
      y
    • Database user password to save in vbr config file (no default):
      Specify the password for the database administrator user.
    • Backup host name (no default):
      Specify the host name of the backup host. If you are backing up a cluster, you are prompted for the host name that corresponds to each node in the cluster.
    • Backup directory (no default):
      Specify the full path of the backup_directory. If you are backing up a cluster, you are prompted for a backup directory for each node in the cluster. Back up each node in a cluster. The backup directory must be initialized before the backup runs.
    • Config file name (snapshot_name.ini): /tmp/snapshot_name.ini
      Accept the default value.
    • Password file name (no default):
      You must provide a password file name. If you leave this value empty, the backup script fails.
      Example: /tmp/pwdfile
    • Change advanced settings? (n) [y/n]:
      Accept the default value.
    • Config file name (<snapshot name>.ini):
      The default name for the configuration file corresponds to the snapshot name provided in the first question. Accept the default value. A message indicates that the vbr configuration has been saved to the file name as specified.
  4. (First Time Only) Initialize the backup directory before you run the first backup:
    /opt/vertica/bin/vbr.py --task init --config-file configuration_directory_path_filename
    • configuration_directory_path_filename is the directory path and filename of the configuration file. You also reference this file when you restore. The file is located in the directory where you ran the backup utility.
    After the backup directory is initialized, multiple configuration files can use it if they specify the same backup directory location.
  5. Back up the Data Repository:
    /opt/vertica/bin/vbr.py --task backup --config-file configuration_directory_path_filename
    For example:
    /opt/vertica/bin/vbr.py --task backup --config-file /home/vertica/vert-db-production.ini
  6. If you are prompted about the authenticity of the host, answer yes.
    The Data Repository starts the backup. This process can take a long time, especially for a full backup.
  7. (Optional) If you do not want to keep the Data Repository password in clear text for future manual backups, edit the configuration file.
    The generated configuration file contains the password in clear text. Automated backups require the password, so this change prevents automated backups that use this configuration file.
    1. Verify that the following line exists in the [Database] section:
      dbPromptForPassword = True
    2. Remove the following line from the [Database] section:
      dbPassword = password
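Step 7 can be scripted if you manage several configuration files. The following bash sketch edits only the [Database] section, assumes one key = value pair per line as shown above, and rewrites the file in place so that it keeps its owner and permissions. The function name is hypothetical, and the sketch is not part of vbr.

```bash
#!/usr/bin/env bash
# Sketch: remove the saved password from the [Database] section of a vbr
# configuration file and make vbr prompt for the password instead.
# Assumptions: one "key = value" pair per line; section headers such as
# [Database] on their own lines.
set -euo pipefail

disable_saved_password() {
  local config=$1 tmp
  tmp=$(mktemp "${config}.XXXXXX")
  awk '
    # Emit the prompt setting if [Database] did not already contain it.
    function add_prompt() {
      if (in_db && !seen) { print "dbPromptForPassword = True"; seen = 1 }
    }
    /^\[/ {                               # a new section starts here
      add_prompt()
      in_db = ($0 ~ /^\[Database\][ \t]*$/)
      print; next
    }
    in_db && /^[ \t]*dbPassword[ \t]*=/ { next }   # drop the saved password
    in_db && /^[ \t]*dbPromptForPassword[ \t]*=/ {
      print "dbPromptForPassword = True"; seen = 1; next
    }
    { print }
    END { add_prompt() }                  # [Database] was the last section
  ' "$config" > "$tmp"
  # Copy back with cat so the original file keeps its owner and permissions.
  cat "$tmp" > "$config"
  rm -f "$tmp"
}
```

For example, disable_saved_password /home/vertica/vert-db-production.ini removes the dbPassword line and sets dbPromptForPassword = True in that file.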
Set Up an Automatic Backup
To ensure regular backups of the Data Repository, create a cron job to schedule automatic backups. The first backup is a full backup and the following backups are incremental. Vertica performs a full backup only when you use a new backup directory, so to run a full backup weekly or biweekly, switch to a new backup directory on that schedule.
If disk space is limited, retain only two to three weeks of data, and delete the oldest backup at the beginning of each week. Use the remove task of the vbr utility to delete old backups. Vertica does not support removing backups through the file system.
Follow these steps:
  1. Create a wrapper shell script that contains the following line:
    /opt/vertica/bin/vbr.py --task backup --config-file configuration_directory_path_filename
    configuration_directory_path_filename is the directory path and filename of the configuration file. You also reference this file when you restore.
  2. Save the contents to a new file named backup_script.sh in a location of your choice.
    For example:
    /home/vertica/backup_script.sh
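  3. Make the script executable:
    chmod 777 location/backup_script.sh
    location is the directory where you saved the script. For example:
    chmod 777 /home/vertica/backup_script.sh
    chmod 777 makes the file readable, writable, and executable by everyone. Because the cron job runs the script as the database administrator user, any local user who can modify the script can run commands as that user. To let only the script owner read, modify, and run the file, use chmod 700. To let everyone read and run the file but only the owner modify it, use chmod 755.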
  4. As the database administrator user, open the crontab to define a cron job:
    crontab -e
  5. Add a cron job that runs the backup script.
    Create a cron job to run the script daily at an off-peak time.
    For example:
    00 02 * * *   /home/vertica/backup_script.sh >/tmp/backup.log  2>&1
    This example cron job runs the backup script every day at 2:00 AM.
    The cron job runs a daily incremental backup.
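  6. To start a new full backup series (for example, weekly), add a script that copies the configuration file, changes the snapshot name in the copy, and specifies a new backup directory in the copy. Because the backup directory is new, Vertica performs a full backup into it. Initialize the new backup directory with the init task before the first backup, and update the wrapper script to reference the new configuration file.
    Do not delete the previous configuration file. The original configuration file is required to remove a backup or to restore from an older series of backups.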
  7. (Optional) Remove older backup sequences as required. Use the remove task of the vbr utility with the configuration file that was used to create them:
    /opt/vertica/bin/vbr.py --task remove --archive=[<date>_<time>|"all"] --config-file configuration_directory_path_filename
    The remove task is destructive: it deletes the backup data and frees the space on the disk. Set --archive to a single restore point, a comma-separated list of restore points, or "all". To display the list of backups, run vbr.py with --task listbackup.
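The one-line wrapper script has no protection against a long full backup still running when the next cron job starts, and the cron example overwrites /tmp/backup.log on every run. The following bash sketch adds a simple lock and timestamps. The run_backup name, the VBR and LOCK_DIR overrides, and the default lock path are assumptions, not part of the documented procedure.

```bash
#!/usr/bin/env bash
# Sketch of a more defensive backup_script.sh.
# Assumptions: VBR and LOCK_DIR may be overridden through the environment;
# the default lock path /tmp/vbr_backup.lock is hypothetical.
set -euo pipefail

run_backup() {
  local config=$1
  local vbr=${VBR:-/opt/vertica/bin/vbr.py}
  local lock_dir=${LOCK_DIR:-/tmp/vbr_backup.lock}
  local rc=0

  # mkdir either creates the directory or fails, so two runs cannot both
  # hold the lock. A run that is killed leaves the directory behind; remove
  # it by hand after you confirm that no backup is running.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "$(date '+%F %T') lock $lock_dir exists; skipping this run" >&2
    return 1
  fi

  if [ ! -r "$config" ]; then
    echo "$(date '+%F %T') cannot read configuration file: $config" >&2
    rc=1
  else
    echo "$(date '+%F %T') starting backup with $config"
    "$vbr" --task backup --config-file "$config" || rc=$?
    echo "$(date '+%F %T') vbr exited with status $rc"
  fi

  rmdir "$lock_dir"
  return "$rc"
}
```

To use it, end backup_script.sh with a line such as run_backup /home/vertica/vert-db-production.ini >> /tmp/backup.log 2>&1. The >> operator appends each run's output to the log instead of replacing it.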
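Step 6 can be sketched as a small function that derives a new configuration file from the current one. It assumes that the snapshot name is stored on a line of the form snapshotName = name and that the backup directory path appears literally in the file, so a plain text substitution changes it. The function name and the weekly file and directory names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: derive next week's vbr configuration file from the current one.
# Assumptions: the snapshot name is on a "snapshotName = <name>" line, and
# the old backup directory path appears literally in the file. Paths must
# not contain "|", "&", or regex metacharacters other than ".".
set -euo pipefail

new_week_config() {
  local old_config=$1 new_config=$2 old_dir=$3 new_dir=$4 new_snapshot=$5

  # Never overwrite an existing file: vbr needs older configuration files
  # to restore from, or remove, an older backup series.
  if [ -e "$new_config" ]; then
    echo "refusing to overwrite $new_config" >&2
    return 1
  fi

  # Replace the snapshot name, then every occurrence of the old directory.
  # "|" is the sed delimiter because the paths contain "/".
  sed -e "s|^snapshotName[[:space:]]*=.*|snapshotName = ${new_snapshot}|" \
      -e "s|${old_dir}|${new_dir}|g" \
      "$old_config" > "$new_config"
}
```

For example, new_week_config week01.ini week02.ini /data/bkp/week01 /data/bkp/week02 week02 writes week02.ini and leaves week01.ini unchanged. Then create /data/bkp/week02 on the backup host, initialize it with /opt/vertica/bin/vbr.py --task init --config-file week02.ini, and point the wrapper script at week02.ini.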
VBR Utility Reference
The Vertica vbr utility lets you back up and restore either the full database or one or more schema and table objects. You can also copy a cluster and list the backups that you created previously.
For a full reference for the vbr utility, see the Vertica Documentation.