Prepare to Install the Data Repository

To ensure that your Data Repository installation is successful, complete the requirements before you install Data Repository:
For more information about Data Repository configuration options and administration, see Data Repository Administration.
Verify the Prerequisites
Verify the following prerequisites before you install Data Repository:
  • Verify whether the dialog package is installed on each Data Repository host:
    rpm -qa | grep ^dialog
    If the command does not return any results, install the dialog package:
    yum install dialog
    If this package is not installed, the validation and installation scripts fail.
  • The installer requires the zip and unzip packages. If these packages are not installed, use the following command to install them:
    yum -y install zip unzip
  • Verify that you have at least 2 GB of swap space on each Data Repository host.
  • Verify that the Data Repository hosts use the ext4 or ext3 file system for data and catalog directories.
    The database performs best with the ext4 file system.
  • Verify that the following ports are open on the Data Repository systems:
    • Port 22 (TCP protocol)
    • Port 4803 (TCP and UDP protocol)
    • Port 4804 (UDP protocol)
    • Port 5433 (TCP protocol)
      Remote access is required to this port.
    • Port 5434 (TCP protocol)
    • Port 6543 (UDP protocol)
  • To avoid database corruption, exclude the installation directory, and all its subdirectories, from antivirus scans. Prevent scanning by a local instance of an antivirus client and scanning by a remote antivirus instance. Exclude the following directories:
    • /opt/vertica/*
    • /opt/vconsole/*
    • The specified data directory
    • The specified catalog directory
    • Vertica temporary files in /tmp
      • /tmp/4803
      • /tmp/vbr/*
    • The directory where you back up the Data Repository
  • If a file named 'release' appears in the /etc directory, remove it. Otherwise, the Data Repository installation fails.
  • Verify the access according to your installation type:
    • Single Node:
      Root access is required to install Data Repository. Determine whether you can install Data Repository as root.
    • Cluster:
      Verify that the root user or sudo user can create database administrator user accounts, or can have an administrator create these accounts.
  • Verify that CPU frequency scaling is disabled. Disable CPU frequency scaling through the host system BIOS and OS settings.
    If CPU frequency scaling is enabled, you might experience inconsistent performance for similar queries in Vertica. CPU frequency scaling can cause observable slowness and variation in dashboard loading.
  • Verify that you are not using Logical Volume Manager (LVM) for /data and /catalog directories.
  • (Cluster only) Verify all the hosts in the cluster are in the same subnet.
  • (Cluster only) Verify that the root user can use Secure Shell (SSH) to log in to all the hosts in the cluster.
    Set up SSH for the root user for the Data Repository installation or upgrade.
  • (Cluster only) Select the hosts where you install Data Repository nodes.
    Database software is deployed on each participating host in a cluster. This software represents a ‘node’ in the cluster. A three-node cluster represents the simplest configuration that can tolerate the loss of a single node. You can, however, include more than three hosts in the cluster. If more than one node fails or shuts down, Data Repository is no longer available for use and Data Aggregator shuts down automatically.
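Some of the checks in the list above can be scripted before you run the installer. The following is a minimal sketch and is not part of the product. The function names are hypothetical. The 2 GB swap threshold and the 'release' file name come from the list above. The sketch assumes that swap is reported on the SwapTotal line of /proc/meminfo and that the host has GNU df.

```shell
# Hypothetical helper functions; the names are illustrative only.

# Print the SwapTotal value (in kB) from a meminfo-style file.
# The path defaults to /proc/meminfo and is a parameter so it can be dry-run.
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed when swap is at least 2 GB (2 * 1024 * 1024 kB).
has_min_swap() {
    kb=$(swap_total_kb "$1")
    [ -n "$kb" ] && [ "$kb" -ge 2097152 ]
}

# Succeed when no file named 'release' exists in the directory (default /etc).
no_release_file() {
    [ ! -e "${1:-/etc}/release" ]
}

# Print the file system type of a directory (assumes GNU df with -T).
fs_type() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}
```

For example, has_min_swap returns a non-zero status when the host has less than 2 GB of swap, and fs_type /data prints the file system type, where /data stands for your data directory.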
Install the Data Repository on VMs
For best performance, install the Data Repository in a bare-metal environment. However, if you install the Data Repository in VMware virtual machines, verify the following requirements:
  • Use VMware version 5.5 or greater.
  • The number of VMs per host does not exceed the number of physical processors.
  • Pre-allocate and reserve 4 GB of memory for each of the VMs.
  • Each VM has a dedicated 10 GB NIC.
  • Disable CPU frequency scaling at the host level and for each VM.
  • Disable VMotion. VMotion can disrupt communication, and can cause the Data Repository to shut down.
  • Set the VMware parameters for hugepages to the version 5.5 default values.
  • Verify the hardware and network performance. Use the Vertica diagnostic tools described below to verify performance.
For more information about running Vertica on VMs, see the Vertica documentation.
Install the Data Repository on Shared Storage (SAN)
To install the Data Repository on SAN, verify the following requirements:
  • The hosts have no contention for disk space or bandwidth.
  • Each host has a unique catalog and data location. The hosts cannot share the location for these directories.
  • The storage has enough I/O bandwidth for each node to access the storage independently. To verify the I/O bandwidth, simultaneously run vioperf from all hosts in the Data Repository cluster. For more information, see the following procedures.
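The simultaneous vioperf run mentioned in the last requirement can be sketched as follows. This is an illustration under stated assumptions, not a supported procedure: the function name is hypothetical, vioperf is assumed to be at /opt/vertica/bin/vioperf on each host, and the root user is assumed to have SSH access to every host.

```shell
# Hypothetical sketch: start vioperf on every host at the same time, then wait.
# Assumptions: vioperf is installed at /opt/vertica/bin/vioperf and the root
# user can log in to each host over SSH.
# Usage: run_vioperf_all DIRECTORY HOST...
run_vioperf_all() {
    dir=$1
    shift
    for host in "$@"; do
        # Each run goes to the background so that the tests overlap in time.
        ssh "root@$host" /opt/vertica/bin/vioperf "$dir" > "vioperf_$host.log" 2>&1 &
    done
    wait   # Block until every background run has finished.
}
```

For example, run_vioperf_all /data host01 host02 host03 writes one log file per host to the current directory. Here /data stands for the data directory on the shared storage.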
Set a Unique Hostname for Each Data Repository Host
Set a unique hostname for each Data Repository host in the cluster.
Follow these steps:
  1. As the root user, log in to each Data Repository host, and verify the unique hostname.
    The hostname must be associated with the IP address and
    the loopback address of
  2. Verify that the following lines appear in the /etc/hosts file on each computer:
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    IP address of your host
    YourHostName YourHostName.domain
  3. If you change the file, run the following command:
    service network restart
    The /etc/hosts file is configured correctly.
    The unique host name is set.
  4. (Cluster installations only) The hostnames of all hosts in the cluster must resolve correctly. If the hostname resolution is incorrect, the Data Repository cluster does not install or work properly. All participating hosts in the cluster must use static IP or permanently leased DHCP addresses. Set up the /etc/hosts file on each of the three hosts you selected for the cluster. The hosts file must contain entries for all three hosts in the cluster.
    This example shows the /etc/hosts file for a cluster where the hosts are named host01, host02, and host03:
    localhost.localdomain localhost
    host01.domain host01
    host02.domain host02
    host03.domain host03
    Do not remove the loopback address line. The local Data Repository hostname cannot be on that line. Also, do not use the loopback address or the localhost name when you define hosts in the cluster.
  5. Verify that hostname resolution works for each host in the cluster.
    For example, on host01, the following syntax is correct:
    $ /bin/hostname -f 
    Hostname resolution is configured.
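The /etc/hosts rules in this procedure can be checked with a short script. The following is a minimal sketch, not a product tool. The function name is hypothetical. The sketch only inspects a hosts file: it confirms that each named host has an entry and that the entry is not on a loopback line. It does not test DNS or network reachability.

```shell
# Hypothetical helper: verify that each cluster host has an entry in a hosts
# file and that the entry is not on a loopback line.
# Usage: check_hosts_file FILE HOST...
check_hosts_file() {
    file=$1
    shift
    status=0
    for host in "$@"; do
        # Strip comments, then print the address of every line that names the host.
        addrs=$(awk -v h="$host" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == h) print $1 }
        ' "$file")
        if [ -z "$addrs" ]; then
            echo "$host: no entry in $file"
            status=1
        fi
        for addr in $addrs; do
            case $addr in
                127.*|::1)
                    echo "$host: listed on loopback line ($addr)"
                    status=1
                    ;;
            esac
        done
    done
    return $status
}
```

For example, run check_hosts_file /etc/hosts host01 host02 host03 on each host in the cluster. A non-zero exit status indicates a missing or loopback entry.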
(Optional) Set Up Passwordless SSH  
The hosts in a Data Repository cluster require passwordless SSH for the root or sudo users during the Data Repository installation or upgrade. The script sets up passwordless SSH, but requests the password many times. To avoid repeatedly specifying the root user password, set up passwordless SSH before you run the validation script. Repeat this procedure for each pair of hosts.
Passwordless SSH is automatically set up for the Data Repository admin user when you install the Data Repository.
Follow these steps:
  1. Open a console and log in to the Data Repository host as the root user.
  2. Run the following commands:
    ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
    cp ~/.ssh/ ~/.ssh/authorized_keys2
    chmod 644 ~/.ssh/authorized_keys2
  3. Copy the root user public key into the list of authorized keys on the remote hosts:
    ssh-copy-id -i [email protected]
    specifies a host in the cluster where you are copying the SSH ID.
  4. To verify that passwordless ssh is set up correctly, log in to the remote host from the local host:
    If passwordless SSH is set up successfully, you are not prompted for a password, and you see a directory listing from the ls command.
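Because the procedure is repeated for each pair of hosts, it can help to verify all remote hosts in one pass. The following is a minimal sketch with a hypothetical function name. It relies on the OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password, and it assumes that you log in as the root user.

```shell
# Hypothetical helper: report which hosts still require a password.
# BatchMode=yes makes ssh fail instead of prompting, so a non-zero exit
# status means that key-based login is not working for that host.
# Usage: check_passwordless_ssh HOST...
check_passwordless_ssh() {
    status=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" ls > /dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: passwordless SSH not working"
            status=1
        fi
    done
    return $status
}
```

For example, on host01, run check_passwordless_ssh host02 host03, and then repeat the check from the other hosts.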
(Optional) Configure the Sudo User Account for Data Repository
If you do not have root access to install and run the Data Repository, configure the sudo user account.
For cluster environments, complete this procedure on each host in the cluster.
Follow these steps:
  1. Locate the following file:
  2. Add a command alias with the following permissions to the file:
    Cmnd_Alias CA_DATAREP = /tmp/installDR.bin,/opt/CA/IMDataRepository_vertica7/,/opt/CA/IMDataRepository_vertica7/,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRespository_vertica7/RemoteEngineer/
    ## Allows the Data Repository user to manage the Data Repository
    • sudouser
      Specify the user who can run the sudo commands.
    This command alias details the commands that the sudo user must be able to run.
    With the sudo user configured, add the sudo prefix to all commands that install the Data Repository. For example:
    sudo ./installDR.bin