Prepare to Install the Data Repository

To ensure that your data repository installation is successful, complete the requirements before you install the data repository:
For more information about data repository configuration options and administration, see Data Repository Administration.
Verify the Prerequisites
Verify the following prerequisites before you install the data repository:
  • Verify that the dialog and chrony packages are installed on each data repository host:
    rpm -qa | grep ^dialog
    rpm -qa | grep ^chrony
    The chrony package is required only for Red Hat Enterprise Linux (RHEL) 7.x and Oracle Linux (OL) 7.x.
    If either command returns no results, install the missing package. If you are not the root user, use the sudo prefix.
    yum install dialog
    yum install chrony
    If a required package is not installed, the validation and installation scripts fail.
  • Ensure that the zip and unzip packages are installed. If these packages are not installed, use the following command to install them:
    yum -y install zip unzip
  • Verify that you have at least 2 GB of swap space on the data repository host.
  • Verify that the data repository hosts use the ext4 file system. All disks that Vertica uses should use ext4.
    The default file system for RHEL 7.x and OL 7.x is XFS, and the default file system for SUSE Linux Enterprise Server (SLES) is btrfs. Vertica does not support XFS or btrfs, and the database performs best with the ext4 file system.
  • Verify that the following ports are open on the data repository systems:
    • Port 22 (TCP protocol)
    • Port 4803 (TCP and UDP protocol)
    • Port 4804 (UDP protocol)
    • Port 5433 (TCP protocol)
      Remote access is required to this port.
    • Port 5434 (TCP protocol)
    • Port 6543 (UDP protocol)
  • To avoid database corruption, exclude the installation directory, and all its subdirectories, from antivirus scans. Prevent scanning by a local instance of an antivirus client and scanning by a remote antivirus instance. Exclude the following directories:
    • /opt/vertica/*
    • /opt/vconsole/*
    • The specified data directory.
      Default: /drdata/data
    • The specified catalog directory.
      Default: /drdata/catalog
    • Vertica temporary files in /tmp:
      • /tmp/4803
      • /tmp/vbr/*
    • The directory where you back up the data repository.
  • To prevent the data repository installation from failing, if a file named /etc/release exists, remove it.
  • Verify the access according to your installation type:
    • Single Node:
      Root access is required to install the data repository. Determine whether you can install the data repository as root.
    • Cluster:
      Verify that the root user or sudo user can create database administrator user accounts, or can have an administrator create these accounts.
  • Verify that CPU frequency scaling is disabled. Disable CPU frequency scaling through the host system BIOS and OS settings.
    If CPU frequency scaling is enabled, you might experience inconsistent performance for similar queries in Vertica. CPU frequency scaling can cause observable slowness and variation in dashboard loading.
  • Verify that you are not using Logical Volume Manager (LVM) for the /data and /catalog directories.
  • (Cluster only) Verify all the hosts in the cluster are in the same subnet.
  • (Cluster only) Verify that the root user can use Secure Shell (SSH) to log in to all the hosts in the cluster.
    Set up SSH for the root user for the data repository installation.
  • The default shell environment must be bash.
  • (Cluster installations only) Select the hosts where you install the data repository nodes.
    Database software is deployed on each participating host in a cluster. This software represents a ‘node’ in the cluster. A three-node cluster represents the simplest configuration that can tolerate the loss of a single node. You can, however, include more than three hosts in the cluster. If more than one node fails or shuts down, the data repository is no longer available for use and the data aggregator shuts down automatically.
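The swap space and file system checks in the list above can be scripted. The following is a minimal sketch, not part of the product: the function names swap_total_kb, has_min_swap, and fs_type_for_mount are hypothetical, and the sketch assumes a Linux host that provides /proc/meminfo and /proc/mounts and a data directory that is its own mount point.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; these names are not part of the product.

# Print the SwapTotal value (in kB) from a meminfo-format file.
swap_total_kb() {
    local meminfo=${1:-/proc/meminfo}
    awk '$1 == "SwapTotal:" { print $2 }' "$meminfo"
}

# Succeed if swap space is at least 2 GB (2 * 1024 * 1024 kB).
has_min_swap() {
    local kb
    kb=$(swap_total_kb "${1:-/proc/meminfo}")
    [ "${kb:-0}" -ge 2097152 ]
}

# Print the file system type of a mount point from a mounts-format file.
# Prints nothing if the path is not listed as a mount point.
fs_type_for_mount() {
    local mount_point=$1
    local mounts=${2:-/proc/mounts}
    awk -v mp="$mount_point" '$2 == mp { t = $3 } END { if (t != "") print t }' "$mounts"
}
```

For example, has_min_swap || echo "Swap space is below 2 GB" reports a shortfall, and fs_type_for_mount /drdata prints the file system type if /drdata is a mount point. The /drdata path is an assumption based on the default directories; adjust it to your layout.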
Install the Data Repository on VMs
For best performance, install the data repository in a bare-metal environment. However, if you install the data repository in VMware virtual machines, verify the following requirements:
  • Use VMware 5.5 or higher.
  • The number of VMs per host does not exceed the number of physical processors.
  • Pre-allocate and reserve 4 GB of memory for each of the VMs.
  • Each VM has a dedicated 10 GB NIC.
  • Disable CPU frequency scaling at the host level and for each VM.
  • Disable VMotion. VMotion can disrupt communication, and can cause the data repository to shut down.
  • Set the VMware parameters for hugepages to the 5.5 default values.
  • Verify the hardware and network performance. Use the Vertica diagnostic tools described below to verify performance.
For more information about how to run Vertica on VMs, see the Vertica documentation.
Install the Data Repository on Shared Storage (SAN)
To install the data repository on SAN, verify the following requirements:
  • The hosts have no contention for disk space or bandwidth.
  • Each host has a unique catalog and data location. The hosts cannot share the location for these directories.
  • The storage has enough I/O bandwidth for each node to access the storage independently. To verify the I/O bandwidth, simultaneously run vioperf from all hosts in the data repository cluster.
    For more information, see the following procedures.
Set a Unique Hostname for Each Data Repository Host
Set a unique hostname for each data repository host in the cluster.
Follow these steps:
  1. As the root user, log in to each data repository host, and verify the unique hostname.
    The hostname must be associated with the IP address and not the loopback address of 127.0.0.1.
  2. Verify that the following lines appear in the /etc/hosts file on each computer:
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    IP address of your host YourHostName YourHostName.domain
  3. If you change the file, issue the following command:
    service network restart
    The /etc/hosts file is configured correctly, and the unique hostname is set.
  4. (Cluster installations only) Complete the following:
    1. The hostnames of all hosts in the cluster must resolve correctly. If the hostname resolution is incorrect, the data repository cluster does not install or work properly. All participating hosts in the cluster must use static IP or permanently leased DHCP addresses. Set up the /etc/hosts file on each of the hosts you selected for the cluster. The hosts file must contain entries for all hosts in the cluster.
      Example:
      This example shows the /etc/hosts file for a cluster where the hosts are named host01, host02, and host03:
      127.0.0.1 localhost.localdomain localhost
      192.168.13.128 host01.domain host01
      192.168.13.129 host02.domain host02
      192.168.13.130 host03.domain host03
      Do not remove the loopback address (127.0.0.1) line. The local data repository hostname cannot be on this line. Do not use the loopback address or localhost name when you are defining hosts in the cluster.
    2. Verify that hostname resolution works for each host in the cluster.
      For example, on host01, the following syntax is correct:
      $ /bin/hostname -f
      host01
      Hostname resolution is configured.
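The loopback rule in this procedure can be checked with a short script. The following is a minimal sketch under stated assumptions: the function name host_on_loopback is hypothetical, and the check only inspects a hosts-format file (by default /etc/hosts) for a hostname that appears on a 127.0.0.1 or ::1 line. It does not query DNS.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the product.

# Succeed (exit 0) if the given hostname appears on a loopback line
# (127.0.0.1 or ::1) of a hosts-format file, which this procedure forbids.
host_on_loopback() {
    local name=$1
    local hosts=${2:-/etc/hosts}
    awk -v h="$name" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        $1 == "127.0.0.1" || $1 == "::1" {
            for (i = 2; i <= NF; i++) if ($i == h) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$hosts"
}
```

For example, on host01 you might run host_on_loopback host01 && echo "host01 must not be on the loopback line". No output means that the hostname is not listed on a loopback line.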
(Optional) Set Up Passwordless SSH for the Root or Sudo User
The hosts in a data repository cluster require passwordless SSH for the root or sudo user during the data repository installation or upgrade. The dr_validate.sh script sets up passwordless SSH, but requests the password many times. To avoid repeatedly specifying the root or sudo user password, set up passwordless SSH before you run the validation script.
Repeat this procedure for each pair of hosts. If you have set up passwordless SSH for the root user, but you do not have root access to install and run the data repository, configure a sudo user account. With the sudo user account, you can also install the data repository without entering the root password.
For more information about how to configure the passwordless sudo user account for the data repository, see the "(Optional) Configure the Passwordless Sudo User Account for the Data Repository" section.
Passwordless SSH is automatically set up for the data repository admin user when you install the data repository.
Follow these steps:
  1. Open a console and log in to the data repository host as the root or sudo user.
  2. Issue the following commands:
    ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
    chmod 644 ~/.ssh/authorized_keys2
  3. Copy the root or sudo user public key into the list of authorized keys on the remote hosts:
    ssh-copy-id -i user@remotehost
    remotehost is a host in the cluster where you are copying the SSH ID.
  4. To verify that passwordless ssh is set up correctly, log in to the remote host from the local host:
    ssh user@remotehost ls
    If passwordless SSH has been set up successfully, you are not prompted for a password, and you see a directory listing from the ls command.
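Because the procedure must be repeated for each pair of hosts, the verification step can be looped over the cluster. The following is a minimal sketch, not part of the product: the function names check_passwordless and hosts_needing_setup are hypothetical. The sketch relies on the OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the product.

# Succeed only if key-based login works without a password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless() {
    local user=$1
    local host=$2
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true
}

# Print each listed host that still requires a password for the user.
hosts_needing_setup() {
    local user=$1
    local host
    shift
    for host in "$@"; do
        check_passwordless "$user" "$host" 2>/dev/null || printf '%s\n' "$host"
    done
}
```

For example, hosts_needing_setup root host01 host02 host03 prints the hosts where you still need to run ssh-copy-id. The host names are taken from the /etc/hosts example and are placeholders for your own cluster. Run the check from every host in the cluster.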
(Optional) Configure the Sudo User Account for the Data Repository
If you have passwordless ssh set up for the root user, but you do not have root access to install and run the data repository, configure a sudo user account.
For cluster environments, complete this procedure on each host in the cluster.
Follow these steps:
  1. Locate the /etc/sudoers file.
  2. Add a command alias with the following permissions to the file:
Cmnd_Alias CA_DATAREP = /tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/usr/bin/yum,/opt/CA/IMDataRespository_vertica9/RemoteEngineer/re.sh
## Allows the Data Repository user to manage the Data Repository
sudouser ALL = CA_DATAREP
  • sudouser is the user who will install and manage the Vertica node.
This command alias details the commands that the sudo user must be able to run.
With the sudo user configured, add the sudo prefix to all commands to install the data repository.
Example:
sudo ./installDR.bin
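Before you rely on a command alias, it can help to confirm that the listed commands exist on the host, because command paths differ between distributions. The following is a minimal sketch, not part of the product: the function names alias_commands and missing_alias_commands are hypothetical, and the sketch assumes that the whole Cmnd_Alias definition is passed as a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the product.

# Print the command path of each entry in a Cmnd_Alias line, one per line.
# Arguments that follow a path (for example "-A") are dropped.
alias_commands() {
    printf '%s\n' "${1#*=}" | tr ',' '\n' | awk 'NF { print $1 }'
}

# Print each command path from the alias that is not executable on this host.
# Entries that contain wildcards are tested literally, so they are reported too.
missing_alias_commands() {
    local path
    alias_commands "$1" | while IFS= read -r path; do
        [ -x "$path" ] || printf '%s\n' "$path"
    done
}
```

For example, pass the CA_DATAREP line to missing_alias_commands to list the paths that are not yet present. Some entries, such as /tmp/installDR.bin, are expected to be missing until you copy the installer to the host. To check the syntax of the edited file itself, you can run visudo -c.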
(Optional) Configure the Passwordless Sudo User Account for the Data Repository
Due to certain security policies, in some environments, you cannot enable passwordless SSH for the root users on the host servers. The following procedure provides you an alternative method to install the data repository without requiring that level of access by using the sudo user account.
Only RHEL 7.x and SLES 12 support this functionality.
  1. Locate the /etc/sudoers file.
  2. Add command aliases with the following permissions to the file:
  • (RHEL 7.x)
    Cmnd_Alias CA_DATAREP=/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRespository_vertica9/RemoteEngineer/re.sh,/bin/mkdir*,/usr/bin/whoami,/bin/echo,/sbin/service,/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/bin/df,/bin/mv,/bin/rm,/usr/bin/install
    Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
    Cmnd_Alias VERTICA_INSTALL = /bin/echo,/bin/ps -A,/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/bin/rm -rf /tmp/dbRPM.rpm,/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/bin/mv -f /tmp/vstage-*/file /tmp/*,/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/bin/cp -T /opt/vertica/* /tmp/vstage-*,/bin/su --login dbadmin *,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/*,/bin/rm -rf /opt/vertica/*,/bin/mv -f /tmp/vstage-* /opt/vertica/*,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/bin/su dbadmin *,/bin/sh -c *,/usr/bin,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/bin/test [ -e /* ],/usr/bin/[ -e /* ]
    Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
    ## Allows the Data Repository user to manage the Data Repository
    sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
    Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
    • dbadmin is the database administrator user that the Vertica install creates, and who will own and run Vertica.
    • sudouser is the user who will install and manage the Vertica node.
  • (SLES 12)
    Cmnd_Alias CA_DATAREP =/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRespository_vertica9/RemoteEngineer/re.sh,/usr/bin/mkdir, /sbin/SuSEfirewall2 off *,/usr/bin/whoami,/usr/bin/echo,/usr/bin/id,/usr/bin/env,/usr/sbin/service,/usr/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/usr/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/usr/bin/sh,/usr/bin/install
    Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
    Cmnd_Alias VERTICA_INSTALL = /usr/bin/echo,/usr/bin/ps -A,/usr/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/usr/bin/rm -rf /tmp/dbRPM.rpm,/usr/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/usr/bin/mv -f /tmp/vstage-*/file /tmp/*,/usr/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/usr/bin/cp -T /opt/vertica/* /tmp/vstage-*,/usr/bin/su --login dbadmin *,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/*,/usr/bin/rm -rf /opt/vertica/*,/usr/bin/mv -f /tmp/vstage-* /opt/vertica/*,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/usr/bin/su dbadmin *,/usr/bin/sh -c *,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/usr/bin/test [ -e /* ],/usr/bin/[ -e /* ]
    Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
    ## Allows the Data Repository user to manage the Data Repository
    sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
    Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
    • dbadmin is the database administrator user that the Vertica install creates, and who will own and run Vertica.
    • sudouser is the user who will install and manage the Vertica node.