Prepare to Install the Data Repository

To ensure that you can install the data repository successfully, complete the following preparation steps.
For more information about the configuration options and how to administer the data repository, see Data Repository Administration.
Verify the Prerequisites
Before you install the data repository, ensure that you have met the following prerequisites:
  • You have reviewed the Vertica documentation.
    NetOps Portal 21.2.1 and 21.2.2 include Vertica 9.1.1-5.
    NetOps Portal 21.2.3 and higher include Vertica 10.1.1.
  • (Red Hat Enterprise Linux (RHEL) 7.x and Oracle Linux (OL) 7.x only) You have verified that the dialog and chrony packages are installed on each data repository host by issuing the following commands:
    rpm -qa | grep ^dialog
    rpm -qa | grep ^chrony
    If either command does not return results, install the missing package by issuing the corresponding command. The validation and installation scripts require these packages. If you are a user with the necessary sudo privileges, issue the command with the sudo prefix.
    yum install dialog
    yum install chrony
  • You have installed the zip and unzip packages. If these packages are not installed, install them by issuing the following command:
    yum -y install zip unzip
  • You have verified that you have at least 2 GB of swap space on the data repository host.
  • You have verified that the data repository hosts use the ext4 file system.
    The default file system for RHEL 7.x and OL 7.x is XFS. The default file system for SUSE Linux Enterprise Server (SLES) is btrfs. Vertica supports ext4, and the database performs best with the ext4 file system, so the disks that Vertica uses should use ext4 rather than the operating system default.
  • You have verified that the following ports are open on the data repository systems:
    • Port 22 (TCP protocol)
    • Port 4803 (TCP and UDP protocol)
    • Port 4804 (UDP protocol)
    • Port 5433 (TCP protocol)
      Remote access is required to this port.
    • Port 5434 (TCP protocol)
    • Port 6543 (UDP protocol)
  • To avoid database corruption, you have excluded the installation directory and the following subdirectories from scanning by both local and remote antivirus instances:
    • /opt/vertica/*
    • /opt/vconsole/*
    • The specified data directory.
      Default: /data
      Ensure that the data directory is on a separate mount from the catalog directory. This isolates those file systems from performance and space interference so that they are unencumbered by any other disk usage or performance considerations, including each other.
    • The specified catalog directory.
      Default: /catalog
      Ensure that the catalog directory is on a separate mount from the data directory. This isolates those file systems from performance and space interference so that they are unencumbered by any other disk usage or performance considerations, including each other.
    • The Vertica /tmp/vbr/* temporary files.
    • The directory where you back up the data repository.
    For more information about disk locations, see the Vertica documentation.
  • To prevent the data repository installation from failing, you have ensured that a file named release is not in the /etc directory. Remove the file if it exists.
  • You have verified the access according to your installation type:
    • Single Node:
      Root access is required to install the data repository. Determine whether you have this access level.
    • Cluster:
      Verify that the root user or sudo user can create database administrator user accounts, or can have an administrator create these accounts.
  • You have verified that CPU frequency scaling is disabled. Disable CPU frequency scaling through the host system BIOS and OS settings.
    If CPU frequency scaling is enabled, you might experience inconsistent performance for similar queries in Vertica. CPU frequency scaling can cause observable slowness and variation in dashboard loading.
  • You have verified that you are not using Logical Volume Manager (LVM) for the /data and /catalog directories.
  • (Cluster only) You have verified that all the hosts in the cluster are in the same subnet.
  • (Cluster only) You have verified that the root user can use Secure Shell (SSH) to log in to all the hosts in the cluster.
    Set up SSH for the root user.
  • You have verified that the default shell environment is bash.
  • (Cluster installations only) You have selected the hosts where you plan to install the data repository nodes.
    You deploy database software on each participating host in a cluster. The software represents a ‘node’ in the cluster. A three-node cluster represents the simplest configuration that can tolerate the loss of a single node. You can, however, include more than three hosts in the cluster.
    Warning!
    If more than one node fails or shuts down, the data repository is no longer available for use and the data aggregator shuts down automatically.
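Some of the host-level prerequisites above can be spot-checked from a shell before you run the validation script. The following is a minimal sketch and is not part of the product: the function names are hypothetical, the 2 GB swap requirement is read as 2,000,000 kB (an assumption that tolerates the small overhead a swap area loses to its header), and the file system lookup assumes the GNU coreutils version of df.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions for spot-checking prerequisites.
# They only read system state; they do not change anything.

# Succeeds when SwapTotal in a meminfo-style file is at least the minimum.
# /proc/meminfo reports SwapTotal in kB. The default minimum of 2000000 kB
# is an assumed reading of "at least 2 GB".
swap_ok() {
  local meminfo="${1:-/proc/meminfo}"
  local min_kb="${2:-2000000}"
  local swap_kb
  swap_kb=$(awk '/^SwapTotal:/ {print $2}' "$meminfo")
  [ "${swap_kb:-0}" -ge "$min_kb" ]
}

# Succeeds when no file named "release" exists in the directory (default /etc).
no_release_file() {
  local etc_dir="${1:-/etc}"
  [ ! -e "$etc_dir/release" ]
}

# Succeeds when the given shell path (default: $SHELL) is bash.
shell_is_bash() {
  [ "$(basename "${1:-$SHELL}")" = "bash" ]
}

# Prints the file system type of the mount that holds the given path.
# Relies on the --output option of GNU df.
fs_type() {
  df --output=fstype "$1" | tail -n 1
}
```

For example, `swap_ok || echo "swap below 2 GB"` and `fs_type /data` could be run on each data repository host; the result of fs_type would be compared against ext4 by hand.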
Install the Data Repository on VMs
For best performance, install the data repository in a bare-metal environment. However, if you install the data repository in VMware virtual machines (VMs), verify the following requirements:
  • You are using VMware version 5.5 or later.
  • The number of VMs per host does not exceed the number of physical processors.
  • You have pre-allocated and reserved 4 GB of memory for each of the VMs.
  • Each VM has a dedicated 10 Gb NIC.
  • You have disabled CPU frequency scaling at the host level and for each VM.
  • You have disabled VMotion. VMotion can disrupt communication, and can cause the data repository to shut down.
  • You have set the VMware parameters for hugepages to the VMware 5.5 default values.
  • You have verified the hardware and network performance.
    You can verify performance using the vioperf Vertica utility.
    For more information, see the Vertica documentation.
For more information about how to run Vertica on VMs, see the Vertica documentation.
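The CPU frequency scaling requirement for hosts and VMs can be inspected from the operating system side. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the Linux kernel exposes per-CPU governors under /sys/devices/system/cpu, which is common but depends on the platform and drivers. How to interpret the result (for example, whether an empty result means that scaling is disabled in the BIOS) is left to the administrator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each distinct CPU frequency governor found
# under a sysfs-style CPU directory, one per line, sorted.
# Prints nothing when no cpufreq entries are exposed.
list_governors() {
  local cpu_root="${1:-/sys/devices/system/cpu}"
  local f
  for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
    # If the glob matches nothing, $f is the literal pattern and is skipped.
    [ -r "$f" ] && cat "$f"
  done | sort -u
}
```

Running `list_governors` on each host and each VM gives a quick view of whether the operating system is managing CPU frequency at all.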
Install the Data Repository on Shared Storage (SAN)
To install the data repository on SAN, verify the following requirements:
  • The hosts have no contention for disk space or bandwidth.
  • Each host has a unique catalog and data location. The hosts cannot share the location for these directories.
  • The storage has enough I/O bandwidth for each node to access the storage independently.
    You can verify the I/O bandwidth by simultaneously running the Vertica vioperf utility from all hosts in the data repository cluster.
    For more information, see the following procedures.
Set a Unique Hostname for the Data Repository Host
Set a unique hostname for each data repository host in the cluster.
Follow these steps:
  1. As the root user, log in to each data repository host, and verify the unique hostname.
    The hostname must be associated with the IP address and not with the loopback address of 127.0.0.1.
  2. Verify that the following lines appear in the /etc/hosts file on each computer:
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    IP address of your host YourHostName YourHostName.domain
  3. If you change the file, issue the following command:
    service network restart
    The /etc/hosts file is configured correctly.
    The unique hostname is set.
  4. (Cluster installations only) Complete the following:
    1. The hostnames of all hosts in the cluster must resolve correctly. If the hostname resolution is incorrect, the data repository cluster does not install or work properly. All participating hosts in the cluster must use static IP or permanently leased DHCP addresses. Set up the /etc/hosts file on each of the hosts you selected for the cluster. The hosts file must contain entries for all hosts in the cluster.
      Example:
      This example shows the /etc/hosts file for a cluster where the hosts are named host01, host02, and host03:
      127.0.0.1 localhost.localdomain localhost
      192.168.13.128 host01.domain host01
      192.168.13.129 host02.domain host02
      192.168.13.130 host03.domain host03
      • Do not remove the loopback address (127.0.0.1) line.
      • The local data repository hostname cannot be on the 127.0.0.1 line.
      • Do not use the loopback address or localhost name when you are defining hosts in the cluster.
    2. Verify that hostname resolution works for each host in the cluster.
      For example, on host01, the following result is correct:
      $ /bin/hostname -f
      host01
      Hostname resolution is configured.
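The /etc/hosts rules in this procedure (every cluster host has an entry, and no cluster hostname sits on the loopback line) can be checked mechanically. The following is a rough sketch under stated assumptions: the function names are hypothetical, the check only inspects the hosts file and does not query DNS, and the host names in the usage note are the ones from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a hosts-style file.

# Prints every IP address that the file maps to the given name.
# Comment text (anything after '#') is ignored.
hosts_ips_for() {
  local hosts_file="$1" name="$2"
  awk -v name="$name" '
    { sub(/#.*/, "") }   # strip comments; awk re-splits the fields
    NF >= 2 {
      for (i = 2; i <= NF; i++) if ($i == name) { print $1; break }
    }' "$hosts_file"
}

# Succeeds when the name has at least one entry and none of its
# entries is a loopback address (127.x.x.x or ::1).
hosts_entry_ok() {
  local ips
  ips=$(hosts_ips_for "$1" "$2")
  [ -n "$ips" ] && ! printf '%s\n' "$ips" | grep -qE '^(127\.|::1$)'
}
```

A loop such as `for h in host01 host02 host03; do hosts_entry_ok /etc/hosts "$h" || echo "check entry for $h"; done` could then be run on each host in the cluster.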
(Optional) Set Up Passwordless SSH for the Root or Sudo User
The hosts in a data repository cluster require passwordless SSH for the root or sudo user during the data repository installation or upgrade. The dr_validate.sh script sets up passwordless SSH, but requests the password many times. To avoid repeatedly specifying the root or sudo user password, set up passwordless SSH before you run the validation script.
Repeat this procedure for each pair of hosts. If passwordless SSH is set up for the root user but you do not have root access to install and run the data repository, configure a sudo user account. Alternatively, you can use the sudo user account to install the data repository without entering the root password.
For more information about how to configure the passwordless sudo user account for the data repository, see the section "(Optional) Configure the Passwordless Sudo User Account for the Data Repository".
Passwordless SSH is automatically set up for the data repository admin user when you install the data repository.
Follow these steps:
  1. Open a console and log in to the data repository host as the root or sudo user.
  2. Issue the following commands:
    ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
    chmod 644 ~/.ssh/authorized_keys2
  3. Copy the root or sudo user public key into the list of authorized keys on the remote hosts by issuing the following command:
    ssh-copy-id -i user@remotehost
    • remotehost is a host in the cluster where you are copying the SSH ID.
  4. To verify that passwordless SSH is set up correctly, log in to the remote host from the local host by issuing the following command:
    ssh user@remotehost ls
    Passwordless SSH is set up successfully when you see a directory listing without being prompted for a password.
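Because the procedure must be repeated for each pair of hosts, the verification in the last step can be looped over all remote hosts. This is a minimal sketch, not part of the product: the function name and the SSH_CMD override are hypothetical, and it relies on the standard OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each host that does not accept a key-only login.
# Usage: check_passwordless_ssh USER HOST [HOST...]
# SSH_CMD is an illustrative override for dry runs; it defaults to ssh.
check_passwordless_ssh() {
  local user="$1"; shift
  local host
  for host in "$@"; do
    # BatchMode=yes disables password prompts, so a missing key is a failure.
    if ! ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
         "$user@$host" true </dev/null >/dev/null 2>&1; then
      echo "$host"
    fi
  done
}
```

For example, `check_passwordless_ssh root host02 host03` run on host01 prints nothing when both remote hosts accept the key. A host that is unreachable is also reported, so a printed name means that the host needs attention, not necessarily that the key is missing.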
(Optional) Configure the Sudo User Account for the Data Repository
If you have passwordless SSH set up for the root user, but you do not have root access to install and run the data repository, configure a sudo user account. As a sudo user, you can add the sudo prefix to all commands to install the data repository, for example sudo ./installDR.bin.
For cluster environments, complete this procedure on each host in the cluster.
Follow these steps:
  1. Locate the /etc/sudoers file.
  2. Add a command alias with the following permissions to the file:
    • (21.2.2 or lower)
      Cmnd_Alias CA_DATAREP = /tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/usr/bin/yum,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh
      ## Allows the Data Repository user to manage the Data Repository
      sudouser ALL = CA_DATAREP
    • (21.2.3 or higher)
      Cmnd_Alias CA_DATAREP = /tmp/installDR.bin,/opt/CA/IMDataRepository_vertica10/dr_validate.sh,/opt/CA/IMDataRepository_vertica10/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/usr/bin/yum,/opt/CA/IMDataRepository_vertica10/RemoteEngineer/re.sh
      ## Allows the Data Repository user to manage the Data Repository
      sudouser ALL = CA_DATAREP
    sudouser is the user who will install and manage the Vertica node.
    This command alias details the commands that the sudo user can issue.
The sudo user is configured.
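A syntax error in /etc/sudoers can lock out sudo access, so it can help to generate the entry in a scratch file and check it before merging it. The following is one possible sketch: the function name, the example user name dradmin, and the scratch file path are hypothetical, and it assumes that re.sh lives under the same installation directory as the other scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the basic sudoers entry for the given user
# and data repository installation directory.
# Usage: sudoers_fragment USER INSTALL_DIR
sudoers_fragment() {
  local user="$1" dr_dir="$2"
  cat <<EOF
Cmnd_Alias CA_DATAREP = /tmp/installDR.bin,$dr_dir/dr_validate.sh,$dr_dir/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/usr/bin/yum,$dr_dir/RemoteEngineer/re.sh
## Allows the Data Repository user to manage the Data Repository
$user ALL = CA_DATAREP
EOF
}
```

For example, `sudoers_fragment dradmin /opt/CA/IMDataRepository_vertica10 > /tmp/datarep.sudoers` writes the entry to a scratch file, and `visudo -cf /tmp/datarep.sudoers` checks its syntax without touching the live configuration. Editing /etc/sudoers itself through visudo gives the same syntax check at save time.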
(Optional) Configure the Passwordless Sudo User Account for the Data Repository
In some environments, security policies prevent you from enabling passwordless SSH for the root user on the host servers. On RHEL 7.x or SLES 12, you can install the data repository without requiring that level of access by using the sudo user account.
Follow these steps:
  1. Locate the /etc/sudoers file.
  2. Add command aliases with the following permissions to the file:
    • (RHEL 7.x)
      • (21.2.2 or lower)
        Cmnd_Alias CA_DATAREP=/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh,/bin/mkdir*,/usr/bin/whoami,/bin/echo,/sbin/service,/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/bin/df,/bin/mv,/bin/rm,/usr/bin/install
        Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
        Cmnd_Alias VERTICA_INSTALL = /bin/echo,/bin/ps -A,/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/bin/rm -rf /tmp/dbRPM.rpm,/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/bin/mv -f /tmp/vstage-*/file /tmp/*,/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/bin/cp -T /opt/vertica/* /tmp/vstage-*,/bin/su --login dbadmin *,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/*,/bin/rm -rf /opt/vertica/*,/bin/mv -f /tmp/vstage-* /opt/vertica/*,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/bin/su dbadmin *,/bin/sh -c *,/usr/bin,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/bin/test [ -e /* ],/usr/bin/[ -e /* ]
        Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
        ## Allows the Data Repository user to manage the Data Repository
        sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
        Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
      • (21.2.3 or higher)
        Cmnd_Alias CA_DATAREP=/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica10/dr_validate.sh,/opt/CA/IMDataRepository_vertica10/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica10/RemoteEngineer/re.sh,/bin/mkdir*,/usr/bin/whoami,/bin/echo,/sbin/service,/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/bin/df,/bin/mv,/bin/rm,/usr/bin/install
        Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
        Cmnd_Alias VERTICA_INSTALL = /bin/echo,/bin/ps -A,/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/bin/rm -rf /tmp/dbRPM.rpm,/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/bin/mv -f /tmp/vstage-*/file /tmp/*,/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/bin/cp -T /opt/vertica/* /tmp/vstage-*,/bin/su --login dbadmin *,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/*,/bin/rm -rf /opt/vertica/*,/bin/mv -f /tmp/vstage-* /opt/vertica/*,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/bin/su dbadmin *,/bin/sh -c *,/usr/bin,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/bin/test [ -e /* ],/usr/bin/[ -e /* ]
        Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
        ## Allows the Data Repository user to manage the Data Repository
        sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
        Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
      sudouser is the user who will install and manage the Vertica node.
      dbadmin is the database administrator system account that the Vertica install creates, and who will own and run Vertica.
    • (SLES 12)
      • (21.2.2 or lower)
        Cmnd_Alias CA_DATAREP =/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh,/usr/bin/mkdir, /sbin/SuSEfirewall2 off *,/usr/bin/whoami,/usr/bin/echo,/usr/bin/id,/usr/bin/env,/usr/sbin/service,/usr/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/usr/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/usr/bin/sh,/usr/bin/install
        Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
        Cmnd_Alias VERTICA_INSTALL = /usr/bin/echo,/usr/bin/ps -A,/usr/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/usr/bin/rm -rf /tmp/dbRPM.rpm,/usr/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/usr/bin/mv -f /tmp/vstage-*/file /tmp/*,/usr/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/usr/bin/cp -T /opt/vertica/* /tmp/vstage-*,/usr/bin/su --login dbadmin *,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/*,/usr/bin/rm -rf /opt/vertica/*,/usr/bin/mv -f /tmp/vstage-* /opt/vertica/*,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/usr/bin/su dbadmin *,/usr/bin/sh -c *,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/usr/bin/test [ -e /* ],/usr/bin/[ -e /* ]
        Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
        ## Allows the Data Repository user to manage the Data Repository
        sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
        Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
      • (21.2.3 or higher)
        Cmnd_Alias CA_DATAREP =/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica10/dr_validate.sh,/opt/CA/IMDataRepository_vertica10/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica10/RemoteEngineer/re.sh,/usr/bin/mkdir, /sbin/SuSEfirewall2 off *,/usr/bin/whoami,/usr/bin/echo,/usr/bin/id,/usr/bin/env,/usr/sbin/service,/usr/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/usr/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/usr/bin/sh,/usr/bin/install
        Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
        Cmnd_Alias VERTICA_INSTALL = /usr/bin/echo,/usr/bin/ps -A,/usr/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/usr/bin/rm -rf /tmp/dbRPM.rpm,/usr/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/usr/bin/mv -f /tmp/vstage-*/file /tmp/*,/usr/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/usr/bin/cp -T /opt/vertica/* /tmp/vstage-*,/usr/bin/su --login dbadmin *,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/*,/usr/bin/rm -rf /opt/vertica/*,/usr/bin/mv -f /tmp/vstage-* /opt/vertica/*,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/usr/bin/su dbadmin *,/usr/bin/sh -c *,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/usr/bin/test [ -e /* ],/usr/bin/[ -e /* ]
        Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
        ## Allows the Data Repository user to manage the Data Repository
        sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
        Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
      sudouser is the user who will install and manage the Vertica node.
      dbadmin is the database administrator system account that the Vertica install creates, and who will own and run Vertica.