Prepare to Install the Data Repository

To ensure that you can install the data repository successfully, complete the preparation steps that are described in the following sections.
For more information about the configuration options and how to administer the data repository, see Data Repository Administration.
Verify the Prerequisites
Before you install the data repository, ensure that you have met the following prerequisites:
  • You have reviewed the Vertica documentation.
  • You have verified that the dialog and chrony packages are installed on each data repository host by issuing the following commands:
    rpm -qa | grep ^dialog
    rpm -qa | grep ^chrony
    Only RHEL 7.x and OL 7.x require the chrony package.
    If either command does not return results, install the missing package by issuing the corresponding command. The validation and installation scripts require these packages:
    yum install dialog
    yum install chrony
    If you are a user with the necessary sudo privileges, issue these commands with the sudo prefix.
  • You have installed the zip and unzip packages. If these packages are not installed, install them by issuing the following command:
    yum -y install zip unzip
  • You have verified that you have at least 2 GB of swap space on the data repository host.
  • You have verified that the data repository hosts use the ext4 file system.
    The default file system for RHEL 7.x and OL 7.x is the XFS file system. The default file system for SLES is btrfs. The disks with Vertica should use ext4 (Vertica does not support the XFS or btrfs file systems). The database performs best with the ext4 file system.
  • You have verified that the following ports are open on the data repository systems:
    • Port 22 (TCP protocol)
    • Port 4803 (TCP and UDP protocols)
    • Port 4804 (UDP protocol)
    • Port 5433 (TCP protocol)
      Remote access is required to this port.
    • Port 5434 (TCP protocol)
    • Port 6543 (UDP protocol)
  • To avoid database corruption, you have excluded the installation directory and the following subdirectories from scanning by both local and remote antivirus instances:
    • /opt/vertica/*
    • /opt/vconsole/*
    • The specified data directory.
      Default: /data
      Ensure that the data directory is on a separate mount from the catalog directory. This isolates those file systems from performance and space interference so that they are unencumbered from any other disk usage or performance considerations, including each other.
    • The specified catalog directory.
      Default: /catalog
      Ensure that the catalog directory is on a separate mount from the data directory. This isolates those file systems from performance and space interference so that they are unencumbered from any other disk usage or performance considerations, including each other.
    • The Vertica /tmp/vbr/* temporary files.
    • The directory where you back up the data repository.
    For more information about disk locations, see the Vertica documentation.
  • To prevent the data repository installation from failing, you have ensured that a file named release does not exist in the /etc directory. Remove the file if it exists.
  • You have verified the access according to your installation type:
    • Single Node:
      Root access is required to install the data repository. Determine whether you have this access level.
    • Cluster:
      Verify that the root user or sudo user can create database administrator user accounts, or can have an administrator create these accounts.
  • You have verified that CPU frequency scaling is disabled. Disable CPU frequency scaling through the host system BIOS and OS settings.
    If CPU frequency scaling is enabled, you might experience inconsistent performance for similar queries in Vertica. CPU frequency scaling can cause observable slowness and variation in dashboard loading.
  • You have verified that you are not using Logical Volume Manager (LVM) for the /data and /catalog directories.
  • (Cluster only) You have verified that all the hosts in the cluster are in the same subnet.
  • (Cluster only) You have verified that the root user can use Secure Shell (SSH) to log in to all the hosts in the cluster.
    If necessary, set up SSH for the root user.
  • You have verified that the default shell environment is bash.
  • (Cluster only) You have selected the hosts where you install data repository nodes.
    Database software is deployed on each participating host in a cluster. This software represents a ‘node’ in the cluster. A three-node cluster represents the simplest configuration that can tolerate the loss of a single node. You can, however, include more than three hosts in the cluster. If more than one node fails or shuts down, the data repository is no longer available for use and the data aggregator shuts down automatically.
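Several of the prerequisites above can be checked from a shell before you start the installation. The following is a minimal sketch that covers only three of them: the 2 GB swap space, the absence of /etc/release, and the bash default shell. The function names and the optional file-path parameters are illustrative assumptions and are not part of the product.

```shell
#!/usr/bin/env bash
# Sketch of a few prerequisite checks. Function names are hypothetical.

# Succeeds when SwapTotal in a meminfo-style file is at least 2 GB.
# /proc/meminfo reports SwapTotal in kB, so 2 GB = 2 * 1024 * 1024 kB.
has_min_swap() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ { found = 1; ok = ($2 >= 2 * 1024 * 1024) }
         END { exit ((found && ok) ? 0 : 1) }' "$meminfo"
}

# Succeeds when no file named "release" exists in the given directory.
no_release_file() {
    local etc_dir="${1:-/etc}"
    [ ! -e "$etc_dir/release" ]
}

# Succeeds when the login shell recorded for a user in a passwd-style file is bash.
default_shell_is_bash() {
    local user="$1" passwd="${2:-/etc/passwd}"
    awk -F: -v u="$user" '$1 == u { found = 1; shell = $7 }
         END { exit ((found && shell ~ /\/bash$/) ? 0 : 1) }' "$passwd"
}

# Runs one check and prints PASS or FAIL with its label.
report() {
    local label="$1"; shift
    if "$@"; then echo "PASS: $label"; else echo "FAIL: $label"; fi
}

# Example usage on a data repository host:
#   report "swap space is at least 2 GB" has_min_swap
#   report "/etc/release does not exist" no_release_file
#   report "default shell is bash" default_shell_is_bash "$(id -un)"
```

The checks read from files rather than calling system tools directly so that they can be tried against sample files first. The remaining prerequisites, such as open ports and antivirus exclusions, still need to be verified separately.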
Install the Data Repository on VMs
For best performance, install the data repository in a bare-metal environment. However, if you install the data repository in VMware virtual machines, verify the following requirements:
  • Use VMware version 5.5 or greater.
  • The number of VMs per host does not exceed the number of physical processors.
  • Pre-allocate and reserve 4 GB of memory for each of the VMs.
  • Each VM has a dedicated 10 GB NIC.
  • Disable CPU frequency scaling at the host level and for each VM.
  • Disable VMotion. VMotion can disrupt communication, and can cause the Data Repository to shut down.
  • Set the VMware parameters for hugepages to the version 5.5 default values.
  • Verify the hardware and network performance. Use the Vertica diagnostic tools described below to verify performance.
For more information about how to run Vertica on VMs, see the Vertica documentation.
Install the Data Repository on Shared Storage (SAN)
To install the data repository on SAN, verify the following requirements:
  • The hosts have no contention for disk space or bandwidth.
  • Each host has a unique catalog and data location. The hosts cannot share the location for these directories.
  • The storage has enough I/O bandwidth for each node to access the storage independently. To verify the I/O bandwidth, simultaneously run vioperf from all hosts in the data repository cluster.
    For more information, see the following procedures.
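One way to run vioperf from all hosts at about the same time is to start it over SSH in the background on each host and then wait for every run to finish. The following is a sketch only. It assumes that passwordless SSH already works between the hosts, that vioperf is located at /opt/vertica/bin/vioperf, and that the directory to test is passed as an argument. The function names and the log file naming are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: start vioperf on every host at about the same time, then wait.
# Assumptions: passwordless SSH is set up and vioperf is at the path below.

VIOPERF="${VIOPERF:-/opt/vertica/bin/vioperf}"

# Prints the command line that would be run for one host (useful as a dry run).
vioperf_command() {
    local host="$1" dir="$2"
    printf 'ssh %s %s %s\n' "$host" "$VIOPERF" "$dir"
}

# Launches vioperf on all hosts in the background and waits for all of them.
# Output from each host is saved to vioperf_<host>.log in the current directory.
run_vioperf_all() {
    local dir="$1"; shift
    local host
    for host in "$@"; do
        ssh "$host" "$VIOPERF" "$dir" > "vioperf_${host}.log" 2>&1 &
    done
    wait
}

# Example usage (host names and directory are placeholders):
#   run_vioperf_all /data host01 host02 host03
```

Because all runs overlap, the per-host logs reflect the throughput that each node sees while the other nodes are also using the shared storage.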
Set a Unique Hostname for Each Data Repository Host
Set a unique hostname for each data repository host in the cluster.
Follow these steps:
  1. As the root user, log in to each data repository host, and verify the unique hostname.
    The hostname must be associated with the IP address and not the loopback address of 127.0.0.1.
  2. Verify that the following lines appear in the /etc/hosts file on each computer:
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    <IP address of your host> YourHostName YourHostName.domain
  3. If you change the file, issue the following command:
    service network restart
    The /etc/hosts file is configured correctly, and the unique hostname is set.
  4. (Cluster installations only) The hostnames of all hosts in the cluster must resolve correctly. If the hostname resolution is incorrect, the data repository cluster does not install or work properly. All participating hosts in the cluster must use static IP or permanently leased DHCP addresses. Set up the /etc/hosts file on each of the hosts you selected for the cluster. The hosts file must contain entries for all hosts in the cluster.
    Example:
    This example shows the /etc/hosts file for a cluster where the hosts are named host01, host02, and host03:
    127.0.0.1 localhost.localdomain localhost
    192.168.13.128 host01.domain host01
    192.168.13.129 host02.domain host02
    192.168.13.130 host03.domain host03
    Do not remove the loopback address (127.0.0.1) line. The local Data Repository hostname cannot be on the 127.0.0.1 line. Also, do not use the loopback address or localhost name when you are defining hosts in the cluster.
  5. Verify that hostname resolution works for each host in the cluster.
    For example, on host01, the following output is correct:
    $ /bin/hostname -f
    host01
    Hostname resolution is configured.
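The rule that a cluster hostname must map to a real IP address, and not to the loopback address, can be checked against the hosts file with a short script. The following is a minimal sketch. The function names are hypothetical, and the check only inspects a hosts-style file. It does not query DNS.

```shell
#!/usr/bin/env bash
# Sketch: check that a hostname is mapped to a non-loopback address in a
# hosts-style file. Function names are illustrative assumptions.

# Prints the address (first column) of every non-comment line that lists the name.
addresses_for_host() {
    local name="$1" hosts_file="${2:-/etc/hosts}"
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; break } }
    ' "$hosts_file"
}

# Succeeds when the name has at least one entry and none is a loopback address.
host_entry_ok() {
    local name="$1" hosts_file="${2:-/etc/hosts}" addrs
    addrs="$(addresses_for_host "$name" "$hosts_file")"
    [ -n "$addrs" ] || return 1
    ! printf '%s\n' "$addrs" | grep -Eq '^(127\.|::1$)'
}

# Example usage on each host in the cluster (host names are placeholders):
#   for h in host01 host02 host03; do
#       host_entry_ok "$h" && echo "OK: $h" || echo "CHECK: $h"
#   done
```

Running the loop on every host helps confirm that each hosts file contains entries for all hosts in the cluster.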
(Optional) Set Up Passwordless SSH for the Root or Sudo User
The hosts in a data repository cluster require passwordless SSH for the root or sudo user during the data repository installation or upgrade. The dr_validate.sh script sets up passwordless SSH, but requests the password many times. To avoid repeatedly specifying the root or sudo user password, set up passwordless SSH before you run the validation script.
Repeat this procedure for each pair of hosts. If you have passwordless SSH set up for the root user, but you do not have root access to install and run the data repository, configure a sudo user account. Alternatively, you can install the product without entering the root password by using a passwordless sudo user account.
For more information about how to configure the passwordless sudo user account for the data repository, see the section (Optional) Configure the Passwordless Sudo User Account for the Data Repository.
Passwordless SSH is automatically set up for the data repository admin user when you install the data repository.
Follow these steps:
  1. Open a console and log in to the data repository host as the root or sudo user.
  2. Issue the following commands:
    ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
    chmod 644 ~/.ssh/authorized_keys2
  3. Copy the root or sudo user public key into the list of authorized keys on the remote hosts by issuing the following command:
    ssh-copy-id -i user@remotehost
    remotehost specifies a host in the cluster where you are copying the SSH ID.
  4. To verify that passwordless ssh is set up correctly, log in to the remote host from the local host by issuing the following command:
    ssh user@remotehost ls
    If the passwordless SSH has been set up successfully, you are not prompted for a password. You also see a directory listing from the ls command.
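Because the procedure is repeated for each pair of hosts, steps 3 and 4 can be wrapped in a loop. The following is a sketch under the assumption that the key pair from step 2 already exists and that ssh-copy-id picks up the default key. The function names are illustrative. The BatchMode option makes ssh fail instead of prompting, so a password prompt is reported as a failure rather than blocking the script.

```shell
#!/usr/bin/env bash
# Sketch: copy the public key to each remote host and confirm key-based login.
# Assumption: the key pair already exists. Function names are hypothetical.

# Succeeds only if login works without any password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless() {
    local user="$1" host="$2"
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true
}

# Copies the key to every host, then reports whether passwordless login works.
setup_passwordless() {
    local user="$1"; shift
    local host
    for host in "$@"; do
        ssh-copy-id "$user@$host"
        if check_passwordless "$user" "$host"; then
            echo "OK: $user@$host"
        else
            echo "FAILED: $user@$host"
        fi
    done
}

# Example usage (user and host names are placeholders):
#   setup_passwordless root host02 host03
```

Run the loop from each host in turn so that every pair of hosts is covered. The ssh-copy-id command still asks for the password once per remote host.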
(Optional) Configure the Sudo User Account for the Data Repository
If you have passwordless SSH set up for the root user, but you do not have root access to install and run the data repository, configure a sudo user account. As a sudo user, you can add the sudo prefix to all commands to install the data repository, for example, sudo ./installDR.bin.
For cluster environments, complete this procedure on each host in the cluster.
Follow these steps:
  1. Locate the /etc/sudoers file.
  2. Add a command alias with the following permissions to the file:
    Cmnd_Alias CA_DATAREP = /tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/usr/bin/yum,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh
    ## Allows the Data Repository user to manage the Data Repository
    sudouser ALL = CA_DATAREP
    sudouser specifies the user who can issue the sudo commands.
    This command alias details the commands that the sudo user can issue.
    The sudo user is configured.
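A command alias only helps if every path in it matches a file on the host, so a misspelled path silently removes that permission. The following is a minimal sketch that lists the plain paths in a comma-separated alias value that do not exist. The function name is a hypothetical helper, and entries that contain arguments or wildcards are skipped rather than interpreted.

```shell
#!/usr/bin/env bash
# Sketch: list the paths in a comma-separated Cmnd_Alias value that do not
# exist on this host. The function name is an illustrative assumption.
# Only plain paths are handled; entries with arguments or wildcards are skipped.

missing_alias_paths() {
    local alias_value="$1" entry
    local -a entries
    IFS=',' read -r -a entries <<< "$alias_value"
    for entry in "${entries[@]}"; do
        # Trim leading and trailing whitespace.
        entry="${entry#"${entry%%[![:space:]]*}"}"
        entry="${entry%"${entry##*[![:space:]]}"}"
        case "$entry" in
            ''|*' '*|*'*'*) continue ;;   # empty, has arguments, or has a wildcard
        esac
        [ -e "$entry" ] || echo "$entry"
    done
}

# Example usage (paste the value that follows "Cmnd_Alias CA_DATAREP ="):
#   missing_alias_paths "/tmp/installDR.bin,/usr/bin/vim,/usr/bin/reboot"
```

Paths that are created later in the installation are expected to show up as missing until then. To check the syntax of the sudoers file itself, visudo -c can be used.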
(Optional) Configure the Passwordless Sudo User Account for the Data Repository
In some environments, security policies prevent you from enabling passwordless SSH for the root user on the host servers. The following procedure provides an alternative method to install the product without that level of access by using the sudo user account.
You cannot use this functionality on RHEL 6.x.
  1. Locate the /etc/sudoers file.
  2. Add command aliases with the following permissions to the file:
    • On RHEL 7
      Cmnd_Alias CA_DATAREP=/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh,/bin/mkdir*,/usr/bin/whoami,/bin/echo,/sbin/service,/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/bin/df,/bin/mv,/bin/rm,/usr/bin/install
      Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
      Cmnd_Alias VERTICA_INSTALL = /bin/echo,/bin/ps -A,/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/bin/rm -rf /tmp/dbRPM.rpm,/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/bin/mv -f /tmp/vstage-*/file /tmp/*,/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/bin/cp -T /opt/vertica/* /tmp/vstage-*,/bin/su --login dbadmin *,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/*,/bin/rm -rf /opt/vertica/*,/bin/mv -f /tmp/vstage-* /opt/vertica/*,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/bin/su dbadmin *,/bin/sh -c *,/usr/bin,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/bin/test [ -e /* ],/usr/bin/[ -e /* ]
      Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
      ## Allows the Data Repository user to manage the Data Repository
      sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
      Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
    • On SLES 12
      Cmnd_Alias CA_DATAREP =/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh,/usr/bin/mkdir, /sbin/SuSEfirewall2 off *,/usr/bin/whoami,/usr/bin/echo,/usr/bin/id,/usr/bin/env,/usr/sbin/service,/usr/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/usr/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/usr/bin/sh,/usr/bin/install
      Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
      Cmnd_Alias VERTICA_INSTALL = /usr/bin/echo,/usr/bin/ps -A,/usr/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/usr/bin/rm -rf /tmp/dbRPM.rpm,/usr/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/usr/bin/mv -f /tmp/vstage-*/file /tmp/*,/usr/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/usr/bin/cp -T /opt/vertica/* /tmp/vstage-*,/usr/bin/su --login dbadmin *,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/*,/usr/bin/rm -rf /opt/vertica/*,/usr/bin/mv -f /tmp/vstage-* /opt/vertica/*,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/usr/bin/su dbadmin *,/usr/bin/sh -c *,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/usr/bin/test [ -e /* ],/usr/bin/[ -e /* ]
      Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
      ## Allows the Data Repository user to manage the Data Repository
      sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
      Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"