Prepare to Install the Data Repository

To ensure that your Data Repository installation is successful, complete the following requirements before you install Data Repository.
For more information about Data Repository configuration options and administration, see Data Repository Administration.
Verify the Prerequisites
Verify the following prerequisites before you install Data Repository:
  • Verify that the dialog and chrony packages are installed on each Data Repository host. The chrony package is required only for RHEL 7.x and OL 7.x.
    rpm -qa | grep ^dialog
    rpm -qa | grep ^chrony
    If either command does not return results, install the missing package. If you are not the root user, use the sudo prefix.
    yum install dialog
    yum install chrony
    If a required package is not installed, the validation and installation scripts fail.
  • The installer requires the zip and unzip packages. If these packages are not installed, use the following command to install them:
    yum -y install zip unzip
  • Verify that you have at least 2 GB of swap space on each Data Repository host.
  • Verify that the Data Repository hosts use the ext4 file system. Vertica does not support XFS or btrfs, so all disks that Vertica uses should be formatted as ext4.
    The default file system for RHEL 7.x and OL 7.x is XFS, and the default file system for SLES is btrfs. The database performs best with the ext4 file system.
  • Verify that the following ports are open on the Data Repository systems:
    • Port 22 (TCP protocol)
    • Port 4803 (TCP and UDP protocols)
    • Port 4804 (UDP protocol)
    • Port 5433 (TCP protocol)
      Remote access to this port is required.
    • Port 5434 (TCP protocol)
    • Port 6543 (UDP protocol)
  • To avoid database corruption, exclude the installation directory, and all its subdirectories, from antivirus scans. Prevent scanning by a local instance of an antivirus client and scanning by a remote antivirus instance. Exclude the following directories:
    • /opt/vertica/*
    • /opt/vconsole/*
    • The specified data directory
      Default: /drdata/data
    • The specified catalog directory
      Default: /drdata/catalog
    • Vertica temporary files in /tmp
      • /tmp/4803
      • /tmp/vbr/*
    • The directory where you back up the Data Repository
  • If a file named 'release' appears in the /etc directory, remove it. Otherwise, the Data Repository installation fails.
  • Verify the access according to your installation type:
    • Single Node:
      Root access is required to install Data Repository. Determine whether you can install Data Repository as root.
    • Cluster:
      Verify that the root user or sudo user can create database administrator user accounts, or can have an administrator create these accounts.
  • Verify that CPU frequency scaling is disabled. Disable CPU frequency scaling through the host system BIOS and OS settings.
    If CPU frequency scaling is enabled, you might experience inconsistent performance for similar queries in Vertica. CPU frequency scaling can cause observable slowness and variation in dashboard loading.
  • Verify that you are not using Logical Volume Manager (LVM) for /data and /catalog directories.
  • (Cluster only) Verify that all the hosts in the cluster are in the same subnet.
  • (Cluster only) Verify that the root user can use Secure Shell (SSH) to log in to all the hosts in the cluster.
    Set up SSH for the root user for the Data Repository installation or upgrade.
  • The default shell environment must be bash.
  • (Cluster only) Select the hosts where you install Data Repository nodes.
    Warning!
    Database software is deployed on each participating host in a cluster. This software represents a ‘node’ in the cluster. A three-node cluster represents the simplest configuration that can tolerate the loss of a single node. You can, however, include more than three hosts in the cluster. If more than one node fails or shuts down, Data Repository is no longer available for use and Data Aggregator shuts down automatically.
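Several of the prerequisites above can be checked with a short script before you start. The following is a minimal sketch and is not part of the product: the function names are hypothetical, the 2 GB threshold and the package names come from the list above, and reading the login shell from /etc/passwd is an assumption about how the default shell is defined on your hosts.

```shell
#!/usr/bin/env bash
# Preflight sketch for a Data Repository host.
# All function names are illustrative, not product commands.

# Print each named package that rpm does not report as installed.
missing_packages() {
    local pkg
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Print the swap total in kB from a meminfo-style file (default: /proc/meminfo).
swap_total_kb() {
    awk '/^SwapTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# Succeed when the swap total is at least 2 GB (2 * 1024 * 1024 kB).
has_enough_swap() {
    local kb
    kb=$(swap_total_kb "$1")
    [ "${kb:-0}" -ge $((2 * 1024 * 1024)) ]
}

# Print the login shell recorded for a user in a passwd-style file.
login_shell() {
    awk -F: -v u="$1" '$1 == u {print $7}' "${2:-/etc/passwd}"
}

# Succeed when no file named "release" exists in the directory (default: /etc).
no_release_file() {
    [ ! -e "${1:-/etc}/release" ]
}
```

For example, `missing_packages dialog chrony zip unzip` prints nothing on a host where all four packages are installed, and `has_enough_swap || echo "less than 2 GB of swap"` reports a host that does not meet the swap requirement.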
Install the Data Repository on VMs
For best performance, install the Data Repository in a bare-metal environment. However, if you install the Data Repository in VMware virtual machines, verify the following requirements:
  • Use VMware version 5.5 or greater.
  • The number of VMs per host does not exceed the number of physical processors.
  • Pre-allocate and reserve 4 GB of memory for each of the VMs.
  • Each VM has a dedicated 10 Gb NIC.
  • Disable CPU frequency scaling at the host level and for each VM.
  • Disable VMotion. VMotion can disrupt communication, and can cause the Data Repository to shut down.
  • Set the VMware parameters for hugepages to the version 5.5 default values.
  • Verify the hardware and network performance. Use the Vertica diagnostic tools described below to verify performance.
For more information about running Vertica on VMs, see the Vertica documentation.
Install the Data Repository on Shared Storage (SAN)
To install the Data Repository on SAN, verify the following requirements:
  • The hosts have no contention for disk space or bandwidth.
  • Each host has a unique catalog and data location. The hosts cannot share the location for these directories.
  • The storage has enough I/O bandwidth for each node to access the storage independently. To verify the I/O bandwidth, simultaneously run vioperf from all hosts in the Data Repository cluster. For more information, see the following procedures.
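Running the I/O test on every host at the same time can be scripted from one machine. The following is a rough sketch under stated assumptions: `run_on_all_hosts` and the `RUN_REMOTE` variable are hypothetical names, and the vioperf path and its directory argument shown in the usage note are assumptions that you should confirm against the Vertica documentation for your version.

```shell
#!/usr/bin/env bash
# Start the same command on every listed host at once, then wait for all of them.
# RUN_REMOTE defaults to ssh; set it to another command (for example, echo) for a dry run.
run_on_all_hosts() {
    local cmd=$1
    shift
    local host pid rc=0
    local pids=()
    for host in "$@"; do
        # Launch in the background so that all hosts run the test concurrently.
        ${RUN_REMOTE:-ssh} "$host" "$cmd" </dev/null &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # Any failing host makes the overall result a failure.
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

A call might look like `run_on_all_hosts "/opt/vertica/bin/vioperf /drdata/data" host01 host02 host03`, where the host names and the data directory are placeholders for your own values.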
Set a Unique Hostname for Each Data Repository Host
Set a unique hostname for each Data Repository host in the cluster.
Follow these steps:
  1. As the root user, log in to each Data Repository host, and verify the unique hostname.
    The hostname must be associated with the IP address and not the loopback address of 127.0.0.1.
  2. Verify that the following lines appear in the /etc/hosts file on each computer:
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
    YourHostIPAddress YourHostName YourHostName.domain
  3. If you change the file, run the following command:
    service network restart
    The /etc/hosts file is configured correctly.
    The unique hostname is set.
  4. (Cluster installations only) The hostnames of all hosts in the cluster must resolve correctly. If the hostname resolution is incorrect, the Data Repository cluster does not install or work properly. All participating hosts in the cluster must use static IP or permanently leased DHCP addresses. Set up the /etc/hosts file on each of the hosts you selected for the cluster. The hosts file must contain entries for all hosts in the cluster.
    Example:
    This example shows the /etc/hosts file for a cluster where the hosts are named host01, host02, and host03:
    127.0.0.1 localhost.localdomain localhost
    192.168.13.128 host01.domain host01
    192.168.13.129 host02.domain host02
    192.168.13.130 host03.domain host03
    Do not remove the loopback address (127.0.0.1) line. The local Data Repository hostname cannot be on the 127.0.0.1 line. Also, do not use the loopback address or localhost name when you are defining hosts in the cluster.
  5. Verify that hostname resolution works for each host in the cluster.
    For example, on host01, the following command and output are correct:
    $ /bin/hostname -f
    host01
    Hostname resolution is configured.
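The /etc/hosts rules in this procedure can be checked mechanically. The following is a minimal sketch with hypothetical function names: it reports whether a hostname sits on a loopback line and prints the address that the file maps to a hostname.

```shell
#!/usr/bin/env bash
# Succeed when NAME appears on a loopback line (127.0.0.1 or ::1) of FILE.
# Usage: hostname_on_loopback FILE NAME
hostname_on_loopback() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 == "127.0.0.1" || $1 == "::1" {
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Print the first non-loopback IP address that FILE maps to NAME.
# Usage: host_ip FILE NAME
host_ip() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 != "127.0.0.1" && $1 != "::1" {
            for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit }
        }
    ' "$1"
}
```

With the example file above, `host_ip /etc/hosts host02` prints 192.168.13.129, and `hostname_on_loopback /etc/hosts host01` fails, which is the required result for a Data Repository hostname.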
(Optional) Set Up Passwordless SSH for the Root or Sudo User
The hosts in a Data Repository cluster require passwordless ssh for the root or sudo user during the Data Repository installation or upgrade. The dr_validate.sh script sets up passwordless SSH, but requests the password many times. To avoid repeatedly specifying the root or sudo user password, set up passwordless ssh before you run the validation script.
Repeat this procedure for each pair of hosts. If you have passwordless ssh set up for the root user, but you do not have root access to install and run the Data Repository, configure a sudo user account. Alternatively, you can use the sudo user account to install the product without entering the root password. For more information about configuring the passwordless sudo user account for Data Repository, see (Optional) Configure the Passwordless Sudo User Account for the Data Repository.
Passwordless SSH is automatically set up for the Data Repository admin user when you install the Data Repository.
Follow these steps:
  1. Open a console and log in to the Data Repository host as the root or sudo user.
  2. Run the following commands:
    ssh-keygen -N "" -t rsa -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
    chmod 644 ~/.ssh/authorized_keys2
  3. Copy the root or sudo user public key into the list of authorized keys on the remote hosts:
    ssh-copy-id -i user@remotehost
    remotehost specifies a host in the cluster where you are copying the SSH ID.
  4. To verify that passwordless ssh is set up correctly, log in to the remote host from the local host:
    ssh user@remotehost ls
    If passwordless SSH has been set up successfully, you are not prompted for a password, and you see a directory listing from the ls command.
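Because the procedure must be repeated for each pair of hosts, a non-interactive check can show which targets still need a key. The following is a minimal sketch: the function names and the `SSH_CMD` variable are hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options. With `BatchMode=yes`, ssh fails instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Succeed when the target (user@host) accepts a login without any prompt.
# SSH_CMD defaults to ssh; it exists only so that the check can be dry-run.
can_login_without_password() {
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Print every target that cannot be reached without a password prompt.
# An unreachable host is reported the same way as a host that lacks the key.
hosts_needing_keys() {
    local target
    for target in "$@"; do
        can_login_without_password "$target" </dev/null >/dev/null 2>&1 || echo "$target"
    done
}
```

For example, `hosts_needing_keys root@host01 root@host02 root@host03` prints nothing when passwordless SSH works from the local host to all three hosts. Run it on each host in the cluster to cover every pair.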
(Optional) Configure the Sudo User Account for the Data Repository
If you have passwordless ssh set up for the root user, but you do not have root access to install and run the Data Repository, configure a sudo user account.
For cluster environments, complete this procedure on each host in the cluster.
Follow these steps:
  1. Locate the following file:
    /etc/sudoers
  2. Add a command alias with the following permissions to the file:
Cmnd_Alias CA_DATAREP = /tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/usr/bin/yum,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh
## Allows the Data Repository user to manage the Data Repository
sudouser ALL = CA_DATAREP
  • sudouser
    Specify the user who can run the sudo commands.
This command alias details the commands that the sudo user must be able to run.
With the sudo user configured, add the sudo prefix to all commands to install the Data Repository.
Example:
sudo ./installDR.bin
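A syntax error in /etc/sudoers can lock every user out of sudo, so it can help to validate the edited text before it replaces the live file. The following is a minimal sketch and is not a step from the product procedure: the function names and the `VISUDO_CMD` variable are hypothetical, and the approach of editing a copy first is an assumption about your workflow. The `visudo -c -f` options check a file without installing it.

```shell
#!/usr/bin/env bash
# Syntax-check a sudoers file without installing it (default: /etc/sudoers).
# VISUDO_CMD defaults to visudo; it exists only so that the check can be dry-run.
check_sudoers_syntax() {
    ${VISUDO_CMD:-visudo} -c -f "${1:-/etc/sudoers}"
}

# Copy CANDIDATE over TARGET only when CANDIDATE passes the syntax check.
# Writing to /etc/sudoers requires root. Mode 0440 is the usual sudoers mode.
install_sudoers_if_valid() {
    local candidate=$1
    local target=${2:-/etc/sudoers}
    check_sudoers_syntax "$candidate" >/dev/null || return 1
    install -m 0440 "$candidate" "$target"
}
```

For example, copy /etc/sudoers to /tmp/sudoers.new, add the command alias and the sudouser line to the copy, and then run `install_sudoers_if_valid /tmp/sudoers.new` as root. The live file is left untouched when the check fails.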
(Optional) Configure the Passwordless Sudo User Account for the Data Repository
In some environments, security policies prevent you from enabling passwordless SSH for the root user on the host servers. The following procedure provides an alternative method that uses the sudo user account to install the product without requiring that level of access.
This functionality is not supported on RHEL 6.x.
  1. Locate the following file:
    /etc/sudoers
  2. Add command aliases with the following permissions to the file:
  • On RHEL 7
    Cmnd_Alias CA_DATAREP=/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh,/bin/mkdir*,/usr/bin/whoami,/bin/echo,/sbin/service,/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/bin/df,/bin/mv,/bin/rm,/usr/bin/install
    Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
    Cmnd_Alias VERTICA_INSTALL = /bin/echo,/bin/ps -A,/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/bin/rm -rf /tmp/dbRPM.rpm,/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/bin/mv -f /tmp/vstage-*/file /tmp/*,/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/bin/cp -T /opt/vertica/* /tmp/vstage-*,/bin/su --login dbadmin *,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/*,/bin/rm -rf /opt/vertica/*,/bin/mv -f /tmp/vstage-* /opt/vertica/*,/bin/mkdir -p /opt/vertica/*,/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/bin/su dbadmin *,/bin/sh -c *,/usr/bin,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/bin/test [ -e /* ],/usr/bin/[ -e /* ]
    Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
    ## Allows the Data Repository user to manage the Data Repository
    sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
    Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"
  • On SLES 12
    Cmnd_Alias CA_DATAREP =/opt/vertica/sbin/install_vertica,/tmp/installDR.bin,/opt/CA/IMDataRepository_vertica9/dr_validate.sh,/opt/CA/IMDataRepository_vertica9/dr_install.sh,/usr/bin/vim,/usr/bin/reboot,/opt/CA/IMDataRepository_vertica9/RemoteEngineer/re.sh,/usr/bin/mkdir, /sbin/SuSEfirewall2 off *,/usr/bin/whoami,/usr/bin/echo,/usr/bin/id,/usr/bin/env,/usr/sbin/service,/usr/bin/grep,/usr/bin/test,/sbin/iptables,/opt/vertica/oss/python/bin/python,/usr/bin/tee,/usr/sbin/ntpd,/etc/init.d/ntpd,/sbin/blockdev,/etc/init.d/sshd,/etc/sysconfig/sshd,/etc/ssh/sshd_config,/usr/bin/su,/usr/sbin/sshd restart,/usr/bin/ssh,/usr/bin/sh,/usr/bin/install
    Cmnd_Alias VERTICA = /opt/vertica/bin/,/opt/vertica/sbin/,/opt/vertica/oss/python/bin/
    Cmnd_Alias VERTICA_INSTALL = /usr/bin/echo,/usr/bin/ps -A,/usr/bin/cp /opt/vertica/config/admintools.conf /opt/vertica/config/admintools.conf.bak.*,/usr/bin/rm -rf /tmp/dbRPM.rpm,/usr/bin/df --portability /tmp,/usr/bin/install --owner * --mode 700 -d *,/usr/bin/mv -f /tmp/vstage-*/file /tmp/*,/usr/bin/rm -rf /tmp/vstage-*,/usr/bin/id *,/usr/bin/cp -T /opt/vertica/* /tmp/vstage-*,/usr/bin/su --login dbadmin *,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/*,/usr/bin/rm -rf /opt/vertica/*,/usr/bin/mv -f /tmp/vstage-* /opt/vertica/*,/usr/bin/mkdir -p /opt/vertica/*,/usr/bin/touch /opt/vertica/config/users/dbadmin/agent.conf,/usr/bin/su dbadmin *,/usr/bin/sh -c *,/opt/vertica/share/binlib/test/*,/usr/bin/su dbadmin,/usr/bin/test [ -e /* ],/usr/bin/[ -e /* ]
    Cmnd_Alias USEFUL = /usr/bin/lshw,/usr/bin/yum,/bin/rpm,/sbin/reboot,/sbin/shutdown,/usr/bin/cpan,/bin/chgrp,/bin/chmod,/bin/chown,/bin/mnt,/usr/bin/test,/bin/[,/sbin/service
    ## Allows the Data Repository user to manage the Data Repository
    sudouser ALL = CA_DATAREP, VERTICA , VERTICA_INSTALL , USEFUL
    Defaults env_keep +="VERT_DBA_USR VERT_DBA_HOME VERT_DBA_GRP VERT_DBA_DATA_DIR _ENV_VPWD_VAR"