Segment Database Tables

If you received the following message during a data aggregator upgrade, segment the database tables on the data repository:
unsegmented projections exist
Table segmentation is a one-time task that is required for systems where the original DX NetOps Performance Management installation was 2.3.2 or earlier. Segmenting the tables reduces the amount of disk space that the database requires and improves general query performance.
If you are unsure whether your database requires segmentation, download the segment.py script and attempt to identify tables that require segmentation. If segmentation is already complete or not required, the script does not return any tables.
Segmentation is a resource-intensive process. We recommend that you complete this process while the data aggregator and the data collectors are down.
Segmenting large tables can take several hours. During tests, migration of tables larger than 100 GB took over 10 hours. Segmentation time is not proportional to table size; it depends on many factors, including row count, column count, data compression, and system specifications.
No active monitoring of your infrastructure environment occurs when the data aggregator and data collectors are down.
The data aggregator and the data collectors can be up or down when you segment the database tables. If you choose to segment the database tables while the data aggregator is up, the following restrictions apply:
  • Do not perform any data aggregator-administrative functions:
    • Modifying monitoring profiles
    • Associating collections to monitoring profiles
    • Increasing poll rates
    • Running new discoveries
  • Minimize the report load.
Use the following process to segment the database tables:
Identify the Tables that Require Segmentation
Before you start table segmentation, identify the tables that require segmentation.
Follow these steps:
  1. Download the segment.py script from Support. This procedure assumes that the segment.py script is in the home directory of the Vertica Linux database administrator user.
  2. Log in to any host in the data repository cluster as the Vertica Linux database administrator user.
  3. Run the script:
    ./segment.py --task tables --pass database_admin_user_password [--name database_name] [--port database_port]
    • database_admin_user_password
      The Vertica Linux database administrator user password.
    • database_name
      The name of the database. This parameter is case-sensitive.
      Default: drdata
    • database_port
      The Vertica port.
      Default: 5433
      For more information about the ports that are required for DX NetOps Performance Management to work properly, see Review Installation Requirements and Considerations.
    For example:
    ./segment.py --task tables --pass password --name mydatabase
    The currently unsegmented table projections are returned, sorted from largest to smallest. If tables require segmentation, continue with this process.
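For planning, you can wrap the identification command in a small script. The following is a sketch only: drdata and 5433 are the documented defaults, while the DB_ADMIN_PASS variable and the changeme placeholder are assumptions you must replace.

```shell
# Build the documented identification command; drdata and 5433 are the
# defaults stated above. DB_ADMIN_PASS is a placeholder you must supply.
DB_NAME=${DB_NAME:-drdata}
DB_PORT=${DB_PORT:-5433}
DB_ADMIN_PASS=${DB_ADMIN_PASS:-changeme}
CMD="./segment.py --task tables --pass $DB_ADMIN_PASS --name $DB_NAME --port $DB_PORT"
echo "$CMD"
# Uncomment to run it and keep the list for planning:
# $CMD | tee unsegmented-tables.txt
```

Saving the output to a file makes it easier to compare table sizes against your maintenance windows later in this process.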
Back up the Data Repository
Before you segment the tables, back up the data repository.
After segmentation, the disk space for the backup increases by the amount of data in the new segmented table projections. Verify that the backup system has enough disk space available after segmentation is completed and before backups run.
The data for the old unsegmented table projections is removed from the backup data one day after the time of the restorePointLimit. To remove this data immediately, change the snapshot name in the backup configuration file, and do a full backup. You can then archive the older backup, and delete the backup from the backup disk. Use the presegmentation backup only if you cannot use the backup that was created after segmentation completed. If you use the presegmentation backup, the tables require segmentation again.
For information, see Back Up the Data Repository.
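The snapshot-name change described above is made in the Vertica vbr backup configuration file. A hypothetical excerpt follows; the parameter names are standard vbr settings, but the values shown are examples only.

```ini
; Excerpt of a vbr backup configuration file (values are examples).
[Misc]
; Changing snapshotName starts a new backup lineage, so subsequent
; backups no longer carry the presegmentation projection data.
snapshotName = drdata_backup_post_segmentation
restorePointLimit = 3
```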
Segment Tables with No Data
Tables with no data are segmented quickly, and segmentation does not negatively affect performance. You can segment these tables without stopping the data aggregator.
Follow these steps:
  1. Log in to any host in the data repository cluster as the Vertica Linux database administrator user.
  2. Run the segment.py script by issuing the following command:
    ./segment.py --task segment --zerotables --pass database_admin_user_password [--name database_name] [--port database_port]
    • database_admin_user_password
      The Vertica Linux database administrator user password.
    • database_name
      The name of the database. This parameter is case-sensitive.
      Default: drdata
    • database_port
      The Vertica port.
      Default: 5433
    The script segments the database tables with no data.
Determine the Table Segmentation Time
To determine whether to segment the tables while the data aggregator is up or down, calculate the necessary time for segmentation.
Follow these steps:
  1. Get a list of the tables that require segmentation by issuing the following command:
    ./segment.py --task tables --pass database_admin_user_password [--name database_name] [--port database_port]
    The database_name parameter is case-sensitive.
    The list sorts the tables from largest to smallest.
  2. Disable scheduled data repository backups until segmentation is complete.
    Backups can interfere with the segmentation process.
  3. Select a table from step 1 that is about 5 GB in size, and segment that table by issuing the following command:
    ./segment.py --task segment --table rate_table_name --pass database_admin_user_password [--name database_name] [--port database_port]
    The database_name parameter is case-sensitive.
    You can run this command while the data aggregator is running, but we recommend that you run it during a 2-3 hour maintenance window.
  4. Reenable the scheduled data repository backups.
  5. Use the segmentation time for the 5-GB table to estimate how long it might take to segment all of the tables that are less than 100 GB.
    The actual segmentation time for the database tables can vary based on the type and compression of the data in the tables. The values that are calculated here are rough estimates. When planning a scheduled maintenance window, add an extra hour of time for every 10 GB to 15 GB of database tables. For large databases, you might not be able to schedule a single maintenance window that is long enough to segment the entire database. In this case, you can segment the database tables over multiple maintenance windows.
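As a rough illustration, the estimate above can be computed as follows. Both inputs are assumptions: substitute the time you measured for the 5-GB table and the combined size of your own tables.

```shell
# Back-of-envelope only: scale the measured 5-GB time to the remaining tables,
# then pad by one extra hour per 10 GB (the conservative end of the guidance above).
MIN_PER_5GB=30        # assumption: observed minutes for the 5-GB test table
REMAINING_GB=120      # assumption: combined size of tables under 100 GB
EST_MIN=$(( REMAINING_GB * MIN_PER_5GB / 5 ))
EST_PADDED=$(( EST_MIN + REMAINING_GB / 10 * 60 ))
echo "estimate: ${EST_MIN} min; with padding: ${EST_PADDED} min"
```

Compare the padded figure against your available maintenance windows to decide whether to split segmentation across several windows.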
Segment the Remaining Database Tables
Segment the remaining tables.
When segmenting the tables in the database, if the data aggregator is running, at least 40 percent of the available disk space must remain free for query processing and other database activities. When the data aggregator is not running, the total disk utilization during segmentation must not exceed 90 percent of available disk space. Tables that would cause the disk utilization to exceed these limits during segmentation are not segmented.
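Before a maintenance window, you can check current utilization against these limits. This is a sketch: DATA_DIR and the default limit are assumptions, so point it at your actual data repository location and pick the limit that matches whether the data aggregator is up.

```shell
# Sketch: pre-check disk utilization against the documented limits.
# DATA_DIR is an example; use your data repository's data location.
DATA_DIR=${DATA_DIR:-/}
LIMIT=${LIMIT:-60}    # 60 when the data aggregator is up (keep 40% free), 90 when it is down
USED=$(df -P "$DATA_DIR" | awk 'NR==2 {sub(/%/, "", $5); print $5}')
if [ "$USED" -lt "$LIMIT" ]; then
  echo "OK: ${USED}% used (limit ${LIMIT}%)"
else
  echo "WARNING: ${USED}% used meets or exceeds the ${LIMIT}% limit"
fi
```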
Follow these steps:
  1. Log in to any host in the data repository cluster as the Vertica Linux database administrator user.
  2. If more than ten zero-length table projections were found during the validation in the previous procedure, issue the following command to segment them:
    ./segment.py --task segment --pass database_admin_user_password --zerotables [--name database_name] [--port database_port]
    • database_admin_user_password
      The Vertica Linux database administrator user password.
    • database_name
      The name of the database.
      Default:
      drdata
      This parameter is case-sensitive.
    • database_port
      The Vertica port.
      Default:
      5433
    For example:
    ./segment.py --task segment --pass password --zerotables --name mydatabase --port 1122
  3. If there are table projections that are greater than 100 GB in size, issue the following command to create a script that segments the table projections that are less than 100 GB first:
    ./segment.py --task script --pass database_admin_user_password --lt100G [--name database_name] [--port database_port]
    • database_admin_user_password
      The Vertica Linux database administrator user password.
    • database_name
      The name of the database.
      Default:
      drdata
      This parameter is case-sensitive.
    • database_port
      The Vertica port.
      Default:
      5433
    For example:
    ./segment.py --task script --pass password --lt100G --name mydatabase --port 1122
  4. Disable scheduled backups until segmentation is complete.
    Backups can interfere with the segmentation process.
  5. Run the segment-script.sh script by issuing the following command:
    nohup ./segment-script.sh
    The script segments all unsegmented table projections that are less than 100 GB and sorts them from smallest to largest. The output is sent to nohup.out. If the shell is closed accidentally, the script continues to run.
    Depending on your maintenance window size and the combined size of all of the tables under 100 GB, determine which tables can be segmented in the maintenance window. Modify the generated script by removing the tables that do not fit in the window, and run the modified segment-script.sh script during the maintenance window. If all of the tables under 100 GB could not be segmented, regenerate the script and run it during the next maintenance window. Repeat until all of these tables are segmented.
    When you run the script, an error message is displayed for any table that would cause disk utilization to exceed 90 percent. These tables are not segmented. To segment these tables, free more disk space.
    You are prompted for each table that could cause disk utilization to exceed 60 percent. We strongly recommend that you shut down the data aggregator before segmenting these tables.
    This script can take several hours to run. To avoid corruption of the database, do not interrupt the script after it begins.
  6. Reenable scheduled backups only if more segmentation is needed and will be done in a future maintenance window.
  7. Generate a script, segment-script.sh, that segments the remaining table projections that are over 100 GB by issuing the following command:
    ./segment.py --task script --pass database_admin_user_password [--name database_name] [--port database_port]
    • database_admin_user_password
      The Vertica Linux database administrator user password.
    • database_name
      The name of the database.
      Default:
      drdata
      This parameter is case-sensitive.
    • database_port
      The Vertica port.
      Default:
      5433
    For example:
    ./segment.py --task script --pass password --name mydatabase --port 1122
    When the script is generated, any tables that might cause disk utilization to exceed 60 percent and 90 percent are indicated.
  8. Disable scheduled backups, if they are not already disabled.
  9. Run the segment-script.sh script by issuing the following command:
    nohup ./segment-script.sh
    The script segments all unsegmented tables and sorts them from smallest to largest.
  10. Verify that all tables are now segmented by issuing the following command:
    ./segment.py --task tables --pass database_admin_user_password [--name database_name] [--port database_port]
    The database_name parameter is case-sensitive.
    The following message appears:
    No tables found with unsegmented projections.
  11. Reenable scheduled backups.
  12. If you segmented the database tables while the data aggregator and the data collectors were down, start these components.
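While segment-script.sh runs under nohup during a maintenance window, you can check on it from a second shell. This sketch uses only the script and output file names from this procedure; adjust the error pattern to whatever your nohup.out actually contains.

```shell
# Sketch: check on a long-running segmentation from a second shell.
# segment-script.sh and nohup.out are the names used in this procedure.
STATUS=$(pgrep -f segment-script.sh >/dev/null 2>&1 && echo "running" || echo "not running")
echo "segment-script.sh is ${STATUS}"
# Surface any disk-limit errors logged so far (nohup.out may not exist yet).
grep -i "error" nohup.out 2>/dev/null | tail -n 5
```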