CA UIM Sizing Recommendations

Review the following reference material to understand the architecture and typical deployment of CA UIM. The recommendations and specifications that are listed below are based on the scale of deployment.
CA UIM 9.2.0 Sizing Recommendations
Note:
These sizing recommendations are for the non-secure (hub and robot) setup.
Small Scale Reference Deployment Architecture
High-Level Description
  • A small company or individual team
  • <500 devices monitored
  • <5 hubs
  • <200 robots
  • A few concurrent users
Architecture
Small Scale Customer Representative Environment.png
Specifications and Information
  • Systems
    • Database
      • 16 cores x 2.8 GHz
      • 16 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (data storage is expected to reach ~30-40 GB with the retention settings below)
    • UIM
      • 16 cores x 2.8 GHz
      • 16 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • UMP
      • 8 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 50 GB
    • CABI
      • 8 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB
    • SNMPC hub
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • General hubs
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 50 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
  • Configuration
    • 100 robots
    • 1 SNMPC (100 devices)
  • Performance
    • Sustained insert rate: 100 msgs/sec (used in example test environment)
    • Max insert rate: 20,000 msgs/sec
  • Configuration Parameters
    • Increase data_engine insert threads by setting thread_count_insert = 24 in data_engine.cfg
    • Increase the bulk size for the data_engine queue: hub_bulk_size = 2000
    • data_engine retention/maintenance settings: 7 days raw, 30 days hourly, 1 year daily
    • SNMPC: the default max heap setting for javaopts (2 GB) is sufficient
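For illustration, the parameters above translate into Nimsoft-style cfg entries roughly like the following. The key names come from the list above, but the section placement is an assumption and should be verified against your own data_engine.cfg and hub.cfg:

```ini
; Sketch only -- section placement is assumed, verify against your files.

; data_engine.cfg: raise the insert thread count
<data_engine>
   thread_count_insert = 24
</data_engine>

; hub.cfg on the hub hosting the data_engine queue: larger bulk transfer size
<hub>
   hub_bulk_size = 2000
</hub>
```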
Medium Scale Reference Deployment Architecture
High-Level Description
  • A medium-sized company or medium-sized lab/datacenter
  • 500-1000s of devices monitored
  • 5-20 hubs
  • 200-1000 robots
  • A dozen concurrent users
Architecture
Medium Scale Customer Representative Environment.png
Specifications and Information
  • Systems
    • Database
      • 24 cores x 2.8 GHz
      • 16 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 500 GB (data storage is expected to reach ~300 GB with the retention settings below)
    • CA UIM
      • 24 cores x 2.8 GHz
      • 32 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 100 GB for the main drive and 100-300 GB for hub queues (as desired, to let queues fill on disk during extended outages or maintenance windows)
    • UMP
      • 16 cores x 2.8 GHz
      • 16 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 50 GB
    • CABI
      • 16 cores x 2.8 GHz
      • 16 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 100 GB
    • Tunnel Server hub
      • 8 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 200 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • SNMPC hub
      • 8 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • General hubs
      • 8 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB to 1 TB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
  • Configuration
    • 200 robots
    • 1 SNMPC (1,000 devices)
    • 10 groups
  • Performance
    • Sustained insert rate: ~2,000 msgs/sec (used in example test environment)
    • Max insert rate (with 24 insert threads in data_engine and a bulk size of 2000 on the data_engine queue): 22,000 msgs/sec
  • Configuration Parameters
    • Increase data_engine insert threads by setting thread_count_insert = 24 in data_engine.cfg
    • Increase the bulk size for the data_engine queue: hub_bulk_size = 2000
    • data_engine retention/maintenance settings: 7 days raw, 30 days hourly, 1 year daily
    • SNMPC: change the default max heap setting for javaopts from 2 GB to 4 GB
Recommendations
  • 15k RPM drives in a RAID 10 configuration are recommended. If you expect heavy reporting, use SSD drives.
Large-Scale Reference Deployment Architecture
High-Level Description
  • A large company or large datacenters
  • 5-10K+ devices monitored
  • 20+ hubs
  • >1000 robots
  • Dozens of concurrent users
Architecture
Large Scale Customer Representative Environment.png
Specifications and Information
  • Systems
    • Database
      • 64 cores x 2.7 GHz
      • 32 GB RAM
      • SSD SAN
      • Log and data on separate filesystems
      • Disk Size: 2-3 TB (data storage is expected to reach ~1.5 TB with the retention settings below)
    • CA UIM
      • 56 cores x 2.5 GHz
      • 64 GB RAM
      • SSD SAN
      • Disk Size: 200 GB for the main drive and 200-500 GB for hub queues (as desired, to let queues fill on disk during extended outages or maintenance windows)
    • UMP
      • 24 cores x 2.5 GHz
      • 32 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 50 GB
    • CABI (multiple instances may be required; verify this configuration for your deployment)
      • 24 cores x 2.8 GHz
      • 32 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 100 GB
    • Tunnel Server hub
      • 16 cores x 2.8 GHz
      • 16 GB RAM
      • SSD SAN
      • Disk Size: 200 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows; an additional drive dedicated to hub queues improves resiliency)
    • SNMPC hubs
      • 16 cores x 2.8 GHz
      • 16 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • General hubs
      • 16 cores x 2.8 GHz
      • 16 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
  • Configuration
    • 15,000 robots
    • 4 SNMPC (8,000 devices)
    • 2.6 million metrics
    • 9,000 groups
  • Performance
    • Sustained insert rate (used in example test environment): 15,000 msgs/sec
    • Max insert rate (tuned): 50,000 msgs/sec
    • Active alarm count: 70,000
  • Configuration Parameters
    • Set spooler_inbound_threads = 50 and check_spooler_session = 1 in the primary hub.cfg
    • Set bulk_size to 1500 for the data_engine queue
    • Set bulk_size to 1000 for the baseline_engine queue
    • Increase data_engine insert threads by setting thread_count_insert = 24 in data_engine.cfg
    • Increase the data_engine queue limit: queue_limit_total = 1000000
    • Increase the bulk size for the data_engine queue: hub_bulk_size = 2000
    • data_engine retention/maintenance settings: 7 days raw, 30 days hourly, 1 year daily
    • SNMPC: change the default max heap setting for javaopts from 2 GB to 8 GB for each SNMPC probe
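As a sketch, the large-scale tuning above could look like the following in the primary hub.cfg and data_engine.cfg. The key names are taken from the list above; the per-queue bulk_size entries normally belong in the hub's queue definitions, and the exact section placement shown here is an assumption:

```ini
; Sketch only -- key names are from the list above; section placement is assumed.

; Primary hub.cfg
<hub>
   spooler_inbound_threads = 50
   check_spooler_session = 1
   hub_bulk_size = 2000
   <postroute>
      <data_engine>
         bulk_size = 1500        ; data_engine queue
      </data_engine>
      <baseline_engine>
         bulk_size = 1000        ; baseline_engine queue
      </baseline_engine>
   </postroute>
</hub>

; data_engine.cfg
<data_engine>
   thread_count_insert = 24
   queue_limit_total = 1000000   ; queue limit increase
</data_engine>
```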
Recommendations
  • SSD drives should be used when deploying environments this large for better reporting and DB performance
Additional Sizing Considerations and Information
  • Plan to leave enough headroom for expansion and the ability to catch up after planned outages
    • For example, if your environment has a max message rate of 50,000 messages/sec and typically runs at a steady state of 15,000 messages/sec, then a one-hour outage window accumulates 54,000,000 messages.
  • Disk speed / IO capacity on high-volume hubs (primary hub, tunnel server) can aid processing - for example, an SSD drive for queues
  • Database disk speed / IO capacity can improve the insert rate and report processing speed - SSDs for the database are highly recommended
  • The database platform/version can improve performance or enable additional scale
    • Partitioning is recommended for large deployments and is supported by Oracle, MySQL, and MSSQL Enterprise
    • MSSQL 2014 supports per-partition online index rebuilds, which minimize table locks during partitioning and make data maintenance more efficient
  • Reserve CPU/memory for the UIM, tunnel server, and database virtual machines to guarantee dedicated resources and consistent performance
  • An intermediate proxy/concentrator hub is often used to offload metric processing and tunnel termination from the primary UIM server (see the medium and large environment recommendations)
  • To minimize firewall configuration and enable a cleaner architecture, tunnels can be used to connect UIM hub components in a distributed customer environment
  • Linux tunnel server hubs are known to scale best in terms of raw message rate and number of subscribers supported
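The headroom arithmetic above can be sketched in a few lines. The function names are illustrative, and the rates come from the large-scale example (50,000 msgs/sec max, 15,000 msgs/sec steady state):

```python
def outage_backlog(steady_rate: int, outage_seconds: int) -> int:
    """Messages that accumulate while inserts are stopped."""
    return steady_rate * outage_seconds

def catchup_seconds(backlog: int, max_rate: int, steady_rate: int) -> float:
    """Time to drain the backlog using only the spare insert capacity
    left over after handling the ongoing steady-state load."""
    spare = max_rate - steady_rate
    return backlog / spare

# One-hour outage at a 15,000 msgs/sec steady state:
backlog = outage_backlog(steady_rate=15_000, outage_seconds=3_600)
print(backlog)  # 54000000 -- matches the 54,000,000-message example above

# With a 50,000 msgs/sec ceiling, draining takes roughly 26 minutes:
print(catchup_seconds(backlog, max_rate=50_000, steady_rate=15_000))
```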
CA UIM 9.0.2 Sizing Recommendations
Small Scale Reference Deployment Architecture
High-Level Description
  • A small company or individual team
  • <500 devices monitored
  • <5 hubs
  • <200 robots
  • A few concurrent users
Architecture
Small Scale Customer Representative Environment.png
Specifications and Information
  • Systems
    • Database
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (data storage is expected to reach ~30-40 GB with the retention settings below)
    • UIM
      • 4 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • UMP
      • 4 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 50 GB
    • CABI
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB
    • SNMPC hub
      • 4 cores x 2.8 GHz
      • 4 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • General hubs
      • 2 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 50 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
  • Configuration
    • 100 robots
    • 1 SNMPC (100 devices)
  • Performance
    • Sustained insert rate: 100 msgs/sec (used in example test environment)
    • Max insert rate: 20,000 msgs/sec
  • Configuration Parameters
    • Increase data_engine insert threads by setting thread_count_insert = 24 in data_engine.cfg
    • Increase the bulk size for the data_engine queue: hub_bulk_size = 2000
    • data_engine retention/maintenance settings: 7 days raw, 30 days hourly, 1 year daily
    • SNMPC: the default max heap setting for javaopts (2 GB) is sufficient
Medium Scale Reference Deployment Architecture
High-Level Description
  • A medium-sized company or medium-sized lab/datacenter
  • 500-1000s of devices monitored
  • 5-20 hubs
  • 200-1000 robots
  • A dozen concurrent users
Architecture
Medium Scale Customer Representative Environment.png
Specifications and Information
  • Systems
    • Database
      • 4 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 500 GB (data storage is expected to reach ~300 GB with the retention settings below)
    • UIM
      • 4 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 100 GB for the main drive and 100-300 GB for hub queues (as desired, to let queues fill on disk during extended outages or maintenance windows)
    • UMP
      • 4 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 50 GB
    • CABI
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 100 GB
    • Tunnel Server hub
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 200 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • SNMPC hub
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • General hubs
      • 2 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB to 1 TB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
  • Configuration
    • 200 robots
    • 1 SNMPC (1,000 devices)
    • 10 groups
  • Performance
    • Sustained insert rate: ~2,000 msgs/sec (used in example test environment)
    • Max insert rate (with 24 insert threads in data_engine and a bulk size of 2000 on the data_engine queue): 22,000 msgs/sec
  • Configuration Parameters
    • Increase data_engine insert threads by setting thread_count_insert = 24 in data_engine.cfg
    • Increase the bulk size for the data_engine queue: hub_bulk_size = 2000
    • data_engine retention/maintenance settings: 7 days raw, 30 days hourly, 1 year daily
    • SNMPC: change the default max heap setting for javaopts from 2 GB to 4 GB
Recommendations
  • 15k RPM drives in a RAID 10 configuration are recommended. If you expect heavy reporting, use SSD drives.
Large-Scale Reference Deployment Architecture
High-Level Description
  • A large company or large datacenters
  • 5-10K+ devices monitored
  • 20+ hubs
  • >1000 robots
  • Dozens of concurrent users
Architecture
Large Scale Customer Representative Environment.png
Specifications and Information
  • Systems
    • Database
      • 64 cores x 2.7 GHz
      • 32 GB RAM
      • SSD SAN
      • Log and data on separate filesystems
      • Disk Size: 2-3 TB (data storage is expected to reach ~1.5 TB with the retention settings below)
    • UIM
      • 6 cores x 2.5 GHz
      • 32 GB RAM
      • SSD SAN
      • Disk Size: 200 GB for the main drive and 200-500 GB for hub queues (as desired, to let queues fill on disk during extended outages or maintenance windows)
    • UMP
      • 8 cores x 2.5 GHz
      • 16 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 50 GB
    • CABI (multiple instances may be required; verify this configuration for your deployment)
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 15k SAN
      • Disk Size: 100 GB
    • Tunnel Server hub
      • 4 cores x 2.8 GHz
      • 8 GB RAM
      • SSD SAN
      • Disk Size: 200 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows; an additional drive dedicated to hub queues improves resiliency)
    • SNMPC hubs
      • 4 cores x 2.8 GHz
      • 12 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
    • General hubs
      • 2 cores x 2.8 GHz
      • 8 GB RAM
      • One filesystem on 10k SAN
      • Disk Size: 100 GB (more if you want queues to be able to fill on disk during extended outages or maintenance windows)
  • Configuration
    • 15,000 robots
    • 4 SNMPC (8,000 devices)
    • 2.6 million metrics
    • 9,000 groups
  • Performance
    • Sustained insert rate (used in example test environment): 9,000 msgs/sec
    • Max insert rate (tuned): 35,000 msgs/sec
  • Configuration Parameters
    • Increase data_engine insert threads by setting thread_count_insert = 24 in data_engine.cfg
    • Increase the data_engine queue limit: queue_limit_total = 1000000
    • Increase the bulk size for the data_engine queue: hub_bulk_size = 2000
    • data_engine retention/maintenance settings: 7 days raw, 30 days hourly, 1 year daily
    • SNMPC: change the default max heap setting for javaopts from 2 GB to 8 GB for each SNMPC probe
Recommendations
  • SSD drives should be used when deploying environments this large for better reporting and DB performance
Additional Sizing Considerations and Information
  • Plan to leave enough headroom for expansion and the ability to catch up after planned outages
    • For example, if your environment has a max message rate of 35,000 messages/sec and typically runs at a steady state of 6,000 messages/sec, then a one-hour outage window accumulates 21,600,000 messages.
  • Disk speed / IO capacity on high-volume hubs (primary hub, tunnel server) can aid processing - for example, an SSD drive for queues
  • Database disk speed / IO capacity can improve the insert rate and report processing speed - SSDs for the database are highly recommended
  • The database platform/version can improve performance or enable additional scale
    • Partitioning is recommended for large deployments and is supported by Oracle, MySQL, and MSSQL Enterprise
    • MSSQL 2014 supports per-partition online index rebuilds, which minimize table locks during partitioning and make data maintenance more efficient
  • Reserve CPU/memory for the UIM, tunnel server, and database virtual machines to guarantee dedicated resources and consistent performance
  • An intermediate proxy/concentrator hub is often used to offload metric processing and tunnel termination from the primary UIM server (see the medium and large environment recommendations)
  • To minimize firewall configuration and enable a cleaner architecture, tunnels can be used to connect UIM hub components in a distributed customer environment
  • Linux tunnel server hubs are known to scale best in terms of raw message rate and number of subscribers supported.