Masking Performance Optimization in CA TDM Portal

CA TDM Portal performs masking with Fast Data Masker (FDM). CA TDM Portal can run multiple instances of FDM concurrently (the maximum and default number of instances is 4). You may be able to complete a masking job faster if you can split it into smaller jobs that Portal can process with concurrent instances of FDM.
Assess the size of your environment
The size of the data set that you want to mask (i.e. number of tables in each database/schema, number of columns and rows in tables) has an effect on how much memory FDM needs to mask the data, and how long it takes. The amount of memory Fast Data Masker requires for a masking job generally increases linearly in relation to the number of tables, columns and rows to mask.
You can see how many rows and columns a table contains, and use the filter to see only larger tables.
You can use the PARALLEL option on the Masking Settings page to set the number of parallel Java threads. Within an instance, FDM creates a Java thread for each table.
Memory use in FDM
For every 1 million rows and 100 columns to mask, 1GB of memory is generally sufficient to maintain optimum performance (see table below).
Rows         Columns    Memory Recommended
1 million    100        1GB
2 million    100        2GB
2 million    200        4GB
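The guideline above can be written as a small calculation. This is only a sketch: it assumes that the recommended memory scales with the product of rows and columns, which matches the three documented data points but is not a formula stated by the product. The function name recommended_memory_gb is hypothetical.

```python
def recommended_memory_gb(rows: int, columns: int) -> float:
    """Rule of thumb: about 1 GB per 1 million rows x 100 columns.

    Assumption: memory scales with rows x columns, extrapolated
    from the three rows of the table in this section.
    """
    return (rows / 1_000_000) * (columns / 100)


# Reproduce the three rows of the table.
for rows, columns in [(1_000_000, 100), (2_000_000, 100), (2_000_000, 200)]:
    estimate = recommended_memory_gb(rows, columns)
    print(f"{rows:,} rows x {columns} columns: {estimate:g} GB")
```

Treat the result as a starting point and confirm it against the memory use you observe in your own masking jobs.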
Allocate memory to masking instances
The HEAPSIZE option on the Masking Settings page controls how much memory, in MB, TDM Portal assigns to each FDM instance. The default value is 1000MB (1GB).
Allocating less memory than this to each instance is likely to result in slower masking.
Optimize concurrent jobs in CA TDM Portal
Connection Profiles in CA TDM Portal
CA TDM Portal creates an instance of FDM for each Connection Profile in a masking job. A masking job can run on either of the following (see Start Masking for more information about Masking Configurations):
  • Environments. These consist of data sources that it accesses through Connection Profiles.
  • Specified Connection Profiles.
The creation of multiple Environments, each with one Connection Profile, does not improve masking efficiency. However, a Connection Profile can contain multiple schemas.
If you create one Connection Profile for each schema, this results in more concurrent instances of FDM (up to a maximum of 4).
 
For example:
If you have 10 Connection Profiles in your masking job, each Connection Profile contains one schema, and your maximum number of instances is 4, CA TDM creates 4 instances of FDM (with one schema on each instance). The other 6 schemas are queued until one of the instances completes its job. See Masking Jobs for an example of CA TDM's behaviour when you cancel a job with multiple FDM instances.
Connection Profiles    Schemas per Connection Profile    Maximum concurrent FDM instances
1                      4                                 1
1                      10                                1
4                      1                                 4
10                     1                                 4
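The relationship in the table above can be sketched as follows. The function names are hypothetical, and the sketch assumes only what this article states: one FDM instance per Connection Profile, capped at 4 concurrent instances, with the remaining profiles queued. The number of schemas inside a Connection Profile does not affect the result.

```python
MAX_FDM_INSTANCES = 4  # maximum (and default) stated in this article


def concurrent_fdm_instances(connection_profiles: int,
                             max_instances: int = MAX_FDM_INSTANCES) -> int:
    """One FDM instance per Connection Profile, up to the maximum."""
    return min(connection_profiles, max_instances)


def queued_profiles(connection_profiles: int,
                    max_instances: int = MAX_FDM_INSTANCES) -> int:
    """Connection Profiles that wait until a running instance finishes."""
    return max(0, connection_profiles - max_instances)


# The 10-profile example: 4 instances run at once, 6 profiles wait.
print(concurrent_fdm_instances(10), queued_profiles(10))
```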
 
Memory Usage for concurrent masking instances
The total memory required for a masking job that contains multiple concurrent instances of FDM is equal to the sum of the memory required for each instance of FDM.
For example, if your job contains 4 instances, and each one requires 1GB of memory, the total memory you need is 4GB.
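As a minimal sketch of this sum (the function name total_memory_gb is hypothetical, and the per-instance figures are whatever you have estimated for your own job):

```python
def total_memory_gb(per_instance_gb: list[float]) -> float:
    """Total memory for a job is the sum over its concurrent FDM instances."""
    return sum(per_instance_gb)


# Four concurrent instances that each need 1 GB.
print(total_memory_gb([1, 1, 1, 1]))

# Instances of different sizes are summed in the same way.
print(total_memory_gb([1, 2, 4]))
```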
Optimize performance with Scalable Masking
From TDM 4.8, the Masking Engine is available as a Docker container, which you can use to distribute a masking job's masking tasks across multiple machines and/or instances of the Masking Engine. For more information, see Scalable masking with Docker.
Each instance of the Masking Engine (which is an instance of FDM) can perform 4 tasks concurrently by default. Contact CA Support for information on how to change this value. We recommend that you leave it at 4.
The fastest way to mask data is to have enough instances of FDM active that all your masking tasks can run concurrently. The best way to increase the total number of instances of FDM is to increase the number of instances of the Masking Engine container with the --scale parameter.
You can add instances of the Masking Engine to your Docker network while the network is active. To do so, increase the --scale masking=n parameter in the docker-compose up command that you use to start your Docker network, and execute the docker-compose up command again.
Calculate masking tasks
When you use TDM Portal with the Messaging container, the Masking service (part of TDM Portal) creates additional masking tasks according to the following logic:
  • Each database/schema is one masking task.
    • Within each database/schema, each table of more than 1,000,000 rows (by default) is split into an additional task.
      To change this default value, change the application.properties parameter tdmweb.TDMMaskingService.tableTaskRowThreshold.
      For information on how to set these properties in your TDM Portal Docker container, see Custom application.properties configuration.
Therefore, you can calculate the total number of masking tasks for a masking job with the following formula:
Total number of masking tasks = Number of databases or schemas + Number of tables of over 1,000,000 rows
 
For example:
 
You have a masking job that includes 2 databases, which contain the following tables:
 
SALES
Table Name       Row Count
Customers        1,100,000
Suppliers        24,000
Items            250,000
Orders           1,450,000
Employees        10,000

CUSTOMER_CARE
Table Name       Row Count
Customers        1,100,000
Purchases        1,300,000
Review Scores    60,000
This masking job consists of:
  • 2 databases (SALES and CUSTOMER_CARE)
    The Masking service creates an additional masking task for each database.
  • 4 tables with more than 1,000,000 rows each (SALES.Customers, SALES.Orders, CUSTOMER_CARE.Customers and CUSTOMER_CARE.Purchases)
    The Masking service creates an additional masking task for each table with more than 1,000,000 rows.
Therefore, the total number of masking tasks for this masking job is 6 (2 databases + 4 tables of over 1,000,000 rows in size). To manage 6 masking tasks concurrently, with 4 tasks per Masking Engine container, you would need 2 Masking Engine containers.
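The same calculation can be sketched in code for the SALES and CUSTOMER_CARE example. The function names and the dictionary layout are hypothetical. The row threshold and the figure of 4 tasks per Masking Engine are the defaults described in this article, and the sketch assumes that a table counts as large only when it has strictly more rows than the threshold.

```python
import math

ROW_THRESHOLD = 1_000_000  # default tableTaskRowThreshold per this article
TASKS_PER_ENGINE = 4       # default concurrent tasks per Masking Engine

# Row counts from the example: database -> {table: rows}
job = {
    "SALES": {
        "Customers": 1_100_000, "Suppliers": 24_000, "Items": 250_000,
        "Orders": 1_450_000, "Employees": 10_000,
    },
    "CUSTOMER_CARE": {
        "Customers": 1_100_000, "Purchases": 1_300_000, "Review Scores": 60_000,
    },
}


def count_masking_tasks(databases, row_threshold=ROW_THRESHOLD):
    """One task per database plus one per table above the row threshold."""
    large_tables = sum(
        1
        for tables in databases.values()
        for rows in tables.values()
        if rows > row_threshold
    )
    return len(databases) + large_tables


def engines_needed(tasks, tasks_per_engine=TASKS_PER_ENGINE):
    """Masking Engine containers needed to run all tasks concurrently."""
    return math.ceil(tasks / tasks_per_engine)


tasks = count_masking_tasks(job)
print(tasks, engines_needed(tasks))  # 6 tasks, 2 containers
```

If you change tableTaskRowThreshold, pass the new value as row_threshold so that the estimate follows your configuration.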
You can use the following docker-compose up command to set the number of instances of the Masking Engine container to 2:
docker-compose -f docker-compose.yml -f docker-compose-messaging.yml -f docker-compose-masking.yml up -d --scale masking=2
Results of insufficient memory
For optimum masking performance, the system that hosts instances of FDM (either the Windows application, or the Docker container) must have enough physical memory for all concurrent instances of FDM. Less memory can result in:
  • Slower performance of masking jobs
  • Slower CA TDM Portal performance
Maximize memory availability
To maximize memory available to CA TDM Portal, we recommend that you schedule jobs to run at a time when memory load is lower.