Fast Data Masker Best Practices

This article focuses on the best practices that are used to mask tables. Use the Fast Data Masker functions to mask tables, seed tables, work with columns, and for cross-reference mapping.
tdm41
This article focuses on the best practices that are used to mask tables. Use the Fast Data Masker functions to mask tables, seed tables, work with columns, and for cross-reference mapping.
These best practices make the Fast Data Masker tasks easier to understand and to modify.
 
 
2
 
2
 
 
Manage Seed Tables
The seedtables subdirectory contains seed data in editable text files. The subdirectory is located in the install directory; the following line is an example of the directory path.
C:\Program Files\Grid-Tools\FastDataMasker
Each file contains a list of values that you can use to mask the columns.
The following text file shows the contents of the sample seed file 
femalenames.txt
:
Amanda
Amanda
Angela
Anita
Ann
Anna
Anne
Anne
Ashley
Beate
Betty
Beverly
Bonnie
Carla
Carla
Carol
Carol
Carolyn
Cary
Casey
Cathy
Chris
Christina
Christine
Christine
Christine
Cindy
Cindy
Claire
Clarise
Collette
Connie
Cory
Courtney
Darsha
Edith
To use the seed file to mask database columns, enter the seed file name with the appropriate masking function. The following example shows how to mask the column, FIRST_NAME of the table PEOPLE:
 Mask Colum Dialog.PNG 
Split Tables
The Fast Data Masker component uses parallel processes to ensure the highest possible performance. To do so, determine the amount of work that is required to allocate to each parallel thread.
To automatically assign each table its own thread when you mask multiple tables in one masking run, set the
 PARALLEL=
 option in the 
Options
 tab. Similarly, when you mask a partitioned table in Oracle, the partitions in the table are automatically assigned to separate threads.
To mask large tables, split the row to be masked for a given thread. The simplest way to split the table is to use the SQL WHERE clauses. When you click on a table in the first tab and center list more than once, the 
Add Masking
 dialog opens.
Click 
Yes
, and multiple tabs are provided for the same table. In this way, you can apply the same mask multiple times but with different WHERE clauses. Fast Data Masker can split the mask work into logical chunks, and can assign a thread for each.
 
Note:
 For more information, see Use Parallel Threads to Mask Data. Additionally, when masking Microsoft SQL Server tables, use the PARALLEL option only when the table has a primary key or unique index. If the table does not have a primary key or unique index and you use the PARALLEL option, masking is either slow or does not work.
The following table shows an example that is displayed on the 
Summary
 tab after the split:
Table
Column
Function
Parm1
PEOPLE
 
WHERE
where id < 10000
PEOPLE
FIRST_NAME
RANDLOV
firstnamefemaleamerican.txt
PEOPLE
 
WHERE
where id > 10000
PEOPLE
FIRST_NAME
RANDLOV
firstnamefemaleamerican.txt
Cross-Reference Tables
To use the cross-reference functionality, create a table in one of your connections that has the following structure:
CREATE TABLE gtsrc_xref (rx_ref_id varchar(254) NOT NULL,
rx_old_value varchar(254) NOT NULL,
rx_new_value varchar(254) );
ALTER TABLE gtsrc_xref
ADD CONSTRAINT gtsrc_xref_pk PRIMARY KEY (
rx_ref_id ,
rx_old_value );
If you clear the table down, the cross-reference that is mapped is rebuilt during the next mask run.
Shows the colum masking dialog
Shows the colum masking dialog
 
Note: 
Using cross-reference to mask large volumes is time consuming. Each row that is to be masked has to perform a lookup in the cross-reference table. To optimize this functionality, see the FASTXREF scramble option. To maintain consistency across different tables and databases when you use seed values to mask, we recommend that you use the HASHLOV function without cross-reference. The HASHLOV function consistently converts an existing value to an integer value in a given range. For example, if you mask a person LAST_NAME, and use the seed data category LASTNAME (which has 2000 entries), the existing LAST_NAME value is assigned a number from 1 through 2000. This value is used to look up the corresponding value in the seed list. For example, the LAST_NAME Jones is always masked to value Burton. 
Add WHERE Clause to a Table
The WHERE clause allows you to restrict your masking to only certain rows in the table. This allows you to mask, for example, male names and female names differently based on the GENDER column.
As an example, consider a scenario where you want to mask the employee first names (FIRST_NAME) in the table EMPLOYEE for those rows where the value in the EMPLOYEE ID column is lesser than 5 (EMPLOYEE_ID<5). To perform the mask in the table EMPLOYEE, you use the WHERE CONDITION field to specify the condition (EMPLOYEE_ID<5), select the FIRST_NAME for masking, and choose the appropriate masking options. The following table shows the example:
Table
Column
Function
Parm1
EMPLOYEE
 
WHERE
EMPLOYEE_ID<5
EMPLOYEE
FIRST_NAME
RANDLOV
firstnamefemaleamerican.txt
The following table shows another example. In this example, all the HELP TEXT columns are masked with random values  from companies.txt. The rows with an MSP_CODE starting with 'HSD' has the value huw assigned to CREATED BY. The rows with an MSP_CODE starting with 'OFG' has the value fred assigned to CREATED BY:
Table
Column
Function
Parm1
QMS_MESSAGE_TEXT
 HELP TEXT
RANDLOV
companies.txt
QMS_MESSAGE_TEXT
 
WHERE
MSP_CODE LIKE 'HSD%'
QMS_MESSAGE_TEXT
CREATED BY
FIXED
huw
QMS_MESSAGE_TEXT
 
WHERE
MSP_CODE LIKE 'OFG%' 
QMS_MESSAGE_TEXT
CREATED BY
FIXED
fred
Review the following points:
  • To execute the WHERE clause, the WHERE condition for a map is required before subsequent masks for the same table. In other words, the WHERE clause applies to the mask for a table that follows in the CSV file. The other rows for the table specify the specific mask that is applied to the table.
  • WHERE clauses are required to contain ANSII standard SQL. You cannot use special characters like ; or /.
  • Fixed width records cannot have zero length. Therefore, WHERE COL = ""  does not work because the column value is padded out to its fixed width size.
  • Fast Data Masker currently supports the following operators: <,<=,=, =>, >, IN and LIKE (LIKE is not supported for flat files. It is only used for database masking).
Apply Multiple Functions to the Same Column
When you apply multiple functions to substrings in the same column, list the functions in consecutive rows in the mapping CSV; for example, as shown in the following table:
Table
Column
Function
PAYMENT_OPTION
ACCOUNT_NUMBER
SEQNUMBER
PAYMENT_OPTION
ACCOUNT_NUMBER
FIXED
No limit is set to the number of mask functions that are performed on each column. However, set KeepNulls the same for all the functions.
 
Note
: All functions are required to have “
substr start
� and “
substr lengt
h� set. Substr specifications for different functions can refer to the same character positions. Also, if you are using cross-references, they (substr specifications) can only be set for the last function used on each column as shown in the following example table:
Table
Column
Function
Parm1
Parm2
Parm3
Parm4
KeepNulls
DateFormat
Cross Reference
Overrice SQL
Unique Columns
Xpath Element
Substr start
Substr length
PAYMENT_OPTION
ACCOUNT_NUMBER
SEQNUMBER
1000000001
 
N
 
 
 
 
 
 
 
7
10
PAYMENT_OPTION
ACCOUNT_NUMBER
FIXED
0
 
N
 
 
 
 
 
 
 
7
1
In the preceding example, the mapping CSV shows that a mask of column ACCOUNT_NUMBER in the table PAYMENT_OPTIONS with a zero padded sequence in positions 7 through 16 is expressed as follows:
 
caaccounts,numberx,SEQNUMBER,1000000001,,,,,,,,,,7,10, 
 
 
             caaccounts,numberx,FIXED,0,,,,,,,,,,7,1, 
 
 
NOTE
: SEQNUMBER returns a left-aligned non-padded number. The sequence is started at 1000000001 and the lead digit is then set to 0.
In the audit report, the function names are suffixed with start:length parameters and concatenated together. For example, “SEQNUMBER(7:10) FIXED(7:1)�.
If you specify cross-referencing, the value that is used to look up and update the cross-reference table is a substring of the value, not the whole column value.
Manage Primary and Foreign Keys
To mask primary keys, unique indexes, or columns that match with foreign keys, you might have to drop them before you mask. You can reapply them once the mask is complete.
If you have the DBUPDATES=P option when you run the mask, no masking takes place. But, Fast Data Masker checks if any FK constraints or triggers exist for the tables to be masked. Pre- and post-step scripts are produced to disable or drop the constraints and re-enable or create them after the mask. 
It is easy to create duplicate values when you mask primary key columns. For example, you have ID values of 2,5,7,8,100, and apply a SEQNUMBER masking function starting at 100, the value 2 is updated to the value 100 which creates a duplicate.
If you have any questions before you begin, contact CA Support.
Manage Large Tables and Seed Tables
To mask large or multiple tables and seed tables, we recommend that you increase the memory that is allocated to the Fast Data Masker executable.
To increase the allocated memory, specify the appropriate values in the 
Start Memory (Mb)
 and 
Max Memory (Mb)
 fields in the 
Summary
 tab while defining the masking. When you save the defined masking information, Fast Data Masker creates a batch file. This batch file includes the values that you specify for the memory in the 
Summary
 tab. The batch file also includes other required information (for example, location of the connection file, masking file, and options file). The following screenshot shows the increased memory values:
 Snap.png 
The default values are 100 MB and 1000 MB. In this screenshot, the values are increased to 1000 MB and 10000 MB.
The following snippet shows the example content included in the batch file that is generated after you save the masking. Review that the parameters 
Xms1000M
 and 
Xmx10000M
 represent the increased memory values:
 
java -Djava.util.logging.config.file="C:/Program Files/Grid-Tools/FastDataMasker/logging.properties" -Xms1000M -Xmx10000M -jar "C:/Program Files/Grid-Tools/FastDataMasker/Fastdatamasker.jar" "C:/Program Files/Grid-Tools/FastDataMasker/doc/connectSQLSERVER.txt" "C:\Users\
<username>
\AppData\Roaming\Grid-Tools\Fastdatamasker\MyMask.csv"
 
Write Logs to a Local Drive
As a best practice, we recommend that you write the logs to a local drive, not a network drive. Audit/application logs on remote drives significantly affect the masking performance.
Configure the Fast Data Masker Logs Location
By default, all Fast Data Masker logs are located in %AppData%\Grid-Tools\FastDataMasker\Logs (for example, C:\Users\
<user_name>
\AppData\Roaming\Grid-Tools\FastDataMasker\logs). If you are unable to use this folder and want to change the location where Fast Data Masker logs are stored, you can do so in the Fast Data Masker Mapper.
 
Follow these steps:
 
  1. Click 
    Start, All Programs, FastDataMasker, FastDataMasker
     to open the Fast Data Masker Mapper.
  2. Click the 
    Options
     tab.
  3. Locate the row that contains the LOGDIR option.
  4. In the 
    Value
     column, enter the file path where you want to save the the Fast Data Masker log files.
  5. Complete your masking rules and run a masking job.
  6. Access the file path that you specified in the LOGDIR option.
    Your log files are now available in this location.
Hash on the Value of a different XML tag when using HASHLOV for XML files
When masking XML files using XPATH elements, you may want to define a custom hash column so you can hash on a different XML element than the default, which is to hash on the current value being masked. For example, when you use an XML input file to update information of a pre-existing user in the system, the masked values in the file must match those in the database. If you use a member ID for hashing in the database which is also in the XML file, you want to use that ID as the hash column. 
You can customize the 
Relative XPATH to hash on
 at the step where you configure seed data for HASHLOV. Enter a tag name using XPATH syntax, for example, 
/preceding-sibling::id
.
Use Parallel Threads to Mask Data
Fast Data Masker lets you use parallel threads to mask large tables.  The PARALLEL option enables you to run 
n
 concurrent threads. 
To apply parallel threads, Fast Data Masker must split the work to be done into separate chunks. Fast Data Masker can manage this task in one of the following ways:
  • A regular mask of multiple tables where none of the tables has a large amount of data. Fast Data Masker automatically sets a thread for each table.
  • A large table that is not partitioned. In this case, split the table using the 
    where
     clauses. For this to work, the following must apply:
    • The masking CSV must only contain mask for a single table.
    • The 
      where
       clauses must not overlap; for example, if two or more SQL Where clauses select the same rows in the table to be masked, then the mask causes row lock errors.
  • For an Oracle-partitioned table, Fast Data Masker automatically assigns a thread to each underlying partition. This cannot be combined with the 
    where
     clauses, and as in point two above, it is applicable to a mask for a single large table.
 
Note:
 The number of parallel threads that you can execute concurrently is constrained by the number of physical cores and/or processors available. If the parallel number specified in the options is greater than the number of cores, then some of the threads are held in a queue until resources become available.
To split a table with the 
where
 clause, use Fast Data Masker as follows:
  1. Access the Fast Data Masker UI.
  2. Use the required connection file to connect to the database.
    Note:
     When masking Microsoft SQL Server tables, use the PARALLEL option only when the table has a primary key or unique index. If the table does not have a primary key or unique index and you use the PARALLEL option, masking is either slow or does not work.
  3. Select the 
    Masking
     tab and perform the following steps:
    1. Select the table (for example, 
      PERSONS
      ) in which you want to mask the data.
    2. Select the columns (for example, 
      FIRST_NAME
       and 
      LAST_NAME
      ) that you want to mask.
    3. Provide appropriate information (for example, 
      HASHLOV
       masking type, 
      FULL NAME
       data category, and so on) in the relevant fields.
  4. Select the 
    Options
     tab and specify the number of threads (for example, 4) for the 
    PARALLEL
     option.
  5. Select the 
    Summary
     tab and review the information. The following table shows the example information that is displayed before you split the table:
    Table
    Column
    Function
    Parm1
    Parm2
    PERSONS
    FIRST_NAME
    HASHLOV
    FULL NAME
    3
    PERSONS
    LAST_NAME
    HASHLOV
    FULL NAME
    4
  6. Click the
     Split Table
     button.
    The
     Split Table
     dialog opens.
  7. Enter the following information and click 
    OK
    :
    1. Enter the number of threads; for example, 4. 
    2. Select the numeric column (for example, 
      PERSON_ID
      ) that you want to use for the split.
      We recommend that you choose an indexed column or a column that is a primary key. That is, the column must contains unique values.
  8. Review the updated information in the 
    Summary
     tab. The following table shows the example information after you split the table:
    Table
    Column
    Function
    Parm1
    Parm2
    PERSONS
     
    WHERE
    PERSON_ID < 153
     
    PERSONS
    FIRST_NAME
    HASHLOV
    FULL NAME
    3
    PERSONS
    LAST_NAME
    HASHLOV
    FULL NAME
    4
    PERSONS
     
    WHERE
    PERSON_ID BETWEEN 153 AND 205
     
    PERSONS
    FIRST_NAME
    HASHLOV
    FULL NAME
    3
    PERSONS
    LAST_NAME
    HASHLOV
    FULL NAME
    4
    PERSONS
     
    WHERE
    PERSON_ID BETWEEN 206 AND 258
     
    PERSONS
    FIRST_NAME
    HASHLOV
    FULL NAME
    3
    PERSONS
    LAST_NAME
    HASHLOV
    FULL NAME
    4
    PERSONS
     
    WHERE
    PERSON_ID BETWEEN 259 AND 313
     
    PERSONS
    FIRST_NAME
    HASHLOV
    FULL NAME
    3
    PERSONS
    LAST_NAME
    HASHLOV
    FULL NAME
    4
  9. Click the 
    Save & Run Mask
     button.
  10. Enter the name for the masking file that includes all the mapping information and click 
    Save
    .
  11. Enter the name for the options file that includes the selected options information and click 
    Save
    .
  12. Click 
    OK
    .
    A dialog opens that displays the masking progress.
  13. Review the information and save or close the dialog when masking is done.
You have successfully used parallel threads to mask the data.
Run Fast Data Masker Scripts Remotely
When you save the defined masking information in Fast Data Masker, Fast Data Masker creates a batch file. This batch file includes information about the location of the connection file, masking file, options file, and other related information (for example, start memory). You can use this batch file to run from a remote location where Fast Data Masker is not installed. This is helpful in scenarios where you are facing performance issues on your server because of different components installed on the server. And, to improve the performance, you want to run the batch script from a different server.
To run this batch file from a remote location, edit the batch file and update the paths to the Fast Data Masker .jar, connection file, masking file, and options file. Ensure that they all point to valid locations. You can then run the file from that remote location. 
The following example snippet shows the contents of a masking batch file; update the required file locations based on your requirement:
 
java -Djava.util.logging.config.file="C:/Program Files/Grid-Tools/FastDataMasker/logging.properties" -Xms1000M -Xmx10000M -jar "C:/Program Files/Grid-Tools/FastDataMasker/Fastdatamasker.jar" "C:/Program Files/Grid-Tools/FastDataMasker/doc/connectSQLSERVER.txt" "C:\Users\<username>\AppData\Roaming\Grid-Tools\Fastdatamasker\MyMask.csv" "C:\Users\<username>\AppData\Roaming\Grid-Tools\Fastdatamasker\MyMask_options.txt"
 
In this snippet, you can find the following example locations:
  • C:/Program Files/Grid-Tools/FastDataMasker/Fastdatamasker.jar shows the location of the Fast Data Masker .jar file.
  • C:/Program Files/Grid-Tools/FastDataMasker/doc/connectSQLSERVER.txt shows the location of the Microsoft SQL Server connection file.
  • C:\Users\<username>\AppData\Roaming\Grid-Tools\Fastdatamasker\MyMask.csv shows the location of the file that contains masking information.
  • C:\Users\<username>\AppData\Roaming\Grid-Tools\Fastdatamasker\MyMask_options.txt shows the location of the file that contains the applied options information.