Propagate Seed List Data Across Masking Engines

This article expects the input in the form of a structured CSV file and provides appropriate SQL scripts for all the different databases. The rationale behind adopting this approach is that the schema is different across different databases; therefore, exporting from one table and importing into another table may or may not work. Also, the input data and the SQL commands vary. 
tdm10
As a test data engineer, review this article to know how you can propagate seed list data across various masking engines for different databases.
This article expects the input in the form of a structured CSV file and provides appropriate SQL scripts for all the different databases. The rationale behind adopting this approach is that the schema is different across different databases; therefore, exporting from one table and importing into another table may or may not work. Also, the input data and the SQL commands vary. 
The information in this article represents a generic use case. The objective is to help users understand this generic information and follow a similar approach for their specific requirements.
2
2
Understand the Current Seed List Repository
Currently, the seed list that is shipped with CA TDM is available at the following places:
  • Repository database (GTREP)
    The Datamaker component of CA TDM uses the seed list present in this database for synthetic data generation. The Fast Data Masker component can also use this seed list.
  • Scramble database
    The Fast Data Masker component uses the seed list present in this database.
  • File system
The table schema of the seed list tables in the GTREP and Scramble databases is different. Also, the schema of the seed list table in the Scramble database is not the same across different databases (Oracle, MS SQL, and Teradata).
For any record in the seed list table, the following fields are relevant to the user:
  • Category Name
  • Value
    The value can include a single value or multiple values. In the GTREP database table, each value is further associated with a name.
The other fields in the seed list table are calculated values that Datamaker or Fast Data Masker uses internally.
Considerations
Review the following considerations:
  • Ensure that the seed data is present in a CSV (comma-separated value) file. Each row in the CSV file represents a record in the database. The first value in the row represents the category name and the remaining items in the row represent the name-value pairs. For example, consider a row in the CSV file:
    Address-US,City,Plano,State,Texas
    In this example, 
    Address-US
     represents the category name and 
    City,Plano
     and 
    State,Texas
     represent the name-value pairs.
  • Remove the column headers from the CSV file.
  • If the CSV file contains non-ASCII characters, use UTF-8 encoding.
  • The minimum number of name-value pairs is one and the maximum is 30.
  • The SQL scripts that are used in this article are generic and are written considering the maximum allowed values. If you have fewer values in a row of a CSV file, change the SQL scripts accordingly.
  • If you need to insert data only in the Scramble database where names are not required for the values, you can simplify the CSV file by providing only the category names and values. Also, you must modify the SQL query that is used for loading the CSV file into the table as appropriate.
  • Some databases (for example, MS SQL and Teradata) expect that the number of values in each row of a CSV file must match the number of columns present in the database table. Therefore, if you have a CSV file where the number of values is not the same for each row or the number of values is fewer than the number of table columns, you can use the provided PowerShell script. This script helps you transform the CSV file into the format that the SQL scripts included in this article require. For more information about how to run this script see, the Run the Transform Script section.
Repository Database
The repository database supports the following flavors of databases:
  • Oracle
  • MS SQL
You can use the appropriate SQL scripts provided in each section to import/export the seed list data:
Seed List Propagation in the Repository Database for Oracle
This section includes information about how you can propagate seed list data when Oracle is used as a database.
The following illustration outlines the high-level process:
Seedlist_Oracle
Seedlist_Oracle
Insert Data into the Repository Database (GTREP)
The seed list in the repository database is stored in the 
gtrep_reference_data
 table. CA TDM uses this data for synthetic data generation.
To insert the seed data into this table, perform the following steps:
  1. Load the CSV file data as an external table in the Oracle database as follows:
    1. Create a directory for the Oracle user and map it to the hard disk location on the Oracle server host system by running the following SQL commands:
      create or replace directory CSVDIR as 'C:/mycsv';
      grant read, write on directory CSVDIR to GTREP;
    2. Copy the CSV file to the created directory (in this case, 
      C:/mycsv
      ).
    3. Run the following SQL command to create the external table:
      ALTER SESSION SET CURRENT_SCHEMA = GTREP;
      CREATE TABLE csv_ext_table
      (
      GROUPNAME VARCHAR2(254),
      NAME1 VARCHAR2(254),
      VALUE1 VARCHAR2(254),
      NAME2 VARCHAR2(254),
      VALUE2 VARCHAR2(254),
      NAME3 VARCHAR2(254),
      VALUE3 VARCHAR2(254),
      NAME4 VARCHAR2(254),
      VALUE4 VARCHAR2(254),
      NAME5 VARCHAR2(254),
      VALUE5 VARCHAR2(254),
      NAME6 VARCHAR2(254),
      VALUE6 VARCHAR2(254),
      NAME7 VARCHAR2(254),
      VALUE7 VARCHAR2(254),
      NAME8 VARCHAR2(254),
      VALUE8 VARCHAR2(254),
      NAME9 VARCHAR2(254),
      VALUE9 VARCHAR2(254),
      NAME10 VARCHAR2(254),
      VALUE10 VARCHAR2(254),
      NAME11 VARCHAR2(254),
      VALUE11 VARCHAR2(254),
      NAME12 VARCHAR2(254),
      VALUE12 VARCHAR2(254),
      NAME13 VARCHAR2(254),
      VALUE13 VARCHAR2(254),
      NAME14 VARCHAR2(254),
      VALUE14 VARCHAR2(254),
      NAME15 VARCHAR2(254),
      VALUE15 VARCHAR2(254),
      NAME16 VARCHAR2(254),
      VALUE16 VARCHAR2(254),
      NAME17 VARCHAR2(254),
      VALUE17 VARCHAR2(254),
      NAME18 VARCHAR2(254),
      VALUE18 VARCHAR2(254),
      NAME19 VARCHAR2(254),
      VALUE19 VARCHAR2(254),
      NAME20 VARCHAR2(254),
      VALUE20 VARCHAR2(254),
      NAME21 VARCHAR2(254),
      VALUE21 VARCHAR2(254),
      NAME22 VARCHAR2(254),
      VALUE22 VARCHAR2(254),
      NAME23 VARCHAR2(254),
      VALUE23 VARCHAR2(254),
      NAME24 VARCHAR2(254),
      VALUE24 VARCHAR2(254),
      NAME25 VARCHAR2(254),
      VALUE25 VARCHAR2(254),
      NAME26 VARCHAR2(254),
      VALUE26 VARCHAR2(254),
      NAME27 VARCHAR2(254),
      VALUE27 VARCHAR2(254),
      NAME28 VARCHAR2(254),
      VALUE28 VARCHAR2(254),
      NAME29 VARCHAR2(254),
      VALUE29 VARCHAR2(254),
      NAME30 VARCHAR2(254),
      VALUE30 VARCHAR2(254)
      )
      organization external
      (
      type ORACLE_LOADER
      default directory CSVDIR
      access parameters
      (
      RECORDS DELIMITED by NEWLINE
      FIELDS TERMINATED by ","
      MISSING FIELD VALUES ARE NULL
      REJECT ROWS WITH ALL NULL FIELDS
      )
      location ('SAMPLE_SEED.csv') -- provide your csv file name here
      )
      reject limit unlimited;
  2. Insert the data from the external table 
    csv_external_table
     into the 
    gtrep_reference_data
     table by running the following SQL command:
    ALTER SESSION SET CURRENT_SCHEMA = GTREP;
    INSERT
    INTO gtrep_reference_data(rd_ref_id, rd_ref_type, rd_col_cnt, rd_ref_name_1, rd_ref_value_1, rd_ref_name_2, rd_ref_value_2, rd_ref_name_3, rd_ref_value_3,
    rd_ref_name_4, rd_ref_value_4, rd_ref_name_5, rd_ref_value_5, rd_ref_name_6, rd_ref_value_6, rd_ref_name_7, rd_ref_value_7, rd_ref_name_8, rd_ref_value_8,
    rd_ref_name_9, rd_ref_value_9, rd_ref_name_10, rd_ref_value_10, rd_ref_name_11, rd_ref_value_11, rd_ref_name_12, rd_ref_value_12, rd_ref_name_13, rd_ref_value_13,
    rd_ref_name_14, rd_ref_value_14, rd_ref_name_15, rd_ref_value_15, rd_ref_name_16, rd_ref_value_16, rd_ref_name_17, rd_ref_value_17, rd_ref_name_18, rd_ref_value_18,
    rd_ref_name_19, rd_ref_value_19, rd_ref_name_20, rd_ref_value_20, rd_ref_name_21, rd_ref_value_21, rd_ref_name_22, rd_ref_value_22, rd_ref_name_23, rd_ref_value_23,
    rd_ref_name_24, rd_ref_value_24, rd_ref_name_25, rd_ref_value_25, rd_ref_name_26, rd_ref_value_26, rd_ref_name_27, rd_ref_value_27, rd_ref_name_28, rd_ref_value_28,
    rd_ref_name_29, rd_ref_value_29, rd_ref_name_30, rd_ref_value_30, rd_proj_id)
    SELECT GROUPNAME, 'SAM' REF_TYPE,
    ((CASE WHEN NAME1 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME2 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME3 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME4 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME5 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME6 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME7 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME8 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME9 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME10 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME11 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME12 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME13 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME14 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME15 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME16 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME17 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME18 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME19 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME20 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME21 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME22 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME23 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME24 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME25 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME26 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME27 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME28 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME29 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME30 IS NULL THEN 0 ELSE 1 END)) COLM_COUNT,
    NAME1,VALUE1,NAME2,VALUE2,NAME3,VALUE3,NAME4,VALUE4,
    NAME5,VALUE5,NAME6,VALUE6,NAME7,VALUE7,NAME8,VALUE8,
    NAME9,VALUE9,NAME10,VALUE10,NAME11,VALUE11,NAME12,VALUE12,
    NAME13,VALUE13,NAME14,VALUE14,NAME15,VALUE15,NAME16,VALUE16,
    NAME17,VALUE17,NAME18,VALUE18,NAME19,VALUE19,NAME20,VALUE20,
    NAME21,VALUE21,NAME22,VALUE22,NAME23,VALUE23,NAME24,VALUE24,
    NAME25,VALUE25,NAME26,VALUE26,NAME27,VALUE27,NAME28,VALUE28,
    NAME29,VALUE29,NAME30,VALUE30,0
    FROM CSV_EXT_TABLE;
  3. Drop the external table as follows:
    ALTER SESSION SET CURRENT_SCHEMA=GTREP;
    drop table CSV_EXT_TABLE;
Export User-Defined Seed List
You can export the user-defined seed list from the repository database.
If a user-defined seed list is present in the 
gtrep_reference_data
 table in the repository database (GTREP), you can export it to a CSV file. The user-defined data in seed list must have been added with the 
rd_ref_type
 column value as 
SAM
 to identify that this is a user-defined data.
Run the following SQL script to export user-defined seed data from the 
gtrep_reference_data
 table into the GTREP database:
ALTER SESSION SET CURRENT_SCHEMA=GTREP;
DECLARE file_handle utl_file.file_type;
delimiter CHAR := ',';
BEGIN
--
-- open file for writing
file_handle := utl_file.fopen('CSVDIR', 'GTREP_SAMPLE_SEED.csv', 'w');
--
-- loop through the records
FOR seed_rec IN
(SELECT *
FROM gtrep_reference_data
WHERE rd_ref_type = 'SAM') -- add more clauses to filter the records
LOOP
utl_file.PUT_LINE(file_handle, seed_rec.rd_ref_id || delimiter || seed_rec.rd_ref_name_1 || delimiter || seed_rec.rd_ref_value_1 || delimiter || seed_rec.rd_ref_name_2 || delimiter || seed_rec.rd_ref_value_2
|| delimiter || seed_rec.rd_ref_name_3 || delimiter || seed_rec.rd_ref_value_3 || delimiter || seed_rec.rd_ref_name_4 || delimiter || seed_rec.rd_ref_value_4 || delimiter || seed_rec.rd_ref_name_5
|| delimiter || seed_rec.rd_ref_value_5 || delimiter || seed_rec.rd_ref_name_6 || delimiter || seed_rec.rd_ref_value_6 || delimiter || seed_rec.rd_ref_name_7 || delimiter || seed_rec.rd_ref_value_7
|| delimiter || seed_rec.rd_ref_name_8 || delimiter || seed_rec.rd_ref_value_8 || delimiter || seed_rec.rd_ref_name_9 || delimiter || seed_rec.rd_ref_value_9 || delimiter || seed_rec.rd_ref_name_10
|| delimiter || seed_rec.rd_ref_value_10 || delimiter || seed_rec.rd_ref_name_11 || delimiter || seed_rec.rd_ref_value_11 || delimiter || seed_rec.rd_ref_name_12 || delimiter || seed_rec.rd_ref_value_12
|| delimiter || seed_rec.rd_ref_name_13 || delimiter || seed_rec.rd_ref_value_13 || delimiter || seed_rec.rd_ref_name_14 || delimiter || seed_rec.rd_ref_value_14 || delimiter || seed_rec.rd_ref_name_15
|| delimiter || seed_rec.rd_ref_value_15 || delimiter || seed_rec.rd_ref_name_16 || delimiter || seed_rec.rd_ref_value_16 || delimiter || seed_rec.rd_ref_name_17 || delimiter || seed_rec.rd_ref_value_17
|| delimiter || seed_rec.rd_ref_name_18 || delimiter || seed_rec.rd_ref_value_18 || delimiter || seed_rec.rd_ref_name_19 || delimiter || seed_rec.rd_ref_value_19 || delimiter || seed_rec.rd_ref_name_20
|| delimiter || seed_rec.rd_ref_value_20 || delimiter || seed_rec.rd_ref_name_21 || delimiter || seed_rec.rd_ref_value_21 || delimiter || seed_rec.rd_ref_name_22 || delimiter || seed_rec.rd_ref_value_22
|| delimiter || seed_rec.rd_ref_name_23 || delimiter || seed_rec.rd_ref_value_23 || delimiter || seed_rec.rd_ref_name_24 || delimiter || seed_rec.rd_ref_value_24 || delimiter || seed_rec.rd_ref_name_25
|| delimiter || seed_rec.rd_ref_value_25 || delimiter || seed_rec.rd_ref_name_26 || delimiter || seed_rec.rd_ref_value_26 || delimiter || seed_rec.rd_ref_name_27 || delimiter || seed_rec.rd_ref_value_27
|| delimiter || seed_rec.rd_ref_name_28 || delimiter || seed_rec.rd_ref_value_28 || delimiter || seed_rec.rd_ref_name_29 || delimiter || seed_rec.rd_ref_value_29 || delimiter || seed_rec.rd_ref_name_30
|| delimiter || seed_rec.rd_ref_value_30);
END LOOP;
--
-- close the file
utl_file.fclose(file_handle);
EXCEPTION
WHEN others THEN
IF utl_file.is_open(file_handle) THEN
utl_file.fclose(file_handle);
END IF;
END;
The file is exported to the directory mapped to the CSVDIR directory on the Oracle server.
Seed List Propagation in the Repository Database for MS SQL
This section includes information about how you can propagate seed list data when MS SQL is used as a database.
The following illustration outlines the high-level process:
MSSQL_Seedlist
MSSQL_Seedlist
Insert Data into the Repository Database (GTREP)
Note:
 The scripts in this section use the GTREP as the repository database.
  1. Create a temporary table in the MS SQL database as follows:
    USE [GTREP]
    CREATE TABLE csv_ext_table
    (
    GROUPNAME VARCHAR(254),
    NAME1 VARCHAR(254) ,
    VALUE1 VARCHAR(254),
    NAME2 VARCHAR(254) DEFAULT NULL,
    VALUE2 VARCHAR(254) DEFAULT NULL,
    NAME3 VARCHAR(254) DEFAULT NULL,
    VALUE3 VARCHAR(254) DEFAULT NULL,
    NAME4 VARCHAR(254) DEFAULT NULL,
    VALUE4 VARCHAR(254) DEFAULT NULL,
    NAME5 VARCHAR(254) DEFAULT NULL,
    VALUE5 VARCHAR(254) DEFAULT NULL,
    NAME6 VARCHAR(254) DEFAULT NULL,
    VALUE6 VARCHAR(254) DEFAULT NULL,
    NAME7 VARCHAR(254) DEFAULT NULL,
    VALUE7 VARCHAR(254) DEFAULT NULL,
    NAME8 VARCHAR(254) DEFAULT NULL,
    VALUE8 VARCHAR(254) DEFAULT NULL,
    NAME9 VARCHAR(254) DEFAULT NULL,
    VALUE9 VARCHAR(254) DEFAULT NULL,
    NAME10 VARCHAR(254) DEFAULT NULL,
    VALUE10 VARCHAR(254) DEFAULT NULL,
    NAME11 VARCHAR(254) DEFAULT NULL,
    VALUE11 VARCHAR(254) DEFAULT NULL,
    NAME12 VARCHAR(254) DEFAULT NULL,
    VALUE12 VARCHAR(254) DEFAULT NULL,
    NAME13 VARCHAR(254) DEFAULT NULL,
    VALUE13 VARCHAR(254) DEFAULT NULL,
    NAME14 VARCHAR(254) DEFAULT NULL,
    VALUE14 VARCHAR(254) DEFAULT NULL,
    NAME15 VARCHAR(254) DEFAULT NULL,
    VALUE15 VARCHAR(254) DEFAULT NULL,
    NAME16 VARCHAR(254) DEFAULT NULL,
    VALUE16 VARCHAR(254) DEFAULT NULL,
    NAME17 VARCHAR(254) DEFAULT NULL,
    VALUE17 VARCHAR(254) DEFAULT NULL,
    NAME18 VARCHAR(254) DEFAULT NULL,
    VALUE18 VARCHAR(254) DEFAULT NULL,
    NAME19 VARCHAR(254) DEFAULT NULL,
    VALUE19 VARCHAR(254) DEFAULT NULL,
    NAME20 VARCHAR(254) DEFAULT NULL,
    VALUE20 VARCHAR(254) DEFAULT NULL,
    NAME21 VARCHAR(254) DEFAULT NULL,
    VALUE21 VARCHAR(254) DEFAULT NULL,
    NAME22 VARCHAR(254) DEFAULT NULL,
    VALUE22 VARCHAR(254) DEFAULT NULL,
    NAME23 VARCHAR(254) DEFAULT NULL,
    VALUE23 VARCHAR(254) DEFAULT NULL,
    NAME24 VARCHAR(254) DEFAULT NULL,
    VALUE24 VARCHAR(254) DEFAULT NULL,
    NAME25 VARCHAR(254) DEFAULT NULL,
    VALUE25 VARCHAR(254) DEFAULT NULL,
    NAME26 VARCHAR(254) DEFAULT NULL,
    VALUE26 VARCHAR(254) DEFAULT NULL,
    NAME27 VARCHAR(254) DEFAULT NULL,
    VALUE27 VARCHAR(254) DEFAULT NULL,
    NAME28 VARCHAR(254) DEFAULT NULL,
    VALUE28 VARCHAR(254) DEFAULT NULL,
    NAME29 VARCHAR(254) DEFAULT NULL,
    VALUE29 VARCHAR(254) DEFAULT NULL,
    NAME30 VARCHAR(254) DEFAULT NULL,
    VALUE30 VARCHAR(254) DEFAULT NULL
    )
  2. Import data from the CSV file into the temporary table as follows:
    use [GTREP]
    bulk insert [dbo].[csv_ext_table]
    from 'C:\TEST_GROUP.csv' --provide your csv file path here
    with (fieldterminator = ',', rowterminator = '\n', keepnulls)
    go
  3. Copy the data from the temporary table to the 
    gtrep_reference_data
     table in the repository database as follows:
    USE [GTREP]
    INSERT
    INTO [DBO].[GTREP_REFERENCE_DATA](rd_ref_id, rd_ref_type, rd_col_cnt, rd_ref_name_1, rd_ref_value_1, rd_ref_name_2, rd_ref_value_2, rd_ref_name_3, rd_ref_value_3, 
    rd_ref_name_4, rd_ref_value_4, rd_ref_name_5, rd_ref_value_5, rd_ref_name_6, rd_ref_value_6, rd_ref_name_7, rd_ref_value_7, rd_ref_name_8, rd_ref_value_8, 
    rd_ref_name_9, rd_ref_value_9, rd_ref_name_10, rd_ref_value_10, rd_ref_name_11, rd_ref_value_11, rd_ref_name_12, rd_ref_value_12, rd_ref_name_13, rd_ref_value_13, 
    rd_ref_name_14, rd_ref_value_14, rd_ref_name_15, rd_ref_value_15, rd_ref_name_16, rd_ref_value_16, rd_ref_name_17, rd_ref_value_17, rd_ref_name_18, rd_ref_value_18, 
    rd_ref_name_19, rd_ref_value_19, rd_ref_name_20, rd_ref_value_20, rd_ref_name_21, rd_ref_value_21, rd_ref_name_22, rd_ref_value_22, rd_ref_name_23, rd_ref_value_23, 
    rd_ref_name_24, rd_ref_value_24, rd_ref_name_25, rd_ref_value_25, rd_ref_name_26, rd_ref_value_26, rd_ref_name_27, rd_ref_value_27, rd_ref_name_28, rd_ref_value_28, 
    rd_ref_name_29, rd_ref_value_29, rd_ref_name_30, rd_ref_value_30, rd_proj_id)
    SELECT GROUPNAME, 'SAM' REF_TYPE,
    ((CASE WHEN NAME1 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME2 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME3 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME4 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME5 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME6 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME7 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME8 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME9 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME10 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME11 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME12 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME13 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME14 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME15 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME16 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME17 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME18 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME19 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME20 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME21 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME22 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME23 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME24 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME25 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME26 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME27 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME28 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME29 IS NULL THEN 0 ELSE 1 END)
    + (CASE WHEN NAME30 IS NULL THEN 0 ELSE 1 END)) COLM_COUNT,
    NAME1,VALUE1,NAME2,VALUE2,NAME3,VALUE3,NAME4,VALUE4,
    NAME5,VALUE5,NAME6,VALUE6,NAME7,VALUE7,NAME8,VALUE8,
    NAME9,VALUE9,NAME10,VALUE10,NAME11,VALUE11,NAME12,VALUE12,
    NAME13,VALUE13,NAME14,VALUE14,NAME15,VALUE15,NAME16,VALUE16,
    NAME17,VALUE17,NAME18,VALUE18,NAME19,VALUE19,NAME20,VALUE20,
    NAME21,VALUE21,NAME22,VALUE22,NAME23,VALUE23,NAME24,VALUE24,
    NAME25,VALUE25,NAME26,VALUE26,NAME27,VALUE27,NAME28,VALUE28,
    NAME29,VALUE29,NAME30,VALUE30,0
    FROM [DBO].[CSV_EXT_TABLE];
  4. Drop the temporary table as follows:
    USE [GTREP]
    DROP TABLE [dbo].[csv_ext_table]
Export the User-Defined Seed List
You can export the user-defined seed list from the 
gtrep_reference_data table
 into the GTREP database.
Use the 
Results to File
 option in the SQL server query browser to export the data. Set the output format to comma delimited and exclude headers in 
Query Options
.
Run the following script to export:
SELECT [rd_ref_id]
,[rd_ref_name_1]
,[rd_ref_value_1]
,[rd_ref_name_2]
,[rd_ref_value_2]
,[rd_ref_name_3]
,[rd_ref_value_3]
,[rd_ref_name_4]
,[rd_ref_value_4]
,[rd_ref_name_5]
,[rd_ref_value_5]
,[rd_ref_name_6]
,[rd_ref_value_6]
,[rd_ref_name_7]
,[rd_ref_value_7]
,[rd_ref_name_8]
,[rd_ref_value_8]
,[rd_ref_name_9]
,[rd_ref_value_9]
,[rd_ref_name_10]
,[rd_ref_value_10]
,[rd_ref_name_11]
,[rd_ref_value_11]
,[rd_ref_name_12]
,[rd_ref_value_12]
,[rd_ref_name_13]
,[rd_ref_value_13]
,[rd_ref_name_14]
,[rd_ref_value_14]
,[rd_ref_name_15]
,[rd_ref_value_15]
,[rd_ref_name_16]
,[rd_ref_value_16]
,[rd_ref_name_17]
,[rd_ref_value_17]
,[rd_ref_name_18]
,[rd_ref_value_18]
,[rd_ref_name_19]
,[rd_ref_value_19]
,[rd_ref_name_20]
,[rd_ref_value_20]
,[rd_ref_name_21]
,[rd_ref_value_21]
,[rd_ref_name_22]
,[rd_ref_value_22]
,[rd_ref_name_23]
,[rd_ref_value_23]
,[rd_ref_name_24]
,[rd_ref_value_24]
,[rd_ref_name_25]
,[rd_ref_value_25]
,[rd_ref_name_26]
,[rd_ref_value_26]
,[rd_ref_name_27]
,[rd_ref_value_27]
,[rd_ref_name_28]
,[rd_ref_value_28]
,[rd_ref_name_29]
,[rd_ref_value_29]
,[rd_ref_name_30]
,[rd_ref_value_30]
FROM [GTREP].[dbo].[gtrep_reference_data]
WHERE [rd_ref_type]='SAM' -- add more clauses to filter the records further
Scramble Database
The Scramble database supports the following flavors of databases:
  • Oracle
  • MS SQL
  • Teradata
  • File System
Seed List Propagation in the Scramble Database for Oracle
This section includes information about how you can propagate seed list data when Oracle is used as a database.
The following illustration outlines the high-level process:
Oracle_Scramble
Oracle_Scramble
Insert Data into the Scramble Database
The Fast Data Masker component uses seed list from the Scramble database to mask data. The seed list is stored in the 
gtsrc_reference_lov
 table of the Scramble database.
To insert the new seed data, the data is first inserted into the 
gtsrc_reference_data
 table. The number of rows available for each seed category is calculated and the data is copied into the 
gtsrc_reference_lov
 table accordingly.
To insert the seed data into the 
gtsrc_reference_lov
 table, perform the following steps:
  1. Load the CSV data as an external table in the Scramble database as mentioned in the Insert Data into the Repository Database (GTREP) section.
  2. Import data from the external table into the 
    gtsrc_reference_data
     table as follows:
    ALTER SESSION SET CURRENT_SCHEMA=SCRAMBLE;
    INSERT
    INTO gtsrc_reference_data(rd_ref_id, rd_ref_value, rd_ref_value2, rd_ref_value3, rd_ref_value4, rd_ref_value5, rd_ref_value6, rd_ref_value7,
    rd_ref_value8, rd_ref_value9, rd_ref_value10, rd_ref_value11, rd_ref_value12, rd_ref_value13, rd_ref_value14, rd_ref_value15, rd_ref_value16,
    rd_ref_value17, rd_ref_value18, rd_ref_value19, rd_ref_value20, rd_ref_value21, rd_ref_value22, rd_ref_value23, rd_ref_value24, rd_ref_value25,
    rd_ref_value26, rd_ref_value27, rd_ref_value28, rd_ref_value29, rd_ref_value30)
    SELECT groupname, value1, value2, value3, value4, value5, value6, value7, value8, value9, value10,
    value11, value12, value13, value14, value15, value16, value17, value18, value19, value20, value21,
    value22, value23, value24, value25, value26, value27, value28, value29, value30
    FROM csv_ext_table;
  3. Drop the external table as follows:
    ALTER SESSION SET CURRENT_SCHEMA=SCRAMBLE;
    drop table csv_ext_table;
  4. Copy the data from the 
    gtsrc_reference_data
     table to the 
    gtsrc_reference_lov
     table as follows:
    ALTER SESSION SET CURRENT_SCHEMA=SCRAMBLE;
    exec gtsrc_setcount.setcount();
Export User-Defined Seed List
You can export the user-defined seed list from the Scramble database.
In the existing schema of the 
gtsrc_reference_lov
 table in the Scramble database, no information is available to identify the records added by the user. However, if users can use a WHERE clause to identify the records, they can then export the records using the similar export script as provided in the repository section.
Seed List Propagation in the Scramble Database for MS SQL
This section includes information about how you can propagate seed list data when MS SQL is used as a database.
The following illustration outlines the high-level process:
MSSQL_Scramble
MSSQL_Scramble
Insert Data into the Scramble Database
Note:
 The scripts in this section use the Scramble database.
  1. Create a temporary table in the MS SQL database as follows:
    USE [SCRAMBLE]
    CREATE TABLE csv_ext_table
    (
    GROUPNAME VARCHAR(254),
    NAME1 VARCHAR(254) ,
    VALUE1 VARCHAR(254),
    NAME2 VARCHAR(254) DEFAULT NULL,
    VALUE2 VARCHAR(254) DEFAULT NULL,
    NAME3 VARCHAR(254) DEFAULT NULL,
    VALUE3 VARCHAR(254) DEFAULT NULL,
    NAME4 VARCHAR(254) DEFAULT NULL,
    VALUE4 VARCHAR(254) DEFAULT NULL,
    NAME5 VARCHAR(254) DEFAULT NULL,
    VALUE5 VARCHAR(254) DEFAULT NULL,
    NAME6 VARCHAR(254) DEFAULT NULL,
    VALUE6 VARCHAR(254) DEFAULT NULL,
    NAME7 VARCHAR(254) DEFAULT NULL,
    VALUE7 VARCHAR(254) DEFAULT NULL,
    NAME8 VARCHAR(254) DEFAULT NULL,
    VALUE8 VARCHAR(254) DEFAULT NULL,
    NAME9 VARCHAR(254) DEFAULT NULL,
    VALUE9 VARCHAR(254) DEFAULT NULL,
    NAME10 VARCHAR(254) DEFAULT NULL,
    VALUE10 VARCHAR(254) DEFAULT NULL,
    NAME11 VARCHAR(254) DEFAULT NULL,
    VALUE11 VARCHAR(254) DEFAULT NULL,
    NAME12 VARCHAR(254) DEFAULT NULL,
    VALUE12 VARCHAR(254) DEFAULT NULL,
    NAME13 VARCHAR(254) DEFAULT NULL,
    VALUE13 VARCHAR(254) DEFAULT NULL,
    NAME14 VARCHAR(254) DEFAULT NULL,
    VALUE14 VARCHAR(254) DEFAULT NULL,
    NAME15 VARCHAR(254) DEFAULT NULL,
    VALUE15 VARCHAR(254) DEFAULT NULL,
    NAME16 VARCHAR(254) DEFAULT NULL,
    VALUE16 VARCHAR(254) DEFAULT NULL,
    NAME17 VARCHAR(254) DEFAULT NULL,
    VALUE17 VARCHAR(254) DEFAULT NULL,
    NAME18 VARCHAR(254) DEFAULT NULL,
    VALUE18 VARCHAR(254) DEFAULT NULL,
    NAME19 VARCHAR(254) DEFAULT NULL,
    VALUE19 VARCHAR(254) DEFAULT NULL,
    NAME20 VARCHAR(254) DEFAULT NULL,
    VALUE20 VARCHAR(254) DEFAULT NULL,
    NAME21 VARCHAR(254) DEFAULT NULL,
    VALUE21 VARCHAR(254) DEFAULT NULL,
    NAME22 VARCHAR(254) DEFAULT NULL,
    VALUE22 VARCHAR(254) DEFAULT NULL,
    NAME23 VARCHAR(254) DEFAULT NULL,
    VALUE23 VARCHAR(254) DEFAULT NULL,
    NAME24 VARCHAR(254) DEFAULT NULL,
    VALUE24 VARCHAR(254) DEFAULT NULL,
    NAME25 VARCHAR(254) DEFAULT NULL,
    VALUE25 VARCHAR(254) DEFAULT NULL,
    NAME26 VARCHAR(254) DEFAULT NULL,
    VALUE26 VARCHAR(254) DEFAULT NULL,
    NAME27 VARCHAR(254) DEFAULT NULL,
    VALUE27 VARCHAR(254) DEFAULT NULL,
    NAME28 VARCHAR(254) DEFAULT NULL,
    VALUE28 VARCHAR(254) DEFAULT NULL,
    NAME29 VARCHAR(254) DEFAULT NULL,
    VALUE29 VARCHAR(254) DEFAULT NULL,
    NAME30 VARCHAR(254) DEFAULT NULL,
    VALUE30 VARCHAR(254) DEFAULT NULL
    )
  2. Import data from the CSV file into the temporary table as follows:
    use [SCRAMBLE]
    bulk insert [dbo].[csv_ext_table]
    from 'C:\TEST_GROUP.csv' --provide your csv file path here
    with (fieldterminator = ',', rowterminator = '\n', keepnulls)
    go
  3. Copy the data from the temporary table to the
    gtsrc_reference_data
    table in the Scramble database:
    use [SCRAMBLE]
    INSERT
    INTO [dbo].[gtsrc_reference_data](rd_ref_id, rd_ref_value, rd_ref_value2, rd_ref_value3, rd_ref_value4, rd_ref_value5, rd_ref_value6, rd_ref_value7,
    rd_ref_value8, rd_ref_value9)
    SELECT groupname, value1, value2, value3, value4, value5, value6, value7, value8, value9
    FROM [dbo].[csv_ext_table];
  4. Drop the temporary table as follows:
    USE [SCRAMBLE]
    DROP TABLE [dbo].[csv_ext_table]
Export the User-Defined Seed List
You can export the user-defined seed list from the Scramble database.
In the
gtsrc_reference_data
table in the Scramble database, no information is available to identify the user-defined rows. If users can create an appropriate WHERE clause, they can then use the following query and can export the results to a file:
SELECT [rd_ref_id],
[rd_ref_value]
,[rd_ref_value2]
,[rd_ref_value3]
,[rd_ref_value4]
,[rd_ref_value5]
,[rd_ref_value6]
,[rd_ref_value7]
,[rd_ref_value8]
,[rd_ref_value9]
FROM [SCRAMBLE].[dbo].[gtsrc_reference_data] -- add your WHERE clause here
Seed List Propagation in the Scramble Database for Teradata
In Teradata, the seed list is present only in the
gtsrc_reference_data
table.
The following illustration outlines the high-level process:
Teradata_Seedlist
Teradata_Seedlist
Consider the following points for Teradata:
  • The database name that is used in the SQL scripts (used in this section) is
    functions
    .
  • The scripts that are used in this section assume the CSV file to be in the same format as mentioned earlier in this article, with the exception that it contains only 19 values per row (one category name and nine name-value pairs) because the seed list for Teradata contains only nine values.
    Also, no names are required for the Teradata seed list. However, to keep the CSV format consistent, it is recommended to include them in the CSV file. They are ignored at the time of import. If you do not want to add names with values in the CSV file, you must modify the scripts accordingly.
  • The CSV file must have exactly 19 values per row for the scripts that are used in this section to work. If fewer values are present, review the main Considerations section in this article.
Execute the SQL Scripts in Teradata SQL Assistant
  1. Create a table to hold the values included in the CSV file:
    CREATE TABLE csv_ext_table
    (
    groupname VARCHAR(254),
    name1 VARCHAR(254),
    value1 VARCHAR(254),
    name2 VARCHAR(254) DEFAULT NULL,
    value2 VARCHAR(254) DEFAULT NULL,
    name3 VARCHAR(254) DEFAULT NULL,
    value3 VARCHAR(254) DEFAULT NULL,
    name4 VARCHAR(254) DEFAULT NULL,
    value4 VARCHAR(254) DEFAULT NULL,
    name5 VARCHAR(254) DEFAULT NULL,
    value5 VARCHAR(254) DEFAULT NULL,
    name6 VARCHAR(254) DEFAULT NULL,
    value6 VARCHAR(254) DEFAULT NULL,
    name7 VARCHAR(254) DEFAULT NULL,
    value7 VARCHAR(254) DEFAULT NULL,
    name8 VARCHAR(254) DEFAULT NULL,
    value8 VARCHAR(254) DEFAULT NULL,
    name9 VARCHAR(254) DEFAULT NULL,
    value9 VARCHAR(254) DEFAULT NULL,
    row_index INTEGER
    );
  2. Open the Teradata SQL Assistant interface and perform the following steps:
    1. Verify that the import file accepts comma as a delimiter. If not, change it from the
      Tools -> Options -> Import/Export
      menu.
    2. Enable the import operation by selecting
      File -> Import Data
      from the menu.
    3. Execute the following SQL statement:
      INSERT INTO functions.csv_ext_table
      (groupname, name1, value1, name2, value2, name3, value3, name4,
      value4, name5, value5, name6, value6, name7, value7, name8, value8,
      name9, value9, row_index)
      VALUES
      (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,-1);
      A file browser dialog opens.
    4. Change the select file type in the file browser to All files(*.*), browse and select the CSV file, and click
      OK
      .
    5. After the file is imported, disable the import operation by selecting
      File -> Import Data
      from the menu.
  3. Copy the data from temporary table to
    gtsrc_reference_data
    :
    INSERT INTO functions.csv_ext_table
    (row_index, groupname, value1, value2, value3, value4, value5, value6, value7, value8, value9)
    SELECT ROW_NUMBER() OVER (PARTITION BY groupname ORDER BY groupname)-1 AS RNUM, tab.groupname,tab.value1,
    tab.value2, tab.value3, tab.value4, tab.value5, tab.value6, tab.value7, tab.value8, tab.value9
    FROM functions.csv_ext_table tab;
    DELETE FROM functions.csv_ext_table WHERE row_index=-1;
    UPDATE functions.csv_ext_table tempTab
    SET row_index=COALESCE(row_index+(SELECT MAX(rd_index) FROM functions.gtsrc_reference_data refTab GROUP BY refTab.rd_ref_id where refTab.rd_ref_id=tempTab.groupname)+1,row_index);
    INSERT INTO functions.gtsrc_reference_data
    (rd_ref_id, rd_ref_value,rd_ref_value2, rd_ref_value3,
    rd_ref_value4, rd_ref_value5, rd_ref_value6, rd_ref_value7, rd_ref_value8,
    rd_ref_value9, rd_index)
    SELECT groupname,value1,value2,value3,
    value4,value5,value6,value7,
    value8,value9, row_index
    FROM functions.csv_ext_table;
  4. Drop the temporary table:
    DROP TABLE csv_ext_table;
Export Seed List
You cannot identify user-defined rows in this table. If you can provide a WHERE clause to identify the data, use the following query to export the data. Set the Teradata SQL Assistant in export results to the file mode by selecting
File -> Export Results
before executing the query:
SELECT rd_ref_id, rd_ref_value, rd_ref_value2, rd_ref_value3,
rd_ref_value4, rd_ref_value5, rd_ref_value6, rd_ref_value7, rd_ref_value8,
rd_ref_value9
FROM functions.gtsrc_reference_data ---provide your where clause here
Note: 
On exporting from the Scramble database, the data is in the format CategoryName,Value1,Value2. That is, names are not associated with the values.
Seed List Propagation in the File System
Seed list present in the file system is shipped with the Fast Data Masker component. The seed list is present in the directory C:\Program Files\Grid-Tools\FastDataMasker\seedtables (if you installed the Fast Data Masker in the C:\ drive).
Each file or group of files present in the directory represents a category. If a category has single data in a row, it is represented by a single file and the file name represents the category name. If a category has multiple values in a row, the same number of files are present for that category. For example, category
australianpostalcodes
has two values, so two files
australianpostalcodes.1.txt
(contains first value for the row) and
australianpostalcodes.2.txt
(contains second value for the row) are present. The contents in a file are the values that are separated by a new line.
If multiple files for a category are present, each file must have an equal number of lines. In case of multiple files, it is possible for some rows to have only one value. In such cases, the second value in the second file must be blank.
To add more data to the existing seed category, add the values to the existing files. Each category file name is mapped with a logical name to be displayed in the Fast Data Masker UI.This mapping is done in the file BuildMap.xls located in the folder C:\Program Files\Grid-Tools\FastDataMasker. The mapping is present in the File Descriptions sheet of the BuildMap.xls file. If the file description is available for a file name, it is shown to the user in the UI. Otherwise, only the file name is displayed. Ideally, all files must have entries in this sheet.
To add a new category, create a new file (or files) based on the description that is mentioned in this section.
Limitations
The 
gtsrc_reference_data
 table in the Scramble database and the 
gtrep_reference_data
 table in GTREP do not have primary columns. So, you cannot identify a row uniquely. You must inspect the data to identify the appropriate record that you want to update or delete.
Run the Transform Script
The following scripts help you transform the CSV file into the required format that the scripts in this article can use—transformcsv.ps1 (For MS SQL) and transformcsv_teradata.ps1 (For Teradata).
  1. Copy the files on your Windows computer where PowerShell is enabled.
  2. Copy the CSV file on the same computer.
  3. Open the appropriate transform file in a text editor and edit the first and the second lines to provide the path of your input CSV file and output CSV file.
  4. Save your changes.
  5. Right-click on the transform file and select
    Run with PowerShell
    from the context menu (or run it using the PowerShell command shell).
Script-Based Masking
Script-based masking uses the list of functions provided in the GTREP_FUNCTIONS table in the repository database (GTREP). If you add any new categories to the seed list table, ensure that you make them available for the HASHLOV, RANDLOV, and SEQLOV functions in the GTREP_FUNCTIONS table in the repository database (GTREP).
The scripts used in this section show how you can insert entries into the Oracle repository database (GTREP) when new categories are added to the scramble seed list (for different flavors of the databases).
Note:
You must explicitly run the SQL statement for each new category; the SQL statement does not insert all the entries at the same time.
Update Oracle Repository Database (GTREP) Based on Oracle Scramble Seed List
If you add a new category to the Oracle seed list (scramble), add a corresponding category for the HASHLOV function in the format 'HASHLOV','CUSTOM_USER_GROUP' to the Oracle repository database table (GTREP). The function name and the category name are enclosed within quotes and are separated by a comma. Similarly, provide entries for the RANDLOV and SEQLOV functions.
Use the following SQL statement to insert the new category into the Oracle repository database (GTREP):
Note:
 In this case, the seed list that already contains the new category belongs to the Oracle scramble database.
INSERT INTO gtrep_functions(
                          fun_rdbms, --rdbms type ORACLE
                          fun_datatype,--the datatype of the category values.Options are CHAR,DATE & NUMERIC
                          fun_function, --function name
                          date_created,
                          date_updated,
                          who_created,
                          fun_description --description of what the function does
                          )
values ('ORACLE','CHAR','''HASHLOV'',''CUSTOM_USER_GROUP''',to_char(sysdate),to_char(sysdate),'USER_NAME','custom_description');
Apart from the mentioned columns, the table also has two more columns named
program_created
and
program_updated
, which you can leave blank.
Update Oracle Repository Database (GTREP) Based on MS SQL Scramble Seed List
If you add a new category to the MSSQL seed list (scramble), add a corresponding entry for RANDLOV in the format RANDLOV,CUSTOM_USER_GROUP to the Oracle repository database table (GTREP). The function name and the category name are not enclosed within quotes and are separated by a comma.
Use the following SQL statement to insert the new category into the Oracle repository database (GTREP). 
Note:
In this case, the seed list that already contains the new category belongs to the MS SQL scramble database.
INSERT INTO gtrep_functions(
                          fun_rdbms, --rdbms type SQLSERVER
                          fun_datatype,--the datatype of the category values.Options are CHAR,DATE & NUMERIC
                          fun_function, --function name
                          date_created,
                          date_updated,
                          who_created,
                          fun_description --description of what the function does
                          )
values ('SQLSERVER','CHAR','RANDLOV,CUSTOM_USER_GROUP',to_char(sysdate),to_char(sysdate),'USER_NAME','custom_description')
Update Oracle Repository Database (GTREP) Based on Teradata Scramble Seed List 
For the Teradata seed list, the function name syntax is the same as for MS SQL. Provide the value for the
fun_rdbms
column as 'TERADATA'. The function name and the category name are NOT enclosed within quotes and are separated by a comma.
Update Oracle Repository Database (GTREP) Based on File System Scramble Seed List
For the file system seed list, the function name syntax is RANDLOV,FileName.txt. Provide the value for the
fun_rdbms
column as 'SDM'. The function name and the category file name are NOT enclosed within quotes and are separated by a comma.