Data Scrambling

The data scrambling process can be split into two discrete steps.
tdm10
The data scrambling process can be split into two discrete steps.
  1. In Datamaker, identify the table columns that you want to scramble, and select the appropriate scramble function.
  2. In Data Subset, design the extract, then connect to the Datamaker repository to select the scramble project. 
    1. Choose the table columns that you want to scramble, and specify the scramble functions to apply to these columns. 
    2. If necessary, also select the table order project from the Datamaker repository.
Familiarize yourself with the following concepts:
2
Add Scrambled Data
Datamaker has several different techniques to scramble (mask or obfuscate) data. The method that you use depends on the particular task you are perform. You can use multiple techniques to ensure that the sensitive records are  secure.
Method 1
  1. Copy data to the central test data repository, and apply randomize functions to store test data.
  2. Copy data into your project, and edit the data for publishing later. These edits include:  
    • Converting column values to variables
    • Converting column values to functions, which could include a randomize function. For example,
      randrange(min,max)
    • Selecting one of the right-hand click edit functions, such as randomize by range
Method 2 
  1. Define transformation functions that are applied to the data as it is extracted, or to update the data in-situ
  2. To identify the columns to scramble and the masking function to apply, in Datamaker, select Projects, and Transformation Maps.
  3. Select a Transformation Map or create a new one. For more information, see transformation Maps.
    The function is labeled as part of the extract process on the server, or as part of an in-situ data masking process. The list of available masking functions can be expanded to include existing transformation, or customized by request.
Add More Scramble Functions (Oracle only)
To include more functions to scramble the date, do the following in Datamaker:
  1. Click Tools, and Maintain Data Functions.
  2. Click the Plus icon, and enter the function name.
  3. Amend the master functions to call your section of code. For example, in Oracle edit the master function F_SCRAMBLE2.PLS. 
    The function contains a standard set of input parameters.
  4. Create or replace function
    gtsrc_scramble2 (in_table_column in varchar,
                                            in_data_precision in number,
                                            in_data_length in number,
                                            in_nullable in char,
                                            in_rownum in number,
                                            in_xref in varchar,
                                            in_keepnull in varchar,
                                            in_list_colno in number,
                                            in_value in varchar,
                                            in_type in varchar,
                                            in_parm1 in varchar,
                                            in_parm2 in varchar)
The parameters are available for you to use as parameters to your function. Find the section of code and add in the call to your function. If the function is written in Java, create an Oracle Java function first.
 elsif in_type = 'REPLACE' then
    if wk_text is not null then
      wk_text := replace(wk_text,in_parm1,in_parm2);
    end if;
  elsif in_type = 'MYFUNCTION' then
    if wk_text is not null then
      wk_text := myfunction(wk_text);
    end if;
Compile the function in the Scramble user and test. The three master functions for Oracle are:
  • F_SCRAMBLE2.PLS
     Handles Characters
  • F_SCRAMBLED2.PLS
    Handles Dates
  • F_SCRAMBLEN.PLS
    Handles Numerics
Add More Seed Tables
  1. Add seed rows to the table GTSRC_REFERENCE_DATA.
  2. Set the column RD_REF_ID to the name of the seed table.
  3. Click Tools, Maintain Data Functions, and add a LIST function to the available functions.
    The function is available in the scramble function drop-down list.
Check Columns for Quoted Data (Microsoft SQL Server only)
When you generate scripts for scrambled Microsoft SQL Server extracts in Data Subset, first verify if the tables in your extract schema have data that contains quotes. Data Subset generates insert statements where character fields are delimited by single quotes. If the data contains quotes, Data Subset issues a replace statement to replace each single quote with two quotes. The first quote acts as an escape character so SQL Server knows that the end of a data field is not reached.
To ascertain which tables and columns in your schema contain quoted data, follow these steps:
  1. Register your source tables in Datamaker
  2. Click 
    Tools
    and 
    Actions for Registered Tables
    . From the drop-down actions list, select
    Check for Quotes in data
    .
  3. Click the GO button next to the actions list to verify table columns for quoted data.
Datamaker creates a table tag in the right-hand list that is named GT Contains Quotes.
  1. Click the GT Contains Quotes tag.
    Datamaker highlights those tables that contain quoted data.
  2. Double-click one of these highlighted tables in the center list.
    Datamaker shows the columns for this table that contain quoted data.
  3. Click the ‘tables’ node in the tree view on the left of the screen to return to the tables view for the center list.
Datamaker stores the results of these searches in its repository.
If you select the same project as the scrambled project in Data Subset, it interrogates the Datamaker repository for quoted columns and automatically wraps those columns with replace statements when it generates its scramble scripts. 
Create a Cross Reference Table for Character Columns (Microsoft SQL Server only)
Creating a cross-reference table for character columns lets you create consistent from-and-to scrambling. You use this method if you want, for example, ACME CORP to be changed to CUSTOMER 1, and MICROSOFT INC to be changed to CUSTOMER 2, and you want this change to occur for several columns across different databases or schemas.
The steps below describe how to scramble the table PRODUCT and the column DESCRIP across databases:
  1. Create user-defined function gtsrc_scramble using f_scrambleMSSS.sql in the \scrambleinstall\MSSQLSERVER  folder.
  2. Create user defined stored procedure gtexplode_ref  using gtexplode_ref.sql
  3. Add some values to gtsrc_reference_data as follows (this only needs to be done once).
  4. Explode data in gtsrc_reference_data by issuing the following command (this only needs to be done once).
EXECUTE gtexplode_ref  ‘[reference id of existing data to use]’ ,  ‘[newid to insert]’, ‘[tablename you are inserting into]’ , ‘[column name to scramble]’
The Data Subset Steps
  1. Open the Database Actions Screen.
  2. Select the action type:
    1. (SQL Server only) Scrambled SQL Server Extract 
    2. (Oracle only) Scrambled Windows Extract / Loader Import, or UNIX Extract / Loader Import
  3. (If you have chosen Scrambled SQL Server Extract only) Enter the functions database name where you have installed the scramble functions ‘md5hash’ and ‘selectrand’.
  4. Choose the Data Scrambling tab.
  5. Connect to a Datamaker repository if prompted.
  6. Choose your GT Datamaker project from the drop-down list where you defined your sensitive columns.
  7. Click the Table Order tab if your target database (where you extract to) is constrained.
  8. Choose the project where you calculated your insert orders.
  9. Click the Generate button to create your scramble scripts.