Mask Data Stored in Flat Files

You can use Fast Data Masker to mask the data that is stored in flat files. This article provides information about how you can mask the data that is stored in the following file types:
tdm43
You can use Fast Data Masker to mask the data that is stored in flat files. This article provides information about how you can mask the data that is stored in the following file types:
Process
The high-level generic process remains the same for flat file masking as for relational data sources. That is, you connect Fast Data Masker to the data source, define masking rules, and verify the output.
Perform the following tasks:
After you complete the masking, a file that contains the masked data is created. Test data engineers (TDEs) can access this masked file to use the masked data.
Mask Data in Fixed-Width or Delimited Files
Fast Data Masker lets you mask the data in fixed-width and delimited files. The delimited file type includes character-, comma-, and tab-separated files.
Understand Fixed-Width and Delimited File Formats
Fast Data Masker expects these file types to follow a specific format. If a file does not adhere to the required format, Fast Data Masker cannot process it for masking.
Delimited File
An example of a delimited file that contains the sample data is as follows. This file is a character-separated file, which includes the pipe character (
|
) as a separator. This file includes three rows of data.
1|James|Lynn|39 2|John|Smith|50 3|Mary|Jane|30
This data file follows the structure that is defined in the associated definition file. An example of a definition file that is related to the delimited data file is as follows. The first row of the definition file gives general details about the file. The
DELIM
parameter in the definition file defines that the pipe character (
|
) is used as a separator in the data file. The file also defines the column names:
ID
,
FIRSTNAME
,
LASTNAME
, and
AGE
, which correspond to the data included in the data file. For example, the first row in the data file is related to the data definition file as
ID
=
1
,
FIRSTNAME
=
James
,
LASTNAME
=
Lynn
,
AGE
=
39
. Other rows also follow the same representation. The definition file does not define any header and trailer values.
HEADER=N,TRAILER=N,DELIM=|,DATEQUOTED=,CHARQUOTED=,NUMQUOTED= ID FIRSTNAME LASTNAME AGE
The following list includes information about the parameter names, description, and values used in the first row:
  • HEADER
    Specifies the header record.
    Values:
    • Y, N, or a number
    • Y==1 and N==0
  • TRAILER
    Specifies the trailer record.
    Values:
    • Y, N, or a number
    • Y==1 and N==0
  • DELIM
    Specifies whether fields are delimited. If yes, specify the value.
    Values:
    • 'fixed', 'FIXED' – fixed width – no delimiters
    • 'comma', '<COMMA>' – comma delimited
    • 'tab', '<TAB>' – tab delimited
    • Any character (for example, pipe)
  • DATEQUOTED
    Specifies whether dates are quoted.
    Values:
    • Y or N
  • CHARQUOTED
    Specifies whether strings are quoted.
    Values:
    • Y or N
  • NUMQUOTED
    Specifies whether numerics are quoted.
    Values:
    • Y or N
Fixed-Width File
An example of a fixed-width file that contains the data is as follows:
This is a test header ARON TAMMY 19780513416670249 4712345J09-335 O412857641MELTON TAYLOR J1581378??19981209 CRAB MIGEL 6817 HAWTHORNE LN JOINT BASE LEWIS MCCWA9843312200019880304412897765 AUTOZONE INC DEPT 8070 PO BOX 2198 MEMPHIS TN3810121980000010101 47037452011-002992??E756100189ABAD GISELLE N4633141??20090904
This data file follows the structure that is defined in the associated definition file. An example of a definition file that is related to the fixed-width data file is as follows. This file defines the value
1
for
HEADER
and
FIXED
for
DELIM
(because it is a fixed-width file). Also, review that the file defines each column name with its specific width (
FNAME,20
).
For example, the first line (
This is a test header
) in the data file is not masked because the value for the
HEADER
parameter is set as
1
. This means that the first row in the data file is excluded from masking. Similarly, the name
ARON
in the data file corresponds to the
FNAME
column, and so on.
HEADER=1,TRAILER=N,DELIM=FIXED,DATEQUOTED=,CHARQUOTED=,NUMQUOTED= FNAME,20 LNAME,15 ADDRESS1,75 CITY,20 STATE_ZIP,13 ID_1,36 S,2 ADDRESS_TITLE,30 POBOX,25 POBOX_2,50 CITY_2,20 STATE_ID_1,21 TITLE_2,35 ADDRESS2,25 ID_2,50 CITY_3,20 STATE_ZIP_2,13 ID_3,15 ID_4,20 ID_5_NAME,25 LNAME_2,13 ID_6,10 ID_7,8
Create a Connection File
Before you start masking your flat file, you must create a connection file. This connection file stores information about the data source type, file that contains the data, file that contains the definition for the data file.
After you create the connection file, the connection file is added to the list of connection files in Fast Data Masker. You can then select this file whenever you want to mask the data that is stored in the associated data file.
Follow these steps:
  1. Click
    Start, All Programs, FastDataMasker, FastDataMasker
    to launch Fast Data Masker.
  2. Click
    New
    .
    All applicable options open in the right pane.
  3. Enter an appropriate name for the connection file in the
    Connection Name
    field.
  4. Select
    File
    from the
    DBMS
    drop-down list.
  5. Select the file (fixed width or delimited) that contains the data you want to mask in the
    File Name
    field.
    Note:
    Instead of a single data file if your data is spread across multiple files, specify the folder location in the
    File Name
    field and select the
    All Files In Directory
    option. Ensure that all data files are available in the same folder location.
  6. Select the associated file that includes the definition for the data file, which you selected in the previous step.
    If you do not have the related definition file, follow these steps to create it:
    1. Click the Create Definition File icon.
    2. Select the file type (
      Comma Separated
      ,
      Tab Separated
      ,
      Character Separated
      , or
      Fixed Width
      ) from the
      File Type
      drop-down list.
    3. Enter required values in the following fields:
      • Separator character
        (For character separated) Specifies the character that you want to use as a separator (for example, pipe) for the data.
      • Header Lines
        Specifies the number of lines (counted from the top) in the data file that you want to mark as header lines and exclude from masking.
      • Trailer Lines
        Specifies the number of lines (counted from the bottom) in the data file that you want to mark as trailer lines and exclude from masking.
    4. Select whether you want to use quotes for the character, numeric, or date data.
    5. For the delimited file, click
      Parse File to Mask
      .
    6. Enter information in
      Column Name
      ,
      Width
      ,
      Sample Data
      , and
      Date Format - add for date columns
      , as applicable:
      For the fixed-width file, the following snippet shows an example of how you add this information in the UI:
      Column Name Width Date Format - add for date columns FNAME 20 LNAME 15 ADDRESS1 75 ... ... ID_7 8
      For the delimited file, the following snippet shows an example of how you add this information in the UI:
      Column Name Date Format - add for date columns Sample Data ID FIRSTNAME LASTNAME AGE
    7. Click
      OK
      .
      A message appears that specifies the location of the saved file.
    8. Click
      OK
      .
      The file is saved and is added to the
      Definition File Name
      field.
      Note:
      If your data definition file is always stored in a fixed location, you can specify the file name in the
      Definition File Name
      field. You can then enter the folder location where the definition file is available in the
      Defn File Directory
      field.
  7. Click
    Save
    .
  8. Click
    OK
    in the
    Save Connection File
    dialog to confirm the save action.
  9. Click
    Connect
    to connect Fast Data Masker to the associated file by using the created connection file.
    The Fast Data Masker UI opens. You can now proceed with the remaining steps.
Select Columns to Mask
After you connect Fast Data Masker to your flat file, you can start adding columns that you want to mask.
Follow these steps:
  1. In the Fast Data Masker UI, use the connection file to connect to the flat file.
  2. Verify that the
    File Mask
    tab (under
    Masking)
    is selected.
  3. Select a column that you want to mask from the
    Add column to mask
    drop-down list. This drop-down list is populated with all the columns that you defined in the data definition file.
  4. Click
    Add
    .
    The column that you selected is displayed as a tab.
  5. Select the data type of the column from the
    Data Type
    drop-down list. Available data types are character, numeric, and date.
    The
    Mask Type
    drop-down list filters the list of masking functions. This list displays only those functions that are applicable to the selected data type.
  6. Select the mask type that you want to use from the
    Mask Type
    drop-down list.
  7. Follow the remaining steps as described in the Select the Mask Type section in the Masking Functions and Parameters.
  8. Repeat Step 3 through Step 7 for other columns that you want to mask.
    You have successfully added appropriate masking functions to the columns that you want to mask.
Select Applicable Masking Options
After you select your columns for masking, you can select the required masking options in the
Options
tab. For more information, see the Specify Options for the Masking Process section in the Masking Options.
Note:
Some of the masking options are not applicable for flat files, though they are displayed in the UI.
Start Masking
Use the
Summary
tab to review the masking information before you actually mask the data. For more information, see the Save and Run the Mask section in the Mask Data Stored in Relational Databases article.
When you run the masking, Fast Data Masker creates and saves the following files:
  • Masked data file:
    A scramble file that includes the masked data. This file is created in the same location where your original data file is available.
    For example, if the name of the data file is EmpMask.txt, the masked data file is generated with the name EmpMask.txt.scramble.
  • Mapping file:
    A .csv file that includes the complete information that you see in the
    Summary
    tab.
    For example, masking columns, masking functions, and masking options. For example, C:\Users\abc01\AppData\Roaming\Grid-Tools\FastDataMasker\EmpMask.csv.
  • Options file:
    A .txt file that includes information about the masking options that you have used.
    For example, C:\Users\abc01\AppData\Roaming\Grid-Tools\FastDataMasker\EmpMask_options.txt.
  • Masking batch file:
    A .bat file that includes information about the mapping file, options file, connection file, memory (start and maximum) allocation, Fast Data Masker JAR file (for a flat file), and log configuration file.
    For example, C:\Users\abc01\AppData\Roaming\Grid-Tools\FastDataMasker\EmpMask.bat.
  • Log file:
    A .log file that includes the log information.
    For example, C:\Users\abc01\AppData\Roaming\Grid-Tools\FastDataMasker\EmpMask.log.
You have successfully masked the data in the fixed-width and delimited files.
Verify the Masked Data
After you mask the data stored in a flat file, you must verify whether the masked output is correct; that is, data is masked correctly and as expected. In the case of flat files, Fast Data Masker generates a new data file that contains the masked data.
Follow these steps:
  1. Navigate to the location where the masked data file (for example, EmpMask.txt) is saved.
  2. Open the file using an editor (for example, Notepad++).
  3. Locate the columns that you used for masking.
  4. Note the masked data in the identified columns.
  5. Verify that the data is masked as expected and is not the same as in the original data file.
Mask Data in XML Files
The high-level steps to mask the data that is stored in XML files is the same as for the fixed-width and delimited files that are discussed in the preceding section. Therefore, this procedure does not include the detailed information and outlines only the required steps.
Follow these steps:
  1. Launch Fast Data Masker.
  2. Click
    New
    .
  3. Enter the connection file name.
  4. Select
    XMLFILE
    from the
    DBMS
    drop-down list.
  5. Specify the location of the XML file in the
    File Name
    field.
  6. Click
    Connect
    .
  7. Verify that the
    XML Mask
    tab is selected under the
    Masking
    tab.
  8. Select the XPATH corresponding to the data that you want to mask from the
    Add XPATH to mask
    drop-down list. This drop-down list is populated with all the XPATHs that are available in the XML file that you are masking. For example, /purchaseOrder/shipTo/name.
  9. Click Add.
    The selected XPATH element is added as a tab.
  10. Select the data type, mask type, and provide the required information.
  11. (Optional) Click the
    Options
    tab and enter values for the masking options that you want to use.
  12. Click the
    Summary
    tab and review the masking information.
  13. Click
    Save & Run Mask
    to save the masking information and mask the data in the XML file.
    A message appears when the masking completes.
  14. Verify that a file containing the masked data is created at the same location where your original data file is available. For example, Client.xml.scramble.xml. Also, verify that the mapping file, options file, batch file, and log file are created.
You have successfully masked the data in the XML file.
Mask Data in JSON Files
The high-level steps to mask the data that is stored in JSON files is the same as for the fixed-width and delimited files that are discussed in the preceding section. Therefore, this procedure does not include the detailed information and outlines only the required steps.
Follow these steps:
  1. Launch Fast Data Masker.
  2. Click
    New
    .
  3. Enter the connection file name.
  4. Select
    JSONFILE
    from the
    DBMS
    drop-down list.
  5. Specify the location of the JSON file in the
    File Name
    field.
  6. Click
    Connect
    .
  7. Verify that the
    JSON Mask
    tab is selected under the
    Masking
    tab.
  8. Select the JSON path corresponding to the data that you want to mask from the
    Add JSON PATH to mask
    drop-down list. This drop-down list is populated with all the JSON paths that are available in the JSON file that you are masking. For example, $['employee']['firstname'].
  9. Click
    Add
    .
    The selected JSON path is added as a tab.
  10. Select the data type, mask type, and provide the required information.
  11. (Optional) Click the
    Options
    tab and enter values for the masking options that you want to use.
  12. Click the
    Summary
    tab and review the masking information.
  13. Click
    Save & Run Mask
    to save the masking information and mask the data in the JSON file.
    A message appears when the masking completes.
  14. Verify that a file containing the masked data is created at the same location where your original data file is available. For example, employee.json.scramble.json. Also, verify that the mapping file, options file, batch file, and log file are created.
You have masked the data in the JSON file.
Mask Excel Files
The high-level steps to mask the data that is stored in Microsoft Excel spreadsheets is the same as for the fixed-width and delimited files that are discussed in the preceding section. Therefore, this procedure does not include the detailed information and outlines only the required steps.
Follow these steps:
  1. Open Fast Data Masker to create a connection profile to the Excel file.
  2. Specify the
    Connection Name
    , for example: PERSONS DATA
  3. Specify EXCEL FILE as the
    DBMS
    type.
  4. Define the
    File Name
    by browsing to the Excel file to connect to.
  5. Define the
    Definition File Directory
    by browsing to the directory where to store the definition files.
  6. Define the
    Definition File Name
    to create definition files for the worksheets.
    The File Definition dialog opens. The dialog shows one tab for each worksheet.
    1. Specify the
      Rows with data before column header line
      , if applicable. Enter the number of header rows to ignore.
      Note: Empty rows are ignored automatically, subtract them from the number of lines to ignore.
    2. Select the ‘
      include in definition
      ’ checkbox to retrieve the worksheet info.
      The tabs display the work sheet content.
    3. Define the format of any date data in the center column, for each worksheet. Double-click on a cell, enter the date format, then press return to commit the change.
      Example:
      dd/mm/yyyy hh:mm
    4. Click
      Save
      to save the file definitions.
  7. Click
    Connect
    to connect to the file.
    A window opens. The window shows one tab for each worksheet.
  8. Define the columns that you want to mask for each worksheet.
    Note: You do not need to provide a date format for date masking. You have already defined the date format in the definition file.
  9. Run the mask to create a masked copy of the original Excel File.
You have masked the data in the Excel file.