Scan Data Model for PII

You can run a PII Scan on your Data Model to discover sensitive data in it. You can then use the results of this scan to produce a masking configuration and mask this sensitive data (see Mask Data with CA TDM Portal).
The PII Scan process is also part of the PII Audit process, but the PII Audit process only produces a report on sensitive data. With your Data Model, you can also mask this sensitive data.
The following PII data scan scenario outlines the step-by-step process for a Test Data Engineer (TDE) to identify any Personally Identifiable Information (PII) data in a Data Model.
Prerequisites
You need to create a Data Model in order to scan this model for PII.
For more information, see End-to-End Scenario for Data Discovery.
Options for the PII Scan
Scan Level
The Scan Level ranges from Basic to All, based on the percentage of data that you want to scan in your environment. For example, if you set the Scan Level to 10%, CA TDM performs a PII data scan on only 10% of your environment, with a minimum of 10 rows per table.
  • Basic:
     Performs a PII data scan on 10 samples of data for each column in a table of your environment.
  • All:
     Performs a PII data scan on all columns and rows for all tables in the selected environment.
    Running a scan at Scan Level All on an entire Data Source may take a long time.
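The percentage rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not CA TDM's implementation: the function name rows_to_scan is hypothetical, and rounding up and capping at the table size are assumptions, because the text states only the percentage and the 10-row minimum.

```python
import math

MIN_ROWS_PER_TABLE = 10  # minimum number of rows per table stated in the text


def rows_to_scan(total_rows: int, scan_percent: float) -> int:
    """Estimate how many rows of one table a percentage-based Scan Level covers."""
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    if not 0 < scan_percent <= 100:
        raise ValueError("scan_percent must be in the range (0, 100]")
    # Assumption: fractional row counts are rounded up.
    wanted = math.ceil(total_rows * scan_percent / 100)
    # Apply the 10-row minimum, but never exceed the rows the table holds.
    return min(total_rows, max(wanted, MIN_ROWS_PER_TABLE))


print(rows_to_scan(1000, 10))  # 100
print(rows_to_scan(50, 10))    # 10 (the minimum applies)
print(rows_to_scan(4, 10))     # 4 (the table is smaller than the minimum)
```

Under these assumptions, small tables are scanned in full even at a low Scan Level, which is why the minimum matters mostly for environments with many small tables.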
Store Matched Samples
When you select Store Matched Samples, CA TDM collects ten samples of data for each column in a table and stores them in the repository until the Internal Data Controller signs off the report. The collected data is deleted after the Internal Data Controller signs off, and no record of the data is preserved in CA TDM. You can remove PII samples at any time from the Heat Map view of the Data Model: click Actions, then Remove Matched Samples, to permanently delete all matched samples.
Include or Exclude Connection Profiles, Schemas, and Tables
You can apply a filter to include or exclude Connection Profiles, Schemas, and Tables to reduce the size of your scan. You can use the basic wild card characters such as * (used to match one or more characters) and ? (used to match a single character) in the search terms to include or exclude any matching connection profile, schema, and table from the scan. For example, when you enter *sys in the tables to be excluded, the scan excludes all tables that end in 'sys'.
For a given connection profile, schema, or table, you can either include it in the scan or exclude it from the scan, but not both.
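To illustrate how an include or exclude filter with wildcards narrows the set of tables, here is a minimal Python sketch. It is not CA TDM code: select_tables and matches are hypothetical names, the table names are invented, and case-insensitive matching is an assumption. Note also that Python's fnmatch treats * as zero or more characters (and supports [seq] ranges), which is slightly broader than the wildcard description above.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, name: str) -> bool:
    """Wildcard match with * and ?; lowercasing both sides is an assumption."""
    return fnmatchcase(name.lower(), pattern.lower())


def select_tables(tables, patterns, mode):
    """Return the tables that stay in scope for the scan.

    mode is either "include" or "exclude": a filter is one or the other,
    never both.
    """
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")

    def hit(table):
        return any(matches(p, table) for p in patterns)

    if mode == "include":
        return [t for t in tables if hit(t)]
    return [t for t in tables if not hit(t)]


# Hypothetical table names, for illustration only.
tables = ["orders", "customers", "audit_sys", "log_sys"]
print(select_tables(tables, ["*sys"], "exclude"))  # ['orders', 'customers']
print(select_tables(tables, ["*sys"], "include"))  # ['audit_sys', 'log_sys']
```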
Perform a Data Model PII Scan
You can run a PII Scan on your Data Model to detect sensitive information in the Data Model.
Follow these steps:
  1. Click Data Model under the Modeling section of the Portal UI.
    The Data Model page opens with the List View active.
  2. From the View options in the top-right, select the Heatmap icon.
    The Data Model page displays the Heatmap. The Heatmap displays a grid of squares, which represent the tables in the Data Model.
    These squares are blue before you run the PII Scan on your Data Model.
  3. Click Run PII Scan.
    The Personally Identifiable Information (PII) Data Scanning page opens.
    After the first execution of the PII Scan, Run PII Scan changes to an Actions drop-down, with options to Re-scan the Data Model, Re-Run PII Scan, or Download as CSV.
  4. Select the Classifier Packs that you want the PII Scan to use to identify sensitive data.
    For more information on Classifiers, see Manage Data Classifiers.
  5. Click Next.
  6. Select the appropriate Scan Level.
  7. (Optional) Check Store Matched Samples if you want to store matched samples.
  8. Click Next.
  9. Under Scan Key Columns, select whether to include Key columns (primary key and foreign key columns) in the PII scan. You can choose whether to include:
    • String-based Key columns
      By default, the PII scan includes String-based Key columns.
    • Numeric-based Key columns
      By default, the PII scan does not include Numeric-based Key columns.
      You may want to exclude key columns from your PII scan, because masking these columns may cause conflicts with database constraints. However, if you exclude key columns that contain sensitive data (for example, credit card numbers), you risk the exposure of sensitive data.
      In this case, you should download and use database constraints scripts (to disable and re-enable constraints before and after the mask job), and mask the data using Fast Data Masker. For more information, see Database Constraints Scripts.
  10. Under Include/Exclude Tables, select one of the following:
    • Scan All Tables
      The scan runs on all tables in the data sources.
    • Include / Exclude
      The 'Add Filters' section appears below.
  11. Under 'Add Filters', you can do the following:
    • Click New Filter to add a filter.
    • In the fields for each filter, select the appropriate Connection Profile, Schema, and as many Table names as you want. For each field, you can select a value from a dropdown list of existing values, or type your own value (this can include wildcards).
  12. Click Next to confirm your selection.
    The PII Data Scan Execution page opens. This page lists your choices from the PII Scan process.
  13. If you are happy with the details of the Scan to be run, select one of the following Schedule options:
    • Now
      When you click Profile, the PII Scan begins.
    • Schedule
      When you click Profile, CA TDM schedules the PII Scan to run at the time you specify.
  14. Click Profile to begin or schedule the scan.
    The Data Model page opens. The page displays the progress of your PII Scan.
  15. When the scan completes, the Data Model page opens with the Heatmap View. The Heatmap shows the results of the PII Scan. The colour of tables now represents their risk in terms of PII, according to how many tags the PII Scan identifies in each table (see table below).
Review Scan Results
When CA TDM completes a PII scan of your Data Model, the results are available in the Heatmap View on the Data Model page. The Heat Map provides an instant graphical view of the total potential risk from PII data that exists within the scanned environment. The colours of the squares (that is, tables) on the Heat Map indicate the number of distinct tags present in each table:
Colour         Distinct tags   Risk Level
Red            15+             Very High
Dark Orange    10 - 14         High
Light Orange   5 - 9           Medium
Yellow         1 - 4           Low
Green          0               Very Low
Multiple occurrences of a single tag within a table only count as one distinct tag. Multiple tags assigned to one column all count as distinct tags.
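The counting rule and the thresholds above can be expressed as a short Python sketch. This is an illustration only, not the product's code: the function names, the input shape (a mapping from column name to the tags on that column), and the tag names in the example are all hypothetical.

```python
# (lowest distinct-tag count in the band, colour, risk level), highest band first
RISK_BANDS = [
    (15, "Red", "Very High"),
    (10, "Dark Orange", "High"),
    (5, "Light Orange", "Medium"),
    (1, "Yellow", "Low"),
    (0, "Green", "Very Low"),
]


def distinct_tag_count(column_tags):
    """Count distinct tags across all columns of one table.

    A tag that occurs on several columns counts once; several different
    tags on one column each count.
    """
    return len({tag for tags in column_tags.values() for tag in tags})


def risk_for_table(column_tags):
    """Return (colour, risk level) for a table, using the bands above."""
    count = distinct_tag_count(column_tags)
    for lower_bound, colour, level in RISK_BANDS:
        if count >= lower_bound:
            return colour, level
    raise AssertionError("unreachable: a count cannot be negative")


# Hypothetical table: EMAIL appears twice but counts once, so 3 distinct tags.
example = {
    "email": ["EMAIL"],
    "backup_email": ["EMAIL"],
    "contact": ["PHONE", "NAME"],
}
print(distinct_tag_count(example))  # 3
print(risk_for_table(example))      # ('Yellow', 'Low')
```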
The top menu bar lists the total number of PII data items found within the scanned environment and the number of tables that are marked as confirmed. Each square in the Heat Map represents a table in the Data Source. You can zoom into a specific section of the Heat Map to better view the table details.
You can filter tables in the following ways:
  • Filter Unscanned Tables:
    Use the Unscanned tables toggle to view all unscanned tables.
  • Filter Search Tab:
    Use the Filter tab to search for a table, column, tag, connection profile, or schema in the Heat Map.
  • Risk Slider:
    Use the Risk slider to filter tables based on their Risk category.
Filter tables
By search term
Type a search term into the Filter search results field to view a drop-down with all matches to that term for tables, columns, tags, connection profiles, and schemas. You can use the basic wildcard characters * (matches one or more characters) and ? (matches a single character) in the search terms.
Click one of these matches to make the filter active and to redraw the Heat Map so that it shows only tables that contain matches to the active filters. Active filters are listed under the Filter section of the page. You can remove a filter by clicking the X next to it.
For example, type in 'CREDIT' to search for all entities that begin with 'CREDIT'. Matches for each type of entity are displayed in the drop down. Click on any of the matches within the drop down to activate that filter and redraw the Heat Map.
CA TDM applies filters with AND logic, so adding more filters results in fewer matches. For example, to search for a string that contains 'CUSTOMER' within the schemas containing 'ACCOUNT' or 'ACCOUNTS', enter the search term '*ACCOUNT*' and select the matching Schema filter from the drop-down. Next, enter the search term '*CUSTOMER', which shows results for all matches in the remaining tables, columns, tags, and profiles. Matching is case insensitive, so you can get results such as the following:
tables: 'LEGACY_ACCOUNT', 'Account'
columns: 'ACCOUNT_ID', 'ACCOUNT_CUSTOMER', 'Active_Account'
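The AND behaviour can be sketched as follows. This is a hypothetical model, not the Portal's implementation: the table dictionaries, the (entity, term) filter shape, and the function names are invented for illustration, and Python's fnmatch treats * as zero or more characters.

```python
from fnmatch import fnmatchcase


def term_matches(term, value):
    """Case-insensitive wildcard match, as described in the text."""
    return fnmatchcase(value.lower(), term.lower())


def table_passes(table, active_filters):
    """A table is shown only if every active filter matches it (AND logic)."""
    def values(entity):
        # A column filter matches if any column of the table matches.
        if entity == "column":
            return table["columns"]
        return [table[entity]]

    return all(
        any(term_matches(term, value) for value in values(entity))
        for entity, term in active_filters
    )


# Hypothetical Data Model contents.
tables = [
    {"schema": "ACCOUNTS", "table": "LEGACY_ACCOUNT",
     "columns": ["ACCOUNT_ID", "ACCOUNT_CUSTOMER"]},
    {"schema": "ACCOUNTS", "table": "Account",
     "columns": ["Active_Account"]},
    {"schema": "BILLING", "table": "INVOICE",
     "columns": ["CUSTOMER"]},
]

one_filter = [("schema", "*ACCOUNT*")]
two_filters = [("schema", "*ACCOUNT*"), ("column", "*CUSTOMER")]

print([t["table"] for t in tables if table_passes(t, one_filter)])
# ['LEGACY_ACCOUNT', 'Account']
print([t["table"] for t in tables if table_passes(t, two_filters)])
# ['LEGACY_ACCOUNT']
```

Adding the second filter removes the table that has no matching column, which is the narrowing effect that AND logic produces.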
By row count
You can also filter tables based on their size in rows, and redraw the Heat Map. The size range runs from small to extra large (relative to the row count of tables across the database).
Use the Risk slider to filter tables based on their Risk category
Drag the edges of the slider over the Risk categories to adjust your selection and redraw the Heat Map for a more specific view of the potential PII data that are identified in the scanned environment. Depending on the number of distinct tags that are identified in a table, the tables are filtered and positioned in the Heat Map as follows:
Distinct tags   Risk Level
15+             Very High
10 - 14         High
5 - 9           Medium
1 - 4           Low
0               Very Low
Manually Review Data within Tables
You can review each table in the Heat Map and further investigate whether the data identified as PII is correct. You can select multiple tables to Confirm them or to mark them as Not PII. Click Clear Selection to deselect all tables. Changing the filters or the risk slider has the same effect as Clear Selection.
Follow these steps:
  1. Select a table or tables in the Heat Map. 
    Hover your mouse over a table to view a summary of the table details and the tags that were identified as PII data. You can zoom into the Heat Map to view table details.
  2. You can perform one of the following actions on the selected table or tables:
    • Click Confirm if the tags identified in a table, or tables, are correct.
    • Click Not PII if you are sure that a table, or tables, do not contain PII data.
    • (Single table only) Click Investigate to see a list view that includes details of columns, tags, and the sample data matched for each column that was identified as PII data.
      From this view, you can do the following:
      • Click View Random Row to view a random row from the selected table and get a better understanding of the data available in it.
      • Click tags that you confirm are appropriate to the column, to 'pin' the tag. To unpin a tag, click the tag again.
      • Click the plus icon to add tags for columns that should be identified as PII data. The first tag you add is defined as the column's Primary tag.
        When you type in the Tag Name field, a drop-down list of available tags appears. If you add your own custom tag (that is, one not from the drop-down list), the next time you add a new tag, your custom tag is available from the drop-down list of tags.
        The tags that the Audit Scan automatically assigns already have associated masking functions. User-defined tags do not have associated masking functions until you define them from the Configure Data Masking page. You can also define your own classifiers, which add tags and associated masking functions to the drop-down list. For more information, see Manage Data Classifiers.
      • Click the X icon associated with each tag to remove the tag from the column. You can click Remove Unpinned Tags to remove all tags that are not pinned from all columns.
        You must provide a reason when you manually add or remove tags from columns. The 'Reason' field automatically populates with your last input value.
      • Click Confirm And Review Next Table to automatically review the next table, or click Confirm And Close to manually select the next table you want to review.
        A tick mark is added to the reviewed and confirmed tables in the Heat Map.
Download PII Scan results as CSV
To better understand the Profiling scan details in a Heat Map, you can download all details of the Heat Map into a CSV file. The CSV file includes details such as the Job ID, the Job Name, when the scan was initiated, the Connection Profile or Environment name that was scanned, and all the Heat Map details for matched tags and where they were found.
To download all details of a Heat Map in a CSV file, click Actions and select Download as CSV.
Mask PII Data
You can use the tags applied to columns in the Data Model to mask sensitive data in your data sources.
The data masking process is irreversible.
For more information, see Mask Data with CA TDM Portal.