Manage Data Classifiers

Data Classifiers are sets of rules in JSON format that CA TDM uses for two purposes:
  • To perform a PII scan
    During a PII scan, CA TDM compares column values to the seedlists or regular expressions in classifiers, and assigns the tag of a matching classifier to the column. For more information on the PII Audit procedure, see PII Audit Using CA TDM Portal.
  • To mask data
    When CA TDM masks data, it uses a classifier's seedlist or regular expression to generate masked data. For more information on the masking process, see Mask Data with CA TDM Portal.
For a list of terms used in the process, see PII Data Scan Terminology.
Classes of Data Classifier
The two classes of Data Classifiers are as follows:
  • RegEx
    Classifiers that recognize column names and table content that match the appropriate regular expression.
    For example, a UK Postcode Classifier is a RegEx Classifier that contains the appropriate regular expression.
  • SeedList
    Classifiers that recognize column names and table content that match a sample list of values.
    For example, a UK Given Name Classifier is a SeedList Classifier that contains a sample list of UK given names.
CA TDM loads classifiers at startup. To add your own classifiers to the classifiers that CA TDM loads at startup, see Add Classifiers to CA TDM.
How to create a RegEx or SeedList Classifier
Use the following JSON code to create a customized RegEx or SeedList Classifier.
Syntax
{
"name":"name",
"description":"description",
"classifierOrigin":"company name",
"classifierClass":"com.ca.tdm.profiler.classifiers.RegExClassifier",
"classifierType":"content",
"tags":"tag name",
"config":[
{
"name":"name",
"value":"value"
}
]
}
Parameters
  • name
    Specifies the name of the Classifier.
  • description
    (Optional) Specifies a generic description of the Classifier.
  • classifierOrigin
    Specifies the origin of the Classifier. By default this parameter is set to CA.
  • classifierClass
    Specifies the class of the Classifier.
    • For RegEx Classifier:
      By default this parameter is set to
      com.ca.tdm.profiler.classifiers.RegExClassifier
    • For SeedList Classifier:
      By default this parameter is set to
      com.ca.tdm.profiler.classifiers.SeedListClassifier
  • classifierType
    Identifies the type for each Classifier. There are two possible values for classifierType:
    • content
      These Classifiers scan the contents of a column against either a regular expression or a seedlist, depending on the Classifier class.
    • column
      These Classifiers scan only the title of a column against either a regular expression or a seedlist, depending on the Classifier class.
  • tags
    Specifies a tag that the Classifier associates with matched columns.
  • config
    Specifies the name and value parameters for a Classifier to match content during PII data scan. Choose one of the following:
    • For RegEx Classifier:
      Specify the name and enter a Java-compliant regular expression.
    • For SeedList Classifier:
      Do not edit the value of the name parameter. The name value in the JSON file is used to match with the name value in the corresponding SEEDLIST file.
      The value parameter specifies the name of the SEEDLIST file.
      Note:
      Only one config item is allowed in a SeedList Classifier JSON file.
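The rules in the Parameters list above can be checked before a classifier file is loaded. The following Python sketch is not part of CA TDM; check_classifier is a hypothetical helper, and the set of keys it treats as mandatory is an assumption based on the parameters that are not marked optional above.

```python
import json

# Class names taken from the Parameters list above.
REGEX_CLASS = "com.ca.tdm.profiler.classifiers.RegExClassifier"
SEEDLIST_CLASS = "com.ca.tdm.profiler.classifiers.SeedListClassifier"

# Assumption: these keys are treated as mandatory by this sketch only.
REQUIRED_KEYS = ("name", "classifierClass", "classifierType", "config")


def check_classifier(text):
    """Return a list of problems found in a classifier JSON string."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in doc]
    if doc.get("classifierClass") not in (REGEX_CLASS, SEEDLIST_CLASS):
        problems.append("unknown classifierClass")
    if doc.get("classifierType") not in ("content", "column"):
        problems.append("classifierType must be 'content' or 'column'")

    config = doc.get("config", [])
    if not isinstance(config, list):
        problems.append("config must be a list")
        config = []
    for item in config:
        if not isinstance(item, dict) or "name" not in item or "value" not in item:
            problems.append("each config item needs 'name' and 'value'")
    if doc.get("classifierClass") == SEEDLIST_CLASS and len(config) != 1:
        problems.append("a SeedList Classifier allows only one config item")
    return problems


# A SeedList Classifier with two config items, which the Note above forbids.
TWO_ITEMS = json.dumps({
    "name": "German Given Name",
    "classifierClass": SEEDLIST_CLASS,
    "classifierType": "content",
    "tags": "Given Name",
    "config": [
        {"name": "name", "value": "Given Name (Germany)"},
        {"name": "name", "value": "Given Name (UK)"},
    ],
})
print(check_classifier(TWO_ITEMS))
```

An empty list means that the sketch found nothing to report; it does not guarantee that CA TDM accepts the file.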
Examples: Create a RegEx and SeedList Classifier
Example 1: Create a RegEx Classifier
This example creates a RegEx Classifier:
{
"name":"IBAN",
"description":"Classifier to identify an IBAN from the United Kingdom, Germany or Sweden",
"classifierOrigin":"CA Technologies",
"classifierClass":"com.ca.tdm.profiler.classifiers.RegExClassifier",
"classifierType":"content",
"tags":"IBAN",
"config":[
{
"name":"Germany",
"value":"(?:DE)[\\d]{2}\\s?(?:[\\d]{4}\\s?){4}[\\d]{2}"
},
{
"name":"UK",
"value":"(?:GB)(?:[\\d]{2})\\s?(?:[A-Z]{4})\\s?(?:[\\d]{4}\\s?){3}[\\d]{2}"
},
{
"name":"Sweden",
"value":"(?:SE)[\\d]{2}\\s?(?:[\\d]{4}\\s?){5}"
}
]
}
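Before importing a RegEx Classifier, it can help to try its expressions against sample values. The following Python sketch loads the config section of the IBAN example and tests three commonly published sample IBANs. It makes two assumptions: Python's re module stands in for the Java regular expression engine (the syntax used in these three patterns is the same in both), and a value counts as a match only when the whole value matches, which may differ from how CA TDM applies the expression.

```python
import json
import re

# The config section of the IBAN example, kept as JSON text so that the
# doubled backslashes are decoded the same way as in a classifier file.
CONFIG_JSON = r'''
[
  {"name": "Germany", "value": "(?:DE)[\\d]{2}\\s?(?:[\\d]{4}\\s?){4}[\\d]{2}"},
  {"name": "UK", "value": "(?:GB)(?:[\\d]{2})\\s?(?:[A-Z]{4})\\s?(?:[\\d]{4}\\s?){3}[\\d]{2}"},
  {"name": "Sweden", "value": "(?:SE)[\\d]{2}\\s?(?:[\\d]{4}\\s?){5}"}
]
'''

# Map each config item name to its compiled pattern.
PATTERNS = {item["name"]: re.compile(item["value"]) for item in json.loads(CONFIG_JSON)}


def matching_items(value):
    """Return the names of the config items whose pattern matches the whole value."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(value)]


# Illustrative sample values, not data from CA TDM.
SAMPLES = [
    "DE89 3704 0044 0532 0130 00",
    "GB29 NWBK 6016 1331 9268 19",
    "SE45 5000 0000 0583 9825 7466",
    "not an IBAN",
]
for sample in SAMPLES:
    print(sample, "->", matching_items(sample))
```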
Example 2: Create a SeedList Classifier
This example creates a SeedList Classifier:
{
"name": "German Given Name",
"description": "Seedlist classifier for given names (German).",
"classifierOrigin": "CA Technologies",
"classifierClass": "com.ca.tdm.profiler.classifiers.SeedListClassifier",
"classifierType": "content",
"tags": "Given Name",
"config": [
{
"name": "name",
"value": "Given Name (Germany)"
}
]
}
Example: Create a SeedList File
This is an example of a SeedList file, used by the SeedList Classifier above:
name:Given Name (Germany)
description:Given Name (Germany)
origin:CA Technologies
revision:1
values:
Abbo
Abelard
Achim
Adalgisa
Adelaide
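The link between the SeedList Classifier and its SeedList file can be checked with a short script. The following Python sketch assumes that a SeedList file consists of key:value header lines followed by a values: line and one value per line, which is inferred only from the example above. It also assumes that the config item's name selects the header field to compare and that its value is the expected content of that field. parse_seedlist is a hypothetical helper, not a CA TDM function.

```python
def parse_seedlist(text):
    """Split SeedList file text into header fields and the list of values."""
    header, values, in_values = {}, [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_values:
            # Every line after "values:" is one seed value.
            values.append(line)
        elif line == "values:":
            in_values = True
        else:
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
    return header, values


SEEDLIST_TEXT = """name:Given Name (Germany)
description:Given Name (Germany)
origin:CA Technologies
revision:1
values:
Abbo
Abelard
Achim
Adalgisa
Adelaide
"""

# The single config item from the German Given Name classifier above.
CLASSIFIER_CONFIG = {"name": "name", "value": "Given Name (Germany)"}

header, values = parse_seedlist(SEEDLIST_TEXT)
linked = header.get(CLASSIFIER_CONFIG["name"]) == CLASSIFIER_CONFIG["value"]
print("classifier points at this SeedList:", linked)
print("number of seed values:", len(values))
```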
Include Masking Functions in a Classifier
You can include one or more masking functions in a RegEx Classifier or a SeedList Classifier. You can use these when you generate a masking configuration for a Data Model. You define a Mask Function Group (maskFunctionGroup), and all masking functions defined within this Mask Function Group are associated with the Classifier.
Each Classifier has a tag, and at least one masking function is associated with the tag. Masking functions have between zero and four parameters.
For more information about all the supported masking functions and their required parameters, see Masking Functions and Parameters.
If a classifier has no tag, the Mask Function Group section should be empty in the Classifier JSON file.
Use the following JSON code to customize and add a Mask Function Group to a RegEx or SeedList Classifier.
Syntax
{
"name": "name",
"description": "description",
"classifierOrigin": "company name",
"classifierClass": "com.ca.tdm.profiler.classifiers.SeedListClassifier",
"classifierType":"content",
"tags": "tag name",
"config": [
{
"name": "name",
"value": "value"
}
],
"maskFunctionGroup":[
{
"groupName": "group name",
"maskFunction":[
{
"functionName": "function name",
"displayName": "display name",
"notes": "notes",
"maskParams": [
{
"paramPosition": "parameter position",
"paramValue": "parameter value"
}
]
}
]
}
]
}
Parameters
  • maskFunctionGroup
    Specifies a group of masking functions. The groupName parameter specifies the name of the group.
    • maskFunction
      Specifies a list of names and parameters for all the masking functions in this group. Each item within the list has the following parameters:
      • functionName
        Specifies the name of the masking function.
      • (Optional) displayName
        Specifies the user-defined alias for a function name that appears in the TDM Portal.
      • (Optional) notes
        Specifies additional details about the masking function.
      • maskParams
        Specifies the parameters to be used during masking.
        • paramPosition
          Specifies the parameter position of each masking parameter.
          Values:
          1, 2, 3, or 4
          Ensure that you enter the correct parameter position as supported by the masking function. For more information about the masking functions and their required parameters, see Masking Functions and Parameters.
        • paramValue
          Specifies the parameter value of each masking parameter.
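The constraints on a masking function entry (a function name, at most four parameters, and positions 1 to 4) can be checked mechanically. The following Python sketch is a hypothetical helper, not part of CA TDM. It does not check that a position is one the named masking function actually supports, because that depends on Masking Functions and Parameters.

```python
VALID_POSITIONS = ("1", "2", "3", "4")


def check_mask_function(function):
    """Return a list of problems found in one maskFunction entry."""
    problems = []
    if not function.get("functionName"):
        problems.append("functionName is required")

    params = function.get("maskParams", [])
    if len(params) > 4:
        problems.append("a masking function takes at most four parameters")
    for param in params:
        # Positions are compared as strings so that "1" and 1 are both accepted.
        position = str(param.get("paramPosition"))
        if position not in VALID_POSITIONS:
            problems.append(f"invalid paramPosition: {position!r}")
        if "paramValue" not in param:
            problems.append(f"paramValue missing at position {position!r}")
    return problems


# A HASHLOV entry with one parameter, written as a Python dictionary.
HASHLOV = {
    "functionName": "HASHLOV",
    "displayName": "Given Name UK",
    "maskParams": [{"paramPosition": "1", "paramValue": "firstname.txt"}],
}
print(check_mask_function(HASHLOV))
```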
Examples: Masking functions in Classifiers
In both of the following examples, masking functions are defined inside a masking function group called "masking functions".
Example 1: Create a SeedList Classifier with a Masking Function
This example creates a SeedList Classifier with the masking function HASHLOV, with firstname.txt as the argument for HASHLOV's first parameter:
{
"name": "Given Name (UK)",
"description": "Seedlist matcher for given names (UK)",
"classifierOrigin": "CA Technologies",
"classifierClass": "com.ca.tdm.profiler.classifiers.SeedListClassifier",
"classifierType":"content",
"tags": "Given Name",
"config": [
{
"name": "name",
"value": "Given Name (UK)"
}
],
"maskFunctionGroup":
[
{
"groupName": "masking functions",
"maskFunction":[
{
"functionName": "HASHLOV",
"displayName": "Given Name UK",
"notes": "Given name derived from a hashed index into a lookup-table",
"maskParams": [
{
"paramPosition": "1",
"paramValue": "firstname.txt"
}
]
}
]
}
]
}
Example 2: Create a SeedList Classifier with multiple Masking Functions
This example creates a SeedList Classifier with the following masking functions:
  • HASHLOV, with firstname.txt as the argument for HASHLOV's first parameter
  • RANDLOV, with lastnameindian.txt as the argument for RANDLOV's first parameter
{
"name": "Given Name (UK)",
"description": "Seedlist matcher for given names (UK)",
"classifierOrigin": "CA Technologies",
"classifierClass": "com.ca.tdm.profiler.classifiers.SeedListClassifier",
"classifierType":"content",
"tags": "Given Name",
"config": [
{
"name": "name",
"value": "Given Name (UK)"
}
],
"maskFunctionGroup":
[
{
"groupName": "masking functions",
"maskFunction":[
{
"functionName": "HASHLOV",
"displayName": "Given Name UK",
"notes": "Given name derived from a hashed index into a lookup-table",
"maskParams": [
{
"paramPosition": "1",
"paramValue": "firstname.txt"
}
]
},
{
"functionName": "RANDLOV",
"displayName": "Last Name India",
"notes": "Last name derived from a random index in a lookup-table",
"maskParams": [
{
"paramPosition": "1",
"paramValue": "lastnameindian.txt"
}
]
}
]
}
]
}
Create a Classifier Pack
You can group Classifier files into a classifier pack, which is a zip file that you then import into CA TDM.
Follow these steps:
  1. Save the Classifier file in a directory within a zip file.
    The hierarchy of the files and directories in the zip file is replicated as the hierarchy of Classifiers in the CA TDM Portal.
    For example, the classifier-data.zip file contains a classifier pack directory named Common. The individual classifier files are saved under two group directories, Financial and Personal. When you import the classifier-data.zip file into the CA TDM Portal, the following tree structure appears under the preview in Classifiers:
  • Common
    • Financial
      • Credit Card
      • IBAN
      • Swift Code
    • Personal
      • Birth Date
      • E-mail
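A classifier pack with this directory layout can be assembled with any zip tool. The following Python sketch builds one in memory with the standard zipfile module. The .json file extension and the placeholder classifier content are assumptions made for illustration; the source does not state what file names CA TDM expects inside the pack.

```python
import io
import json
import zipfile


def build_pack(classifiers):
    """Return the bytes of a zip file; classifiers maps archive path to classifier dict."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as pack:
        for path, classifier in classifiers.items():
            # Forward slashes create the pack and group directories in the archive.
            pack.writestr(path, json.dumps(classifier, indent=2))
    return buffer.getvalue()


# Placeholder content; real files hold complete classifier definitions.
PACK = {
    "Common/Financial/IBAN.json": {"name": "IBAN"},
    "Common/Financial/Swift Code.json": {"name": "Swift Code"},
    "Common/Personal/E-mail.json": {"name": "E-mail"},
}

data = build_pack(PACK)
with zipfile.ZipFile(io.BytesIO(data)) as pack:
    for name in pack.namelist():
        print(name)
```

To produce a file for import, write the returned bytes to disk, for example to classifier-data.zip.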
Import a Classifier pack
You can import classifiers into CA TDM while it is in use. Classifiers that you import in CA TDM Portal remain available after the server restarts.
Follow these steps:
  1. Open the CA TDM Portal as administrator.
  2. Click Configuration, Classifiers.
  3. Drag and drop the zip file onto the grey 'Drag and Drop File or Click to select' button, or click the button to browse for the zip file, and click Open when you locate the zip file. The import process starts automatically.
If you try to import duplicate classifiers, a warning message to overwrite the existing classifiers appears. Click Yes to overwrite the duplicate Classifiers or click No to ignore the duplicate Classifiers.
The Classifier files are successfully imported into the CA TDM Portal. A preview of the imported Classifiers appears under Classifiers.
Delete a Classifier
You can delete classifiers from CA TDM Portal.
Follow these steps:
  1. Open the CA TDM Portal as administrator.
  2. Click Configuration, Classifiers.
  3. In the preview, select one or more classifiers in the tree structure.
  4. Click Delete.
    A confirmation prompt appears.
  5. To confirm the deletion of selected classifiers, click Delete.