Create Data Generation Rules

Use the Generation functionality in the CA TDM Portal to create data generation rules. These rules include data functions that help you generate synthetic data when you perform the data publish operation. At the time of data publishing, the CA TDM Portal evaluates each column and resolves the rule that is applied to each column. The application then populates the column with the value that is generated based on the defined rules.
tdm48
Use the Generation functionality in the CA TDM Portal to create data generation rules. These rules include data functions that help you generate synthetic data when you perform the data publish operation. At the time of data publishing, the CA TDM Portal evaluates each column and resolves the rule that is applied to each column. The application then populates the column with the value that is generated based on the defined rules.
  1. Access the CA TDM Portal.
  2. Select an appropriate project and its corresponding version from the 
    Project
     drop-down list in the top blue bar.
    This selection sets the required project and version context for all the related operations that you perform in the Portal.
  3. Click 
    Generators
     in the left pane.
    The 
    Generators
     page opens.
    Note:
     If the left pane is not visible, click the icon (represented by three horizontal bars) in the top left corner.
  4. Click the data generator that you want to use to publish data.
    The 
    <Data Generator Name>
     page opens. This page includes information (for example, associated project and project version) about the data generator.
  5. Click the 
    Select Tables
     button to open the 
    Registered Tables
     dialog.
    This dialog lists the registered tables (
    used
     and 
    unused
    ) based on the project that is associated to the selected generator. A 
    used
     table is a table that includes data (or data generation rules). 
    Used
     tables are represented by a tick mark in the 
    Used
     column.
    Note:
     You can also view the table relationships. For more information, see View Table Relationships.
  6. (Optional) Click the 
    Relational Edit
     button and select the following actions as required. These actions are applicable for the 
    used
     tables in the selected data generator. These actions help you automate the data generation process to some extent:
    • Make All Children References
      Lets the foreign key column in the child table refer to the primary key column in the parent table using data generation functions. This option helps you automatically generate data generation rules for the foreign key column in the child table, which as a manual procedure is prone to error. This reference, therefore, enables you to establish a relationship between the parent and the corresponding child tables. This approach also maintains the referential integrity of the data at the time of publishing.
      For example, consider a scenario where the value in the foreign key column 
      purchaseOrder_tdm_root_SHRED_ID
       of the child table 
      items
       references the primary key column 
      SHRED_ID
       in the parent table 
      purchaseOrder_tdm_root
      . When you implement this option, the value in the foreign key column of the child table is replaced with an expression 
      ^purchaseOrder_tdm_root.SHRED_ID(1)^
      .  This expression is a data generation function that is generated automatically without any manual effort. And, it establishes a parent-child relationship.
    • Make All Parents default
      Replaces all the primary keys in the parent tables with the valid next sequence using data generation functions.
      For example, 
      SHRED_ID
       is a primary key column that includes some value after the sample data is imported into the table 
      purchaseOrder_tdm_root
      . Therefore, when you implement this option, the sample value in this column is automatically replaced with the expression 
      @nextval(SHRED_ID_SEQ)@
      . This data generation function is automatically generated without any manual effort. This function establishes a sequence that generates unique, sequential values at the time of publishing.
    Click 
    OK
    , review the summary of the affected tables in the 
    Relational Editor Results
     dialog, and click 
    OK
    .
    All 
    used
     tables are updated based on the selected options.
  7. Click the row corresponding to the table for which you want to create data generation rules.
    All the table rows are added to the 
    <Data Generator Name>
     page, including columns. You can perform the following actions as required:
    • To add another table to the 
      <Data Generator Name> 
      page, repeat this step.
    • Click the 
      +r
       (Add Row) icon if no row is available in the table or you want to add another row to the table. To delete a row, click on the row number and then the 
      X
       (Delete Row) icon.
    • To view the constraints applicable to a column belonging to a derived object, move the mouse pointer to the column name. A tool tip appears that shows the constraints. When you move out of a table cell, Portal validates the value based on the constraints applied to the column.
       If your Data Generation rule includes complex constraints that cause validation to take a long time, you can suppress Portal's automatic validation. To do so, add this line to your 
      application.properties
       file:
      tdmweb.TDMGeneratorService.disableValidation=true
    • Click the First Row Display button to view the values of first row of a used table.
      Used table first row details help you understand the columns used in the respective table. You can use this information to select appropriate tables for which you can write necessary data generation rules. You can see the following details of the first row of each table:
      • Column Name
      • Definition
      • Default
      • Global Default
      • Data Type
  8. Click the cells in each table row where you want to add the rule. You can also enter variables in the cell. You can type in the values in each cell, or use the pop-up menu (recommended) that appears when you click a cell. If you type in the values in the cells, the values entered in each row are saved only when all the mandatory cells in that row are filled in. If you choose to work with the pop-up menu, follow these steps:
    • Use the Cut and Paste icons to move the values from one cell to another. 
    • Use Copy icon to copy the values from one cell to another cell/row/column. Following are the available copy options:
      • Copy Row Down
         
        All Rows - all cells
         
        Copies the values of the cells in the focused row to the corresponding cells in the rows below the current row of the same table.
        All Rows - empty cells
         
        Copies the values of the cells in the focused row to the corresponding empty cells in the rows below the current row of the same table.
        New Rows - all cells
         
        Copies the value of the cells in the focused row to the corresponding cells in a new row added below the current row in the same table.
        New Rows - empty cells
         
        Copies the value of the cells in the focused row to the corresponding empty cells in a new row added below the current row in the same table.
      • Copy Column Down
         
        Copy Down All
         
        Copies the value of the focused cell to all the cells below in the same column in the same table.
        Copy Down Null (Empty)
         
        Copies the value of the focused cell to the empty cells below in the same column in the same table.
        Copy Down All with Increment
         
        Copies the value of the focused cell to all the cells below in the same column in the same table with the specified increment value.
        Copy Down Null with Increment
         
        Copies the value of the focused cell to the empty cells below in the same column in the same table with the specified increment value.
      • Copy Column Right
         
        Copy Right All
         
        Copies the value of the focused cell to all the cells on the right side in the same row.
        Copy Right Null (Empty)
         
        Copies the value of the focused cell to the empty cells on the right side in the same row.
        Copy Right All with Increment
         
        Copies the value of the focused cell to all the cells on the right side in the same row with the specified increment value. 
        Copy Right Null with Increment
         
        Copies the value of the focused cell to the empty cells on the right side in the same row with the specified increment value.
      • Copy Column Left 
         
        Copy Left All
         
        Copies the value of the focused cell to all the cells on the left side in the same row.
        Copy Left Null (Empty)
         
        Copies the value of the focused cell to the empty cells on the left side in the same row.
        Copy Left All with Increment
         
        Copies the value of the focused cell to all the cells on the left side with the specified increment value in the same row.
        Copy Left Null with Increment
         
        Copies the value of the focused cell to the empty cells on the left side with the specified increment value in the same row.
    • Use the icons F, C and V to insert Functions, Columns and Variables respectively in the cells. 
    • Use the Data Painter icon (brush icon) to open the 
      Data Painter: <Table_Name>.<Column_Name>
       dialog and create data generation rules.
  9. Use the 
    Data Painter: <Table_Name>.<Column_Name>
     dialog that opens when you click the Data Painter icon to edit and test the data functions. The dialog lets you click on objects in each of the three sections—Functions, Columns, and Variables. These objects transfer to the edit section (left side) where you manipulate them. The details of the three sections are as follows:
    Note:
     There is no character limit for data painter expressions in the CA TDM Portal. However, Datamaker has a limit of 16000 characters. If you expect parallel usage of this data generation rule in Datamaker, adhere to the Datamaker character limit.
    • Functions
      Functions can use hard-coded values, columns, or variables as parameters. Functions can also use other functions as parameters. For example, you can use a function as a result from a Boolean operator in the IF function.
    • Columns
      The Columns list contains any other columns in the table. This list also shows the columns in the other 
      used
       tables.
    • Variables
      The Variables section contains a combination of system operators and any substitution variables you have created. For more information about creating and managing variables, see Create and Manage Variables.
    Note:
     The dialog also displays information about the constraints applicable to the columns belonging to derived objects. When you click the 
    Validate
     button, the Portal validates the value based on the constraints applied to the column. 
  10. Expand the appropriate section (
    Functions
    Columns
    Variables
    ) and click the required object.
    The object is added to the edit section.
    In case of any priorpublishkeylist function, you must replace the 
    "ld_id"
     value with the respective generator ID. Select the applicable generator from the auto populated list when you add a priorpublishkeylist function to a column.
    In case of any sqllist function, you must replace the 
    "connection"
     value with the respective connection profile. Select the applicable connection profile from the auto populated list when you add a sqllist function to a column. The parameter connection is replaced with the name of connection profile adding the character "P" as a prefix.In case of any seedlist function, you must replace the 
    "seedname"
     value with the respective seed data type. Select the applicable seed data type from the auto populated list when you add a seedlist function to a column. For more information, see Data Generation Functions and Parameters.
    Note:
     You can also use the search field at the top of the 
    Functions
     section to find an object.
  11. Edit the rule and click 
    Validate
     to verify that the data generation rule is working correctly.
    If the validation is unsuccessful, a proper error message is displayed. For more information about Functions, Parameters, and Return Values see Data Generation Functions and Parameters.
  12. Click the 
    Insert
     button to insert the validated rule into the cell.
  13. Follow the same steps for other cells in the table.
You have successfully added data generation rules to the table columns. After you add data generation rules to a table, the table becomes a 
used
 table. You can verify this by reviewing the presence of a tick mark in the 
Used
 column. You can now publish the data into the target database schema.
Example: Create Data Generation Rules for the Employee Table
In this example, you create data generation rules for the Employee table so that you can generate and insert synthetic data into the table cells. This synthetic data does not include real data; it includes random data that is as close to the real data as possible. This random data is generated based on the data generation rules that you specify.
The Employee table includes the following columns:
  • employee_id
  • first_name
  • last_name
  • email
  • phone
  • birth_date
  • salary
You add data generation rules to the cells in the Employee table as follows:
Note:
 For more information about specific data functions and their usage, see Data Generation Functions and Parameters.
employee_id
 
  1. Select the 
    NEXT [Next value for column]
     variable from the 
    Variables
     list. This variable adds the next value in the sequence to this cell whenever you publish the table.
    The variable is added as follows to the edit section in the left: 
    ~NEXT~
     
  2. Click 
    Validate
    .
    A random number is generated and is displayed in the field next to the 
    Validate
     button.
  3. Click 
    Insert
    .
    The variable is added to the 
    employee_id
     cell in the table.
first_name
 
  1. Select the 
    randlov(percnull,@seedlist(seedname)@)
     function from the 
    Functions, List of Values
     list. The data function is added as follows: 
    @randlov(percnull,@seedlist(seedname)@)@
     
  2. Search for the 
    seedname
     value (
    FirstName
     in this case) in the pop-up menu and click the seedlist name to add it to the expression.
    The data function is updated as follows: 
    @randlov(percnull,@seedlist(FirstName)@)@
     
  3. Enter 
    0
     as a value for percnull. This function allows you to identify the percentage of rows that are null. Selecting 20 means that 20 percent of the values are designated null.
    The data function is updated as follows: 
    @randlov(0,@seedlist(FirstName)@)@
     
  4. Click 
    Validate
    .
    A random first name is generated based on the provided parameters and is displayed in the field next to the 
    Validate
     button.
  5. Click 
    Insert
    .
    The data function is added to the 
    first_name
     cell in the table.
last_name
 
  • Follow the same steps as for the first_name column. The only difference is that you must select the seedlist name as 
    LastName
    .
email
 
Note:
 The format of the email is.
  1. Select the 
    first_name
     column from the 
    Column, employee
     list.
    The data function is added as follows: 
    ^first_name^
     
  2. Add a dot (.) after the column name and select the 
    last_name
     column from the 
    Column, employee
     list.
    The data function is updated as follows: 
    ^first_name^.^last_name^
     
  3. Select the 
    atsign()
     function from the 
    Functions, String
     list.
    The data function is updated as follows: 
    ^first_name^.^last_name^@atsign()@
     
  4. Add 
    xyz.com
     after the last 
    @
     symbol.
    The data function is updated as follows: 
    ^first_name^.^last_name^@atsign()@xyz.com
     
  5. Click 
    Validate
    .
    A random email ID is generated based on the provided parameters and is displayed in field next to the 
    Validate
     button.
  6. Click 
    Insert
    .
    The data function is added to the 
    email
     cell in the table.
phone
 
  1. Select the 
    randlov(percnull,@seedlist(seedname)@)
     function from the 
    Functions, List of Values
     list.
    The data function is added as follows: 
    @randlov(percnull,@seedlist(seedname)@)@
     
  2. Search for the 
    seedname
     value (
    US Phone no
     in this case) in the pop-up menu and click the seedlist name to add it to the expression.
    The data function is updated as follows: 
    @randlov(percnull,@seedlist(US Phone no)@)@
     
  3. Enter
     0
     as a value for percnull.
    The data function is updated as follows: 
    @randlov(0,@seedlist(US Phone no)@)@
     
  4. Click 
    Validate
    .
    A random phone number is generated based on the provided parameters and is displayed in the field next to the 
    Validate
     button.
  5. Click 
    Insert
    .
    The data function is added to the 
    phone
     cell in the table.
birth_date
 
  1. Select the 
    addranddays(date,min,max)
     function from the
     Functions, Date &/or Time
     list.
    The data function is added as follows: 
    @addranddays(date,min,max)@
     
  2. For the 
    date
     parameter, select 
    SDATE [System Date]
     from the 
    Variables, System Variables
     list.
    The data function is updated as follows: 
    @addranddays(~SDATE~,min,max)@
     
  3. For the 
    min
     parameter, enter -20000 as the minimum value and for the 
    max
     parameter, enter -8000 as the maximum value.
    The data function is updated as follows: 
    @addranddays(~SDATE~,-20000,-8000)@
     
  4. Click 
    Validate
    .
    A random date is generated based on the specified parameters and is displayed in field next to the 
    Validate
     button.
  5. Click 
    Insert
    .
    The data function is added to the 
    birth_date 
    cell in the table.
salary
 
  1. Select the
     addrand(number,min,max)
     function from the 
    Functions, Numeric
     list.
    The data function is added as follows: 
    @addrand(number,min,max)@
     
  2. Enter 100, 1000, and 100000 as values for the number, min, and max parameters respectively.
    The data function is updated as follows: 
    @addrand(100,1000,100000)@
     
  3. Click 
    Validate
    .
    A random value is generated based on the specified parameters and is displayed in field next to the 
    Validate
     button.
  4. Click 
    Insert
    .
    The data function is added to the 
    salary 
    cell in the table.
You have successfully generated data creation rules for the Employee table columns. Now, when you publish the data for this table, the data is generated based on the defined rules and then the generated synthetic data is inserted into the table.