Visualize Test Data Coverage

You do not want to rely on testing (copies of) production data. Apart from legal and privacy objections, production data has high volume (it takes a long time to test) and low variance (it contains few edge cases). You want to avoid overtesting common cases and undertesting edge cases. Because there are more combinations than you can possibly test, your goal is to identify which columns are relevant, which cases you want your tests to cover, and what you consider 100 percent test coverage.
The Test Data Visualizer is a business intelligence-like utility that is designed to help you build better test data. Use it to identify missing combinations of data, analyze invalid sets of combinations, and compare the coverage of different environments (for example, production data compared to QA data or DEV data). The utility lets you measure coverage accurately and track progress. Visualization helps you understand where to generate synthetic data so that you have sufficient data coverage across your test repository.
Tip: Ideal candidates for coverage visualization are columns that have only a few distinct values (such as languages, currencies, user roles, or account types). Do not use it to visualize columns that contain large sets of unique values (such as names, IDs, addresses, or phone numbers), because the result would be meaningless. Look at your test plan to identify the relevant test data attributes, and create a flattened abstraction of the relevant columns to run the visualization on.
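If you want to shortlist candidate columns before opening the visualizer, you can count the distinct values per column yourself. The following is a minimal sketch using only the Python standard library; the function names and the max_distinct cut-off of 10 are hypothetical choices for illustration, not part of the product.

```python
import csv
import io

def distinct_counts(csv_text):
    """Return {column: number of distinct values} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    seen = {name: set() for name in columns}
    for row in reader:
        for name in columns:
            seen[name].add(row[name])
    return {name: len(values) for name, values in seen.items()}

def candidate_columns(csv_text, max_distinct=10):
    """Columns with at most max_distinct distinct values (hypothetical cut-off)."""
    counts = distinct_counts(csv_text)
    return [name for name, n in counts.items() if n <= max_distinct]

SAMPLE = """id,name,role,department
1,Ann,manager,sales
2,Bob,contributor,sales
3,Cho,manager,marketing
4,Dee,contributor,sales
5,Eve,manager,sales
"""
```

With this sample, candidate_columns(SAMPLE, max_distinct=3) keeps role and department and drops the ID and name columns, which have one distinct value per row.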

Open the Test Data Visualizer
To visualize your data coverage, use one of the following methods:
To visualize a CSV file:
  1. Launch C:\Program Files\Grid-Tools\TestData\Visualizer\TestDataVisualizer.exe
    Test Data Visualizer opens.
  2. Click CSV in the toolbar to load a CSV file.
To visualize data from a table in TDM:
  1. Open TDM, and open any data source or data target.
  2. Open a Tables node, then write and execute any SQL statement.
  3. Click the Visualize Data button (top right of the results table).
    Test Data Visualizer opens and loads the table.
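A flattened abstraction is typically a query that returns only the attributes you want to analyze, with joins resolved. As a hedged sketch, the following uses Python's built-in sqlite3 module against a hypothetical in-memory schema (the employees and departments tables are invented for illustration) and turns the result into CSV text that you could save and open with the CSV button.

```python
import csv
import io
import sqlite3

def query_to_csv(conn, sql):
    """Run a SQL query and return its result set as CSV text with a header row."""
    cur = conn.execute(sql)
    header = [desc[0] for desc in cur.description]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cur.fetchall())
    return out.getvalue()

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, full_name TEXT, role TEXT, dept_id INTEGER);
INSERT INTO departments VALUES (1, 'sales'), (2, 'marketing');
INSERT INTO employees VALUES (1, 'Ann', 'manager', 1), (2, 'Bob', 'contributor', 1), (3, 'Cho', 'director', 2);
""")

# Flatten: keep only the low-cardinality attributes relevant to the test plan.
FLAT_SQL = """
SELECT e.role AS role, d.name AS department
FROM employees e JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.emp_id
"""

csv_text = query_to_csv(conn, FLAT_SQL)
```

To load the result, write csv_text to a file (for example with open("flat.csv", "w", newline="")) and select that file after clicking CSV.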
Analyze Coverage of Data Attributes
Decide on two or more attributes whose test data coverage you want to analyze.
Tip: Use the Values column to identify attributes that have few unique values.
In this example, you analyze the coverage for the attributes role and department, using the default interval and spot color settings.
  1. Load a table into the Spot Graph tab and open the Data Attributes section.
  2. Click the X-axis check box for the first attribute.
  3. Click the Y-axis check box for the second attribute.
  4. (Optional) If you inspect more than two attributes, assign related attributes to the same axis.
    Example: Assign credit_card_name to the same axis as account_type.
  5. (Optional) Click File, Settings to configure colors and thresholds.
    • Define spot colors for your low, medium, and good coverage thresholds.
    • Add rows and conditions to visualize more fine-grained coverage thresholds.
  6. Interpret the visualization:
    • The visualizer displays the coverage as a percentage at the top of the diagram.
    • A green spot means good data coverage for testing these two attributes.
      Example: The table contains many rows for testing sales manager and marketing contributor.
      Result: Do not generate more test data of this type. You could test with less data of this type and get equally good test results faster.
    • A yellow spot means medium data coverage for testing these two attributes.
      Example: The table contains enough rows for sales contributor.
    • A red spot means low data coverage for testing these two attributes.
      Example: The table does not contain enough rows for marketing director.
      Result: You can improve test coverage by generating more test data of this type.
    • No spot means that no data is available for testing these two attributes together.
      Example: The table does not contain any rows for testing marketing manager and sales director.
      Result: Focus on adding test data of this type to improve coverage considerably.
(Figure: visualizing test data coverage in a 2D diagram)
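Each spot summarizes how many rows share one combination of the X-axis and Y-axis values. The following sketch reproduces that idea outside the tool. It assumes that coverage means the share of all possible value combinations that occur in at least one row; this page does not document the visualizer's actual formula, and the function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import product

def coverage_grid(rows, x_attr, y_attr, x_values=(), y_values=()):
    """Count rows for every (x, y) value combination.

    x_values / y_values add values that are known but absent from the data,
    so their empty cells count against coverage (hypothetical parameters).
    """
    counts = Counter((row[x_attr], row[y_attr]) for row in rows)
    xs = sorted(set(x_values) | {x for x, _ in counts})
    ys = sorted(set(y_values) | {y for _, y in counts})
    # Counter returns 0 for combinations that never occur.
    return {(x, y): counts[(x, y)] for x, y in product(xs, ys)}

def coverage_percent(grid):
    """Share of possible combinations that occur in at least one row (assumed definition)."""
    if not grid:
        return 0.0
    covered = sum(1 for n in grid.values() if n > 0)
    return 100.0 * covered / len(grid)

def missing_combinations(grid):
    """Combinations with no rows at all (cells that would show no spot)."""
    return [combo for combo, n in grid.items() if n == 0]
```

For example, five sales manager rows, three sales contributor rows, and one marketing director row produce a 3 x 2 grid in which three of six cells are filled, so coverage_percent returns 50.0 and missing_combinations lists the three empty cells.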
If you visualize a large data set, the diagram becomes harder to read. To change the display scale, drag the mouse to select a rectangular group of spots, then right-click and select Zoom to Region.
Reserve or Export Data
Drag the mouse to select a rectangular group of spots, or double-click an individual colored spot to inspect the rows that fulfill these criteria. You can perform the following operations against the selected data:
Filter and Compare
Switch to Test Factory to filter and compare data, and highlight relevant differences.
  • Filter Data
    Refine the data that is visualized in the diagram. The Data Filter window lets you select and deselect individual values for each non-numeric data attribute, and define an interval for numeric values. Click Reset to remove all custom filters.
    • Filter Axes
      Enable this option to hide a column label from the visualization if no data exists for that column. If you disable this option, the column label remains in the visualization even if no data exists for the column. You typically use this option when you visualize missing values or export values.
    • Filter Inspector
      Disable this option to stop applying the same filter to the data inspector window.
    • Auto Update
      Update the visualization while you edit the data filters. Disable this option to improve performance while you define filters for a large data set.
  • Load Override List of Variables
    Load more values for attributes that you know are missing from the test data. Loading these missing values adds them to the axes, so they are considered in the coverage calculation.
    Tip: Use the Export Values operation to create a base CSV file as input template, edit it, and fill columns with values to add.
  • Load Invalid or Must Have Combinations
    Highlight differences in variable combinations between a source and a target data set. A spot that is surrounded by a gray box means that the combination exists in both data sets. A spot without a gray box means that the source data has better coverage than the target data. An empty gray box means that a combination in your target data is missing from the source data.
    Example use cases:
    • Compare production data (source) with QA data (target) to identify missing must-have combinations. First, load the target table and use the Export Values operation to create a CSV file. Then, load the source table and use the CSV file as input for the comparison.
    • Highlight invalid combinations that you do not want to test. Load the source table and use the Export Values operation to create a CSV file as template. Manually edit the CSV so that only invalid combinations remain. Then load the edited CSV as input for the comparison.
  • All Pairs Combination
    Overlay an All Pairs Combination test suite on the display. Use this display to identify the coverage of all attribute pairs. Most bugs can be detected by testing the interactions between all attribute pairs. Testing all attribute combinations is not possible, whereas testing all pairs is a realistic goal.
  • Visualize Missing Values / Restore Inverse
    Display the inverse of the current data. The display now marks attribute combinations that lack coverage with a green spot, and leaves areas that have at least minimum coverage unmarked. Use this visualization to concentrate on identifying missing data.
  • Export Values
    Export the data values shown on the display. Edit the exported CSV file, and use it as comparison input for the Load Override List of Variables or Load Invalid or Must Have Combinations operations.
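The Filter Data operation keeps only rows whose non-numeric values are selected and whose numeric values fall inside an interval. A minimal sketch of that filtering logic follows, assuming rows are dictionaries and that intervals are inclusive at both ends; the product's exact boundary handling is not documented here, and apply_filters is a hypothetical name.

```python
def apply_filters(rows, allowed=None, ranges=None):
    """Keep rows whose values pass every filter.

    allowed: {attribute: set of selected values} for non-numeric attributes.
    ranges:  {attribute: (low, high)} interval for numeric attributes,
             assumed inclusive at both ends.
    Passing no filters keeps every row, similar to clicking Reset.
    """
    allowed = allowed or {}
    ranges = ranges or {}
    result = []
    for row in rows:
        # Drop the row if any non-numeric value is not selected.
        if any(row[attr] not in values for attr, values in allowed.items()):
            continue
        # Drop the row if any numeric value falls outside its interval.
        if any(not (low <= float(row[attr]) <= high) for attr, (low, high) in ranges.items()):
            continue
        result.append(row)
    return result
```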
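The comparison behind Load Invalid or Must Have Combinations can be thought of as set operations on the distinct value combinations of the two data sets. The sketch below is one interpretation under that assumption: it maps "in both" to a spot inside a gray box, "source only" to a spot without a gray box, and "target only" to an empty gray box. The function names are hypothetical, and the tool may weigh row counts rather than mere presence.

```python
def value_combinations(rows, attrs):
    """Set of distinct value tuples for the given attributes."""
    return {tuple(row[a] for a in attrs) for row in rows}

def compare_combinations(source_rows, target_rows, attrs):
    """Classify combinations by which data set contains them."""
    src = value_combinations(source_rows, attrs)
    tgt = value_combinations(target_rows, attrs)
    return {
        "in_both": src & tgt,            # spot surrounded by a gray box
        "source_only": src - tgt,        # spot without a gray box
        "missing_in_source": tgt - src,  # empty gray box
    }
```

For the must-have use case, pass production rows as the source and the combinations exported from QA as the target. For the invalid-combinations use case, a non-empty in_both set would flag invalid combinations that still occur in the source data.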
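To see why all pairs is a realistic goal, the sketch below measures pairwise coverage: for every pair of attributes, it checks which value pairs appear together in at least one row. It does not generate an all-pairs test suite and is not the visualizer's algorithm; pairwise_coverage and the domains parameter are hypothetical.

```python
from itertools import combinations, product

def pairwise_coverage(rows, domains):
    """Measure how many attribute-value pairs occur together in some row.

    domains: {attribute: list of possible values}
    Returns (percent_covered, sorted list of uncovered pairs).
    """
    attrs = sorted(domains)
    # Every value pair that an all-pairs suite would need to cover.
    required = {((a, va), (b, vb))
                for a, b in combinations(attrs, 2)
                for va, vb in product(domains[a], domains[b])}
    # Value pairs that actually occur together in the data.
    seen = {((a, row[a]), (b, row[b]))
            for row in rows
            for a, b in combinations(attrs, 2)}
    covered = required & seen
    percent = 100.0 * len(covered) / len(required) if required else 100.0
    return percent, sorted(required - seen)

DOMAINS = {"os": ["linux", "windows"], "browser": ["chrome", "firefox"], "lang": ["de", "en"]}
SUITE = [
    {"os": "linux", "browser": "chrome", "lang": "de"},
    {"os": "linux", "browser": "firefox", "lang": "en"},
    {"os": "windows", "browser": "chrome", "lang": "en"},
    {"os": "windows", "browser": "firefox", "lang": "de"},
]
```

Here four rows cover all 12 value pairs, while testing every full combination would take 2 x 2 x 2 = 8 rows; the gap widens quickly as attributes and values are added.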
Switch Graph Types
Click Add Graph in the toolbar to add a visualization that best represents your data. The following graph types are available:
  • P-Coords (Parallel Coordinates)
  • Spot Graph
  • Bar Chart
  • Pie Chart
  • Radial Coverage
  • Rect Coverage
  • Data Table
Visualize Data Flow
Open the P-Coords tab to use a Parallel Coordinates visualization that shows how multi-dimensional data flows through the system. The goal of using parallel coordinates is to identify relationships between data dimensions. The thickness of a connector line represents the data coverage: a thick line stands for high coverage, and a thin line for low coverage of this attribute pair. No line means that the attribute pair is not covered.
(Figure: visualization of values occurring together)
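In a parallel coordinates chart, each connector links a value on one axis to a value on the next axis, and its weight reflects how many rows contain both. The following sketch counts those connectors, assuming that lines are drawn only between adjacent axes, as in a standard parallel coordinates layout; pcoords_segments is a hypothetical name, and the mapping from count to pixel width is left to the renderer.

```python
from collections import Counter

def pcoords_segments(rows, axes):
    """Count rows for each connector between adjacent axes.

    Returns {(axis_a, value_a, axis_b, value_b): row_count}. A higher count
    would be drawn as a thicker line; a missing key means no line.
    """
    segments = Counter()
    for row in rows:
        # Pair each axis with the one to its right.
        for a, b in zip(axes, axes[1:]):
            segments[(a, row[a], b, row[b])] += 1
    return dict(segments)
```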
Customize Settings
Click File, Settings to configure colors and thresholds for all graphs.
  • Define custom spot colors to represent low, medium, and good coverage thresholds.
  • Add rows and conditions to visualize more fine-grained coverage thresholds.
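A threshold table of this kind can be modeled as an ordered list of rules, where each row pairs a minimum row count with a color, and adding rows yields finer-grained bands. The following is a hedged sketch of that idea; the rule values and color names are hypothetical and do not reflect the visualizer's default intervals.

```python
# Hypothetical threshold rows: (minimum row count, spot color).
# None stands for "no spot" when a combination has no rows.
DEFAULT_RULES = ((0, None), (1, "red"), (2, "yellow"), (5, "green"))

def spot_color(count, rules=DEFAULT_RULES):
    """Return the color of the highest rule whose minimum the count reaches."""
    color = None
    for minimum, rule_color in sorted(rules, key=lambda rule: rule[0]):
        if count >= minimum:
            color = rule_color
    return color
```

To add a finer band, append another row, for example DEFAULT_RULES + ((10, "blue"),) to flag combinations that are heavily overrepresented.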
You can configure the following Graph Options for the Spot Graph:
  • Color by property: By default, the visualizer uses colors to represent the Test Occurrence attribute; it uses three colors to highlight low, adequate, and too much coverage relative to a given threshold.
    If you select any other data attribute, the visualizer uses unique colors to distinguish property values.
  • Size by property: Disabled by default. Use this setting to represent different values by different dot sizes.
  • Shape by property: Disabled by default. Use this setting to represent different values by different dot shapes.
  • Horizontal zoom
  • Vertical zoom