Example: Generate Insert Scripts for Oracle from Subset

The fastest way to create a subset of data in GTSubset is to generate direct insert scripts.
The fastest way to create a subset of data in GTSubset is to generate direct insert scripts.
In a simple subset design like in the following example, you want to extract out to a file using, for example, DataPump.
To get the rows for the table AIRCRAFT_LAYOUTS, the select must join on AIRCRAFT_TYPES, FLIGHT_ROUTES, AIRLINES, FREQUENT_FLYER_PROFILES and finally PEOPLE. Joining 6 tables is an inefficient task for any database.
The direct insert scripts rely on a temporary schema being present next to the schema you are subsetting. This schema can be empty, or contain empty copies of the tables in source.
The inserts load the data sequentially according to the extract design. The rows for PEOPLE are inserted first, then for ACCESS_CONTROL, the select is from source, but joined on the rows already loaded into target in the first SQL insert. Similarly, once ACCESS_CONTROL is loaded, the rows for AIRLINES need only join on the already loaded rows in ACCESS_CONTROL. This way, very large data sets can be subsetted in a very streamlined way.
After the subset is complete, you can perform a full schema datapump extract of the target (temp) schema, and load the corresponding files into a database on another server if required.
To generate these types of scripts, first design and save an extract in the normal way.
  1. Select ‘Build SQL Insert Script’ as the script generation option from the drop down.
    The Database Action - Build SQL Insert Script window opens.
  2. Fill in the following fields:
    • Action Name
      Defines the prefix name for the generated scripts
    • Max Parallel Processes
      Defines the Oracle Parallel number to use for the SQL Insert, for example 4.
      INSERT  /*+ append parallel(TRAVEL_E.PEOPLE, 4)
    • Target Database
      Defines the database schema into which you want to insert the subsetted data.
    • Create Table As Select
      Specifies whether generated SQL creates and populates the tables in one SQL statement. This is the fastest option in terms of loading of the data, but the tables are created without any constraints, indexes or keys. Enable this option if your target schema is totally empty.
      If you enable this option, many of the other script options are not necessary, for example, check for constraints, create merge script, or check for existing data.
    • Check for constraints
      Specifies whether you want to generate FK disable and enable scripts based on the constraint present in the target schema. Enable this option only if you are loading into a constrained schema, that is, your target schema has some foreign key constraints. Enabling this options prompts you for a connection to your target schema.
    • Create merge scripts
      Specifies whether you want to generate two additional script types, insert and upsert merge scripts. Use these if you want to merge data from source to target and if target already contains data.
    • Check for existing Data
      Specifies whether to add a ‘NOT EXISTS’ clause to the insert statements, in other words only new (not duplicated) data is inserted. Use this if you did not enable 'Create Merge Script', and your target tables contain data.
    • Database link
      Specifies whether to use a link to reference target or source tables, or both. If your target schema does not reside on the same server as source, choose the option to use database links to join the tables. This requires that the database link already exists.
      Joining tables across DB Links is usually pretty slow, so this option is not recommended.
    • Get Metadata from Repository Connection
      Subset has to query the Oracle catalog tables to get meta information for all the tables in an extract to generate the scripts. For large schemas this can be quite slow, and the generation process can take several minutes to complete. If you register your source tables in Datamaker, enabling this option will get the meta info from the Datamaker repository instead, making the generation process much quicker.
  3. Click generate.