Using the Integration Wizard to Create a User-defined Data Source

The integration wizard helps you define data sources and mappings for your environment in order to pull security data into
Symantec ICA
. The following tasks describe how to use the integration wizard:

Configuring a User-defined Data Source

A data source is a database, file, or API source, such as Splunk, ArcSight, or Symantec ICDx, that can connect to the
Symantec ICA
database. Configuring a data source involves first identifying the source, and then writing a query to pull data from the source. A data source can be associated with more than one query, but a query can be related to only one data source.
To configure the data source for a user-defined integration pack, do the following steps:
Connection issues are stored as error messages in the
Log_DataTransformation
table. A sketch for reviewing these messages appears after this procedure. Refer to Troubleshooting the Integration Wizard for more information about troubleshooting connection issues.
  1. Ensure that the server name, port, login name, and password for the data source are available. If the data source is a database, then also have the database name and domain account for the data source.
  2. Ensure that the ports are open on the integration platform (web or database server) and on the external system.
  3. Ensure that the query that will run on the data source returns the expected results.
  4. In the
    Symantec ICA
    administration section, select
    Integration
    , and then select
    Data Sources
    .
  5. Select
    User Defined
    from the list of data sources.
  6. Click
    Create Data Source
    .
  7. Select the data source type. The default data source is a Microsoft SQL Server database. Other data sources are listed.
  8. Enter the data source label. The data source label is displayed in the integration wizard. It does not affect the data integration.
  9. Enter the following information based on the data source type:
    ArcSight
    • Data Source Label: The name for the data source.
    • API Server: The name or IP address of the API server.
    • API Port: The API port to connect to for pulling the data.
    • Proxy: The API connection proxy.
    • API Login: The login name for the API connection. This name and domain must match the LDAP account name in ArcSight.
      To verify the account name, log in to ArcSight Logger console with the name and password. If the login fails, then ensure that the user including the domain name was created correctly in ArcSight.
    • API Password: The password for the API login name.
    • Verify Password: The password for the API login name.
    • Use HTTPS: Select
      Yes
      to use HTTPS, or
      No
      to not use HTTPS.
    Dropbox
    • Data Source Label: The name for the data source.
    • Username: The user name for connecting to the data source. This field is available when using User/Pass authentication mode.
    • Password: The password for the user name to connect to the data source. This field is available when using User/Pass authentication mode.
    • Verify Password: The password for the user name to connect to the data source.
    • Download Directory: The directory with the files to be downloaded to
      Symantec ICA
      . This is a path on the local server where the integration process retrieves the files from the source folder and then places them in the download directory.
    • Source Folder: The folder that has the files. The integration process looks for files in the source directory specified in the query, and copies those files to the download directory before processing them. If the source folder has a remote path that starts with \\, then the integration process uses the authentication specified.
    Elasticsearch
    • Data Source Label: The name for the data source.
    • API Server: The name or IP address of the API server.
    • API Port: The API port to connect to for pulling the data.
    • Proxy: Not applicable.
    • API Login: The login name for the API connection. This name must match the account name in Elasticsearch.
      To verify the account name, log in to Elasticsearch console with the name and password. If the login fails, then ensure that the user was created correctly in Elasticsearch.
    • API Password: The password for the API login name.
    • Verify Password: The password for the API login name.
    • Use HTTPS: Select
      Yes
      to use HTTPS, or
      No
      to not use HTTPS.
    • Protocol Type: The protocol to use to communicate with Elasticsearch.
    If you use the Dev Tools tool in Elasticsearch Kibana, then you can develop the query there, and copy the query content into the query statement field.
    FileSystem
    • Data Source Label: The name for the data source.
    • Server Name: The name of the server that has the files.
    • Authentication Mode: The type of authentication used to connect to the database.
    • Download Directory: The directory with the files to be downloaded to
      Symantec ICA
      . This is a path on the local server where the integration process retrieves the files from the source folder and then places them in the download directory.
    • Source Folder: The folder that has the files. The integration process looks for files in the source directory specified in the query, and copies those files to the download directory before processing them. If the source folder has a remote path that starts with \\, then the integration process uses the authentication specified.
    QRadar
    • Data Source Label: The name for the data source.
    • API Server: The name or IP address of the API server.
    • API Port: The API port to connect to for pulling the data.
    • API Path: The API path to the data.
    • Proxy: The API connection proxy.
    • API Login: The login name for the API connection.
    • API Password: The password for the API login name.
      The API password field can be used with the API login as a password field, or without the API login as an authorized service token from QRadar.
    • Verify Password: The password for the API login name, or the authorized service token.
    • Use HTTPS: Select
      Yes
      to use HTTPS, or
      No
      to not use HTTPS.
    Splunk
    • Data Source Label: The name for the data source.
    • API Server: The name or IP address of the API server.
    • API Port: The API port to connect to for pulling the data.
    • API Path: The API path to the data.
    • Proxy: The API connection proxy.
    • API Login: The login name for the API connection.
    • API Password: The password for the API login name.
    • Verify Password: The password for the API login name.
    • Use HTTPS: Select
      Yes
      to use HTTPS, or
      No
      to not use HTTPS.
    SQL Server
    • Data Source Label: The name for the data source.
    • Server Name: The name of the server that hosts the database.
    • Database Name: The name of the database.
    • Server Port: The port used to connect to the database.
    • Authentication Mode: The type of authentication used to connect to the database.
    Symantec ICDx
    • Data Source Label: The name for the data source.
    • API Host Name: The name of the API host.
    • API Login: The login name for the API connection.
    • API Password: The password for the API login name.
    • Verify Password: The password for the API login name.
  10. Click
    Save
    to save the data source.
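If the data source connection fails, you can review the stored error messages directly in the database. The following is a minimal sketch against the Log_DataTransformation table mentioned above; the table's columns are not listed here, so the query returns all columns and limits the result set. Filter or sort by the table's date column as needed.
  -- Review recent data transformation and connection error messages
  SELECT TOP 50 *
  FROM Log_DataTransformation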

Creating a Query

There are two types of queries that can be associated with a data source. The query can either pull data by querying a table on the data source, or pull data from a Microsoft Excel spreadsheet file or comma-separated values (CSV) file.
When designing a query for a data source, do the following steps:
  • Ensure the query compiles and runs against the data source before using it in
    Symantec ICA
    .
  • Cast the data types as follows:
    • Integer data types to
      int
      data type, such as
      SELECT CAST (EP.EPEventID as int) as SourceEventID
    • NVARCHAR data types to
      NVARCHAR
      data type, such as
      SELECT CAST (DIP.Name as nvarchar(100)) as DestinationIPAddress
    • DATETIME data types to
      datetime
      , such as
      SELECT CAST (EP.EventDate as datetime) as EventDate
  • Include a date and event identifier in the query.
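The following sketch combines these guidelines in a single query. It is a minimal example rather than a definitive implementation: the source tables (EPEvents and DestinationIPs) and the join condition are assumptions, and the column names are reused from the casts shown above.
  SELECT CAST(EP.EPEventID as int) as SourceEventID,
         CAST(EP.EventDate as datetime) as EventDate,
         CAST(DIP.Name as nvarchar(100)) as DestinationIPAddress
  FROM EPEvents EP
  JOIN DestinationIPs DIP ON DIP.EPEventID = EP.EPEventID
The query casts each column to the expected data type, and returns both a date (EventDate) and an event identifier (SourceEventID), as recommended above.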

Querying a Table

To create a query that uses a table, do the following steps:
  1. Right-click the data source, and select
    Create Query
    .
  2. Enter the query name and description.
  3. (Elasticsearch only) Enter the query index name or index alias.
  4. Enter the query statement.
    An Elasticsearch query must specify a
    _source
    array that includes the properties to return. The properties become the columns in the staging table.
    The Elasticsearch query must also specify
    sort
    on a single property. This property is used for watermarking against the queried index. Each integration run uses this property with the currently persisted value to determine where to start pulling new data.
    The specification of
    script_fields
    is supported for Elasticsearch. Each script_field becomes a column in the specified staging table. A sample Elasticsearch query body is sketched after this procedure.
  5. Enter a table name. The name should include the
    Stg_
    prefix, and identify the client and entity, such as
    Stg_
    <ClientName><EntityName>
    . If the table name field is left blank, then a number is generated for the table name, such as
    Stg_11
    .
  6. Enter the number of days to retain data in the staging table.
  7. Enter the number of minutes to wait before restarting a non-processing job.
  8. (Optional) Enter custom options for the integration, such as adding a column or deleting a row. For example, entering
    -AddColumn:ImportDate:2019-03-21
    adds a column that includes the date for March 21, 2019.
  9. (Splunk only) Enter the following fields for Splunk.
    The following fields override the custom settings in the XML file. Individual linked server queries can use different settings.
    1. Enter the application name. This is the same name that appears on the Splunk server. Default is search.
    2. Enter the user name. Default is the login user name for the linked server.
    3. Enter the polling interval. This is the amount of time for reporting the debug information for the importer. Default is 30.
    4. Enter the timeout value. This is the amount of time to wait for completion after starting the query. Default is 3600.
    5. Select the log off option. Default is true.
    6. Enter the maximum number of retries for the query. Default is 5.
    7. Enter the search timeout value. Default is 3600.
    8. Select the job completion option. Default is to remove the job after completion.
    9. Enter the number of minutes to search the buffer. Default is 180 minutes.
    10. Enter the page size. Default is 25000.
    11. Enter the page timeout. This is the amount of time to wait for a page to be returned by the query.
  10. (Optional) Test the query, and review the results, as follows:
    1. Click
      Test Query.
    2. Enter the sample size and timeout value.
    3. Click
      Run
      . The query runs or returns an error. If the query runs, then the results are shown in the Query Results field.
    4. Adjust the query and run it again, if needed.
    5. When the query meets your requirements, click
      Use Query
      . Any changes to the original query are transferred to the Data Source Query Editor. The updated query is not saved until Save is clicked.
  11. Click
    Save
    . Clicking Save runs the query and creates the table. The Watermarking/Scheduling page opens.
    If the staging table already exists, then you are prompted if you want to recreate the staging table.
  12. Enter the watermark column, value, and search slice. The watermark should be a unique, incremental value. Not specifying the watermark field causes all data to be queried and uploaded the first time the query is run. The system then sets the watermark to the last data value of the query. The search slice sets the interval time, in minutes, for data retrieval. This setting divides the amount of data so that the system does not violate quotas. For example, setting this to 180 retrieves 3 hours of data at a time. The default is 120.
    The Splunk importer relies on the Splunk
    _time
    field for retrieving time series information as well as for storing the watermark for the subsequent data pulls. If another field is used to store the watermark, then it would conflict with how Splunk retrieves the information, which could create unexpected results in the staging data.
    For example, Splunk uses the
    _time
    field when setting up time boundaries on a query. If Splunk is set to get the previous 12 hours of data, then internally Splunk parses the query and only retrieves data that has a
    _time
    field less than or equal to
    -12hours
    .  If the system is set to use a different field in the query results as the watermark, such as
    LastModifiedDate
    , then it could introduce unexpected behavior and results.
    The query search should return both the
    _time
    field as well as an epoch representation of the
    _time
    field. This may be accomplished by using the
    eval
    and
    table
    commands, as shown in the following:
    index="_internal" (sourcetype="splunkd_ui_access") (method="DELETE") | eval epochtime = _time | table _time, epochtime, host, source, sourcetype, clientip, bytes
    • For versions earlier than 6.5.4 MP1:
      The
      epochtime
      field would be set as the Watermark Column, and an appropriate value would be stored in the Watermark Value field, such as the following:
      Watermark Column: epochtime
      Watermark Value: 1609459200
    • For version 6.5.4 MP1 and later:
      The
      epochtime
      field or the _time field would be set as the Watermark Column. To use the
      epochtime
      field, an appropriate value would be stored in the Watermark Value field. To use the
      _time
      field, select it as the Watermark Column and set the appropriately-formatted ISO8601 time with offset in the Watermark Value field, such as the following:
      Watermark Column: _time
      Watermark Value: 2021-01-01T00:00:00.000-05:00
  13. (Optional) Click the plus sign to add the queries that must be run before running this query. If a selected query must complete successfully before running this query, then select the
    Must Succeed
    check box.
  14. (If no queries must be run before this query) Set the date and time of the initial run. This field determines when the job is run in
    Symantec ICA
    . The schedule is not adhered to if you added queries that must run before this query.
  15. (If no queries must be run before this query) Set the frequency of the query. Data source processing is separate from the nightly job processing. Setting the processing interval to Daily is usually sufficient.
  16. Enter the timeout interval in seconds.
    Entering 0 (zero) in the field sets the timeout interval to one week.
  17. (Optional) Click
    Test Query
    , and then click
    Run
    to test the processing job. You can modify the query, and save it by clicking
    Use Query
    .
  18. Click
    Save
    .
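The Elasticsearch query statement described in step 4 can be sketched as follows. This is an assumption-based example: the index fields (event_id, event_date, user_name, event_size) are hypothetical, and your query may need additional clauses such as filters.
  {
    "_source": ["event_id", "event_date", "user_name"],
    "sort": [ { "event_date": "asc" } ],
    "script_fields": {
      "event_size_kb": {
        "script": { "source": "doc['event_size'].value / 1024" }
      }
    },
    "query": { "match_all": {} }
  }
Each property in the _source array, plus the event_size_kb script_field, becomes a column in the staging table, and the single sort property (event_date) serves as the watermark for subsequent integration runs.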

Creating a Data Integration

A data integration pulls specific data from a data source, and maps the data to
Symantec ICA
fields. First, an integration pack is created, then an import rule is defined, and then the rule is mapped to fields in a
Symantec ICA
LDW table.
An import rule can have multiple import rule mappings. For example, an import rule for DLP data could have a mapping for computer endpoints, another mapping for applications, and another mapping for users.
To create a data integration using the integration wizard, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration.
  2. Select the
    Data Integrations
    tab.
  3. Click
    Create Integration Pack
    .
  4. Enter a name and description for the pack, and click
    Save
    .
  5. Right-click the integration pack, and select
    Create Import Rule.
  6. Enter a name and description for the rule.
  7. (Optional) Create additional import rules, as needed.
  8. Right-click the import rule, and select
    Create Import Rule Mapping
    .
  9. Enter or set the following items:
    • Mapping Name: The name of the mapping.
    • Description: The description of the mapping.
    • Data Source: The data source. Available data sources are listed in the field.
    • Query: The query associated with the data source. Available queries are listed in the field.
    • Symantec ICA
      Processing Watermark: Leave this field blank.
    • Run Intra-day: Select
      Yes
      or
      No
      . This field refers to the intraday ingestion process from the data source to the staging table. Setting this to Yes means that the data is staged and loaded in the staging tables for that data source, regardless of the event or entity type. Only DIM data processes the cube-related steps in the job.
    • Run Order: The order of the import process compared to other processes.
      Refer to Considerations Before Using the Integration Wizard for information about the recommended order for the import process.
    • Compression Thresholds: Sets the amount of session window time for inactivity between similar events. For example, if the compression threshold is set to 60 seconds and events come in every second with the same signature, then that session window continues to count the events until there is a 60-second period when no more events come in. Leave the field blank to use the system default.
    • Entity Type: The type of entity associated with the mapping, such as Computer Endpoint or Web Activity.
    • Update Pre-Processing Table: Updates the pre-processing table.
    • Create Entities: Select
      Yes
      to create entities. Select
      No
      to only update existing entities.
      If set to No, then select the entity key.
  10. Map the source fields to the listed
    Symantec ICA
    fields. The listed fields are specific to the entity type.
    • If the type is Source Column, then select the value for the entity.
    • If the type is Formula, then enter a formula or select one from the list in the Value field. A formula is any valid SQL statement that returns a value. Use
      { }
      (braces) for placeholder variables, and
      [ ]
      (brackets) for absolute variables. A default value can also be set.
      The formula option allows you to act on data in the staging tables before the data goes to the LDW tables. For example, a formula could add a prefix to incoming data, or convert alphabetic country codes to numeric country codes. A sample formula is sketched after this procedure.
    • If there are name fields, then you must map the name field to a name and domain field.
      The entity column format, such as NVARCHAR or DATETIME, appears with the entity column name.
  11. Click
    Save
    .
  12. Run the nightly processing job after creating the data sources and mappings. Until the nightly processing job is run, the data is not imported to
    Symantec ICA
    .
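The following is a minimal sketch of the formula option described in the mapping step above. The placeholder column names ({HostName} and {CountryCode}) are hypothetical; use the placeholders that correspond to the columns returned by your query.
  -- Add a prefix to incoming host names:
  'EP-' + CAST({HostName} AS NVARCHAR(100))
  -- Convert alphabetic country codes to numeric ISO 3166-1 codes:
  CASE {CountryCode} WHEN 'US' THEN 840 WHEN 'CA' THEN 124 ELSE NULL END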

Editing, Enabling, Disabling or Deleting a Data Source or Data Integration

After creating an integration pack with the integration wizard, the data source and data integration can be edited, enabled, disabled, or deleted as described in the following sections.
When you reset a data source, you must create a new data source connection instead of editing an existing connection. Editing an existing data source connection causes the existing incidents to be overwritten by the new incidents.

Editing a Data Source

To edit a data source, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration
    , and then select
    Data Sources
    .
  2. Right-click the data source, and then select
    Edit Data Source
    .
  3. Edit the fields, as needed.
  4. Click
    Save
    to save the edits.

Editing Data Integration Settings

To edit a data integration setting, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration.
  2. Select the
    Data Integrations
    tab, right-click the pack name, rule or rule-mapping name, and select the edit option. For example, right-click a mapping name to edit the rule mapping.
  3. Edit the fields, as needed.
  4. Click
    Save
    to save the edits.

Enabling, Disabling or Deleting an Integration Pack

To enable, disable or delete an integration pack, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration.
  2. Select the
    Data Integrations
    tab.
  3. Right-click the integration pack, and then do one of the following:
    • To enable the integration pack, select
      Enable Integration Pack
      .
    • To disable the integration pack, select
      Disable Integration Pack
      .
    • To delete the integration pack, select
      Delete Integration Pack
      , and click
      Yes
      to confirm the deletion, or
      No
      to cancel the deletion.

Enabling, Disabling or Deleting a Data Source

To enable, disable or delete a data source, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration.
  2. Select the
    Data Sources
    tab.
  3. Right-click the data source, and then do one of the following:
    • To enable the data source, select
      Enable Data Source
      .
    • To disable the data source, select
      Disable Data Source
      .
    • To delete the data source, select
      Delete Data Source
      , and click
      Yes
      to confirm the deletion, or
      No
      to cancel the deletion.

Enabling, Disabling or Deleting an Import Rule

To enable, disable or delete an import rule from a data integration, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration.
  2. Select the
    Data Integrations
    tab.
  3. Click the plus sign next to the integration pack.
  4. Right-click the import rule, and then do one of the following:
    • To enable the import rule, select
      Enable Import Rule
      .
    • To disable the import rule, select
      Disable Import Rule
      .
    • To delete the import rule, select
      Delete Import Rule
      , and click
      Yes
      to confirm the deletion, or
      No
      to cancel the deletion.

Enabling, Disabling or Deleting an Import Rule Mapping

To enable, disable or delete an import rule mapping from a data integration, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration.
  2. Select the
    Data Integrations
    tab.
  3. Click the plus sign next to the integration pack, and then click the plus sign next to the import rule.
  4. Right-click the import rule mapping, and then do one of the following:
    • To enable the import rule mapping, select
      Enable Import Rule Mapping
      .
    • To disable the import rule mapping, select
      Disable Import Rule Mapping
      .
    • To delete the import rule mapping, select
      Delete Import Rule Mapping
      , and click
      Yes
      to confirm the deletion, or
      No
      to cancel the deletion.

Enabling, Disabling or Deleting a Data Query

A data query can be enabled or disabled after it is created. Disabling a query lets you stop the query without editing the schedule or deleting the query when its data source is not available. To enable or disable a data query, do the following steps:
  1. In the
    Symantec ICA
    administration section, select
    Integration
    , and then select
    Data Sources
    .
  2. Expand the data source that has the query.
  3. Right-click the data query, and then do one of the following:
    • To enable the data query, select
      Enable Query
      .
    • To disable the data query, select
      Disable Query
      .
    • To delete the data query, select
      Delete Query
      .
    The status of the query is shown in the query details.