ETL+ Load Replace

Load Replace landing page for help links from ETL+.

The Load Replace panel

The user interface for Load Replace appears to represent a conventional, SQL like, filter as might appear in a SQL WHERE clause. However, the internal/behind-the-scene process is a multi-step process. Internally, first the T-SQL expression (right hand side) is evaluated, then a filtered extract from the source using the query system used in the connection to the source object, and finally a load. Most of the time the internal logic anticipates your needs and the end results work out as expected.

Concepts

Replace is a complement to Load All. For comparison, Load All extracts all records from the source table and loads them all into the target table in the data warehouse, completely ‘replacing’ all rows in the target table with new data. (Internally, Load All ensures that all rows in the target table are ‘replaced’ by first deleting all records in that target before beginning to add (load) new rows.)

Replace works most easily with:
Data that doesn’t change after some date or interval of time or
Data where you only need to extract the more recently added records.

 

Proposed instructional text:
Initially:  Load All    Ongoing:  Replace target rows based on the (filter?) criteria.  

Call Outs:

1 = field/column from source table (in Source Object panel)
Proposed Label <candidates) : Source, Source field, filter

2 = constant or T-SQL operator on target (data warehouse) table.
Proposed Label : Target T-SQL

 

Working on the text to appear on the panel. Currently we are going with
Initially:  Load All    Ongoing:  Replace target rows based on the (filter?) criteria.  

 

Example

In this example the filter criteria is “InvoiceDate >= Max(InvoiceDate)".
Field (dropdown menu) shows “InvoiceDate" from the source table.
>= (right hand side) shows Max(InvoiceDate) from the target table.

The Replace Concept

Scenarios & Use Cases

  1. Using a Constant/literal value in the T-SQL/right hand expression.

  2. Using a MAX function in the T-SQL/right hand expression.

  3. Using a T_SQL datetime function in the T-SQL/right hand expression.

 

  1. Using a datetime field for the source field/left hand side.

  2. Using a sequentially incremented key (e.g. a invoice id that is based on a counter that is incremented by 1 after every new invoice record is added).

Concepts

Replace is a type of delta loading option that dramatically speeds up the load after the initial one.

The initial replace is a special case. When a MAX( ) aggregate function is specified in Target T-SQL entry and Replace is run against a new (and therefor empty) target table the load operation is Load All.

Replace deletes all target table’s rows that meet the filter criteria and re-loads rows that meet that criteria from the source table. For example, all data is loaded in the initial InvoiceHeader table extraction, and after it only adds/replaces rows where the control column values >= last extracted InvoiceDate.

For more see ETL+ Replace Concepts

When to Use

  • There’s a single column identifying the relative order in which rows were added to the table. For example a Document Date or an always ascending Document Number.

  • To re-load all records for a recent period that might have records modified and/or deleted. Examples of expressions for reloading recent periods:

    • Last 7 days: DATEADD(DAY, - 7, GETDATE())

    • One calendar month ago: DATEADD(MONTH, - 1, GETDATE())

    • Last day of of the prior calendar month: EOMONTH(GETDATE(), - 1)

    • This calendar month: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 1))

    • Last two calendar months: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 2))

    • This calendar year: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()), 0)

    • Last two calendar years: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 1, 0)

  • Do NOT use columns that can have their content updated such as LastModifiedDateTime columns. For tables with such columns, use Upsert below instead.

Parameters

  • Select a column in the Field dropdown to function as the Replace filter. This column should identify the order of the table’s row creation.

  • Adjust the Filter Value if needed. By default, the filter value will be Max(ColunmName). You may also customize the Filter Value with values such as:

    • Constants. Ex.: ‘1/1/2022’

    • MS SQL expressions. Ex.: getdate()-30 to Replace all records for the past 30 days.

Requirements

  • Availability of a column to control the delta load subset. Example: a date or a sequentially growing value.

  • The source table allows filtered data extraction.

  • After the initial load, delta loads need columns and formats compatible with the data warehouse table. Addition of new columns and/or increase in column formats require a Load All load to sync up.

  • Rows prior to the max control column value don’t change in the data warehouse. If that can happen, use

Forced Load All (below) to capture modified or deleted records on a less frequent schedule.

Replace Process

1st / Initial Replace (Special Cases)

Concepts. Before Load Replace can find data to replace in the target data warehouse table there must be rows in the table. Pragmatically, the first time Load Replace is run on a target table we want to load all data from the source.
Logic. What actually triggers a Load All, or it’s logical equivalent, is the resolution of the T-SQL entered on the right hand / Target T-SQL side. Any resolution of the Target T-SQL clause that results in a NULL triggers a Load All operation.

  • A T-SQL MAX function against a table that is empty (contains zero records) or doesn't yet exist is evaluated as a NULL.

Ongoing

  1. Evaluates the Target T-SQL clause.
    In the screenshot above, this would be the maximum value of Invoice Date.

  2. IF the Target T-SQL clause from the step above resolves to NULL
    THEN run Load All and end the load.
    Otherwise continue with the next step.

  3. Delete all data warehouse rows where Filter Column >= Filter Value.

  4. Load (append) all source table rows where Filter Column >= Filter Value onto the data warehouse table.

    • The load is first written into a temporary table. Next the appended to the target table.

 

Related Pages