ETL+ Load Replace
Load Replace landing page for help links from ETL+.
The Load Replace panel
The user interface for Load Replace appears to represent a conventional, SQL like, filter as might appear in a SQL WHERE clause. However, the internal/behind-the-scene process is a multi-step process. Internally, first the T-SQL expression (right hand side) is evaluated, then a filtered extract from the source using the query system used in the connection to the source object, and finally a load. Most of the time the internal logic anticipates your needs and the end results work out as expected.
Examples
-
ETL+ Load Replace Example -- PK on Source Field/Watermark — Examples of source data and the target tables that result from a Load Replace.
-
ETL+ Load Replace Example -- Keys & Indexes for Target Table, DW PK not on Source Field/Watermark — Examples of source data and the target tables that result from a Load Replace focusing on the keys and indexes created for the resulting target table created in the data warehouse.
For Excel and .csv file data sources designating a column as a DW PK on the Design page has no impact on the columns in the target table that are indexed. -
ETL+ Load Replace Example -- Duplicate key in Source & PK on Source Field/Watermark — Examples of source data and the target tables that result from a Load Replace.
-
ETL+ Load Replace Example 2 - Primary Key Prevents Duplicates? — Examples of source data and the target tables that result from a Load Replace.
Key point: The so called DW PK column in the Design page does not put a PK in the target table!
Concepts
Replace is a complement to Load All. For comparison, Load All extracts all records from the source table and loads them all into the target table in the data warehouse, completely ‘replacing’ all rows in the target table with new data. (Internally, Load All ensures that all rows in the target table are ‘replaced’ by first deleting all records in that target before beginning to add (load) new rows.)
Replace works most easily with:
Data that doesn’t change after some date or interval of time or
Data where you only need to extract the more recently added records.
Proposed instructional text:
Initially: Load All Ongoing: Replace target rows based on the (filter?) criteria.
Call Outs:
1 = field/column from source table (in Source Object panel)
Proposed Label <candidates) : Source, Source field, filter
2 = constant or T-SQL operator on target (data warehouse) table.
Proposed Label : Target T-SQL
Working on the text to appear on the panel. Currently we are going with
Initially: Load All Ongoing: Replace target rows based on the (filter?) criteria.
Example
InvoiceDate >= Max(InvoiceDate)
".Field (dropdown menu) shows “
InvoiceDate
" from the source table.>= (right hand side) shows
Max(InvoiceDate)
from the target table.The Replace Concept
Scenarios & Use Cases
Using a Constant/literal value in the T-SQL/right hand expression.
Using a MAX function in the T-SQL/right hand expression.
Using a T_SQL datetime function in the T-SQL/right hand expression.
Using a datetime field for the source field/left hand side.
Using a sequentially incremented key (e.g. a invoice id that is based on a counter that is incremented by 1 after every new invoice record is added).
Concepts
Replace is a type of delta loading option that dramatically speeds up the load after the initial one.
The initial replace is a special case. When a MAX( )
aggregate function is specified in Target T-SQL entry and Replace is run against a new (and therefor empty) target table the load operation is Load All.
Replace deletes all target table’s rows that meet the filter criteria and re-loads rows that meet that criteria from the source table. For example, all data is loaded in the initial InvoiceHeader table extraction, and after it only adds/replaces rows where the control column values >= last extracted InvoiceDate.
For more see ETL+ Replace Concepts
When to Use
There’s a single column identifying the relative order in which rows were added to the table. For example a Document Date or an always ascending Document Number.
To re-load all records for a recent period that might have records modified and/or deleted. Examples of expressions for reloading recent periods:
Using date fields:
Last 7 days: DATEADD(DAY, - 7, GETDATE())
One calendar month ago: DATEADD(MONTH, - 1, GETDATE())
Last day of of the prior calendar month: EOMONTH(GETDATE(), - 1)
This calendar month: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 1))
Last two calendar months: DATEADD(DAY, 1, EOMONTH(GETDATE(), - 2))
This calendar year: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()), 0)
Last two calendar years: DATEADD(YY, DATEDIFF(YY, 0, GETDATE()) - 1, 0)
Using fiscal period fields (YYYYPP):
Beginning of this year: SELECT [FY_NoInt4]*100+1 FROM [SchemaName].[4_Today]
Beginning of last year: SELECT [FY_NoInt4]*100+1 FROM [SchemaName].[4_Today]
From X periods ago (assuming 12 periods/yr. For 13per/yr, replace 88 with 87):
SELECT FP_YYYYPP_NoInt6 - CASE WHEN [FP_NoInt2]>X THEN X ELSE (88+X) END FROM [SchemaName].[4_Today]
Do NOT use columns that can have their content updated such as LastModifiedDateTime columns. For tables with such columns, use Upsert below instead.
Parameters
Select a column in the
Field
dropdown to function as the Replace filter. This column should identify the order of the table’s row creation.Adjust the
Filter Value
if needed. By default, the filter value will beMax(ColunmName)
. You may also customize theFilter Value
with values such as:Constants. Ex.: ‘1/1/2022’
MS SQL expressions. Ex.: getdate()-30 to Replace all records for the past 30 days.
Requirements
Availability of a column to control the delta load subset. Example: a date or a sequentially growing value.
The source table allows filtered data extraction.
After the initial load, delta loads need columns and formats compatible with the data warehouse table. Addition of new columns and/or increase in column formats require a Load All load to sync up.
Rows prior to the max control column value don’t change in the data warehouse. If that can happen, use
Forced Load All (below) to capture modified or deleted records on a less frequent schedule.
Replace Process
1st / Initial Replace (Special Cases)
Concepts. Before Load Replace can find data to replace in the target data warehouse table there must be rows in the table. Pragmatically, the first time Load Replace is run on a target table we want to load all data from the source.
Logic. What actually triggers a Load All, or it’s logical equivalent, is the resolution of the T-SQL entered on the right hand / Target T-SQL side. Any resolution of the Target T-SQL clause that results in a NULL triggers a Load All operation.
A T-SQL
MAX
function against a table that is empty (contains zero records) or doesn't yet exist is evaluated as a NULL.
Ongoing
Evaluates the Target T-SQL clause.
In the screenshot above, this would be the maximum value of Invoice Date.IF the Target T-SQL clause from the step above resolves to NULL
THEN run Load All and end the load.
Otherwise continue with the next step.Delete all data warehouse rows where Filter Column >= Filter Value.
Load (append) all source table rows where Filter Column >= Filter Value onto the data warehouse table.
The load is first written into a temporary table. Next the appended to the target table.
Related Pages
T-SQL String Constants (String Literals) The Replace option can specify literals.