ETL+ Load Replace

Load Replace landing page for help links from ETL+.

The user interface for Load Replace appears to represent a conventional, SQL like, filter as might appear in a SQL WHERE clause. However, the internal/behind-the-scene process is a multi-step process.
Most of the time the internal logic anticipates your needs and the end results work out as expected. For more see the Replace Process section below.

Concepts

Replace is a complement to Load All. Replace works like a Load All based on a selected records (a filtered extract) from the source table. For comparison, Load All extracts all records from the source table and loads them all into the target table in the data warehouse, completely ‘replacing’ all rows in the target table with new data.

Load Replace Adds, Updates and/or Deletes Rows in the Target Table

Internally, Load All ensures that all rows in the target table are ‘replaced’ by first deleting all records in that target before beginning to add (load) new rows.

Use Cases

Replace works most easily with:
Data that doesn’t change after some date or interval of time or
Data where you only need to extract the more recently added records.

Load Replace Panel

The on page instructions for Load Replace read:
“1st: Insert rows based on the condition. Ongoing: Replace rows based on the condition.” WHERE

  • “1st” means the load operation when the target table is empty (contains zero rows).

  • The “condition” refers to the conditional expression Source Field >= Target Value or Constant.

Example 1: Load panel configured for Load Replace

Source Field. Name of the field/row on the source table.
This column should identify the order of the rows creation in the table.

Target Value or Constant. Is a T-SQL expression performed on the target table.
This expression is evaluated against the target table and the resulting value is used to filter the data extracted from the source table.
The expression can be:

Example

Example 2: When evaluated the value of the expression in the Target Value or Constant field is = Today’s date - 7 days. Please note the recommendation below.
  • Last 7 full days. Returns a date data type: DATEADD(DAY,-7,CONVERT(DATE, GETDATE()) 

    • When run on 2024-05-07 at 10:24:48.168  results in a date =  2024-05-01

  • Last 7 days. Returns a datetime data type: DATEADD(DAY, - 7, GETDATE())

    • When run on 2024-05-07 at 10:24:48.168  results in a datetime =  2024-05-01 10:24:48.168

    • The datetime value = current datetime - (7 X 24 hours)

  • For Sage 100 data sources refer to the https://dataself.atlassian.net/wiki/spaces/DS/pages/2277867534 page.


When to Use Replace

  1. When there is a single column/field from the Source Field box that:

    • identifies the relative order in which rows were added to the table

    • and where the value assigned to the column is not changed.

For example a Document Date or an always ascending Document Number where the value is assigned once and not changed.

Do Not use columns that can have their content updated such as LastModifiedDateTime columns. If all date columns can be periodically updated then use Load Upsert instead.

  1. When you only need to re-load those records from a recent period that might have had records modified and/or deleted.

Date expressions for Target Value or Constant box

  • Last 7 days: DATEADD(DAY, - 7, GETDATE())

  • Last day of of the prior calendar month: EOMONTH(GETDATE(), - 1)

When Source Fields with DateTime data types contain only dates. (Time is always = 00:00:00.000)

If the date column selected for Source Field only carries date values (no time, or time = 00:00:00.000) then use a CONVERT(DATE, GETDATE()) expression instead of just GETDATE() in Target Value or Constant.

Load Replace is a several step process involving queries against the target table in the data warehouse and a extract from the data source. These steps reply on different file or data base processing systems. This difference in software, variations in SQL dialects, etc. can pose challenges for maintaining logical consistency across processing systems.

Know Issues with Sage100 Date & DateTime Columns

When run on 2024-05-03 at 10:24:48.168  the expression DATEADD(DAY,-2,CONVERT(DATE, GETDATE())    results in a datetime value of  2024-05-01 00:00:00.000.

When a Sage100 table (source type ODBC) is extracted as part of a Load Replace, the step in the Load Replace process will evaluate the WHERE logic as expected but Sage100 will ignore the time portion of the datetime column.

WHERE 2024-05-01 00:00:00.000 >= 2024-05-03 10:24:48.168

For Sage100 see also especially .

 

Fiscal Period expressions (YYYYPP) for Target Value or Constant box

  • Beginning of this year: SELECT [FY_NoInt4]*100 FROM [SchemaName].[4_Today]

For more examples see

Load Replace Requirements

  • Availability of a column to control the delta load subset. Example: a date or a sequentially growing value.

  • The source table allows filtered data extraction.

  • After the initial load, delta loads need columns and formats compatible with the data warehouse table. Addition of new columns and/or increase in column formats require a Load All load to sync up.

  • Rows prior to the max control column value don’t change in the data warehouse. If that can happen, use


Force Load All Option

The Force Load All option on the Job properties panel of the the ETL+ Job page supports maintaining tables normally refreshed with Load Replace.


The Replace Concept

Scenarios & Use Cases

  1. Using a Constant/literal value in the T-SQL/right hand expression.

  2. Using a MAX function in the T-SQL/right hand expression.

  3. Using a T_SQL datetime function in the T-SQL/right hand expression.

 

  1. Using a datetime field for the source field/left hand side.

  2. Using a sequentially incremented key (e.g. a invoice id that is based on a counter that is incremented by 1 after every new invoice record is added).

Concepts

Replace is a type of delta loading option that dramatically speeds up the load after the initial one.

The initial replace is a special case. When a MAX( ) aggregate function is specified in Target T-SQL entry and Replace is run against a new (and therefor empty) target table the load operation is Load All.

For example, all data is loaded in the initial InvoiceHeader table extraction.

After the initial replace, all runs of Load Replace 1) delete all target table’s rows that meet the filter criteria then 2) re-loads rows that meet that criteria from the source table.

Replace Process

1st / Initial Replace (Special Cases)

Concepts. Before Load Replace can find data to replace in the target data warehouse table there must be rows in the table. Pragmatically, the first time Load Replace is run on a target table we want to load all data from the source.

Load Replace – Step-by-step

  1. Evaluate Target Value or Constant. Evaluate the T-SQL expression in Target Value or Constant clause.
    The expression evaluations to a single value. Save this value for the following steps.

  2. First time? IF the Target Value of Constant expression from the step above resolves to NULL
    THEN run Load All and end the load.
    Otherwise continue with the next step.

  3. Delete Selected Rows in Target. Delete all rows in the target table in the data warehouse where Source Field >= Target Value or Constant.

  4. Extract from Source. Extract all rows/records from the source table where Source Field >= Target Value or Constant into a temporary staging table.

  5. Load from Staging Table. Load (append) all rows in the staging table into the target data warehouse table.

 

 

  • The Replace option can specify literals.

  • Use on Job page

  • – see section on Tests for how to test these date functions in SSMS.

 

  • – Use of CONVERT(Date, ... function