ETL+ Table Load Types

ETL+ load types are Load All, Load Replace, Load Upsert, and Load Append. Every table/file seen on the Target Objects panel has a load type.

The Load panel specifies how data is extracted from it’s source and loaded into the target table in the data warehouse. The load types available on the Load panel are Load All, Load Replace, Load Upsert, and Load Append.

Load Types

The Load panel on the https://dataself.atlassian.net/wiki/spaces/DS/pages/1570013215 offers four load options:
Load All, Load Replace, Load Upsert, and Load Append.

image-20240501-035642.png
Load All (Load panel set to Load All)
image-20240501-035822.png

See also pages on https://dataself.atlassian.net/wiki/spaces/DS/pages/1945600019, and .


Load All

Based on the table’s ETL SQL Statement, reads all data in the source table or file and loads it to the data warehouse table (the destination). In effect, this is a source table/file data replication/mirroring.

Load All is the default setting and is useful when the data volume is small, or there are no ways to perform delta loads.

Load All Process

  • Loads the data to a temporary data warehouse table named TableName_Temp. If the temp table already existed before the process started, Load All will delete the temp table first to be sure it is a clean new load.

  • When the temporary table load completes:

    • If there are no table transformations, ETL+ renames TableName_Temp → TableName.

    • If there are table transformations, ETL+ deletes the data warehouse TableName, does a bulk insert from TableName_Temp into TableName applying transformations, and deletes the temp table.

Temp Tables (<tablename>_Temp)

A temp table (or staging table) is created every time there’s a table load. This temporary table is named <tablename>_Temp where <tablename> is the name of the target table.

The temp table allows its target table to be available for reporting during most of the data load processes (the target table remains unaffected until the very last step of the data load process). Once the temp table has completed a new load process, it takes little time for the target table to incorporate the new data. And then the temp table is deleted.

You might see temp tables in the data warehouse if their tables had errors in the last load attempt or the last load was canceled. Some sources (like OData) might have two temp tables, for instance: ARTran_Temp and ARTran_Temp2.

For more on using the temp table/staging table see .


Replace

Replace is a delta loading option that dramatically speeds up the load after the initial one.

Replace deletes all target table’s rows that meet the filter criteria and re-loads rows that meet that criteria from the source table.

For more see the section.


Upsert

Upsert is a delta loading option that dramatically speeds up the subsequent loads after the initial load. Upsert is a portmanteau – a combination of the words “update” and “insert.”
For more see

When a target table in the data warehouse is empty (has zero rows) Upsert calls a Load All to load all rows from the source table and ends the process there.

The next step in the upsert process is to read the target table to find the Column with Last Modified Value with the highest value. That is, the first step is to query the target table for MAX(Column with Last Modified Value). This high value is analogous to a high watermark. Only those source rows with a watermark >= the high water mark are inserted (added) or updated (changed). The watermark column in source table is the column(s) named in the Column with Last Modified Value.

For more see Load Upsert Process -- Step-by-Step

Update or Insert?

Update existing record. Upsert will update existing rows when the row’s Column with Last Modified Value has changed since the last Upsert. For instance, when a source table’s row’s Column with Last Modified Value was changed since the last ETL load.

Insert new record. Load Upsert inserts new rows from the source table when the source’s column named in the Column with Last Modified Value is greater than or equal to the maximum value of the Column with Last Modified Value on the target table.

The target High Watermark is the maximum (e.g.; highest value) of the target table’s column named in the Column with Last Modified Value.

Upsert does not delete rows with Column with Last Modified Value < the High Watermark imported into the data warehouse. For instance, on a daily upsert refresh, a record deleted a week ago from the source table will not be deleted from the target table in the data warehouse.

Example. All data is loaded in the initial SalesOrderHeader table extraction, and after it only updates rows that have changed since the last load and inserts new ones.

When to Use Load Upsert

  • A large table what is slow to load or a table that needs to updated frequently (e.g.; several times a day).

  • When records are deleted from the source table it’s acceptable to retain them in target table temporarily. (Periodic Load All and Load Replace process will eventually delete those records thereby fully synchronizing the source and target tables.)

  • The source table has a unique key.

  • There is a appropriate column/field available for the Column with Last Modified Value such as a “last modified date-time stamp”.

Parameters

  • Column with Last Modified Value. A always increasing value, usually a date-time column/field, where the value when updated or inserted is >= the High Watermark value.
    Datetime and other data types.

  • Primary Key

Requirements

  • There’s a column flagging modified rows for the Column with Last Modified Value.

  • The table has a primary key.

  • The source table allows filtered and indexed data extraction.

NOTES

  • After the initial load, the column names and formats loaded must be compatible with columns in the target table/data warehouse table.

  • Addition of new columns and/or increase in column formats require one Load All load to sync up.

  • Already loaded records are not deleted. Use Forced Load All (below) to capture deleted records on a less frequent schedule.

Load Upsert Process – Step-by-step

  1. Evaluate <target table>.MAX(<Column with Last Modified Value>. Scan the target table in the data warehouse for the highest value of the column named in Column with Last Modified Value.

    Call this value the Last Highwater mark and save this value for the following steps.

  2. Extract from source. Extract all rows/records from the source table where
    source table.Column with Last Modified Value >= Last Highwater mark.
    Save the extracted rows/records into a temporary staging table.

  3. Delete Selected Rows in Target. Delete all rows in the target table where staging table.Unique Key = target table.Unique Key.

  4. Load from Staging Table. Load (append) all rows in the staging table into the target table.

NOTES

  • Target table refers to a table in the data warehouse. The available target tables are listed in the .

  • Source table refers to a table listed in the .


Append

Use this setting when you want to append the source to the data warehouse table.

If you want to add a DateTimeStamp column, either use a field from the source, or apply a transformation in the SQL Statement (if supported), or use the Design page to set up that transformation (getdate()).

Append Process

  • Append source table rows into the data warehouse table.

    • The load is done on a temporary table similar to Load All. When the load is completed, it’s appended to the target table.

    • To add a DateTime stamp on each append, and the source table not having such column, create a dummy column and use the ETL+ Design page to assign it GETDATE() value.


Job → Forced Load All

Running a Forced Load All on a Refresh Batch guarantees that its tables will have their data completely re-loaded, ignoring preconfigured delta refreshes.

Usage

One can use this feature manually to guarantee the reload of all data at will, or set up scheduled refreshes in conjunction with delta refreshes. For instance, run Replace and/or Upsert on a frequent schedule (ex.: hourly), and Forced Load All in a less frequent schedule (ex.: nightly).

For instance, run Replace and/or Upsert loading every hour, and Forced Replace All once every night. With this setting, already extracted rows that are modified (Replace misses modified records) or deleted on a day (Replace and Upsert miss deleted records) will be captured overnight.

How to configure this:

  1. Set Replace and/or Upsert for the desired table(s) - see

  2. Make a job on the ETL+ Job Page to run the default load (delta) on a frequent schedule.

  3. Create a new job on the ETL+ Job Page to run with Forced Load All:

    1. select the table(s) that use Load type options Replace and/or Upsert on the Job Steps panel of the ETL+ Job Page.

    2. Set this job Forced Load All

    3. schedule this job to run on a less frequent schedule.

 

  • Can often be used in place of a datetime field.

  • The Replace option can specify literals.

  • sections.

  • – the Key+ column on the Design page influences how the load process works.

 

v2023.07