Delta / Incremental Loads
Incrementally (or delta) loading data after an initial full data load is a widely used ERP scenario. The delta load types in ETL+ are ETL+ Load Upsert, ETL+ Load Replace and ETL+ Load Append. Work in progress.
In the context of ETL, a delta load (also known as a incremental load) refers to loading/refreshing only the changes or updates that have occurred since the last load.
Delta Data Loading by using a Watermark
In this case, you define a watermark in your source database. A watermark is a column that has the last updated time stamp or an incrementing key. The delta loading solution loads the changed data between an old watermark and a new watermark. –
An incremental/delta load is the selective movement of data from one system to another. An incremental load pattern will attempt to identify the data that was created or modified since the last time the load process ran. The selectivity of the incremental load procedure usually reduces the system overhead required for the ETL process.
Initial Loads vs Incremental/Delta Loads.
Initial loads (Load All in ETL+) loads the entire source table from the source into the target data warehouse table. The difference in performance between delta load and initial load depends on factors such as data volume, structure, target, and load type.
Created or Modified (Updated) Indicators
Columns which indicate a row/record was added or updated/modified recently. These columns are sometimes refereed to as watermark columns
CreatedAt/InsertedAt Columns
A column that indicates the order in which rows were added to a table are typically referred to as a "CreatedAt" or "InsertedAt" column. This column usually stores a timestamp indicating when each row was inserted into the table or is a sequentially assigned code such as a order or invoice number.
LastModified/UpdatedAt Column.
A column that indicates the order in which rows were added or updated to a table can be referred to as a "LastModified" or "UpdatedAt" column. This column usually stores a timestamp indicating when each row was inserted into the table or is a sequentially assigned code such as a order or invoice number.
A Change or Modification Flag Field is a special type of LastModified/UpdatedAt Column.
Sequentially Assigned Codes & Incrementing Keys
Sequentially assigned codes such as order numbers or invoice numbers can be used to track the order of operations or events in a system. These codes are often used as primary keys, typically include a number, and are assigned in a sequential manner as new events occur.
WaterMarks & High Watermark
A watermark is a column that has the last updated time stamp or an incrementing key. The delta loading process loads the changed data when the watermark column in the source table is greater than or equal to the high watermark.
ETL+ Watermark Column
For ETL+ the watermark column is designated in the Load panel for Load Replace or Load Upsert.
Source Field indicates the watermark column
ETL+ High Watermark
For ETL+ the high watermark is the maximum value of the designated watermark column in the target table taken as the very first step in the delta load process.
More Pages on Watermarks
A Watermark for data synchronization describes an object of a predefined format which provides a point of reference value for two systems/datasets attempting to establish delta/incremental synchronization; any object in the queried data source which was created, modified, or deleted after the watermark's value will be qualified as "above watermark" and should be returned to the client requesting data.
This approach allows the client to retrieve only the objects which have changed since the latest watermark …
–
Source Change Detection
ETL delta updates require source change detection strategies to indentify and retrieve only the new and changed data since the last load process. Related to Change/Modification flags. ETL+ uses row-level insert/update dates, datetime stamps or sequentially incremented columns.
Row-level dates or datetime stamps: Each table has a column to store the date or datetime the data was loaded and/or modified. This method allows each row to self-identify when it was changed. However, this method relies on the source system reliably updating the modification date or datetime column.
The selection of data to move is often temporal, based on when the data was created or most recently updated. In some cases, the new or changed data cannot be easily identified solely in the source, so it must be compared to the data already in the destination for the incremental data load to work properly.
Conditional update
allows you to selectively update, delete, or insert data based on certain conditions.
update condition
ETL process will load only the data modified after the last successful load. This is called delta load.
Related Pages
Supplement to this page.