Skip to end of banner
Go to start of banner

Delta / Incremental Loads

Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 9 Next »

🛠️ Delta loading is also known as Incremental loading. The delta load types in ETL+ are ETL+ Load Upsert, ETL+ Load Replace and ETL+ Load Append. Work in progress.

In the context of ETL, a delta load (also known as a incremental load) refers to loading/refreshing only the changes or updates that have occurred since the last load.

An incremental/delta load is the selective movement of data from one system to another. An incremental load pattern will attempt to identify the data that was created or modified since the last time the load process ran. The selectivity of the incremental load procedure usually reduces the system overhead required for the ETL process.

Initial Loads vs Incremental/Delta Loads.

Initial loads (Load All in ETL+) loads the entire source table from the source into the target data warehouse table. The difference in performance between delta load and initial load depends on factors such as data volume, structure, target, and load type.

WaterMarks

A watermark is a column that has the last updated time stamp or an incrementing key. The delta loading solution loads the changed data between an old watermark and a new watermark.

A Watermark for data synchronization describes an object of a predefined format which provides a point of reference value for two systems/datasets attempting to establish delta/incremental synchronization; any object in the queried data source which was created, modified, or deleted after the watermark's value will be qualified as "above watermark" and should be returned to the client requesting data.

This approach allows the client to retrieve only the objects which have changed since the latest watermark …
https://en.wikipedia.org/wiki/Watermark_(data_synchronization)

Source Change Detection

To implement ETL delta updates, you can use source change detection to retrieve only the new and changed data since the last load process. Related to Change/Modification flags. ETL+ uses row-level insert/update dates, datetime stamps or sequentially incremented columns.

  1. Row-level dates or datetime stamps: Each table has a column to store the date or datetime the data was loaded and/or modified. This method allows each row to self-identify when it was changed. However, this method relies on the source system reliably updating the modification date or datetime column.

The selection of data to move is often temporal, based on when the data was created or most recently updated. In some cases, the new or changed data cannot be easily identified solely in the source, so it must be compared to the data already in the destination for the incremental data load to work properly.

Conditional update

allows you to selectively update, delete, or insert data based on certain conditions.

update condition

ETL process will load only the data modified after the last successful load. This is called delta load.

Related Pages

  • No labels