ETL+ Data Warehouse

Schemas

Schemas refer to how Target Objects and data warehouses are named.
The Multi-Tenant section below and the Source Driver used for the Source Object determines how schema’s are named.


Data Warehouse List

Only available for super users.

From the ETL+ main page, click Settings -> Data Warehouse.

To select an existing data warehouse, select the line from the list below and click Save.

Important:

  • When you change the data warehouse of an Entity, close and reopen ETL+ to re-load all its settings.

  • ETL+ needs the following MS SQL database roles to the data warehouse: db_datareader & db_datawriter.


Modify or Add New Data Warehouse

Modify Data Warehouse and New Data Warehouse pages are only available for super users.

Select a data warehouse from the list → Modify.

Or click Add New to create a new one.

The page below appears.

Server name: IP, URL or MS SQL Server instance name. Ex.: (local).

Authentication: Windows or MS SQL Server.

Login and Password.

Multi-Tenant. Determines how SQL schema’s in the data warehouse and Target Objects are named.

Database name: Name of the SQL database.

Multi-Tenant: Defines if the data warehouse will host an organization’s specific data, or if for a multi-tenant platform.

  • Multi-Tenant checkbox not checked:

  • Multi-Tenant checkbox checked: User will only have access to the data warehouse schemas:

    • dbo: read only

    • <ETL+ EntityID>: read and write. All tables are loaded into this SQL schema.


Data Warehouse Tables in SSMS

ETL+ data warehouse tables and databases stored in MS SQL Server which can be accessed via SQL Server Management Studio (SSMS).

After a table has been loaded by ETL+, the results of the refresh can be seen in the Create Date and Row Count of the just refreshed table in the data warehouse.

Load type = Load All: If you have a table set for Load All in ETL+, the Create Date displayed by SSMS displays the date from the last refresh because ETL+'s Load All refresh process drops the existing table and creates a new one in it’s place.

Load type = Replace or Upsert or Append: However, if you have a table set for a delta refresh such as Upsert in ETL+, the refresh process doesn’t drop and re-create the table – so the Create Date doesn’t change.

See also: https://dataself.atlassian.net/wiki/spaces/DS/pages/1571225705

Example from the Screenshot Below

  • The following screenshot is from SSMS → View → Object Explorer Details, then select the Tables section in your data warehouse database. If you don't see some of the columns on the Object Explorer Details panel, right-click on this panel’s column header and select the desired columns.

  • On this example, DataSelf ETL+ started to refresh the tables on 4/28/2022. The Address table is set to Load All, and we can see the latest load was on 9/12/2022. The other 3 tables are using Upsert, therefore their Create Date is when the first load happened.

Logging

More at: https://dataself.atlassian.net/wiki/spaces/DS/pages/1570963835

Performance

  • Large Tables with Delta Loads: If you need to join large SQL tables that are using delta load (Replace, Upsert or Append), and performance is slow to rend the join, we recommend adding indexes via SSMS. Delta Load tables are not deleted by ETL+ and will keep the indexes. If you force a Load All on such tables, remember to recreate the indexes since Load All deletes the table. More at:

  • We recommend MS SQL having at least 50% more RAM than the largest data warehouse table (MS SQL data space used). Provide MS SQL with more RAM than all tables combined for best performance.

  • Use SSDs for the MS SQL data warehouse server.

  • MS SQL 2019 is the current most recommend version.

MS SQL Server Configuration

Database naming convention: Dw_<company-name>_<main-source> (example: Dw_AbcInc_NSuite)

For recommendations and more information about properties see:

 

v2022.08