ETL+ for MS Excel
ETL+ reads and extracts data from Microsoft Excel XLS, XLSX, and XLSM files.
ETL Sources & ETL Source Tables
ETL Source. An ETL Source refers to the connection to a data source. Connections to Excel files are maintained in the ETL Source panel with help from the Microsoft Excel page. The screenshot below shows two ETL Sources named Excel Workbook
and zData Warehouse
.
ETL+ supports a number of different types of data sources and data source drivers.
An ETL Source can have one or more ETL Source Tables.
zData Warehouse
is a default ETL Source with a special function.
ETL Source Table. A ETL Source table refers to a table that ETL+ can extract data from. In the screenshot below Sheet1$
is a ETL Source table belonging to the ETL Source named Excel Workbook.
Excel Workbook
is the name of the ETL Source and Sheet1$
is a ETL Source Table in Excel Workbook
. ETL Source Panel
Add a New MS Excel Source
Open and Login to ETL+ for the ETL+ entity. For more see https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032239112.
Open the https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032239184.
Click Add Source > MS Excel on the ETL Source panel.
Configure the Microsoft Excel page.
Notes
Read the warning below about Data Type Issues and how ETL+ interprets the data in Excel columns. Recommend always checking the data from Excel with the Target Object panels’s Design page.
To test and verify the data extracted from the data source see the headings below.
https://dataself.atlassian.net/wiki/spaces/DS/pages/1816887297 Page from DataSelf Training.
Troubleshooting – https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032243557
Maintain a MS Excel Source
Select the Excel data source’s name from the ETL Source panel.
Click Properties.
Configure the Microsoft Excel page.
Microsoft Excel page
The Microsoft Excel page adds and maintains metadata about connections to ETL Sources.
Configure the fields shown below.
Click Connect or Save.
Microsoft Excel Page Settings
Alias. A alias name will be automatically populated, but you can customize it.
File Path. Options are:
Enter the path and file name of a local Excel file.
Select Browse to find the file.
Enter a URL address for the file on your local network or internet.
Troubleshooting – https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032243557
View Excel Sheets (Tables) Available from a Excel Data Source
Each Excel workbook shown in the ETL Source panel How to show each Excel sheet Excel workbooks shown as a data source in the ETL Source panel is known to ETL+ as a source table.
Select the Excel data source’s name from the ETL Source panel.
Click Refresh.
Mapping Excel Sheet to Data Warehouse
How to add a ETL source table in the ETL Source panel to a target/destination data warehouse table in the ETL Objects panel.
Double-click the Excel data source’s table name from the ETL Source panel.
In the example above the data source table name is
Sheet 1$
.Alternative. Right click the data source’s table > Add Table.
Click the name of the ETL source table in the ETL Objects panel and click Design, Preview or Load.
See the headings below for more about verifying data and scheduling ETL+.
Data Type Issues
Warning – Unexpected Results Extracting Data from Excel
The driver that opens and reads the Excel files controls how the data is read and extracted from Excel. Warning: The driver does not pay attention to the data type assigned to the columns in Excel! This can lead to unexpected results.
The driver looks at the data in the first few rows in each column to determine the data type (float, integer, varchar, etc) and length of character strings the driver will use to extract and data from Excel.
You may need to change the data type and re-cast the data in the Design page.
Recommendation: Read https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032243557.
Preview Data
Preview shows you the data in a ETL source table without (up)loading the data into the destination table in the target data warehouse. For Excel workbooks each sheet in the workbook can be a ETL source table.
Click the name of the data source’s table to preview in the ETL Objects panel.
Click Preview.
Click Close to return to the https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032239184.
Load Now
How to load data from a data source to the destination data warehouse.
Right-click the name of the table on the ETL Objects panel >
Load Now
.Click Log to look at the log file entries for the load process.
Examine the Log Records
How to see he log entries from recent ETL+ processes.
Click Log.
Schedule the Data Load
On the ETL main page, click
Refresh Batch
(top right icon on the center panel).Select a Refresh Batch or create a new one to load your Excel data on a schedule (help here) → on the right panel, be sure the Excel tables to load have their boxes checked and in the desired load order.
To load them now, click the
Refresh Batch Now
(play icon) on the left panel for the desired Refresh Batch.To schedule the data load:
Click
WTS Settings
-> enter the WTS parameters →Apply
.This step might require you to run ETL+ app as “Run as administrator” in Windows.
Click
Close
->Close
.
Testing
After ETL+ has loaded your Excel files how to verify that the data extraction process finished successfully and number of rows extracted and loaded matches your Excel table.
Steps
Login to ETL+ (help here).
On the ETL main page, click the top-right
Log
icon. Log page help here.Scroll down on the log page and double-check that no table load failed, and the row count matches with the Excel tables. Following is an example highlighting two tables and their loaded row count.
Troubleshooting – https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032243557
Related Pages
https://dataself.atlassian.net/wiki/spaces/DS/pages/1998946305
https://dataself.atlassian.net/wiki/spaces/DS/pages/1920270337 – driver used by https://dataself.atlassian.net/wiki/spaces/DS/pages/1998946305 to open Excel files.
https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032239184
https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032241929 An overview of the types of data sources ETL+ can connect to.
https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032242696 Details about the various data source drivers supported by ETL+.
https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032243557 Troubleshooting. A Microsoft driver largely determines how Excel data is extracted.
https://dataself.atlassian.net/wiki/spaces/DS/pages/1816887297 DataSelf Training page.
https://dataself.atlassian.net/wiki/spaces/ETL202208/pages/2032243303
https://dataself.atlassian.net/wiki/spaces/DS/pages/1920270337 – driver required to open Excel files