Capturing Snapshots

Capture periodic snapshots of data, such as Open Receivables, for later reporting.

 


Nightly Refresh Steps

Run the following at the end of the nightly refresh:

  1. Have a SQL view that gathers the basic needed data and adds a DateTimeStamp column (see the SQL sketch after this list), using either:

    1. GETDATE() (which SQL evaluates once per statement and shares identically with all records)

    2. DT_LatestLoadStart, as an alternative, if the refresh starts before midnight and ends after it, so that the stamp carries yesterday’s date

  2. (Backup process) Export this basic-data view to csv, with the datetime stamp as part of the file name and with auto-deletion of older exports (see CSV Export below)

  3. Have a table that keeps appending these basic-data batches; there may be multiple batches in the same day if the refresh has issues and is rerun

  4. Have a SQL view that does a GroupBy producing two columns:

    1. Date (from DateTimeStamp), so there is one record per Date

    2. MAX(DateTimeStamp) *

  5. In Tableau: link the GroupBy view’s MAX(DateTimeStamp) to the append table’s DateTimeStamp (the equivalent joins are sketched below), either:

    1. As an equal join, to eliminate the extra batches
      OR

    2. As a left join, in case users want to report on the excluded timestamped postings

  6. Then link the GroupBy view’s Date to 5_Date, so this is the date used in reporting
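
Here is a minimal T-SQL sketch of steps 1, 3, and 4. All object names (dbo.vw_OpenAR_Basic, dbo.OpenAR_Snapshot, dbo.vw_OpenAR_LatestBatch) are hypothetical stand-ins; a fuller example of the basic-data view appears under Components below.

    -- Step 3: each night, append the new batch from the basic-data view.
    INSERT INTO dbo.OpenAR_Snapshot
    SELECT * FROM dbo.vw_OpenAR_Basic;
    GO

    -- Step 4: GroupBy view that keeps one timestamp per calendar date.
    CREATE VIEW dbo.vw_OpenAR_LatestBatch AS
    SELECT CAST(DateTimeStamp AS date) AS [Date],
           MAX(DateTimeStamp)          AS DateTimeStamp
    FROM   dbo.OpenAR_Snapshot
    GROUP BY CAST(DateTimeStamp AS date);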

 

 

* To fine-tune these choices, you may not want MAX(DateTimeStamp); there is plenty of open-ended programming room here to pick whichever batch fits. Most of the time, though, a second posting in one day happens because the first one failed and left some sort of bad data behind, in which case MAX is exactly what you want.
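
For reference, the Tableau linking in step 5 amounts to roughly the following SQL (same hypothetical names as above):

    -- Equal-join variant: keep only the last batch of each day.
    SELECT s.*
    FROM dbo.OpenAR_Snapshot AS s
    INNER JOIN dbo.vw_OpenAR_LatestBatch AS g
            ON s.DateTimeStamp = g.DateTimeStamp;

    -- Left-join variant: keep every posting; rows from excluded batches
    -- come back with a NULL on the GroupBy side (flagged here for convenience).
    SELECT s.*,
           CASE WHEN g.DateTimeStamp IS NOT NULL THEN 1 ELSE 0 END AS IsKeptBatch
    FROM dbo.OpenAR_Snapshot AS s
    LEFT JOIN dbo.vw_OpenAR_LatestBatch AS g
           ON s.DateTimeStamp = g.DateTimeStamp;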

 

***********************************************

Components 

Basic Data example

Open AR example of the basic needed data for the append table:

  • customerID

  • Invoice SlspnID

  • InvoiceDate

  • InvNumber

  • InvDueDate

  • DocType

  • original amount

  • open $

  • GETDATE() AS DateTimeStamp

Keeping the number of columns to a minimum helps offset the number of records that will accumulate day after day.
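
As a sketch, the basic-data view might look like the following; the source table and exact column names are hypothetical and will vary by schema.

    -- Hypothetical basic-data view for Open AR; columns mirror the list above.
    CREATE VIEW dbo.vw_OpenAR_Basic AS
    SELECT CustomerID,
           InvoiceSlspnID,
           InvoiceDate,
           InvNumber,
           InvDueDate,
           DocType,
           OriginalAmount,
           OpenAmount,
           GETDATE() AS DateTimeStamp  -- evaluated once per statement (step 1)
    FROM dbo.OpenAR_Detail;            -- hypothetical source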

CSV Export

  1. Run the csv export (one way to do this is sketched at the end of this section)

  2. Run the batch/PowerShell files that maintain it, which:

    1. rename the latest one to ExportName + DateTime stamp (taken from the file’s datetime)

    2. delete older backups

The batch file will call the PowerShell script as follows:

REM Edit the command so that:
REM   -file points to manage_csv_backup.ps1
REM   -TableName is the name of the exported file
REM   -Path is the folder the .csv file is exported to
REM For more details, run the following command in PowerShell:
REM   Get-Help manage_csv_backup.ps1

PowerShell.exe -NoProfile -ExecutionPolicy Bypass -file "C:\<Path>\manage_csv_backup.ps1" -TableName "<schemaname.tablename>.csv" -Path "C:\<Path>\DS_Downloads" -DaysToDelete 60

Unless you have Admin privileges on the machine this task runs on, you cannot run this task manually. It must be run via the Agent.
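
For step 1, the export itself can be done any number of ways. One possible sketch, run from T-SQL via bcp, is below; it assumes xp_cmdshell is enabled, and the database, view, server, and path names are hypothetical (many sites call bcp from the batch file instead).

    -- Hypothetical step-1 export: write the basic-data view to a csv file.
    EXEC master..xp_cmdshell
        'bcp "SELECT * FROM MyDb.dbo.vw_OpenAR_Basic" queryout "C:\DS_Downloads\dbo.vw_OpenAR_Basic.csv" -c -t, -S localhost -T';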

Multi-location Jobs and Agent Locations

If this snapshot needs to run entirely on one box, it can all be in the same job. But if it needs to be runnable from any box, our ETL needs to be able to call the csv job from anywhere and run it under the correct agent.
A future ETL+ release will add the ability to:

  • run a string of jobs

  • with a different agent for each job