Informatica Cloud: How To Implement Type 1 SCD

Implementing a slowly changing dimension with Informatica Cloud takes a little more effort than with DataStage or other ETL tools that ship with a change capture or SCD stage. This example uses hashed values to work out which records have been updated, inserted or deleted. We used the CRC32 function, which produces a 32-bit integer checksum (up to 10 digits) over the selected columns. The dimension table needs to store this value.
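As a sketch of what gets hashed, the dimension stores a Hash_Value computed over the attributes we want to track. The column names below are assumptions based on the sample data; only the CRC32 function itself is what the mapping actually uses in its Expression transformation:

Hash_Value = CRC32(empl_name || '|' || extension || '|' || email || '|' || dept_id || '|' || dept_name)

The separator character simply guards against two different column combinations concatenating to the same string.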

To compare hash values between the source and dimension tables, we used a Joiner transformation. The alternative is an unconnected lookup, but that was not an option here: we used AWS Redshift as both source and target, and Unconnected Lookup does not work on Redshift. It only works against flat files or relational connections (ODBC, MySQL, Oracle, SqlServer and MS_ACCESS).
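Conceptually, the Joiner plus the hash comparison amounts to the SQL below, with the dimension joined to the hashed source on the business key. Table and column names are assumptions; in the mapping this happens in the Joiner, an Expression and a Router rather than as a query:

SELECT COALESCE(src.empl_id, dim.empl_id) AS empl_id,
       CASE
           WHEN dim.empl_id IS NULL              THEN 'INSERT'    -- new record, not in the dimension yet
           WHEN src.empl_id IS NULL              THEN 'DELETE'    -- removed from the source
           WHEN src.hash_value <> dim.hash_value THEN 'UPDATE'    -- tracked attributes changed
           ELSE 'NO CHANGE'
       END AS change_capture
FROM source_with_hash src
FULL OUTER JOIN employee_dim dim
  ON src.empl_id = dim.empl_id;

Insert and update rows go down the upsert path; rows flagged as deletes only need the primary key.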

In this example, we will create the employee_dim table from the employee and department tables. Any update, insert, or delete will be reflected in the dimension table, as this is a Type 1 dimension.

Employee

Department

Employee_Dim

Steps

(1) Join Employee and Department in the Source transformation by using a custom query (a sample query is shown after these steps).

(2) Configure Source to read the employee dimension table.

(3) Use CRC32 to create Hash_Value with all the relevant columns.

(4) Join the source and target tables on the business key.

(5) Compare Hash_Value and create Change_Capture values.

(6) Use a Router transformation to separate the output into upsert and delete flows.

(7) Map fields for upsert as below.

(8) Map fields for delete as below. Only the primary key is required for deletion.
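For step 1, the custom query is a plain join that denormalises Department onto Employee. A minimal sketch, with column names assumed from the sample data rather than taken from the actual schema:

SELECT e.empl_id,
       e.empl_name,
       e.extension,
       e.email,
       e.dept_id,
       d.dept_name,
       e.effective_date
FROM employee e
LEFT JOIN department d
  ON e.dept_id = d.dept_id;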

Outcome

After the initial run to create the dimension table, we ran the update, delete and insert statements below against the source table.

Update Employee Set extension = '0000', dept_id = 'A3', effective_date = GetDate() Where empl_id = 102;
Delete From Employee where empl_id = 109;
Insert Into Employee Values (210, 'Blah B', '1111', 'blah@email.com', 'A1', 'Reporting', GetDate());

Source after running queries

Dimension Updated