Informatica Cloud: Compatibility with AWS Redshift

ETL in Redshift demands a specialised connector that optimises insert and upsert operations; generic JDBC or ODBC connectors are too slow and inefficient. For bulk loading, Amazon recommends loading data into Redshift via S3 with the COPY command (see here). The traditional INSERT statement is far less efficient than the magical COPY command.
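
To make that recommendation concrete, here is a minimal sketch of an S3-based bulk load in Python, assuming psycopg2 for connectivity; the cluster endpoint, table, bucket and credentials are all placeholders, not values from this post.

```python
# Minimal sketch: bulk-load a Redshift table from files staged in S3 via COPY.
# All names and credentials below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439,
    dbname="dev",
    user="etl_user",
    password="********",
)

copy_sql = """
    COPY public.sales
    FROM 's3://my-etl-bucket/staging/sales/'
    CREDENTIALS 'aws_access_key_id=<ACCESS_KEY_ID>;aws_secret_access_key=<SECRET_ACCESS_KEY>'
    FORMAT AS CSV;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)  # COPY loads the staged files in parallel across the cluster slices
```
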

Redshift does not support a single merge or upsert statement. Upserting records in Redshift involves creating a staging table and loading the data into it first. From the staging table, we either (1) delete the old records and re-insert the entire updated rows (merge by replacing existing rows) or (2) run an UPDATE followed by an INSERT from the staging table (merge by specifying a column list). The first approach is sketched below.
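
Here is a hedged sketch of that first approach (merge by replacing existing rows) in Python; the table names, key column and connection details are illustrative placeholders only.

```python
# Sketch of "merge by replacing existing rows": delete the old versions of any
# row that also exists in the staging table, then re-insert the full updated
# rows. Table and column names are placeholders, not from the original post.
import psycopg2

merge_sql = """
    DELETE FROM public.customers
    USING public.customers_stage
    WHERE public.customers.customer_id = public.customers_stage.customer_id;

    INSERT INTO public.customers
    SELECT * FROM public.customers_stage;

    DROP TABLE public.customers_stage;
"""

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    port=5439, dbname="dev", user="etl_user", password="********",
)

# The with-block runs the three statements in one transaction, so readers
# never see the target table in a half-merged state.
with conn, conn.cursor() as cur:
    cur.execute(merge_sql)
```
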

The Informatica Redshift connector takes care of these logistics for you. It optimises the load for the cluster size and manages the complex data staging for insert and update operations behind the scenes. When you specify an insert, it first uploads the data to S3 and uses the COPY command for optimal bulk-loading performance. For upserts, it creates a staging table and executes the merge operation in the background. I have not yet seen any performance issue when executing incremental loads in either Data Synchronization or Mapping tasks. When loading dimension tables, it can preserve the dimension key whenever a record is updated, without compromising too much on loading speed.

Configuring the connector is very similar to configuring other database connectors.

Firstly, add the connection name and choose the type Amazon Redshift (Informatica Cloud).

For the connection properties, you need to add the AWS Access Key ID and AWS Secret Access Key, as the connector uses S3 for loading data. There is no place to specify the bucket name here; the bucket needs to be specified in the Source or Target properties every time you use the connection.
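
If a task fails at the staging step, one thing worth checking is whether those keys can actually reach the bucket you specified. A quick sanity check with boto3 (not part of Informatica Cloud; the bucket name and keys below are placeholders) might look like this:

```python
# Hedged sketch: confirm the same AWS keys used in the connection can list
# objects in the staging bucket. Bucket name and keys are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="<ACCESS_KEY_ID>",
    aws_secret_access_key="<SECRET_ACCESS_KEY>",
)

# List a few objects to confirm the keys can reach the staging bucket
resp = s3.list_objects_v2(Bucket="my-etl-bucket", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```
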

Further reference can be found here.

Let us know about your experience with the connector.

 
