ETL

Talend: How To Resolve OutOfMemoryError: Java heap space

When you process large data sets with Talend Studio on your computer, you may get an OutOfMemoryError: Exception in thread “main” java.lang.OutOfMemoryError: Java heap space. To resolve this error, increase the Xmx parameter in the config file (such as Talend-Studio-win-x86_64.ini or TOS_DI-win-x86_64.ini). By default, the value is set …

ETL

Talend: How To Resolve ‘Failed to load the JNI shared library’ Error

When you try to open the Talend application, you might get the error message below. Let’s resolve it. Failed to load the JNI shared library “C:\IBM\InformationServer/jdk32\jre\bin\j9vm\jvm.dll”. Talend (both free and licensed) requires Java 1.8. If you still have Java 1.7, you need to upgrade it to 1.8 and set the Java_Home …

ETL

Informatica Cloud: Compatibility with AWS Redshift

ETL in Redshift demands a specialised connector that optimises insert and upsert operations. Generic JDBC or ODBC connectors are too slow and inefficient. For bulk loading, Amazon recommends loading data into Redshift via S3 with the COPY command (see here). The traditional insert statement is much …

ETL

Informatica Cloud: Incremental Load With Data Synchronization Task

Data Synchronization is a great tool for ingesting source data into a data lake, ODS, or staging area. Currently, Data Synchronization does not read database logs to do incremental loads (this is on their roadmap). Instead, each task automatically stores the last run timestamp ($LastRunTime) in the default task …
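As a rough illustration of the timestamp-based approach described above, the sketch below keeps only rows modified after the previous run. It is plain Python, not Informatica filter syntax; the `updated_at` column and the `incremental_rows` helper are hypothetical names, and `last_run_time` merely stands in for $LastRunTime.

```python
from datetime import datetime


def incremental_rows(rows, last_run_time, ts_field="updated_at"):
    """Return only the rows modified after the previous run."""
    return [row for row in rows if row[ts_field] > last_run_time]


# Hypothetical source rows with a last-modified timestamp column.
rows = [
    {"id": 1, "updated_at": datetime(2018, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2018, 1, 2, 9, 0)},
]

# Stand-in for the value the task stores as $LastRunTime.
last_run_time = datetime(2018, 1, 1, 12, 0)

changed = incremental_rows(rows, last_run_time)
```

Only the row with `id` 2 is picked up here, because it is the only one modified after the stored run time.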

ETL

Informatica Cloud: How To Do Ranking

Informatica Cloud does not have a Rank transformation as PowerCenter does (it is said to be coming in the first quarter of 2018). However, we can use expressions to elegantly perform ranking on the specified columns. This is very similar to removing duplicates. It uses the concept …
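The excerpt is cut off before it names the concept, so the following is only a guess at the general idea: sort the rows, then carry a running counter that resets whenever the group key differs from the previous row. It is a Python sketch, not Informatica expression syntax; `rank_within_group` and the `dept` and `salary` columns are hypothetical, and the ordering column is assumed to be numeric.

```python
def rank_within_group(rows, group_key, order_key):
    """Rank rows inside each group, highest order_key first."""
    # Sort by group, then by the (numeric) ordering column, descending.
    ordered = sorted(rows, key=lambda r: (r[group_key], -r[order_key]))

    ranked = []
    prev_group = object()  # sentinel that never equals a real group value
    rank = 0
    for row in ordered:
        # Restart the counter when the group changes, otherwise increment.
        rank = rank + 1 if row[group_key] == prev_group else 1
        ranked.append({**row, "rank": rank})
        prev_group = row[group_key]
    return ranked


employees = [
    {"dept": "A", "salary": 10},
    {"dept": "A", "salary": 30},
    {"dept": "B", "salary": 20},
]
ranked = rank_within_group(employees, "dept", "salary")
```

Ties receive consecutive ranks in this sketch, so it behaves like a row number within each group rather than a dense rank.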

ETL

Informatica Cloud: How To Remove Duplicate

Informatica Cloud does not have a remove-duplicates stage that removes duplicates according to specified column values. However, we can remove duplicates elegantly by using the Sort and Expression transformations. The most important concept about expressions is that Informatica executes them in the order of their position. We have …
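The sort-then-compare idea can be sketched in Python as follows. This is a conceptual illustration rather than the post's actual mapping: `remove_duplicates` and the sample columns are hypothetical, and the "previous key" variable plays the role that an expression variable would play when rows are evaluated in order.

```python
def remove_duplicates(rows, key_fields):
    """Keep the first row for each distinct combination of key_fields."""
    # Sorting brings rows with the same key next to each other.
    ordered = sorted(rows, key=lambda r: tuple(r[f] for f in key_fields))

    unique = []
    prev_key = object()  # sentinel that never equals a real key tuple
    for row in ordered:
        key = tuple(row[f] for f in key_fields)
        # A row is a duplicate when its key equals the previous row's key.
        if key != prev_key:
            unique.append(row)
        prev_key = key
    return unique


customers = [
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "b2@example.com"},
]
deduped = remove_duplicates(customers, ["customer_id"])
```

Because Python's sort is stable, the row that survives for each key is the one that appeared first in the input.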

ETL

Informatica Cloud: How To Implement Type 1 & 2 SCD

To expand the Type 1 Employee Dimension, we use the same Employee data to create a dimension table that captures historical changes in department and position. In this dimension, changes in the remaining columns (such as email address) are simply overwritten. As discussed in the post, …
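A minimal Python sketch of this mixed behaviour is shown below, under several assumptions: the column names (`employee_id`, `department`, `position`, `email`) and the bookkeeping columns (`valid_from`, `valid_to`, `is_current`) are hypothetical, and the post's actual mapping may track history differently. A change in department or position closes the current row and adds a new one; any other change overwrites the current row in place.

```python
from datetime import date

# Columns whose changes are kept as history (Type 2); the rest are overwritten.
TYPE2_COLUMNS = ("department", "position")


def apply_change(dimension, incoming, today):
    """Apply one incoming employee record to the dimension (a list of dicts)."""
    current = next(
        (r for r in dimension
         if r["employee_id"] == incoming["employee_id"] and r["is_current"]),
        None,
    )
    new_row = {**incoming, "valid_from": today, "valid_to": None, "is_current": True}

    if current is None:
        dimension.append(new_row)          # brand-new employee
    elif any(current[c] != incoming[c] for c in TYPE2_COLUMNS):
        current["valid_to"] = today        # expire the old version
        current["is_current"] = False
        dimension.append(new_row)          # add the new version
    else:
        current.update(incoming)           # Type 1: overwrite in place
    return dimension


dim = []
apply_change(dim, {"employee_id": 1, "department": "Sales", "position": "Rep",
                   "email": "a@example.com"}, date(2018, 1, 1))
apply_change(dim, {"employee_id": 1, "department": "Sales", "position": "Rep",
                   "email": "a.new@example.com"}, date(2018, 2, 1))
apply_change(dim, {"employee_id": 1, "department": "Marketing", "position": "Rep",
                   "email": "a.new@example.com"}, date(2018, 3, 1))
```

After these three calls the dimension holds two rows for the employee: an expired Sales row carrying the overwritten email, and a current Marketing row.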

ETL

Informatica Cloud: How To Implement Type 1 SCD

Implementing a slowly changing dimension with Informatica Cloud requires a little extra effort compared to DataStage or other ETL tools that have a change capture stage or SCD stage. This example uses hashed values to find out which records are updated, inserted, or deleted. We used the CRC32 …
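To make the hash comparison concrete, here is a small Python sketch that uses `zlib.crc32` from the standard library. It is not the post's Informatica mapping: the helper names, the `|` separator, and the sample columns are all assumptions made for illustration.

```python
import zlib


def row_hash(row, columns):
    """CRC32 over the concatenated column values of one row."""
    payload = "|".join(str(row[c]) for c in columns)
    return zlib.crc32(payload.encode("utf-8"))


def classify_changes(source, target, key, columns):
    """Compare hashes per key and return (inserts, updates, deletes)."""
    src = {r[key]: row_hash(r, columns) for r in source}
    tgt = {r[key]: row_hash(r, columns) for r in target}

    inserts = sorted(k for k in src if k not in tgt)
    updates = sorted(k for k in src if k in tgt and src[k] != tgt[k])
    deletes = sorted(k for k in tgt if k not in src)
    return inserts, updates, deletes


source = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@new.com"},
    {"id": 4, "email": "d@example.com"},
]
target = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@old.com"},
    {"id": 3, "email": "c@example.com"},
]
inserts, updates, deletes = classify_changes(source, target, "id", ["email"])
```

Comparing one hash per row avoids comparing every column individually, at the cost of a small theoretical chance of a CRC32 collision hiding a real change.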

ETL

Informatica Cloud: How To Use Lookup

In Informatica, Joiner and Lookup can both join tables according to the join keys. What is the difference between Joiner and Lookup? Joiner: Active Transformation; the query cannot be overridden; works only for equal conditions; can do outer join; can be used only as a source; multiple matches return all matching records …

ETL

Informatica Cloud: How To Join Tables With Joiner

This page explains how to join tables in Informatica Cloud with Joiner. In this example, we are adding Product Name and Unit Price to Sales_Record from Products by Product_Id. Source tables: Sales_Record and Products. Steps: Configure flat file connections to read both Sales_Records and Products. For flat file connection, see here. Product is …
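The effect of this join can be sketched in Python as below. It only illustrates the result and is not how the Joiner is configured: the column spellings `Product_Name` and `Unit_Price`, the `Quantity` column, and the choice of an inner join are assumptions.

```python
def join_sales_with_products(sales_records, products):
    """Add product name and unit price to each sale, matched on Product_Id."""
    # Index the smaller table by the join key for constant-time matching.
    products_by_id = {p["Product_Id"]: p for p in products}

    joined = []
    for sale in sales_records:
        product = products_by_id.get(sale["Product_Id"])
        if product is None:
            continue  # inner join: drop sales without a matching product
        joined.append({
            **sale,
            "Product_Name": product["Product_Name"],
            "Unit_Price": product["Unit_Price"],
        })
    return joined


products = [
    {"Product_Id": 10, "Product_Name": "Widget", "Unit_Price": 2.5},
    {"Product_Id": 20, "Product_Name": "Gadget", "Unit_Price": 7.0},
]
sales_records = [
    {"Sale_Id": 1, "Product_Id": 20, "Quantity": 3},
    {"Sale_Id": 2, "Product_Id": 99, "Quantity": 1},
]
joined = join_sales_with_products(sales_records, products)
```

Sale 2 has no matching product and is dropped here; an outer join would instead keep it with empty product columns.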