DataStage

DataStage: Useful DataStage Linux Commands

In this post, we will explore useful DataStage commands. As an example, I use ‘/opt/IBM/InformationServer/Server/PXEngine’ as the DataStage installation path. This is probably not the same on your DataStage server, so make sure to use the right installation path. Start and Stop jobmonapp (DataStage Job Monitor application). Unlocking DataStage job Shutting …

DataStage

DataStage: How To Add Redshift JDBC Driver

In order to use a JDBC driver, you need to download the driver and set up the configuration files (see here). In this post, we will discuss how to add the Redshift JDBC driver to a DataStage server and configure it. Steps (1) Download the Redshift JDBC driver from here. In my experience, …

DataStage

DataStage: How To Resolve ‘orchadmin.exe: command not found’

Orchadmin is a command-line utility in DataStage. The list of orchadmin commands can be found here. It is often used to manage .ds files. For example, you need to use orchadmin delete to remove .ds files. A .ds file does not contain the actual data. It contains …

DataStage

DataStage: Hierarchical Data Stage Transforming Google Analytics Data

By using the Google Analytics Core Reporting API, we can export reports from Google Analytics. To export reports, you need to specify dimensions and metrics. To further explore GA reports, you can use Query Explorer. In this example, we exported the data using the following dimensions and metrics around geographical information …

DataStage

DataStage: Hierarchical Data Stage Transforming JSON Data

Hierarchical Data Stage can parse, compose and transform hierarchical data such as JSON and XML. In this example, we are using the JSON file obtained from the Google Geocoding API. The Geocoding API turns addresses (1600 Amphitheatre Parkway Mountain View CA) into geographic coordinates (latitude: 37.422, longitude: -122.085 etc). The outcome …
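To make the shape of the input concrete, here is a minimal Python sketch that pulls the latitude and longitude out of a Geocoding-style JSON document. It is not the Hierarchical Data Stage configuration itself. The sample payload is a hand-written, heavily trimmed stand-in for a real response, and the function name `extract_coordinates` is made up for illustration.

```python
import json

# Hand-written, trimmed stand-in for a Geocoding API response (assumption:
# a real response carries many more fields per result).
SAMPLE = """
{
  "results": [
    {"geometry": {"location": {"lat": 37.422, "lng": -122.085}}}
  ],
  "status": "OK"
}
"""


def extract_coordinates(payload):
    """Return a list of (lat, lng) tuples, one per entry in 'results'."""
    doc = json.loads(payload)
    coords = []
    for result in doc.get("results", []):
        location = result["geometry"]["location"]
        coords.append((location["lat"], location["lng"]))
    return coords


if __name__ == "__main__":
    print(extract_coordinates(SAMPLE))
```

The nested path results → geometry → location is the part a hierarchical parser has to flatten into columns.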

ETL

Talend: How To Resolve OutOfMemoryError: Java heap space

When you try to process large data with Talend Studio on your computer, you may get an OutOfMemoryError. Exception in thread “main” java.lang.OutOfMemoryError: Java heap space To resolve this error, you need to increase the Xmx parameter in the config file (such as Talend-Studio-win-x86_64.ini or TOS_DI-win-x86_64.ini). By default, the value is set …

ETL

Talend: How To Resolve ‘Failed to load the JNI shared library’ Error

When you try to open the Talend application, you might get the error message below. Let’s resolve it. Failed to load the JNI shared library “C:\IBM\InformationServer/jdk32\jre\bin\j9vm\jvm.dll”. Talend (both free and licensed) requires Java 1.8. If you still have Java 1.7, you need to upgrade it to 1.8 and set the Java_Home …

ETL

Informatica Cloud: Compatibility with AWS Redshift

ETL in Redshift demands a specialised connector that optimises insert and upsert operations. Generic JDBC or ODBC connectors are too slow and inefficient. When it comes to bulk loading, Amazon recommends loading data into Redshift via S3 by using a copy command (see here). The traditional insert statement is much …

ETL

Informatica Cloud: Incremental Load With Data Synchronization Task

Data Synchronization is a great tool to ingest source data into a Data Lake, ODS, or Staging Area. Currently, Data Synchronization does not read database logs to do incremental loads (this is on their roadmap). Instead, each task automatically stores the last run timestamp ($LastRunTime) in the default task …
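The idea behind a timestamp-based incremental load can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not Informatica Cloud code: the `last_modified` field and the function name `rows_since_last_run` are assumptions, and the `last_run_time` argument stands in for the stored $LastRunTime value.

```python
from datetime import datetime


def rows_since_last_run(rows, last_run_time, modified_field="last_modified"):
    """Keep only rows changed strictly after the previous run's timestamp."""
    return [row for row in rows if row[modified_field] > last_run_time]


if __name__ == "__main__":
    # Hypothetical source rows with a modification timestamp column.
    source = [
        {"id": 1, "last_modified": datetime(2018, 1, 1, 8, 0)},
        {"id": 2, "last_modified": datetime(2018, 1, 2, 9, 30)},
    ]
    last_run = datetime(2018, 1, 1, 12, 0)
    print(rows_since_last_run(source, last_run))
```

This approach depends on the source table having a reliable modification timestamp; rows updated without touching that column would be missed.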

ETL

Informatica Cloud: How To Do Ranking

Informatica Cloud does not have a Rank transformation as PowerCenter does (it is said to be coming in the first quarter of 2018). However, we can use Expressions to elegantly rank rows according to the specified columns. This is very similar to removing duplicates. It uses the concept …
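One common way to rank without a dedicated transformation is to sort the rows and keep a running counter that resets whenever the group key changes. The Python sketch below illustrates that general technique only; it is an assumption that the post's expression logic follows the same pattern, and the names `rank_rows`, `dept` and `sales` are hypothetical.

```python
def rank_rows(rows, group_key, order_key):
    """Rank rows within each group, highest order_key value first.

    Assumes order_key holds numeric values so it can be negated for sorting.
    """
    ordered = sorted(rows, key=lambda r: (r[group_key], -r[order_key]))
    ranked = []
    previous_group = object()  # sentinel that never equals a real key
    counter = 0
    for row in ordered:
        if row[group_key] != previous_group:
            counter = 0  # new group: restart the running counter
            previous_group = row[group_key]
        counter += 1
        ranked.append({**row, "rank": counter})
    return ranked


if __name__ == "__main__":
    data = [
        {"dept": "A", "sales": 10},
        {"dept": "A", "sales": 30},
        {"dept": "B", "sales": 5},
    ]
    for row in rank_rows(data, "dept", "sales"):
        print(row)
```

Keeping only the rows where the rank is 1 gives one row per group, which is why ranking and duplicate removal look so similar.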