Infrastructure

ETL Developers are Dead

I remember when I was a DataStage developer in circa 2014. All I did was making DataStage jobs. I was working on an enterprise data warehouse. In that company, they were using it for batch integration between systems. So, I also maintained those jobs. It was my first ETL development …

DataStage

DataStage: Loop With Transformer

The Transformer stage has the built-in looping functionality where you can use Stage Variables and Loop Conditions to construct looping logics. In this post, we will present 3 different examples. Ranking Aggregation Vertical Pivot Before going into the examples, here are the useful variables for loop construction. @ITERATION – System …

DataStage

DataStage: Remove Leading & Trailing Lines in Flat File

When flat file has leading and trailing lines that are not part of the table, we can use the filter in the flat file stage to remove them. As an example, the file below has a leading and trailing lines. We want remove them with the flat file stage. Output Steps …

DataStage

DataStage: Join vs Lookup vs Merge

DataStage has three processing stages that can join tables based on the values of key columns: Lookup, Join and Merge.  In this post, we discuss when to choose which stage, the difference between these stages, and development references when we use those stages. Use the Lookup stage when: Having a …

DataStage

DataStage: Script To Deploy Jobs

I have written a batch script to deploy DataStage jobs. The script itself runs on your computer and can push jobs wherever you want. The script is leveraging the DSXImportService that comes with DataStage installation. The script can: Push both parallel and sequence jobs and parameter files. Works between projects …

DataStage

DataStage: How To Resolve ‘Scratch Space Full’ Error

When the data volume is large, DataStage uses a scratch disk to process data. The default scratch disk space is usually the Scratch folder in the Server folder where the application is installed. To use a larger scratch disk space, we can create a custom configuration file. The default configuration …

DataStage

DataStage: Useful DataStage Linux Commands

In this post, we will explore useful DataStage Commands. As an example, I made the DataStage installation path as ‘/opt/IBM/InformationServer/Server/PXEngine’. This is probably not the same in your DataStage server. Make sure to get the right installation path. Start and Stop jobmonapp (DataStage Job Monitor application). Unlocking DataStage job Shutting …

DataStage

DataStage: How To Add Redshift JDBC Driver

In order to use a JDBC driver, you need to download the JDBC and set up the configuration files (see here). In this post, we will discuss how to add Redshift JDBC driver to DataStage server and configure it. Steps (1) Download Redshift JDBC driver from here. In my experience, …

DataStage

DataStage: How To Resolve ‘orchadmin.exe: command not found’

Orchadmin is the command line utility in DataStage. The list of orchadmin commands can be found here. It is often used to deal with the ds files. For example, you need to use orchadmin delete to remove .ds files. The .ds file does not contain the actual data. It contains …

DataStage

DataStage: Hierarchical Data Stage Transforming Google Analytics Data

By using Google Analytics Core Reporting API, we can export reports from Google Analytics. To export reports, you need to specify dimensions and metrics. To further explore GA reports, you can use Query Explorer. In this example, we exported the data using the following dimensions and metrics around geographical information …