ETL – MyDatahack

Infrastructure

ETL Developers are Dead

I remember when I was a DataStage developer in circa 2014. All I did was making DataStage jobs. I was working on an enterprise data warehouse. In that company, they were using it for batch integration between systems. So, I also maintained those jobs. It was my first ETL development …

ETL

Tips and Troubleshooting For Uploading CSV to Database In Talend

There will be time when you want to upload a big csv file (with many rows and hundreds of columns) to a relational database table. Talend Open Studio is an open source ETL tool that I use regularly to do odd jobs like that. I like using it because it …

ETL

How To Rename Mapping Job In Informatica Cloud

Mappings are where all the magic happens in Informatica Cloud. When I started using it, it took me a while to work out how to rename a mapping job. Since then, a few people asked me the same question. So, I decided to write about it. This is probably the …

php

How To Migrate WordPress MySQL Data From Production To Sandbox With Talend

Once you create a sandbox environment in your server for WordPress and copied all the files from production, it’s time to copy the production data into the sandbox database. There are many ways to do this. I decided to use Talend Open Studio to insert production data into the sandbox …

DataStage

DataStage: Loop With Transformer

The Transformer stage has the built-in looping functionality where you can use Stage Variables and Loop Conditions to construct looping logics. In this post, we will present 3 different examples. Ranking Aggregation Vertical Pivot Before going into the examples, here are the useful variables for loop construction. @ITERATION – System …

DataStage

DataStage: Remove Leading & Trailing Lines in Flat File

When flat file has leading and trailing lines that are not part of the table, we can use the filter in the flat file stage to remove them. As an example, the file below has a leading and trailing lines. We want remove them with the flat file stage. Output Steps …

DataStage

DataStage: Join vs Lookup vs Merge

DataStage has three processing stages that can join tables based on the values of key columns: Lookup, Join and Merge. In this post, we discuss when to choose which stage, the difference between these stages, and development references when we use those stages. Use the Lookup stage when: Having a …

DataStage

DataStage: Script To Deploy Jobs

I have written a batch script to deploy DataStage jobs. The script itself runs on your computer and can push jobs wherever you want. The script is leveraging the DSXImportService that comes with DataStage installation. The script can: Push both parallel and sequence jobs and parameter files. Works between projects …

DataStage

DataStage: How To Resolve ‘Scratch Space Full’ Error

When the data volume is large, DataStage uses a scratch disk to process data. The default scratch disk space is usually the Scratch folder in the Server folder where the application is installed. To use a larger scratch disk space, we can create a custom configuration file. The default configuration …

DataStage

DataStage: How To Resolve ‘orchadmin.exe: command not found’

Orchadmin is the command line utility in DataStage. The list of orchadmin commands can be found here. It is often used to deal with the ds files. For example, you need to use orchadmin delete to remove .ds files. The .ds file does not contain the actual data. It contains …