I remember when I was a DataStage developer, circa 2014. All I did was build DataStage jobs. I was working on an enterprise data warehouse. The company also used DataStage for batch integration between systems, so I maintained those jobs as well. It was my first ETL development …
There will be times when you want to upload a big CSV file (with many rows and hundreds of columns) to a relational database table. Talend Open Studio is an open-source ETL tool that I use regularly for odd jobs like that. I like using it because it …
Mappings are where all the magic happens in Informatica Cloud. When I started using it, it took me a while to work out how to rename a mapping job. Since then, a few people have asked me the same question, so I decided to write about it. This is probably the …
Once you have created a sandbox environment on your server for WordPress and copied all the files from production, it’s time to copy the production data into the sandbox database. There are many ways to do this. I decided to use Talend Open Studio to insert production data into the sandbox …
The Transformer stage has built-in looping functionality: you can use Stage Variables and Loop Conditions to construct looping logic. In this post, we present three examples: Ranking, Aggregation, and Vertical Pivot. Before going into the examples, here are the useful variables for loop construction. @ITERATION – System …
When a flat file has leading and trailing lines that are not part of the table, we can use the filter in the flat file stage to remove them. As an example, the file below has leading and trailing lines. We want to remove them with the flat file stage. Output Steps …
DataStage has three processing stages that can join tables based on the values of key columns: Lookup, Join and Merge. In this post, we discuss when to choose each stage, the differences between them, and development references for using them. Use the Lookup stage when: Having a …
I have written a batch script to deploy DataStage jobs. The script runs on your computer and can push jobs wherever you want. It leverages the DSXImportService that comes with the DataStage installation. The script can: Push both parallel and sequence jobs and parameter files. Work between projects …
When the data volume is large, DataStage uses a scratch disk to process data. The default scratch disk space is usually the Scratch folder in the Server folder where the application is installed. To use a larger scratch disk space, we can create a custom configuration file. The default configuration …
Orchadmin is a command-line utility in DataStage. The list of orchadmin commands can be found here. It is often used to deal with .ds files. For example, you need to use orchadmin delete to remove .ds files. The .ds file does not contain the actual data. It contains …