Data Engineering

How To Get Facebook Data With Python

By using the Facebook Graph API, we can get the feed of a specific page: the posts and links published by the page itself or by others on the page, along with their likes and comments (the feed API). I have written a Python script that scrapes the feed info in JSON format and turns …
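
As a rough sketch of the "turn JSON into a table" step, the Python below flattens a feed response into CSV rows. It assumes a response shaped like the Graph API's `/{page-id}/feed` output (a top-level `data` list of posts with `id`, `created_time` and `message`), and it assumes like and comment counts were requested with summaries. The sample data and column choices are hypothetical, and this is not the author's script.

```python
import csv
import io
from typing import Any

FIELDS = ["id", "created_time", "message", "like_count", "comment_count"]

def feed_to_rows(feed: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the 'data' list of a feed response into one row per post."""
    rows = []
    for post in feed.get("data", []):
        rows.append({
            "id": post.get("id"),
            "created_time": post.get("created_time"),
            "message": post.get("message", ""),  # some posts have no text
            # Assumption: counts appear only if summaries were requested.
            "like_count": post.get("likes", {}).get("summary", {}).get("total_count", 0),
            "comment_count": post.get("comments", {}).get("summary", {}).get("total_count", 0),
        })
    return rows

def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Serialize the flattened rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample response (IDs and values are made up).
sample = {
    "data": [
        {"id": "123_456", "created_time": "2018-01-01T00:00:00+0000",
         "message": "Hello",
         "likes": {"summary": {"total_count": 3}},
         "comments": {"summary": {"total_count": 1}}},
        {"id": "123_789", "created_time": "2018-01-02T00:00:00+0000"},
    ]
}
rows = feed_to_rows(sample)
csv_text = rows_to_csv(rows)
```

Defaulting missing fields to empty values keeps every row the same shape, which is what a spreadsheet or database table expects.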

Data Engineering

How To Get Twitter Data With Python

In this post, we will discuss how to use Python to grab publicly available Twitter post data (from any user you specify) and convert it into a tabular format so that we can analyse the data in Excel or insert it into a relational database. Python has a package that …
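
Once the tweets are in tabular form, loading them into a relational database is straightforward. The sketch below uses Python's built-in `sqlite3` module with hypothetical tweet records and column names. The package the post uses to fetch the data is not shown here, and SQLite simply stands in for whatever database you use.

```python
import sqlite3

# Hypothetical tweet records, already flattened to one dict per tweet.
tweets = [
    {"id": 1, "created_at": "2018-03-01 10:00:00", "user_name": "example_user",
     "tweet_text": "First post", "retweets": 2},
    {"id": 2, "created_at": "2018-03-02 11:30:00", "user_name": "example_user",
     "tweet_text": "Second post", "retweets": 0},
]

def load_tweets(conn: sqlite3.Connection, records: list[dict]) -> int:
    """Create a simple tweets table, insert the records, return the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id INTEGER PRIMARY KEY,
               created_at TEXT,
               user_name TEXT,
               tweet_text TEXT,
               retweets INTEGER)"""
    )
    # Named placeholders map each dict key to its column.
    conn.executemany(
        "INSERT INTO tweets (id, created_at, user_name, tweet_text, retweets) "
        "VALUES (:id, :created_at, :user_name, :tweet_text, :retweets)",
        records,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]

conn = sqlite3.connect(":memory:")  # in-memory database for the example
count = load_tweets(conn, tweets)
```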

DataStage

DataStage: Loop With Transformer

The Transformer stage has built-in looping functionality: you can use stage variables and loop conditions to construct looping logic. In this post, we will present three examples: ranking, aggregation, and vertical pivot. Before going into the examples, here are the useful variables for loop construction. @ITERATION – System …
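
DataStage loops are configured in the Transformer rather than written as code, but the control flow can be approximated in Python. The sketch below mimics a loop whose condition is `@ITERATION <= n`, emitting one output row per element of a delimited column. The column names and data are hypothetical, and this is a generic illustration of the loop mechanism rather than one of the post's three examples.

```python
def explode_rows(rows, key_col, list_col, sep=","):
    """Emit one output row per element of a delimited column.

    `iteration` plays the role of @ITERATION: it starts at 1 and is
    reset for every input row, as in a Transformer loop.
    """
    out = []
    for row in rows:
        items = row[list_col].split(sep)
        n = len(items)          # like a stage variable computed once per input row
        iteration = 1           # @ITERATION starts at 1
        while iteration <= n:   # loop condition: @ITERATION <= n
            out.append({key_col: row[key_col],
                        "seq": iteration,
                        "value": items[iteration - 1].strip()})
            iteration += 1
    return out

# Hypothetical input: one row per customer with a comma-separated product list.
rows = [{"cust_id": "C1", "products": "A, B, C"},
        {"cust_id": "C2", "products": "D"}]
result = explode_rows(rows, "cust_id", "products")
```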

DataStage

DataStage: Remove Leading & Trailing Lines in Flat File

When a flat file has leading and trailing lines that are not part of the table, we can use the filter in the flat file stage to remove them. As an example, the file below has leading and trailing lines, and we want to remove them with the flat file stage. …
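
The stage's filter passes the file through a command before the rows are parsed. The Python below is a hypothetical analogue of such a filter: it drops a fixed number of header and trailer lines. The line counts and sample file are assumptions. On a Unix server, a command such as `sed '1d;$d'` (delete the first and last line) is one possible filter of this kind.

```python
import io

def strip_header_trailer(lines, leading=1, trailing=1):
    """Return `lines` without the first `leading` and last `trailing` lines."""
    end = len(lines) - trailing
    # Guard against files shorter than the header plus the trailer.
    return lines[leading:end] if end > leading else []

# Hypothetical file: a report header line, the table, and a trailer line.
raw = io.StringIO("REPORT GENERATED 2018-01-01\nid,name\n1,Alice\n2,Bob\nEND OF FILE\n")
data_lines = strip_header_trailer(raw.read().splitlines())
```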

DataStage

DataStage: Join vs Lookup vs Merge

DataStage has three processing stages that can join tables based on the values of key columns: Lookup, Join and Merge. In this post, we discuss when to choose which stage, how the stages differ, and development references for using them. Use the Lookup stage when: Having a …
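
To make the Lookup behaviour concrete, here is a hypothetical Python sketch of lookup-style matching. The reference data is held in memory as a dictionary (which is why Lookup suits reference data that fits in memory), the primary input streams past it, and unmatched rows go to a reject list, similar to setting the lookup failure action to Reject. Table and column names are made up for the example.

```python
def lookup(stream, reference, key):
    """Lookup-style match of a streamed input against in-memory reference data.

    Returns (matched, rejected). If the reference has duplicate keys,
    the last one wins in this sketch.
    """
    ref_by_key = {r[key]: r for r in reference}  # whole reference held in memory
    matched, rejected = [], []
    for row in stream:
        ref = ref_by_key.get(row[key])
        if ref is None:
            rejected.append(row)             # like a reject link on lookup failure
        else:
            matched.append({**row, **ref})   # combine input and reference columns
    return matched, rejected

# Hypothetical data: orders (stream input) and customers (reference).
orders = [{"order_id": 1, "cust_id": "C1"},
          {"order_id": 2, "cust_id": "C2"},
          {"order_id": 3, "cust_id": "C9"}]
customers = [{"cust_id": "C1", "name": "Alice"},
             {"cust_id": "C2", "name": "Bob"}]
matched, rejected = lookup(orders, customers, "cust_id")
```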

ETL

Informatica Cloud: How To Use Hierarchy Parser To Transform JSON File

The Hierarchy Parser in the Informatica Cloud mapping designer can transform JSON or XML files into structured tables. In this post, we will transform the JSON file obtained from the Google Geocoding API. The Geocoding API turns addresses (1600 Amphitheatre Parkway, Mountain View, CA) into geographic coordinates (latitude: 37.422, longitude: -122.085, etc.) …
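
The flattening that the Hierarchy Parser performs can be pictured with a short Python sketch. It assumes the standard Geocoding API response shape (a `results` list whose items carry `formatted_address` and `geometry.location.lat`/`lng`). The sample is trimmed and uses the coordinates quoted in the excerpt. It illustrates the JSON-to-rows mapping only, not the Informatica configuration.

```python
import json

# Trimmed, hypothetical response in the Geocoding API's general shape.
response_text = """
{
  "results": [
    {
      "formatted_address": "1600 Amphitheatre Parkway, Mountain View, CA",
      "geometry": {"location": {"lat": 37.422, "lng": -122.085}}
    }
  ],
  "status": "OK"
}
"""

def flatten_geocode(doc):
    """Turn each nested result into one flat row: address, latitude, longitude."""
    rows = []
    for result in doc.get("results", []):
        loc = result["geometry"]["location"]
        rows.append({"formatted_address": result["formatted_address"],
                     "lat": loc["lat"],
                     "lng": loc["lng"]})
    return rows

rows = flatten_geocode(json.loads(response_text))
```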

ETL

Informatica Cloud: Cannot create an Unstructured Data transformation

If the Secure Agent server is not set up properly, the Hierarchy Parser and Hierarchy Builder do not work in the Informatica Cloud mapping designer. In this post, we will discuss how to resolve the error ‘Cannot create an Unstructured Data transformation’, which appears when the hierarchy data transformation mapping is run. There are a few …

ETL

Informatica Cloud: How To Resolve Unknown Host Name For Secure Agent In Linux

For some connectors (e.g. the Marketing Cloud connector) to work in Informatica Cloud, we need to configure the host name on the server where the Secure Agent is installed. In this post, we will discuss how to resolve the unknown host name issue on Linux. Steps: (1) Execute the command below …
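
Before editing any configuration, it can help to confirm whether the server's host name actually resolves. The Python check below is a hypothetical diagnostic, not one of the post's steps. It assumes that an unknown host name error means the name returned by `socket.gethostname()` cannot be resolved through `/etc/hosts` or DNS.

```python
import socket

def hostname_resolves(name=None):
    """Return True if `name` (default: this machine's host name) resolves to an IP address."""
    name = name or socket.gethostname()
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # covers socket.gaierror and socket.herror
        return False

# False here suggests the host name is missing from /etc/hosts or DNS.
status = hostname_resolves()
```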

DataStage

DataStage: Script To Deploy Jobs

I have written a batch script to deploy DataStage jobs. The script runs on your own computer and can push jobs to any target you want. It leverages the DSXImportService that comes with the DataStage installation. The script can push both parallel and sequence jobs as well as parameter files, and it works between projects …

DataStage

DataStage: How To Resolve ‘Scratch Space Full’ Error

When the data volume is large, DataStage uses a scratch disk to process data. The default scratch disk is usually the Scratch folder under the Server folder where the application is installed. To use a larger scratch disk, we can create a custom configuration file. The default configuration …
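
A parallel configuration file lists the processing nodes and their disk and scratch disk resources. The Python below generates a minimal one-node file. The host name, paths, and node name are hypothetical placeholders, and real installations often define several nodes. The job would then be pointed at the file through the `APT_CONFIG_FILE` environment variable.

```python
def make_apt_config(fastname, disk_path, scratch_paths, node_name="node1"):
    """Build the text of a one-node parallel configuration file.

    `scratch_paths` may list several directories; each becomes a
    `resource scratchdisk` entry for the node.
    """
    lines = [
        "{",
        f'  node "{node_name}"',
        "  {",
        f'    fastname "{fastname}"',
        '    pools ""',
        f'    resource disk "{disk_path}" {{pools ""}}',
    ]
    for path in scratch_paths:
        lines.append(f'    resource scratchdisk "{path}" {{pools ""}}')
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"

# Hypothetical host name and paths: point scratch at a larger disk.
config = make_apt_config("etlserver01", "/data/datasets", ["/bigdisk/scratch"])
```

Writing `config` to a file such as `custom.apt` and setting `APT_CONFIG_FILE` to its path would make jobs use the larger scratch directory.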