ETL

Informatica Cloud: How To Optimise Joiner Performance In Mapping Designer

Joiner is the transformation used to join tables in Informatica Cloud (see a quick introduction to the Joiner transformation here). With a large volume of data, the Joiner transformation becomes very slow unless you optimise its performance. In this post, we will show you a few tricks that you can use to …

ETL

Informatica Cloud: How To Run More Than 2 Data Synchronization Tasks Concurrently

By default, the secure agent can run 2 data synchronisation tasks at a time. This constraint quickly becomes limiting, especially when multiple developers are building and testing data synchronisation tasks at the same time. By adding a custom property on the secure agent, you can run more than …

Data Engineering

How To Get Data From MongoDB With Python

MongoDB is one of the most popular NoSQL databases used as a backend database for web and mobile applications. Data is stored in MongoDB as BSON, a binary format that resembles JSON. Once you understand the way MongoDB stores data, all you …

Data Engineering

How To Get Data From Liveperson And Create Aggregated Table With R

In the previous post, we discussed how to ingest data from Liveperson with Python. In this post, I want to use R to make the same API call and create an aggregated table instead of preparing data for ingestion. The code is based on the example here. For further information …

Data Engineering

How To Ingest Data From Liveperson With Python

The Engagement History API lets you grab live chat interaction data from Liveperson. It is based on the REST architecture and uses OAuth 1.0. You first need to retrieve API keys. In this example, I am using the requests and requests_oauthlib modules to make API calls from Python. Liveperson offers a good code …

Data Engineering

How To Get Data From SharePoint With Python

It’s sometimes convenient to have a script to get data from SharePoint. We can automate the ingestion of user-managed data from SharePoint. For example, business users can upload or update a user-managed file, and a scheduled ETL task fetches it and brings it into the data lake. Using the SharePoint API is …

Data Engineering

Automate Source And Target Table Column Comparison With Java

We may need to check regularly whether the source system has added or dropped columns. For example, when the source system is under constant heavy development, automating this audit process can be helpful. The code uses JDBC for both target and source database connections. …

Data Engineering

Automate Salesforce Table Creation With Java

When you ingest data from Salesforce into a relational database, you first need to create a table for the object you want to ingest. Writing a create statement by hand is cumbersome, and you often need to debug it a few times. Salesforce data types are quite different from database ones. …

Data Engineering

How To Get Survey Response Data From Qualtrics With Python

In the previous post, we had a look at Python code examples of basic data engineering with AWS infrastructure. In this post, I would like to use the Qualtrics API to present a code example of API data ingestion into S3 and Redshift. This code can be scheduled hourly, daily or weekly in a …

Data Engineering

Data Engineering in S3 and Redshift with Python

AWS offers a nice solution for data warehousing with its columnar database, Redshift, and its object storage service, S3. Python and the AWS SDK make it easy for us to move data within the ecosystem. In this post, I will present code examples for the scenarios below: Uploading data from S3 to …