Data Engineering

How to Bulk Load Data into MySQL with Python

As with any other relational database, the fastest way to load data into MySQL is to upload a flat file into a table. For this, MySQL provides the LOAD DATA INFILE statement, which we can execute from Python. To connect to MySQL and execute SQL statements with …
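
A minimal sketch of the statement such a Python script might execute. It assumes a comma-delimited CSV with a header row and trusted inputs, because the path and table name are interpolated directly into the SQL. The helper name `build_load_data_sql` is hypothetical.

```python
def build_load_data_sql(csv_path: str, table: str, delimiter: str = ",",
                        skip_header: bool = True) -> str:
    """Return a LOAD DATA LOCAL INFILE statement for a CSV file.

    Assumes trusted inputs: the path and table name are interpolated into
    the SQL text, so never build this from user-supplied values.
    """
    # MySQL accepts a doubled single quote inside a quoted string literal.
    path_literal = csv_path.replace("'", "''")
    clauses = [
        f"LOAD DATA LOCAL INFILE '{path_literal}'",
        f"INTO TABLE `{table}`",
        f"FIELDS TERMINATED BY '{delimiter}' OPTIONALLY ENCLOSED BY '\"'",
        # '\\n' in Python is backslash + n, which MySQL reads as a newline.
        "LINES TERMINATED BY '\\n'",
    ]
    if skip_header:
        clauses.append("IGNORE 1 LINES")  # skip the CSV header row
    return "\n".join(clauses)


if __name__ == "__main__":
    print(build_load_data_sql("/tmp/sales.csv", "sales"))
```

To run it, one option (an assumption, not necessarily the driver the post uses) is PyMySQL: open the connection with `pymysql.connect(..., local_infile=True)`, call `cursor.execute(sql)`, then `conn.commit()`. The server must also have the `local_infile` system variable enabled for the LOCAL variant to work.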

Data Engineering

How to Bulk Load Data into PostgreSQL with Python

Bulk loading from a CSV file with the COPY command is the fastest way to load a large table into Postgres. In fact, loading data from a flat file is the fastest option in any relational database. When you have a large table and need to load it to another …
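
A minimal sketch of a COPY statement that streams a CSV file from the client (`FROM STDIN`) rather than reading a file on the database server, which would require server-side file access. The helper `build_copy_sql` is hypothetical, and the table and column names are assumed to be trusted.

```python
from typing import Optional, Sequence


def build_copy_sql(table: str, columns: Optional[Sequence[str]] = None,
                   delimiter: str = ",", header: bool = True) -> str:
    """Return a COPY ... FROM STDIN statement for loading a CSV stream."""
    # An explicit column list lets the CSV column order differ from the table's.
    column_list = f" ({', '.join(columns)})" if columns else ""
    options = ["FORMAT csv", f"DELIMITER '{delimiter}'"]
    if header:
        options.append("HEADER true")  # skip the first line of the file
    return f"COPY {table}{column_list} FROM STDIN WITH ({', '.join(options)})"


if __name__ == "__main__":
    print(build_copy_sql("sales", ["id", "amount"]))
```

With psycopg2 (assumed here as the driver), the file object can be passed straight through: `cur.copy_expert(build_copy_sql("sales"), f)` where `f` is the open CSV file, followed by `conn.commit()`.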

Data Engineering

Event-Driven Data Ingestion with AWS Lambda (S3 to RDS)

In the previous post, we discussed how to move data from the source S3 bucket to the target bucket whenever a new file is created in the source bucket, by using an AWS Lambda function. In this post, I will show you how to use Lambda to execute data ingestion from S3 …

Data Engineering

Event-Driven Data Ingestion with AWS Lambda (S3 to S3)

Let’s say you have data coming into S3 in your AWS environment every 15 minutes and want to ingest it as it arrives. The best approach for this near real-time ingestion is to use an AWS Lambda function. To demonstrate how to develop and deploy a Lambda function in AWS, we will …
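
A minimal sketch, assuming the function is triggered by an S3 "object created" event notification: the handler reads the bucket name and object key of each new file from the event payload. What happens next (copying to a target bucket or loading into a database) is left as a hypothetical step, not the post's implementation.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every object in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    # Hypothetical next step: copy each object to the target bucket
    # (for example with boto3's copy_object) or load it into a database.
    return {"objects": objects}
```

Decoding the key matters: without `unquote_plus`, a file named `my file.csv` would be looked up as `my+file.csv` and the read would fail.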

Data Engineering

Comprehensive Guide to Download Files From S3 with Python

Using the AWS SDK for Python can be confusing. First of all, there seem to be two different ones (Boto and Boto3). Even after you choose one, each of them seems to offer multiple ways to authenticate and connect to AWS services. Googling solutions can quickly become confusing as you …

Data Engineering

New JSON Data Ingestion Strategy by Using the Power of Postgres

Postgres had only limited JSON capability before version 9.2 added native JSON support. The release of version 9.3 really took the JSON feature to the next level with additional constructor and extractor functions. The capability of querying and transforming the JSON data type …

Data Engineering

How To Ingest AES Encrypted Data With Python

To ingest encrypted data into a data warehouse (DWH), we may either load the data as it is or decrypt it and then load it into the database, depending on the business requirements. Either way, it is always good to know how to decrypt it. There are many encryption methods. Encryption usually happens at the application (either …

Data Engineering

How To Convert Non-UTC Timestamp Into UNIX Epoch Time In Python

When we ingest API data, the query URI string often takes Unix epoch time (or Unix time) to specify the datetime range. Epoch time represents a timestamp as the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC. When you have an input …
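
A minimal sketch of the conversion, assuming Python 3.9+ (for `zoneinfo`) and an IANA time zone name for the input; this is one standard-library approach, not necessarily the one the post uses. Attaching the source zone before calling `timestamp()` is what makes the result correct, because a naive datetime would be interpreted in the machine's local zone instead.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; needs the IANA tz database


def to_epoch(timestamp: str, tz_name: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Convert a naive local timestamp string in tz_name to Unix epoch seconds."""
    naive = datetime.strptime(timestamp, fmt)
    # Attach the source time zone; ZoneInfo applies the right offset, including DST.
    aware = naive.replace(tzinfo=ZoneInfo(tz_name))
    # timestamp() on an aware datetime counts seconds since 1970-01-01 00:00:00 UTC.
    return int(aware.timestamp())


if __name__ == "__main__":
    # Sydney is on daylight saving time (UTC+11) on 1 January.
    print(to_epoch("2021-01-01 00:00:00", "Australia/Sydney"))  # 1609419600
```

Local times that are skipped or repeated at a DST transition are resolved by the datetime's `fold` attribute (0 by default), so inputs falling in those windows may need explicit handling.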

ETL

Tips and Troubleshooting For Uploading CSV to Database In Talend

There will be times when you want to upload a big CSV file (with many rows and hundreds of columns) to a relational database table. Talend Open Studio is an open-source ETL tool that I use regularly for odd jobs like that. I like using it because it …

ETL

How To Rename Mapping Job In Informatica Cloud

Mappings are where all the magic happens in Informatica Cloud. When I started using it, it took me a while to work out how to rename a mapping job. Since then, a few people have asked me the same question, so I decided to write about it. This is probably the …