Infrastructure

How To Install Python 3 and Create a Virtual Environment in CentOS, Red Hat and Amazon Linux

The scope of this post is to install Python 3 for a user and create a virtual environment in CentOS, Red Hat or Amazon Linux. Linux comes with Python 2.7 and the best practice is not to mess with it. This is because the Linux OS has some dependency on it and upgrading it to …

Infrastructure

Resolving Docker Daemon Is Not Running Error From Command Prompt

I recently installed Docker Toolbox on a Windows machine (Windows 10 Home version). We all know Docker is awesome and serverless microservice architecture is hot. Docker has a better tool, called Docker for Windows. Unfortunately, this does not work with the Windows 10 Home version (it works with Windows 10 Professional versions …

Data Engineering

Comprehensive Guide to Download Files From S3 with Python

Using the AWS SDK for Python can be confusing. First of all, there seem to be two different ones (Boto and Boto3). Even if you choose one, it seems to have multiple ways to authenticate and connect to AWS services. Googling solutions can quickly become confusing as you …

DBA

Index JSON In Postgres

One way to maximise query efficiency in a relational database is to index the columns that are often used in joins or conditions. The awesome thing about querying JSON in Postgres is that you can index it to further optimise query performance. In the previous post, we had a look at the …

DBA

How Postgres JSON Query Handles Missing Key

When we transform JSON into a structured format with a programming language during JSON data ingestion, we have to handle missing keys. This is because JSON is schema-less and, unlike relational database tables, doesn’t always have the same keys in every record. In Python, the missing key …
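As a rough illustration of the missing-key problem on the Python side (a minimal sketch, not taken from the post itself): indexing a dict with a key that is absent raises `KeyError`, while `dict.get()` returns `None` instead. The sample records and the `flatten` helper below are hypothetical.

```python
import json

# Two hypothetical JSON records; the second one lacks the "name" key.
raw_records = ['{"id": 1, "name": "alice"}', '{"id": 2}']


def flatten(record, keys):
    """Return one row per record, using None for any key that is missing."""
    # dict.get() returns None instead of raising KeyError for absent keys.
    return tuple(record.get(key) for key in keys)


rows = [flatten(json.loads(raw), ("id", "name")) for raw in raw_records]
```

Each row then has the same number of columns, which is what a relational table expects; the `None` values would typically be loaded as NULL.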

DBA

How To Format Output in Pretty JSON With Postgres

In the previous post, we discussed the new JSON ingestion strategy that uses the power of Postgres native JSON support (New JSON Data Ingestion Strategy By Using the Power of Postgres). Since version 9.5, Postgres has had a cool function, jsonb_pretty(), which automatically formats the output in the pretty JSON …

Data Engineering

New JSON Data Ingestion Strategy by Using the Power of Postgres

Postgres had JSON support with somewhat limited capability before version 9.2 added native JSON support. The release of version 9.3 really took the JSON feature to the next level with additional constructor and extractor methods. The capability of querying and transforming the JSON data type …

Data Engineering

How To Ingest AES Encrypted Data With Python

To ingest encrypted data into a DWH, we may ingest the data as it is or decrypt it and load it into the database, depending on the business requirements. It is always good to know how to decrypt encrypted data. There are many encryption methods. Encryption usually happens at the application (either …

Data Engineering

How To Convert Non-UTC Timestamp Into UNIX Epoch Time In Python

When we ingest API data, the query URI string often takes Unix epoch time (or Unix time) to specify the datetime range. Epoch time represents a timestamp as the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC. When you have an input …
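The definition above can be sketched in Python with the standard `datetime` module. This is a minimal sketch, not necessarily the approach the post takes: it assumes the input is a naive timestamp string with a known, fixed UTC offset, and it ignores daylight saving time. The `to_epoch` name and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_epoch(timestamp, utc_offset_hours, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a local timestamp string with a fixed UTC offset to epoch seconds."""
    # Parse the string into a naive datetime (no time zone attached yet).
    naive = datetime.strptime(timestamp, fmt)
    # Attach the fixed offset so the value is interpreted as local time.
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    # timestamp() on an aware datetime counts seconds since 1970-01-01 00:00:00 UTC.
    return int(aware.timestamp())


# 10:00 at UTC+10 is the same instant as midnight UTC.
epoch_start = to_epoch("1970-01-01 10:00:00", 10)
new_year_2018 = to_epoch("2018-01-01 10:00:00", 10)
```

For real time zones with daylight saving rules, a time zone database (for example the standard `zoneinfo` module in Python 3.9+) would be a better fit than a fixed offset.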

ETL

Tips and Troubleshooting For Uploading CSV to Database In Talend

There will be times when you want to upload a big CSV file (with many rows and hundreds of columns) to a relational database table. Talend Open Studio is an open source ETL tool that I use regularly to do odd jobs like that. I like using it because it …