How to Ingest Data From MongoDB with Node.js

Data ingestion from a NoSQL database often involves denormalising its schema-less data before loading it into a relational database. In this post, we will grab the restaurant data from MongoDB, denormalise the collection into a parent and a child table, and load them into Postgres.

I have previously done this exercise with Python. That post covers how to get the restaurants dataset and load it into MongoDB. If you need guidance on the preparation, you can refer to How to Get Data From MongoDB with Python.

If you need a little bit of a MongoDB refresher, check out my earlier MongoDB posts.

Dataset

We are using the restaurant dataset from here. Download it as a JSON file and upload it to MongoDB (see here for detailed instructions).
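For instance, with the MongoDB command line tools installed, a single mongoimport call can do the upload; the database and collection names below are only examples:

```
mongoimport --db test --collection restaurants --drop --file restaurants.json
```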

Data Model

We are going to denormalise the dataset into the main restaurants table as the parent and the grades table as the child. The data model is below. For further information, including the SQL create table statements, see the previous post here.

Code

Our ingestion strategy is:

  • Get the data from MongoDB.
  • Transform the format of the timestamp field for the child table.
  • Convert the data into two csv files.
  • Use the Postgres copy function to bulk insert the data from the csv files.

First of all, we will create a module to perform the truncate & load operation. The function is straightforward: it takes a csv file and uploads it to a target table. Save it as copyPg.js. At the end of the file, we export the function with module.exports.
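Here is a minimal sketch of what copyPg.js might look like, assuming the pg and pg-copy-streams packages and a config.json holding the Postgres connection details (the exact shape of config.json is illustrative):

```javascript
// copyPg.js -- a sketch of the truncate & load module.
// Assumes the pg and pg-copy-streams packages and a config.json
// with Postgres connection details (host, user, password, etc.).
const fs = require('fs');
const { Pool } = require('pg');
const copyFrom = require('pg-copy-streams').from;
const config = require('./config.json');

const pool = new Pool(config.pg);

// Truncate the target table, then bulk-load the csv file into it.
function copyCsv(csvPath, targetTable, callback) {
  pool.connect((err, client, done) => {
    if (err) return callback(err);
    client.query(`TRUNCATE TABLE ${targetTable}`, (err) => {
      if (err) { done(); return callback(err); }
      const stream = client.query(
        copyFrom(`COPY ${targetTable} FROM STDIN WITH (FORMAT csv)`)
      );
      const fileStream = fs.createReadStream(csvPath);
      fileStream.on('error', (err) => { done(); callback(err); });
      stream.on('error', (err) => { done(); callback(err); });
      stream.on('finish', () => { done(); callback(null); });
      fileStream.pipe(stream);
    });
  });
}

module.exports = copyCsv;
```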

I have covered the topic of bulk-loading data into Postgres with Node.js previously. It discusses the modules used for bulk-loading as well as using the config.json file to define connection details. Check out this post if you need a refresher (Converting JSON to CSV and Loading it to Postgres with Node.js).

Once we have the custom module to load data, let's write the main part. We first connect to MongoDB with the mongodb module, then query the collection. The query results come back as plain JavaScript objects, so we can pass them to the json2csv converter without any data type conversion.
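As a sketch, with the 3.x mongodb driver and illustrative connection string, database, and collection names:

```javascript
// main.js -- a sketch of the fetch step, assuming the 3.x mongodb driver.
const { MongoClient } = require('mongodb');

const url = 'mongodb://localhost:27017';

MongoClient.connect(url, (err, client) => {
  if (err) throw err;
  const db = client.db('test');
  db.collection('restaurants').find({}).toArray((err, docs) => {
    if (err) throw err;
    // docs is an array of plain JavaScript objects, ready for reshaping.
    console.log(`Fetched ${docs.length} restaurant documents`);
    client.close();
  });
});
```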

Before passing the data to the converter, we are going to reformat the date field from the ISODate format to the YYYY-MM-DD HH:MM:SS format. The code snippet below loops over every date field in every grades array and applies the formatting function.
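Continuing inside the query callback, a sketch of that loop might look like this (formatDate is a hypothetical helper, not part of any library):

```javascript
// Rewrite each grade's ISODate value as a 'YYYY-MM-DD HH:MM:SS' string.
function formatDate(d) {
  const pad = (n) => String(n).padStart(2, '0');
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} ` +
         `${pad(d.getHours())}:${pad(d.getMinutes())}:${pad(d.getSeconds())}`;
}

docs.forEach((restaurant) => {
  (restaurant.grades || []).forEach((grade) => {
    grade.date = formatDate(new Date(grade.date));
  });
});
```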

To convert JSON into csv, I am using the json2csv module. It makes JSON manipulation easier and covers most conversion scenarios.
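A sketch of the conversion step, assuming the json2csv 4.x API and using the dataset's restaurant_id, name, borough, and cuisine fields for the parent table (the file names are illustrative):

```javascript
// Flatten the documents into two csv files with json2csv:
// one for the parent table and one for the child table.
const fs = require('fs');
const { parse } = require('json2csv');

const restaurantFields = ['restaurant_id', 'name', 'borough', 'cuisine'];
const restaurantsCsv = parse(docs, { fields: restaurantFields, header: false });
fs.writeFileSync('restaurants.csv', restaurantsCsv);

// Flatten each restaurant's grades array into one row per grade.
const gradeRows = [];
docs.forEach((r) => {
  (r.grades || []).forEach((g) => {
    gradeRows.push({
      restaurant_id: r.restaurant_id,
      date: g.date,
      grade: g.grade,
      score: g.score,
    });
  });
});
const gradesCsv = parse(gradeRows, { header: false });
fs.writeFileSync('grades.csv', gradesCsv);
```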

Finally, I am using callbacks to ensure the truncate & load events happen sequentially for the two tables. The rest should be relatively simple.
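As a sketch, nesting the second load inside the first load's callback guarantees the order, using the copyPg.js module from earlier (table and file names are illustrative):

```javascript
const copyCsv = require('./copyPg');

// Load the parent table first, then the child table.
copyCsv('restaurants.csv', 'restaurants', (err) => {
  if (err) throw err;
  console.log('restaurants table loaded');
  copyCsv('grades.csv', 'grades', (err) => {
    if (err) throw err;
    console.log('grades table loaded');
  });
});
```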
