Index JSON In Postgres

By : Mydatahack
February 23, 2018
Category : DBA, Infrastructure
Tags: B-Tree, GiN, Hash, Indexing, JSON, Postgres

To maximise query efficiency for a relational database is to index the columns that are often used for joining or conditions. The awesome thing about querying JSON in Postgres is that you can index it to further optimise query performance.

In the previous post, we had a look at the new JSON data ingestion strategy by using the power of Postgres. If you haven’t tried to load JSON directly into Postgres, you should check out the blog.

To index JSON in Postgres, the JSON column has to be the jsonb data type (stores data in the binary json format), instead of json (stores data as string). If you have your column set as json, you can change the data type as below. To injest JSON, you should always use jsonb for better query performance and indexing capability.

ALTER TABLE mongodb.restaurants_json
ALTER COLUMN data
SET DATA TYPE jsonb
USING data::jsonb;

Postgres supports a few different indexing method. If you are indexing a single node (such as id or email) for joining or searching, B-Tree is the best one. Hash indexes are discouraged for not being WAL-logged (meaning not crash-safe and not replicated). If you are interested, this blog post explains it further.

GiN (Generalised Inverted Index) is the way to go when the index maps many values to on row (like array or document). For further information about JSON Indexing variations, check out the blog post here.

Let’s have a look at how we can index the table we made in the previous post. Let’s index id assuming this is the join key to another table. For a single value, you can index it as string.

Create Index ind_btree_id On mongodb.restaurants_json Using BTREE ((data->>'Id'));

Let’s also index grades with GIN indexes. For GIN, you have to index it as jsonb data type.

Create Index ind_gin_grades On mongodb.restaurants_json Using GIN ((data->'grades'));

Now, let’s check out the indexes on the table. To check indexes, you need to query the pg_indexes table.

Select * From pg_indexes Where tablename = 'restaurants_json';

Let’s drop indexes. Drop Index command takes a few optional options. I included concurrently and if exists. Concurrently prevents table locks when other processes are querying the table at the same time.

Drop Index Concurrently If Exists mongodb.ind_btree_id;
Drop Index Concurrently If Exists mongodb.ind_gin_grades;

May the index be with you!

SIMILAR NEWS

Git

How to specify which Node version to use in Github Actions

When you want to specify which Node version to use in your Github Actions, you can use actions/setup-node@v2. The alternative way is to use a node container. When you try to use a publicly available node container like runs-on: node:alpine-xx, the pipeline gets stuck in a queue. runs-on is not …

AWS

Using semantic-release with AWS CodePipeline and CodeBuild

Here is the usual pattern of getting the source from a git repository in AWS CodePipeline. In the pipeline, we use AWS CodeStart to connect to a repo and get the source. Then, we pass it to the other stages, like deploy or publish. For some unknown reasons, CodePipeline downloads …

DBA

mysqldump Error: Unknown table ‘COLUMN_STATISTICS’ in information_schema (1109)

mysqldump 8 enabled a new flag called columm-statistics by default. When you have MySQL client above 8 and try to run mysqldump on older MySQL versions, you will get the error below. mysqldump: Couldn’t execute ‘SELECT COLUMN_NAME, JSON_EXTRACT(HISTOGRAM ‘$”number-of-buckets-specified”‘) FROM information_schema.COLUMN_STATISTICS WHERE SCHEMA_NAME = ‘myschema’ AND TABLE_NAME = ‘craue_config_setting’;’: Unknown …