How Postgres JSON Query Handles Missing Key
- By : Mydatahack
- Category : DBA, Infrastructure
- Tags: DBA, JSON, Missing Key, Postgres
When we transform JSON to a structured format with a programming language in JSON data ingestion, we have to handle missing key. This is because JSON is schema-less and it doesn’t always have the same keys in all records as opposed to relational database tables.
In Python, the missing key handling looks like this.
1 2 3 4 | try: tmp.append(record['info']['postcode']) except KeyError: tmp.append('NA') |
In the previous post, we examined Postgres JSON support and discussed how we could use it as a new JSON ingestion strategy (New JSON Data Ingestion Strategy By Using the Power of Postgres).
The first question comes to mind is how Postgres handles missing key when you are used to transforming JSON in a programming language.
I start with the conclusion. Postgres puts null value when the key is missing. It also puts null value when the key value is null. This is awesome because the database can handle the missing key without being specific about key error handling.
Let’s have take a look.
Record Example
We have 6 JSON records.
- Complete record
- Note is null
- Note key is missing
- Payment is null
- Payment Type key is missing
- Payment is missing
JSON File
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 | [ { "Id": 100, "Name": "John", "TransactionId": "tran1", "Transaction": [ { "ItemId":"a100", "price": 200 }, { "ItemId":"a110", "price": 200 } ], "Subscriber": true, "Payment": { "Type": "Credit-Card", "Total": 400, "Success": true }, "Note": "1st Complete Record" }, { "Id": 101, "Name": "Tom", "TransactionId": "tran2", "Transaction": [ { "ItemId":"a100", "price": 200 }, { "ItemId":"a110", "price": 200 } ], "Subscriber": true, "Payment": { "Type": "Debit-Card", "Total": 400, "Success": true }, "Note":null }, { "Id": 102, "Name": "Margaret", "TransactionId": "tran3", "Transaction": [ { "ItemId":"a100", "price": 200 }, { "ItemId":"a110", "price": 200 } ], "Subscriber": true, "Payment": { "Type": "Credit-Card", "Total": 400, "Success": true } }, { "Id": 103, "Name": "Dylan", "TransactionId": "tran4", "Transaction": [ { "ItemId":"a100", "price": 200 }, { "ItemId":"a110", "price": 200 } ], "Subscriber": true, "Payment": null, "Note": "Payment is Null" }, { "Id": 104, "Name": "Oliver", "TransactionId": "tran5", "Transaction": [ { "ItemId":"a100", "price": 200 }, { "ItemId":"a110", "price": 200 } ], "Subscriber": true, "Payment": { "Total": 400, "Success": true }, "Note": "Payment Type is missing" }, { "Id": 105, "Name": "Sarah", "TransactionId": "tran6", "Transaction": [ { "ItemId":"a100", "price": 200 }, { "ItemId":"a110", "price": 200 } ], "Subscriber": true, "Note": "Payment is missing" } ] |
Loading to Postgres with Python
Let’s load this to Postgres with a target table as usermanaged.transaction.
Make sure to create the table with data type jsonb. It is more optimised for queries.
1 2 3 4 | CREATE TABLE usermanaged.transaction ( DATA jsonb ); |
You can use the Python code below to load the data into the table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | import json import psycopg2 connection_string = "dbname='<dbname>' user='<user>' host='<uri>' password='<pw>'" table_name = "usermanaged.transaction" def pg_insert(connection_string, table_name, json_obj): try: conn = psycopg2.connect(connection_string) print("Connecting to Database") cur = conn.cursor() cur.execute("Truncate {} Cascade;".format(table_name)) print("Truncated {}".format(table_name)) for record in json_obj: cur.execute("INSERT INTO {} VALUES ('{}')".format(table_name, json.dumps(record))) cur.execute("commit;") print("Inserted data into {}".format(table_name)) conn.close() print("DB connection closed.") except Exception as e: print('Error {}'.format(str(e))) f = open('./transaction_example.json') f_json = json.load(f) pg_insert(connection_string, table_name, f_json) |
Postgres Query
Let’s query the table to see how Postgres handles missing keys.
1 2 3 4 5 6 7 8 9 10 11 12 13 | SELECT data->>'Id' AS id ,data->>'Name' AS name ,data->>'TransactionId' AS transactionid ,jsonb_array_elements(data->'Transaction')->>'ItemId' AS itemid ,jsonb_array_elements(data->'Transaction')->>'price' AS price ,data->>'Subscriber' AS subscriber ,data->'Payment'->>'Type' AS payment_type ,data->'Payment'->>'Total' AS payment_total ,data->'Payment'->>'Success' AS payment_success ,data->>'Note' AS note ,data->>'Whatever' AS non_existing FROM usermanaged.transaction; |
Results
As you can see, the missing key is set to null. If you have a key that does not exist in any record, it creates a column with null values. When the key value is null, it also sets the value to null.
Good times!