How to Bulk Load Data into PostgreSQL with Python

Bulk loading from a CSV file with the COPY command is the fastest way to load a large table into Postgres. In fact, loading data from a flat file is the fastest option in any relational database. When you have a large table and need to move it to another database, the fastest approach is to unload it to a flat file and then load that file into the target table.
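For reference, the underlying Postgres command looks like the sketch below. If the CSV file already sits on the database server and your connection role has the required privileges, you can run COPY directly like this; the rest of this post streams the file from the client instead. The table name and file path here are just placeholders.

import psycopg2

# Minimal sketch: run the server-side COPY command directly.
# Assumes the CSV is readable by the Postgres server process and the
# connection role is allowed to COPY from a server-side file.
conn = psycopg2.connect(dbname='db name', host='host url', port='5432',
                        user='username', password='password')
cur = conn.cursor()
cur.execute("COPY sf.transaction FROM '/tmp/transaction.csv' WITH (FORMAT csv, HEADER true)")
conn.commit()
conn.close()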

Let’s first import all the necessary modules. The code here works for both Python 2.7 and 3. The psycopg2 module is pretty much your best friend for connecting to Postgres from Python.

import psycopg2
import pandas as pd
import sys

Exporting Table to CSV

Let’s export a table to a CSV file. To export an entire table, you can use select * on the target table. Pandas’ read_sql function converts the query result into a pandas dataframe. To create the CSV file, call to_csv on the dataframe.

def table_to_csv(sql, file_path, dbname, host, port, user, pwd):
    '''
    Run a query against PostgreSQL and write the result to a CSV file.
    '''

    try:
        conn = psycopg2.connect(dbname=dbname, host=host, port=port,
                                user=user, password=pwd)
        print("Connecting to Database")
        # Get data into a pandas dataframe
        df = pd.read_sql(sql, conn)
        # Write to csv file
        df.to_csv(file_path, encoding='utf-8', header=True,
                  doublequote=True, sep=',', index=False)
        print("CSV File has been created")
        conn.close()

    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)

# Execution Example with transaction table
sql = 'Select * From sf.transaction'
file_path = '/tmp/transaction.csv'
dbname = 'db name'
host = 'host url'
port = '5432'
user = 'username'
pwd = 'password'

table_to_csv(sql, file_path, dbname, host, port, user, pwd)
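If the table is too large to hold in memory at once, pandas can also stream the query result in chunks and append them to the CSV file. The variation below is a sketch of that idea; table_to_csv_chunked and its chunksize default are illustrative, not part of the script above.

def table_to_csv_chunked(sql, file_path, dbname, host, port, user, pwd,
                         chunksize=100000):
    '''
    Sketch: stream a large query result to a CSV file in chunks.
    '''
    conn = psycopg2.connect(dbname=dbname, host=host, port=port,
                            user=user, password=pwd)
    first = True
    # With chunksize set, read_sql yields dataframes instead of one big frame
    for chunk in pd.read_sql(sql, conn, chunksize=chunksize):
        chunk.to_csv(file_path, mode='w' if first else 'a',
                     header=first, index=False, encoding='utf-8')
        first = False
    conn.close()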

Importing CSV to Table

psycopg2’s copy_expert function is a good option for copying a CSV file into a table. The function below takes the input file path, the target table and the connection details. Note that the column order must be the same in the CSV file and the table; a sketch after the execution example at the end of this section shows how to list columns explicitly when the order differs.

def pg_load_table(file_path, table_name, dbname, host, port, user, pwd):
    '''
    Load a CSV file into the target table.
    '''

    try:
        conn = psycopg2.connect(dbname=dbname, host=host, port=port,
                                user=user, password=pwd)
        print("Connecting to Database")
        cur = conn.cursor()
        # Truncate the table first
        cur.execute("Truncate {} Cascade;".format(table_name))
        print("Truncated {}".format(table_name))
        # Load the table from the file, skipping the CSV header row
        with open(file_path, "r") as f:
            cur.copy_expert("copy {} from STDIN CSV HEADER QUOTE '\"'".format(table_name), f)
        conn.commit()
        print("Loaded data into {}".format(table_name))
        conn.close()
        print("DB connection closed.")

    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)

# Execution Example
file_path = '/tmp/restaurants.csv'
table_name = 'usermanaged.restaurants'
dbname = 'db name'
host = 'host url'
port = '5432'
user = 'username'
pwd = 'password'
pg_load_table(file_path, table_name, dbname, host, port, user, pwd)
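
As noted above, copy_expert expects the CSV columns in the same order as the table. If the order differs, you can name the columns explicitly in the COPY statement instead. The sketch below shows the idea; the column names (name, city, cuisine) are hypothetical placeholders.

import psycopg2

# Sketch: copy a CSV whose column order differs from the table by
# listing the columns explicitly. Column names are placeholders.
conn = psycopg2.connect(dbname='db name', host='host url', port='5432',
                        user='username', password='password')
cur = conn.cursor()
copy_sql = """
    COPY usermanaged.restaurants (name, city, cuisine)
    FROM STDIN WITH (FORMAT csv, HEADER true)
"""
with open('/tmp/restaurants.csv', 'r') as f:
    cur.copy_expert(copy_sql, f)
conn.commit()
conn.close()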