How to Bulk Load Data with ODBC and Python

The "Hello World" of data engineering is making a one-to-one copy of a table from a source database to a target database by bulk-loading data. The fastest way to achieve this is to export the table to a CSV file from the source database and import that CSV file into a table in the target database. With virtually any database, importing data from a flat file is faster than running insert or update statements.

To connect to an ODBC data source from Python, you first need to install the pyodbc module. Naturally, you also need to install and configure the ODBC driver for the database you are trying to connect to.

Let’s load the required modules for this exercise. The code here works with both Python 2.7 and Python 3.

import pyodbc
import sys
import pandas as pd

Exporting a Table to CSV

The function below takes a select query, a file path for the exported file, and connection details. The best practice is to turn autocommit on; some ODBC drivers will raise an error if this parameter is not set.

def table_to_csv(sql, file_path, dsn, uid, pwd):
    '''
    This function creates csv file from the query result with ODBC driver
    '''

    try:
        cnxn = pyodbc.connect('DSN={};UID={};PWD={}'
                              .format(dsn, uid, pwd), autocommit=True)
        print('Connected to {}'.format(dsn))
        # Get data into pandas dataframe
        df = pd.read_sql(sql, cnxn)
        # Write to csv file
        df.to_csv(file_path, encoding='utf-8', header=True,
                  doublequote=True, sep=',', index=False)
        print("CSV File has been created")
        cnxn.close()

    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)
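For tables that do not fit comfortably in memory, pandas can stream the query result in chunks instead of materializing the whole dataframe at once. The sketch below shows the idea with a hypothetical table_to_csv_chunked helper; an in-memory SQLite database stands in for the ODBC connection so the example is self-contained:

```python
import csv
import os
import sqlite3
import tempfile

import pandas as pd

def table_to_csv_chunked(sql, file_path, cnxn, chunksize=50000):
    """Stream a query result to CSV in chunks so a large table
    never has to fit in memory all at once."""
    first = True
    for chunk in pd.read_sql(sql, cnxn, chunksize=chunksize):
        # Write the header only on the first chunk, then append
        chunk.to_csv(file_path, mode='w' if first else 'a',
                     header=first, index=False, encoding='utf-8')
        first = False

# Demo: an in-memory SQLite database stands in for the ODBC source
cnxn = sqlite3.connect(':memory:')
cnxn.executescript(
    "CREATE TABLE city (id INTEGER, name TEXT);"
    "INSERT INTO city VALUES (1, 'Kabul'), (2, 'Qandahar'), (3, 'Herat');"
)
path = os.path.join(tempfile.gettempdir(), 'city_demo.csv')
table_to_csv_chunked('SELECT * FROM city', path, cnxn, chunksize=2)
with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```

The same helper works with a pyodbc connection in place of the sqlite3 one; only the chunked write pattern matters here.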

The execution example below exports the city table from MySQL to a CSV file. ODBC is set up with MySQL_ODBC as the DSN.

sql = 'Select * From world.city'
file_path = '/tmp/city.csv'
dsn = 'dsn name such as MySQL_ODBC'
uid = 'username'
pwd = 'password'
table_to_csv(sql, file_path, dsn, uid, pwd)

Loading a Table from CSV

The function below takes a CSV load query and connection details and imports the CSV file into a table. Autocommit should be turned on. The local_infile parameter enables MySQL’s LOAD DATA LOCAL INFILE command. It may not be relevant for other databases; if the parameter does not apply to the database you are connecting to, the driver ignores it, so you can keep local_infile in place for other databases.

def load_csv(load_sql, dsn, uid, pwd):
    '''
    This function will load a table from a csv file according to
    the load SQL statement through ODBC
    '''

    try:
        cnxn = pyodbc.connect('DSN={};UID={};PWD={}'
                              .format(dsn, uid, pwd),
                              autocommit=True, local_infile=1)
        print('Connected to {}'.format(dsn))
        cursor = cnxn.cursor()
        # Execute SQL Load Statement
        cursor.execute(load_sql)
        print('Loading table completed successfully.')
        cnxn.close()
       
    except Exception as e:
        print("Error: {}".format(str(e)))
        sys.exit(1)
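If the target database offers no bulk-load command over ODBC, a portable fallback is batched inserts with executemany(). This is slower than LOAD DATA or COPY but works with any DBAPI connection. Below is a minimal sketch; the load_csv_executemany helper is hypothetical, and SQLite stands in for the ODBC target (pyodbc uses the same ? placeholder style):

```python
import csv
import os
import sqlite3
import tempfile

def load_csv_executemany(csv_path, insert_sql, cnxn, batch_size=1000):
    """Fallback loader: read the CSV and insert rows in batches via
    executemany(). Slower than LOAD DATA / COPY, but portable."""
    cursor = cnxn.cursor()
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                cursor.executemany(insert_sql, batch)
                batch = []
        if batch:  # flush the final partial batch
            cursor.executemany(insert_sql, batch)
    cnxn.commit()

# Demo: SQLite stands in for the ODBC target database
path = os.path.join(tempfile.gettempdir(), 'city_fallback.csv')
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows([['id', 'name'], [1, 'Tokyo'], [2, 'Osaka']])
cnxn = sqlite3.connect(':memory:')
cnxn.execute('CREATE TABLE city (id INTEGER, name TEXT)')
load_csv_executemany(path, 'INSERT INTO city VALUES (?, ?)', cnxn)
count = cnxn.execute('SELECT COUNT(*) FROM city').fetchone()[0]
print(count)
```

Batching keeps memory use flat on large files while still amortizing the per-statement round-trip cost.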

Let’s load the data exported with the first function into both MySQL and PostgreSQL databases. Each database has its own SQL syntax for this, and you need to pass the statement to the function. MySQL uses the LOAD DATA LOCAL INFILE command, while PostgreSQL uses the COPY command. Note that PostgreSQL’s COPY reads the file on the database server; to load a file that lives on the client machine, use psql’s \copy or COPY FROM STDIN instead.

# (1) Execution Example with MySQL ODBC
load_sql = "LOAD DATA LOCAL INFILE '/tmp/city.csv' INTO TABLE usermanaged.city \
FIELDS TERMINATED BY ',' ENCLOSED BY '\"' IGNORE 1 LINES;"

dsn = 'MySQL_ODBC'
uid = 'username'
pwd = 'password'
load_csv(load_sql, dsn, uid, pwd)

# (2) Execution Example with PostgreSQL ODBC
load_sql = "COPY usermanaged.city FROM '/tmp/city.csv' CSV HEADER"
dsn = 'PG_ODBC'
uid = 'username'
pwd = 'password'
load_csv(load_sql, dsn, uid, pwd)