How to Bulk Load Data into MySQL with Python

As in any other relational databases, the fastest way to load data into MySQL is to upload a flat file into a table. To do this, MySQL has a LOAD DATA INFILE function. We can use Python to execute this command. To connect to MySQL and execute SQL statements with Python, we will use the pymysql module.

As in the previous post with PostgresSQL, we will first export a table into a csv file and then look at how we can load a csv file to a table.

First of all, let’s import all the modules required for this exercise. The code here works for both Python 2.7 and 3.

1
2
3
import pymysql
import pandas as pd
import sys

Exporting MySQL table into a CSV file

First of all, let’s export a table into CSV file. We are using the Pandas module to convert SQL results into Pandas data frame and write it to a csv file.

The function takes a select query, output file path and connection details. To export an entire table, you can use select * on the target table.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
def mysql_to_csv(sql, file_path, host, user, password):
    '''
    The function creates a csv file from the result of SQL
    in MySQL database.
    '''

    try:
        con = pymysql.connect(host=host,
                                user=user,
                                password=password)
        print('Connected to DB: {}'.format(host))
        # Read table with pandas and write to csv
        df = pd.read_sql(sql, con)
        df.to_csv(file_path, encoding='utf-8', header = True,\
         doublequote = True, sep=',', index=False)
        print('File, {}, has been created successfully'.format(file_path))
        con.close()

    except Exception as e:
        print('Error: {}'.format(str(e)))
        sys.exit(1)

# Execution Example
sql = 'Select * From world.city'
file_path = '/tmp/city.csv'
host = 'host url'
user = 'username'
password = 'password'
mysql_to_csv(sql, file_path, host, user, password)

Importing CSV file to MySQL table

Make sure to have autocommit turned on. To upload files, you need to set the local_infile parameter to 1. The function takes a load data infile statement and connection details.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def csv_to_mysql(load_sql, host, user, password):
    '''
    This function load a csv file to MySQL table according to
    the load_sql statement.
    '''

    try:
        con = pymysql.connect(host=host,
                                user=user,
                                password=password,
                                autocommit=True,
                                local_infile=1)
        print('Connected to DB: {}'.format(host))
        # Create cursor and execute Load SQL
        cursor = con.cursor()
        cursor.execute(load_sql)
        print('Succuessfully loaded the table from csv.')
        con.close()
       
    except Exception as e:
        print('Error: {}'.format(str(e)))
        sys.exit(1)

# Execution Example
load_sql = "LOAD DATA LOCAL INFILE '/tmp/city.csv' INTO TABLE usermanaged.city\
 FIELDS TERMINATED BY ',' ENCLOSED BY '"
' IGNORE 1 LINES;"
host = '
host url'
user = '
username'
password = '
password'
csv_to_mysql(load_sql, host, user, password)
Data Engineering
Sending XML Payload and Converting XML Response to JSON with Python

If you need to interact with a REST endpoint that takes a XML string as a payload and returns another XML string as a response, this is the quick guide if you want to use Python. If you want to do it with Node.js, you can check out the post …

Data Engineering
Sending XML Payload and Converting XML Response to JSON with Node.js

Here is the quick Node.js example of interacting with a rest API endpoint that takes XML string as a payload and return with XML string as response. Once we get the response, we will convert it to a JSON object. For this example, we will use the old-school QAS (Quick …

Data Engineering
Downloading All Public GitHub Gist Files

I used to use plug-ins to render code blocks for this blog. Yesterday, I decided to move all the code into GitHub Gist and inject them from there. Using a WordPress plugin to render code blocks can be problematic when update happens. Plugins might not be up to date. It …