How to Bulk Load Data with ODBC and Python
- By : Mydatahack
- Category : Data Engineering, Data Ingestion
- Tags: Bulk Load, ODBC, pyodbc, Python
I think Hello World of Data Engineering to make an one-to-one copy of a table from the source to the target database by bulk-loading data. The fastest way to achieve this is exporting a table into a CSV file from the source database and importing a CSV file to a table in the target database. With any database, importing data from a flat file is faster than using insert or update statements.
To connect ODBC data source with Python, you first need to install the pyodbc module. Obviously, you need to install and configure ODBC for the database you are trying to connect.
Let’s load the required modules for this exercise. The code here works for both Python 2.7 and 3.
1 2 3 | import pyodbc import sys import pandas as pd |
Exporting table to CSV
The function below takes a select query, file path for exported file and connection details. The best practice is to turn on autocommit. Some ODBC will give you an error if this parameter is not there.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | def table_to_csv(sql, file_path, dsn, uid, pwd): ''' This function creates csv file from the query result with ODBC driver ''' try: cnxn = pyodbc.connect('DSN={};UID={};PWD={}'\ .format(dsn, uid, pwd), autocommit=True) print('Connected to {}'.format(dns)) # Get data into pandas dataframe df = pd.read_sql(sql, cnxn) # Write to csv file df.to_csv(file_path, encoding='utf-8', header = True,\ doublequote = True, sep=',', index=False) print("CSV File has been created") cnxn.close() except Exception as e: print("Error: {}".format(str(e))) sys.exit(1) |
The execution example is exporting the city table to a csv file from MySQL. ODBC is set up with MySQL_ODBC as DSN.
1 2 3 4 5 6 | sql = 'Select * From world.city' file_path = '/tmp/city.csv' dsn = 'dsn name such as MySQL_ODBC' uid = 'username' pwd = 'password' table_to_csv(sql, file_path, dsn, uid, pwd) |
Load Table from CSV
The function below takes a csv upload query and connection details to import CSV to a table. Autocommit should be turned on. The local_infile parameter helps MySQL’s LOAD DATA INFILE commands. It may not relevant for other databases. If the parameter is not relevant in the connection for the specific database, it will ignore it. So, you can keep the local_infile parameter for other databases.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | def load_csv(load_sql, dns, uid, pwd): ''' This function will load table from csv file according to the load SQL statement through ODBC ''' try: cnxn = pyodbc.connect('DSN={};UID={};PWD={}'\ .format(dns, uid, pwd), autocommit=True, local_infile=1) print('Connected to {}'.format(dns)) cursor = cnxn.cursor() # Execute SQL Load Statement cursor.execute(load_sql) print('Loading table completed successfully.') cnxn.close() except Exception as e: print("Error: {}".format(str(e))) sys.exit(1) |
Let’s load the data exported with the first function into both MySQL and PostgreSQL databases. Each database has SQL syntax for this and you need to pass the statement to the function. MySQL uses the LOAD DATA INFILE command while Postgres uses the copy command.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | # (1) Execution Example with MySQL ODBC load_sql = "LOAD DATA LOCAL INFILE '/tmp/city.csv' INTO TABLE usermanaged.city \ FIELDS TERMINATED BY ',' ENCLOSED BY '\"' IGNORE 1 LINES;" dsn = 'MySQL_ODBC' uid = 'username' pwd = 'password' load_csv(load_sql, dns, uid, pwd) # (2) Execution Example with PostgreSQL ODBC load_sql = "COPY usermanaged.city FROM '/tmp/city.csv' CSV HEADER" dns = 'PG_ODBC' uid = 'username' pwd = 'password' load_csv(load_sql, dns, uid, pwd) |