How To Ingest AES Encrypted Data With Python

To ingest encrypted data into DWH, we may ingest the data as it is or decrypt and load it to the database, depending on the business requirements. It is always good to know how to decrypt encrypted data. There are many encryption methods. Encryption usually happens at the application (either client or server) and encrypted data get passed to the database. It is up to your business requirements to ingest the encrypted data into DWH  as it is or decrypt it before loading. There are also may ways to decrypt data in data ingestion process. SQL has its own decryption functions. You can decrypt it with ETL tools or programming language.

We will use Python here as it offers a pretty good crypto solutions which enable you to perform crypt tasks without knowing too much about how it works. To narrow out scope, we will focus on most widely used encryption algorithm, AES (Advanced Encryption Standard) encryption.

AES is secure enough (apparently US uses it for securing sensitive but unclassified material) and I have seen it is used quite often to protect customer personal information in ecommerce applications. For example, the website captures personal information on the form and the application encrypts it with AES algorithm before passing it the database. In this way, the customer’s personal information is stored in the encrypted format for privacy protection. To decrypt the data, you need the key.

AES encryption is a type of symetric-key algorithms. Encryption happens by using a key and an initialisation vector. AES is a block cipher that encrypts data on a per-block basis and uses keys that are 16, 24 or 32 bytes (128, 192, or 256 bits) and the initialization vector lengths of 16 bytes (128 bits) which can be randomly generated at encryption. You can check out the online tool to encrypt and decrypt with different methods here. If you are interested in in-depth mechanism, this blog article is a good read.

AES encryption is often used for encrypting files or content in the storage device as well as for personal information for ecommerce applications. For example AWS uses 256-bit AES for the S3 server-side encryption, default server-side Glacier encryption and EBS volume encryption.

The encrypted value for a personal data such as email is a string of random characters, which look like 0GSnnvEk16lFmuSVfmTRjgVQfm70pH0Cgda3Abu+LmiWmnwa88Hci2NPRl51rwHe. The first 16 letters are the initialisation vector, called IV. By using IV and the encryption key, we can decrypt it to the actual email like datahack@datahack.com.

We will use the Crypto package which is a great cryptography toolkit for Python. For AES, we will use Crypto.Cipher.AES. It also has a Random module to create an initialisation vector (IV) randomly for encryption.

The code examples below work for both Python 2.7 and 3.

Scenario

You have an encryption key and an encrypted customer email address below. The email was encrypted by 128-bit AES (AES-128). Let’s decrypt this.

key = 'LyfZTeBjLOmjTsoq'
string = 't-QGGHNPkGR2vSXR1J7fYIfUQE7L5RBbmNkUHi9Nh8e2EV1JSe4GNbjypaX8_o92'

AES128 uses first 16 characters as the initialisation vector. In the cipher function, you can use the key and the first 16 character to decrypt the rest of the string. The unpad() function will remove the white space if there are any. Urlsafe_b64decode usually works better than just using b64decode.

Here is the function. The decrypted email address is john.west@mydatahack.com. It can be used to decrypt any value in the column when you ingest the data.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
from Crypto import Random
from Crypto.Cipher import AES
import base64

def decrypt(key, string, block_size):
    unpad = lambda s : s[:-ord(s[len(s)-1:])]
    string = base64.urlsafe_b64decode(string)
    iv = string[:block_size]
    cipher = AES.new(key, AES.MODE_CBC, iv )
    return unpad(cipher.decrypt(string[block_size:]))

key = 'LyfZTeBjLOmjTsoq'
string = 't-QGGHNPkGR2vSXR1J7fYIfUQE7L5RBbmNkUHi9Nh8e2EV1JSe4GNbjypaX8_o92'

print(decrypt(key, string, 16))

 

Now, let’s take a look at encryption. We have the email address, john.west@mydatahack.com and encryption key, LyfZTeBjLOmjTsoq. We can use the below function to encrypt it with AES-128.

The most important part is to make the string multiple of the block size (depending on the bits of AES). The block size for AES-128 is 16. The pad() function just does that.

The initialisation vector is generated randomly. Hence, you will get a different encrypted string each time you run the function. As the first 16 letters are used for decryption, you will always get the same original value when you decrypt it.

1
2
3
4
5
6
7
8
9
10
11
12
def encrypt(key, string, block_size):
    # makes string multiple of 16
    pad = lambda s: s + (block_size - len(s) % block_size) * chr(block_size - len(s) % block_size)
    string = pad(string)
    iv = Random.new().read(AES.block_size)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return base64.urlsafe_b64encode(iv + cipher.encrypt(string))

key = 'LyfZTeBjLOmjTsoq'
string = 'john.west@mydatahack.com'

print(encrypt(key, string, 16))

 

On a final note, encryption key can be generated randomly by using a tool like this one. Depending on the bits, you can generate different lengths of keys. 16, 24 or 32 characters for AES-128, AES-192, or AES-256, respectively.

Awesome!

Data Engineering
Sending XML Payload and Converting XML Response to JSON with Python

If you need to interact with a REST endpoint that takes a XML string as a payload and returns another XML string as a response, this is the quick guide if you want to use Python. If you want to do it with Node.js, you can check out the post …

Data Engineering
Sending XML Payload and Converting XML Response to JSON with Node.js

Here is the quick Node.js example of interacting with a rest API endpoint that takes XML string as a payload and return with XML string as response. Once we get the response, we will convert it to a JSON object. For this example, we will use the old-school QAS (Quick …

Data Engineering
Downloading All Public GitHub Gist Files

I used to use plug-ins to render code blocks for this blog. Yesterday, I decided to move all the code into GitHub Gist and inject them from there. Using a WordPress plugin to render code blocks can be problematic when update happens. Plugins might not be up to date. It …