How To Launch EC2 With Bootstrap in AWS

When you launch an AWS resource (EC2 or RDS), you will have an instance with a default configuration. Then, you have to install software and do custom configuration to bring it to a certain state. Bootstrapping enables you to script software installation and configuration and execute it while launching the instance. Being able to bootstrap EC2 is the first step to become the practitioner of the Infrastructure-as-Code philosophy in AWS.

This entry is a bonus step from How To Create Your Personal Data Science Computing Environment In AWS. In the post, we have a manual step to launch an EC2 instance and install and configure Anaconda environment. In AWS, you can do everything with a piece of code!

In the post above, we launched an EC2 instance and installed Anaconda3. Then we installed the psycopg2 package for the Postgres DB connection. Let’s automate this process.

There are two ways of bootstrapping. The first way is to add the script to the user data field in advanced details in Step3: Configure Instance Details when you launch it manually from AWS Management Console. The second way is to use AWS CLI to code everything up without going into the console.

Let’s get it started! Any additional information, you can refer to the AWS documentation here.

(1) Bootstrapping EC2 with User data

You can simply add the script when you launch the instance from the management console for advanced configuration details in step 3 as below.

For further information on launching EC2 instance, you can refer How To Launch an EC2 Instance From AMI in AWS.

Make sure to enable Auto-assign Public IP. If you disable it, you cannot download anything and yum update won’t work.

Here is the script. AWS recommends to run yum update. I am using curl here. You can also use wget. They are basically the same. The script will be run as the root user. Therefore, you should omit sudo. The scrip below is for Amazon Linux. Other linux may use different code. I tested on Centos7 and it runs fine. Not sure with Ubuntu or other ones.

You can use -b for silent installation of Anaconda (meaning you do not need to type yes for license agreement) and -p for specifying the installation directory.

#!/bin/bash
yum update -y
mkdir /anaconda
cd /anaconda
curl -O https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
bash Anaconda3-5.0.1-Linux-x86_64.sh -b -p /anaconda/anaconda3
/anaconda/anaconda3/bin/pip install --upgrade pip
/anaconda/anaconda3/bin/pip install psycopg2

(2) Bootstrapping EC2 with AWS CLI

If you are an AWS engineer, you cannot live without AWS Command Line Interface (CLI). It is a command line tool which enables you to do many things you would do in the management console with terminal commands. For installation and basic instruction, see here.

Once you install it, you first need to configure it. It is a relatively easy step. The reference is here.

Let’s code up the launch and configuration of EC2 with AWS CLI.

In this section, we will use 3 commands for EC2, run-instances (for launching an EC2), associate-address (for associating an Elastic IP to the launched instance) and terminate-instance (for terminating the instance).

run-instances

You need to save the launch script as a text file. In the example below, you need to run the launch command from where you save the text file (e.g. cd /scripts/). The file:// prefix is necessary.

Another key point is to make sure you have the associate-public-ip-address option. Without it, nothing gets installed. Rest are perhaps self-explanatory. Just need to get the right security group, subnet, instance type, keys and so on.

Once the instance is up and running, you can ssh to the instance to see if the anaconda is installed as specified. So cool, isn’t it?

aws ec2 run-instances --image-id ami-942dd1f6 --count 1 \
--instance-type t2.micro --key-name mydatahack-ec2 \
--iam-instance-profile Name=ec2-admin --subnet-id <subnet id> \
--security-group-ids <sg id> --associate-public-ip-address \
--user-data file://ec2_anaconda_install.txt

associate-address

Once your instance is ready, you can associate an Elastic IP address. Note that you cannot associate it unless you have a running instance.

aws ec2 associate-address --instance-id <your ec2> --allocation-id <your Elastic IP>

terminate-instances

Yeah, you can terminate the instance with a line of command!

aws ec2 terminate-instances --instance-ids <your ec2>

OK, you can launch a bootstrapped EC2 instance with a line of code now. The same thing can be done with RDS. Have a look at the post here: How To Launch Postgres RDS With AWS Command Line Interface (CLI).

Good Times!

Git
How to specify which Node version to use in Github Actions

When you want to specify which Node version to use in your Github Actions, you can use actions/setup-node@v2. The alternative way is to use a node container. When you try to use a publicly available node container like runs-on: node:alpine-xx, the pipeline gets stuck in a queue. runs-on is not …

AWS
Using semantic-release with AWS CodePipeline and CodeBuild

Here is the usual pattern of getting the source from a git repository in AWS CodePipeline. In the pipeline, we use AWS CodeStart to connect to a repo and get the source. Then, we pass it to the other stages, like deploy or publish. For some unknown reasons, CodePipeline downloads …

DBA
mysqldump Error: Unknown table ‘COLUMN_STATISTICS’ in information_schema (1109)

mysqldump 8 enabled a new flag called columm-statistics by default. When you have MySQL client above 8 and try to run mysqldump on older MySQL versions, you will get the error below. mysqldump: Couldn’t execute ‘SELECT COLUMN_NAME, JSON_EXTRACT(HISTOGRAM ‘$”number-of-buckets-specified”‘) FROM information_schema.COLUMN_STATISTICS WHERE SCHEMA_NAME = ‘myschema’ AND TABLE_NAME = ‘craue_config_setting’;’: Unknown …