Containerization

Containerization is a software deployment process that bundles an application’s code with all the files and libraries it needs to run on any infrastructure. Traditionally, to run any application on your computer, you had to install the version that matched your machine’s operating system. For example, you needed to install the Windows version of a software package on a Windows machine. However, with containerization, you can create a single software package, or container, that runs on all types of devices and operating systems. (Source: AWS)

Docker

Docker is a technology (not the only one) that enables you to package and run your applications in entities called containers.

Image vs Container

The key difference between a Docker image and a container is that a Docker image is a read-only, immutable template that defines how a container will be created. A Docker container is a runtime instance of a Docker image that gets created when the docker run command is executed.

For instance, let's run a container based on an official Ubuntu image:

docker run -it ubuntu bash

The above command will pull the Ubuntu image (if it's not already available on your system) and run the bash command inside it. Now you have a container from the image to play around with. The -it flag gives you an interactive shell to type your commands in: -i for interactive and -t for a (pseudo-)terminal.

Note that you can use public images from the community (anyone can build one, and we're going to do the same later in the course). To see the available public images, head to Docker Hub.

Exercise: Run a container with an interactive shell and create a file. Then, create another container based on the same image and see if the file exists. Spoiler alert: it won't!

Creating a new image from a container

Although images are immutable, it is possible to make changes to a running container and then create a new image out of it with the commit command. The process is the opposite of run: with run, you turn an image into a container; with commit, you turn a container into an image.

docker commit <container-id> <repo>:<tag>

Docker main process

A container’s main running process is the ENTRYPOINT and/or CMD at the end of the Dockerfile. It is generally recommended that you separate areas of concern by using one service per container. That service may fork into multiple processes (for example, the Apache web server starts multiple worker processes). When this process exits, the container exits as well.

For some images, such as Ubuntu, we can replace the main process by providing a command at the end of the docker run command.
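To make the ENTRYPOINT/CMD distinction concrete, here's a hypothetical Dockerfile sketch (not from the course material): ENTRYPOINT fixes the executable of the main process, while CMD supplies default arguments that the docker run command line can override.

```dockerfile
FROM ubuntu

# ENTRYPOINT fixes the container's main process (echo)
ENTRYPOINT [ "echo" ]

# CMD supplies default arguments to the ENTRYPOINT;
# any arguments given to `docker run` replace them
CMD [ "hello from the container" ]
```

With this image, docker run <image> would print the default message, while docker run <image> bye would replace the CMD arguments and print bye instead.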

Running containers in the background

By default, docker run starts the container in the foreground, which means you will lose your current shell. To start a container in the background and get your shell back, use the -d flag (short for detached).

docker run --rm -d ubuntu echo "hi"

Checking the logs of a container

As long as the container is still around (you didn't start it with the --rm flag and you haven't removed it afterwards), you can check its logs even if it has stopped. You can use the docker logs command for that.

docker logs <container-id/container-name>

Killing and removing a container

You can kill a running container using the docker kill command and then remove a stopped container using the docker rm command. Note that once you remove a stopped container, you won't have access to its logs anymore.

docker kill <container-id/container-name>
docker rm <container-id/container-name>

Checking resources used by a container

You can see how much CPU and memory a container is using while running with the docker stats command. It's useful to run this command to understand the requirements of your application before shipping it to production.

docker stats <container-id/container-name>

Package your code with all the dependencies

As discussed earlier in the class, the docker image of your application must include your application code plus all its dependencies. These dependencies usually fall under configuration files and libraries necessary for your code to run.

When dockerizing your Go code, you don't need to worry about your dependencies, as the Go build process makes sure that all the dependencies are included in the compiled binary. However, as we've seen before when packaging your Python Lambda code, if your application is in Python, you need to take an extra step to make sure your dependencies are also included in the image.

This step is in fact no different from packaging your Python code for a Lambda function, as we had to do the same there as well. One difference between including your dependencies for a docker image and a Lambda function is the boto3 library, the AWS SDK for Python. For a Lambda function, this library is already included in the runtime, so you won't need to package it with the rest of your function (although you still could). For a docker image, however, you can only be sure of one thing: that the host machine running your image has docker installed. No other assumption should be made about the host. Therefore, if you're using boto3, include it in your docker image, as it won't be present at runtime, even when you're running it on AWS.

Dockerizing a Python application with dependencies

As mentioned before, a good practice when writing Python code (whether it's for a microservice, a Lambda function, or a batch job) is to include all the external libraries (libraries not included with Python by default) in a file named requirements.txt in the root of your project (this is usually where your Dockerfile will live too).

Here's an example we saw before: a simple Python program retrieving the public IP address of the machine it's running on.

main.py

import requests

def handler(event, context):
    response = get_url("http://checkip.amazonaws.com")
    if response is not None:
        print("My IP is:", response.text)


def get_url(url):
    try:
        response = requests.get(url)
    except requests.RequestException:
        # network errors (timeouts, DNS failures, etc.)
        response = None

    return response

handler(None, None)

The requests library is not included with Python by default. Hence, we list it in a requirements.txt file to be installed later by pip (the Python package manager).

requirements.txt

requests

And here's the Dockerfile we can use to build the image:

Dockerfile

FROM python:alpine

# adding a maintainer to the image
# this only helps with documentation
LABEL maintainer="mkf@mkf.com"

COPY . .

RUN pip install -r requirements.txt

ENTRYPOINT [ "python", "main.py" ]

Build the image with:

docker build -t get-ip .

Run a container from the image with:

docker run --rm --name ip get-ip

Passing environment variables to a container

As discussed in a previous lecture, application configurations should be passed using environment variables. You can pass environment variables to a container using the --env (or -e for short) flag.

Here's a simple Python program that reads from the environment variable NAME to greet the user. If the environment variable doesn't exist, it will use stranger instead.

main.py

import os

def main():
    # reading the NAME environment variable
    # if it doesn't exist, we will use "stranger" as default
    name = os.getenv("NAME", "stranger")
    print(f"Hello, {name}!")

if __name__ == "__main__":
    main()

And here's the Dockerfile for building the image:

Dockerfile

FROM python:alpine

LABEL maintainer="mkf@mkf.com"

COPY . .

ENV NAME=Batman

ENTRYPOINT [ "python", "main.py" ]

Build the image using:

docker build -t env-test .

The ENV directive in the file declares an environment variable named NAME. If this environment variable is not passed when running the container, its value (Batman) will be used by the container. If passed, however, the default value will be replaced by the new one. Try out these scenarios:

  1. No environment variable is passed:
docker run --rm env-test
  2. Pass an environment variable using the --env flag:
docker run --rm --env NAME=Alice env-test
  3. Pass an environment variable using the shorthand -e flag:
docker run --rm -e NAME=John env-test

Passing multiple environment variables at once

You can use the --env or -e flag multiple times to pass multiple environment variables at once when starting a container:

docker run --rm -e NAME=Alice -e JOB=Engineer env-test

Although this approach works, sometimes you need an easier way and a shorter command to run your containers. You can do so by putting all your environment variables in a file and passing the file to your container with the --env-file flag.

We're going to use the same Python code as above. For the Dockerfile, we are using a similar one, as shown below:

Dockerfile

FROM python:alpine

LABEL maintainer="mkf@mkf.com"

COPY . .

CMD [ "python", "main.py" ]

The Dockerfile is almost the same, except we're not defining any default values for environment variables, and we're using the CMD directive instead of ENTRYPOINT so that we can replace it when starting a container.

Now, we define all our environment variables inside a file named .env. You can name it anything you want, but it's a convention to prefix such files with a . (dot) so that they are hidden on Unix systems. You can also add the file to .gitignore if it contains sensitive information, in order to avoid leaking it to others.

.env

NAME=Alice
JOB=Engineer
COLLEGE=Bow Valley

This is a simple key/value file. Let's build the image using:

docker build -t env-file .

Now, we start a container from the image and pass the file as environment variables using the --env-file flag:

docker run --rm --env-file=.env env-file env

The last argument (env) replaces the default CMD in the image, causing the container to print all the environment variables defined inside it. You should be able to find all three environment variables from the .env file there. Note that you will also find other environment variables set by the base image.
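Conceptually, the --env-file format is just KEY=VALUE lines. As an illustrative sketch (not Docker's actual parser), here's how such a file could be read in Python:

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    An illustrative approximation of what docker does with
    --env-file, not Docker's real implementation.
    """
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # only keep lines that actually contain an '='
            env[key] = value
    return env

sample = """\
NAME=Alice
JOB=Engineer
COLLEGE=Bow Valley
"""
print(parse_env_file(sample))
# prints: {'NAME': 'Alice', 'JOB': 'Engineer', 'COLLEGE': 'Bow Valley'}
```

Docker then injects each key/value pair into the container's environment, just as if you had passed each one with a separate -e flag.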

Mounting a volume inside a container

We can mount a volume (such as a folder) inside the container so the container can access its contents. This is a way for the container to access files on the host machine when it starts.

Let's write a Python program to read the contents of a file in the volume directory. You can name the directory anything you want, but make sure it exists inside the image (we're going to create it during the build process).

main.py

def main():
    with open("volume/test", "r") as f:
        print(f.read())

if __name__ == "__main__":
    main()

And here's the Dockerfile:

Dockerfile

FROM python:alpine

LABEL maintainer="mkf@mkf.com"

# setting the working directory to /app
# the command will create the directory if it doesn't exist
WORKDIR /app

# creating a directory named volume so
# we can later mount a folder from the
# host machine into it
RUN mkdir volume

COPY . .

CMD [ "python", "main.py" ]

Build the image with:

docker build -t volume-test .

Let's create a directory on the host machine and put a file in there (the name of the folder and file could be anything):

mkdir $HOME/docker-vo
echo "sample text" > $HOME/docker-vo/test

Now, we start a container and mount the folder on the host to a folder on the container with the help of the -v flag:

docker run --rm -v $HOME/docker-vo:/app/volume volume-test

The application in the container should be able to read the file from the host machine and print:

sample text

Passing AWS credentials to a container

Although you can mount the .aws directory on your host onto the container, that is not a best practice. For one thing, if you have set up profiles, you would need to specify the profile name in your application code running inside the container. As discussed in class before, this is not a good practice because your code would rely on a profile that won't exist in the cloud (we use a different way to attach policies to containers in the cloud, and no profile exists there). Since credentials are a form of configuration, your application can read them via environment variables. This way, you won't need to change your code to run in different environments. This is a best practice.
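Since boto3 picks up the standard credential environment variables automatically, your code doesn't need to name a profile at all. As an illustrative sketch (a hypothetical helper, not part of boto3), this is roughly the environment lookup involved:

```python
import os

def credentials_from_env():
    """Read the standard AWS credential environment variables.

    A simplified, illustrative sketch of the lookup boto3 performs;
    boto3's real resolution chain also checks config files,
    instance roles, and more.
    """
    creds = {
        "access_key": os.getenv("AWS_ACCESS_KEY_ID"),
        "secret_key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        # only set when using temporary credentials (e.g. aws-vault)
        "session_token": os.getenv("AWS_SESSION_TOKEN"),
    }
    if not creds["access_key"] or not creds["secret_key"]:
        raise RuntimeError("AWS credentials not found in the environment")
    return creds
```

Because the credentials come from the environment, the same code runs unchanged in any environment that provides these variables.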

You can use aws-vault to pass the credentials as environment variables to a container. Let's write a Python application that lists all S3 buckets in an account:

main.py

import boto3

# Retrieve the list of existing buckets
s3 = boto3.client('s3')
response = s3.list_buckets()

# Output the bucket names
print('Existing buckets:')
for bucket in response['Buckets']:
    print(f'  {bucket["Name"]}')

Don't forget to add the boto3 library to the requirements.txt file:

requirements.txt

boto3

Now the Dockerfile:

Dockerfile

FROM python:alpine

LABEL maintainer="mkf@mkf.com"

COPY . .

RUN pip install -r requirements.txt

CMD [ "python", "main.py" ]

Build the image with:

docker build -t aws-test .

We then use aws-vault to pass the necessary credentials using environment variables. One thing to note here is that aws-vault uses temporary credentials. Therefore, we need to pass two more environment variables (besides AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID): AWS_SESSION_TOKEN and AWS_SECURITY_TOKEN.

Let's run a test before running the application to see if the environment variables are being passed correctly. Note that if you're reading an environment variable from the host and passing it as an environment variable with the same name to the container, you don't need to specify both key and value. You can just use -e ENV_NAME when starting a container.

aws-vault exec <YOUR-AWS-VAULT-PROFILE> -- docker run -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN -e AWS_SECURITY_TOKEN --rm aws-test env | grep AWS

This command should show you all the environment variables getting passed from aws-vault. If you see them listed, you're good to run the container:

aws-vault exec <YOUR-AWS-VAULT-PROFILE> -- docker run -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN -e AWS_SECURITY_TOKEN --rm aws-test

You should be able to see the list of S3 buckets you have in your account.

Docker CLI Cheat Sheet

You can find the official cheat sheet here.

More commands:

  • docker build --platform <platform> -t <tag> .: build a docker image for a specific platform (architecture), e.g. linux/amd64. Read more here.
  • docker system prune: to get rid of unused images/containers.
  • docker run --cpus="1" --memory="1g" <image-name>: to limit the resources (cpu and memory) that a container can use.

Dockerfile Cheat Sheet

Here's one good resource on Dockerfile directives.

Example Website Using Docker

Here's a sample website with a Dockerfile included. See the instructions on how to build and run it.

AWS Dockerrun JSON file

AWS Elastic Beanstalk requires a file named Dockerrun.aws.json to deploy an application from Docker Hub. Here's what the file should contain (at minimum):

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "IMAGE_ADDRESS",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "PORT"
    }
  ]
}