Containerization
Containerization is a software deployment process that bundles an application’s code with all the files and libraries it needs to run on any infrastructure. Traditionally, to run an application on your computer, you had to install the version that matched your machine’s operating system. For example, you needed to install the Windows version of a software package on a Windows machine. With containerization, however, you can create a single software package, or container, that runs on all types of devices and operating systems. (Source: AWS)
Docker
Docker is one technology (not the only one) that enables you to package and run your applications in units called containers.
Image vs Container
The key difference between a Docker image and a container is that a Docker image is a read-only, immutable template that defines how a container will be created. A Docker container is a runtime instance of a Docker image that gets created when the docker run command is executed.
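A loose analogy (not part of Docker itself, just a teaching aid): an image is like a class, and a container is like an instance created from it. A minimal Python sketch, with hypothetical Image and Container classes:

```python
class Image:
    """Loose analogy only: an image is a read-only template."""
    def __init__(self, name, cmd):
        self.name = name
        self.cmd = cmd

    def run(self):
        # like `docker run`: creates a new container from the image
        return Container(self)

class Container:
    """A container is a runtime instance created from an image."""
    def __init__(self, image):
        self.image = image
        self.running = True

ubuntu = Image("ubuntu", "bash")
c1 = ubuntu.run()
c2 = ubuntu.run()
print(c1 is c2)
# → False  (each run creates an independent container)
```

Just as here, running the same image twice gives you two independent containers; the image itself never changes.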
For instance, let's run a container based on an official Ubuntu image:
docker run -it ubuntu bash
The above command pulls the Ubuntu image (if it's not on your system already) and runs the bash command inside it. Now you have a container from the image to play around with. The -it flags give you an interactive shell so you can type your commands in: -i for interactive and -t for terminal (a pseudo-TTY).
Note that you can use public images from the community (anyone can build one, and we're going to do the same later in the course). To see the available public images, head to Docker Hub.
Exercise: Run a container with an interactive shell and create a file. Then, create another container based on the same image and see if the file exists. Spoiler alert: it won't!
Creating a new image from a container
Although images are immutable, it is possible to make changes to a running container and then create a new image out of it with the commit command. The process is the opposite of run: with run, you turn an image into a container; with commit, you turn a container into an image.
docker commit <container-id> <repo>:<tag>
Docker main process
A container’s main running process is defined by the ENTRYPOINT and/or CMD at the end of the Dockerfile. It is generally recommended that you separate areas of concern by using one service per container. That service may fork into multiple processes (for example, the Apache web server starts multiple worker processes). When this main process exits, the container exits as well.
For some images, such as Ubuntu, we can replace the main process by providing a command at the end of the docker run command.
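As a minimal sketch (a hypothetical image, not one from the course) of how ENTRYPOINT and CMD interact: the CMD values are appended to the ENTRYPOINT as default arguments, and whatever you put after the image name in docker run replaces CMD:

```dockerfile
FROM python:alpine
# the fixed part of the main process
ENTRYPOINT [ "python" ]
# default arguments; anything passed after the image name replaces these
CMD [ "main.py" ]
```

With this file, docker run <image> runs python main.py, while docker run <image> other.py runs python other.py instead.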
Running containers in the background
By default, docker run starts the container in the foreground, which means your shell stays attached to the container's output. To start a container in the background and get your shell back, use the -d flag (short for detached).
docker run --rm -d ubuntu echo "hi"
Checking the logs of a container
As long as the container is still around (you didn't start it with the --rm flag and you didn't remove it afterwards), you can check its logs even if the container has stopped. You can use the docker logs command for that.
docker logs <container-id/container-name>
Killing and removing a container
You can kill a running container using docker kill command and then remove a stopped container using the docker rm command. Note that if you remove a stopped container, you won't have access to its logs anymore.
docker kill <container-id/container-name>
docker rm <container-id/container-name>
Checking resources used by a container
You can see the amount of CPU and memory that a container is using while running with the docker stats command. This command is useful for understanding the requirements of your application before shipping it to production.
docker stats <container-id/container-name>
Package your code with all the dependencies
As discussed earlier in the class, the docker image of your application must include your application code plus all its dependencies. These dependencies usually fall into two groups: configuration files and the libraries necessary for your code to run.
When dockerizing Go code, you don't need to worry about your dependencies, as the Go build process ensures that all dependencies are included in the binary. However, as we saw before when packaging Python Lambda code, if your application is in Python, you need to take an extra step to make sure your dependencies are also included in the image.
This step is in fact no different from packaging your Python code for a Lambda function, as we had to do the same there as well. One difference between including your dependencies in a docker image and in a Lambda function is the boto3 library, the AWS SDK for Python. For a Lambda function, this library is already included in the runtime, so you don't need to package it with the rest of your function (although you still could). For a docker image, however, you can only be sure of one thing: the host machine running your image has docker installed. No other assumption should be made about the host. Therefore, if you're using boto3, include it in your docker image, as it won't be present at runtime, even when you're running it on AWS.
Dockerizing a Python application with dependencies
As mentioned before, a good practice when writing Python code (whether it's for a microservice, a Lambda function, or a batch job) is to list all the external libraries (libraries not included with Python by default) in a file named requirements.txt in the root of your project (this is usually where your Dockerfile lives too).
Here's an example we saw before: a simple Python program that retrieves the public IP address of the machine it's running on.
main.py
import requests

def handler(event, context):
    response = get_url("http://checkip.amazonaws.com")
    # get_url returns None if the request failed
    if response is not None:
        print("My IP is:", response.text)

def get_url(url):
    try:
        response = requests.get(url)
    except requests.RequestException:
        response = None
    return response

handler(None, None)
The requests library is not included with Python by default. Hence, we list it in a requirements.txt file to be installed later by pip (the Python package manager).
requirements.txt
requests
And here's the Dockerfile we can use to build the image:
Dockerfile
FROM python:alpine
# adding a maintainer to the image
# this only helps with documentation
LABEL maintainer="mkf@mkf.com"
COPY . .
RUN pip install -r requirements.txt
ENTRYPOINT [ "python", "main.py" ]
Build the image with:
docker build -t get-ip .
Run a container from the image with:
docker run --rm --name ip get-ip
Passing environment variables to a container
As discussed in a previous lecture, application configurations should be passed using environment variables. You can pass environment variables to a container using the --env (or -e for short) flag.
Here's a simple Python program that reads from the environment variable NAME to greet the user. If the environment variable doesn't exist, it will use stranger instead.
main.py
import os

def main():
    # reading the NAME environment variable
    # if it doesn't exist, we will use "stranger" as default
    name = os.getenv("NAME", "stranger")
    print(f"Hello, {name}!")

if __name__ == "__main__":
    main()
And here's the Dockerfile for building the image:
Dockerfile
FROM python:alpine
LABEL maintainer="mkf@mkf.com"
COPY . .
ENV NAME=Batman
ENTRYPOINT [ "python", "main.py" ]
Build the image using:
docker build -t env-test .
The ENV directive in the Dockerfile declares an environment variable named NAME with a default value. If this environment variable is not passed when running the container, the default value (Batman) will be used. If it is passed, the new value replaces the default. Try out these scenarios:
- No environment variable is passed
docker run --rm env-test
- Pass an environment variable using the --env flag
docker run --rm --env NAME=Alice env-test
- Pass an environment variable using the shorthand -e flag
docker run --rm -e NAME=John env-test
Passing multiple environment variables at once
You can use the --env or -e flag multiple times to pass multiple environment variables at once when starting a container:
docker run --rm -e NAME=Alice -e JOB=Engineer env-test
Although this approach works, sometimes you want an easier way and a shorter command to run your containers. You can get one by putting all your environment variables in a file and passing that file to your container with the --env-file flag.
We're going to use the same Python code as above. For the Dockerfile, we're using a similar one, as shown below:
Dockerfile
FROM python:alpine
LABEL maintainer="mkf@mkf.com"
COPY . .
CMD [ "python", "main.py" ]
The Dockerfile is almost the same, except we're not defining any default values for environment variables, and we're using the CMD directive instead of ENTRYPOINT so that we can replace it when starting a container.
Now, we define all our environment variables inside a file named .env. You can name it anything you want, but it's a convention to prefix the name with a dot so the file is hidden on Unix systems. You should also add it to your .gitignore file if it contains sensitive information, to avoid leaking secrets to others.
.env
NAME=Alice
JOB=Engineer
COLLEGE=Bow Valley
This is a simple key/value file. Let's build the image using:
docker build -t env-file .
Now, we start a container from the image and pass the file as environment variables using the --env-file flag:
docker run --rm --env-file=.env env-file env
The last argument (env) replaces the default CMD in the image, causing the container to print all the environment variables set inside it. You should be able to find all three environment variables from the file there. Note that you will also see other environment variables set by the base image.
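Under the hood, --env-file reads simple KEY=VALUE lines. As a rough sketch of that behavior (a simplified illustration, not Docker's actual parser):

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """NAME=Alice
JOB=Engineer
COLLEGE=Bow Valley"""
print(parse_env_file(sample))
# → {'NAME': 'Alice', 'JOB': 'Engineer', 'COLLEGE': 'Bow Valley'}
```

Each parsed key/value pair ends up in the container's environment, exactly as if you had passed it with a separate -e flag.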
Mounting a volume inside a container
We can mount a volume (such as a folder) into the container so the container can access its contents. This is a way for the container to access files on the host machine when it starts.
Let's write a Python program that reads the contents of a file in the volume directory. You can name the directory anything you want, but make sure it exists inside the image (we're going to create it during the build process).
main.py
def main():
    with open("volume/test", "r") as f:
        print(f.read())

if __name__ == "__main__":
    main()
And here's the Dockerfile:
Dockerfile
FROM python:alpine
LABEL maintainer="mkf@mkf.com"
# setting the working directory to /app
# this command creates the directory if it does not exist
WORKDIR /app
# creating a directory named volume so
# we can later mount a folder from the
# host machine into it
RUN mkdir volume
COPY . .
CMD [ "python", "main.py" ]
Build the image with:
docker build -t volume-test .
Let's create a directory on the host machine and put a file in it (the names of the folder and file could be anything):
mkdir $HOME/docker-vo
echo "sample text" > $HOME/docker-vo/test
Now, we start a container and mount the folder on the host to a folder on the container with the help of the -v flag:
docker run --rm -v $HOME/docker-vo:/app/volume volume-test
The application in the container should be able to read the file from the host machine and output:
sample text
Passing AWS credentials to a container
Although you can mount the .aws directory on your host into the container, that is not a best practice. For one thing, if you have set up profiles, you would need to specify the profile name in your application code running inside the container. As we discussed in class before, this is not a good practice because your code would rely on a profile that won't exist in the cloud (we use a different mechanism to attach policies to containers in the cloud, and no profile exists there). Since credentials are a form of configuration, your application can read them via environment variables. This way, you won't need to change your code to run in different environments. This is a best practice.
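As a rough, stdlib-only sketch of how an SDK such as boto3 picks up credentials from the environment (the variable names are the standard AWS ones; the resolution logic here is heavily simplified, and the credential values are made up for illustration):

```python
import os

def credentials_from_env():
    """Simplified sketch: read AWS credentials from environment variables."""
    access_key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        return None  # real SDKs fall through to other credential sources
    return {
        "access_key": access_key,
        "secret_key": secret_key,
        # temporary credentials (e.g. from aws-vault) also carry a session token
        "token": os.environ.get("AWS_SESSION_TOKEN"),
    }

os.environ["AWS_ACCESS_KEY_ID"] = "AKIAEXAMPLE"
os.environ["AWS_SECRET_ACCESS_KEY"] = "secretexample"
print(credentials_from_env()["access_key"])
# → AKIAEXAMPLE
```

This is why passing the variables with -e is enough: the code inside the container doesn't change at all between your laptop and the cloud.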
You can use aws-vault to pass the credentials as environment variables to a container. Let's write a Python application that lists all S3 buckets in an account:
main.py
import boto3

# Retrieve the list of existing buckets
s3 = boto3.client('s3')
response = s3.list_buckets()

# Output the bucket names
print('Existing buckets:')
for bucket in response['Buckets']:
    print(f'  {bucket["Name"]}')
Don't forget to add the boto3 library to the requirements.txt file:
requirements.txt
boto3
Now the Dockerfile:
Dockerfile
FROM python:alpine
LABEL maintainer="mkf@mkf.com"
COPY . .
RUN pip install -r requirements.txt
CMD [ "python", "main.py" ]
Build the image with:
docker build -t aws-test .
We then use aws-vault to pass the necessary credentials using environment variables. One thing to note here is that aws-vault uses temporary credentials. Therefore, we need to pass two more environment variables (besides AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID): AWS_SESSION_TOKEN and AWS_SECURITY_TOKEN.
Let's run a test before running the application to see if the environment variables are being passed correctly. Note that if you're reading an environment variable from the host and passing it to the container under the same name, you don't need to specify both key and value; you can just use -e ENV_NAME when starting the container.
aws-vault exec <YOUR-AWS-VAULT-PROFILE> -- docker run -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN -e AWS_SECURITY_TOKEN --rm aws-test env | grep AWS
This command should show you all the environment variables getting passed from aws-vault. If you see them listed, you're good to run the container:
aws-vault exec <YOUR-AWS-VAULT-PROFILE> -- docker run -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN -e AWS_SECURITY_TOKEN --rm aws-test
You should be able to see the list of S3 buckets you have in your account.
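The pass-through form of -e used above (a bare variable name with no value) can be sketched in Python. This is a simplified illustration of the behavior, not Docker's actual implementation, and the sample values are made up:

```python
def resolve_env_flags(flags, host_env):
    """Simplified sketch of docker's -e handling:
    'KEY=VALUE' sets the value explicitly;
    a bare 'KEY' copies the value from the host environment,
    and is silently skipped if the host doesn't have it."""
    container_env = {}
    for flag in flags:
        if "=" in flag:
            key, _, value = flag.partition("=")
            container_env[key] = value
        elif flag in host_env:
            container_env[flag] = host_env[flag]
    return container_env

host = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "HOME": "/home/me"}
print(resolve_env_flags(["AWS_ACCESS_KEY_ID", "NAME=Alice"], host))
# → {'AWS_ACCESS_KEY_ID': 'AKIAEXAMPLE', 'NAME': 'Alice'}
```

This is what makes the aws-vault pattern work: aws-vault populates the host environment, and the bare -e flags forward those values into the container.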
Docker CLI Cheat Sheet
You can find the official cheat sheet here.
More commands:
- docker build --platform <platform> -t <tag> . : builds a docker image for a given platform (architecture). Read more here.
- docker system prune : gets rid of unused images/containers.
- docker run --cpus="1" --memory="1g" <image-name> : limits the resources (CPU and memory) that a container can use.
Dockerfile Cheat Sheet
Here's one good resource on Dockerfile directives.
Example Website Using Docker
Here's a sample website with a Dockerfile included. See the instructions on how to build and run it.
AWS Dockerrun JSON file
AWS Beanstalk requires a file named Dockerrun.aws.json to deploy an application from Docker Hub. Here's what the file should contain (minimum):
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "IMAGE_ADDRESS",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "PORT"
    }
  ]
}
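If you generate this file from a script, a small Python helper can build the same structure. The image address and port below are hypothetical placeholders, not values from the course:

```python
import json

def dockerrun_v1(image_address, port):
    """Build a minimal Dockerrun.aws.json (version 1) document."""
    return {
        "AWSEBDockerrunVersion": "1",
        "Image": {"Name": image_address, "Update": "true"},
        # Beanstalk expects the port as a string in this file
        "Ports": [{"ContainerPort": str(port)}],
    }

doc = dockerrun_v1("mkf/get-ip:latest", 8080)
print(json.dumps(doc, indent=2))
```

Writing the result of json.dumps to Dockerrun.aws.json gives you the same file shown above, with the placeholders filled in.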