Containing the Chaos! | A Three-Part Series Demonstrating the Usefulness of Containerization to HumanGov

Background

HumanGov is a Model Automated Multi-Tenant Architecture (MAMA) meant to be used by the 50 states for personnel tracking. The current architecture runs on Amazon Elastic Compute Cloud (EC2) instances maintained by the HumanGov system administrators, who handle all of the patching, operations, and related work. Because each state wants its data kept separate from every other state's, each state gets its own set of EC2 instances. As the application scales from one state to 50, the system administrators will have to patch 50 times as many EC2 instances. They report that they are already scheduling more and more overtime to cover the maintenance windows for updating those instances. The lead developer remarked that new application deployments were taking longer and longer to roll out as the infrastructure grew and more states came on board.

On top of that, the development environment is not standardized. It could be as simple as AWS Cloud9 or as complex as a Linux VM nested inside a Windows host. The only consistent thing about the development environment is that it is not consistent with production. As a result, more and more tickets are being opened for application issues that the developers are unable to replicate.

A new hire asked why the application did not have high availability, considering that was one of the core deliverables and a prime motivation for the states to migrate to the cloud in the first place.

The discussion grows more and more tense, to the point where each group begins pointing fingers at the others to resolve the issues. Of course, all of these issues have bubbled up in the Friday afternoon meeting. You are sitting quietly in the corner, thankful that no one is blaming you and hoping the meeting will end soon, when the manager turns to you and asks for your input on this chaotic situation.

You first get the team to agree on the set of problems (and their root causes) with the current architecture:
Problem 1: A contractual requirement is not being met. (The current architecture does not have high availability.)
Problem 2: Developers cannot replicate issues that occur in production. (The production and development environments do not match.)
Problem 3: Application delivery is slow and system administrator overtime is increasing. (The overhead of managing EC2 instances, including deployment and patching, grows with the application.)

After some thought, you propose a solution for each problem:
Use Amazon Elastic Load Balancing (ELB) to put a load balancer in front of containers running in multiple Availability Zones, adding high availability to the architecture.
Containerize the application using Docker to address the environment mismatch. Production and development will use the same Docker images.
Use Amazon Elastic Container Registry (ECR) as the container registry, Amazon Elastic Container Service (ECS) as the managed container service, and AWS Fargate as serverless compute. Together they speed up application delivery and reduce system administrator overtime by removing the EC2 management overhead.

Note: The container solution did not have to be Docker, but in this case your company already had a license for it. This is a three-part series of articles in which you implement a proof of concept of your proposed solution to contain the chaos introduced in that Friday afternoon meeting:

Containing the Chaos Part 1 of 3: Docker | Amazon Elastic Container Registry (ECR)
In part 1, the application will be packaged into a container image. That image will then be stored in Amazon Elastic Container Registry (ECR).

Containing the Chaos Part 2 of 3: Amazon DynamoDB | Amazon Simple Storage Service (S3) | Amazon Elastic Container Service (ECS) | AWS Fargate | Terraform
In part 2, the Amazon DynamoDB table and Amazon S3 buckets will be created using Terraform. Then the Amazon Elastic Container Service (ECS) cluster will be set up to run on AWS Fargate.

Containing the Chaos Part 3 of 3: Amazon Elastic Container Service (ECS) | Amazon Elastic Load Balancing (ELB) | Terraform
In part 3 (the final part), the task definition will be created for the cluster, and a service will be created to run the defined tasks. The application will then be tested. Finally, the resources will be decommissioned.

Highlighting some of the aforementioned technologies:

Docker

Docker provides a common environment that can be used for both production and development. That environment is defined in a Dockerfile, which contains, in order, all the commands needed to build a given image. Images can be stored in and retrieved from Docker Hub, or from your own private repositories. There are alternatives to Docker, such as Podman or Vagrant, among several others, but they will not be discussed here.

Amazon Elastic Container Service (ECS)

Amazon Elastic Container Service (ECS) reduces management overhead by providing a managed service for deploying, managing, and scaling your containerized workloads.
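The following minimal Python sketch makes the phrase "deploying, managing, and scaling" more concrete. It assumes you would talk to ECS through the boto3 SDK. It only builds the keyword arguments for boto3's ECS `create_service` call, so it runs without AWS credentials. The cluster, service, subnet, security group, target group, and container names are hypothetical placeholders. The two-task minimum is this sketch's own rule, not an ECS requirement.

```python
"""Sketch: request parameters for an ECS service running on AWS Fargate.

Every name below (cluster, service, subnets, security group, target group
ARN, container) is a hypothetical placeholder, not a HumanGov value.
"""


def build_service_request(
    cluster: str,
    service_name: str,
    task_definition: str,
    subnets: list[str],
    security_groups: list[str],
    target_group_arn: str,
    container_name: str,
    container_port: int,
    desired_count: int = 2,
) -> dict:
    """Return keyword arguments shaped for boto3's ECS create_service call."""
    # This sketch's own rule: a single task cannot survive losing its zone.
    if desired_count < 2:
        raise ValueError("run at least two tasks for high availability")
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        # ECS keeps this many tasks running, replacing any that stop.
        "desiredCount": desired_count,
        # Fargate supplies the compute, so there are no EC2 hosts to patch.
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                # Subnets in different Availability Zones let ECS spread tasks.
                "subnets": subnets,
                "securityGroups": security_groups,
                # Assumes public subnets so tasks can reach ECR.
                "assignPublicIp": "ENABLED",
            }
        },
        # Registers each task with the load balancer's target group.
        "loadBalancers": [
            {
                "targetGroupArn": target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ],
    }


if __name__ == "__main__":
    request = build_service_request(
        cluster="humangov-cluster",
        service_name="humangov-service",
        task_definition="humangov-task",
        subnets=["subnet-aaaa1111", "subnet-bbbb2222"],
        security_groups=["sg-0123abcd"],
        target_group_arn=(
            "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
            "targetgroup/humangov-tg/0123456789abcdef"
        ),
        container_name="humangov-app",
        container_port=80,
    )
    print(request["launchType"], request["desiredCount"])
    # With credentials configured, the same dict could be sent with:
    # boto3.client("ecs").create_service(**request)
```

The design choice here is to keep the request as plain data. That way it can be reviewed or tested before anything touches AWS. In this series the equivalent settings are expressed in Terraform rather than boto3.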

Amazon Elastic Container Registry (ECR)

Amazon Elastic Container Registry (ECR) provides a managed place to store and host your container images.
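The Python helper below sketches what storing an image in ECR looks like. It builds the standard private ECR registry hostname (`<account-id>.dkr.ecr.<region>.amazonaws.com`). It also builds the AWS CLI and Docker commands that log in, tag a local image, and push it. The account ID, Region, repository, and image names are hypothetical. The commands assume the repository already exists in ECR.

```python
import re


def ecr_registry(account_id: str, region: str) -> str:
    """Hostname of the private ECR registry for an account and Region."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("an AWS account ID is 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_push_commands(local_image: str, account_id: str, region: str,
                      repository: str, tag: str = "latest") -> list[str]:
    """Shell commands that log in to ECR, tag a local image, and push it."""
    registry = ecr_registry(account_id, region)
    uri = f"{registry}/{repository}:{tag}"
    return [
        # Exchange AWS credentials for a temporary Docker login token.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Give the local image a name that points at the ECR repository.
        f"docker tag {local_image} {uri}",
        # Upload the image layers to ECR.
        f"docker push {uri}",
    ]


if __name__ == "__main__":
    for command in ecr_push_commands("humangov-app:v1", "123456789012",
                                     "us-east-1", "humangov-app", "v1"):
        print(command)
```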

AWS Fargate

AWS Fargate provides serverless compute for containers. The servers still exist, but you do not have to manage, operate, or maintain the servers that run your containers; AWS handles all of that. Thus, from your perspective, there are no servers to manage, hence "serverless."
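With Fargate, you never choose EC2 instances. Instead, the task definition states how much CPU and memory each task needs. The Python sketch below builds keyword arguments for boto3's ECS `register_task_definition` call with the settings Fargate requires: `requiresCompatibilities`, `awsvpc` networking, and task-level `cpu` and `memory`. All names and ARNs are hypothetical. The CPU/memory table covers only the three smallest Fargate sizes. Larger combinations that Fargate accepts are therefore rejected by this sketch, not by Fargate.

```python
# Allowed task-level memory (MiB) for the three smallest Fargate CPU sizes.
# Larger sizes exist; this sketch simply does not list them.
FARGATE_MEMORY_OPTIONS = {
    "256": {"512", "1024", "2048"},
    "512": {str(gib * 1024) for gib in range(1, 5)},   # 1 to 4 GiB
    "1024": {str(gib * 1024) for gib in range(2, 9)},  # 2 to 8 GiB
}


def fargate_task_definition(family: str, container_name: str, image_uri: str,
                            execution_role_arn: str, cpu: str = "256",
                            memory: str = "512",
                            container_port: int = 80) -> dict:
    """Return keyword arguments shaped for boto3's register_task_definition."""
    if memory not in FARGATE_MEMORY_OPTIONS.get(cpu, set()):
        raise ValueError(f"cpu={cpu} with memory={memory} is not covered here")
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        # Fargate tasks must use awsvpc networking (one network interface each).
        "networkMode": "awsvpc",
        # On Fargate, CPU and memory are set for the task as a whole.
        "cpu": cpu,
        "memory": memory,
        # Role that Fargate assumes to pull the image from ECR.
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": container_name,
                "image": image_uri,
                "essential": True,
                "portMappings": [
                    {"containerPort": container_port, "protocol": "tcp"}
                ],
            }
        ],
    }
```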

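Application Load Balancer (ALB)

An Application Load Balancer (ALB), one of the load balancer types offered by Amazon Elastic Load Balancing (ELB), facilitates high availability by providing a single endpoint that fronts targets in multiple Availability Zones. Because the load balancer routes traffic only to healthy targets, the application stays reachable even if the targets in one zone fail.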
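The toy Python model below illustrates why one load balancer endpoint in front of several Availability Zones adds high availability. It uses round-robin routing, which is the default routing algorithm for ALB target groups, and skips any target marked unhealthy. It is not how ALB is implemented. The addresses and zone names are hypothetical, and the `healthy` flag stands in for ALB health checks.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """A container task registered behind the load balancer."""
    address: str
    zone: str
    healthy: bool = True  # in a real ALB, health checks decide this


class RoundRobinBalancer:
    """Toy model: rotate through targets, skipping any marked unhealthy."""

    def __init__(self, targets: list[Target]) -> None:
        self.targets = list(targets)
        self._next = 0  # index of the next target to try

    def pick(self) -> Target:
        # Try each target at most once per request.
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if target.healthy:
                return target
        raise RuntimeError("no healthy targets in any Availability Zone")


if __name__ == "__main__":
    targets = [
        Target("10.0.1.10", "us-east-1a"),
        Target("10.0.2.10", "us-east-1b"),
    ]
    balancer = RoundRobinBalancer(targets)
    print([balancer.pick().zone for _ in range(4)])
    targets[0].healthy = False  # simulate losing us-east-1a
    print([balancer.pick().zone for _ in range(3)])
```

Running it prints `['us-east-1a', 'us-east-1b', 'us-east-1a', 'us-east-1b']` and then `['us-east-1b', 'us-east-1b', 'us-east-1b']`. Once the target in us-east-1a is marked unhealthy, requests keep flowing to us-east-1b.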

Lewis Lampkin, III - Blog

Lewis Lampkin, III - LinkedIn

Lewis Lampkin, III - Medium
