Canary in the Cloud: Amazon CloudWatch Synthetics

Background

HumanGov is a Model Automated Multi-Tenant Architecture (MAMA) intended for use by the 50 states for personnel tracking. The architecture is currently fully automated on AWS, but it is not yet ready for go-live. All work so far is a proof of concept for presentation to the Chief Technical Officer (CTO) for review.

During the weekly team meeting, Woo (Tech Lead) reports that the applications have had several issues and that none of the technical staff were aware of them. Ko (Operations Lead) reports that notification about issues comes from the end users and not from the infrastructure. Kim (Lead Developer) reports that the team wants to receive notifications about application issues via their Slack channel.

You first get the team to agree on the set of problems (and their root causes):
Problem 1: Developers are unaware of application issues. (Applications are not monitored.)
Problem 2: Notification of issues comes from the end user. (Application is not tested from the end user perspective.)
Problem 3: Operations does not receive timely notification of issues. (Real-time notification of issues is not implemented.)
Problem 4: Developers do not receive application issue notifications via Slack. (Notifications are not integrated with Slack.)

After some thought, you propose a solution for each of the reported issues:
Use Amazon CloudWatch to monitor the infrastructure. (Provide monitoring of application.)
Use Amazon CloudWatch Synthetics canaries to simulate end users. (Provide testing from the end user perspective.)
Use Amazon CloudWatch alarms to provide real-time notification of issues. (Provide real-time notification of issues.)
Use AWS Chatbot to deliver Slack notifications. (Provide developers notifications via their Slack channel.)


In this article, you will implement a Canary in the Cloud using Amazon CloudWatch Synthetics.

Highlighting some of the concepts that will be used in this solution:

Amazon CloudWatch

Amazon CloudWatch provides monitoring and observability for resources that run on AWS or on-premises. Even though CloudWatch may not have been specifically called out in earlier articles, it has come in handy for troubleshooting whenever I ran into issues.

Amazon Simple Notification Service (Amazon SNS)

Do you want notifications to come from your AWS services? SNS is a fully managed publish/subscribe service that can deliver them by SMS, email or mobile push.

AWS Chatbot

The AWS Chatbot is used to troubleshoot, monitor and operate AWS environments from chat channels. In this article, we will deliver a notification from AWS to an external Slack channel. This integration leverages the AWS Chatbot, which can format notifications and send them to chat clients.

Canary in the coal mine

Canaries are considered a "sentinel species", which means they are a type of organism that can give humans advance notification of some danger. Historically, canaries were used to detect problems in coal mines because they would become sick from toxic gas sooner than humans would. The intent behind the "Canary in the Cloud" analogy is the same: you want to detect that a problem is occurring before it gets bad enough to impact all 50 states.

Prerequisite 1 of 1. A Kubernetes cluster named `humangov-cluster` running on Amazon EKS, with at least one HumanGov application state deployed.

If you need instructions for that, check the series on Kubernetes.

If you followed my example from the Kubernetes article, you probably have a few pieces to redo. I'll list them here, so you have a checklist. Disclaimer: the information below assumes that you have gone through the prior series. If a step doesn't work for you, refer to the prior series for its assumptions and prerequisites. For the sake of brevity, screenshots are not included here; see the prior series of articles if you want to see some images.

#1 of 15. eks-user Access keys
AWS Console -/- Identity and Access Management (IAM) -/- Access management -/- Users
[eks-user] [Security credentials] [Create access key]
Access key best practices & alternatives -/- Other [Next]
Set description tag -optional [Create access key]
Retrieve access keys [Done]

#2 of 15. Disable managed credentials on Cloud9
Preferences -/- AWS Settings -/- Credentials -/- DISABLE 'AWS managed temporary credentials'

#3 of 15. Authenticate with eks-user access key
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=YYYYYYYYYYYYYYYYYYYYYYYYY

#4 of 15. Create eks cluster [Warning: this step may take 15 minutes or so]
cd ~/environment/human-gov-infrastructure/terraform
eksctl create cluster --name humangov-cluster --region us-east-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 1

#5 of 15. Update local Kubernetes config
aws eks update-kubeconfig --name humangov-cluster --region us-east-1

#6 of 15. Verify Cluster Connectivity
kubectl get svc
kubectl get nodes

#7 of 15. Load Balancer
cd ~/environment
curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.5.4/docs/install/iam_policy.json
aws iam create-policy \
    --policy-name AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://iam_policy.json

#8 of 15. Associate IAM OIDC provider
eksctl utils associate-iam-oidc-provider --cluster humangov-cluster --approve

#9 of 15. Create service account for load balancer [replace 502983865814 with your own AWS account ID]
eksctl create iamserviceaccount \
    --cluster=humangov-cluster \
    --namespace=kube-system \
    --name=aws-load-balancer-controller \
    --role-name AmazonEKSLoadBalancerControllerRole \
    --attach-policy-arn=arn:aws:iam::502983865814:policy/AWSLoadBalancerControllerIAMPolicy \
    --approve

#10 of 15. Install load balancer controller
# Add eks-charts repository.
helm repo add eks https://aws.github.io/eks-charts
# Install
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
    -n kube-system \
    --set clusterName=humangov-cluster \
    --set serviceAccount.create=false \
    --set serviceAccount.name=aws-load-balancer-controller

#11 of 15. Verify controller installation
kubectl get deployment -n kube-system aws-load-balancer-controller

#12 of 15. Create role and service account for cluster to S3 and DynamoDB tables
eksctl create iamserviceaccount \
    --cluster=humangov-cluster \
    --name=humangov-pod-execution-role \
    --role-name HumanGovPodExecutionRole \
    --attach-policy-arn=arn:aws:iam::aws:policy/AmazonS3FullAccess \
    --attach-policy-arn=arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess \
    --region us-east-1 \
    --approve

#13 of 15. Apply
cd ~/environment/human-gov-application/src
kubectl get pods
kubectl apply -f humangov-california.yaml
kubectl apply -f humangov-florida.yaml
kubectl get pods
kubectl get svc
kubectl get deployment

#14 of 15. Ingress
kubectl apply -f humangov-ingress-all.yaml
kubectl get ingress

#15 of 15. Double-check Route 53
Make sure the A records for california.humangov-ll3.click and florida.humangov-ll3.click point to the new load balancer.
Access the web pages, and confirm they're operational.
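Step #9 of the checklist hard-codes an AWS account ID inside the policy ARN. As a minimal sketch, assuming the AWS CLI is already authenticated as in step #3, the account ID could instead be looked up with `aws sts get-caller-identity`. The function name `lb_policy_arn` is hypothetical and is not part of the prior series.

```shell
# Build the ARN of a customer-managed IAM policy in the current account.
# Assumption: the AWS CLI is authenticated (e.g. with the eks-user access key).
# lb_policy_arn is a hypothetical helper name used only for this sketch.
lb_policy_arn() {
  policy_name="${1:-AWSLoadBalancerControllerIAMPolicy}"
  # get-caller-identity reports the 12-digit account ID of the caller.
  account_id=$(aws sts get-caller-identity --query Account --output text) || return 1
  echo "arn:aws:iam::${account_id}:policy/${policy_name}"
}
```

With this in place, the flag in step #9 could be written as `--attach-policy-arn="$(lb_policy_arn)"`, so the same command works in any account.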

1 of 9. [Simple Notification Service (SNS)] Create SNS Topic

[Simple Notification Service] -/- Name: humangov-chatbot-notifications [Next step] [Create topic]
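The same topic can also be created from the Cloud9 terminal. The sketch below wraps `aws sns create-topic` and prints the topic ARN; the function name `create_notification_topic` is hypothetical, and the region default of us-east-1 is an assumption taken from the rest of this article.

```shell
# Create the SNS topic used by the canary alarms and print its ARN.
# create-topic is idempotent: if the topic already exists, its ARN is returned.
# create_notification_topic is a hypothetical helper name for this sketch.
create_notification_topic() {
  topic_name="${1:-humangov-chatbot-notifications}"
  region="${2:-us-east-1}"
  aws sns create-topic \
    --name "$topic_name" \
    --region "$region" \
    --query TopicArn \
    --output text
}
```

Running `create_notification_topic` with no arguments uses the topic name from this step. Keeping the ARN in a variable, for example `topic_arn=$(create_notification_topic)`, is convenient if you script the later clean-up.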

2 of 9. [CloudWatch] Create canaries for California and Florida

[CloudWatch] -/- Application Signals -/- [Synthetics Canaries] [Create canary]
Use a blueprint: Heartbeat monitoring
Name: humangov-canary-ca
Application or endpoint URL: https://california.humangov-ll3.click
Screenshots: Take screenshots
Schedule: Run continuously
Run canary every: 1 minute
Start immediately after creation: checked
CloudWatch alarms [Add new alarm]
    Metric name: Failed
    Alarm condition: Greater/Equal
    Threshold: 1
    Period: 1 minute
Set notifications for this canary: Select an existing SNS topic: humangov-chatbot-notifications
[Create canary]

Repeat the above steps for Florida (https://florida.humangov-ll3.click).
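To make the alarm settings concrete, here is a rough emulation of what one heartbeat run and the alarm condition amount to. It is not the blueprint's actual code: it assumes a run counts as failed whenever the URL does not answer with a 2xx status, and it treats the Failed metric as 1 for a failed run and 0 otherwise. The function names are hypothetical.

```shell
# Emulate one heartbeat run: print 0 if the URL returns a 2xx status, else 1.
# The printed value plays the role of the canary's "Failed" metric.
# Simplifying assumption: any non-2xx answer (or no answer) is a failed run.
heartbeat_failed() {
  url="$1"
  # -w '%{http_code}' prints only the status code. If curl itself fails
  # (DNS error, timeout, refused connection), fall back to 000.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code=000
  case "$code" in
    2??) echo 0 ;;
    *)   echo 1 ;;
  esac
}

# Apply the alarm condition from the console form: Failed >= threshold.
alarm_state() {
  failed="$1"
  threshold="${2:-1}"
  if [ "$failed" -ge "$threshold" ]; then
    echo ALARM
  else
    echo OK
  fi
}
```

For example, `alarm_state "$(heartbeat_failed https://california.humangov-ll3.click)"` prints the state a single run would produce. With a threshold of 1 and a period of 1 minute, a single failed run is enough to raise the alarm, which is the point of an early-warning canary.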

3 of 9. [AWS Chatbot] - Slack Notifications

[AWS Chatbot] -/- Configured clients -/- [Configure new client]
Choose client type: Slack [Configure]

Slack.com [Create a new workspace]
[e-mail address] [Enter CODE]
Workspace name: HumanGov
Your name: [your name] [Next]
Who else is on the HumanGov team? [Skip this step]
What's your team working on right now? sre-team [Next]

Return to AWS Chatbot and repeat the [Configure new client] step to add Slack; this time, permit AWS Chatbot to access your Slack workspace.

[Configure new channel]
Configuration name: sre-team
Slack channel
    Channel type: Public
    Public channel name: sre-team
Role name: AWSChatBot-Role-HumanGov
SNS Topics
    Region: us-east-1
    Topic: humangov-chatbot-notifications
[Configure]

4 of 9. [AWS Chatbot] - Test Message

Send a test message

[AWS Chatbot] -/- Configured clients -/- [Slack workspace: HumanGov] [sre-team] [Send test message]

5 of 9. [Cloud9] - Simulate an outage

kubectl get deployments
kubectl scale deployment humangov-nginx-reverse-proxy-california --replicas=0
kubectl get deployments

Browse to https://california.humangov-ll3.click and confirm that the site is no longer reachable.
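If you expect to repeat the outage test, the scale commands can be wrapped in a small helper. This is a sketch only: `scale_state` is a hypothetical name, and it assumes every state's deployment follows the `humangov-nginx-reverse-proxy-<state>` naming used in this article.

```shell
# Scale one state's reverse-proxy deployment and show the result.
# Assumption: deployments are named humangov-nginx-reverse-proxy-<state>.
# scale_state is a hypothetical helper name for this sketch.
scale_state() {
  state="$1"
  replicas="$2"
  deployment="humangov-nginx-reverse-proxy-${state}"
  # Stop here if the scale request is rejected (wrong name, no cluster access).
  kubectl scale deployment "$deployment" --replicas="$replicas" || return 1
  kubectl get deployment "$deployment"
}
```

`scale_state california 0` simulates the outage, and `scale_state california 1` restores service later in the walkthrough.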

6 of 9. [CloudWatch] and [Slack] Check for notification of outage.

[CloudWatch] -/- Application Signals -/- Synthetics Canaries [if necessary, make sure that the proper region is selected]
humangov-canary-ca should show 'failed' status.

[Slack] Check the sre-team channel; a notification of the outage should also appear there.
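The canary status can also be checked without leaving the terminal. The sketch below relies on `aws synthetics describe-canaries-last-run`, which reports the state of each canary's most recent run. The function names are hypothetical, and the region default assumes the canaries were created in us-east-1.

```shell
# Print "<canary name><TAB><state>" for each canary's most recent run.
# The state is one of RUNNING, PASSED or FAILED.
# Both function names are hypothetical helpers for this sketch.
canary_last_run_states() {
  region="${1:-us-east-1}"
  aws synthetics describe-canaries-last-run \
    --region "$region" \
    --query 'CanariesLastRun[].[CanaryName, LastRun.Status.State]' \
    --output text
}

# Print only the names of canaries whose last run failed.
failed_canaries() {
  canary_last_run_states "$@" | awk '$2 == "FAILED" { print $1 }'
}
```

During the simulated outage, `failed_canaries` would be expected to list humangov-canary-ca once the next scheduled run has completed.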

7 of 9. [Cloud9] - Service Restoration

kubectl get deployments
kubectl scale deployment humangov-nginx-reverse-proxy-california --replicas=1
kubectl get deployments

Browse to https://california.humangov-ll3.click and confirm that the site is reachable again.

8 of 9. [CloudWatch] Confirm alarm clears

[CloudWatch] -/- Application Signals -/- Synthetics Canaries
humangov-canary-ca should return to 'passed' status after its next successful run.

9 of 9. [CloudWatch / Chatbot / Simple Notification Service / Cloud9] Environment Clean-up

#1. Deleting Canaries
[CloudWatch] -/- Synthetics Canaries
select all canaries
Actions -/- [Stop]
Actions -/- [Delete] [may have to wait a couple minutes]

#2. Delete Chatbot Integration
[AWS Chatbot] -/- Configured clients -/- [Slack - HumanGov]
Configured channels -/- sre-team -/- [Delete]
[Remove workspace configuration]

#3. Delete SNS Topic
[Simple Notification Service] -/- Topics -/- [humangov-chatbot-notifications] [Delete]

#4. Delete Kubernetes Ingress
kubectl delete -f humangov-ingress-all.yaml

#5. Delete the state resources
kubectl delete -f humangov-california.yaml
kubectl delete -f humangov-florida.yaml
kubectl delete -f humangov-staging.yaml

#6. Delete EKS cluster
eksctl delete cluster --name humangov-cluster --region us-east-1
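The wait between stopping and deleting a canary in clean-up step #1 can be scripted. The following is a sketch under assumptions: `remove_canary` is a hypothetical helper, the canary is currently running, and the limit of 20 checks at 15-second intervals is arbitrary. It stops the canary, polls `aws synthetics get-canary` until the state is STOPPED, and only then deletes it.

```shell
# Stop a canary, wait until it reports STOPPED, then delete it.
# Assumptions: the canary is currently running; 20 polls of 15 seconds
# is an arbitrary limit. remove_canary is a hypothetical helper name.
remove_canary() {
  name="$1"
  region="${2:-us-east-1}"
  aws synthetics stop-canary --name "$name" --region "$region" || return 1
  i=1
  state=""
  while [ "$i" -le 20 ]; do
    state=$(aws synthetics get-canary --name "$name" --region "$region" \
      --query 'Canary.Status.State' --output text) || return 1
    [ "$state" = "STOPPED" ] && break
    sleep 15
    i=$((i + 1))
  done
  # Give up without deleting if the canary never reached STOPPED.
  [ "$state" = "STOPPED" ] || return 1
  aws synthetics delete-canary --name "$name" --region "$region"
}
```

For example, `remove_canary humangov-canary-ca` handles the California canary. Other resources created alongside the canary may still need to be removed separately, so the console remains a good place to double-check.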



Lewis Lampkin, III - Blog

Lewis Lampkin, III - LinkedIn

Lewis Lampkin, III - Medium
