Canary in the Cloud: Amazon CloudWatch Synthetics
Background
HumanGov is a Model Automated Multi-Tenant Architecture (MAMA) intended for use by the 50 states for personnel tracking. The architecture is currently fully automated on AWS but is not ready for go-live. All work so far is for a proof of concept to be presented to the Chief Technical Officer (CTO) for review.
During the weekly team meeting, Woo (Tech Lead) reports that the applications have had several issues and that none of the technical staff were aware of them. Ko (Operations Lead) reports that notification of issues comes from the end users and not from the infrastructure. Kim (Lead Developer) reports that the team wants to receive notifications about application issues via their Slack channel.
You first get the team to agree on the set of problems (and the root cause of each):
Problem 1: Developers unaware of application issues. (Applications are not monitored.)
Problem 2: Notification of issues comes from the end user. (Application is not tested from the end user perspective.)
Problem 3: Operations does not receive timely notification of issues. (Real-time notification of issues is not implemented.)
Problem 4: Developers do not receive application issue notifications via Slack. (Notifications are not integrated with Slack.)
After some thought, you propose a solution for each of the reported issues:
Use Amazon CloudWatch to monitor the infrastructure. (Provide monitoring of the application.)
Use Amazon CloudWatch Synthetics canaries to simulate end users. (Provide testing from the end user perspective.)
Use Amazon CloudWatch alarms to provide real-time notification of issues. (Provide real-time notification of issues.)
Use AWS Chatbot to deliver Slack notifications. (Provide developers notifications via their Slack channel.)
In this article, you will implement a Canary in the Cloud using Amazon CloudWatch Synthetics.
Some of the concepts used in this solution:
Amazon CloudWatch
Amazon CloudWatch provides monitoring and observability for your resources, whether they run on-premises or on AWS. Even though CloudWatch may not have been specifically called out in earlier articles, it has come in handy for troubleshooting whenever I ran into issues.
Amazon Simple Notification Service (Amazon SNS)
Do you want notifications to come from your AWS services? SNS is a fully managed publish/subscribe service that delivers notifications to subscribers via channels such as SMS, email or mobile push.
AWS Chatbot
AWS Chatbot is used to troubleshoot, monitor and operate AWS environments from chat channels. In this article, we will deliver a notification from AWS to an external Slack channel. This integration relies on AWS Chatbot, which can format notifications and send them to chat clients.
Canary in the coal mine
Canaries are considered a "sentinel species", which means they are a type of organism that can give humans advance notification of some danger. Canaries were used historically to detect problems in coal mines, as they would become sick from toxic gas sooner than humans would. Thus, the intent behind the "Canary in the Cloud" analogy is that you want to detect that a problem is occurring before it gets bad enough to impact all 50 states.
Prerequisite 1 of 1. An Amazon EKS cluster named `humangov-cluster` with at least one HumanGov state application deployed and running.
If you need instructions for that, check the series on Kubernetes.
If you followed my example from the Kubernetes article, you probably have a few pieces to redo. I'll list them here, so you have a checklist. Disclaimer: the information below assumes you have gone through the prior series. It carries over that series' assumptions and prerequisites, so if something doesn't work for you, refer back to it. For the sake of brevity, screenshots are not included here. Please see the prior series of articles if you want to see some images.
#1 of 15. eks-user Access keys
AWS Console -/- Identity and Access Management (IAM) -/- Access management -/- Users [eks-user]
[Security credentials]
[Create access key]
Access key best practices & alternatives -/- Other [Next]
Set description tag -optional [Create access key]
Retrieve access keys [Done]
#2 of 15. Disable managed credentials on Cloud9
Preferences -/- AWS Settings -/- Credentials -/- DISABLE 'AWS managed temporary credentials'
#3 of 15. Authenticate with eks-user access key
export AWS_ACCESS_KEY_ID=XXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=YYYYYYYYYYYYYYYYYYYYYYYYY
#4 of 15. Create EKS cluster [Warning: this step may take 15 minutes or so]
cd ~/environment/human-gov-infrastructure/terraform
eksctl create cluster --name humangov-cluster --region us-east-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 1
#5 of 15. Update local Kubernetes config
aws eks update-kubeconfig --name humangov-cluster --region us-east-1
#6 of 15. Verify Cluster Connectivity
kubectl get svc
kubectl get nodes
#7 of 15. Load Balancer
cd ~/environment
curl -O https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.5.4/docs/install/iam_policy.json
aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file://iam_policy.json
#8 of 15. Associate IAM OIDC provider
eksctl utils associate-iam-oidc-provider --cluster humangov-cluster --approve
#9 of 15. Create service account for load balancer.
eksctl create iamserviceaccount \
--cluster=humangov-cluster \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--role-name AmazonEKSLoadBalancerControllerRole \
--attach-policy-arn=arn:aws:iam::502983865814:policy/AWSLoadBalancerControllerIAMPolicy \
--approve
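Note that the --attach-policy-arn value above contains my AWS account ID (502983865814); the policy you created in step #7 lives under your own account ID instead. As a minimal sketch of how that ARN is put together, assuming the standard `aws` partition, here is a small Python helper. The function name `customer_policy_arn` is my own invention for illustration, not part of any AWS tool.

```python
def customer_policy_arn(account_id: str, policy_name: str) -> str:
    """Build the ARN of a customer-managed IAM policy.

    Customer-managed policies are scoped to the owning account's 12-digit ID.
    AWS-managed policies (such as AmazonS3FullAccess) use the literal "aws"
    in that position instead, which is why their ARNs need no account ID.
    """
    if not (account_id.isascii() and account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    # Assumption: the standard "aws" partition (not GovCloud or China regions).
    return f"arn:aws:iam::{account_id}:policy/{policy_name}"
```

You can look up your own account ID with `aws sts get-caller-identity --query Account --output text` and substitute it into the command above.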
#10 of 15. Install load balancer controller
# Add eks-charts repository.
helm repo add eks https://aws.github.io/eks-charts
# Install
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=humangov-cluster \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller
#11 of 15. Verify controller installation
kubectl get deployment -n kube-system aws-load-balancer-controller
#12 of 15. Create role and service account for cluster to S3 and DynamoDB tables
eksctl create iamserviceaccount \
--cluster=humangov-cluster \
--name=humangov-pod-execution-role \
--role-name HumanGovPodExecutionRole \
--attach-policy-arn=arn:aws:iam::aws:policy/AmazonS3FullAccess \
--attach-policy-arn=arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess \
--region us-east-1 \
--approve
#13 of 15. Apply the state manifests
cd ~/environment/human-gov-application/src
kubectl get pods
kubectl apply -f humangov-california.yaml
kubectl apply -f humangov-florida.yaml
kubectl get pods
kubectl get svc
kubectl get deployment
#14 of 15. Ingress
kubectl apply -f humangov-ingress-all.yaml
kubectl get ingress
#15 of 15. Double-check Route 53
Make sure the A records for california.humangov-ll3.click and florida.humangov-ll3.click point to the new load balancer
Access the web pages, and confirm they're operational.
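If you would rather script the Route 53 double-check than eyeball it, one rough heuristic is to compare what each state hostname resolves to against what the load balancer's own DNS name resolves to (the ADDRESS column from `kubectl get ingress`). The sketch below is only that, a heuristic: load balancer DNS answers can rotate, so a mismatch is a hint to look closer, not proof of a problem. The function names are hypothetical.

```python
import socket
from typing import Callable, Set

Resolver = Callable[[str], Set[str]]


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the set of IPv4 addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def points_at_load_balancer(
    site: str, load_balancer: str, resolve: Resolver = resolve_ipv4
) -> bool:
    """True if the site shares at least one resolved address with the load balancer."""
    return bool(resolve(site) & resolve(load_balancer))
```

To use it, pass a state hostname such as california.humangov-ll3.click as `site` and the ingress ADDRESS value as `load_balancer`. The `resolve` parameter exists so the comparison can be exercised without real DNS lookups.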
1 of 9. [Simple Notification Service (SNS)] Create SNS Topic
[Simple Notification Service]
Name: humangov-chatbot-notifications
[Next step]
[Create topic]
2 of 9. [CloudWatch] Create canaries for California and Florida
[CloudWatch] -/- Application Signals -/- [Synthetics Canaries]
[Create canary]
Use a blueprint: Heartbeat monitoring
Name: humangov-canary-ca
Application or endpoint URL: https://california.humangov-ll3.click
Screenshots: Take screenshots
Schedule: Run continuously
Run canary every: 1 minute
Start immediately after creation: checked
CloudWatch alarms [Add new alarm]
Metric name: Failed
Alarm condition: Greater/Equal
Threshold: 1
Period: 1 minute
Set notifications for this canary:
Select an existing SNS topic: humangov-chatbot-notifications
[Create canary]
Repeat the above steps for Florida.
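To make the heartbeat blueprint a little less of a black box, here is a rough Python sketch of the kind of check it performs: load the URL and treat anything other than a 2xx response as a failure. This is not the blueprint's code (the real canary drives a headless browser and can take screenshots). The function name `heartbeat` and its `opener` parameter are my own, for illustration only.

```python
import urllib.error
import urllib.request


def heartbeat(url: str, timeout_seconds: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the URL answers with a 2xx status, False otherwise."""
    try:
        with opener(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses.
        # Refused connections, DNS failures and timeouts also end up here.
        return False
```

Calling `heartbeat("https://california.humangov-ll3.click")` once a minute would approximate the schedule configured above. The `opener` parameter is there so the pass/fail logic can be tried out without touching the network.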
3 of 9. [AWS Chatbot] - Slack Notifications
[AWS Chatbot] -/- Configured clients -/- [Configure new client]
Choose client type: Slack
[Configure]
Slack.com
[Create a new workspace]
[e-mail address]
[Enter CODE]
Workspace name: HumanGov
Your name: [your name]
[Next]
Who else is on the HumanGov team? [Skip this step]
What's your team working on right now? sre-team
[Next]
Back in the AWS Chatbot console, repeat the [Configure new client] step for Slack, and this time allow AWS Chatbot to access your new Slack workspace.
[Configure new channel]
Configuration name: sre-team
Slack channel
Channel type: Public
Public channel name: sre-team
Role name: AWSChatBot-Role-HumanGov
SNS Topics
Region: us-east-1
Topic: humangov-chatbot-notifications
[Configure]
4 of 9. [AWS Chatbot] - Test Message
Send a test message
[AWS Chatbot] -/- Configured clients -/- [Slack workspace: HumanGov]
[sre-team]
[Send test message]
5 of 9. [Cloud9] - Simulate an outage
kubectl get deployments
kubectl scale deployment humangov-nginx-reverse-proxy-california --replicas=0
kubectl get deployments
Browse to https://california.humangov-ll3.click and confirm that the site is no longer reachable.
6 of 9. [CloudWatch] and [Slack] Check for notification of outage.
[CloudWatch] -/- Application Signals -/- Synthetics Canaries
[if necessary, make sure that the proper region is selected]
humangov-canary-ca should show 'failed' status
[Slack]
Check the sre-team channel; you should also see a notification of the outage.
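Behind the Slack message is a JSON document that the CloudWatch alarm publishes to the SNS topic, and AWS Chatbot formats it for the channel. If you ever subscribe something else to that topic, a minimal sketch for turning the payload into a one-line summary could look like the following. The field names are the ones CloudWatch alarm notifications carry; the sample values, including the alarm name, are made up for illustration, and `summarize_alarm` is a hypothetical helper.

```python
import json

# Illustrative payload only: values are invented, not captured from a real alarm.
SAMPLE_MESSAGE = json.dumps({
    "AlarmName": "example-alarm-humangov-canary-ca",
    "OldStateValue": "OK",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed: 1 datapoint was greater than or equal to the threshold (1.0).",
    "Region": "US East (N. Virginia)",
})


def summarize_alarm(sns_message: str) -> str:
    """Condense a CloudWatch alarm notification (the SNS message body) into one line."""
    alarm = json.loads(sns_message)
    return "[{new}] {name} in {region} (was {old}): {reason}".format(
        new=alarm["NewStateValue"],
        name=alarm["AlarmName"],
        region=alarm.get("Region", "unknown region"),
        old=alarm.get("OldStateValue", "unknown"),
        reason=alarm.get("NewStateReason", "no reason given"),
    )
```

Running `print(summarize_alarm(SAMPLE_MESSAGE))` shows the kind of line you might forward to a log or another chat tool.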
7 of 9. [Cloud9] - Service Restoration
kubectl get deployments
kubectl scale deployment humangov-nginx-reverse-proxy-california --replicas=1
kubectl get deployments
Browse to https://california.humangov-ll3.click and confirm that the site is reachable again.
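After scaling back up, the site may take a little while to answer again, so refreshing the browser by hand can get tedious. A minimal polling sketch, assuming you supply your own probe (any zero-argument function that returns True when the site loads), might look like this. The name `wait_until` and its parameters are hypothetical.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to `attempts` times, pausing between tries.

    Returns True as soon as the probe succeeds, or False if every attempt fails.
    """
    for attempt in range(attempts):
        if probe():
            return True
        # No point sleeping after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

With the defaults this gives the deployment roughly five minutes to recover before giving up. The `sleep` parameter is injectable so the loop can be tested without actually waiting.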
8 of 9. [CloudWatch] Confirm alarm clears
[CloudWatch] -/- Application Signals -/- Synthetics Canaries
humangov-canary-ca should no longer show 'failed' status once the next run completes
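Why does the alarm clear by itself? With the settings chosen earlier (metric Failed, condition Greater/Equal, threshold 1, period 1 minute), the alarm is re-evaluated each period. The sketch below models that in a simplified way: it assumes one summed Failed value per 1-minute period and that a single datapoint is enough to change state. Real CloudWatch alarms also have datapoints-to-alarm and missing-data settings, which this ignores. `state_timeline` is a hypothetical name.

```python
from typing import List, Sequence


def state_timeline(failed_per_minute: Sequence[int], threshold: int = 1) -> List[str]:
    """Alarm state after each 1-minute period, oldest first.

    Each input value is the number of failed canary runs in that minute.
    The state is ALARM when the value is greater than or equal to the threshold.
    """
    return ["ALARM" if failed >= threshold else "OK" for failed in failed_per_minute]
```

For example, a healthy minute, two minutes of outage, then a healthy minute corresponds to the input `[0, 1, 1, 0]`: the alarm fires on the first failed run and returns to OK on the first clean period after restoration.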
9 of 9. [CloudWatch / Chatbot / Simple Notification Service / Cloud9] Environment Clean-up
#1. Deleting Canaries
[CloudWatch] -/- Synthetics Canaries
select all canaries
Actions -/- [Stop]
Actions -/- [Delete] [may have to wait a couple minutes]
#2. Delete Chatbot Integration
[AWS Chatbot] -/- Configured clients -/- [Slack - HumanGov]
Configured channels -/- sre-team -/- [Delete]
[Remove workspace configuration]
#3. Delete SNS Topic
[Simple Notification Service] -/- Topics -/- [humangov-chatbot-notifications]
[Delete]
#4. Delete Kubernetes Ingress
kubectl delete -f humangov-ingress-all.yaml
#5. Delete the state resources
kubectl delete -f humangov-california.yaml
kubectl delete -f humangov-florida.yaml
kubectl delete -f humangov-staging.yaml
#6. Delete EKS cluster
eksctl delete cluster --name humangov-cluster --region us-east-1