Creating AWS CloudWatch Alarms Using Boto

In this post I’ll walk through setting up CloudWatch alarms programmatically in Python using Boto. We’ll set up a single alarm on the StatusCheckFailed metric, but you can configure alarms on other metrics the same way. Check the AWS alarms console for the full list.

This post assumes you already have an AWS account, a running instance and its instance_id, and a working boto config. It also assumes you’ve already created an SNS topic. Mine is called “Server_Down”, and it simply sends me an email when a server fails a status check.

First, you’ll want to import the correct libraries and pull your SNS Topic into a local variable. I’m in us-west-2.

from boto.sns import connect_to_region
sns = connect_to_region('us-west-2')
topics = sns.get_all_topics()

topics will be a nested dictionary; the part that matters here is the topic ARN. If you have a single topic, you can just grab the first one like below. If not, you’ll need to filter for the correct one.

topic = topics[u'ListTopicsResponse']['ListTopicsResult']['Topics'][0]['TopicArn']
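
If you have more than one topic, one way to filter is to match on the topic name, which is the last colon-separated piece of the ARN. The sketch below is only an illustration: find_topic_arn is a helper name I made up, and the sample response uses invented account IDs shaped like the dictionary shown above.

```python
def find_topic_arn(response, topic_name):
    """Return the ARN of the topic named topic_name, or None if absent."""
    topics = response['ListTopicsResponse']['ListTopicsResult']['Topics']
    for entry in topics:
        arn = entry['TopicArn']
        # An SNS topic ARN ends with the topic name after the last colon.
        if arn.rsplit(':', 1)[-1] == topic_name:
            return arn
    return None


# Hypothetical response (made-up account ID), shaped like get_all_topics() output.
sample_response = {
    'ListTopicsResponse': {
        'ListTopicsResult': {
            'Topics': [
                {'TopicArn': 'arn:aws:sns:us-west-2:123456789012:Deploys'},
                {'TopicArn': 'arn:aws:sns:us-west-2:123456789012:Server_Down'},
            ]
        }
    }
}

topic_arn = find_topic_arn(sample_response, 'Server_Down')
```

In real use you would pass the topics dictionary from above instead of sample_response, and check for None before going further.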

Now that you have your topic ARN, you can grab all the possible metrics for the instance you want to attach it to.

from boto.ec2.cloudwatch import connect_to_region
cw = connect_to_region('us-west-2')

You can query CloudWatch for all the metrics that already exist using cw.list_metrics(), but this doesn’t help much if you want the metric for a specific instance. You’ll want to pass a dictionary of dimensions as a filter. To do that (assuming you have your instance ID in a local instance_id variable):

metric = cw.list_metrics(dimensions={'InstanceId': instance_id},
                         metric_name='StatusCheckFailed')[0]
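
Since list_metrics() hands back a list, it can be worth selecting the metric by name and failing loudly when nothing matches, rather than indexing blindly. Here is a rough sketch of that idea. pick_metric and FakeMetric are names invented for this example, and it assumes each result exposes a name attribute the way boto’s Metric objects do.

```python
from collections import namedtuple


def pick_metric(metrics, metric_name):
    """Return the first metric whose .name equals metric_name.

    Raises LookupError if no metric in the list matches.
    """
    for metric in metrics:
        if metric.name == metric_name:
            return metric
    raise LookupError('no metric named %r in the results' % metric_name)


# Hypothetical stand-in for boto Metric objects so the sketch runs without AWS.
FakeMetric = namedtuple('FakeMetric', ['name', 'namespace'])

results = [
    FakeMetric('CPUUtilization', 'AWS/EC2'),
    FakeMetric('StatusCheckFailed', 'AWS/EC2'),
]

picked = pick_metric(results, 'StatusCheckFailed')
```

With a live connection you would pass the list returned by cw.list_metrics(...) in place of results.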

You now have the metric you want to create an alarm on, and the SNS topic you want it to notify. It’s time to create the alarm. The alarm below is evaluated over 5-minute periods and triggers when the metric breaches the threshold for 2 periods in a row. It’ll need a name - for this example we’ll call it “my_sweet_alarm”.

alarm_name = 'my_sweet_alarm'
metric.create_alarm(name=alarm_name, comparison='>=', threshold=1, period=300,
                    evaluation_periods=2, statistic='Average', alarm_actions=[topic])

If you check your alarms, you’ll now see that you’ve got an alarm with the name “my_sweet_alarm”, monitoring for StatusCheckFailed >= 1 for 10 minutes.
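
To make the arithmetic concrete, here is a simplified model of how period, evaluation_periods, comparison, and threshold combine. It is a sketch of the idea only, not CloudWatch’s actual evaluation code: alarm_window_seconds and would_alarm are hypothetical helpers, and the model assumes one datapoint per period with no missing data.

```python
import operator

# Comparison strings in the style used by the create_alarm call above.
COMPARISONS = {
    '>=': operator.ge,
    '>': operator.gt,
    '<=': operator.le,
    '<': operator.lt,
}


def alarm_window_seconds(period, evaluation_periods):
    """Total time the metric must stay in breach before the alarm fires."""
    return period * evaluation_periods


def would_alarm(datapoints, comparison, threshold, evaluation_periods):
    """Return True if the most recent evaluation_periods datapoints all breach.

    datapoints is a list with one value per period, oldest first.
    """
    if len(datapoints) < evaluation_periods:
        return False  # not enough history yet in this simplified model
    breaches = COMPARISONS[comparison]
    recent = datapoints[-evaluation_periods:]
    return all(breaches(value, threshold) for value in recent)


window = alarm_window_seconds(period=300, evaluation_periods=2)  # 600 seconds
fires = would_alarm([0, 1, 1], '>=', 1, 2)          # last two periods breach
recovered = would_alarm([1, 1, 0], '>=', 1, 2)      # latest period is healthy
```

With period=300 and evaluation_periods=2, the window works out to 600 seconds, which is the 10 minutes mentioned above.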

If you found this post helpful, please consider sharing to your network. I'm also available to help you be successful with your distributed systems! Please reach out if you're interested in working with me, and I'll be happy to schedule a free one-hour consultation.