AWS CloudWatch: Enhancing Monitoring and Notification

Athira KK
9 min read · May 1, 2024


Hey Techies…👋

Introduction to CloudWatch:

  • AWS CloudWatch is a comprehensive monitoring service offered by Amazon Web Services.
  • Originally focused on monitoring, CloudWatch has expanded its capabilities to include logging, events, and more.
  • It serves as a central hub for monitoring the performance and health of AWS environments.

Key Features of CloudWatch:

◼️Monitoring Service:

  • CloudWatch serves as a monitoring solution for AWS resources, tracking performance metrics such as CPU utilization, disk I/O, and network traffic.
  • It automatically generates standard metrics for various AWS services used in a region.

◼️Logging Solution:

  • In addition to monitoring, CloudWatch functions as a logging solution, allowing users to collect, store, and analyze log data generated by AWS resources and applications.
  • Logs from services like EC2 instances can be streamed to CloudWatch for centralized log management.

◼️Event Monitoring:

  • CloudWatch Events (now part of Amazon EventBridge) captures near-real-time events within the AWS environment, such as instance launches, terminations, or volume creations.
  • Users can set triggers and notifications based on these events, often integrated with AWS Lambda functions.

◼️Standard and Custom Metrics:

  • CloudWatch provides both standard and custom metrics for monitoring AWS resources.
  • Standard metrics cover common performance indicators like CPU utilization, network traffic, and disk operations.
  • Users can define custom metrics tailored to their specific monitoring needs.

◼️Alarms and Notifications:

  • Users can set alarms on CloudWatch metrics to trigger notifications when predefined thresholds are breached.
  • Notifications are delivered through Amazon SNS, which can send them by email or pass them on to other endpoints for broader alerting.

◼️Integration with AWS Services:

  • CloudWatch seamlessly integrates with various AWS services, including EC2 instances and EBS volumes.
  • Metrics and logs from these services are collected and monitored by CloudWatch, providing insights into resource performance.

Practical Use Cases:

  • CloudWatch simplifies monitoring by automatically collecting metrics for AWS resources.
  • Users can customize monitoring settings and set up alarms to receive timely notifications of any performance anomalies.
  • Practical examples include setting alarms for CPU utilization exceeding a certain threshold, which triggers email notifications via SNS.

The previous blog post provided a template for launching EC2 instances. Reuse it if you followed along, or create a new EC2 instance now.

Select the EC2 instance and open the Monitoring tab:

Before proceeding further, take note of the various metric names available, such as CPU utilization, which indicates the percentage of CPU being used over time. Additionally, observe other metrics like status checks, network in and out (in bytes), network packets in and out (count), and disk read operations.

These metrics are automatically generated by CloudWatch when you launch an EC2 instance.
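The same CPU utilization data can also be read outside the console. Here is a minimal sketch, assuming the AWS CLI is installed and configured. The instance ID and timestamps are placeholders, and the AWS_CLI variable is only a convenience of this sketch: set it to echo to print the command instead of running it.

```shell
# Sketch: fetch the 5-minute average CPUUtilization of one instance.
# AWS_CLI is a helper variable of this sketch, not an AWS setting.
AWS_CLI="${AWS_CLI:-aws}"

# Usage: cpu_stats INSTANCE_ID START_TIME END_TIME   (ISO 8601, UTC)
cpu_stats() {
  "$AWS_CLI" cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --start-time "$2" \
    --end-time "$3" \
    --period 300 \
    --statistics Average
}

# Example call (placeholder instance ID):
# cpu_stats i-0123456789abcdef0 2024-05-01T10:00:00Z 2024-05-01T11:00:00Z
```

The period of 300 seconds matches the default 5-minute interval. With detailed monitoring enabled, a period of 60 would return one datapoint per minute.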

If you require additional metrics, such as RAM or disk utilization, you’ll need to create custom metrics. However, for the purpose of this session, we’ll focus solely on CPU utilization, as it’s one of the most critical metrics to monitor.
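For readers curious what a custom metric looks like, here is a rough sketch that computes memory utilization on the instance and publishes it with the AWS CLI. The namespace Custom/EC2 and the metric name MemoryUtilization are made up for illustration, the instance needs credentials that allow publishing metrics, and the CloudWatch agent is the more common way to collect memory and disk metrics.

```shell
# Sketch: publish memory utilization as a custom CloudWatch metric.
# Namespace and metric name are illustrative, not AWS defaults.
AWS_CLI="${AWS_CLI:-aws}"

# Print used memory as a whole-number percentage (truncated).
# Reads a file in /proc/meminfo format; defaults to /proc/meminfo itself.
mem_used_percent() {
  awk '/^MemTotal:/     {total = $2}
       /^MemAvailable:/ {avail = $2}
       END {printf "%d\n", (total - avail) * 100 / total}' "${1:-/proc/meminfo}"
}

# Usage: publish_mem_metric INSTANCE_ID
publish_mem_metric() {
  "$AWS_CLI" cloudwatch put-metric-data \
    --namespace "Custom/EC2" \
    --metric-name MemoryUtilization \
    --dimensions "InstanceId=$1" \
    --unit Percent \
    --value "$(mem_used_percent)"
}

# Example call (placeholder instance ID):
# publish_mem_metric i-0123456789abcdef0
```

Running publish_mem_metric from a cron job every minute or every 5 minutes would build up a graph alongside the standard metrics.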

By default, EC2 sends these metrics to CloudWatch every 5 minutes, and the corresponding graphs update at that interval. If you prefer more frequent updates, you can enable detailed monitoring, though it’s important to note that this option isn’t free.

For this hands-on, enabling detailed monitoring is optional.

  1. Select the option to “Enable” detailed monitoring, indicating that CloudWatch should monitor metrics every minute.
  2. Keep in mind that detailed monitoring incurs additional charges compared to the default 5-minute monitoring interval.
  3. Despite the additional cost, enabling detailed monitoring offers more granular insights into your AWS environment’s performance.
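Detailed monitoring can also be switched on and off from the command line. A minimal sketch, assuming a configured AWS CLI; the instance ID is a placeholder and AWS_CLI is a helper variable of this sketch (set it to echo for a dry run):

```shell
# Sketch: toggle detailed (1-minute) monitoring for an EC2 instance.
AWS_CLI="${AWS_CLI:-aws}"

# Turn on 1-minute monitoring (billed in addition to basic monitoring).
enable_detailed_monitoring() {
  "$AWS_CLI" ec2 monitor-instances --instance-ids "$1"
}

# Return to the default 5-minute monitoring.
disable_detailed_monitoring() {
  "$AWS_CLI" ec2 unmonitor-instances --instance-ids "$1"
}

# Example calls (placeholder instance ID):
# enable_detailed_monitoring i-0123456789abcdef0
# disable_detailed_monitoring i-0123456789abcdef0
```

Disabling it again after the hands-on keeps the extra charges limited to the time you actually needed the finer granularity.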

Log in to the terminal of the EC2 instance and switch to the root user.

Then install the stress tool along with its dependencies on your instance.

Stress is a tool for load-testing a Linux system, mainly its CPU but also other resources. Running the stress command with the -c option starts the number of CPU workers you specify, each of which spins on the square root function, sqrt(). Stress can also load IO and RAM, but for our purpose, we’re concentrating on CPU stress testing.

yum install stress -y
nohup stress -c 3 -t 100 &

This command executes the stress utility in the background using the nohup command, which allows the process to continue running even after the terminal session is terminated. Let’s break down the command:

  • nohup: Prevents the following command from being terminated when the terminal session ends.
  • stress: The stress testing utility.
  • -c 3: Starts 3 worker processes, each keeping one CPU core busy.
  • -t 100: Specifies that the stress test will run for 100 seconds.
  • &: Runs the command in the background.

So, this command will run the stress test with 3 worker processes for 100 seconds, and the process will continue running even if the terminal session is closed.

Execute the top command.

top

You’ll notice four stress processes: the parent process plus the three workers started by -c 3. On an instance with three or fewer vCPUs, the workers push CPU utilization to 100% or close to it, and the load average climbs accordingly.

CloudWatch records this load and updates the graph every minute if detailed monitoring is enabled, or every 5 minutes otherwise.

Repeat this process a few times, running for intervals like 100 seconds and 200 seconds. After a few minutes, you’ll observe a pattern forming on the graph.

Let’s create a script for this process.

vim stress.sh
sleep 60 && stress -c 4 -t 60 && sleep 60 && stress -c 4 -t 60 && sleep 60 && stress -c 4 -t 30 && sleep 60 && stress -c 4 -t 100 && sleep 30 && stress -c 4 -t 200
:wq!

This script is designed to stress test the system’s CPU using the stress utility. It varies the duration of the stress tests and intersperses them with periods of rest, which simulates a changing CPU load over time. Let’s break down each part:

◼️sleep 60: This command pauses execution for 60 seconds before proceeding to the next command. It introduces a delay in the sequence.

◼️stress -c 4 -t 60: This command runs the stress utility with the following options:

  • -c 4: Starts 4 worker processes to load the CPU cores.
  • -t 60: Runs the stress test for 60 seconds.

◼️sleep 60: Another pause of 60 seconds.

◼️stress -c 4 -t 60: Similar to the second command, this runs a stress test for another 60 seconds.

◼️sleep 60: Another pause of 60 seconds.

◼️stress -c 4 -t 30: This command runs a shorter stress test for 30 seconds.

◼️sleep 60: Another pause of 60 seconds.

◼️stress -c 4 -t 100: This command runs a longer stress test for 100 seconds.

◼️sleep 30: A shorter pause of 30 seconds.

◼️stress -c 4 -t 200: This command runs an even longer stress test for 200 seconds.
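The one-line script can be hard to read and adjust. As an alternative sketch (not the version used in this post), the same schedule can be kept as data and run in a loop. The names SCHEDULE, schedule_total, and run_schedule are invented for this example.

```shell
# Sketch: the same schedule as stress.sh, written as data plus a loop.
# Each entry is "sleep_seconds:stress_seconds".
SCHEDULE="60:60 60:60 60:30 60:100 30:200"

# Print the total wall-clock time of the schedule in seconds.
schedule_total() {
  total=0
  for entry in $SCHEDULE; do
    rest=${entry%%:*}   # part before the colon
    busy=${entry##*:}   # part after the colon
    total=$((total + rest + busy))
  done
  echo "$total"
}

# Run the schedule: rest first, then load 4 CPU workers for the given time.
# Stops at the first failing command, like the && chain in stress.sh.
run_schedule() {
  for entry in $SCHEDULE; do
    sleep "${entry%%:*}" && stress -c 4 -t "${entry##*:}" || return 1
  done
}

# Example calls:
# schedule_total   # how long the whole run takes, in seconds
# run_schedule     # actually run it (requires stress to be installed)
```

The whole schedule adds up to 720 seconds, so the script needs about 12 minutes to complete.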

Ensure your script is executable.

chmod +x stress.sh

Execute this script.

./stress.sh

Running the script produces the load pattern on the graph. I simply executed it in the background using nohup ./stress.sh &.

Now, when observing the top output, you may intermittently notice the stress command based on the ongoing operations within your script.

Now, let’s navigate to the CloudWatch service to configure an alarm for CPU utilization on this instance.

First, head over to the “All alarms” section.

Next, click on the “Create Alarm” button.

To select the CPU utilization metric, click “Select metric” and choose the EC2 namespace.

Then, locate the “Per-instance metrics” section and find your instance. If you don’t immediately see your instance, wait for a few moments for the information to load.

Once you’ve found your instance, look for the CPU utilization metric.

Select the CPU utilization metric by clicking on it.

In this step, we can choose the period for which the alarm will be evaluated.

For this demonstration, I’ll keep the period set to 5 minutes.

Next, we specify the condition for triggering the alarm based on CPU utilization.

For example, we can set the condition to trigger the alarm if the CPU utilization (by default the Average statistic) is greater than or equal to 60% over a 5-minute period.

Once we’ve defined the condition, we proceed to the next step.

On the notification step, select the SNS topic that should receive the alarm. If you don’t find your desired topic listed, you can click on “Create Topic.” Provide a name for the topic, enter your email address, and click on “Create Topic.”

However, since we already created a topic for billing alarms earlier, I’ll select the same topic, which contains my email address.

This topic is designated for notifications. So, when the alarm state is triggered, it will send a notification to this topic, ultimately resulting in an email notification being sent to me.
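If you prefer the command line, the topic and its email subscription can be sketched as follows. This assumes a configured AWS CLI; the topic name, topic ARN, and email address in the example calls are placeholders, and AWS_CLI is a helper variable of this sketch (set it to echo for a dry run).

```shell
# Sketch: create an SNS topic and subscribe an email address to it.
AWS_CLI="${AWS_CLI:-aws}"

# Usage: create_topic TOPIC_NAME
# The real CLI prints the ARN of the topic in its response.
create_topic() {
  "$AWS_CLI" sns create-topic --name "$1"
}

# Usage: subscribe_email TOPIC_ARN EMAIL_ADDRESS
subscribe_email() {
  "$AWS_CLI" sns subscribe \
    --topic-arn "$1" \
    --protocol email \
    --notification-endpoint "$2"
}

# Example calls (placeholder values):
# create_topic cpu-alarm-topic
# subscribe_email arn:aws:sns:us-east-1:123456789012:cpu-alarm-topic you@example.com
```

SNS sends a confirmation email to the address, and notifications are only delivered after the recipient confirms the subscription.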

There are several other actions you can take, such as EC2 actions. For instance, you might want to stop, terminate, or reboot the instance if it triggers an alarm.

In some cases, a high CPU utilization might prevent you from logging into the instance via SSH. Rebooting the instance could be a temporary solution to address this issue.
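Outside the console, these EC2 actions are expressed as special ARNs that are passed to the alarm as alarm actions. The helper below is a small sketch that builds such an ARN; the function name ec2_action_arn is invented for this example.

```shell
# Sketch: build the alarm-action ARN for a built-in EC2 action.
# Usage: ec2_action_arn REGION ACTION   (ACTION: stop, terminate, reboot, recover)
ec2_action_arn() {
  case "$2" in
    stop|terminate|reboot|recover)
      echo "arn:aws:automate:$1:ec2:$2"
      ;;
    *)
      echo "unsupported action: $2" >&2
      return 1
      ;;
  esac
}

# Example call:
# ec2_action_arn us-east-1 reboot
```

The resulting ARN can be passed to the --alarm-actions option of aws cloudwatch put-metric-alarm, next to or instead of an SNS topic ARN.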

However, for the purpose of this demonstration, we will skip these additional actions. We will stick to email notifications only and proceed to the next step.

Let’s give the alarm a descriptive name.

The naming convention will be: “warning-web-health-alarm-notification-cpu” for the specific instance.

In our organization, a warning is triggered when the utilization exceeds 60%. We could set up additional alarms for critical situations, such as when it surpasses 80%. However, for now, we’ll proceed with just one alarm.
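For reference, a roughly equivalent alarm can be created with the AWS CLI. This is a sketch under a few assumptions: the statistic is Average, the alarm fires after a single 5-minute period at or above the threshold, and the instance ID and topic ARN in the example call are placeholders. AWS_CLI is a helper variable of this sketch (set it to echo for a dry run).

```shell
# Sketch: create the 60% CPU warning alarm from the command line.
AWS_CLI="${AWS_CLI:-aws}"

# Usage: create_cpu_alarm INSTANCE_ID SNS_TOPIC_ARN
create_cpu_alarm() {
  "$AWS_CLI" cloudwatch put-metric-alarm \
    --alarm-name warning-web-health-alarm-notification-cpu \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=$1" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 60 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "$2"
}

# Example call (placeholder values):
# create_cpu_alarm i-0123456789abcdef0 arn:aws:sns:us-east-1:123456789012:cpu-alarm-topic
```

A second, critical alarm would use the same call with a different name and a threshold of 80.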

Now, let’s move on to the next step.

After a while, CloudWatch will collect the data and display whether the instance is in an alarm state or if it’s okay.
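The alarm state can also be checked from a terminal. A minimal sketch, assuming a configured AWS CLI; AWS_CLI is a helper variable of this sketch (set it to echo for a dry run):

```shell
# Sketch: print the current state of an alarm.
AWS_CLI="${AWS_CLI:-aws}"

# Usage: alarm_state ALARM_NAME
# The real CLI prints one of: OK, ALARM, INSUFFICIENT_DATA.
alarm_state() {
  "$AWS_CLI" cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Example call:
# alarm_state warning-web-health-alarm-notification-cpu
```

A freshly created alarm usually reports INSUFFICIENT_DATA until the first datapoints arrive.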

Now, this process aligns with the basic principles of any monitoring tool. You have metrics or checks, alarms triggered by those metrics or checks, and actions associated with those alarms, such as sending email notifications.

Whether it’s Prometheus, Nagios, Icinga, Zenoss, or others, the concept remains similar. However, with CloudWatch, monitoring is already configured, and you only need to set up alarms. In contrast, with tools you set up yourself, the entire monitoring system needs configuration, typically by the monitoring or administration team.

Currently, the graph fluctuates because I’ve executed the stress command multiple times. Remember, the stress command has to run long enough for the 5-minute average to reach the 60% threshold. Running it for 5 minutes or longer is the simplest way to make sure the alarm fires and the email notification is sent.
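A quick back-of-the-envelope calculation shows why short bursts do not trigger the alarm. The sketch below assumes the Average statistic, a CPU that is fully busy while stress runs and roughly idle (0%) otherwise, and a stress run that falls entirely inside one alarm period. Real measurements will differ somewhat, and the function names are invented for this example.

```shell
# Sketch: estimate the average CPU % over one alarm period.
# Usage: period_average BUSY_SECONDS PERIOD_SECONDS
period_average() {
  echo $(( $1 * 100 / $2 ))   # integer percentage, truncated
}

# Succeeds if the estimated average reaches the threshold (>=).
# Usage: would_alarm BUSY_SECONDS PERIOD_SECONDS THRESHOLD_PERCENT
would_alarm() {
  [ "$(period_average "$1" "$2")" -ge "$3" ]
}

# Example calls:
# period_average 100 300                        # estimate for a 100-second burst
# would_alarm 300 300 60 && echo "alarm expected"
```

Under these assumptions, a 100-second burst averages out to about 33% over a 5-minute period, well below the 60% threshold, while a run that covers the whole period stays at 100%.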

After running the command for 5 minutes, wait for the notification to arrive in your inbox.
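To check the email wiring without generating load, the alarm state can be set by hand. This is a sketch assuming a configured AWS CLI; the reason text is arbitrary and AWS_CLI is a helper variable of this sketch (set it to echo for a dry run).

```shell
# Sketch: temporarily force an alarm into the ALARM state to test its actions.
AWS_CLI="${AWS_CLI:-aws}"

# Usage: force_alarm ALARM_NAME
force_alarm() {
  "$AWS_CLI" cloudwatch set-alarm-state \
    --alarm-name "$1" \
    --state-value ALARM \
    --state-reason "manual test"
}

# Example call:
# force_alarm warning-web-health-alarm-notification-cpu
```

The forced state is temporary: CloudWatch moves the alarm back to its real state at the next evaluation, but the notification for the ALARM transition is still sent.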

Moreover, you can create a reverse alarm that notifies you when the instance is back in a healthy state, for instance when CPU utilization drops below 40%. You can set up alarms for different thresholds and states, and the free tier covers up to ten alarms.

Email notification:

I encourage you to experiment with these alarms to familiarize yourself with how they work. Set alarms for various data points, both in alarm and OK states, using different conditions like greater than or less than. Once you’re comfortable, remember to terminate the instance and delete the alarms before moving on to the next section, particularly the auto-scaling group section. Reference for a few concepts: Imran Teli (Udemy).

Stay tuned for practical applications and hands-on examples.

Thank you 😊🫰🫶
