Comprehensive Guide to Monitoring and Alerting with Prometheus, Grafana, and Alertmanager
Introduction
This documentation provides detailed instructions for setting up Prometheus, Grafana, and Alertmanager to monitor a Node.js application effectively. By following this guide, you'll learn how to install the necessary components, configure alerts based on specific metrics, and visualize those alerts in Grafana.
Prerequisites
Before starting, ensure that you have the following:
Prometheus: Installed and running.
Grafana: Installed and running.
Alertmanager: For handling alert notifications.
Node.js application: Exposing metrics at the /metrics endpoint.
Basic understanding of Prometheus metrics, alerts, and Docker (optional but useful).
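For readers whose application does not yet expose metrics, the following is a minimal sketch of a /metrics endpoint that uses only Node's built-in http module. The metric name app_requests_total, the helper names, and port 8080 are assumptions made for illustration; a real application would more likely use a Prometheus client library than format the output by hand.

```javascript
// Minimal sketch: expose one counter in the Prometheus text exposition format.
// The metric name, helper names, and port are hypothetical.
const http = require('http');

let requestCount = 0; // incremented on every request except /metrics

// Render the current metrics as Prometheus text exposition format.
function renderMetrics() {
  return [
    '# HELP app_requests_total Total HTTP requests handled by the app.',
    '# TYPE app_requests_total counter',
    `app_requests_total ${requestCount}`,
    '',
  ].join('\n');
}

// Serve /metrics for Prometheus and a plain "ok" for everything else.
function handleRequest(req, res) {
  if (req.url === '/metrics') {
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4; charset=utf-8' });
    res.end(renderMetrics());
    return;
  }
  requestCount += 1;
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('ok\n');
}

// Returns the http.Server so the caller can close it later.
function startServer(port) {
  return http.createServer(handleRequest).listen(port);
}
```

Calling startServer(8080) starts the app; Prometheus would then need a scrape target pointing at that port. Port 8080 is chosen here only to avoid clashing with Grafana on 3000.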
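Table of Contents
Step 1: Install Alertmanager
    Option 1: Install using Docker
    Option 2: Manual installation
Step 2: Configure Alerting Rules in Prometheus
Step 3: Set Up Alertmanager
Step 4: Verify Alerts in Prometheus
Step 5: Display Alerts in Grafana
Conclusion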
Step 1: Install Alertmanager
Option 1: Install Alertmanager Using Docker (Recommended)
To quickly set up Alertmanager using Docker, execute the following command in your terminal:
docker run -d --name=alertmanager -p 9093:9093 prom/alertmanager
- Access: Once running, Alertmanager will be available at http://localhost:9093.
Option 2: Manual Installation
Download Alertmanager from the official Prometheus downloads page.
Extract the downloaded tarball:
tar -xvzf alertmanager-*.tar.gz
Change into the extracted directory and start Alertmanager with the following command:
./alertmanager --config.file=alertmanager.yml
Step 2: Configure Alerting Rules in Prometheus
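Set up Prometheus to trigger alerts based on specific conditions (e.g., CPU usage exceeding 80%). The example rule below uses node_cpu_seconds_total, a host-level metric exported by node_exporter rather than by the Node.js application itself, so Prometheus must also be scraping a node_exporter instance for the rule to fire.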
Create an Alerting Rules File: Create a file named alert.rules.yml in your Prometheus directory with the following content:
groups:
  - name: example
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100) > 80
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage detected on {{ $labels.instance }}"
          description: "CPU usage has been over 80% for the past 1 minute on {{ $labels.instance }}."
Update the Prometheus Configuration: Add the following to your prometheus.yml:
rule_files:
  - "alert.rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093" # Address of the Alertmanager instance
Restart Prometheus:
If running manually:
./prometheus --config.file=prometheus.yml
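If using Docker:
docker restart prometheus
A restart only picks up these changes if prometheus.yml and alert.rules.yml are mounted into the container. Also note that localhost inside the Prometheus container refers to the container itself, so the Alertmanager target must be an address that is reachable from that container.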
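To make the HighCPUUsage expression above less opaque, here is a rough sketch of the arithmetic it performs for a single instance. The sample values and function names are made up for illustration, and the rate calculation is simplified: Prometheus's rate() also handles counter resets and extrapolation, which this sketch ignores.

```javascript
// Sketch of what the HighCPUUsage expression computes for one instance.
// Sample values are invented; this is not how Prometheus implements rate().

// Per-second increase of a counter between two samples.
function ratePerSecond(earlier, later) {
  return (later.value - earlier.value) / (later.timestamp - earlier.timestamp);
}

// cores: one [earlierSample, laterSample] pair of the idle-mode counter per CPU core.
function cpuUsagePercent(cores) {
  const idleRates = cores.map(([earlier, later]) => ratePerSecond(earlier, later));
  // avg by(instance): average the idle fraction across all cores of the instance.
  const avgIdle = idleRates.reduce((sum, rate) => sum + rate, 0) / idleRates.length;
  // 100 - idle% gives the busy percentage.
  return 100 - avgIdle * 100;
}

const usage = cpuUsagePercent([
  [{ timestamp: 0, value: 1000 }, { timestamp: 60, value: 1006 }], // core 0: 6 s idle in 60 s
  [{ timestamp: 0, value: 2000 }, { timestamp: 60, value: 2012 }], // core 1: 12 s idle in 60 s
]);
console.log(usage.toFixed(1), usage > 80); // prints "85.0 true"
```

With an average idle fraction of 15%, usage comes out at 85%, which is above the 80% threshold, so the rule's condition would be true for this evaluation.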
Step 3: Set Up Alertmanager
Configure Alertmanager to handle alerts triggered by Prometheus.
Create/Modify the Alertmanager Configuration: Create or edit alertmanager.yml with the following example configuration to send notifications via email (adjust to your needs):
route:
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'your-email@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'username'
        auth_password: 'password'
Start/Restart Alertmanager:
If running manually:
./alertmanager --config.file=alertmanager.yml
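If using Docker:
docker restart alertmanager
The container started in Step 1 uses the image's default configuration, so a restart only applies your alertmanager.yml if that file is mounted into the container (the prom/alertmanager image reads /etc/alertmanager/alertmanager.yml by default).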
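One way to check the email receiver configured above without waiting for a real CPU spike is to post a hand-made alert to Alertmanager's v2 API. The sketch below assumes Alertmanager is reachable at http://localhost:9093 and that Node 18 or later is available for the global fetch; the label values and function names are hypothetical.

```javascript
// Sketch: build and send a hand-made alert to Alertmanager's v2 API.
// Label values and function names are hypothetical.

// Build the JSON payload: an array of alerts, each with labels and optional annotations.
function buildTestAlert(now = new Date()) {
  return [
    {
      labels: { alertname: 'TestAlert', severity: 'critical', instance: 'manual-test' },
      annotations: { summary: 'Manual test alert' },
      startsAt: now.toISOString(),
    },
  ];
}

// Requires Node 18+ for the global fetch. baseUrl is assumed to be the address from Step 1.
async function sendTestAlert(baseUrl = 'http://localhost:9093') {
  const res = await fetch(`${baseUrl}/api/v2/alerts`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildTestAlert()),
  });
  if (!res.ok) {
    throw new Error(`Alertmanager responded with HTTP ${res.status}`);
  }
}
```

Calling sendTestAlert() should make the test alert appear in the Alertmanager UI, and an email should follow if the SMTP settings are valid.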
Step 4: Verify Alerts in Prometheus
Access Prometheus UI: Open http://localhost:9090.
Navigate to Alerts: Go to the Alerts section to see active and inactive alerts.
Simulate High CPU Usage:
Use a tool like stress to create load on your CPU.
Alternatively, temporarily lower the threshold in the alert expression so that the rule fires under normal load.
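Monitor Alert Status: Once the CPU usage exceeds the defined threshold, the alert moves from Inactive to Pending. If the condition holds for the full for duration (1 minute in this example), the alert transitions from Pending to Firing and is sent to Alertmanager.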
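The alert lifecycle described in this step can be modelled roughly as follows. This is a simplified sketch for intuition, not Prometheus's actual implementation; the function name and the evaluation times are invented for illustration, and the 60-second hold matches the for: 1m setting from Step 2.

```javascript
// Sketch of the inactive -> pending -> firing lifecycle for a rule with `for: 1m`.
// Simplified model for illustration only.
const FOR_SECONDS = 60;

// evaluations: [{ time (seconds), conditionTrue }] in chronological order.
function alertStates(evaluations, forSeconds = FOR_SECONDS) {
  let activeSince = null; // when the condition first became true in the current streak
  return evaluations.map(({ time, conditionTrue }) => {
    if (!conditionTrue) {
      activeSince = null; // any false evaluation resets the streak
      return 'inactive';
    }
    if (activeSince === null) activeSince = time;
    return time - activeSince >= forSeconds ? 'firing' : 'pending';
  });
}

const states = alertStates([
  { time: 0, conditionTrue: false },
  { time: 15, conditionTrue: true },
  { time: 45, conditionTrue: true },
  { time: 75, conditionTrue: true },
  { time: 90, conditionTrue: false },
]);
console.log(states.join(' -> '));
// prints "inactive -> pending -> pending -> firing -> inactive"
```

The point to notice is that a brief spike shorter than the for duration never reaches Firing, which is why a short burst of load may show up only as Pending in the Alerts page.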
Step 5: Display Alerts in Grafana
Grafana can visualize alerts from Prometheus, providing a unified view for monitoring.
Open Grafana: Access Grafana at http://localhost:3000.
Configure Data Source:
Go to Configuration (gear icon) > Data Sources.
Select your Prometheus data source.
Ensure the Alerting section is enabled.
Create a Panel for Alerts:
In your dashboard, click on the + icon and select Dashboard.
Click on Add Panel and use the following PromQL to display alerts:
ALERTS
Choose a Table or Graph (named Time series in recent Grafana versions) visualization type.
Click Apply to save the panel.
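The same ALERTS series that the panel displays can also be read through the Prometheus HTTP API, which can help when checking what Grafana should be showing. The sketch below assumes Prometheus at http://localhost:9090 and Node 18 or later for the global fetch; the function names are hypothetical, and the alertstate filter narrows the result to firing alerts only.

```javascript
// Sketch: list firing alerts through the Prometheus HTTP API.
// Function names are hypothetical.

// Build an instant-query URL for the ALERTS series, restricted to firing alerts.
function buildAlertsQueryUrl(baseUrl = 'http://localhost:9090') {
  const query = 'ALERTS{alertstate="firing"}';
  return `${baseUrl}/api/v1/query?query=${encodeURIComponent(query)}`;
}

// Turn an instant-query response into "alertname on instance" strings.
function summarizeAlerts(response) {
  if (response.status !== 'success') return [];
  return response.data.result.map(
    ({ metric }) => `${metric.alertname} on ${metric.instance ?? 'unknown instance'}`
  );
}

// Requires Node 18+ for the global fetch.
async function listFiringAlerts(baseUrl) {
  const res = await fetch(buildAlertsQueryUrl(baseUrl));
  return summarizeAlerts(await res.json());
}
```

Using the same filter in the Grafana panel query, ALERTS{alertstate="firing"}, would hide pending alerts from the table.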
Conclusion
By following this guide, you've successfully set up Prometheus to trigger alerts based on specific metrics such as CPU usage. You configured Alertmanager to manage those alerts and displayed them in Grafana for comprehensive monitoring. This setup enhances your ability to monitor the health of your Node.js application and respond promptly to issues.