Observability POC

Index

  1. Introduction
  2. Observability Stack
  3. Outbox Concept with Scheduler
  4. Integration Tests and the Test Pyramid
  5. How to Run the Application

Introduction

This repository presents a proof of concept (POC) focused on application observability using the Grafana stack. Observability is one of the essential pillars of distributed systems: it makes it possible to monitor, trace, and identify problems in real time.

In this POC, we use the following tools:

  • Grafana Tempo
  • Grafana Loki
  • Prometheus
  • Grafana

The goal is to demonstrate how these tools work together to provide full visibility into the system's behavior.


Observability Stack

OpenTelemetry

OpenTelemetry is a central component of modern observability. It acts as the data collection layer for metrics, logs, and distributed traces. It provides:

  • APIs and SDKs to instrument applications.
  • A configurable pipeline to collect, process, and export observability data to tools like Prometheus, Grafana Loki, and Grafana Tempo.

Learn more here.

Grafana Tempo

Grafana Tempo is a distributed tracing backend designed to store large volumes of traces at low cost. It is optimized to:

  • Aggregate and correlate traces from different sources.
  • Provide insights into latency and performance.
  • Integrate with visualization tools like Grafana.

Learn more here.

Grafana Loki

Grafana Loki is a logging solution focused on simplicity and efficiency. It excels at:

  • Storing logs in a scalable and efficient format.
  • Correlating logs with metrics and traces.
  • Integrating with Prometheus and Grafana.

Learn more here.

Prometheus

Prometheus is a powerful monitoring and alerting tool for applications and systems. It is responsible for:

  • Collecting and storing metrics in a time-series database.
  • Performing advanced queries using PromQL.
  • Complementing Loki and Tempo, so that Grafana can present metrics, logs, and traces in a unified view.

Learn more here.

Grafana

Grafana is a data visualization and analysis platform. In this stack, it:

  • Centralizes the visualization of metrics, logs, and traces.
  • Facilitates event correlation for quick analysis.
  • Provides customizable dashboards for real-time observability.

Learn more here.

Observability Flow

The image below demonstrates how the Grafana stack tools, together with OpenTelemetry, enable distributed tracing and provide complete observability:

[Image: Observability Flow]

Outbox Concept with Scheduler

The Outbox pattern is an approach to ensuring eventual consistency between distributed systems. The Outbox table acts as an intermediary between the database write and the messaging system.

How does it work?

  1. Safe persistence: Events are recorded in a dedicated table called Outbox, ideally in the same transaction as the business data they describe.
  2. Replication with scheduler: A scheduler periodically reads the pending events from the Outbox table and publishes them to messaging systems or external APIs.
  3. Delivery confirmation: Successfully published events are marked as processed so that they are not published again.

This approach is particularly useful in scenarios where consistency and integrations are critical, such as microservices or integration with third-party systems.
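
The three steps above can be sketched in a few lines. This is a minimal illustration only, not the code used in this repository: it assumes an in-memory SQLite database, hypothetical `solicitacao` and `outbox` tables, and a `publish` callback standing in for the message producer. The `processado` column mirrors the attribute name used later in this README.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one business table plus the dedicated outbox table.
    conn.execute(
        "CREATE TABLE solicitacao (id INTEGER PRIMARY KEY, nome TEXT, valor REAL)"
    )
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "processado INTEGER NOT NULL DEFAULT 0)"
    )


def save_with_outbox(conn, nome, valor):
    # Step 1: the business row and the outbox event are committed in one
    # transaction, so either both are stored or neither is.
    with conn:
        cur = conn.execute(
            "INSERT INTO solicitacao (nome, valor) VALUES (?, ?)", (nome, valor)
        )
        event = {"id": cur.lastrowid, "nome": nome, "valor": valor}
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def process_outbox(conn, publish):
    # Step 2: one scheduler tick reads every pending event and publishes it.
    pending = conn.execute(
        "SELECT id, payload FROM outbox WHERE processado = 0 ORDER BY id"
    ).fetchall()
    for event_id, payload in pending:
        publish(json.loads(payload))
        # Step 3: mark the event only after publish() returned without raising.
        with conn:
            conn.execute("UPDATE outbox SET processado = 1 WHERE id = ?", (event_id,))
    return len(pending)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    save_with_outbox(conn, "demo", 10.5)
    sent = []
    print(process_outbox(conn, sent.append), sent)
```

If `publish` raises, the event keeps `processado = 0` and is retried on the next tick. The same event can therefore be delivered more than once, so consumers should tolerate duplicates.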

Learn more about the Outbox pattern here.


Integration Tests and the Test Pyramid

Examples of Integration Tests

For practical examples of the integration tests implemented in this project, see the test classes available in the repository.

Integration tests are crucial to validate the behavior of modules and systems together. In this POC, we follow the concept of the test pyramid, which suggests:

Pyramid Layers

  1. Unit Tests (base):

    • Fast and focused on small units of code.
    • Example: Validate a specific data conversion function.
  2. Integration Tests (middle):

    • Verify the interaction between different modules or systems.
    • Examples: Test the integration between the application and the database, between microservices, with Kafka for messaging, with Redis as a distributed cache, or through the web layer to validate a complete flow such as creating a new product.
  3. Interface/End-to-End Tests (top):

    • Simulate the complete application flow, as a user would.
    • Example: Validate the flow of creating a new product, covering from the web layer to data replication in Kafka and storage in Redis.
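
To make the difference between the first two layers concrete, here is a small sketch using Python's standard `unittest` module. The helpers `parse_valor` and `save_valor` are hypothetical and exist only for this illustration; the tests in this repository use the project's own stack and may look quite different.

```python
import sqlite3
import unittest


def parse_valor(text):
    # Hypothetical conversion helper: "1.234,56" becomes 1234.56.
    return float(text.replace(".", "").replace(",", "."))


def save_valor(conn, text):
    # Hypothetical persistence step that relies on parse_valor.
    conn.execute("INSERT INTO valores (valor) VALUES (?)", (parse_valor(text),))


class ParseValorUnitTest(unittest.TestCase):
    # Base of the pyramid: a single function, no external dependencies.
    def test_converts_decimal_comma(self):
        self.assertEqual(parse_valor("1.234,56"), 1234.56)


class SaveValorIntegrationTest(unittest.TestCase):
    # Middle of the pyramid: the same function exercised together with a database.
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE valores (valor REAL)")

    def tearDown(self):
        self.conn.close()

    def test_persists_converted_value(self):
        save_valor(self.conn, "10,50")
        row = self.conn.execute("SELECT valor FROM valores").fetchone()
        self.assertEqual(row[0], 10.5)
```

Saved to a file, both test classes can be run with `python -m unittest <file>`. The unit test needs nothing but the function itself, while the integration test needs a database, which is why there are usually fewer of them.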

Strategy in this POC

We implemented integration tests to validate:

  • Distributed Tracing with the Grafana Stack.
  • Correlation of metrics with traces in Grafana Tempo.
  • Monitoring of events generated by Prometheus.
  • Outbox Concept.
  • Examples of Integration Tests.

To learn more about the test pyramid, read this article.


How to Run the Application

  1. Clone the repository to your local environment and enter the project folder:

    git clone https://github.com/jeisonBorba/grafana-stack.git
    cd grafana-stack
  2. Make the docker-compose.sh script executable:

    chmod +x docker-compose.sh
  3. Run the script to start the containers:

    ./docker-compose.sh

Note for Windows Users

If you use Windows, it is recommended to run the script in Git Bash. To do this:

  1. Install Git Bash.

  2. Open the Git Bash terminal in the project folder.

  3. Run the command:

    ./docker-compose.sh

List of Started Containers

When running the script, the following containers will be started:

  1. MongoDB - NoSQL database for document storage.
  2. Redis - Distributed in-memory cache system.
  3. Zookeeper - Coordination service for Kafka.
  4. Kafka - Distributed messaging platform.
  5. Kafka UI - Graphical interface for managing Kafka, configured to run on port 9001.
  6. Memcached - Lightweight, high-performance in-memory cache.
  7. Tempo - Distributed tracing system.
  8. Loki - Solution for log collection and storage.
  9. Prometheus - Monitoring and metrics collection tool.
  10. Grafana - Data visualization and dashboard platform, configured to run on port 3000.
  11. Solicitação Service - Custom application service, configured to run on port 8094.
  12. Teste Service - Another custom service, configured to run on port 8095.
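
Once the script finishes, a quick way to confirm that the services listed above are reachable is to try a TCP connection to each published port. The sketch below is only a convenience check and is not part of the repository. It covers only the ports stated in the list above and assumes the containers publish them on localhost.

```python
import socket

# Ports taken from the container list above. Other services are omitted
# because this README does not state their ports.
SERVICES = {
    "Kafka UI": 9001,
    "Grafana": 3000,
    "Solicitação Service": 8094,
    "Teste Service": 8095,
}


def is_port_open(host, port, timeout=1.0):
    # True if a TCP connection can be established within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost", services=SERVICES):
    # Maps each service name to True (reachable) or False (not reachable).
    return {name: is_port_open(host, port) for name, port in services.items()}
```

Calling `check_services()` returns a dictionary keyed by service name. An open port only shows that something is listening, not that the application inside has finished starting.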

Step-by-step for Executing Calls

  1. Access the Swagger of Solicitação Service:

    http://localhost:8094/swagger-ui.html
  2. Create a new request:

    • Click on POST /solicitacao and then on Try it out.
    • Fill in the nome and valor fields and click on Execute.

[Image: Swagger]
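
The same request can also be sent without the Swagger UI. The sketch below uses only the Python standard library. The endpoint path and the field names `nome` and `valor` come from the steps above, but the JSON body shape, the field types, and the content type are assumptions; check the Swagger page for the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8094"


def build_solicitacao_request(nome, valor, base_url=BASE_URL):
    # Assumption: the endpoint accepts a JSON body with exactly these two fields.
    body = json.dumps({"nome": nome, "valor": valor}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/solicitacao",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_solicitacao(nome, valor):
    # Sends the request; requires the containers to be running.
    request = build_solicitacao_request(nome, valor)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status, response.read().decode("utf-8")
```

For example, `create_solicitacao("demo", 10.5)` returns the HTTP status code and the raw response body.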

Every 10 seconds, a Scheduler in the application scans the Outbox table for records with the attribute processado = false and sends a message to Kafka for each one. The Kafka Consumer of the teste-service application then consumes the message and stores it in MongoDB.

  3. Access the Swagger of Teste Service:

    http://localhost:8095/swagger-ui.html
  4. Check the created test:

    • To list all created tests (paginated), click on GET /testes and then on Try it out.
    • To look up a single test, click on GET /testes/nomes/{nome} and then on Try it out.
    • Fill in the nome field with the name used in the request created earlier and click on Execute.

[Image: Swagger Teste]


Step-by-step for Viewing Logs and Traces

  1. Access Grafana:

    http://localhost:3000

[Image: Grafana]

  2. Expand the side menu and click on Explore.
  3. Select the Loki data source and set the label filter to application = solicitacao-service.

[Image: Loki]

  4. Click on Run Query to view the logs generated by Solicitação Service.

[Image: Logs]

  5. Click on one of the log lines to expand it and view its details.

[Image: Log Detail]

  6. Click on Tempo to display the path taken by the request.

[Image: Tempo]
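
The label filter chosen in Explore corresponds to the LogQL stream selector `{application="solicitacao-service"}`. As a rough sketch, the same query can be expressed as a call to Loki's `query_range` HTTP endpoint. The base URL below assumes that Loki is published on its default port 3100, which this README does not state, so adjust it to match the compose configuration.

```python
import time
import urllib.parse


def logql_selector(label, value):
    # Stream selector equivalent to the label filter chosen in Explore.
    return f'{{{label}="{value}"}}'


def loki_query_range_url(query, minutes=15, limit=100,
                         base_url="http://localhost:3100", now_ns=None):
    # Assumption: Loki is reachable on its default port 3100.
    # start and end are Unix timestamps in nanoseconds.
    end_ns = time.time_ns() if now_ns is None else now_ns
    start_ns = end_ns - minutes * 60 * 10**9
    params = urllib.parse.urlencode(
        {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url}/loki/api/v1/query_range?{params}"
```

For example, `loki_query_range_url(logql_selector("application", "solicitacao-service"))` produces a URL covering the last 15 minutes, which can be opened with any HTTP client.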


I hope this repository helps you understand the concepts and tools used to build observable and resilient systems.

For any questions or suggestions, feel free to get in touch!

Thank you for reading this far! 🚀
