
FastAPI Microservice Patterns: Application Monitoring

What's the problem?

During development and maintenance of microservices things go wrong. In many situations the problem's root cause resides in the application code, so it is essential to have insight into the internals of the application. But how can numerical metric values be made observable to ease understanding of the application and to troubleshoot problems?

Solution alternatives

Service monitoring spans several layers: the container orchestration platform, the service logic, and so on. And there are different types of observability data to gather (numerical values, e.g. the time spent executing an endpoint; log strings; etc.). Monitoring the different layers and data types complements rather than replaces one another. To enable observability via logs, the log aggregation pattern can be used. To enable observability of timing across several services, the distributed tracing pattern is used. This post is about the service logic layer and the gathering of numerical values only.

One solution: Application monitoring

When applying the application metrics pattern, the application code is instrumented to gather metric values and collect them in a central place, either by pushing them from a microservice to a metrics service or by letting a metrics service pull them from a microservice.

Pattern implementation

This example implements the pattern using Prometheus as the metrics service. Prometheus can collect different metric types:

- Counter: "A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart."
- Gauge: "A gauge is a metric that represents a single numerical value that can arbitrarily go up and down."
- Histogram: "A histogram samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values."
- Summary: "Similar to a histogram, a summary samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window."

Instead of using the Prometheus Python client to instrument the service code directly, one can use one of several higher-level libraries. At the time of writing there are prometheus-fastapi-instrumentator, starlette-prometheus and starlette_exporter. The different libraries export different metrics. Starlette is the ASGI framework FastAPI is built on top of, which implies that starlette-prometheus and starlette_exporter collect only ASGI-specific metrics (e.g. information about HTTP requests). If other communication-related information is needed, e.g. messaging metadata (how often have messages on a specific topic been processed?), it has to be added explicitly. This example implementation uses starlette-prometheus. Time will show which Starlette-focused library becomes the de-facto standard.

The example provides metrics about the number of times a decorated function was called and the total amount of time spent in a decorated function. The gathered information is of no value without some way to read it. The "dashboard framework for observability" Grafana is used to explore the metrics. Grafana allows creating dashboards to visualize the metrics. This topic is not part of this post; there are a lot of great resources available online about how to create dashboards.

The post FastAPI Microservice Patterns: Local Development Environment — Skaffold, docker, kubectl and minikube in a nutshell describes how to set up the local development environment and how to get the source code of the pattern implementations.
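The four metric types above map directly onto the Prometheus Python client. The following is a minimal sketch; the metric names are illustrative and not taken from the example service:

```python
from prometheus_client import Counter, Gauge, Histogram, Summary, generate_latest

# Illustrative metrics; the names are hypothetical, not part of the example service.
ORDERS = Counter("orders_total", "Total number of orders processed")
QUEUE_SIZE = Gauge("queue_size", "Current number of queued jobs")
REQUEST_SECONDS = Histogram("request_seconds", "Request duration in seconds",
                            buckets=(0.1, 0.5, 1.0, 5.0))
PAYLOAD_BYTES = Summary("payload_bytes", "Size of request payloads in bytes")

ORDERS.inc()                   # counter: only goes up (or resets on restart)
QUEUE_SIZE.set(3)              # gauge: can go up and down arbitrarily
REQUEST_SECONDS.observe(0.42)  # histogram: observation counted into buckets
PAYLOAD_BYTES.observe(1024)    # summary: keeps a count and sum of observations

# Render the default registry in the Prometheus text exposition format.
print(generate_latest().decode())
```

Calling `generate_latest()` renders everything registered so far, which is also what a `/metrics` endpoint serves to the Prometheus scraper.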
The subdirectory application-monitoring of the source code repository contains the example implementation of the pattern. Run the microservice, Prometheus and Grafana with minikube start followed by skaffold dev --port-forward. Skaffold deploys Prometheus (via prometheus/k8s/deployment.yaml), Grafana (via grafana/k8s/deployment.yaml) and a minimalistic "hello world" microservice. Prometheus is accessible via localhost:9090, Grafana via localhost:3000 and the microservice via localhost:9000.

The microservice is instrumented using starlette_prometheus to provide its metrics via the endpoint /metrics in service-a/app/main.py:

```python
...
from starlette_prometheus import metrics, PrometheusMiddleware

app = FastAPI()
app.add_middleware(PrometheusMiddleware)
app.add_route("/metrics", metrics)
...
```

The metrics are observable via localhost:9000/metrics. At the time of writing starlette-prometheus supports the following metrics:

- starlette_requests_total: Total count of requests by method and path.
- starlette_responses_total: Total count of responses by method, path and status codes.
- starlette_requests_processing_time_seconds: Histogram of requests processing time by path (in seconds).
- starlette_exceptions_total: Total count of exceptions raised by path and exception type.
- starlette_requests_in_progress: Gauge of requests by method and path currently being processed.

In addition, the built-in metrics of the Prometheus Python client are provided. Garbage collection related metrics are defined in client_python/gc_collector.py:

- python_gc_objects_collected: Objects collected during GC.
- python_gc_objects_uncollectable: Uncollectable objects found during GC.
- python_gc_collections: Number of times this generation was collected.

Platform related metrics are defined in client_python/platform_collector.py:

- python_info: Python platform information.
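The per-function metrics mentioned earlier (call count and total time spent in a decorated function) can be sketched with the Prometheus Python client directly. The function and metric names below are hypothetical:

```python
import time

from prometheus_client import Counter, Summary

# Hypothetical metrics; the example service defines its own names.
CALLS = Counter("handler_calls_total", "Number of times the handler was called")
DURATION = Summary("handler_seconds", "Time spent in the handler")

@DURATION.time()      # observes wall-clock time per call into handler_seconds
def handler() -> str:
    CALLS.inc()       # count each invocation
    time.sleep(0.01)  # stand-in for real work
    return "hello world"

handler()
handler()
```

The `Summary.time()` decorator records one observation per call, so `handler_seconds_count` tracks the number of calls and `handler_seconds_sum` the total time spent.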
Process related metrics are defined in client_python/process_collector.py:

- virtual_memory_bytes: Virtual memory size in bytes.
- resident_memory_bytes: Resident memory size in bytes.
- start_time_seconds: Start time of the process since unix epoch in seconds.
- cpu_seconds_total: Total user and system CPU time spent in seconds.
- max_fds: Maximum number of open file descriptors.
- open_fds: Number of open file descriptors.

Prometheus is configured via a config map (prometheus/k8s/config-map.yaml) to gather the metrics provided by service-a periodically:

```yaml
...
prometheus.yml: |-
  global:
    scrape_interval: 15s
    evaluation_interval: 15s
  rule_files:
    # - "first.rules"
    # - "second.rules"
  scrape_configs:
    - job_name: service-a
      static_configs:
        - targets: ['service-a:80']
```
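To sanity-check what such a scrape returns, the text exposition format can be parsed with the Prometheus Python client's parser. A minimal sketch; the sample payload below is hand-written for illustration and its values are made up, it is not real scrape output:

```python
from prometheus_client.parser import text_string_to_metric_families

# A tiny hand-written sample in the Prometheus text exposition format,
# shaped like starlette-prometheus output (values are fabricated).
SCRAPE = """\
# HELP starlette_requests_total Total count of requests by method and path.
# TYPE starlette_requests_total counter
starlette_requests_total{method="GET",path="/"} 7.0
starlette_requests_in_progress{method="GET",path="/"} 1.0
"""

for family in text_string_to_metric_families(SCRAPE):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```

This is the same parsing Prometheus performs against the configured target (service-a:80 in the config map above) every scrape_interval.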