
Logs and metrics — two signals, two jobs

When you run an application or service, especially in production, sooner or later you need to answer questions like: how many requests failed in the last hour? Why did this one payment error out? Is the database keeping up with traffic?

Observability tools throw around the words logs and metrics all the time. They sound similar (both are “something my code emits”), but they solve different problems and need different handling.

This post sums up what each signal is, how the two differ, and when to reach for which.


What Is a Log?

A log entry is a statement produced by code at the moment something interesting happens.

Examples:

2025-07-21T14:18:04Z INFO api user=42 "payment created"
2025-07-21T14:18:07Z ERROR db "duplicate key value violates unique constraint"
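
As a minimal sketch (the `log_entry` helper and its field names are illustrative, not any particular library's API), a line in that shape can be rendered like this:

```python
from datetime import datetime, timezone

def log_entry(level: str, component: str, message: str, **fields) -> str:
    """Render one log line: UTC timestamp, LEVEL, component, key=value pairs, quoted message."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    kv = " ".join(f"{k}={v}" for k, v in fields.items())
    parts = [ts, level, component]
    if kv:
        parts.append(kv)
    parts.append(f'"{message}"')
    return " ".join(parts)

print(log_entry("INFO", "api", "payment created", user=42))
# e.g. 2025-07-21T14:18:04Z INFO api user=42 "payment created"
```

In real code you would use your logging framework's formatter instead of string assembly; the point is that each entry is a self-contained record of one event.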

Key points:

  • One entry per event, with a timestamp and a level attached.
  • Text or structured (JSON, key=value, protobuf); relatively heavy, kilobytes each.
  • Queried by search: grep, full-text, tracing a single request.

What Is a Metric?

A metric is a numeric measurement sampled at a regular interval (seconds, minutes) and named so that you can graph/alert on it.

Examples:

payments_processed_total{service="api"} 2400123
request_latency_seconds_bucket{le="0.25", route="/login"} 486
cpu_usage_percent{host="api-01"} 73.2
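
A toy counter (real code would use a client library such as `prometheus_client`; this sketch only illustrates the shape of the exposition lines above) might look like:

```python
from collections import defaultdict

class Counter:
    """Toy monotonically increasing counter with labels, rendered Prometheus-style."""
    def __init__(self, name: str):
        self.name = name
        self.values = defaultdict(float)

    def inc(self, amount: float = 1, **labels) -> None:
        key = tuple(sorted(labels.items()))  # labels identify one time series
        self.values[key] += amount

    def expose(self) -> str:
        lines = []
        for key, value in sorted(self.values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_str}}} {value:g}")
        return "\n".join(lines)

payments = Counter("payments_processed_total")
payments.inc(service="api")
payments.inc(service="api")
print(payments.expose())  # payments_processed_total{service="api"} 2
```

Note how the metric carries no per-event detail: two payments collapse into the single number 2.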

Key points:

  • A number with a name and labels, sampled or incremented over time.
  • Tiny (bytes each) and cheap to retain for months or years.
  • Queried with math: rates, percentiles, group-by-time.

Logs vs Metrics: A Quick Checklist

Think of a web request: a log line records that this particular request happened, with all its context; a metric just increments a counter saying one more request happened.

Logs and metrics are complementary: metrics tell you something is wrong; logs tell you why. They overlap and can be converted into each other, but they play different roles.
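
For instance, one direction of that conversion, deriving a metric from logs, is just aggregation. A sketch (the `error_rate` helper is hypothetical), counting ERROR lines per minute from lines shaped like the examples above:

```python
def error_rate(log_lines):
    """Return {minute: error_count} from 'timestamp LEVEL component ...' log lines."""
    counts = {}
    for line in log_lines:
        ts, level = line.split()[:2]
        if level == "ERROR":
            minute = ts[:16]  # "2025-07-21T14:18" — truncate to the minute
            counts[minute] = counts.get(minute, 0) + 1
    return counts

lines = [
    '2025-07-21T14:18:04Z INFO api "payment created"',
    '2025-07-21T14:18:07Z ERROR db "duplicate key"',
    '2025-07-21T14:18:09Z ERROR db "duplicate key"',
]
print(error_rate(lines))  # {'2025-07-21T14:18': 2}
```

Going the other way is lossy: once events are rolled up into a number, the individual stories are gone.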

|                   | Logs                          | Metrics                      |
|-------------------|-------------------------------|------------------------------|
| Sample            | Individual event              | Aggregate/summary            |
| Format            | Text or structured JSON/proto | Numbers with labels          |
| Typical size      | Kilobytes each                | Bytes each                   |
| Query style       | Search, grep, full-text, trace| Math, filter, group by time  |
| Storage retention | Hours to weeks (expensive)    | Weeks to years (cheap)       |
| Use cases         | Debug, audit, investigate     | Dashboards, alerts, capacity |

Logging Best Practices

  1. Log intent, not just data. Branches (if, catch) and state changes are what to record.
  2. Be structured. JSON, key=value, or protobuf give you free dimensions for search.
  3. Add correlation IDs. trace_id, user_id, and friends let you stitch together the full story.
  4. Control volume.
    • Rate-limit or sample noisy statements.
    • When rate-limiting, time-based sampling (once per second) usually beats count-based (every 100 calls) when traffic varies.
  5. Keep levels, but use them. DEBUG off in prod; ERROR should be rare.
  6. Ship ASAP. Buffering is OK; losing logs isn't (especially around crashes).
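
Point 4's time-based sampling can be sketched as a stdlib logging filter (the `OncePerInterval` class is illustrative, not a standard API):

```python
import logging
import time

class OncePerInterval(logging.Filter):
    """Emit a given (logger, message) at most once per `interval` seconds; drop repeats."""
    def __init__(self, interval: float = 1.0):
        super().__init__()
        self.interval = interval
        self.last_seen: dict = {}

    def filter(self, record: logging.LogRecord) -> bool:
        now = time.monotonic()
        key = (record.name, record.msg)  # sample per distinct log statement
        if now - self.last_seen.get(key, -self.interval) >= self.interval:
            self.last_seen[key] = now
            return True   # emit this one
        return False      # suppress the repeat

log = logging.getLogger("db")
log.addFilter(OncePerInterval(interval=1.0))
```

Because the budget is per unit of time rather than per N calls, a traffic spike cannot flood the log, and a quiet period still gets its occasional sample through.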

Pros:

  • Rich, per-event context: the source of truth for what actually happened.
  • Free-form and searchable: grep, full-text, trace a single user's journey.

Cons:

  • Heavy (kilobytes per event) and expensive to retain for long.
  • Noisy at scale; needs rate-limiting and sampling discipline.

Metrics Best Practices

  1. Make them first-class. Use a metrics client (prometheus_client, statsd, etc.), not logger.info.
  2. Pick clear names and units. queue_size_gauge, request_latency_seconds. Units in the name avoid "is that ms or s?".
  3. Choose the right type. Counter (monotonically increasing), Gauge (up/down), Histogram/Summary (distribution).
  4. Label sparingly. Ten high-cardinality labels will sink your TSDB. Pick the ones you actually slice on.
  5. Sample/aggregate in the agent, not in code. The collector can down-sample later; raw does not mean noisy.
  6. Alert on RED/USE. Request Rate, Errors, Duration; Utilization, Saturation, Errors.
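
To make point 3 concrete, here is a toy cumulative-bucket histogram, the type behind the request_latency_seconds_bucket example earlier (a real one comes from your metrics client; this only shows the mechanics):

```python
class Histogram:
    """Toy latency histogram with cumulative le-buckets, Prometheus-style."""
    def __init__(self, buckets=(0.1, 0.25, 0.5, 1.0)):
        self.buckets = buckets
        self.counts = [0] * len(buckets)  # counts[i] = observations <= buckets[i]
        self.total = 0

    def observe(self, value: float) -> None:
        self.total += 1
        for i, le in enumerate(self.buckets):
            if value <= le:
                self.counts[i] += 1  # cumulative: every bucket that contains it

h = Histogram()
for latency in (0.05, 0.2, 0.3, 2.0):
    h.observe(latency)
print(dict(zip(h.buckets, h.counts)))  # {0.1: 1, 0.25: 2, 0.5: 3, 1.0: 3}
```

A counter would only keep `total`; a gauge would keep the latest value; the histogram keeps enough shape to estimate percentiles while staying a handful of numbers.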

Pros:

  • Tiny and cheap: bytes per sample, retained for months or years.
  • Easy to graph, aggregate, and alert on automatically.

Cons:

  • Pre-aggregated: the individual events behind a number are gone.
  • Label cardinality must be managed, or storage and queries suffer.

When to Use Which?

  1. Am I alive? → Metric (up == 1)
  2. Error rate spiking? → Metric (5xx_rate_per_min) alerts, then logs for stack-traces
  3. Need to audit one user’s journey? → Logs (plus distributed trace if you have it)
  4. Capacity planning? → Metrics (CPU, p99 latency, QPS)
  5. Post-mortem timeline? → Both (metrics for impact window, logs for root cause)

Rule of thumb: emit both, treat them differently.

Conclusion

Logs and metrics are signals, not rivals.

Store raw events as logs for source of truth and storytelling; publish rolled-up numbers as metrics for health checks and automation. Mixing them (“metrics-inside-logs” or “log-every-metric-tick”) usually hurts scale and clarity.

Happy shipping, and may your dashboards stay green!

