Skip to content

Kickoff

references

  • https://chat.deepseek.com/a/chat/s/60acbcc7-ec7d-4980-8b85-5fa9d2a82f9f forwards logs java1
  • https://chat.deepseek.com/a/chat/s/577558f7-a977-4e8b-b64d-ac8647bcb825 forwards logs java2
  • in progress

what are common things in microservices, developer instruments

Here are the common elements developers instrument in microservices for monitoring with tools like Prometheus:

1. Application Metrics
   HTTP Requests: Latency, throughput, error rates (status codes 4xx/5xx)

API Endpoints: Performance per endpoint (GET/POST/PUT/DELETE)

Custom Business Metrics: Transactions processed, orders placed, etc.

2. System/Infrastructure Metrics
   CPU Usage: Process and system-level CPU consumption

Memory Usage: Heap, non-heap, buffer pools (JVM), RSS (native)

Disk I/O: Read/write operations, latency

Network: Bandwidth, connections, errors

File Descriptors: Open files/sockets

3. JVM Metrics (For Java Services)
   Garbage Collection: GC cycles, pause times, memory reclaimed

Threads: Active, daemon, deadlocked threads

Class Loading: Loaded/unloaded classes

JVM Uptime: Time since last restart

4. Database Metrics
   Connection Pools: Active/idle connections, wait time

Query Performance: Slow queries, latency percentiles

Transaction Rates: Commits, rollbacks, deadlocks

5. Cache Metrics
   Hit/Miss Ratios: Cache effectiveness

Evictions: Items evicted due to size limits

Latency: Read/write times for cache operations

6. Message Queue/Event Metrics
   Queue Length: Messages waiting to be processed

Processing Time: Consumer lag, throughput

Errors: Failed deliveries, retries

7. External Service Dependencies
   HTTP Clients: Latency, errors, retries (outbound calls)

Circuit Breakers: State (open/closed/half-open), failure rates

Retries: Attempt counts, exponential backoff stats

8. Kubernetes/Container Metrics (If Applicable)
   Pod Resource Usage: CPU/memory requests vs. usage

Restarts: Crash loop detection

Liveness/Readiness: Health check statuses

9. Logging & Errors
   Error Rates: Exceptions, log error patterns

Warning Signals: Unusual but non-critical events

10. Synthetic Metrics (Proactive Monitoring)
    Heartbeats: Service alive checks

Scheduled Task Metrics: Cron job durations, success/failure

11. Distributed Tracing Metrics
    Request Flow: Latency across service boundaries

Dependency Map: Service-to-service call patterns

12. Security Metrics
    Authentication Attempts: Success/failure rates

Rate Limiting: Throttled requests

Key Non-Metric Considerations
Labels/Dimensions: Environment, service name, version, region

Cardinality Management: Avoid high-cardinality labels

Sampling: For high-volume metrics

Alerting Rules: Define meaningful thresholds