Discussion: Unified HTTP Server API for Labkit

Objective

Provide a unified, opinionated HTTP server abstraction in Labkit that encapsulates best practices for middleware composition, monitoring, and lifecycle management, reducing boilerplate and ensuring consistent observability across services.

Background

Labkit provides utilities for various HTTP server middleware tasks, such as adding metrics for HTTP handlers and tracing.

However, combining these components into a functioning service is left to the user of the various Labkit packages. For example, to add metrics to the root handler a user would do something like this:

withMetrics := metrics.NewHandlerFactory()

mux := http.NewServeMux()
mux.Handle("/", withMetrics(rootHandler))

To also use tracing, the user is responsible for composing the available Labkit components:

withMetrics := metrics.NewHandlerFactory()

mux := http.NewServeMux()
mux.Handle("/", withMetrics(tracing.Handler(rootHandler)))

This has several drawbacks:

  1. Inconsistent middleware ordering leads to incorrect behavior. The position of middleware in the stack critically affects functionality – for example, authentication must run before rate limiting, and retry logic placement determines whether retries are measured in metrics. Each service may wire these differently, creating subtle bugs.
  2. Error-prone lifecycle management causes production incidents. Managing two separate servers (main HTTP and monitoring) requires careful goroutine coordination and signal handling. Critical requirements like graceful shutdown on SIGTERM (necessary for automatic scaling and zero-downtime deployments) are complex to implement correctly and often done incorrectly or incompletely.
  3. Organization-wide rollouts are slow and risky. Adding new cross-cutting concerns (e.g., rate limiting, circuit breakers) requires modifying every service individually. This leads to inconsistent adoption, extended rollout timelines, and difficulty ensuring all services have critical security or reliability features.
  4. Troubleshooting is difficult due to inconsistent implementations. When services wire components differently, debugging becomes service-specific knowledge. For example, if retried requests don't appear in metrics in one service but do in another, operators waste time discovering each service's quirks rather than solving actual problems.
  5. No organizational defaults increase cognitive load. Without standard patterns, every service author must understand and correctly compose all available middleware components. This opt-in approach means new services often launch missing important observability or reliability features.
  6. Cluttered application code obscures business logic. Middleware wiring and server configuration dominate service initialization code, making it harder to understand the actual application logic and increasing the likelihood of wiring mistakes during refactoring.
  7. Configuration sprawl hinders standardization and compliance. Each service implements its own configuration approach for middleware settings (timeouts, bucket boundaries, sampling rates), using different mechanisms (flags, environment variables, hardcoded values). This makes it impossible to enforce organization-wide standards, audit configuration for compliance, or quickly adjust settings across the fleet in response to incidents.

Proposal

Add a new labkit.HTTPServer type that abstracts away the complexity and offers a single control plane for an HTTP server. The user experience of this type would be like this:

srv := labkit.NewHTTPServer(
  labkit.WithAddress(":8080"),
  labkit.WithMonitoringAddress(":8082"),
)

srv.Handle("/", rootHandler,
  // Override route identifier in metrics and traces.
  labkit.WithRouteIdentifier("main page"),
)

// Treat both a nil error and http.ErrServerClosed as an orderly shutdown.
err := srv.Start(ctx)
if err != nil && !errors.Is(err, http.ErrServerClosed) {
  log.WithError(err).Fatal("HTTP server failed")
}

This will:

  • Start the main HTTP server on port 8080, and the monitoring endpoint on port 8082.
  • Register a single handler for / with metrics, tracing, and all other middleware components that are considered part of our best practices (currently: metrics, tracing, correlation ID).
    • Standard handlers can be disabled, e.g. with labkit.WithMetrics(false)
    • Non-standard handlers can be enabled, e.g. with labkit.WithFutureComponent(true)
    • Middleware is configured via options, e.g. with labkit.WithRequestDurationBuckets(…)
  • Start the two servers, hiding the complexity of synchronizing goroutines, installing signal handlers for SIGTERM, etc.

Integration

One of my strategic goals is to keep Labkit and Runway aligned. To this end, the new API will always give precedence to explicit options (e.g. WithAddress(":8080")) over implicit configuration via environment variables (e.g. PORT). This allows Runway to set defaults, while leaving the final say to the developers.

Suggested environment variables:

Variable           Description
PORT               Used e.g. by Cloud Run to indicate on which port the application should expect HTTP requests.
RUNWAY_SERVICE_ID  Service name provided by Runway. Can be used in metrics, to populate the continuous profiling service name, etc.
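
The precedence rule described above (explicit option over environment variable) could be implemented along the following lines. This is a sketch under assumptions: the config type, the resolveAddress helper, and the ":8080" fallback are hypothetical and only illustrate the resolution order. The environment lookup is passed in as a function so the logic can be exercised without modifying the process environment.

```go
package main

import (
	"fmt"
	"os"
)

// config is a hypothetical server configuration.
type config struct {
	address string
}

// Option mutates a config (hypothetical, modelled on labkit.WithAddress).
type Option func(*config)

func WithAddress(addr string) Option {
	return func(c *config) { c.address = addr }
}

// resolveAddress determines the listen address with the following precedence:
// explicit option, then the PORT environment variable, then a fallback.
func resolveAddress(getenv func(string) string, opts ...Option) string {
	cfg := config{address: ":8080"} // assumed fallback for this sketch

	// Platform-provided default, e.g. set by Cloud Run or Runway.
	if port := getenv("PORT"); port != "" {
		cfg.address = ":" + port
	}

	// Explicit options are applied last, so the developer has the final say.
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg.address
}

func main() {
	fmt.Println(resolveAddress(os.Getenv))
	fmt.Println(resolveAddress(os.Getenv, WithAddress(":9090")))
}
```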

Future work

While this proposal focuses on HTTP since it is much more widely used, the same approach could be used for gRPC. A labkit.GRPCServer could serve as the "single pane of glass" for developers, and set up "interceptors" in a standard way behind the scenes.

Edited by Florian Forster