Observer Effect
The first time someone proposed adding Sentry to our server-side stack, I thought they were out of their mind.
Don't get me wrong. Sentry isn't the problem. Error tracking is necessary, and it's genuinely useful. What unnerved me was the tone. "Let's just add it. It's easy. Drop in the SDK." That casualness felt dangerous.
From inside the application, every exception fires an event over the network to an external service. In normal operation, that's fine. Exceptions are rare. The problem is when things go wrong.
An outage produces more exceptions. More exceptions produce more telemetry events. The SDK sends them over TCP. Connections pile up. TIME_WAIT accumulates. Retransmissions and congestion control amplify the delay. Backpressure propagates into the application. Threads starve. The service itself degrades. A primary failure triggers a secondary disaster. The observation breaks the system.
Physics has a name for this: the observer effect. The act of measuring a system alters its state. Software monitoring has the same structure. Harmless during normal operation, it changes the system's behavior during failure. For the worse.
An observability system is not a steady-state utility. It's a system that generates the most traffic exactly when you can least afford it. A fire alarm that starts fires.
The root problem is that telemetry sits on the request hot path. In the frontend world, telemetry goes out via beacons or fire-and-forget. If it fails, you drop it. UX comes first. On the server side, people await the send, use synchronous HTTP, and retry on failure. The exact design that frontend engineers know to avoid, backend engineers do without flinching.
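Concretely, the anti-pattern looks something like this. It's a sketch with made-up names and endpoints, not any real SDK's code; the point is that everything in reportError runs on the request's own goroutine, and the retry loop works hardest exactly when the telemetry backend is slowest:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

// reportError sends one event per exception, synchronously, with retries.
// Every property of this function is wrong for the hot path: the caller
// blocks for the full round trip, and the backoff sleeps happen on the
// request goroutine itself.
func reportError(err error) {
	payload := []byte(fmt.Sprintf(`{"message":%q}`, err.Error()))
	for attempt := 0; attempt < 3; attempt++ {
		resp, postErr := http.Post("https://telemetry.example.com/events",
			"application/json", bytes.NewReader(payload))
		if postErr == nil {
			resp.Body.Close()
			return
		}
		time.Sleep(time.Second << attempt) // backoff, still on the caller's goroutine
	}
}

func handle(w http.ResponseWriter, r *http.Request) {
	if err := doWork(r); err != nil {
		reportError(err) // the request now waits on the telemetry service
		http.Error(w, "internal error", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// doWork is a stand-in. Imagine it failing for every request during an outage.
func doWork(r *http.Request) error { return fmt.Errorf("boom") }

func main() {
	http.HandleFunc("/", handle)
	http.ListenAndServe(":8080", nil)
}
```

During an outage, every request runs the error path, every error path holds a goroutine for up to seven seconds of backoff, and the connection buildup from the earlier paragraph follows directly.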
Observability should be best-effort. Availability outranks observability. Under pressure, you reduce data, not increase it. Telemetry should flow as a stream, not an RPC. Above all, it must be isolated from the application. Dynamic sampling, rate limiting, circuit breakers, local ring buffers — these aren't nice-to-haves. They're requirements.
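"Reduce data under pressure" can be as simple as a token bucket in front of the send path. Here's a hand-rolled sketch; the mechanism is generic, and none of the names are tied to any particular SDK:

```go
package telemetry

import (
	"sync"
	"time"
)

// Limiter is a token bucket. Events over budget are dropped, never queued;
// queuing would just move the flood somewhere else.
type Limiter struct {
	mu     sync.Mutex
	tokens float64
	burst  float64
	rate   float64 // tokens refilled per second
	last   time.Time
}

func NewLimiter(perSecond, burst float64) *Limiter {
	return &Limiter{tokens: burst, burst: burst, rate: perSecond, last: time.Now()}
}

// Allow reports whether one more event fits in the budget. During an
// exception storm it returns false almost every time, which is the point:
// the storm shows up as a drop counter, not as a flood of events.
func (l *Limiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	l.tokens += now.Sub(l.last).Seconds() * l.rate
	if l.tokens > l.burst {
		l.tokens = l.burst
	}
	l.last = now
	if l.tokens < 1 {
		return false
	}
	l.tokens--
	return true
}
```

Something like NewLimiter(10, 50) caps telemetry at ten events per second with a burst of fifty: enough to see every distinct failure, nowhere near enough to melt a connection pool.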
The application writes only to an async, lock-free in-process buffer. A separate local agent picks events up, forwards them to a lightweight ingest layer designed to shed load, and feeds them into a stream. Not "collect everything, then analyze." Drop along the way, keep only what matters. Logs are disposable. The service is not.
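In Go terms, the shape is something like the sketch below. A buffered channel stands in for the lock-free ring buffer (channels take a lock internally, but the contract is the same), and every name here is hypothetical. What matters is that Send can never block, and Forward is the only code that touches the network:

```go
package telemetry

import (
	"net"
	"sync/atomic"
)

// Buffer decouples the application from the local agent. Send never blocks:
// when the buffer is full, the event is dropped and counted.
type Buffer struct {
	events  chan []byte
	dropped atomic.Uint64
}

func NewBuffer(size int) *Buffer {
	return &Buffer{events: make(chan []byte, size)}
}

// Send is the only call the application ever makes. It costs one channel
// operation and cannot stall a request, no matter what the network is doing.
func (b *Buffer) Send(event []byte) {
	select {
	case b.events <- event:
	default:
		b.dropped.Add(1) // buffer full: the service matters more than the log
	}
}

// Forward runs in its own goroutine and ships events to a local agent over
// a Unix socket. Dialing per event keeps the sketch short; a real client
// would hold the connection open. Errors are ignored on purpose: if the
// agent is down, events drain to nowhere and the app never notices.
func (b *Buffer) Forward(socketPath string) {
	for event := range b.events {
		conn, err := net.Dial("unix", socketPath)
		if err != nil {
			continue
		}
		conn.Write(append(event, '\n'))
		conn.Close()
	}
}
```

Note what's absent: no retries, no awaiting, no queue spilling to disk. The failure mode is losing logs, and that's the failure mode you chose on purpose.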
I'm not saying don't use Sentry. I remember what it was like before tools like that existed. But the ease of a one-line SDK lets teams skip the design decisions. Where to send. How much to send. What happens during an outage. Jumping straight to "just add it" without asking those questions — that's what scared me.
A safety mechanism, poorly designed, becomes a failure amplifier. The priority isn't precision. It's survival.
In the end, a veteran SRE on the project spoke up. "Connection buildup and congestion under failure — that concerns me." Sentry was scoped to the frontend only. The backend went with a local agent architecture. It was settled before I had to say a word.
That was five or six years ago. Today's Sentry SDK ships with sampling, rate limiting, and backoff built in. Most of what I worried about back then has been addressed on the SDK side.
Maybe I was just being a worrywart. But if being a worrywart is the worst that happens, I'll take it.