We're developing Odigos, an open-source project for effortless distributed tracing. See more at https://github.com/keyval-dev/odigos.

Distributed tracing tracks the journey of requests as they move through a distributed system, offering insights for debugging, performance optimization, and end-to-end visibility. It is crucial for gaining in-depth insights into request flows within complex distributed systems.

However, there are two ways in which it can suck: extensive code changes (requires manual instrumentation), and a significant performance impact.

Odigos addresses the code change challenge by using eBPF for automating tracing without any manual effort or code changes. We generate traces in OpenTelemetry format, ensuring compatibility and avoiding vendor lock-in. You can read more here

That leaves the performance issues, which we’ve been working on. Our performance tests, which we’ll go into below, show that eBPF-based automatic instrumentation is over 20x faster than manually instrumenting code. This is achieved by separating data recording and data processing, eliminating extra workload, object allocation, and network calls during application runtime.

As a result, you can have the best of both worlds: automated distributed tracing with minimal performance overhead.

How we tested

Benchmarking latency is not a trivial task. Latency can be measured in many different ways, each with its own advantages and disadvantages. Our testing is inspired by this excellent talk by Gil Tene called How NOT to Measure Latency. We are using a High Dynamic Range Histogram to visualize the results and wrk2 to generate load.

In order to reduce noise as much as possible, we are running each test on a dedicated AWS bare metal instance (c5n.metal) with Intel Xeon Platinum 8000 series (Skylake-SP) processor.

For the target application, we are using a simple Go HTTP server written in Go 1.21 that returns a simple JSON response.

Test results

Each test is run for 5 minutes and generates 10,000 requests per second. First, we ran the test without any instrumentation. Then we ran the test with manual instrumentation using OpenTelemetry Go SDK and finally, we ran the test with eBPF-based automatic instrumentation.

Test Results

At the lower percentiles (up to the 99.9th percentile), the overhead of not having instrumentation, manual instrumentation, and eBPF-based automatic instrumentation is similar

However, at the higher percentiles (99.9th percentile and above), manual instrumentation has a significantly higher overhead than eBPF-based automatic instrumentation, which is over 20x faster.

If you're wondering whether the top of the spectrum matters, the answer is yes, especially in distributed environments. The following table shows how many clients will experience the 99th percentile latency according to the number of different services involved in the request (taken from Gil's talk)

p99 is a lie

For maximum precision and isolation, we are generating traces containing a single span. In a production environment, we anticipate the performance difference to be even greater.

The performance impact of generating distributed traces

Let's see what happens inside our application when we generate distributed traces, either manually via SDKs or automatically via something like a Java agent or monkey patching:

  1. Recording data - Spans objects that contain the relevant data are being created
  2. Maintaining a queue of data - Spans are being batched in a queue before being sent to the external system
  3. Delivering data to the external system - The application sends the data in the queue to the external system by making network calls, serializing the data, and sending it over the network.

Another consideration to keep in mind is that the application runtime must now manage more objects, which can negatively impact heap size and GC performance, especially in languages with stop-the-world GC like Java. All this can lead to longer GC pauses and decreased performance. We'll dive deeper and explore this topic in a separate blog post. Stay tuned.

Unlike metrics and logs, distributed tracing is a stateful signal. Logs are typically written to a file, and metrics are typically pulled from an HTTP endpoint by the monitoring system (for example /metrics endpoint when exposing Prometheus metrics). Distributed tracing is different. It requires the application to actively send batched data to an external system.

Separation between recording and processing

When using eBPF to generate distributed traces, we are separating the recording of the data from the processing of the data. The recording of the data is done by the eBPF program and the processing and delivery of the data is done by a different process. This means that the application runtime does no additional work, creates no additional objects, and makes no additional network calls. The only additional overhead (compared to not having any instrumentation) is the overhead of invoking eBPF programs (context switching from user space to kernel space).

Conclusion

Traditionally, there has been a tradeoff between automated distributed tracing and performance.

Odigos solves this problem by providing a way to automate distributed tracing that actually improves performance. This is because we use eBPF-based automatic instrumentation to reduce the overhead of generating distributed tracing data.

As a result, you can now have the best of both worlds: automated distributed tracing without any performance overhead.

More updates coming soon

eBPF-based automatic instrumentation is a game-changer, enabling us to generate distributed traces without code changes or performance impact. We're just getting started and will bring this to more programming languages soon. Stay tuned!

Try it out

The easiest way to try eBPF-based distributed tracing today is to use our open-source project, Odigos. Please support us by starring ⭐ the project on GitHub.

logo
LEARN MORE
Related articles