perf: Add benchmarks for timeseries query (exemplars) performance #4665

marcsanmi · 2025-12-04T16:18:44Z

Adds benchmarks to measure and validate the performance of timeseries queries, particularly focusing on the exemplar collection overhead introduced in #4615.

Performance Results

Part 1: Refactoring Cost (NoExemplars vs weekly/f145)

We refactored profileEntryIterator to use a flexible options pattern. This section measures the cost of that refactoring even when exemplars are disabled.

Comparison Setup:

Baseline: weekly/f145 branch (old simple implementation, before exemplar PR)
Current: This branch with NoExemplars (new options pattern, exemplars disabled)

Commands Used:

# On weekly/f145 branch: copied the test file and removed the WithExemplars variant so that the benchmarks work
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > old.txt

# On exemplar branch with NoExemplars
git checkout marcsanmi/exemplars-benchmarks
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > new.txt

# Compare
benchstat old.txt new.txt

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
                                                     │   old.txt    │               new.txt               │
                                                     │    sec/op    │    sec/op     vs base               │
TimeSeriesQuery/NoExemplars-11                         745.1µ ± 14%   753.3µ ± 12%       ~ (p=0.739 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       614.3µ ±  5%   620.5µ ±  4%       ~ (p=0.481 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      639.6µ ± 21%   639.3µ ± 15%       ~ (p=0.853 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     653.4µ ± 52%   655.6µ ± 12%       ~ (p=0.971 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         629.3µ ± 54%   631.6µ ± 17%       ~ (p=0.631 n=10)
TimeSeriesQuery/WithExemplars-11                                      831.5µ ± 22%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    831.5µ ±  4%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   848.3µ ± 91%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  840.0µ ± 23%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      836.2µ ± 22%
geomean                                                654.8µ         742.6µ        +0.55%

                                                     │   old.txt    │               new.txt                │
                                                     │     B/op     │     B/op      vs base                │
TimeSeriesQuery/NoExemplars-11                         5.708Mi ± 0%   6.479Mi ± 0%  +13.50% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       5.724Mi ± 0%   6.484Mi ± 0%  +13.26% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      5.718Mi ± 0%   6.485Mi ± 1%  +13.41% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     5.719Mi ± 1%   6.485Mi ± 0%  +13.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         5.730Mi ± 1%   6.484Mi ± 0%  +13.17% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                      6.878Mi ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    6.877Mi ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   6.876Mi ± 1%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  6.867Mi ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      6.875Mi ± 0%
geomean                                                5.720Mi        6.676Mi       +13.35%

                                                     │   old.txt   │              new.txt               │
                                                     │  allocs/op  │  allocs/op   vs base               │
TimeSeriesQuery/NoExemplars-11                         11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         11.45k ± 0%   11.29k ± 0%  -1.40% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                     16.67k ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                   16.67k ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                  16.67k ± 0%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                 16.67k ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                     16.67k ± 0%
geomean                                                11.45k        13.72k       -1.39%

Analysis:

✅ Time: No statistically significant regression (p > 0.05, shown as ~)
⚠️ Memory: +13.5% increase (~0.77 MiB per op, 5.71 MiB → 6.48 MiB)
✅ Allocs: Slight improvement (-1.4%)
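
A note on reading the geomean rows: the old.txt and new.txt geomean values are not directly comparable, because the new.txt column also averages the five WithExemplars rows that have no counterpart in old.txt. The delta next to it appears to be computed only over the rows present in both files. The sketch below checks that reading for sec/op, assuming the delta is the geometric mean of the per-benchmark ratios; `geomeanRatio` is an illustrative helper, not part of benchstat or this PR.

```go
package main

import (
	"fmt"
	"math"
)

// geomeanRatio returns the geometric mean of cur[i]/base[i].
// Both slices must have the same length and hold positive values.
func geomeanRatio(base, cur []float64) float64 {
	sumLog := 0.0
	for i := range base {
		sumLog += math.Log(cur[i] / base[i])
	}
	return math.Exp(sumLog / float64(len(base)))
}

func main() {
	// sec/op medians in microseconds for the five NoExemplars rows,
	// copied from the benchstat table above.
	base := []float64{745.1, 614.3, 639.6, 653.4, 629.3}
	cur := []float64{753.3, 620.5, 639.3, 655.6, 631.6}

	delta := (geomeanRatio(base, cur) - 1) * 100
	fmt.Printf("geomean delta over shared rows: %+.2f%%\n", delta)
}
```

Under that assumption the result comes out at roughly +0.55%, in line with the geomean row of the sec/op table.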

Potential memory trade-off explanation:

The refactored profileEntryIterator uses a flexible options pattern with:

Dynamic queryColumns slice (grows based on requested features)
Dynamic processor slice of closures
Column priority sorting logic
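
For readers unfamiliar with the pattern, the three elements listed above can be sketched as follows. This is a minimal illustration of a functional-options iterator configuration, not the actual profileEntryIterator code: every type, function, and column name here (`iteratorConfig`, `withExemplars`, `SeriesIndex`, `TimeNanos`, `ID`) is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// queryColumn is a hypothetical description of a column to read.
type queryColumn struct {
	name     string
	priority int // lower values sort first
}

// row stands in for one decoded profile entry.
type row map[string]string

// processor is a closure run for every row.
type processor func(r row)

type iteratorConfig struct {
	queryColumns []queryColumn // grows with each requested feature
	processors   []processor   // grows with each requested feature
}

type iteratorOption func(*iteratorConfig)

// withExemplars is a hypothetical option: it requests one extra column
// and registers a closure that collects its value.
func withExemplars(collected *[]string) iteratorOption {
	return func(c *iteratorConfig) {
		c.queryColumns = append(c.queryColumns, queryColumn{name: "ID", priority: 0})
		c.processors = append(c.processors, func(r row) {
			*collected = append(*collected, r["ID"])
		})
	}
}

func newIteratorConfig(opts ...iteratorOption) *iteratorConfig {
	c := &iteratorConfig{
		// Columns that are always read (placeholder names).
		queryColumns: []queryColumn{
			{name: "SeriesIndex", priority: 1},
			{name: "TimeNanos", priority: 2},
		},
	}
	for _, opt := range opts {
		opt(c)
	}
	// Column priority sorting: order the final column list by priority.
	sort.SliceStable(c.queryColumns, func(i, j int) bool {
		return c.queryColumns[i].priority < c.queryColumns[j].priority
	})
	return c
}

func main() {
	var ids []string
	plain := newIteratorConfig()
	withEx := newIteratorConfig(withExemplars(&ids))

	fmt.Println(len(plain.queryColumns), len(plain.processors))
	fmt.Println(len(withEx.queryColumns), len(withEx.processors))

	for _, p := range withEx.processors {
		p(row{"ID": "abc"})
	}
	fmt.Println(ids)
}
```

The sketch only shows the structure. It does not by itself explain where the extra bytes per operation come from; that would need a memory profile of the real iterator.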

Part 2: Exemplar Feature Overhead (NoExemplars vs WithExemplars)

This section measures the additional cost of enabling exemplars on top of the refactored baseline.

Command Used:

# Run all timeseries benchmarks (base + time range variants)
go test -bench=BenchmarkTimeSeriesQuery -benchmem ./pkg/querybackend/

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
BenchmarkTimeSeriesQuery/NoExemplars-11                          1870   607338 ns/op   6811441 B/op   11289 allocs/op
BenchmarkTimeSeriesQuery/WithExemplars-11                        1441   855327 ns/op   7220978 B/op   16670 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/NoExemplars-11        1789   657579 ns/op   6808980 B/op   11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/WithExemplars-11      1422   842359 ns/op   7206427 B/op   16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11       1874   728544 ns/op   6800535 B/op   11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11     1383   823404 ns/op   7201524 B/op   16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11      1940   624539 ns/op   6802911 B/op   11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11    1492   930987 ns/op   7199970 B/op   16668 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/NoExemplars-11          1762   626126 ns/op   6807519 B/op   11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/WithExemplars-11        1461   825313 ns/op   7212226 B/op   16669 allocs/op
PASS
ok      github.com/grafana/pyroscope/pkg/querybackend   14.548s

Analysis:

⚠️ Time: +40.8% overhead on the base benchmark (607µs → 855µs); +13% to +49% across the time-range variants
✅ Memory: +6.0% increase (~410KB on 6.5MB queries)
⚠️ Allocs: +47.7% increase (11,289 → 16,670 allocations)

Overhead explanation:

When exemplars are enabled, additional data must be fetched:

Profile IDs: UUID column read + conversion
All labels: Complete label set instead of just groupBy subset
Additional processing: Profile ID matching and exemplar construction

The allocation increase comes primarily from fetching full label sets and processing profile IDs for each matching profile.
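
To illustrate why keeping complete label sets tends to cost more allocations than keeping a single groupBy value, here is a small self-contained sketch. It uses synthetic data and hypothetical helpers (`groupByValues`, `fullLabelSets`, `syntheticProfiles`); it does not reproduce the query backend's actual code path, only the general effect of one copy per matching profile.

```go
package main

import (
	"fmt"
	"testing"
)

type label struct{ name, value string }

// Package-level sinks keep results alive so the work is not optimized away.
var (
	sinkValues []string
	sinkLabels [][]label
)

// groupByValues keeps only the value of one label per profile.
func groupByValues(profiles [][]label, groupBy string) []string {
	out := make([]string, 0, len(profiles))
	for _, ls := range profiles {
		for _, l := range ls {
			if l.name == groupBy {
				out = append(out, l.value)
				break
			}
		}
	}
	return out
}

// fullLabelSets copies the complete label set of every profile.
func fullLabelSets(profiles [][]label) [][]label {
	out := make([][]label, 0, len(profiles))
	for _, ls := range profiles {
		cp := make([]label, len(ls))
		copy(cp, ls)
		out = append(out, cp)
	}
	return out
}

// syntheticProfiles builds n profiles with three labels each.
func syntheticProfiles(n int) [][]label {
	profiles := make([][]label, n)
	for i := range profiles {
		profiles[i] = []label{
			{"service_name", fmt.Sprintf("svc-%d", i%4)},
			{"pod", fmt.Sprintf("pod-%d", i)},
			{"region", "eu"},
		}
	}
	return profiles
}

// measure reports average allocations per call for both strategies.
func measure(n int) (subset, full float64) {
	profiles := syntheticProfiles(n)
	subset = testing.AllocsPerRun(100, func() {
		sinkValues = groupByValues(profiles, "service_name")
	})
	full = testing.AllocsPerRun(100, func() {
		sinkLabels = fullLabelSets(profiles)
	})
	return subset, full
}

func main() {
	subset, full := measure(50)
	fmt.Printf("groupBy subset: %.0f allocs, full label sets: %.0f allocs\n", subset, full)
}
```

In this toy version the full-copy variant allocates once per profile on top of the output slice, which is the same shape of growth as the allocs/op increase reported above.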

Overhead consistency across time ranges:

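Memory overhead is consistent at about +6%. Time overhead varies between +13% and +49%; Part 2 is a single run (no -count), so the per-range time figures likely include run-to-run noise:

1 Minute: +28.1% time, +6% memory
5 Minutes: +13.0% time, +6% memory
15 Minutes: +49.1% time, +6% memory
1 Hour: +31.8% time, +6% memory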
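
These percentages follow directly from the ns/op and B/op columns of the Part 2 output. A minimal sketch that recomputes them from the raw numbers; the `result` struct and `overheadPct` helper are illustrative names, not part of the PR.

```go
package main

import "fmt"

// result holds one NoExemplars/WithExemplars pair from the Part 2 output.
type result struct {
	name                     string
	baseNs, exemplarNs       float64
	baseBytes, exemplarBytes float64
}

// overheadPct returns the relative increase of with over base, in percent.
func overheadPct(base, with float64) float64 {
	return (with - base) / base * 100
}

func main() {
	// Values copied from the go test output above.
	results := []result{
		{"base", 607338, 855327, 6811441, 7220978},
		{"1Minute", 657579, 842359, 6808980, 7206427},
		{"5Minutes", 728544, 823404, 6800535, 7201524},
		{"15Minutes", 624539, 930987, 6802911, 7199970},
		{"1Hour", 626126, 825313, 6807519, 7212226},
	}
	for _, r := range results {
		fmt.Printf("%-10s time %+.1f%%  memory %+.1f%%\n",
			r.name,
			overheadPct(r.baseNs, r.exemplarNs),
			overheadPct(r.baseBytes, r.exemplarBytes))
	}
}
```

Because each pair is a single measurement, rerunning with -count and comparing through benchstat would give a more reliable per-range picture.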

Summary

Changes to default path (NoExemplars)

✅ Time: No regression (p > 0.05 across all benchmarks)
⚠️ Memory: +13.5% (~0.77 MiB per op) due to options pattern refactoring
✅ Allocs: -1.4% (slight improvement)

Exemplar overhead

⚠️ Time: +41% on the base benchmark (+13% to +49% across time ranges)
✅ Memory: +6% (~410KB)
⚠️ Allocs: +48% (fetching additional data)

Key findings

✅ No time regression for users who don't enable exemplars
✅ Memory overhead is minimal when exemplars are enabled (+6%)
✅ No performance cliffs across time ranges (WithExemplars stays at roughly 820-930µs from 1 minute to 1 hour in the Part 2 run)
⚠️ Allocation overhead (+48%): exemplars fetch profile IDs and complete label sets (room for improvement)
⚠️ Time overhead (+41% on the base benchmark, +13% to +49% across time ranges) (room for improvement)

Note: The exemplars feature will be opt-in. Users who don't request exemplars are unaffected by the time and allocation overhead. The overhead is inherent to fetching additional data (profile IDs and complete label sets), but future optimizations could reduce the impact if needed.

perf: Add benchmarks for timeseries query (exemplars) performance

68b2dc3

marcsanmi requested a review from aleks-p as a code owner December 4, 2025 16:18

marcsanmi requested a review from a team December 4, 2025 16:18
