@marcsanmi

Adds benchmarks to measure and validate the performance of timeseries queries, particularly focusing on the exemplar collection overhead introduced in #4615.

Performance Results

Part 1: Refactoring Cost (NoExemplars vs weekly/f145)

We refactored profileEntryIterator to use a flexible options pattern. This section measures the cost of that refactoring even when exemplars are disabled.

Comparison Setup:

  • Baseline: weekly/f145 branch (old simple implementation, before exemplar PR)
  • Current: This branch with NoExemplars (new options pattern, exemplars disabled)

Commands Used:

# On weekly/f145 branch: copied the test file and removed the WithExemplars variant so the benchmarks work
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > old.txt

# On exemplar branch with NoExemplars
git checkout marcsanmi/exemplars-benchmarks
go test -bench=BenchmarkTimeSeriesQuery -benchmem -count=10 ./pkg/querybackend/ > new.txt

# Compare
benchstat old.txt new.txt

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
                                                     │   old.txt    │               new.txt               │
                                                     │    sec/op    │    sec/op     vs base               │
TimeSeriesQuery/NoExemplars-11                         745.1µ ± 14%   753.3µ ± 12%       ~ (p=0.739 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       614.3µ ±  5%   620.5µ ±  4%       ~ (p=0.481 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      639.6µ ± 21%   639.3µ ± 15%       ~ (p=0.853 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     653.4µ ± 52%   655.6µ ± 12%       ~ (p=0.971 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         629.3µ ± 54%   631.6µ ± 17%       ~ (p=0.631 n=10)
TimeSeriesQuery/WithExemplars-11                                      831.5µ ± 22%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    831.5µ ±  4%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   848.3µ ± 91%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  840.0µ ± 23%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      836.2µ ± 22%
geomean                                                654.8µ         742.6µ        +0.55%

                                                     │   old.txt    │               new.txt                │
                                                     │     B/op     │     B/op      vs base                │
TimeSeriesQuery/NoExemplars-11                         5.708Mi ± 0%   6.479Mi ± 0%  +13.50% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       5.724Mi ± 0%   6.484Mi ± 0%  +13.26% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      5.718Mi ± 0%   6.485Mi ± 1%  +13.41% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     5.719Mi ± 1%   6.485Mi ± 0%  +13.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         5.730Mi ± 1%   6.484Mi ± 0%  +13.17% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                      6.878Mi ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                    6.877Mi ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                   6.876Mi ± 1%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                  6.867Mi ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                      6.875Mi ± 0%
geomean                                                5.720Mi        6.676Mi       +13.35%

                                                     │   old.txt   │              new.txt               │
                                                     │  allocs/op  │  allocs/op   vs base               │
TimeSeriesQuery/NoExemplars-11                         11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      11.45k ± 0%   11.29k ± 0%  -1.38% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     11.45k ± 0%   11.29k ± 0%  -1.39% (p=0.000 n=10)
TimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         11.45k ± 0%   11.29k ± 0%  -1.40% (p=0.000 n=10)
TimeSeriesQuery/WithExemplars-11                                     16.67k ± 0%
TimeSeriesQuery_TimeRange/1Minute/WithExemplars-11                   16.67k ± 0%
TimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11                  16.67k ± 0%
TimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11                 16.67k ± 0%
TimeSeriesQuery_TimeRange/1Hour/WithExemplars-11                     16.67k ± 0%
geomean                                                11.45k        13.72k       -1.39%

Analysis:

  • ✅ Time: No statistically significant regression (p > 0.05, shown as ~)
  • ⚠️ Memory: +13.2% to +13.5% increase (about 0.77 MiB on a 5.7 MiB baseline)
  • ✅ Allocs: Slight improvement (-1.4%)

Potential memory trade-off explanation:

The refactored profileEntryIterator uses a flexible options pattern with:

  • Dynamic queryColumns slice (grows based on requested features)
  • Dynamic processor slice of closures
  • Column priority sorting logic
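As a rough illustration of where the extra bytes could come from, the sketch below shows a functional options pattern with a growable column slice, a slice of processor closures, and a priority sort. It is not the actual profileEntryIterator code: the type names, the option `WithProfileIDs`, and the column names `TimeNanos`, `SeriesIndex`, and `ID` are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// column is a hypothetical stand-in for a column the iterator reads.
type column struct {
	name     string
	priority int // lower values sort first
}

// row is a hypothetical decoded row keyed by column name.
type row map[string]string

// iteratorConfig collects what must be read and what runs per row.
type iteratorConfig struct {
	queryColumns []column
	processors   []func(row)
}

// Option mutates the config; each feature appends only what it needs.
type Option func(*iteratorConfig)

// WithProfileIDs is a hypothetical option standing in for the exemplar path.
// It requests one more column and registers a closure that consumes it.
func WithProfileIDs(sink *[]string) Option {
	return func(c *iteratorConfig) {
		c.queryColumns = append(c.queryColumns, column{name: "ID", priority: 2})
		c.processors = append(c.processors, func(r row) {
			*sink = append(*sink, r["ID"])
		})
	}
}

// newConfig starts from the default columns, applies the options,
// then sorts the columns by priority.
func newConfig(opts ...Option) *iteratorConfig {
	c := &iteratorConfig{
		queryColumns: []column{
			{name: "TimeNanos", priority: 1},
			{name: "SeriesIndex", priority: 0},
		},
	}
	for _, opt := range opts {
		opt(c)
	}
	sort.SliceStable(c.queryColumns, func(i, j int) bool {
		return c.queryColumns[i].priority < c.queryColumns[j].priority
	})
	return c
}

// columnNames returns the column names in read order.
func columnNames(c *iteratorConfig) []string {
	names := make([]string, 0, len(c.queryColumns))
	for _, col := range c.queryColumns {
		names = append(names, col.name)
	}
	return names
}

func main() {
	var ids []string
	base := newConfig()
	withIDs := newConfig(WithProfileIDs(&ids))
	for _, process := range withIDs.processors {
		process(row{"ID": "abc"})
	}
	fmt.Println(columnNames(base), columnNames(withIDs), ids)
}
```

Even with no options passed, this shape allocates the config and column slice on the heap and sorts the columns, which is the kind of fixed cost the NoExemplars numbers may be reflecting.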

Part 2: Exemplar Feature Overhead (NoExemplars vs WithExemplars)

This section measures the additional cost of enabling exemplars on top of the refactored baseline.

Command Used:

# Run all timeseries benchmarks (base + time range variants)
go test -bench=BenchmarkTimeSeriesQuery -benchmem ./pkg/querybackend/

goos: darwin
goarch: arm64
pkg: github.com/grafana/pyroscope/pkg/querybackend
cpu: Apple M3 Pro
BenchmarkTimeSeriesQuery/NoExemplars-11                         1870    607338 ns/op    6811441 B/op    11289 allocs/op
BenchmarkTimeSeriesQuery/WithExemplars-11                       1441    855327 ns/op    7220978 B/op    16670 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/NoExemplars-11       1789    657579 ns/op    6808980 B/op    11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Minute/WithExemplars-11     1422    842359 ns/op    7206427 B/op    16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/NoExemplars-11      1874    728544 ns/op    6800535 B/op    11289 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/5Minutes/WithExemplars-11    1383    823404 ns/op    7201524 B/op    16669 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/NoExemplars-11     1940    624539 ns/op    6802911 B/op    11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/15Minutes/WithExemplars-11   1492    930987 ns/op    7199970 B/op    16668 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/NoExemplars-11         1762    626126 ns/op    6807519 B/op    11290 allocs/op
BenchmarkTimeSeriesQuery_TimeRange/1Hour/WithExemplars-11       1461    825313 ns/op    7212226 B/op    16669 allocs/op
PASS
ok      github.com/grafana/pyroscope/pkg/querybackend   14.548s

Analysis:

  • ⚠️ Time: +40.8% overhead on the base benchmark (607µs → 855µs); +13% to +49% across the time ranges in this single run (no -count)
  • ✅ Memory: +6.0% increase (~410KB on 6.5MB queries)
  • ⚠️ Allocs: +47.7% increase (11,289 → 16,670 allocations)

Overhead explanation:

When exemplars are enabled, additional data must be fetched:

  • Profile IDs: UUID column read + conversion
  • All labels: Complete label set instead of just groupBy subset
  • Additional processing: Profile ID matching and exemplar construction

The allocation increase comes primarily from fetching full label sets and processing profile IDs for each matching profile.
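The extra per-profile work described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `label` and `exemplar` types and the helper names are hypothetical, and the UUID is formatted by hand with `encoding/hex` purely to show the conversion step.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// label is a hypothetical name/value pair.
type label struct{ name, value string }

// exemplar is a hypothetical result carrying a profile ID and full labels.
type exemplar struct {
	profileID string
	labels    []label
}

// groupByLabels models the default path: keep only the groupBy subset.
func groupByLabels(all []label, groupBy map[string]bool) []label {
	out := make([]label, 0, len(groupBy))
	for _, l := range all {
		if groupBy[l.name] {
			out = append(out, l)
		}
	}
	return out
}

// formatUUID renders 16 raw bytes in the canonical 8-4-4-4-12 hex form.
func formatUUID(id [16]byte) string {
	var buf [36]byte
	hex.Encode(buf[0:8], id[0:4])
	buf[8] = '-'
	hex.Encode(buf[9:13], id[4:6])
	buf[13] = '-'
	hex.Encode(buf[14:18], id[6:8])
	buf[18] = '-'
	hex.Encode(buf[19:23], id[8:10])
	buf[23] = '-'
	hex.Encode(buf[24:36], id[10:16])
	return string(buf[:])
}

// buildExemplar models the exemplar path: one string allocation for the
// profile ID plus a copy of the complete label set for every matching profile.
func buildExemplar(id [16]byte, all []label) exemplar {
	copied := make([]label, len(all))
	copy(copied, all)
	return exemplar{profileID: formatUUID(id), labels: copied}
}

func main() {
	all := []label{{"service_name", "api"}, {"pod", "api-0"}, {"region", "eu"}}
	id := [16]byte{0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
		0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef}

	subset := groupByLabels(all, map[string]bool{"service_name": true})
	ex := buildExemplar(id, all)
	fmt.Println(len(subset), ex.profileID, len(ex.labels))
}
```

In this model the exemplar path performs at least two extra allocations per matching profile (the ID string and the label copy), which is consistent in direction with the allocation increase measured above, though the real per-profile cost depends on the actual implementation.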

Overhead across time ranges:

Memory overhead stays at about +6% for every range, while time overhead varies from +13% to +49% in this single run:

  • 1 Minute: +28.1% time, +6% memory
  • 5 Minutes: +13.0% time, +6% memory
  • 15 Minutes: +49.1% time, +6% memory
  • 1 Hour: +31.8% time, +6% memory
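The per-range time percentages can be recomputed from the ns/op column of the run above. The snippet below is only a small arithmetic helper for checking them; the `pair` type and `overheadPct` function are names introduced here, and the numbers are copied from the benchmark output.

```go
package main

import "fmt"

// pair holds ns/op for one benchmark with and without exemplars.
type pair struct {
	name          string
	noExemplars   float64
	withExemplars float64
}

// overheadPct returns the relative increase of with over base, in percent.
func overheadPct(base, with float64) float64 {
	return (with - base) / base * 100
}

func main() {
	// ns/op values copied from the single benchmark run above.
	runs := []pair{
		{"base", 607338, 855327},
		{"1Minute", 657579, 842359},
		{"5Minutes", 728544, 823404},
		{"15Minutes", 624539, 930987},
		{"1Hour", 626126, 825313},
	}
	for _, r := range runs {
		fmt.Printf("%-10s %+.1f%%\n", r.name, overheadPct(r.noExemplars, r.withExemplars))
	}
}
```

Because these figures come from one run each, the spread between ranges is probably dominated by noise; rerunning with -count=10 and comparing through benchstat would give confidence intervals for the same ratios.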

Summary

Changes to default path (NoExemplars)

  • Time: No regression (p > 0.05 across all benchmarks)
  • ⚠️ Memory: about +13.5% (~0.77 MiB), likely due to the options pattern refactoring
  • Allocs: -1.4% (slight improvement)

Exemplar overhead

  • ⚠️ Time: roughly +30-40% overhead in most benchmarks (+13% to +49% across single runs)
  • Memory: +6% (~410KB)
  • ⚠️ Allocs: +48% (fetching additional data)

Key findings

  • ✅ No time regression for users who don't enable exemplars
  • ✅ Memory overhead is minimal when exemplars are enabled (+6%)
  • ✅ No performance cliffs across time ranges (memory and allocation overhead are flat; time overhead varies between runs)
  • ⚠️ Allocation overhead (+48%): exemplars fetch profile IDs and complete label sets (room for improvement)
  • ⚠️ Time overhead (roughly 30-40% in most benchmarks): room for improvement

Note: The exemplars feature will be opt-in. Users who don't request exemplars are unaffected by the time and allocation overhead. The overhead is inherent to fetching additional data (profile IDs and complete label sets), but future optimizations could reduce the impact if needed.

@marcsanmi marcsanmi requested a review from aleks-p as a code owner December 4, 2025 16:18
@marcsanmi marcsanmi requested a review from a team December 4, 2025 16:18