@suchapalaver
Collaborator

This PR improves Prometheus metrics handling in tap-agent with two fixes:

  • Eagerly initialize metrics at startup: Metrics defined with LazyLock are now initialized before the metrics server starts, ensuring /metrics returns all tap-agent metrics even when no senders have escrow accounts or pending allocations
  • Clean up gauge metrics when actors stop: Sender and allocation-level gauges are now properly removed when actors stop, preventing stale values from accumulating in Prometheus

Joseph Livesey joseph@semiotic.ai

   Prometheus metrics in tap-agent are defined using LazyLock, which means they are only registered with the Prometheus registry when first accessed. If no senders have escrow accounts or pending allocations at startup, no SenderAccount actors are spawned, the metrics are never accessed, and /metrics returns empty.

   This fix creates init_metrics() functions that force all LazyLock metric statics to initialize at startup, ensuring they are registered with Prometheus before any sender activity occurs.

   Changes:
   - Make all metric static definitions pub(crate) in sender_account.rs, sender_allocation.rs, and sender_accounts_manager.rs
   - Add init_metrics() function to each module that dereferences each LazyLock to force initialization
   - Add public init_metrics() to agent.rs that calls all module-level init_metrics functions
   - Call agent::init_metrics() in main.rs before spawning the metrics server
   - Add test to verify all metrics are properly registered
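
The eager-initialization pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `REGISTRY`, `register`, and `registered_names` are hypothetical stand-ins for the Prometheus registry, the metric name strings are made up, and `LazyLock::force` is used here in place of the dereference the commit message mentions (both trigger initialization).

```rust
use std::sync::{LazyLock, Mutex};

// Hypothetical stand-in for a Prometheus registry: it only records metric names.
static REGISTRY: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

// Hypothetical helper: "registers" a metric and returns its name as the metric handle.
fn register(name: &'static str) -> &'static str {
    REGISTRY.lock().unwrap().push(name);
    name
}

// Lazily defined metrics: nothing is registered until the static is first accessed.
// The static names follow the PR text; the name strings are illustrative only.
pub static SENDER_DENIED: LazyLock<&'static str> =
    LazyLock::new(|| register("sender_denied"));
pub static ESCROW_BALANCE: LazyLock<&'static str> =
    LazyLock::new(|| register("escrow_balance"));

// Force every metric static so each one is registered before the metrics
// endpoint is served. Forcing an already-initialized LazyLock is a no-op.
pub fn init_metrics() {
    let _ = LazyLock::force(&SENDER_DENIED);
    let _ = LazyLock::force(&ESCROW_BALANCE);
}

// Snapshot of what a scrape of the stand-in registry would list.
pub fn registered_names() -> Vec<&'static str> {
    REGISTRY.lock().unwrap().clone()
}
```

Before `init_metrics()` runs, `registered_names()` is empty, which mirrors the empty /metrics response described in the commit message. Calling `init_metrics()` a second time does not register anything twice, because each `LazyLock` runs its initializer only once.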

   Clean up sender and allocation-level gauge metrics when actors stop to prevent stale values from accumulating in Prometheus.

   Changes:
   - Clean up UNAGGREGATED_FEES_BY_VERSION gauge on allocation stop
   - Clean up INVALID_RECEIPT_FEES gauge on allocation stop
   - Clean up sender-level gauges (SENDER_DENIED, ESCROW_BALANCE, SENDER_FEE_TRACKER, MAX_FEE_PER_SENDER, RAV_REQUEST_TRIGGER_VALUE) in SenderAccount post_stop
   - Add tests to verify metric cleanup behavior
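
The cleanup-on-stop idea can be sketched the same way. This is a simplified illustration under stated assumptions: `LabelledGauge` is a hypothetical stand-in for a labelled Prometheus gauge keyed by sender address, and this `SenderAccount` is a toy struct with a plain `post_stop` method rather than the real actor and its lifecycle hook.

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Hypothetical stand-in for a labelled gauge: one value per label (sender address).
pub struct LabelledGauge {
    values: Mutex<HashMap<String, f64>>,
}

impl LabelledGauge {
    fn new() -> Self {
        Self { values: Mutex::new(HashMap::new()) }
    }

    pub fn set(&self, label: &str, value: f64) {
        self.values.lock().unwrap().insert(label.to_string(), value);
    }

    // Drops the series for this label; returns true if one existed.
    pub fn remove(&self, label: &str) -> bool {
        self.values.lock().unwrap().remove(label).is_some()
    }

    pub fn get(&self, label: &str) -> Option<f64> {
        self.values.lock().unwrap().get(label).copied()
    }
}

// Static names follow the PR text; only two of the sender-level gauges are modelled.
pub static ESCROW_BALANCE: LazyLock<LabelledGauge> = LazyLock::new(LabelledGauge::new);
pub static SENDER_DENIED: LazyLock<LabelledGauge> = LazyLock::new(LabelledGauge::new);

// Toy version of the sender actor (assumption: gauges are labelled by sender).
pub struct SenderAccount {
    sender: String,
}

impl SenderAccount {
    // Starting the actor publishes its gauges.
    pub fn start(sender: &str, escrow_balance: f64) -> Self {
        ESCROW_BALANCE.set(sender, escrow_balance);
        SENDER_DENIED.set(sender, 0.0);
        Self { sender: sender.to_string() }
    }

    // Stopping the actor removes its series so no stale value is left behind.
    pub fn post_stop(self) {
        ESCROW_BALANCE.remove(&self.sender);
        SENDER_DENIED.remove(&self.sender);
    }
}
```

After `post_stop`, a lookup for the stopped sender returns `None`, while the series for any sender that is still running is untouched. Removing the series, rather than resetting it to zero, is what keeps a stopped sender from appearing in later scrapes.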

@suchapalaver changed the title from "Fix/tap agent/eagerly initialize metrics at startup" to "fix(tap-agent): eagerly initialize metrics at startup" on Dec 6, 2025

@coveralls

Pull Request Test Coverage Report for Build 19990119371

Details

  • 218 of 246 (88.62%) changed or added relevant lines in 5 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.4%) to 63.982%

Changes Missing Coverage                        Covered Lines   Changed/Added Lines   %
crates/tap-agent/src/main.rs                    0               2                     0.0%
crates/tap-agent/src/agent/sender_account.rs    145             153                   94.77%
crates/tap-agent/src/agent.rs                   53              71                    74.65%

Files with Coverage Reduction   New Missed Lines   %
crates/watcher/src/lib.rs       3                  85.42%

Totals Coverage Status
Change from base Build 19943819085: 0.4%
Covered Lines: 9884
Relevant Lines: 15448

💛 - Coveralls
