@darunrs darunrs commented Nov 12, 2024

We were previously incrementing the S3 request count in all cases, whether or not the request succeeded. This was misleading: the metric reported significantly more requests than expected, and it was being used as a proxy for the cost of the service. Investigation showed that many requests were in fact failing with a dispatch error, shown below.

GetObjectBytesError(DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: nodename nor servname provided, or not known" })), connection: Unknown } }))

To gauge the service's actual S3 usage more accurately, I've moved the counter increment below the S3 get request so that only successful requests are counted. The metric still appears to be inaccurate, however: it reports a number inconsistent with the actual requests made, which were verified through cache size and logs. Regardless, this should be an improvement to the metric.
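
For readers without the diff at hand, here is a minimal sketch of the ordering change described above. It is not the actual code from this PR: `S3Metrics`, `successful_gets`, the simplified `GetObjectBytesError` enum, and the `fetch` closure (standing in for the real S3 client call) are all hypothetical names chosen for illustration. The only point it shows is that the counter is bumped after the get returns successfully, so a dispatch failure such as the DNS error above leaves it untouched.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the service's S3 request metric.
pub struct S3Metrics {
    successful_gets: AtomicU64,
}

impl S3Metrics {
    pub fn new() -> Self {
        Self { successful_gets: AtomicU64::new(0) }
    }

    /// Current value of the counter.
    pub fn successful_gets(&self) -> u64 {
        self.successful_gets.load(Ordering::Relaxed)
    }
}

/// Simplified, hypothetical error type; the real one wraps SDK errors.
#[derive(Debug, PartialEq)]
pub enum GetObjectBytesError {
    DispatchFailure(String),
}

/// Fetches an object and counts the request only if it succeeded.
/// `fetch` is an assumed placeholder for the actual S3 get call.
pub fn get_object_bytes<F>(
    metrics: &S3Metrics,
    key: &str,
    fetch: F,
) -> Result<Vec<u8>, GetObjectBytesError>
where
    F: FnOnce(&str) -> Result<Vec<u8>, GetObjectBytesError>,
{
    // Perform the request first; `?` returns early on failure,
    // so the increment below is skipped for failed requests.
    let bytes = fetch(key)?;

    // Reached only on success.
    metrics.successful_gets.fetch_add(1, Ordering::Relaxed);
    Ok(bytes)
}
```

Incrementing before the call, as the old behavior is described, would count every attempt, including ones that fail during DNS lookup before any connection is made.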

@darunrs darunrs requested a review from a team as a code owner November 12, 2024 21:44
@darunrs darunrs merged commit 7679e0d into main Nov 12, 2024
4 checks passed
@darunrs darunrs deleted the fix-cache-misses branch November 12, 2024 21:58
3 participants