Skip to content

Conversation

@BenMatase
Copy link

@BenMatase BenMatase commented Sep 30, 2025

Added a flag --no-tls to the mach rpm entrypoint and forced download/upload/latency requests to not go over TLS.

The reason for this change: We are seeing high cpu usage for mach's rpm test. We compared it to the ookla speedtest cli tool and mach was using significantly more cpu.

I instrumented mach and saw that a lot of the usage is in crypto operations. I also did some pcaps on the ookla tool and saw that they were doing bare HTTP traffic. This seems appropriate since the data that is being transferred isn't sensitive.

tls-flamegraph (8)

After the changes, you can see all of the crypto calls are now gone and we can see better results. The AIM reporting still will use HTTPS.

http-flamegraph

Results on a 2 core aws instance:

  • no tls: ~7Gbps (I saw one 10Gbps)
  • tls: ~5Gbps

Also cpu usage is better:

  • no tls: maxing out around 70%
  • tls: maxing out around 100%

The cpu is not as great as an improvement but when you consider that it is doing higher speeds (and is processing more packets), it is better than the numbers suggest. If the network was the bottleneck, I'd suspect the cpu usage would have a better improvement.

A couple of observations from making this change:

  1. Just moving to http/port 80 was not enough. h2 was having issue with the http endpoint so had to force h1
  2. h1 expects more headers in general and some other tweaks to work e2e with the default urls
  3. the aim report seems to only be https which makes sense considering it is actually transporting interesting data that we would want to secure
  4. uploads (POST) seemed to not like the really large uploads when doing just http. Had to limit the size of the uplaods to 16Mb. It seemed that the upload connection was being broken early in this case when too large. Probably endpoints didn't like someone uploading huge files over http.
  5. The loaded latency measurements in http mode were having issues getting decent numbers. I believe it was because the connection was being overwhelmed and there is no multiplexing in http 1.1, so made a scheme where it would use finished connections

@kornelski
Copy link
Collaborator

Thanks for the PR.

I think having more control over the protocol is fine, but this change is not just removing TLS, but also downgrades H/2 to H/1. Could that affect measurements? Maybe it'd be better to make these separate options?

max_loaded_connections: 16,
interval_duration_ms: 1000, // 1s
test_duration_ms: 12_000, // 12s
disable_aim_scores: false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having both no- and disable- is inconsistent. no- is more typical for args, so maybe rename the other arg to no-aim-scores and add disable-aim-scores as an alias for backwards compatibility?

let result = run_test(&LatencyConfig {
url: url.parse()?,
runs,
no_tls: false,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use tls: true.

--no-tls makes sense as an arg, but internally working with negated names is adding cognitive overhead.

Some(v) => v,
None => {
tracing::warn!("no loaded latency measurements; defaulting to 0ms");
0.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reporting a wrong value is not desirable. Could you keep it as an error? Or make the field an Option?

no_tls,
));

let conn_type = if no_tls { ConnectionType::H1 } else { ConnectionType::H2 };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unencrypted H/2 theoretically exists, so this downgrade isn't obvious. Maybe the no_tls option could have a different name? or tls and h2 have their own options?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having trouble with unencrypted http2 working with the cloudflare endopints. I'll try again, but @fisherdarling has agreed that might be the case

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely expect that unencrypted H/2 will be difficult to use.

I'm mainly worried about H/1 vs H/2+TLS changing two variables at once, from UI perspective. In case somebody wanted to compare TLS overhead, it isn't apples to apples comparison.

So the toggle could be just clearer about what changes, maybe --plain-h1.

Or if the two could be toggled separately, H/1+TLS would be a possibile option.

let conn_type = if no_tls { ConnectionType::H1 } else { ConnectionType::H2 };
let response = Client::default()
.new_connection(ConnectionType::H2)
.plain_http_mode(no_tls)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plain_http_mode and no_tls mix terminology

ConnectionType::H1 => b"\x08http/1.1",
ConnectionType::H2 => b"\x02h2",
ConnectionType::H3 => b"\x02h3",
ConnectionType::H2 => b"\x02h2\x08http/1.1",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this affect existing users not using --no-tls?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, that is a good point. I added this when I was doing some testing. Let me see if I can remove this

.ssl()
.selected_alpn_protocol()
.map(|p| String::from_utf8_lossy(p).to_string())
.unwrap_or_else(|| "<none>".to_string());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it's doesn't make a noticeable difference, but just FYI you can avoid .to_string() in such cases if you first assign the Option<String> or Option<Cow> version to a variable, then use .as_deref(), and you'll be able to have &str fallback.

or you could use std::str::from_utf8(p).unwrap_or("<invalid>") and not allocate String at all.

let scheme = uri_scheme.as_deref();
let default_port = match scheme { Some("https") => 443, _ => 80 };
let need_port = uri_port.is_some() && uri_port.unwrap() != default_port;
let host_value = if need_port { format!("{}:{}", host_hdr, uri_port.unwrap()) } else { host_hdr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current versions of Rust automagically make this work: if need_port { &format!(…) } else { host_hdr }

.loads
.iter()
.filter(|l| l.finished_at.is_some())
.collect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ThroughputClient::download().plain_http_mode(true)
.with_connection(conn)
.send(
self.config.small_download_url.as_str().parse()?,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a lot of copy-paste. Could you factor out download() and send() out of the conditions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants