@mattsu2020 (Contributor) commented Nov 17, 2025

fix #9264

…line tuning

Add support for filtering non-printing and non-dictionary characters in sort keys,
along with options to ignore case. Implement dynamic buffer size normalization
and pipeline depth tuning based on user settings and file size for improved performance.
Add new fields to LineData for caching filtered lines and UTF-8 data.
This improves sort accuracy and efficiency for large file sorting scenarios.
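
As a rough illustration of the key filtering described above, the sketch below drops non-dictionary and non-printing bytes and optionally folds case. The `KeyFilter` struct and `filter_key` function are hypothetical names, and the ASCII-only rules (dictionary order keeps blanks and alphanumerics, non-printing filtering keeps space and graphic characters) are an assumption modeled loosely on the usual meaning of sort's `-d`, `-i` and `-f` options, not the PR's actual code.

```rust
/// Hypothetical key-filter options (names are illustrative, not from the PR).
#[derive(Clone, Copy, Default)]
pub struct KeyFilter {
    pub dictionary_order: bool,
    pub ignore_non_printing: bool,
    pub ignore_case: bool,
}

/// Append the filtered form of `line` to `out`.
/// Assumption: ASCII-only classification of bytes.
pub fn filter_key(line: &[u8], opts: KeyFilter, out: &mut Vec<u8>) {
    for &b in line {
        // Dictionary order: keep only blanks and alphanumeric bytes.
        if opts.dictionary_order && !(b.is_ascii_alphanumeric() || b == b' ' || b == b'\t') {
            continue;
        }
        // Ignore non-printing: keep only space and graphic bytes.
        if opts.ignore_non_printing && !(b == b' ' || b.is_ascii_graphic()) {
            continue;
        }
        // Ignore case: fold to uppercase before comparing.
        out.push(if opts.ignore_case { b.to_ascii_uppercase() } else { b });
    }
}
```

Writing into a caller-supplied `Vec<u8>` lets the same buffer be reused across lines instead of allocating one per key.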
Use utf8_cache to retrieve precomputed UTF-8 strings in fast lexicographic mode, falling back to standard from_utf8 conversion if not available. This reduces redundant UTF-8 validations and improves performance for repeated comparisons.
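
The cache-with-fallback lookup described in this commit could look roughly like the following. The `Line` struct and `as_utf8` helper are hypothetical; only the field name `utf8_cache` and the fallback to `from_utf8` come from the commit message.

```rust
use std::str;

/// Hypothetical per-line record. `utf8_cache` is `None` when the line
/// was not validated ahead of time.
pub struct Line<'a> {
    pub bytes: &'a [u8],
    pub utf8_cache: Option<&'a str>,
}

/// Return the cached string if present, otherwise validate the bytes now.
/// Yields `None` when the line is not valid UTF-8.
pub fn as_utf8<'a>(line: &Line<'a>) -> Option<&'a str> {
    line.utf8_cache
        .or_else(|| str::from_utf8(line.bytes).ok())
}
```

With a populated cache, repeated comparisons of the same line skip the validation pass entirely.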
…function

Format long chained method calls over multiple lines for better code clarity and maintainability.
…cations

Refactored LineData to use a single Vec<u8> for filtered_lines_data and a Vec<(usize, usize)> for ranges, instead of Vec<Vec<u8>>. Renamed build_filtered_line to append_filtered_line_to, which appends to the shared buffer. This reduces per-line allocations and improves memory efficiency in sorting operations.
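
A minimal sketch of the flat-buffer layout this commit describes, assuming each range is a `(start, end)` pair into the shared buffer (the commit message does not say whether the second element is an end offset or a length). `FilteredLines`, `append_filtered_line` and `get` are hypothetical names used only for illustration.

```rust
/// Hypothetical storage: all filtered lines back to back in one buffer,
/// plus one (start, end) range per line. The (start, end) convention is
/// an assumption.
#[derive(Default)]
pub struct FilteredLines {
    data: Vec<u8>,
    ranges: Vec<(usize, usize)>,
}

impl FilteredLines {
    /// Append one line to the shared buffer, keeping only the bytes for
    /// which `keep` returns true, and record where it landed.
    pub fn append_filtered_line(&mut self, line: &[u8], keep: impl Fn(u8) -> bool) {
        let start = self.data.len();
        self.data.extend(line.iter().copied().filter(|&b| keep(b)));
        self.ranges.push((start, self.data.len()));
    }

    /// Borrow the filtered bytes of line `index`.
    pub fn get(&self, index: usize) -> &[u8] {
        let (start, end) = self.ranges[index];
        &self.data[start..end]
    }
}
```

Compared with `Vec<Vec<u8>>`, this layout grows two vectors in total rather than allocating once per line, and an empty filtered line costs only a range entry.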
… impl

Use the anonymous lifetime '_ instead of the named lifetime 'a in the impl block to simplify and modernize the code without changing behavior.
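
For readers unfamiliar with the pattern, here is what this kind of change looks like on a hypothetical `LineView` type (not a type from the PR). When the impl body never names the lifetime, the anonymous form is equivalent.

```rust
/// Hypothetical borrowed view over one line.
pub struct LineView<'a> {
    pub bytes: &'a [u8],
}

// Before: impl<'a> LineView<'a> { ... }
// After: the lifetime is never named in the body, so '_ is enough.
impl LineView<'_> {
    /// Length of the borrowed line in bytes.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }
}
```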
@github-actions

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch)

codspeed-hq bot commented Nov 17, 2025

CodSpeed Performance Report

Merging #9306 will degrade performance by 3.67%

Comparing mattsu2020:sort_fix_rebased (774fe29) with main (cc103ec)

Summary

❌ 1 regression
✅ 126 untouched
⏩ 6 skipped [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark      BASE      HEAD      Efficiency
sort_numeric   23.9 ms   24.8 ms   -3.67%

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them on CodSpeed to remove them from the performance reports.

…ache

Remove the unused utf8_cache field from LineData struct and related code for caching UTF-8 strings during line parsing. This simplifies the lexicographic comparison logic in compare_by to always perform byte-level comparison, reducing code complexity and potential maintenance overhead without affecting sorting functionality. The previous cache was intended for faster lexical sorting but is no longer needed.
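
To show why a plain byte-level comparison is sufficient for lexicographic ordering, including lines that contain NUL bytes, here is a small sketch. `compare_bytes` and `sort_lines` are hypothetical helpers, not uu-sort's `compare_by`; slices of `u8` already implement `Ord` lexicographically, so no UTF-8 validation is involved.

```rust
use std::cmp::Ordering;

/// Lexicographic comparison on raw bytes. NUL is just the smallest byte value.
pub fn compare_bytes(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// Split `input` on newlines and return the lines in byte order.
/// A single trailing newline is treated as a terminator, not as an
/// extra empty line.
pub fn sort_lines(input: &[u8]) -> Vec<&[u8]> {
    let body = if input.ends_with(b"\n") {
        &input[..input.len() - 1]
    } else {
        input
    };
    if body.is_empty() {
        return Vec::new();
    }
    let mut lines: Vec<&[u8]> = body.split(|&b| b == b'\n').collect();
    lines.sort_by(|a, b| compare_bytes(a, b));
    lines
}
```

Because the comparison never converts to `&str`, it treats embedded NUL bytes and invalid UTF-8 the same way as any other byte.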
github-actions bot commented Dec 2, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@mattsu2020 mattsu2020 marked this pull request as draft December 2, 2025 11:58
@mattsu2020 mattsu2020 marked this pull request as ready for review December 2, 2025 12:05
github-actions bot commented Dec 2, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)

@github-actions

GNU testsuite comparison:

GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?

Development

Successfully merging this pull request may close these issues.

Bug: uu-sort fails to reorder binary lines containing NUL bytes
