Fix invalid UTF-8 encoding #2094
base: docs
|
All these changes need to be confirmed manually, and there are so many that the tool runs too slowly. It's unlikely I will have time to approve these. |
|
Note that this was an automated replacement. If you prefer to verify it, you can create a script which simply replaces the specified bytes and verify the result. |
|
Every correction to the docs needs to be manually verified by policy. We can't just accept bulk updates. |
|
I understand. Let me know if there's anything else I can do to help with this, like providing a Python script which does the replacements. |
|
Is there a way to focus on the most egregious errors, and then break them up into smaller PRs? |
|
I don't think there are more or less egregious errors here. There are just many files which aren't valid UTF-8, which is the format everyone expects nowadays. I don't think going over them manually is the best way to handle it, but if you have to and if it will help, I can split it into chunks and open several PRs. |
| Hex | Count | Percentage | Character |
|---|---|---|---|
| a0 | 1648 | 98.6% | ' ' |
| 96 | 18 | 1.1% | '–' |
| ae | 4 | 0.2% | '®' |
| 97 | 2 | 0.1% | '—' |
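For reviewers who want to verify the replacement independently, the following is a minimal Python sketch of how such a byte count and repair could be done. It is not the script used for this PR, which is not shown in the thread. It assumes the stray bytes are Windows-1252 characters, as the table suggests; the function names `invalid_byte_histogram` and `repair_as_cp1252` are hypothetical.

```python
from collections import Counter


def _escaped_byte(ch):
    """Return the original byte value if ch is a surrogateescape placeholder, else None."""
    code = ord(ch)
    return code - 0xDC00 if 0xDC80 <= code <= 0xDCFF else None


def invalid_byte_histogram(data: bytes) -> Counter:
    """Count the bytes in data that are not part of a valid UTF-8 sequence."""
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
    text = data.decode("utf-8", errors="surrogateescape")
    return Counter(b for b in map(_escaped_byte, text) if b is not None)


def repair_as_cp1252(data: bytes) -> bytes:
    """Reinterpret each invalid byte as Windows-1252 (an assumption) and re-encode as UTF-8."""
    text = data.decode("utf-8", errors="surrogateescape")
    fixed = []
    for ch in text:
        b = _escaped_byte(ch)
        # Bytes that cp1252 leaves undefined (e.g. 0x81) raise UnicodeDecodeError here,
        # which flags the file for manual review instead of guessing.
        fixed.append(ch if b is None else bytes([b]).decode("cp1252"))
    return "".join(fixed).encode("utf-8")
```

Running `invalid_byte_histogram` over every changed file before and after the repair gives a quick check: the "after" histogram should be empty, and a diff of the two versions should touch only the bytes listed in the table.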
Pull request overview
This PR fixes invalid UTF-8 encoding issues across documentation files. The changes primarily replace corrupted Unicode characters (replacement characters, non-breaking spaces, and Windows-1252 encoded characters) with their proper UTF-8 equivalents, ensuring correct display of special characters in documentation.
- Replaced Unicode replacement character (�) with proper space character in Windows version requirements
- Fixed en-dash (–) and em-dash (—) characters that were incorrectly encoded
- Corrected non-breaking spaces and other whitespace issues
- Fixed note/alert formatting with proper spacing
Reviewed changes
Copilot reviewed 300 out of 1291 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| dvbsiparser/*.md | Fixed Windows 7 version references with corrupted replacement characters |
| dvbsiparser/nf--geteit.md | Corrected hex range formatting (0x50-0x5F, 0x60-0x6F) with proper en-dashes |
| dvbsiparser/nf--idvblogicalchannel.md | Fixed note formatting with proper spaces replacing non-breaking spaces |
| bdatif/*.md | Corrected spacing in tables and note sections |
| bdaiface/*.md | Fixed Windows version references and note formatting |
| bdaiface/nf-*-put_enablediseqcommands.md | Fixed en-dash in time range (100–300 milliseconds) |
| bdaiface/nf--signalproperties.md | Corrected note formatting and spacing |
| atscpsipparser/*.md | Fixed Windows Vista/7 version references and em-dash in ISO standard reference |
Note that 85 files also contain the Replacement Character (�). These files weren't fixed; they have to be fixed manually, since the original character can't be recovered automatically.
Edit: See #2095.
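To list the files that need this manual attention, a sketch along the following lines may help. It assumes the docs are Markdown files under a single root directory; the function name `files_with_replacement_char` and the default `*.md` pattern are illustrative, not taken from the PR.

```python
from pathlib import Path

# U+FFFD encoded as UTF-8 is the three-byte sequence EF BF BD.
REPLACEMENT = "\ufffd".encode("utf-8")


def files_with_replacement_char(root, pattern="*.md"):
    """Return the sorted paths under root whose raw bytes contain an encoded U+FFFD."""
    return sorted(
        path
        for path in Path(root).rglob(pattern)
        if path.is_file() and REPLACEMENT in path.read_bytes()
    )
```

The search works on raw bytes rather than decoded text, so it still works on files that are not valid UTF-8 elsewhere.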