feat: Add punctuation and math symbol tokenization #170
Summary
This PR implements automatic tokenization of punctuation and math symbols during parsing, as requested in issue #148. The parser now separates these symbols from adjacent characters, making them individual references in Links Notation.
Key Features
- Punctuation tokenization: Symbols like `,`, `.`, `;`, `!`, `?` are separated from words
  - `1, 2 and 3` → `["1", ",", "2", "and", "3"]`
  - `hello, world` → `["hello", ",", "world"]`
- Math symbol tokenization: Symbols like `+`, `-`, `*`, `/`, `=` are separated only when between digits
  - `1+1` → `["1", "+", "1"]`
  - `10-20` → `["10", "-", "20"]`
- Hyphenated words preserved: Words with hyphens between letters are kept intact
  - `Jean-Luc Picard` → `["Jean-Luc", "Picard"]`
  - `conan-center-index` → `["conan-center-index"]`
- Quoted strings preserved: Content inside quotes is not tokenized
  - `"1,2,3"` → `["1,2,3"]`
  - `"hello, world"` → `["hello, world"]`
- Base64 preserved: Strings like `bmFtZQ==` are kept intact
- Compact formatting: New option to restore human-readable output without spaces around symbols
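The rules listed above can be illustrated with a minimal Python sketch. This is not the parser code from this PR: the names `tokenize` and `tokenize_word` are hypothetical, only double quotes are handled, and edge cases such as decimal numbers (`3.14`) may behave differently in the actual implementations.

```python
import re

PUNCTUATION = ",.;!?"   # always split off from adjacent characters
MATH_SYMBOLS = "+-*/="  # split off only when sitting between two digits


def tokenize_word(word):
    """Split one unquoted, whitespace-free chunk into tokens (illustrative only)."""
    tokens, current = [], ""
    for i, ch in enumerate(word):
        prev_ch = word[i - 1] if i > 0 else ""
        next_ch = word[i + 1] if i + 1 < len(word) else ""
        is_punct = ch in PUNCTUATION
        is_math = ch in MATH_SYMBOLS and prev_ch.isdigit() and next_ch.isdigit()
        if is_punct or is_math:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            # Letters, digits, and math symbols not between digits
            # (hyphens in names, Base64 padding) stay in the current token.
            current += ch
    if current:
        tokens.append(current)
    return tokens


def tokenize(text):
    """Tokenize a line, keeping double-quoted content as a single token."""
    tokens = []
    # Match either a double-quoted run or a run of non-whitespace characters.
    for match in re.finditer(r'"([^"]*)"|(\S+)', text):
        if match.group(1) is not None:
            tokens.append(match.group(1))  # quoted: keep verbatim, no splitting
        else:
            tokens.extend(tokenize_word(match.group(2)))
    return tokens


print(tokenize("1, 2 and 3"))       # ['1', ',', '2', 'and', '3']
print(tokenize("Jean-Luc Picard"))  # ['Jean-Luc', 'Picard']
print(tokenize('"1,2,3"'))          # ['1,2,3']
```

Requiring a digit on both sides of a math symbol is what keeps `conan-center-index` and `bmFtZQ==` intact in this sketch, since neither `-` nor `=` is flanked by digits there.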
Backward Compatibility
Set `tokenizeSymbols: false` (JS/Python) or use `parse_lino_raw()` (Rust) to disable tokenization.
Test Plan
Version Bump
Updated all implementations to version 0.13.0.
Fixes #148
🤖 Generated with Claude Code