-
Notifications
You must be signed in to change notification settings - Fork 3
fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
hkimura-intersys
wants to merge
22
commits into
intersystems:main
Choose a base branch
from
hkimura-intersys:test
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The tree-sitter grammars for injection languages is public and easy to get. I added instructions on how to include those grammars in one's project if they are developing in Rust.
The AST has changed following the numeric_literal change, which allowed negative values. Updating the expected test result with the new AST so tests still pass.
Right now, syntax highlighting only works on a class that extends something else. However classes that don't extend anything are also valid classes. This change will include those classes in syntax highlighting as well.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
The main changes in this PR include:
cargo buildstep.Fix for if statements:
Before, certain if statements were resulting in errors. For example
If classname = "%Net.HttpRequest"would give us this resulting tree:The cause of this was due to both the grammar and the scanner. I fixed this by removing two external tokens:
whitespace_before_blockandinline_statement_separator, as they would sometimes match one of the choices for an if command that we didn't actually want to match. Using the above example, the inline_statement_separator would be matched, and since that was a possibility in all of the if statement choices, it would match the wrong if statement, leading to an error.The new command_if choices are structured as follows:
argumentless_command_end, theterminationand theimmediate_single_whitespace_followed_by_non_whitespace. Theimmediate_single_whitespace_followed_by_non_whitespacetoken is needed for if the command has arguments.Fix for For Loops
Some changes include:
foris valid. This is allowed in objectscript.Fix for while loops
The only change here is that the
_immediate_single_whitespace_followed_by_non_whitespaceis no longer optional. For while loops in objectscript, this syntax is not optional, it is required.Fix for do statements
The main changes for do statements include adding a
post_conditionalnode as an option for the do_parameter as well as making theindirectionan expression rather than just a glvn. Additionally,routine_refdidn't include all valid options, notably something likeDO ^|"SAMPLES"|fibonacci. I adjusted the logic to allow for this.I had to add a
post_conditionalas an option, because in objectscript, this kind of syntax is valid:DO:F>0 A:F=1,B:F=2,C:((F'=1)&&(F'=2))I had to switch the
indirectionto an expression because having it as a glvn didn't cover cases such asDO @("Item"_num)^Menu.Additionally, do statements with the legacy dotted statements was not working previously. I adjusted the scanner and grammar so this now works.
Lastly, I made it so things like
do{}while xare valid, where there is no spacedoand{.Open Statement Adjustments
Before, valid open commands were erroring:

I adjusted the grammar to make these kinds of open commands work. I also added tests to check for all valid types of open statements:

Lock Statement Adjustments
Before, lock statements such as
LOCK +(var1(1),var2(1)):10were incorrectly being parsed as the first variant, which isn't a list because the grammar for the second. variant didn't allow + before (. This caused tree-sitter test to fail even though it looked fine on the playground and while parsing. I adjusted the grammar so this works with the second variant. Additionally, this PR adds unit tests for the different variants of valid lock statements.Write Statement Adjustments
Before, valid write statements such as


WRITE "Hello",!?5,"world"and WRITE ##class(Patient)patient.Doctor.Hospital.Name were being marked as invalid, The grammar is now adjusted to account for these valid statements.Else Statement Adjustments
The old else statement is valid on its own now. Additionally, this was fixed for certain edge cases. Additionally, tests were added to check for both valid and invalid else statements.
Kill Statement Adjustments
Before, kill was marking valid statements such as

KILL myarray.%Get(index).md(subscript)as invalid.The reason this isn't working is because syntactically there is nothing that distinguishes a multidimensional property from a method. This actually made it so the set statements were saying set myarray.%Get(index).md(subscript) = 2 was invalid as well, when it isn't actually. Since we can't distinguish between a multidimensional property and an oref method syntactically, we will have to do that semantically. So, I had to adjust the oref_set_target to be more permissive.
Now it is working:

Fix for Set Statements (and indirectly, write statements with certain indirections)
There were two big fixes for set statements. First, this kind of thing must be allowed, because in certain scenarios it is valid:

set @b.Additionally, valid indirections were not being allowed because subscripts were not supported, for example:

To fix this, I defined

indirected_glvn, which allows the subscript at the end of a glvn. Now, these types of indirections work:Indirection fix
valid indirections such as SET LINETEXT = $TEXT(@$PIECE(LINE,"^",1)^@$PIECE(LINE,"^",2)) were being parsed as errors. To fix this, I added a node that defined TEXT specifically, rather than the generic version.
Scanner Adjustments
I simplified the scanner logic for finding if there was an
argumentless_command_end,_argumentless_loop, or_immediate_single_whitespace_followed_by_non_whitespace. I removed unnecessary logic that was duplicated and adjusted a few clauses, as before the scanner was incorrectly returning that something was an argumentless_command_end. Additionally, the scanner now provides a_terminationtoken, which is used in the old style version of for loops and if statements.Core Grammar Test Suite
I used all examples on the InterSystems documentation for ObjectScript to test that we are correctly parsing while loops, do statements, for loops (old and new, argumentless and argumentful), do statements, do while statements, if statements, and expressions.
Highlighting Issues
While most highlighting issues were caused by errors in the parse tree (now fixed), one issue was caused by something else. For commands like
new $NAMESPACE, although namespace is a system defined variable, it wasn't classified as one in the command_new logic. Now, this could be fixed by just highlighting an entire command_new node, but what we really want is for all $NAMESPACE occurrences to be highlighted the same color. To make sure that all $NAMESPACE are colored the same, I made a namespace token that could be referenced. I did this for estack, etrap, and roles as well.TESTING:
Tested all valid variants of the different parts of the core grammar that I changed:
Verified that highlighting is now working on bugs that were found before:
Tested highlighting for estack, etrap, and roles:
Tested highlighting for do statements:
Tested highlighting for do while statements:
Tested highlighting for if statements:
Tested highlighting in for loops:
Tested highlighting of different valid expressions: