fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar #19

hkimura-intersys · 2025-11-17T01:17:56Z

Overview

The main changes in this PR include:

apply changes to the grammar to fix issues that were caught and reported.
Add a test suite in the core grammar to test that all valid syntax forms are being parsed correctly for while loops, do statements, for loops (old and new, argumentless and argumentful), do statements, do while statements, if statements, and expressions.
Implement changes to get rid of warnings that were coming up in the cargo build step.
Add a NotProcedureBlock node to make it easier to implement go to def in potential language servers.

Fix for if statements:

Before, certain if statements were resulting in errors. For example If classname = "%Net.HttpRequest" would give us this resulting tree:

command_if [821, 8] - [823, 8]
                      command_name: keyword_if [821, 8] - [821, 10]
                      expression [821, 11] - [821, 20]
                        expr_atom [821, 11] - [821, 20]
                          lvn [821, 11] - [821, 20]
                            objectscript_identifier [821, 11] - [821, 20]
                      ERROR [821, 21] - [821, 40]
                        = [821, 21] - [821, 22]
                        " [821, 23] - [821, 24]
                        macro_arg [821, 24] - [821, 27]
                        . [821, 27] - [821, 28]
                        . [821, 31] - [821, 32]
                        " [821, 39] - [821, 40]
                      statement [821, 41] - [823, 8]
                        command_continue [821, 41] - [823, 8]
                          command_name: keyword_continue [821, 41] - [821, 49]

The cause of this was due to both the grammar and the scanner. I fixed this by removing two external tokens: whitespace_before_block and inline_statement_separator, as they would sometimes match one of the choices for an if command that we didn't actually want to match. Using the above example, the inline_statement_separator would be matched, and since that was a possibility in all of the if statement choices, it would match the wrong if statement, leading to an error.

The new command_if choices are structured as follows:

if the command is the old style command, it requires a termination at the end of the line. This follows objectscript syntax, which specifies that all commands on that same line are executed as part of the if statement. So, the grammar knows the if statement is complete if there is a termination (new line, end of file ,etc). Rather than the old version of the else command be its own node, it is now part of the old if statement. This gets rid of the ambiguity caused by the grammar trying to decide between the old else keyword and the new else keyword. Additionally, if the command is argumentless, it requires exactly two spaces. The external scanner handles looking for the argumentless_command_end, the termination and the immediate_single_whitespace_followed_by_non_whitespace. The immediate_single_whitespace_followed_by_non_whitespace token is needed for if the command has arguments.
For the block style version of if statements, the only change was removing the whitespace_before_block and the inline_statement_separator tokens.

Fix for For Loops

Some changes include:

now, just the word for is valid. This is allowed in objectscript.
before returning that something is a termination (aka old for loop style), the grammar scans through the whitespace and checks if the next token is a block, and instead returns argumentless_loop. So, termination is only used for old style, and argumentless_loop is used for block style.

Fix for while loops

The only change here is that the _immediate_single_whitespace_followed_by_non_whitespace is no longer optional. For while loops in objectscript, this syntax is not optional, it is required.

Fix for do statements

The main changes for do statements include adding a post_conditional node as an option for the do_parameter as well as making the indirection an expression rather than just a glvn. Additionally, routine_ref didn't include all valid options, notably something like DO ^|"SAMPLES"|fibonacci. I adjusted the logic to allow for this.

I had to add a post_conditional as an option, because in objectscript, this kind of syntax is valid:
DO:F>0 A:F=1,B:F=2,C:((F'=1)&&(F'=2))

I had to switch the indirection to an expression because having it as a glvn didn't cover cases such as DO @("Item"_num)^Menu.

Additionally, do statements with the legacy dotted statements was not working previously. I adjusted the scanner and grammar so this now works.

Lastly, I made it so things like do{}while x are valid, where there is no space do and {.

Open Statement Adjustments

Before, valid open commands were erroring:

I adjusted the grammar to make these kinds of open commands work. I also added tests to check for all valid types of open statements:

Lock Statement Adjustments

Before, lock statements such as LOCK +(var1(1),var2(1)):10 were incorrectly being parsed as the first variant, which isn't a list because the grammar for the second. variant didn't allow + before (. This caused tree-sitter test to fail even though it looked fine on the playground and while parsing. I adjusted the grammar so this works with the second variant. Additionally, this PR adds unit tests for the different variants of valid lock statements.

Write Statement Adjustments

Before, valid write statements such as WRITE "Hello",!?5,"world" and WRITE ##class(Patient)patient.Doctor.Hospital.Name were being marked as invalid, The grammar is now adjusted to account for these valid statements.

Else Statement Adjustments

The old else statement is valid on its own now. Additionally, this was fixed for certain edge cases. Additionally, tests were added to check for both valid and invalid else statements.

Kill Statement Adjustments

Before, kill was marking valid statements such as KILL myarray.%Get(index).md(subscript) as invalid.

The reason this isn't working is because syntactically there is nothing that distinguishes a multidimensional property from a method. This actually made it so the set statements were saying set myarray.%Get(index).md(subscript) = 2 was invalid as well, when it isn't actually. Since we can't distinguish between a multidimensional property and an oref method syntactically, we will have to do that semantically. So, I had to adjust the oref_set_target to be more permissive.

Now it is working:

Fix for Set Statements (and indirectly, write statements with certain indirections)

There were two big fixes for set statements. First, this kind of thing must be allowed, because in certain scenarios it is valid: set @b.

Additionally, valid indirections were not being allowed because subscripts were not supported, for example:

To fix this, I defined indirected_glvn, which allows the subscript at the end of a glvn. Now, these types of indirections work:

Indirection fix

valid indirections such as SET LINETEXT = $TEXT(@$PIECE(LINE,"^",1)^@$PIECE(LINE,"^",2)) were being parsed as errors. To fix this, I added a node that defined TEXT specifically, rather than the generic version.

Scanner Adjustments

I simplified the scanner logic for finding if there was an argumentless_command_end, _argumentless_loop, or _immediate_single_whitespace_followed_by_non_whitespace. I removed unnecessary logic that was duplicated and adjusted a few clauses, as before the scanner was incorrectly returning that something was an argumentless_command_end. Additionally, the scanner now provides a _termination token, which is used in the old style version of for loops and if statements.

Core Grammar Test Suite

I used all examples on the InterSystems documentation for ObjectScript to test that we are correctly parsing while loops, do statements, for loops (old and new, argumentless and argumentful), do statements, do while statements, if statements, and expressions.

Highlighting Issues

While most highlighting issues were caused by errors in the parse tree (now fixed), one issue was caused by something else. For commands like new $NAMESPACE, although namespace is a system defined variable, it wasn't classified as one in the command_new logic. Now, this could be fixed by just highlighting an entire command_new node, but what we really want is for all $NAMESPACE occurrences to be highlighted the same color. To make sure that all $NAMESPACE are colored the same, I made a namespace token that could be referenced. I did this for estack, etrap, and roles as well.

TESTING:

Tested all valid variants of the different parts of the core grammar that I changed:

Verified that highlighting is now working on bugs that were found before:

Tested highlighting for estack, etrap, and roles:

Tested highlighting for do statements:

Tested highlighting for do while statements:

Tested highlighting for if statements:

Tested highlighting in for loops:

Tested highlighting of different valid expressions:

The tree-sitter grammars for injection languages is public and easy to get. I added instructions on how to include those grammars in one's project if they are developing in Rust.

The AST has changed following the numeric_literal change, which allowed negative values. Updating the expected test result with the new AST so tests still pass.

Right now, syntax highlighting only works on a class that extends something else. However classes that don't extend anything are also valid classes. This change will include those classes in syntax highlighting as well.

…ghlighting

hkimura-intersys added 17 commits October 29, 2025 12:56

docs(README): add instructions to get injection grammars.

3e4388e

The tree-sitter grammars for injection languages is public and easy to get. I added instructions on how to include those grammars in one's project if they are developing in Rust.

feat: update Cargo.toml with required fields for rust crates.io

f3935c8

feat: update code to address any warnings during cargo build

bbdad1f

fix(tests): update expr test with new AST.

21e4f5f

The AST has changed following the numeric_literal change, which allowed negative values. Updating the expected test result with the new AST so tests still pass.

fix(queries): fix class highlights so results are exhaustive.

594fbb5

Right now, syntax highlighting only works on a class that extends something else. However classes that don't extend anything are also valid classes. This change will include those classes in syntax highlighting as well.

feat(cargo): update read me for rust crate and bump version

fa62b36

feat(rust): make all constants consistent and bump version

13ad86c

feat: separate ProcedureBlock and Not ProcedureBlock as keywords

a4a1f3b

feat: update kw_notprocedureblock to work for methods as well

b75aa1f

bump version to publish changes

d1c7674

if statements now working

be53530

fix: adjust for loop and do command

cf98fc3

fix all cases for do and add testing

14c99c5

fix: fixed for loop, while loop, do while loop, if statements, and hi…

74a7ddd

…ghlighting

remove inline_statement_separator from scanner

12cec67

Merge remote-tracking branch 'upstream/main'

fb35674

merge main into branch

9d118ae

hkimura-intersys self-assigned this Nov 17, 2025

hkimura-intersys changed the title ~~fix:~~ fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar Nov 17, 2025

hkimura-intersys added 5 commits November 16, 2025 21:26

fix highlighting for in new command

113df2a

chore: bump cargo version

f99d669

remove unused files

cbc6b8e

fix: make publiclist work for multiple variables

db576f3

fix(grammar): loops fixed, now allows for, else, if on their own

3e26c36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar #19

fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar #19

Uh oh!

hkimura-intersys commented Nov 17, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar #19

Are you sure you want to change the base?

fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar #19

Uh oh!

Conversation

hkimura-intersys commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Fix for if statements:

Fix for For Loops

Fix for while loops

Fix for do statements

Open Statement Adjustments

Lock Statement Adjustments

Write Statement Adjustments

Else Statement Adjustments

Kill Statement Adjustments

Fix for Set Statements (and indirectly, write statements with certain indirections)

Indirection fix

Scanner Adjustments

Core Grammar Test Suite

Highlighting Issues

TESTING:

Tested all valid variants of the different parts of the core grammar that I changed:

Verified that highlighting is now working on bugs that were found before:

Tested highlighting for estack, etrap, and roles:

Tested highlighting for do statements:

Tested highlighting for do while statements:

Tested highlighting for if statements:

Tested highlighting in for loops:

Tested highlighting of different valid expressions:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

hkimura-intersys commented Nov 17, 2025 •

edited

Loading