Skip to content

Conversation

@hkimura-intersys
Copy link
Collaborator

@hkimura-intersys hkimura-intersys commented Nov 17, 2025

Overview

The main changes in this PR include:

  1. apply changes to the grammar to fix issues that were caught and reported.
  2. Add a test suite in the core grammar to test that all valid syntax forms are being parsed correctly for while loops, do statements, for loops (old and new, argumentless and argumentful), do statements, do while statements, if statements, and expressions.
  3. Implement changes to get rid of warnings that were coming up in the cargo build step.
  4. Add a NotProcedureBlock node to make it easier to implement go to def in potential language servers.

Fix for if statements:

Before, certain if statements were resulting in errors. For example If classname = "%Net.HttpRequest" would give us this resulting tree:

command_if [821, 8] - [823, 8]
                      command_name: keyword_if [821, 8] - [821, 10]
                      expression [821, 11] - [821, 20]
                        expr_atom [821, 11] - [821, 20]
                          lvn [821, 11] - [821, 20]
                            objectscript_identifier [821, 11] - [821, 20]
                      ERROR [821, 21] - [821, 40]
                        = [821, 21] - [821, 22]
                        " [821, 23] - [821, 24]
                        macro_arg [821, 24] - [821, 27]
                        . [821, 27] - [821, 28]
                        . [821, 31] - [821, 32]
                        " [821, 39] - [821, 40]
                      statement [821, 41] - [823, 8]
                        command_continue [821, 41] - [823, 8]
                          command_name: keyword_continue [821, 41] - [821, 49]

The cause of this was due to both the grammar and the scanner. I fixed this by removing two external tokens: whitespace_before_block and inline_statement_separator, as they would sometimes match one of the choices for an if command that we didn't actually want to match. Using the above example, the inline_statement_separator would be matched, and since that was a possibility in all of the if statement choices, it would match the wrong if statement, leading to an error.

The new command_if choices are structured as follows:

  1. if the command is the old style command, it requires a termination at the end of the line. This follows objectscript syntax, which specifies that all commands on that same line are executed as part of the if statement. So, the grammar knows the if statement is complete if there is a termination (new line, end of file ,etc). Rather than the old version of the else command be its own node, it is now part of the old if statement. This gets rid of the ambiguity caused by the grammar trying to decide between the old else keyword and the new else keyword. Additionally, if the command is argumentless, it requires exactly two spaces. The external scanner handles looking for the argumentless_command_end, the termination and the immediate_single_whitespace_followed_by_non_whitespace. The immediate_single_whitespace_followed_by_non_whitespace token is needed for if the command has arguments.
  2. For the block style version of if statements, the only change was removing the whitespace_before_block and the inline_statement_separator tokens.

Fix for For Loops

Some changes include:

  • now, just the word for is valid. This is allowed in objectscript.
  • before returning that something is a termination (aka old for loop style), the grammar scans through the whitespace and checks if the next token is a block, and instead returns argumentless_loop. So, termination is only used for old style, and argumentless_loop is used for block style.

Fix for while loops

The only change here is that the _immediate_single_whitespace_followed_by_non_whitespace is no longer optional. For while loops in objectscript, this syntax is not optional, it is required.

Fix for do statements

The main changes for do statements include adding a post_conditional node as an option for the do_parameter as well as making the indirection an expression rather than just a glvn. Additionally, routine_ref didn't include all valid options, notably something like DO ^|"SAMPLES"|fibonacci. I adjusted the logic to allow for this.

I had to add a post_conditional as an option, because in objectscript, this kind of syntax is valid:
DO:F>0 A:F=1,B:F=2,C:((F'=1)&&(F'=2))

I had to switch the indirection to an expression because having it as a glvn didn't cover cases such as DO @("Item"_num)^Menu.

Additionally, do statements with the legacy dotted statements was not working previously. I adjusted the scanner and grammar so this now works.

Lastly, I made it so things like do{}while x are valid, where there is no space do and {.

Open Statement Adjustments

Before, valid open commands were erroring:
image

I adjusted the grammar to make these kinds of open commands work. I also added tests to check for all valid types of open statements:
image

Lock Statement Adjustments

Before, lock statements such as LOCK +(var1(1),var2(1)):10 were incorrectly being parsed as the first variant, which isn't a list because the grammar for the second. variant didn't allow + before (. This caused tree-sitter test to fail even though it looked fine on the playground and while parsing. I adjusted the grammar so this works with the second variant. Additionally, this PR adds unit tests for the different variants of valid lock statements.

Write Statement Adjustments

Before, valid write statements such as WRITE "Hello",!?5,"world" and WRITE ##class(Patient)patient.Doctor.Hospital.Name were being marked as invalid, The grammar is now adjusted to account for these valid statements.
image
image

Else Statement Adjustments

The old else statement is valid on its own now. Additionally, this was fixed for certain edge cases. Additionally, tests were added to check for both valid and invalid else statements.

Kill Statement Adjustments

Before, kill was marking valid statements such as KILL myarray.%Get(index).md(subscript) as invalid.
image

The reason this isn't working is because syntactically there is nothing that distinguishes a multidimensional property from a method. This actually made it so the set statements were saying set myarray.%Get(index).md(subscript) = 2 was invalid as well, when it isn't actually. Since we can't distinguish between a multidimensional property and an oref method syntactically, we will have to do that semantically. So, I had to adjust the oref_set_target to be more permissive.

Now it is working:
image

Fix for Set Statements (and indirectly, write statements with certain indirections)

There were two big fixes for set statements. First, this kind of thing must be allowed, because in certain scenarios it is valid: set @b.
image

Additionally, valid indirections were not being allowed because subscripts were not supported, for example:
image

To fix this, I defined indirected_glvn, which allows the subscript at the end of a glvn. Now, these types of indirections work:
image

Indirection fix

valid indirections such as SET LINETEXT = $TEXT(@$PIECE(LINE,"^",1)^@$PIECE(LINE,"^",2)) were being parsed as errors. To fix this, I added a node that defined TEXT specifically, rather than the generic version.

Scanner Adjustments

I simplified the scanner logic for finding if there was an argumentless_command_end, _argumentless_loop, or _immediate_single_whitespace_followed_by_non_whitespace. I removed unnecessary logic that was duplicated and adjusted a few clauses, as before the scanner was incorrectly returning that something was an argumentless_command_end. Additionally, the scanner now provides a _termination token, which is used in the old style version of for loops and if statements.

Core Grammar Test Suite

I used all examples on the InterSystems documentation for ObjectScript to test that we are correctly parsing while loops, do statements, for loops (old and new, argumentless and argumentful), do statements, do while statements, if statements, and expressions.

Highlighting Issues

While most highlighting issues were caused by errors in the parse tree (now fixed), one issue was caused by something else. For commands like new $NAMESPACE, although namespace is a system defined variable, it wasn't classified as one in the command_new logic. Now, this could be fixed by just highlighting an entire command_new node, but what we really want is for all $NAMESPACE occurrences to be highlighted the same color. To make sure that all $NAMESPACE are colored the same, I made a namespace token that could be referenced. I did this for estack, etrap, and roles as well.

TESTING:

Tested all valid variants of the different parts of the core grammar that I changed:

image

Verified that highlighting is now working on bugs that were found before:

image

Tested highlighting for estack, etrap, and roles:

image

Tested highlighting for do statements:

image

Tested highlighting for do while statements:

image

Tested highlighting for if statements:

image

Tested highlighting in for loops:

image

Tested highlighting of different valid expressions:

image

The tree-sitter grammars for injection languages is public
and easy to get. I added instructions on how to include
those grammars in one's project if they are developing
in Rust.
The AST has changed following the numeric_literal change,
which allowed negative values. Updating the expected test
result with the new AST so tests still pass.
Right now, syntax highlighting only works on a class that extends
something else. However classes that don't extend anything are
also valid classes. This change will include those classes in
syntax highlighting as well.
@hkimura-intersys hkimura-intersys self-assigned this Nov 17, 2025
@hkimura-intersys hkimura-intersys changed the title fix: fix: fix grammar for while loops, for loops, if statements, do statements and add more tests to core grammar Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant