Better grader error display #376

Matistjati · 2025-12-06T19:06:02Z

This PR does 3 things:

- Rewrites some awkward open() code to be crash-safe
- Prints stderr in case the grader crashes
- Removes support for multiple graders
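To illustrate the first two points, here is a minimal sketch and not the code in this PR. The function name `run_grader`, the temp file names and the `'ok'`/`'crashed'` status strings are all made up for the example. It only shows the general pattern: open files through `with` so handles are closed even if something raises, and keep the grader's stderr so it can be shown when the grader crashes.

```python
import os
import subprocess
import sys
import tempfile


def run_grader(cmd, grader_input):
    """Run cmd with grader_input on stdin; return (status, stdout, stderr).

    Hypothetical helper for illustration, not a problemtools API.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        in_path = os.path.join(tmpdir, 'grader_in')
        err_path = os.path.join(tmpdir, 'grader_err')
        # 'with' closes each handle even if writing or running raises.
        with open(in_path, 'w') as f:
            f.write(grader_input)
        with open(in_path) as stdin, open(err_path, 'w') as stderr:
            proc = subprocess.run(cmd, stdin=stdin, stdout=subprocess.PIPE,
                                  stderr=stderr, text=True)
        # Read stderr back so the caller can print it on a crash.
        with open(err_path) as f:
            err_text = f.read()
    status = 'ok' if proc.returncode == 0 else 'crashed'
    return status, proc.stdout, err_text


# Usage: a stand-in "grader" that fails and writes to stderr.
crashing = [sys.executable, '-c',
            'import sys; sys.stderr.write("boom\\n"); sys.exit(1)']
status, out, err = run_grader(crashing, 'AC 1.0\n')
if status == 'crashed':
    print('ERROR Grader stderr:')
    print(err)
```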

Sample outputs:

Differing grader outputs:

ERROR Different graders gave different results:
ERROR grader.py: AC 0.0
ERROR grader2.py: AC 1.0

Grader stderr (bizarre error, I know):

Checking submissions
ERROR Judge error: grader.py (Python 3 (w/PyPy3)) crashed
ERROR Grader stderr:
==134096==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==134096==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v


ERROR Judge error: grader.py (Python 3 (w/PyPy3)) crashed
ERROR Grader stderr:
==134097==ERROR: AddressSanitizer failed to allocate 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno: 12)
==134097==ReserveShadowMemoryRange failed while trying to map 0xdfff0001000 bytes. Perhaps you're using ulimit -v
...

Matistjati · 2025-12-06T20:00:29Z

Let's hold off on merging this until we get some input on the intended behavior for multiple graders in Kattis/problem-package-format#529.

Matistjati

Fredrik and I agree that there should be at most one grader, and that we should add this to the spec. We decided not to rename the folder to grader, for backwards compatibility, so I didn't rename the class either.
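As a rough sketch of what "at most one grader" could look like when scanning a package (an assumption for illustration, not the code in this PR): the function name `find_single_grader`, the rule of skipping dotfiles, and raising `ValueError` are all hypothetical choices.

```python
import tempfile
from pathlib import Path


def find_single_grader(grader_dir):
    """Return the one grader in grader_dir, or None if there is none.

    Hypothetical helper: raises ValueError if several candidates exist.
    """
    directory = Path(grader_dir)
    if not directory.is_dir():
        return None
    # Assumption: hidden files are not graders.
    candidates = sorted(p for p in directory.iterdir()
                        if not p.name.startswith('.'))
    if len(candidates) > 1:
        names = ', '.join(p.name for p in candidates)
        raise ValueError(f'Expected at most one grader, found: {names}')
    return candidates[0] if candidates else None


# Usage with a throwaway directory containing a single grader.
with tempfile.TemporaryDirectory() as tmpdir:
    (Path(tmpdir) / 'grader.py').write_text('')
    print(find_single_grader(tmpdir).name)
```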

Matistjati · 2025-12-07T15:38:22Z

problemtools/verifyproblem.py

+        if not grader.compile()[0]:
+            self.fatal(f'Failed to compile grader {grader}')
+            return ('JE', None)


We should really document what invariants we have for check, setup and what their purposes are.

This check is already performed in check. Do we want an invariant that check is always called before grade? To me, it feels strange that compilation happens as part of check; does it not make more sense to compile as part of setup?

My inferred view is that we keep setup cheap so that we don't have to pay for problem parts we don't use. But then, are we guaranteed that check is always called before calling the part's "do work" function? I would assume yes, in which case we can remove this check.

gkreitz · 2025-12-08

Yes, we should document this better. As I mentioned on Slack, I'm looking to do a larger refactor at some point (separating the parsing of a problem package from the various checks we have), which should hopefully resolve that.

Until then, my understanding is that setup is basically a constructor. It should probably only fail when things are in such a bad state that we're not sure we're making sense of what we're seeing. It is also called from the constructor, so it is safe to assume it has been run before check. That said, setup should basically never do any actual checks (as we may need to run setup on aspects which the user has explicitly told us not to check).
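A minimal sketch of that split, under stated assumptions: the class `ProblemPartSketch`, its attributes, and making check run at most once are all hypothetical and not how verifyproblem.py is written. It only illustrates "setup is called from the constructor and stays check-free, check is separate and optional".

```python
class ProblemPartSketch:
    """Hypothetical part: setup() runs in the constructor, check() is opt-in."""

    def __init__(self, name):
        self.name = name
        self.errors = []
        self._check_done = False
        self.setup()  # always runs, so it must stay cheap and check-free

    def setup(self):
        # Only record state here: no validation and no compilation.
        self.sources = []

    def check(self):
        # Assumption: validation runs at most once, later calls reuse it.
        if self._check_done:
            return not self.errors
        self._check_done = True
        if not self.sources:
            self.errors.append(f'{self.name}: no sources found')
        return not self.errors


part = ProblemPartSketch('graders')
print(part.check(), part.errors)
```

With this shape, code that needs the part's state can rely on setup having run, but cannot rely on check having run unless that is documented as an invariant.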

Matistjati · 2025-12-07T15:40:32Z

problemtools/verifyproblem.py

+            if not self._default_grader:
+                self.fatal('Failed to locate default grader')
+                return ('JE', None)
+            grader = self._default_grader
        else:
-            graders = self._graders
+            if not self._grader:
+                self.fatal('Problem has grading: custom without any custom grader')
+                return ('JE', None)
+            grader = self._grader


In TestcaseGroup, there are already checks for this. Do we want to assume that TestcaseGroup's check function must be called before running grade, so these checks can be removed? In that case, should we document that dependency?

The code there is

    if self.config['grading'] == 'custom' and self._problem.graders._grader is None:
        self._problem.graders.fatal(f'{self} has custom grading but no custom graders provided')
    if self.config['grading'] == 'default' and Graders._default_grader is None:
        self._problem.graders.fatal(f'{self} has default grading but I could not find default grader')

gkreitz · 2025-12-08

The order of checks is still a bit of a mess. I think with the current state of the code base, it's better to be defensive and duplicate checks like this, particularly fatal checks (so we're not gonna spam twice when this happens, as we're going to stop execution).
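A small sketch of the defensive variant, assuming a simplified stand-in class: `GradersSketch`, its `fatal_messages` list and the callable graders are hypothetical, and only the two error strings and the `('JE', None)` result are taken from the diff above. The point is that grade guards against a missing grader itself instead of relying on an earlier check having run.

```python
class GradersSketch:
    """Hypothetical stand-in for the grader handling discussed above."""

    def __init__(self, grader=None, default_grader=None):
        self._grader = grader
        self._default_grader = default_grader
        self.fatal_messages = []

    def fatal(self, message):
        # Stand-in for real error reporting: only records the message.
        self.fatal_messages.append(message)

    def grade(self, grading, sub_results):
        # Duplicate the existence check here so grade() is safe on its own.
        if grading == 'default':
            grader = self._default_grader
            problem = 'Failed to locate default grader'
        else:
            grader = self._grader
            problem = 'Problem has grading: custom without any custom grader'
        if not grader:
            self.fatal(problem)
            return ('JE', None)
        return grader(sub_results)


graders = GradersSketch(default_grader=lambda results: ('AC', 1.0))
print(graders.grade('default', []))
print(graders.grade('custom', []), graders.fatal_messages)
```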

Matistjati added 2 commits December 6, 2025 20:03

Better grader error display

32a59c1

Minor fixes

93b18d7

gkreitz approved these changes Dec 6, 2025

Remove support for multiple graders

1a0bddf

Matistjati commented Dec 7, 2025

Matistjati added 2 commits December 7, 2025 16:44

Use f-string

0044160

Rewrite some if:s to be more DRY

f3dd242

Matistjati mentioned this pull request Dec 8, 2025

Display warning when grader compilation fails when testing submissions #133

Closed

Better error message for compile error

01a3865

gkreitz merged commit 8e7148e into Kattis:master Dec 11, 2025
5 checks passed
