
Conversation

@qandrew (Contributor) commented Dec 5, 2025

Purpose

Test Plan

One shortcoming of ResponsesParser: the sequence </think>\n\n[e~[\n]~b] appears in the second-turn prompt but not in the first turn's output, so the first output does not round-trip verbatim into the next prompt (see the sketch after the dumps below).

output.prompt:
']~!b[]~b]system\nYou are a helpful assistant.\n\n# Tools\nYou may call one or more tools to assist with the user query.\nHere are the tools available in JSONSchema format:\n\n<tools>\n<tool>{"server_label": "code_interpreter", "type": "mcp", "allowed_tools": null, "authorization": null, "connector_id": null, "headers": {"test": "test"}, "require_approval": null, "server_description": null, "server_url": "IGNORED"}</tool>\n</tools>\n\nWhen making tool calls, use XML format to invoke tools and pass parameters:\n\n<minimax:tool_call>\n<invoke name="tool-name-1">\n<parameter name="param-key-1">param-value-1</parameter>\n<parameter name="param-key-2">param-value-2</parameter>\n...\n</invoke>\n</minimax:tool_call>[e~[\n]~b]user\nMultiply 64548*15151 using the python tool.[e~[\n]~b]ai\n<think>\nThe user explicitly asked to multiply 64548 by 15151 "using the python tool". I have a `code_interpreter` tool available, which is designed for executing Python code. So, the logical step is to call `code_interpreter` with the appropriate Python code to perform this multiplication.\n\n</think>\n\n[e~[\n]~b]ai\n\n<minimax:tool_call>\n<invoke name="code_interpreter">\n<parameter name="code">64548 * 15151</parameter>\n</invoke>\n</minimax:tool_call>[e~[\n]~b]tool\n<response>977966748\n\n</response>[e~[\n]~b]ai\n<think>\n'
self.output_messages.message:
'The user explicitly asked to multiply 64548 by 15151 "using the python tool". I have a `code_interpreter` tool available, which is designed for executing Python code. So, the logical step is to call `code_interpreter` with the appropriate Python code to perform this multiplication.\n</think>\n\n\n<minimax:tool_call>\n<invoke name="code_interpreter">\n<parameter name="code">64548 * 15151</parameter>\n</invoke>\n</minimax:tool_call>'
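
A minimal, self-contained illustration of the mismatch, with strings abridged from the dumps above (a sketch, not an actual test in the PR):

# Abridged: the 2nd-turn prompt inserts the "[e~[\n]~b]" turn delimiter
# right after "</think>\n\n", so the raw 1st-turn output is not a
# verbatim substring of the next prompt.
first_output = 'reasoning...\n</think>\n\n\n<minimax:tool_call>...'
second_prompt = '...<think>\nreasoning...\n\n</think>\n\n[e~[\n]~b]ai\n\n<minimax:tool_call>...'
assert first_output not in second_prompt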

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison of results, or e2e results.
  • (Optional) Necessary documentation updates, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft in the Google Doc.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces input/output message tracking for ResponsesParser and enables enable_response_messages for ParsableContext. However, several issues were identified: incorrect population of input_messages in multi-turn conversations, unresolved TODOs indicating known parsing shortcomings, and a leftover debugger breakpoint. These could lead to incorrect API responses and should be addressed.

if maybe_validation_error is not None:
return maybe_validation_error

fbvscode.set_trace()

critical

A debugger breakpoint (fbvscode.set_trace()) has been left in the code. This should be removed before merging to avoid unexpected behavior or blocking execution in a production environment.
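
The fix is a one-line deletion; the quoted block would then read:

if maybe_validation_error is not None:
    return maybe_validation_error
# (fbvscode.set_trace() deleted: a leftover breakpoint halts every
# request that reaches this path in a server environment.)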

Comment on lines +279 to +285
if len(self.input_messages) == 0:
self.input_messages.append(
ResponseRawMessageAndToken(
message=output_prompt,
tokens=output_prompt_token_ids,
)
)

high

The input_messages list is only populated during the first call to append_output due to the if len(self.input_messages) == 0: condition. In a multi-turn conversation, subsequent prompts (which are part of output.prompt) will not be added to input_messages. This will result in input_messages not accurately reflecting all prompts sent to the model across turns, which is likely unintended for a comprehensive input message log.

Suggested change:

            self.input_messages.append(
                ResponseRawMessageAndToken(
                    message=output_prompt,
                    tokens=output_prompt_token_ids,
                )
            )
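
Beyond the suggestion above: since each turn's output.prompt carries the full accumulated conversation, appending it wholesale on every turn would also duplicate earlier turns. A delta-based sketch, reusing the names from the snippet above (the slicing bookkeeping is illustrative, not the PR's code):

# Sketch: append only the token suffix not yet recorded, so each entry
# in input_messages corresponds to one turn's newly added prompt text.
seen = sum(len(m.tokens) for m in self.input_messages)
new_tokens = output_prompt_token_ids[seen:]
if new_tokens:
    self.input_messages.append(
        ResponseRawMessageAndToken(
            message=output_prompt,  # full prompt kept here; text could be sliced analogously
            tokens=new_tokens,
        )
    )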

Comment on lines +287 to +288
# TODO: merge them in properly together
# TODO: responsesParser doesn't parse kimi k2 sentences correctly

high

These TODO comments indicate known issues with merging messages and parsing kimi k2 sentences. Since output_messages is directly used in serving_responses.py (line 663), these unresolved issues could lead to incorrect or incomplete data being returned in the API response, especially in multi-turn scenarios. This impacts the correctness of the API output.
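
For reference, the output_messages dump in the PR description shows the </think> delimiter left embedded in the message text. One illustrative way to split it out (a sketch of the idea, not the parser's actual API):

# Sketch: split a raw completion at the first "</think>" so reasoning
# and content (e.g. tool-call markup) become separate messages.
def split_reasoning(raw: str) -> tuple[str, str]:
    head, sep, tail = raw.partition("</think>")
    if not sep:  # no reasoning block present
        return "", raw
    return head.strip(), tail.strip()

reasoning, content = split_reasoning(
    "The user asked...\n</think>\n\n<minimax:tool_call>...</minimax:tool_call>"
)
# reasoning == "The user asked..."
# content  == "<minimax:tool_call>...</minimax:tool_call>"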

)
engine_prompt = engine_prompts[0]
request_prompt = request_prompts[0]
prompt_text, _, _ = self._get_prompt_components(request_prompt)

high

The variable prompt_text is extracted here but is not used anywhere in the subsequent code within this elif block. This is dead code and should be removed to maintain code cleanliness and prevent confusion.

Comment on lines +662 to +663
input_messages = context.input_messages
output_messages = context.output_messages

high

The input_messages and output_messages from the context are being directly assigned here. As noted in vllm/entrypoints/context.py, input_messages might not be fully populated for multi-turn conversations, and output_messages has unresolved TODOs regarding proper merging and parsing. This could lead to incomplete or incorrect data being returned when enable_response_messages is true.
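
A defensive sketch of how the handler could gate these fields while the TODOs stand, assuming the surrounding handler's request and logger objects (the warning and fallback are illustrative, not the PR's behavior):

# Sketch: only surface raw messages when explicitly requested, and warn
# that multi-turn input tracking is known to be incomplete.
if request.enable_response_messages:
    logger.warning(
        "input_messages may omit later-turn prompts; see TODOs in context.py"
    )
    input_messages = context.input_messages
    output_messages = context.output_messages
else:
    input_messages = None
    output_messages = None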


Labels

frontend, gpt-oss (Related to GPT-OSS models)

Projects

Status: To Triage
