[responsesAPI][6] input/output messages for ResponsesParser #30158
base: main
Conversation
Signed-off-by: Andrew Xia <axia@fb.com>
Code Review
The pull request introduces input/output message tracking for ResponsesParser and enables enable_response_messages for ParsableContext. However, several issues were identified, including incorrect population of input_messages in multi-turn conversations, unresolved TODOs that indicate known parsing shortcomings, and a leftover debugger breakpoint in the code. These issues could lead to incorrect API responses and should be addressed.
    if maybe_validation_error is not None:
        return maybe_validation_error

    fbvscode.set_trace()  # leftover debugger breakpoint flagged by the review
    if len(self.input_messages) == 0:
        self.input_messages.append(
            ResponseRawMessageAndToken(
                message=output_prompt,
                tokens=output_prompt_token_ids,
            )
        )
The input_messages list is only populated during the first call to append_output due to the if len(self.input_messages) == 0: condition. In a multi-turn conversation, subsequent prompts (which are part of output.prompt) will not be added to input_messages. This will result in input_messages not accurately reflecting all prompts sent to the model across turns, which is likely unintended for a comprehensive input message log.
    self.input_messages.append(
        ResponseRawMessageAndToken(
            message=output_prompt,
            tokens=output_prompt_token_ids,
        )
    )

    # TODO: merge them in properly together
    # TODO: responsesParser doesn't parse kimi k2 sentences correctly
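The multi-turn logging gap described above can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: `MessageLog`, `record_prompt`, and `_logged_token_count` are hypothetical names, and the idea is simply to record the prompt on every turn while logging only the token suffix not yet seen.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseRawMessageAndToken:
    message: str
    tokens: list[int]


@dataclass
class MessageLog:
    """Minimal stand-in for the context's input-message tracking (hypothetical)."""
    input_messages: list[ResponseRawMessageAndToken] = field(default_factory=list)
    _logged_token_count: int = 0  # prompt tokens already recorded in earlier turns

    def record_prompt(self, prompt: str, prompt_token_ids: list[int]) -> None:
        # Unlike the `if len(self.input_messages) == 0` guard, this records the
        # prompt on every turn, appending only the previously unseen token suffix.
        new_tokens = prompt_token_ids[self._logged_token_count:]
        if new_tokens:
            self.input_messages.append(
                ResponseRawMessageAndToken(message=prompt, tokens=new_tokens)
            )
            self._logged_token_count = len(prompt_token_ids)
```

With this approach a second turn, whose cumulative prompt contains the first turn's tokens plus new ones, produces a second entry rather than being silently dropped.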
These TODO comments indicate known issues with merging messages and parsing kimi k2 sentences. Since output_messages is directly used in serving_responses.py (line 663), these unresolved issues could lead to incorrect or incomplete data being returned in the API response, especially in multi-turn scenarios. This impacts the correctness of the API output.
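One way the "merge them in properly" TODO could be resolved is to interleave per-turn input and output messages chronologically. The helper below is a hypothetical sketch (`merge_turns` is not a function in this PR), assuming one prompt entry and one output entry per turn:

```python
from dataclasses import dataclass


@dataclass
class ResponseRawMessageAndToken:
    message: str
    tokens: list[int]


def merge_turns(input_messages, output_messages):
    """Interleave per-turn input and output messages chronologically.

    Hypothetical helper: each turn contributes its prompt followed by the
    model output produced for that turn.
    """
    merged = []
    for i, inp in enumerate(input_messages):
        merged.append(("input", inp))
        if i < len(output_messages):
            merged.append(("output", output_messages[i]))
    # Any trailing outputs without a matching input are kept at the end.
    merged.extend(("output", out) for out in output_messages[len(input_messages):])
    return merged
```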
    )
    engine_prompt = engine_prompts[0]
    request_prompt = request_prompts[0]
    prompt_text, _, _ = self._get_prompt_components(request_prompt)
    input_messages = context.input_messages
    output_messages = context.output_messages
The input_messages and output_messages from the context are being directly assigned here. As noted in vllm/entrypoints/context.py, input_messages might not be fully populated for multi-turn conversations, and output_messages has unresolved TODOs regarding proper merging and parsing. This could lead to incomplete or incorrect data being returned when enable_response_messages is true.
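Since the raw message fields have these known gaps, one mitigation is to surface them only when the request explicitly opts in via enable_response_messages. The sketch below is hypothetical (`ResponseEnvelope` and `attach_raw_messages` are illustrative names, not vLLM API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseEnvelope:
    # Raw message fields default to None so they are omitted entirely
    # when the caller did not opt in.
    input_messages: Optional[list] = None
    output_messages: Optional[list] = None


def attach_raw_messages(response, context, enable_response_messages: bool):
    """Hypothetical sketch: copy raw input/output messages onto the response
    only when the request set enable_response_messages, keeping the
    known-incomplete data opt-in rather than always returned."""
    if enable_response_messages:
        response.input_messages = context.input_messages
        response.output_messages = context.output_messages
    return response
```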
Purpose
Test Plan
One shortcoming of ResponsesParser to note: the sequence `</think>\n\n[e~[\n]~b]` appears in the 2nd-turn prompt but not in the first output.

Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.