-
Notifications
You must be signed in to change notification settings - Fork 63
Extend on-device sampling support for dual QPC VLMs #597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend on-device sampling support for dual QPC VLMs #597
Conversation
af8e673 to
df3501a
Compare
quic-hemagnih
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add the CI test cases.
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
df3501a to
d722a5a
Compare
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
d722a5a to
e06e175
Compare
Signed-off-by: quic-sanising <sanising@qti.qualcomm.com> Signed-off-by: sanising <sanising@qti.qualcomm.com>
900aee5 to
3e242ce
Compare
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
Signed-off-by: sanising <sanising@qti.qualcomm.com>
| QEffGPTJForCausalLM, | ||
| QEffGraniteForCausalLM, | ||
| QEffGraniteMoeForCausalLM, | ||
| QEffInternDecoderWrapper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we are enabling sampling only for intern model?
Will other VLMs also be supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other VLMs are also supposed to be supported. But currently only InternVL and Qwen VL 2.5 have been tested.
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
cc44ad0 to
ef9ae14
Compare
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
8d00cb1 to
5e2afb7
Compare
@quic-hemagnih CI added. Please review this PR again. Thank you! |
7d06a75 to
a0716fa
Compare
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
a0716fa to
10990a9
Compare
| top_ps: Optional[torch.Tensor] = None, | ||
| min_ps: Optional[torch.Tensor] = None, | ||
| random_numbers: Optional[torch.Tensor] = None, | ||
| vision_embeds: Optional[torch.Tensor] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add these both vision_embeds and image_idx in docs Args list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since vision_embeds and image_idx come from the original forward method for VLMs and are not specifically added for on-device sampling, and because the docstring is intended only for parameters introduced for on-device sampling support, function docsstring was not added for vision_embeds and image_idx. Instead, I added added a note stating:
"The vision_embeds and image_idx parameters are optional and are used only for VLMs when supported by the original forward function."
In addition, to make this clearer, I reordered the arguments so that vision_embeds and image_idx appear right after num_logits_to_keep, before the on-device sampling arguments.
| min_ps: Optional[torch.Tensor] = None, | ||
| random_numbers: Optional[torch.Tensor] = None, | ||
| vision_embeds: Optional[torch.Tensor] = None, | ||
| image_idx: Optional[torch.Tensor] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep dtype of these 2 consistent as per lines 27-28. also update function docstring for these newly added args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the dtype of these two parameters so that they are consistent with lines 27–28. For the function docstring, since vision_embeds and image_idx come from the original forward method for VLMs and are not specifically added for on-device sampling, and because the docstring is intended only for parameters introduced for on-device sampling support, function docsstring was not added for vision_embeds and image_idx. Instead, I added added a note stating:
"The vision_embeds and image_idx parameters are optional and are used only for VLMs when supported by the original forward function."
In addition, to make this clearer, I reordered the arguments so that vision_embeds and image_idx appear right after num_logits_to_keep, before the on-device sampling arguments.
|
Please resolve the conflicts. |
ochougul
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge if CI is passing
Signed-off-by: Mamta Singh <168400541+quic-mamta@users.noreply.github.com>
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
Done. |
0c1fddf to
ac48615
Compare
e5e509f to
2533262
Compare
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
0d0d04d to
86aaad2
Compare
Signed-off-by: quic-xiyushi <xiyushi@qti.qualcomm.com>
Overview
On-device sampling can significantly reduce host overhead and improve inference throughput; however, so far it has only been implemented for
QEffForCausalLMmodels. This PR extends on-device sampling support to the language decoder of dual QPC vision language models,QEffCausalLMForTextImageToTextModel. In addition, it fixes the bug in gumbel noise so that it correctly simulates a multinomial distribution for random sampling.Implementation details
Usage
The usage is the similar to enable on-device sampling for
QEffForCausalLM.