@michaelfeil (Contributor)

This PR extends the Qwen2 architecture to models other than `Alibaba-NLP/gte-Qwen2-7B-instruct`, given that the prior implementation covered only that model: it used causal attention on CUDA and relied on the provided tokenizer rather than patching it.
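
For context, here is a minimal sketch of what moving from a hardcoded model id to config-driven dispatch could look like. This is not the PR's actual code; `AttentionKind`, `ModelConfig`, and `resolve_attention` are hypothetical names used only for illustration:

```rust
// Hypothetical sketch only; the PR's actual types and names may differ.
// The idea: decide causal vs. bidirectional attention from the model's
// config instead of hardcoding `Alibaba-NLP/gte-Qwen2-7B-instruct`.

#[derive(Debug, Clone, Copy, PartialEq)]
enum AttentionKind {
    Causal,
    Bidirectional,
}

// Illustrative stand-in for the fields a model loader might inspect.
struct ModelConfig {
    model_id: String,
    is_causal: bool,
}

fn resolve_attention(config: &ModelConfig) -> AttentionKind {
    // Generalized path: trust the config rather than a single model id.
    if config.is_causal {
        AttentionKind::Causal
    } else {
        AttentionKind::Bidirectional
    }
}

fn main() {
    let config = ModelConfig {
        model_id: "Alibaba-NLP/gte-Qwen2-7B-instruct".to_string(),
        is_causal: true,
    };
    assert_eq!(resolve_attention(&config), AttentionKind::Causal);
    println!("{:?} uses {:?}", config.model_id, resolve_attention(&config));
}
```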

# What does this PR do?
- Makes flash-attn-3 and flash-attn-cpu easier to add (see the sketch below).
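
As an illustration of why a pluggable backend helps here, below is a hedged sketch of a trait-based abstraction; `AttentionBackend`, `FlashAttnV3`, and `FlashAttnCpu` are illustrative names, not the crate's real API:

```rust
// Illustrative sketch only, assuming a trait-based backend abstraction;
// the actual crate's types and dispatch may differ.
trait AttentionBackend {
    fn name(&self) -> &'static str;
    // Real signatures would take device tensors; slices keep the sketch self-contained.
    fn forward(&self, qkv: &[f32]) -> Vec<f32>;
}

struct FlashAttnV3;
struct FlashAttnCpu;

impl AttentionBackend for FlashAttnV3 {
    fn name(&self) -> &'static str { "flash-attn-3" }
    fn forward(&self, qkv: &[f32]) -> Vec<f32> {
        // Would call into the FA3 CUDA kernels here.
        qkv.to_vec()
    }
}

impl AttentionBackend for FlashAttnCpu {
    fn name(&self) -> &'static str { "flash-attn-cpu" }
    fn forward(&self, qkv: &[f32]) -> Vec<f32> {
        // Would run a CPU reference path here.
        qkv.to_vec()
    }
}

// Adding another backend is then just one more `impl`, with no changes
// to the model code that consumes the trait.
fn pick_backend(cuda_available: bool) -> Box<dyn AttentionBackend> {
    if cuda_available {
        Box::new(FlashAttnV3)
    } else {
        Box::new(FlashAttnCpu)
    }
}

fn main() {
    let backend = pick_backend(false);
    println!("selected backend: {}", backend.name());
}
```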

Fixes # (issue)

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/text-embeddings-inference/blob/main/CONTRIBUTING.md)?
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs).
- [ ] Did you write any new necessary tests? If applicable, did you include or update the `insta` snapshots?

## Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
