Optimize group_index_select_or_add_2d_kernel on ROCm by adding a separate codepath for small embedding dimensions #5233

aryaman-gupta · 2025-12-16T19:04:00Z

This PR improves the performance of group_index_select_or_add_2d_kernel on ROCm for tables with small embedding dimensions (i.e., small num_cols).

For tables with small embedding dimensions, the code is refactored to process multiple rows within the same warp. Two files are changed:

fbgemm_gpu/src/sparse_ops/sparse_ops_gpu.cpp - The host-side calculation of warp_offsets is changed.
fbgemm_gpu/src/sparse_ops/sparse_group_index.cu - group_index_select_or_add_2d_kernel is modified to process multiple rows within a warp for small embedding dimensions.
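
To make the idea of packing several rows into one warp concrete, here is a minimal host-side C++ sketch. It is not the code from this PR: the names rows_per_warp, lane_target, LaneTarget, and warp_offsets_for are hypothetical, the warp size is passed in as a parameter, and the sketch assumes one column per lane with no unrolling (the PR's commits mention UNROLL_FACTOR, which is ignored here). It only illustrates how lanes could map to (row, column) pairs and how per-table warp counts could be accumulated into offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only; these helpers are hypothetical and not taken from FBGEMM.

// Target element for one lane; row == -1 marks an idle lane.
struct LaneTarget {
  int64_t row;
  int64_t col;
};

// Whole rows that fit in one warp, assuming one column per lane.
// Rows wider than the warp fall back to 1 here as a simplification.
inline int64_t rows_per_warp(int64_t num_cols, int64_t warp_size) {
  return num_cols < warp_size ? warp_size / num_cols : 1;
}

// Lane-to-element mapping on the small-dimension path.
// Assumes 1 <= num_cols <= warp_size.
inline LaneTarget lane_target(int64_t warp_id, int64_t lane, int64_t num_cols,
                              int64_t num_rows, int64_t warp_size) {
  const int64_t rpw = rows_per_warp(num_cols, warp_size);
  const int64_t row = warp_id * rpw + lane / num_cols;
  // Lanes beyond the packed rows, or past the last row, do no work.
  const bool active = lane < rpw * num_cols && row < num_rows;
  return active ? LaneTarget{row, lane % num_cols} : LaneTarget{-1, -1};
}

// Exclusive prefix sum of the warps each table needs (small-path tables only).
inline std::vector<int64_t> warp_offsets_for(
    const std::vector<int64_t>& rows_per_table,
    const std::vector<int64_t>& cols_per_table, int64_t warp_size) {
  std::vector<int64_t> offsets(rows_per_table.size() + 1, 0);
  for (std::size_t t = 0; t < rows_per_table.size(); ++t) {
    const int64_t rpw = rows_per_warp(cols_per_table[t], warp_size);
    const int64_t warps = (rows_per_table[t] + rpw - 1) / rpw;  // ceil division
    offsets[t + 1] = offsets[t] + warps;
  }
  return offsets;
}
```

With a 64-lane warp and num_cols = 16, four rows share one warp in this sketch, so a 10-row table needs 3 warps instead of 10. Lanes that would fall past the last row are marked idle.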

meta-codesync · 2025-12-16T21:34:59Z

@q10 has imported this pull request. If you are a Meta employee, you can view this in D89316371.

aryaman-gupta added 3 commits December 12, 2025 15:09

adds optimized path for small dimension sizes to group_index_select_or_add_2d_kernel

85caa29

sparse_group_index.cu: edits some comments

ff1b9b6

adds USE_ROCM guards to subwarp optimizations for group_index_select_or_add_2d_kernel

439a51a

pytorch-bot bot added the module: rocm label Dec 16, 2025

meta-cla bot added the cla signed label Dec 16, 2025

aryaman-gupta added 3 commits December 18, 2025 10:11

sparse_group_index: handle UNROLL_FACTOR for small dimensions in group_index_select_or_add_2d_kernel

2a85d73

sparse_group_index: handle fixed-column-size case correctly in optimized small embedding dims path

2f54140

group_index_select_or_add_2d_kernel: when num_cols < UNROLL_FACTOR, disable optimized smallEmbD path

e0edc40
