@aryaman-gupta
Contributor

This PR optimizes the ROCm performance of group_index_select_or_add_2d_kernel on tables with small embedding dimensions (i.e., small num_cols).

For tables with small embedding dimensions, the code is refactored to process multiple rows within the same warp. Two files are changed:

  1. fbgemm_gpu/src/sparse_ops/sparse_ops_gpu.cpp - The host-side calculation of warp_offsets is changed.
  2. fbgemm_gpu/src/sparse_ops/sparse_group_index.cu - The group_index_select_or_add_2d_kernel is modified to process multiple rows within a warp for small embedding dimensions.
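
To make the idea concrete, here is a minimal host-side sketch of how several rows might be packed into one warp and how per-table warp offsets could then be derived. It is not the code from this PR: TableShape, rows_per_warp, warps_for_table, compute_warp_offsets, and lane_assignment are all hypothetical names, and the sketch assumes one column per lane and a caller-supplied warp size (for example 64, a common wavefront size on ROCm hardware). The real kernel may use a different columns-per-warp factor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one table in the group (not an FBGEMM type).
struct TableShape {
  int64_t num_rows;  // number of indexed rows to process
  int64_t num_cols;  // embedding dimension
};

// Rows one warp can cover, assuming each lane handles exactly one column.
int64_t rows_per_warp(int64_t num_cols, int64_t warp_size) {
  assert(num_cols > 0 && warp_size > 0);
  return num_cols < warp_size ? warp_size / num_cols : 1;
}

// Warps needed for one table under the packing assumption above.
int64_t warps_for_table(const TableShape& t, int64_t warp_size) {
  if (t.num_cols < warp_size) {
    // Small embedding dimension: several rows share a warp.
    const int64_t rpw = rows_per_warp(t.num_cols, warp_size);
    return (t.num_rows + rpw - 1) / rpw;
  }
  // Large embedding dimension: one or more warps per row.
  const int64_t warps_per_row = (t.num_cols + warp_size - 1) / warp_size;
  return t.num_rows * warps_per_row;
}

// Exclusive prefix sum of per-table warp counts; the last entry is the total.
std::vector<int64_t> compute_warp_offsets(
    const std::vector<TableShape>& tables, int64_t warp_size) {
  std::vector<int64_t> offsets(tables.size() + 1, 0);
  for (std::size_t i = 0; i < tables.size(); ++i) {
    offsets[i + 1] = offsets[i] + warps_for_table(tables[i], warp_size);
  }
  return offsets;
}

// Work item for a single lane in the small-dimension case.
struct LaneWork {
  int64_t row;
  int64_t col;
  bool active;  // false for padding lanes and rows past the end of the table
};

// Map (warp index within the table, lane) to a (row, col) pair.
LaneWork lane_assignment(int64_t warp_in_table, int64_t lane,
                         const TableShape& t, int64_t warp_size) {
  assert(t.num_cols > 0 && t.num_cols <= warp_size);
  assert(lane >= 0 && lane < warp_size);
  const int64_t rpw = warp_size / t.num_cols;
  const int64_t local_row = lane / t.num_cols;
  const int64_t row = warp_in_table * rpw + local_row;
  return {row, lane % t.num_cols, local_row < rpw && row < t.num_rows};
}
```

Under these assumptions, a table with num_cols = 16 and a warp size of 64 packs four rows into each warp, so ten rows need three warps rather than ten. Lanes that fall outside the packed rows, or past the last row, are marked inactive.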

@meta-codesync
Contributor

meta-codesync bot commented Dec 16, 2025

@q10 has imported this pull request. If you are a Meta employee, you can view this in D89316371.
