@kernel-patches-daemon-bpf-rc

Pull request for series with
subject: xsk: introduce pre-allocated memory per xsk CQ
version: 2
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1033607

@kernel-patches-daemon-bpf-rc

Upstream branch: 6f0b824
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1033607
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: e7a0adb
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1033607
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: ec439c3
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1033607
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: 3d60306
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1033607
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: d2749ae
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1033607
version: 2

@kernel-patches-daemon-bpf-rc

Upstream branch: f785a31
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1033607
version: 2

This is a preparatory patch. The pre-allocated memory will be used to
store the addr(s) of descriptors so that each skb reaching the end of its
life can publish the corresponding addr(s) in its completion queue, where
userspace can read them.

Signed-off-by: Jason Xing <kernelxing@tencent.com>

Before commit 30f241f ("xsk: Fix immature cq descriptor production"),
there was an issue[1] that caused descriptors to be published
incorrectly under a race condition. That commit fixes the issue but
adds more memory operations in the xmit hot path and in interrupt
context, which can hurt performance.

Building on the existing infrastructure, this patch proposes a new
solution to the problem: a pre-allocated memory area, the local
completion queue, which avoids performing those memory operations
frequently. The benefit comes from replacing xsk_tx_generic_cache with
the local cq.

The core logic is as shown below:
1. Allocate a new local completion queue when setting the real queue.
2. Write the descriptors into the local cq in the xmit path, and
   record the prod as @start_pos, which reflects the start position of
   the skb in this queue, so that the skb can later copy its desc
   addr(s) from the local cq to the cq addrs in the destruction phase.
3. Initialize the upper 24 bits of destructor_arg to store @start_pos
   in xsk_skb_init_misc().
4. Initialize the lower 8 bits of destructor_arg to store how many
   descriptors the skb owns in xsk_inc_num_desc().
5. Write the desc addr(s), starting at @start_pos in the local cq,
   one by one into the real cq in xsk_destruct_skb(). Then sync
   the global state of the cq as before.
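
The steps above can be modelled in plain userspace C. This is only a
rough sketch under stated assumptions, not the code from the patch: the
struct names, the ring sizes and the helpers xmit_add_desc() and
destruct_skb() are hypothetical, and locking as well as the real
cq state sync are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring sizes (powers of two so that masking works). */
#define LCQ_SIZE 8u
#define CQ_SIZE  8u

/* Stand-in for the pre-allocated local completion queue (step 1). */
struct local_cq {
	uint64_t desc[LCQ_SIZE];
	uint32_t prod;
};

/* Stand-in for the real completion queue read by userspace. */
struct real_cq {
	uint64_t addrs[CQ_SIZE];
	uint32_t prod;
};

/* Stand-in for the per-skb state kept in destructor_arg (steps 3, 4). */
struct fake_skb {
	uint32_t start_pos;	/* where this skb starts in the local cq */
	uint32_t num;		/* how many descriptors this skb owns */
};

/* Step 2: xmit path, store one desc addr in the local cq. */
static void xmit_add_desc(struct local_cq *lcq, struct fake_skb *skb,
			  uint64_t addr)
{
	assert(skb->num < 255u);		/* count must fit in 8 bits */
	if (skb->num == 0)
		skb->start_pos = lcq->prod;	/* record prod as start_pos */
	lcq->desc[lcq->prod & (LCQ_SIZE - 1u)] = addr;
	lcq->prod++;
	skb->num++;
}

/* Step 5: destruction phase, copy addrs from the local cq to the real cq. */
static void destruct_skb(const struct local_cq *lcq, struct real_cq *cq,
			 const struct fake_skb *skb)
{
	for (uint32_t i = 0; i < skb->num; i++) {
		uint32_t pos = (skb->start_pos + i) & (LCQ_SIZE - 1u);

		cq->addrs[cq->prod & (CQ_SIZE - 1u)] = lcq->desc[pos];
		cq->prod++;
	}
}
```

In this model the skbs may be destructed in any order: each one only
reads its own [start_pos, start_pos + num) window of the local cq, so no
allocation is needed in the xmit or destruction path.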

The format of destructor_arg is designed as:
 ------------------------ --------
|       start_pos        |  num   |
 ------------------------ --------
The upper 24 bits are enough to hold the position of the temporary
descriptors in the local cq, and the lower 8 bits are enough to hold
the number of descriptors that one skb owns.
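
The 24/8 split can be illustrated with a few bit operations. This is a
minimal sketch, assuming the value is handled as a 32-bit integer; the
darg_* helpers and macros are hypothetical names, not functions from
the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DARG_NUM_BITS      8u
#define DARG_NUM_MASK      ((1u << DARG_NUM_BITS) - 1u)	/* lower 8 bits */
#define DARG_START_POS_MAX ((1u << 24) - 1u)		/* upper 24 bits */

/* Pack start_pos into the upper 24 bits and num into the lower 8 bits. */
static uint32_t darg_pack(uint32_t start_pos, uint32_t num)
{
	assert(start_pos <= DARG_START_POS_MAX);
	assert(num <= DARG_NUM_MASK);
	return (start_pos << DARG_NUM_BITS) | num;
}

/* Extract the start position of the skb in the local cq. */
static uint32_t darg_start_pos(uint32_t arg)
{
	return arg >> DARG_NUM_BITS;
}

/* Extract the number of descriptors the skb owns. */
static uint32_t darg_num(uint32_t arg)
{
	return arg & DARG_NUM_MASK;
}

/* Count one more descriptor without touching start_pos. */
static uint32_t darg_inc_num(uint32_t arg)
{
	assert(darg_num(arg) < DARG_NUM_MASK);	/* no carry into start_pos */
	return arg + 1u;
}
```

Because num sits in the lowest bits, counting one more descriptor is a
plain increment as long as the count stays below 255.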

[1]: https://lore.kernel.org/all/20250530095957.43248-1-e.kubanski@partner.samsung.com/

Signed-off-by: Jason Xing <kernelxing@tencent.com>