Skip to content

Conversation

@Carbonhell
Copy link

@Carbonhell Carbonhell commented Dec 22, 2025

Closes #494.

I wasn't sure whether to use IDs from the pull_request table to represent PRs in the mapping, or just the GitHub PR number - I went with the latter as there would be no record for the rollup at the time of its creation, as that gets populated via GitHub events, if I understood correctly.

I haven't experienced rollups at all as I haven't (yet?) contributed to a project with bors, but I noticed there's no safety check for multiple rollups containing the same PR(s) being opened. I found a related comment here, but I kept it out of this PR to reduce its scope and simplify review. I think such a safety check could be added easily by checking for rollups associated to each candidate PR, and verifying the status of the rollup PR to ignore closed ones. That would still allow one to recreate the same scenario by reopening a previously closed rollup, though.

One thing I'm unsure about is how to handle the error when writing the rollup_contents records in the DB. I think that should only happen either in case of DB connection errors, or if the list of PRs contains duplicates. I assume the former can be propagated to the caller, whereas I'm not sure if the latter is realistic and should be accounted for. I'm thinking we can just ignore the conflicting records in such case - let me know if I should proceed with that.

Edit: sorry, I forgot the migration test files. Will push them tomorrow as soon as possible

…nsert/get logic

chore(git): add relevant gitattributes to ensure correct line endings for sqlx files
@Kobzol
Copy link
Member

Kobzol commented Dec 22, 2025

Hi, thanks for the PR! I'll take a look, but I don't know when I'll be able to get to it, since it's Holiday season :)

I wasn't sure whether to use IDs from the pull_request table to represent PRs in the mapping, or just the GitHub PR number - I went with the latter as there would be no record for the rollup at the time of its creation, as that gets populated via GitHub events, if I understood correctly.

Yeah, that makes sense.

but I noticed there's no safety check for multiple rollups containing the same PR(s) being opened.

This is not a problem on its own, there are often multiple rollups that contain a single PR, because if a rollup fails, you create a new one, often before the old one is closed. I don't think that we need to think about that right now.

if the list of PRs contains duplicates. I assume the former can be propagated to the caller, whereas I'm not sure if the latter is realistic and should be accounted for.

Also a good point! I'll ignore duplicate PR numbers in rollups.

Copy link
Member

@Kobzol Kobzol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the main query that we will want to execute on the queue page would look something like this:

SELECT
    rollup_pr_number, member_pr_number
FROM rollup_member
WHERE repository = $1 AND
      rollup_pr_number IN (SELECT number
                           FROM pull_request AS pr
                           WHERE pr.repository = $1
                             AND pr.status IN ('open', 'draft'))
ORDER BY rollup_pr_number;

And we get back something like HashMap<PrNum, HashSet<PrNum>>, which maps rollups to their member PRs. Because we assume that there will be only a couple of open rollups, so it seems easier and faster to do it this way, rather than try to find a corresponding rollup for each PR in the queue.

@@ -0,0 +1 @@
/.sqlx/*.json binary linguist-generated=true No newline at end of file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have any specific use-case for this? GitHub already marked them as generated before.

Copy link
Author

@Carbonhell Carbonhell Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes, it's mostly for the binary part, I have a windows workstation and the sqlx command to generate those files was causing diffs due to line endings. I copied this file from the docs.rs repo to avoid this issue. I can remove it from the PR as it's out of scope, if you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Let's keep it then, no real harm in that.

/// Register the contents of a rollup in the DB.
/// The contents are stored as rollup PR number - member PR number records, scoped under a specific repository.
/// All records are inserted in a single transaction.
pub async fn create_rollup(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub async fn create_rollup(
pub async fn register_rollup_members(

I think that's a bit less misleading.

@@ -0,0 +1,11 @@
CREATE TABLE IF NOT EXISTS rollup_contents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the member terminology that you have used, so let's go with it.

Suggested change
CREATE TABLE IF NOT EXISTS rollup_contents
CREATE TABLE IF NOT EXISTS rollup_member

}

/// Returns the numbers of all the PRs associated with the requested rollup.
pub async fn get_rollup_pr_contents(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the main operation that we will want to do is load members information for a set of PRs that will be displayed on the queue page. So I'm thinking something like:

  1. Load non-closed PRs on the queue page endpoint
  2. Do a query that will fetch all members for all rollups in those PRs.

And then in the frontend, we'll build a reverse index, so that we can also associate a PR with its rollup(s).

As an alternative to 1) + 2), we could directly attach a bool whether a given PR is a rollup or not to the return value of 1), and then only ask for members from actual rollups, to reduce the DB traffic slightly. But we'd need to take a look at the query plans to see if it makes sense. And probably perf. will be fine with both approaches.


PRIMARY KEY (repository, rollup_pr_number, member_pr_number)
);
CREATE INDEX rollup_contents_member_pr_number_idx ON rollup_contents (repository, member_pr_number);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my other comment, I think that it makes more sense to create the index on the rollup_pr_number attribute, as we will be loading members of rollups, and not looking up rollups for individual PRs, at least for the queue page.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will proceed removing this index, then. For the index on the rollup_pr_attribute I assume you mean including the repository attribute as well, like the query in your review. That should already be covered by the index generated by the primary key, as I've purposefully ordered the columns by putting the member_pr_number attribute last, as per https://www.postgresql.org/docs/current/indexes-multicolumn.html .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, also including repository. Fair enough, as long as the index is used for the query I posted, it should be fine.

I ran some simulations with 20k PRs and 3k rollup_member rows on a local Postgres instance, and EXPLAIN ANALYZE was still choosing a sequential scan over rollup members anyway. So maybe the index won't be even needed so much. But that shouldn't be an issue for some time anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Store mapping of PRs to rollups

2 participants