-
Notifications
You must be signed in to change notification settings - Fork 41
feat(rollups): add rollup_contents mapping #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…nsert/get logic chore(git): add relevant gitattributes to ensure correct line endings for sqlx files
|
Hi, thanks for the PR! I'll take a look, but I don't know when I'll be able to get to it, since it's Holiday season :)
Yeah, that makes sense.
This is not a problem on its own, there are often multiple rollups that contain a single PR, because if a rollup fails, you create a new one, often before the old one is closed. I don't think that we need to think about that right now.
Also a good point! I'll ignore duplicate PR numbers in rollups. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the main query that we will want to execute on the queue page would look something like this:
SELECT
rollup_pr_number, member_pr_number
FROM rollup_member
WHERE repository = $1 AND
rollup_pr_number IN (SELECT number
FROM pull_request AS pr
WHERE pr.repository = $1
AND pr.status IN ('open', 'draft'))
ORDER BY rollup_pr_number;And we get back something like HashMap<PrNum, HashSet<PrNum>>, which maps rollups to their member PRs. Because we assume that there will be only a couple of open rollups, so it seems easier and faster to do it this way, rather than try to find a corresponding rollup for each PR in the queue.
| @@ -0,0 +1 @@ | |||
| /.sqlx/*.json binary linguist-generated=true No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you have any specific use-case for this? GitHub already marked them as generated before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes, it's mostly for the binary part, I have a windows workstation and the sqlx command to generate those files was causing diffs due to line endings. I copied this file from the docs.rs repo to avoid this issue. I can remove it from the PR as it's out of scope, if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Let's keep it then, no real harm in that.
| /// Register the contents of a rollup in the DB. | ||
| /// The contents are stored as rollup PR number - member PR number records, scoped under a specific repository. | ||
| /// All records are inserted in a single transaction. | ||
| pub async fn create_rollup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pub async fn create_rollup( | |
| pub async fn register_rollup_members( |
I think that's a bit less misleading.
| @@ -0,0 +1,11 @@ | |||
| CREATE TABLE IF NOT EXISTS rollup_contents | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the member terminology that you have used, so let's go with it.
| CREATE TABLE IF NOT EXISTS rollup_contents | |
| CREATE TABLE IF NOT EXISTS rollup_member |
| } | ||
|
|
||
| /// Returns the numbers of all the PRs associated with the requested rollup. | ||
| pub async fn get_rollup_pr_contents( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the main operation that we will want to do is load members information for a set of PRs that will be displayed on the queue page. So I'm thinking something like:
- Load non-closed PRs on the queue page endpoint
- Do a query that will fetch all members for all rollups in those PRs.
And then in the frontend, we'll build a reverse index, so that we can also associate a PR with its rollup(s).
As an alternative to 1) + 2), we could directly attach a bool whether a given PR is a rollup or not to the return value of 1), and then only ask for members from actual rollups, to reduce the DB traffic slightly. But we'd need to take a look at the query plans to see if it makes sense. And probably perf. will be fine with both approaches.
|
|
||
| PRIMARY KEY (repository, rollup_pr_number, member_pr_number) | ||
| ); | ||
| CREATE INDEX rollup_contents_member_pr_number_idx ON rollup_contents (repository, member_pr_number); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my other comment, I think that it makes more sense to create the index on the rollup_pr_number attribute, as we will be loading members of rollups, and not looking up rollups for individual PRs, at least for the queue page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will proceed removing this index, then. For the index on the rollup_pr_attribute I assume you mean including the repository attribute as well, like the query in your review. That should already be covered by the index generated by the primary key, as I've purposefully ordered the columns by putting the member_pr_number attribute last, as per https://www.postgresql.org/docs/current/indexes-multicolumn.html .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, also including repository. Fair enough, as long as the index is used for the query I posted, it should be fine.
I ran some simulations with 20k PRs and 3k rollup_member rows on a local Postgres instance, and EXPLAIN ANALYZE was still choosing a sequential scan over rollup members anyway. So maybe the index won't be even needed so much. But that shouldn't be an issue for some time anyway.
Closes #494.
I wasn't sure whether to use IDs from the pull_request table to represent PRs in the mapping, or just the GitHub PR number - I went with the latter as there would be no record for the rollup at the time of its creation, as that gets populated via GitHub events, if I understood correctly.
I haven't experienced rollups at all as I haven't (yet?) contributed to a project with bors, but I noticed there's no safety check for multiple rollups containing the same PR(s) being opened. I found a related comment here, but I kept it out of this PR to reduce its scope and simplify review. I think such a safety check could be added easily by checking for rollups associated to each candidate PR, and verifying the status of the rollup PR to ignore closed ones. That would still allow one to recreate the same scenario by reopening a previously closed rollup, though.
One thing I'm unsure about is how to handle the error when writing the rollup_contents records in the DB. I think that should only happen either in case of DB connection errors, or if the list of PRs contains duplicates. I assume the former can be propagated to the caller, whereas I'm not sure if the latter is realistic and should be accounted for. I'm thinking we can just ignore the conflicting records in such case - let me know if I should proceed with that.
Edit: sorry, I forgot the migration test files. Will push them tomorrow as soon as possible