-
Notifications
You must be signed in to change notification settings - Fork 80
Job queue, matrix builder, concurrency control #709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trh/compilers-in-metadata
Are you sure you want to change the base?
Conversation
|
Main chunks of work to be done at this point:
|
|
But the first two bullets can be implemented as separate changes, right? They're not needed for strict parity. |
|
@fsoikin the overall goal is to merge #669 ASAP, so that we can reupload the whole registry. We could be doing all these changes in separate PRs but I don't see the point since they will all need to end up in the |
| insertPackageJob :: PackageOperation -> ContT Response (Run _) Response | ||
| insertPackageJob operation = do | ||
| lift $ Log.info $ "Enqueuing job for package " <> PackageName.print (Operation.packageName operation) | ||
| jobId <- newJobId | ||
| lift $ Db.insertPackageJob { jobId, payload: operation } | ||
| jsonOk V1.jobCreatedResponseCodec { jobId } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code checked for running jobs before creating new ones:
lift (Db.runningJobForPackage packageName) >>= case _ of
Right { jobId, jobType: runningJobType } -> ...Looks like the duplicate checking is no longer there, is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have noticed as well, and I am reshuffling this part of the code to add this back, among other things.
I have been thinking about splitting the "package job" table as well into different tables for publish/unpublish/transfer, because they do require different checking (publishing/unpublishing needs to check for the tuple package name and version, transferring only needs to match the package name). Thoughts?
| JOIN ${JOB_INFO_TABLE} info ON job.jobId = info.jobId | ||
| WHERE info.finishedAt IS NULL | ||
| AND info.startedAt IS NULL | ||
| ORDER BY info.createdAt DESC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why DESC here? Don't we want the oldest job to go out first (FIFO), like it previously was? Same for the other selectNext* functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I have noticed too, it should be ASC.
And it should probably not be LIMIT 1 either, as we likely want to do some toposorting on the result. I have not figured out yet where we draw the line though, I wouldn't like to be toposorting thousands of matrix jobs over and over, so we might want keep the limit and toposort on insertion. But then we need to ORDER BY something else than creation date when fishing that out. Ideas?
|
Also the tests are failing because of a missing |
In addition to your list (some responses to that below), the other point that I'm not seeing here is that we need to ensure that the github issue module only proxies requests over to the server API and writes comments on 'notify' logs but does not actually execute e.g. the publish operation itself. Otherwise we've defeated the purpose of the pull request, as we can't enforce a lock on the git commits anymore.
This makes sense to me: start with no-dependency packages, and on completion we queue dependents that are satisfied by this package existing, propagate outwards.
What do we need to do beyond what we have today? Today, we open an issue, The same approach is used for transfers or unpublishing: open the issue as a |
Yeah, I have not fixed that yet because I was not sure we need the version in there at all. Help me with reminding why we need the version now 😄 Also: I see these are on a different branch? I will cherry-pick them here, and feel free to push new stuff here as well.
Yeah it's planned but I want to thread through the matrix jobs and the package set jobs to the database before going to the github interaction part. It might be a nice spot where we can parallelise work if you want to have a go at it. I will push the current state of my branch so you have the latest.
Yeah if you think it's straightforward then it probably is 😄 I have not looked at this chunk of code yet so I might be overthinking stuff. I like the public aspect of the package set updates as well, and I in fact want to make it public-only, in the sense that I'd put the package-set server endpoints behind encryption/authentication (with just a simple shared secret for example) so that only the code that runs on github can hit them and we don't have to worry about things coming to the registry server from other places. No strong feelings about this though if you think it's too convoluted |
|
Ah, if you want it to be forced public then we could indeed do a shared secret we send in the HTTP header or something. I’d like transfer and unpublish to be public too — we could potentially remove the ability for pacchettibotti signatures to be valid for those and only the shared secret can be used as a trustee override? |
the publishCodec requires a version file but the test fixtures weren't updated to include it
The JS insertMatrixJobImpl expects columns [jobId, packageName, packageVersion, compilerVersion, payload] but the PureScript types were missing packageName and packageVersion
WIP, should fix #696, fix #649