-
Notifications
You must be signed in to change notification settings - Fork 42
Spoc 342. Replace core-based delta apply approach with native extensible one. #272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e173ee7 to
fc31a21
Compare
| /* Additional cache, only valid as long as relation mapping is. */ | ||
| bool hasTriggers; | ||
|
|
||
| Oid *delta_functions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should avoid changing variable names unless the original name is very confusing. It makes diffing different versions more difficult to sort through
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand what it means - it is a new variable, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looked like delta_apply_functions -> delta_functions and has_delta_columns -> has_delta_apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we may maintain two approaches (core-based and extension-based), but they must be kept separate. So, their delta_apply structures can't intersect. That's why they have different names. If we decide to stick to the extension-based approach, one more commit might rename everything.
| #define SPOCK_VERSION_NUM 60000 | ||
|
|
||
| #define EXTENSION_NAME "spock" | ||
| #define spock_SECLABEL_PROVIDER "spock" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use capitalized SPOCK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may capitalise the define's name - I just followed the naming example (borrowed from pgactive) to make cross-code browsing easier. We definitely shouldn't capitalise the define's value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like they use pgative_ everywhere for their #defines, whereas we use SPOCK_ everywhere for our existing #defines. It would be more consistent with our existing code to use SPOCK_.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there are no objectives from my side - it's just too trivial stuff.
| * refresh of the SpockRelation entry and guarantees actual state of the | ||
| * delta_apply columns. | ||
| */ | ||
| EXECUTE format('ALTER TABLE %I REPLICA IDENTITY FULL', rel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I comment out this stuff as well as the RAISE above, it seems to work ok.
That is, if I update a delta apply column, since the value changed, it will need to send over the value.
If I don't update the column, it does not change, it does not send the changed value, which is fine.
So, do we actually need REPLICA IDENTITY FULL if we only send over data when the column value is modified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope of this task was to reimplement the feature in an extensible manner, not to revise its code. The initial developer might have reasons, and I just followed the common idea. At least it is a much safer way to manage development risks. We can optimise the code later if needed. Of course, in this case, we would need to invent an alternative way to send invalidation messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth avoiding sending unneeded column data (although perhaps we want to allow doing that in the future to be able to convert UPDATEs to INSERTs, but that is a separate topic).
Re invalidation, can it be done in spock_object_relabel?
Otherwise, if using REPLICA IDENTITY FULL to trigger it, although it is a kludge, what if we immediately followed it with an ALTER TABLE REPLICA IDENTITY DEFAULT? Or find some benign ALTER TABLE variant.
(Either way we will need to revisit this once we revisit actually supporting REPLICA IDENTITY FULL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth avoiding sending unneeded column data
Currently, we have no 'unneeded' cases. In the async LR, no one knows which value to anticipate on the other side - especially in the case of conflict. So, right now, delta is calculated each time; accordingly, the old value must be included. The current proposal seeks to change the existing conflict-resolution logic. I have no opinion on that right now - I prefer to read the initial RFC or other specs in advance.
Anyway, as a rule of thumb, I do recommend dividing complex tasks into smaller ones rather than combining them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about REPLICA IDENTITY FULL, if we set that, will it internally do a scan instead of using the unique index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I resolved this case with the code and the comment:
if (entry->has_delta_apply)
{
/*
* It looks like a hack — which, in fact, it is.
* We assume that delta_apply may be used for the DEFAULT identity
* only and will be immediately removed after altering the table.
* Also, if an ERROR happens here we will stay with an inconsistent
* value of the relreplident field. But it is just a cache ...
*/
relinfo->ri_RelationDesc->rd_rel->relreplident = REPLICA_IDENTITY_DEFAULT;
entry->idxoid = RelationGetReplicaIndex(relinfo->ri_RelationDesc);
Assert(entry->idxoid != InvalidOid);
relinfo->ri_RelationDesc->rd_rel->relreplident = REPLICA_IDENTITY_FULL;
}
Is it unclear?
With this commit, we add the 'spock.delta_apply' routine, which adds or removes a security label on a specific table column. It is mostly useless for now. It demonstrates how to consistently manage security labels and survive ALTER/DROP DDL commands, including column type change. Trivial regression tests are added to fix the appropriate behaviour.
pg_seclabel is a pg_catalog table, and the core forbids LR on it. Hence, we need to allow this non-DDL utility command to be replicated. During creation/removal of such a label, warn if auto-ddl replication is disabled.
The general idea is to utilise the REPLICA_DENTITY_FULL status to force the backend to write the WAL record columns we need to calculate a delta tuple. Auto DDL replication transfers security labels (and replica identity changes) across all nodes involved. On the subscriber side, the SpockRelation cache contains a specific Oid list for each column, or an InvalidOid if the column doesn't need the delta_apply feature. This commit doesn't contain any tests, nor does it remove the previous 'core' approach; that should be done in the following commits. Words on upgrade: there is no easy way to upgrade from a patched version to the extensible one. So, at a glance, it seems we could just recommend that people remove the delta apply settings before the upgrade and add them after the upgrade using the new interface. NOTE: There is no good reason to deliver the whole tuple by the LR protocol to the subscriber. We can send only IDENTITY columns and columns needed for the delta_apply. It may be done by employing SpockRelayion cache inside the walsender. TODO: To stay safe, we should limit 'delta apply' types with increment, decrement and initialisation operators only. Also, we should decline nullable columns.
With this patch, we merely remove the core patch and the delta apply feature based on it. Little code readability has been made, just being in place.
…izen. There is a limitation in Spock: FULL replica identity relations can't be replicated. So, we need to add machinery to identify specific delta_apply cases; there, we will use the PK on the replica and shouldn't run into any issues. Add a trivial tap test demonstrating how delta_apply works. TODO: tests for corner cases are needed: NULL value, restrictions, or incorrect usage of delta_apply.
Having this allowed, we should resolve multiple cases with NULL op NULL and NULL op value that seem too expensive, keeping in mind that the database schema shouldn't let that in principle. This example also highlights that we don't yet support 2PC commit in our protocol. Here, it could do the stuff more consistently. TAP and regression tests were changed to fit the new behaviour.
No TAP tests yet. Just for the demo and check actions.