Skip to content

Conversation

@danolivo
Copy link
Contributor

No TAP tests yet. Just for the demo and check actions.

@danolivo danolivo requested a review from mason-sharp November 26, 2025 22:11
@danolivo danolivo self-assigned this Nov 26, 2025
@danolivo danolivo added the enhancement New feature or request label Nov 26, 2025
@danolivo danolivo force-pushed the spoc-342 branch 3 times, most recently from e173ee7 to fc31a21 Compare December 2, 2025 18:56
/* Additional cache, only valid as long as relation mapping is. */
bool hasTriggers;

Oid *delta_functions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should avoid changing variable names unless the original name is very confusing. It makes diffing different versions more difficult to sort through

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand what it means - it is a new variable, isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looked like delta_apply_functions -> delta_functions and has_delta_columns -> has_delta_apply

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we may maintain two approaches (core-based and extension-based), but they must be kept separate. So, their delta_apply structures can't intersect. That's why they have different names. If we decide to stick to the extension-based approach, one more commit might rename everything.

#define SPOCK_VERSION_NUM 60000

#define EXTENSION_NAME "spock"
#define spock_SECLABEL_PROVIDER "spock"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use capitalized SPOCK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may capitalise the define's name - I just followed the naming example (borrowed from pgactive) to make cross-code browsing easier. We definitely shouldn't capitalise the define's value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like they use pgative_ everywhere for their #defines, whereas we use SPOCK_ everywhere for our existing #defines. It would be more consistent with our existing code to use SPOCK_.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, there are no objectives from my side - it's just too trivial stuff.

* refresh of the SpockRelation entry and guarantees actual state of the
* delta_apply columns.
*/
EXECUTE format('ALTER TABLE %I REPLICA IDENTITY FULL', rel);
Copy link
Member

@mason-sharp mason-sharp Dec 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I comment out this stuff as well as the RAISE above, it seems to work ok.

That is, if I update a delta apply column, since the value changed, it will need to send over the value.

If I don't update the column, it does not change, it does not send the changed value, which is fine.

So, do we actually need REPLICA IDENTITY FULL if we only send over data when the column value is modified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scope of this task was to reimplement the feature in an extensible manner, not to revise its code. The initial developer might have reasons, and I just followed the common idea. At least it is a much safer way to manage development risks. We can optimise the code later if needed. Of course, in this case, we would need to invent an alternative way to send invalidation messages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth avoiding sending unneeded column data (although perhaps we want to allow doing that in the future to be able to convert UPDATEs to INSERTs, but that is a separate topic).

Re invalidation, can it be done in spock_object_relabel?

Otherwise, if using REPLICA IDENTITY FULL to trigger it, although it is a kludge, what if we immediately followed it with an ALTER TABLE REPLICA IDENTITY DEFAULT? Or find some benign ALTER TABLE variant.

(Either way we will need to revisit this once we revisit actually supporting REPLICA IDENTITY FULL)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth avoiding sending unneeded column data

Currently, we have no 'unneeded' cases. In the async LR, no one knows which value to anticipate on the other side - especially in the case of conflict. So, right now, delta is calculated each time; accordingly, the old value must be included. The current proposal seeks to change the existing conflict-resolution logic. I have no opinion on that right now - I prefer to read the initial RFC or other specs in advance.

Anyway, as a rule of thumb, I do recommend dividing complex tasks into smaller ones rather than combining them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about REPLICA IDENTITY FULL, if we set that, will it internally do a scan instead of using the unique index?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I resolved this case with the code and the comment:

		if (entry->has_delta_apply)
		{
			/*
			 * It looks like a hack — which, in fact, it is.
			 * We assume that delta_apply may be used for the DEFAULT identity
			 * only and will be immediately removed after altering the table.
			 * Also, if an ERROR happens here we will stay with an inconsistent
			 * value of the relreplident field. But it is just a cache ...
			 */
			relinfo->ri_RelationDesc->rd_rel->relreplident = REPLICA_IDENTITY_DEFAULT;
			entry->idxoid = RelationGetReplicaIndex(relinfo->ri_RelationDesc);
			Assert(entry->idxoid != InvalidOid);
			relinfo->ri_RelationDesc->rd_rel->relreplident = REPLICA_IDENTITY_FULL;
		}

Is it unclear?

With this commit, we add the 'spock.delta_apply' routine, which adds or removes
a security label on a specific table column.
It is mostly useless for now. It demonstrates how to consistently manage
security labels and survive ALTER/DROP DDL commands, including column
type change.
Trivial regression tests are added to fix the appropriate behaviour.
pg_seclabel is a pg_catalog table, and the core forbids LR on it.
Hence, we need to allow this non-DDL utility command to be replicated.
During creation/removal of such a label, warn if auto-ddl replication
is disabled.
The general idea is to utilise the REPLICA_DENTITY_FULL status to force
the backend to write the WAL record columns we need to calculate a delta tuple.
Auto DDL replication transfers security labels (and replica identity changes)
across all nodes involved. On the subscriber side, the SpockRelation cache
contains a specific Oid list for each column, or an InvalidOid if the column
doesn't need the delta_apply feature.

This commit doesn't contain any tests, nor does it remove the previous 'core'
approach; that should be done in the following commits.

Words on upgrade: there is no easy way to upgrade from a patched version to
the extensible one. So, at a glance, it seems we could just recommend that
people remove the delta apply settings before the upgrade and add them after
the upgrade using the new interface.

NOTE:
There is no good reason to deliver the whole tuple by the LR protocol to
the subscriber. We can send only IDENTITY columns and columns needed for
the delta_apply. It may be done by employing SpockRelayion cache inside
the walsender.

TODO:
To stay safe, we should limit 'delta apply' types with increment, decrement
and initialisation operators only. Also, we should decline nullable columns.
With this patch, we merely remove the core patch and the delta apply feature
based on it. Little code readability has been made, just being in place.
…izen.

There is a limitation in Spock: FULL replica identity relations can't
be replicated. So, we need to add machinery to identify specific delta_apply
cases; there, we will use the PK on the replica and shouldn't run into any
issues.
Add a trivial tap test demonstrating how delta_apply works.

TODO: tests for corner cases are needed: NULL value, restrictions, or incorrect
usage of delta_apply.
Having this allowed, we should resolve multiple cases with NULL op NULL and
NULL op value that seem too expensive, keeping in mind that the database schema
shouldn't let that in principle.

This example also highlights that we don't yet support 2PC commit in our
protocol. Here, it could do the stuff more consistently.

TAP and regression tests were changed to fit the new behaviour.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants