Flink: Dynamic Sink: Add support for dropping columns #14728

mxm · 2025-12-01T13:53:56Z

This commit adds support for strict 1:1 schema synchronization by allowing columns to be dropped from the table when they are not present in the input schema. This is controlled via a new dropUnusedColumns parameter in DynamicIcebergSink.

The default behavior (dropUnusedColumns=false) remains unchanged.
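As a rough illustration of the opt-in semantics described above, the following sketch computes which table columns would be dropped for a given input schema. It is a toy model under stated assumptions, not the sink's implementation: the class `DropUnusedColumnsSketch`, the method `columnsToDrop`, and the use of plain column names in place of real schema fields are all hypothetical. Only the `dropUnusedColumns` flag name comes from the PR description.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the opt-in behavior; not the DynamicIcebergSink implementation. */
class DropUnusedColumnsSketch {

  /**
   * Returns the table columns that would be dropped for a given input schema.
   * Column names stand in for real schema fields (hypothetical simplification).
   */
  static Set<String> columnsToDrop(
      List<String> tableColumns, List<String> inputColumns, boolean dropUnusedColumns) {
    Set<String> toDrop = new LinkedHashSet<>();
    if (!dropUnusedColumns) {
      return toDrop; // default: nothing is ever dropped
    }
    for (String column : tableColumns) {
      if (!inputColumns.contains(column)) {
        toDrop.add(column); // present in the table, absent from the input schema
      }
    }
    return toDrop;
  }

  public static void main(String[] args) {
    List<String> table = List.of("id", "name", "legacy_flag");
    List<String> input = List.of("id", "name");
    System.out.println(columnsToDrop(table, input, false)); // prints []
    System.out.println(columnsToDrop(table, input, true)); // prints [legacy_flag]
  }
}
```

With the flag off, the result is always empty, which matches the statement that the default behavior is unchanged.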

pvary · 2025-12-02T10:02:13Z

Before I go into the details of the PR, could you please help me understand what to expect in the interim period, when the sink receives records with and without the dropped column?

mxm · 2025-12-02T10:07:57Z

> Before I go into the details of the PR, could you please help me understand what to expect in the interim period, when the sink receives records with and without the dropped column?

New records will be written with the new schema, in which any columns not part of the input schema are dropped. Old records will continue to be written with the old schema, which still exists. If a previously unseen schema includes removed columns, those columns will be re-added as new columns. This is a caveat that users will have to accept, which is why the feature is opt-in and disabled by default.
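To make the "re-added as new columns" caveat concrete, here is a small toy model. It assumes the table tracks columns by a numeric ID that is never reused, which is how Iceberg's schema evolution is generally described; the class `ReAddedColumnSketch` and its methods are hypothetical and do not correspond to Iceberg's `Schema` or `UpdateSchema` APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy table that tracks columns by ID; a simplification, not Iceberg's Schema class. */
class ReAddedColumnSketch {
  private final Map<String, Integer> columnIds = new LinkedHashMap<>();
  private int lastAssignedId = 0;

  /** Adds the column under a fresh ID; IDs are never reused in this model. */
  void addColumn(String name) {
    columnIds.put(name, ++lastAssignedId);
  }

  /** Removes the column; its ID is retired, not recycled. */
  void dropColumn(String name) {
    columnIds.remove(name);
  }

  Integer idOf(String name) {
    return columnIds.get(name);
  }

  public static void main(String[] args) {
    ReAddedColumnSketch table = new ReAddedColumnSketch();
    table.addColumn("id");
    table.addColumn("note");
    int before = table.idOf("note");
    table.dropColumn("note"); // a record arrives whose schema lacks "note"
    table.addColumn("note"); // a previously unseen schema brings "note" back
    int after = table.idOf("note");
    System.out.println(before + " -> " + after); // prints 2 -> 3
  }
}
```

Under this assumption, the re-added `note` column is a different column from the one that was dropped, so values written under the old ID would not be expected to reappear under the new one.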

pvary · 2025-12-02T12:12:23Z

Essentially, there’s a race condition between adding and dropping columns. For example, if a user does the following:

1. Creates a new schema S1
2. Sends a record R1 using S1
3. Creates a new schema S2
4. Sends a record R2 using S2

If these actions occur within a short time frame and the streams are skewed, the table could end up with either:

- Schema S2, if R2 arrives later
- Schema S1, if R1 arrives later

Afterward, querying the table with the “old” schema becomes difficult.
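The order dependence in the scenario above can be sketched with a toy model. It assumes that both S1 and S2 are previously unseen by the sink and that each arriving record's schema is applied to the table in arrival order with column dropping enabled; `SchemaRaceSketch` and `finalTableSchema` are hypothetical names, and column names stand in for real schema fields.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of arrival-order dependence; not the DynamicIcebergSink implementation. */
class SchemaRaceSketch {

  /**
   * Applies each arriving record's schema in order under strict 1:1 sync:
   * the table ends up with exactly the columns of the last record's schema.
   */
  static List<String> finalTableSchema(List<List<String>> arrivalOrder) {
    List<String> table = new ArrayList<>();
    for (List<String> recordSchema : arrivalOrder) {
      table.removeIf(column -> !recordSchema.contains(column)); // drop unused columns
      for (String column : recordSchema) {
        if (!table.contains(column)) {
          table.add(column); // add missing columns
        }
      }
    }
    return table;
  }

  public static void main(String[] args) {
    List<String> s1 = List.of("id", "a");
    List<String> s2 = List.of("id", "b");
    System.out.println(finalTableSchema(List.of(s1, s2))); // prints [id, b]
    System.out.println(finalTableSchema(List.of(s2, s1))); // prints [id, a]
  }
}
```

The same two records produce different final table schemas depending only on which one arrives last, which is the skew-induced outcome described in the comment.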

Additionally, users cannot revert the table to any previously created schema using DynamicSink. This behavior is consistent with the current implementation, but with column-dropping support, users might expect this capability.

@Guosmilesmile: Would these restrictions impact your use cases?

mxm · 2025-12-02T12:57:19Z

IMHO this is fine if the user opts in. We deliberately chose not to allow dropping columns because of this race condition; it's important that this remains the default setting. I agree that we should add more documentation around the semantics of removing columns.

Guosmilesmile · 2025-12-02T14:33:13Z

In my scenario there are many downstream consumers, so the impact of removing fields is often unknown and hard to assess. Removals are therefore rare; fields are usually deprecated instead.
Since this feature is optional, it is acceptable to me. Whether to use it should be left to users, but the limitations must be clearly documented. This is my personal opinion.

pvary · 2025-12-02T14:48:33Z

Cool! Seems like an ok feature.
Let's move forward with the review then.

flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/CompareSchemasVisitor.java

flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/EvolveSchemaVisitor.java

flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/TableMetadataCache.java

flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/TableUpdater.java

mxm · 2025-12-05T11:33:01Z

Rebased to fix merge conflicts in CI after #14406.

.../v2.1/flink/src/test/java/org/apache/iceberg/flink/sink/dynamic/TestEvolveSchemaVisitor.java

pvary · 2025-12-05T12:19:41Z

flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/EvolveSchemaVisitor.java

 * We don't support:
 *
 * <ul>
- *   <li>Dropping columns
 *   <li>Renaming columns
 * </ul>


Nit: remove the list, as it is not needed anymore 😄

mxm · 2025-12-05

If you don't mind, I'll keep this list with one element 😄

pvary · 2025-12-05T12:22:47Z

Please update the documentation and highlight the caveats, such as what happens when a column is added back with the same name and what happens when multiple schema changes occur simultaneously.

github-actions bot added the flink label Dec 1, 2025

mxm force-pushed the drop-columns branch from c4d8579 to 5ec5c0d Compare December 2, 2025 09:55

mxm force-pushed the drop-columns branch from b0682bb to 7c1cf8d Compare December 5, 2025 11:32

mxm added 5 commits December 5, 2025 12:51

fixup! Fix detection logic for unused fields

f72625c

fixup! Address further comments

e0f3cdf

fixup! Fix tests in TestTableUpdater and add additional test

0e125a1

fixup! Revert change in TableUpdater

959fc63

mxm force-pushed the drop-columns branch from 7c1cf8d to 959fc63 Compare December 5, 2025 11:51

mxm added 2 commits December 5, 2025 14:11

Improve tests (mix top-level and nested fields)

4bbc394

docs

b96421c

github-actions bot added the docs label Dec 5, 2025
