@krconv krconv commented Nov 24, 2025

The HFiles generated by incremental backups cannot be read correctly by tooling such as the ClientSideRegionScanner, because they do not include the MAX_SEQ_ID metadata. Without that metadata, the scanner ignores cell-level sequence IDs and orders the HFiles arbitrarily, which produces incorrect results when scanning overwrites to cells with the same timestamp.

This PR adds a new option to HFileOutputFormat2 that calculates and sets the required metadata. This only really affects the ClientSideRegionScanner, since the sequence ID is recalculated anyway when the files are bulk-loaded.
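To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use the HBase classes: `SketchCell`, `SketchFile`, and `resolve` are hypothetical stand-ins, and the tie-break rule (newest timestamp first, then highest file-level max sequence ID) is a simplification of what a reader does, not the actual scanner code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model for illustration only; not the HBase API.
final class SeqIdTieBreakSketch {

  /** One version of a cell; row and column are assumed identical across files. */
  static final class SketchCell {
    final long timestamp;
    final String value;

    SketchCell(long timestamp, String value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  /** A file holding one cell. maxSeqId is null when the metadata is missing. */
  static final class SketchFile {
    final String name;
    final Long maxSeqId;
    final SketchCell cell;

    SketchFile(String name, Long maxSeqId, SketchCell cell) {
      this.name = name;
      this.maxSeqId = maxSeqId;
      this.cell = cell;
    }
  }

  /**
   * Returns the value a reader would surface: the newest timestamp wins, and
   * timestamp ties go to the file with the higher max sequence ID. Files
   * without the metadata all compare as equal, so a tie is decided by
   * whatever order the files happened to be listed in.
   */
  static String resolve(List<SketchFile> files) {
    Comparator<SketchFile> newestFirst = Comparator
        .comparingLong((SketchFile f) -> f.cell.timestamp)
        .thenComparingLong(f -> f.maxSeqId == null ? -1L : f.maxSeqId)
        .reversed();
    List<SketchFile> sorted = new ArrayList<>(files);
    sorted.sort(newestFirst); // List.sort is stable, so equal files keep input order
    return sorted.get(0).cell.value;
  }

  public static void main(String[] args) {
    SketchCell older = new SketchCell(100L, "old");
    SketchCell newer = new SketchCell(100L, "new"); // overwrite, same timestamp

    List<SketchFile> withoutMetadata = new ArrayList<>();
    withoutMetadata.add(new SketchFile("backup-1", null, older));
    withoutMetadata.add(new SketchFile("backup-2", null, newer));

    List<SketchFile> withMetadata = new ArrayList<>();
    withMetadata.add(new SketchFile("backup-1", 5L, older));
    withMetadata.add(new SketchFile("backup-2", 9L, newer));

    System.out.println("without MAX_SEQ_ID: " + resolve(withoutMetadata));
    System.out.println("with MAX_SEQ_ID:    " + resolve(withMetadata));
  }
}
```

In this model the result without the metadata depends entirely on listing order, while the result with it is the overwrite regardless of order.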

Part of https://issues.apache.org/jira/browse/HBASE-29716

@krconv krconv force-pushed the HBASE-29716-set-sequence-id-option branch from 7317c4d to 94750f1 Compare November 25, 2025 02:10
@krconv krconv force-pushed the HBASE-29716-set-sequence-id-option branch from 94750f1 to 45234b1 Compare November 25, 2025 02:13

}

- private void close(final StoreFileWriter w) throws IOException {
+ private void close(final StoreFileWriter w, final WriterInfo wl) throws IOException {

Small thing, mind changing wl to wi here to match the rest of the patch?

- wl.written += length;
+ wi.writer.append((ExtendedCell) kv);
+ wi.written += length;
+ wi.maxSequenceId = Math.max(kv.getSequenceId(), wi.maxSequenceId);

Any concerns that Cell#getSequenceId is removed in HBase 3? Any plans for how we should handle that?

Ah, as long as this is an ExtendedCell, it looks like this should still be possible in branch-3.
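For readers following this thread, the per-writer bookkeeping under review can be sketched in isolation. This is a standalone approximation rather than the patch itself: `SketchWriterInfo`, `append`, and `close` are hypothetical stand-ins, the file metadata is modeled as a plain map, and the `-1` starting value is an assumed "no sequence ID seen yet" sentinel.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the writer bookkeeping discussed above; not the HBase classes.
final class MaxSeqIdTrackingSketch {

  // Metadata key name taken from the PR description.
  static final String MAX_SEQ_ID_KEY = "MAX_SEQ_ID";

  /** Per-output-file state: bytes written so far and the largest sequence ID seen. */
  static final class SketchWriterInfo {
    long written = 0L;
    long maxSequenceId = -1L; // assumed sentinel meaning "nothing appended yet"
    final Map<String, Long> fileInfo = new HashMap<>();
  }

  /** Mirrors the append path: count the bytes and keep a running maximum of sequence IDs. */
  static void append(SketchWriterInfo wi, long cellSequenceId, int length) {
    wi.written += length;
    wi.maxSequenceId = Math.max(cellSequenceId, wi.maxSequenceId);
  }

  /** On close, record the running maximum as file-level metadata. */
  static void close(SketchWriterInfo wi) {
    wi.fileInfo.put(MAX_SEQ_ID_KEY, wi.maxSequenceId);
  }

  public static void main(String[] args) {
    SketchWriterInfo wi = new SketchWriterInfo();
    append(wi, 3L, 10);
    append(wi, 10L, 20);
    append(wi, 7L, 5);
    close(wi);
    System.out.println(MAX_SEQ_ID_KEY + " = " + wi.fileInfo.get(MAX_SEQ_ID_KEY));
  }
}
```

Keeping the maximum on the writer state means the value is available at close time without a second pass over the cells.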

@krconv krconv force-pushed the HBASE-29716-set-sequence-id-option branch from d440969 to e4921a1 Compare December 2, 2025 12:18
@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 29s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 34s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 32s | | master passed |
| +1 💚 | compile | 1m 12s | | master passed |
| +1 💚 | checkstyle | 0m 28s | | master passed |
| +1 💚 | spotbugs | 1m 3s | | master passed |
| +1 💚 | spotless | 0m 49s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 13s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 12s | | the patch passed |
| +1 💚 | compile | 1m 7s | | the patch passed |
| +1 💚 | javac | 1m 7s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 25s | | the patch passed |
| +1 💚 | spotbugs | 1m 13s | | the patch passed |
| +1 💚 | hadoopcheck | 12m 15s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.1. |
| +1 💚 | spotless | 0m 46s | | patch has no errors when running spotless:check. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 17s | | The patch does not generate ASF License warnings. |
| | | 35m 40s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/4/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #7480 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux 6707fe2a3565 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / e4921a1 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-mapreduce hbase-backup U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/4/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 29s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ master Compile Tests _ |
| +0 🆗 | mvndep | 0m 33s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 30s | | master passed |
| +1 💚 | compile | 0m 41s | | master passed |
| +1 💚 | javadoc | 0m 28s | | master passed |
| +1 💚 | shadedjars | 6m 15s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 14s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 8s | | the patch passed |
| +1 💚 | compile | 0m 40s | | the patch passed |
| +1 💚 | javac | 0m 40s | | the patch passed |
| +1 💚 | javadoc | 0m 27s | | the patch passed |
| +1 💚 | shadedjars | 6m 10s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 17m 43s | | hbase-mapreduce in the patch passed. |
| +1 💚 | unit | 10m 16s | | hbase-backup in the patch passed. |
| | | 52m 6s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/4/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #7480 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 2df0d15dc25e 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / e4921a1 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/4/testReport/ |
| Max. process+thread count | 3131 (vs. ulimit of 30000) |
| modules | C: hbase-mapreduce hbase-backup U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7480/4/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.
