Commit 43ac136

Implement target distance metrics (#230)
* refactor: Update hashing logic to use TargetHash and TargetDigest

  The new fields in TargetHash and TargetDigest are to track the direct srcs/attrs hash separately from the overall hash. Currently, the directHash is not used at all.
* Update RuleHasher to track separate digests
* Add ability to compute target distance metrics
* Add ability to dump dependency edges to json file.
* Update get-impacted-targets

  Add e2e test for dump distances

  TODO: It would be nice if there was a better e2e test here -- Ask Maxwell how he generates the integration workspace data.
* Update documentation
* Move target type filtering to happen after impacted targets are computed
* Address review feedback

  1. Output detailed hash data as json
  2. Use a different serialization format for targethashes to avoid ambiguity
* Fix typo in variable name
* Add test workspace and e2e test
* remove printlns
* Missing bazelignore
* Remove e2e test that is duplicated by other one
* Update BUILD
1 parent 3fffd5f commit 43ac136

33 files changed

+1003
-121
lines changed

.bazelignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
+cli/src/test/resources/workspaces

README.md

Lines changed: 47 additions & 0 deletions
@@ -61,6 +61,46 @@ Open `bazel-diff-example.sh` to see how this is implemented. This is purely an e

 * We run `bazel-diff` on the starting and final JSON hash filepaths to get our impacted set of targets. This impacted set of targets is written to a file.

+## Build Graph Distance Metrics
+
+`bazel-diff` can optionally compute build graph distance metrics between two revisions. This is
+useful for understanding the impact of a change on the build graph. Directly impacted targets are
+targets that have had their rule attributes or source file dependencies changed. Indirectly impacted
+targets are those that are impacted only due to a change in one of their target dependencies.
+
+For each target, the following metrics are computed:
+
+* `target_distance`: The number of dependency hops that it takes to get from an impacted target to a directly impacted target.
+* `package_distance`: The number of dependency hops that cross a package boundary to get from an impacted target to a directly impacted target.
+
+Build graph distance metrics can be used by downstream tools to power features such as:
+
+* Only running sanitizers on impacted tests that are in the same package as a directly impacted target.
+* Only running large-sized tests that are within a few package hops of a directly impacted target.
+* Only running computationally expensive jobs when an impacted target is within a certain distance of a directly impacted target.
+
+To enable this feature, you must generate a dependency mapping on your final revision when computing hashes, then pass it into the `get-impacted-targets` command.
+
+```bash
+git checkout BASE_REV
+bazel-diff generate-hashes [...]
+
+git checkout FINAL_REV
+bazel-diff generate-hashes --depEdgesFile deps.json [...]
+
+bazel-diff get-impacted-targets --depEdgesFile deps.json [...]
+```
+
+This will produce a JSON list of impacted targets with target label, target distance, and package distance:
+
+```text
+[
+  {"label": "//foo:bar", "targetDistance": 0, "packageDistance": 0},
+  {"label": "//foo:baz", "targetDistance": 1, "packageDistance": 0},
+  {"label": "//bar:qux", "targetDistance": 1, "packageDistance": 1}
+]
+```
+
 ## CLI Interface

 `bazel-diff` Command
@@ -355,6 +395,13 @@ Now you can simply run `bazel-diff` from your project:
 bazel run @bazel_diff//:bazel-diff -- bazel-diff -h
 ```

+## Learn More
+
+Take a look at the following BazelCon talks to learn more about `bazel-diff`:
+
+* [BazelCon 2023: Improving CI efficiency with Bazel querying and bazel-diff](https://www.youtube.com/watch?v=QYAbmE_1fSo)
+* BazelCon 2024: Not Going the Distance: Filtering Tests by Build Graph Distance: Coming Soon
+
 ## Running the tests

 To run the tests simply run
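
Reviewer note: the README describes `target_distance` and `package_distance` in prose only. Below is a minimal Kotlin sketch of one way such metrics could be computed, assuming the dependency edges are available as a map from each label to its direct dependencies and that the impacted and directly impacted label sets are already known. `Distance`, `packageOf`, and `computeDistances` are hypothetical names used for illustration, not bazel-diff's implementation, and taking the minimum of each metric independently is an assumption.

```kotlin
// Hypothetical sketch; names and data shapes are assumptions, not bazel-diff's API.
data class Distance(val targetDistance: Int, val packageDistance: Int)

// "//foo:bar" -> "//foo"
fun packageOf(label: String): String = label.substringBefore(':')

fun computeDistances(
    impacted: Set<String>,
    directlyImpacted: Set<String>,
    deps: Map<String, List<String>>,
): Map<String, Distance> {
    val memo = HashMap<String, Distance>()

    // Bazel build graphs are acyclic, so plain memoized recursion terminates.
    fun visit(label: String): Distance {
        memo[label]?.let { return it }
        val result = if (label in directlyImpacted) {
            Distance(0, 0)
        } else {
            // Only impacted dependencies can lead back to a directly impacted target.
            val viaDeps = deps[label].orEmpty()
                .filter { it in impacted }
                .map { dep ->
                    val d = visit(dep)
                    val crossing = if (packageOf(dep) != packageOf(label)) 1 else 0
                    Distance(d.targetDistance + 1, d.packageDistance + crossing)
                }
            require(viaDeps.isNotEmpty()) {
                "$label is indirectly impacted but has no impacted dependency"
            }
            // Each metric is minimized on its own; the two minima may come from different paths.
            Distance(
                viaDeps.minOf { it.targetDistance },
                viaDeps.minOf { it.packageDistance },
            )
        }
        memo[label] = result
        return result
    }

    return impacted.associateWith { visit(it) }
}
```

If `//foo:baz` and `//bar:qux` each depend on a directly impacted `//foo:bar`, this sketch yields distances (0, 0), (1, 0), and (1, 1), which matches the shape of the example output in the README.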

cli/BUILD

Lines changed: 16 additions & 0 deletions
@@ -51,6 +51,14 @@ kt_jvm_test(
     runtime_deps = [":cli-test-lib"],
 )

+kt_jvm_test(
+    name = "TargetHashTest",
+    jvm_flags = ["-Djava.security.manager=allow"],
+    test_class = "com.bazel_diff.hash.TargetHashTest",
+    runtime_deps = [":cli-test-lib"],
+)
+
+
 kt_jvm_test(
     name = "SourceFileHasherTest",
     data = [
@@ -101,6 +109,7 @@ kt_jvm_test(
     jvm_flags = ["-Djava.security.manager=allow"],
     test_class = "com.bazel_diff.e2e.E2ETest",
     runtime_deps = [":cli-test-lib"],
+    data = [":workspaces"],
 )

 kt_jvm_test(
@@ -130,3 +139,10 @@ kt_jvm_library(
         "@bazel_diff_maven//:org_mockito_kotlin_mockito_kotlin",
     ],
 )
+
+filegroup(
+    name = "workspaces",
+    srcs = [
+        "src/test/resources/workspaces",
+    ],
+)

cli/src/main/kotlin/com/bazel_diff/cli/GenerateHashesCommand.kt

Lines changed: 10 additions & 1 deletion
@@ -136,6 +136,14 @@ class GenerateHashesCommand : Callable<Int> {
     )
     var ignoredRuleHashingAttributes: Set<String> = emptySet()

+    @CommandLine.Option(
+        names = ["-d", "--depEdgesFile"],
+        description = ["Path to the file where dependency edges are written to. If not specified, the dependency edges will not be written to a file. Needed for computing build graph distance metrics. See bazel-diff docs for more details about build graph distance metrics."],
+        scope = CommandLine.ScopeType.INHERIT,
+        defaultValue = CommandLine.Parameters.NULL_VALUE
+    )
+    var depsMappingJSONPath: File? = null
+
     @CommandLine.Option(
         names = ["-m", "--modified-filepaths"],
         description = ["Experimental: A text file containing a newline separated list of filepaths (relative to the workspace) these filepaths should represent the modified files between the specified revisions and will be used to scope what files are hashed during hash generation."]
@@ -159,14 +167,15 @@ class GenerateHashesCommand : Callable<Int> {
                     cqueryCommandOptions,
                     useCquery,
                     keepGoing,
+                    depsMappingJSONPath != null,
                     fineGrainedHashExternalRepos,
                 ),
                 loggingModule(parent.verbose),
                 serialisationModule(),
             )
         }

-        return when (GenerateHashesInteractor().execute(seedFilepaths, outputPath, ignoredRuleHashingAttributes, targetType, includeTargetType, modifiedFilepaths)) {
+        return when (GenerateHashesInteractor().execute(seedFilepaths, outputPath, depsMappingJSONPath, ignoredRuleHashingAttributes, targetType, includeTargetType, modifiedFilepaths)) {
             true -> CommandLine.ExitCode.OK
             false -> CommandLine.ExitCode.SOFTWARE
         }.also { stopKoin() }

cli/src/main/kotlin/com/bazel_diff/cli/GetImpactedTargetsCommand.kt

Lines changed: 27 additions & 13 deletions
@@ -4,6 +4,7 @@ import com.bazel_diff.di.loggingModule
 import com.bazel_diff.di.serialisationModule
 import com.bazel_diff.interactor.CalculateImpactedTargetsInteractor
 import com.bazel_diff.interactor.DeserialiseHashesInteractor
+import com.bazel_diff.interactor.TargetTypeFilter
 import org.koin.core.context.startKoin
 import org.koin.core.context.stopKoin
 import picocli.CommandLine
@@ -38,6 +39,14 @@ class GetImpactedTargetsCommand : Callable<Int> {
     )
     lateinit var finalHashesJSONPath: File

+    @CommandLine.Option(
+        names = ["-d", "--depEdgesFile"],
+        description = ["Path to the file where dependency edges are. If specified, build graph distance metrics will be computed from the given hash data."],
+        scope = CommandLine.ScopeType.INHERIT,
+        defaultValue = CommandLine.Parameters.NULL_VALUE
+    )
+    var depsMappingJSONPath: File? = null
+
     @CommandLine.Option(
         names = ["-tt", "--targetType"],
         split = ",",
@@ -49,7 +58,7 @@ class GetImpactedTargetsCommand : Callable<Int> {
     @CommandLine.Option(
         names = ["-o", "--output"],
         scope = CommandLine.ScopeType.LOCAL,
-        description = ["Filepath to write the impacted Bazel targets to, newline separated. If not specified, the targets will be written to STDOUT."],
+        description = ["Filepath to write the impacted Bazel targets to. If using depEdgesFile: formatted in json, otherwise: newline separated. If not specified, the output will be written to STDOUT."],
     )
     var outputPath: File? = null

@@ -66,21 +75,20 @@ class GetImpactedTargetsCommand : Callable<Int> {

         validate()
         val deserialiser = DeserialiseHashesInteractor()
-        val from = deserialiser.execute(startingHashesJSONPath, targetType)
-        val to = deserialiser.execute(finalHashesJSONPath, targetType)
+        val from = deserialiser.executeTargetHash(startingHashesJSONPath)
+        val to = deserialiser.executeTargetHash(finalHashesJSONPath)

-        val impactedTargets = CalculateImpactedTargetsInteractor().execute(from, to)
-
-        return try {
-            BufferedWriter(when (val path=outputPath) {
+        val outputWriter = BufferedWriter(when (val path = outputPath) {
                 null -> FileWriter(FileDescriptor.out)
                 else -> FileWriter(path)
-            }).use { writer ->
-                impactedTargets.forEach {
-                    writer.write(it)
-                    //Should not depend on OS
-                    writer.write("\n")
-                }
+        })
+
+        return try {
+            if (depsMappingJSONPath != null) {
+                val depsMapping = deserialiser.deserializeDeps(depsMappingJSONPath!!)
+                CalculateImpactedTargetsInteractor().executeWithDistances(from, to, depsMapping, outputWriter, targetType)
+            } else {
+                CalculateImpactedTargetsInteractor().execute(from, to, outputWriter, targetType)
             }
             CommandLine.ExitCode.OK
         } catch (e: IOException) {
@@ -101,5 +109,11 @@ class GetImpactedTargetsCommand : Callable<Int> {
                 "Incorrect final hashes: file doesn't exist or can't be read."
             )
         }
+        if (depsMappingJSONPath != null && !depsMappingJSONPath!!.canRead()) {
+            throw CommandLine.ParameterException(
+                spec.commandLine(),
+                "Incorrect dep edges file: file doesn't exist or can't be read."
+            )
+        }
     }
 }
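
Reviewer note: when `--depEdgesFile` is passed, the output is described as JSON rather than a newline-separated list, so downstream tools can filter on the distance fields. The following is a minimal Kotlin sketch of such a filter, assuming the JSON has already been parsed into objects with the field names shown in the README example. `ImpactedTarget` and `selectWithinPackageDistance` are hypothetical names, and the parsing step is deliberately left out because it depends on the JSON library a consumer uses.

```kotlin
// Hypothetical consumer-side model mirroring the README's example output fields.
data class ImpactedTarget(
    val label: String,
    val targetDistance: Int,
    val packageDistance: Int,
)

// Keep only targets within `maxPackageDistance` package hops of a directly impacted target.
fun selectWithinPackageDistance(
    targets: List<ImpactedTarget>,
    maxPackageDistance: Int,
): List<String> =
    targets
        .filter { it.packageDistance <= maxPackageDistance }
        .map { it.label }
```

Calling `selectWithinPackageDistance(targets, 0)` corresponds to the README's "same package as a directly impacted target" use case; a larger threshold corresponds to "within a few package hops".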

cli/src/main/kotlin/com/bazel_diff/di/Modules.kt

Lines changed: 3 additions & 2 deletions
@@ -28,6 +28,7 @@ fun hasherModule(
     cqueryOptions: List<String>,
     useCquery: Boolean,
     keepGoing: Boolean,
+    trackDeps: Boolean,
     fineGrainedHashExternalRepos: Set<String>,
 ): Module = module {
     val cmd: MutableList<String> = ArrayList<String>().apply {
@@ -61,8 +62,8 @@ fun hasherModule(
     single { BazelClient(useCquery, fineGrainedHashExternalRepos) }
     single { BuildGraphHasher(get()) }
     single { TargetHasher() }
-    single { RuleHasher(useCquery, fineGrainedHashExternalRepos) }
-    single { SourceFileHasher(fineGrainedHashExternalRepos) }
+    single { RuleHasher(useCquery, trackDeps, fineGrainedHashExternalRepos) }
+    single<SourceFileHasher> { SourceFileHasherImpl(fineGrainedHashExternalRepos) }
     single { ExternalRepoResolver(workingDirectory, bazelPath, outputPath) }
     single(named("working-directory")) { workingDirectory }
     single(named("output-base")) { outputPath }

cli/src/main/kotlin/com/bazel_diff/hash/BuildGraphHasher.kt

Lines changed: 7 additions & 2 deletions
@@ -99,7 +99,7 @@ class BuildGraphHasher(private val bazelClient: BazelClient) : KoinComponent {
         ignoredAttrs: Set<String>,
         modifiedFilepaths: Set<Path>
     ): Map<String, TargetHash> {
-        val ruleHashes: ConcurrentMap<String, ByteArray> = ConcurrentHashMap()
+        val ruleHashes: ConcurrentMap<String, TargetDigest> = ConcurrentHashMap()
         val targetToRule: MutableMap<String, BazelRule> = HashMap()
         traverseGraph(allTargets, targetToRule)

@@ -114,7 +114,12 @@ class BuildGraphHasher(private val bazelClient: BazelClient) : KoinComponent {
                     ignoredAttrs,
                     modifiedFilepaths
                 )
-                Pair(target.name, TargetHash(target.javaClass.name.substringAfterLast('$'), targetDigest.toHexString()))
+                Pair(target.name, TargetHash(
+                    target.javaClass.name.substringAfterLast('$'),
+                    targetDigest.overallDigest.toHexString(),
+                    targetDigest.directDigest.toHexString(),
+                    targetDigest.deps,
+                ))
             }
             .filter { targetEntry: Pair<String, TargetHash>? -> targetEntry != null }
             .collect(
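
Reviewer note: with this change each `TargetHash` carries both an overall hash and a direct hash. A minimal Kotlin sketch of how the two could distinguish directly from indirectly impacted targets follows, using the README's definitions. `SketchTargetHash`, `Impact`, and `classify` are hypothetical names, and treating a target that is absent from the starting revision as directly impacted is an assumption rather than confirmed behavior.

```kotlin
// Hypothetical, reduced stand-in for TargetHash: only the two hex digests matter here.
data class SketchTargetHash(val hash: String, val directHash: String)

enum class Impact { NONE, DIRECT, INDIRECT }

// `from` is the target's hash at the starting revision (null if the target is new),
// `to` is its hash at the final revision.
fun classify(from: SketchTargetHash?, to: SketchTargetHash): Impact = when {
    from == null -> Impact.DIRECT                       // assumption: new targets count as direct
    from.hash == to.hash -> Impact.NONE                 // nothing changed, directly or transitively
    from.directHash != to.directHash -> Impact.DIRECT   // own attrs or source files changed
    else -> Impact.INDIRECT                             // only a dependency's hash changed
}
```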

cli/src/main/kotlin/com/bazel_diff/hash/RuleHasher.kt

Lines changed: 10 additions & 10 deletions
@@ -9,7 +9,7 @@ import org.koin.core.component.inject
 import java.util.concurrent.ConcurrentMap
 import java.nio.file.Path

-class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExternalRepos: Set<String>) : KoinComponent {
+class RuleHasher(private val useCquery: Boolean, private val trackDepLabels: Boolean, private val fineGrainedHashExternalRepos: Set<String>) : KoinComponent {
     private val logger: Logger by inject()
     private val sourceFileHasher: SourceFileHasher by inject()

@@ -28,31 +28,31 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
     fun digest(
         rule: BazelRule,
         allRulesMap: Map<String, BazelRule>,
-        ruleHashes: ConcurrentMap<String, ByteArray>,
+        ruleHashes: ConcurrentMap<String, TargetDigest>,
         sourceDigests: ConcurrentMap<String, ByteArray>,
         seedHash: ByteArray?,
         depPath: LinkedHashSet<String>?,
         ignoredAttrs: Set<String>,
         modifiedFilepaths: Set<Path>
-    ): ByteArray {
+    ): TargetDigest {
         val depPathClone = if (depPath != null) LinkedHashSet(depPath) else LinkedHashSet()
         if (depPathClone.contains(rule.name)) {
             throw raiseCircularDependency(depPathClone, rule.name)
         }
         depPathClone.add(rule.name)
         ruleHashes[rule.name]?.let { return it }

-        val finalHashValue = sha256 {
-            safePutBytes(rule.digest(ignoredAttrs))
-            safePutBytes(seedHash)
+        val finalHashValue = targetSha256(trackDepLabels) {
+            putDirectBytes(rule.digest(ignoredAttrs))
+            putDirectBytes(seedHash)

             for (ruleInput in rule.ruleInputList(useCquery, fineGrainedHashExternalRepos)) {
-                safePutBytes(ruleInput.toByteArray())
+                putDirectBytes(ruleInput.toByteArray())

                 val inputRule = allRulesMap[ruleInput]
                 when {
                     inputRule == null && sourceDigests.containsKey(ruleInput) -> {
-                        safePutBytes(sourceDigests[ruleInput])
+                        putDirectBytes(sourceDigests[ruleInput])
                     }

                     inputRule?.name != null && inputRule.name != rule.name -> {
@@ -66,7 +66,7 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
                             ignoredAttrs,
                             modifiedFilepaths
                         )
-                        safePutBytes(ruleInputHash)
+                        putTransitiveBytes(ruleInput, ruleInputHash.overallDigest)
                     }

                     else -> {
@@ -75,7 +75,7 @@ class RuleHasher(private val useCquery: Boolean, private val fineGrainedHashExte
                             heuristicDigest != null -> {
                                 logger.i { "Source file $ruleInput picked up as an input for rule ${rule.name}" }
                                 sourceDigests[ruleInput] = heuristicDigest
-                                safePutBytes(heuristicDigest)
+                                putDirectBytes(heuristicDigest)
                             }

                             else -> logger.w { "Unable to calculate digest for input $ruleInput for rule ${rule.name}" }
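
Reviewer note: the `RuleHasher` hunks replace `safePutBytes` with `putDirectBytes` and `putTransitiveBytes`, but the builder behind `targetSha256` is not part of the hunks shown here. Below is a minimal Kotlin sketch of how such a two-digest builder could work, assuming direct bytes feed both digests while transitive bytes feed only the overall digest. `SketchTargetDigest` and `SketchDigestBuilder` are hypothetical names and this is not the commit's actual implementation.

```kotlin
import java.security.MessageDigest

// Hypothetical result type: mirrors the overallDigest / directDigest / deps fields used in the diff.
class SketchTargetDigest(
    val overallDigest: ByteArray,
    val directDigest: ByteArray,
    val deps: List<String>?,
)

class SketchDigestBuilder(private val trackDeps: Boolean) {
    private val overall = MessageDigest.getInstance("SHA-256")
    private val direct = MessageDigest.getInstance("SHA-256")
    private val deps = mutableListOf<String>()

    // Rule attributes, seed hash, input labels, and source file digests: part of both hashes.
    fun putDirectBytes(bytes: ByteArray?) {
        if (bytes == null) return
        direct.update(bytes)
        overall.update(bytes)
    }

    // Hash of a dependency rule: changes the overall hash only, and optionally records the edge.
    fun putTransitiveBytes(depLabel: String, bytes: ByteArray) {
        overall.update(bytes)
        if (trackDeps) deps.add(depLabel)
    }

    fun build(): SketchTargetDigest =
        SketchTargetDigest(overall.digest(), direct.digest(), if (trackDeps) deps.toList() else null)
}
```

Under this assumption, a change in a dependency alters `overallDigest` but leaves `directDigest` untouched, which is what makes the direct versus indirect distinction possible.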

cli/src/main/kotlin/com/bazel_diff/hash/SourceFileHasher.kt

Lines changed: 9 additions & 4 deletions
@@ -9,7 +9,12 @@ import org.koin.core.qualifier.named
 import java.nio.file.Path
 import java.nio.file.Paths

-class SourceFileHasher : KoinComponent {
+interface SourceFileHasher {
+    fun digest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray
+    fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray?
+}
+
+class SourceFileHasherImpl : KoinComponent, SourceFileHasher {
     private val workingDirectory: Path
     private val logger: Logger
     private val relativeFilenameToContentHash: Map<String, String>?
@@ -38,9 +43,9 @@ class SourceFileHasher : KoinComponent {
         this.externalRepoResolver = externalRepoResolver
     }

-    fun digest(
+    override fun digest(
         sourceFileTarget: BazelSourceFileTarget,
-        modifiedFilepaths: Set<Path> = emptySet()
+        modifiedFilepaths: Set<Path>
     ): ByteArray {
         return sha256 {
             val name = sourceFileTarget.name
@@ -94,7 +99,7 @@ class SourceFileHasher : KoinComponent {
         }
     }

-    fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray? {
+    override fun softDigest(sourceFileTarget: BazelSourceFileTarget, modifiedFilepaths: Set<Path>): ByteArray? {
         val name = sourceFileTarget.name
         val index = isMainRepo(name)
         if (index == -1) return null
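
Reviewer note: extracting a `SourceFileHasher` interface lets tests substitute a fake, and the default arguments move to the interface because Kotlin does not allow an overriding function to redeclare default parameter values. The sketch below illustrates both points with hypothetical stand-in types (`SketchSourceFileTarget`, `SketchSourceFileHasher`, `FakeSourceFileHasher`); it does not use bazel-diff's real classes.

```kotlin
import java.nio.file.Path

// Hypothetical stand-in for bazel-diff's BazelSourceFileTarget.
data class SketchSourceFileTarget(val name: String)

interface SketchSourceFileHasher {
    // The default value is declared once, on the interface.
    fun digest(target: SketchSourceFileTarget, modifiedFilepaths: Set<Path> = emptySet()): ByteArray
}

// A test double that returns canned digests instead of reading files.
class FakeSourceFileHasher(private val canned: Map<String, ByteArray>) : SketchSourceFileHasher {
    // The override inherits the default value; repeating `= emptySet()` here would not compile.
    override fun digest(target: SketchSourceFileTarget, modifiedFilepaths: Set<Path>): ByteArray =
        canned[target.name] ?: ByteArray(0)
}
```

Callers can still omit `modifiedFilepaths` when invoking `digest` on either the interface or the implementing class.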
