Why does touching a file not cause Bazel to rebuild my project?

Observation

Recently I have been playing with Bazel, as it seems more and more projects are moving to it.

While playing with the examples/cpp hello-world example that comes with the source code, I noticed that my usual trick of “touching a file” does not cause it to recompile.

Suspecting this might be a bug rather than a feature, I dug into the source code. It turns out Bazel has a sophisticated way to keep unnecessary compilation to a minimum, relying on an “ActionCache” and on file digests such as SHA-256.

I will try to reveal some of the details in the rest of this post.

First, here is what I observed when touching hello-world.cc and building examples/cpp:hello-world:


$bazel build examples/cpp:hello-world
INFO: Analyzed target //examples/cpp:hello-world (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples/cpp:hello-world up-to-date:
  bazel-bin/examples/cpp/hello-world
INFO: Elapsed time: 0.061s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action

$touch examples/cpp/hello-world.cc 

$bazel build examples/cpp:hello-world 
INFO: Analyzed target //examples/cpp:hello-world (0 packages loaded, 0 targets configured). 
INFO: Found 1 target... 
Target //examples/cpp:hello-world up-to-date: 
  bazel-bin/examples/cpp/hello-world 
INFO: Elapsed time: 0.050s, Critical Path: 0.00s 
INFO: 0 processes. 
INFO: Build completed successfully, 1 total action

I can go even further by opening the file with vi and force-saving it without changing anything, then see how Bazel behaves.
The reason I use vi is that it usually saves by creating a temp file and then renaming it over the original, so it is more aggressive than a mere touch. This shows up in the output of “stat”, where the inode changes.

By contrast, a small program that simply opens the file, changes a char in place, and saves it will not change the inode. The stat output below shows the file before and after a touch, and then after a forced save in vim:


$stat examples/cpp/hello-world.cc
  File: ‘examples/cpp/hello-world.cc’
  Size: 747             Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 50383878    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  codywu)   Gid: ( 1000/  codywu)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2020-01-05 14:35:52.179841956 -0500
Modify: 2020-01-05 14:35:52.179841956 -0500
Change: 2020-01-05 14:35:52.181841946 -0500
 Birth: -

$touch examples/cpp/hello-world.cc

$stat examples/cpp/hello-world.cc
  File: ‘examples/cpp/hello-world.cc’
  Size: 747             Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 50383878    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  codywu)   Gid: ( 1000/  codywu)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2020-01-05 14:38:07.496045397 -0500
Modify: 2020-01-05 14:38:07.496045397 -0500
Change: 2020-01-05 14:38:07.496045397 -0500
 Birth: -

$vim examples/cpp/hello-world.cc

$stat examples/cpp/hello-world.cc
  File: ‘examples/cpp/hello-world.cc’
  Size: 747             Blocks: 8          IO Block: 4096   regular file
Device: fd03h/64771d    Inode: 13229799    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  codywu)   Gid: ( 1000/  codywu)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2020-01-05 14:38:14.179003529 -0500
Modify: 2020-01-05 14:38:14.179003529 -0500
Change: 2020-01-05 14:38:14.180003523 -0500
 Birth: -

Not surprisingly, that does not help either (note that I did not change any content):


$bazel build examples/cpp:hello-world
INFO: Analyzed target //examples/cpp:hello-world (1 packages loaded, 5 targets configured).
INFO: Found 1 target...
Target //examples/cpp:hello-world up-to-date:
  bazel-bin/examples/cpp/hello-world
INFO: Elapsed time: 0.069s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
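The inode difference between an in-place write and an editor-style save can be reproduced with a small Java program. This is only a sketch of the idea, not anything taken from Bazel: the class and method names (InodeDemo, writeInPlace, writeViaRename) are my own, and it assumes a file system where BasicFileAttributes.fileKey() reflects the device and inode, as it normally does on Linux.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Objects;

final class InodeDemo {
  // fileKey() identifies the underlying file (device + inode on typical Unix
  // file systems). It may be null on platforms that expose no such identity.
  static Object fileKey(Path p) throws IOException {
    return Files.readAttributes(p, BasicFileAttributes.class).fileKey();
  }

  // Overwrite bytes of the existing file without replacing the file itself.
  // Only WRITE is passed, so the file is neither recreated nor truncated.
  static void writeInPlace(Path p, String text) throws IOException {
    Files.write(p, text.getBytes(StandardCharsets.UTF_8), StandardOpenOption.WRITE);
  }

  // Editor-style save: write a sibling temp file, then rename it over the original.
  static void writeViaRename(Path p, String text) throws IOException {
    Path tmp = Files.createTempFile(p.toAbsolutePath().getParent(), "save", ".tmp");
    Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, p, StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("inode-demo");
    Path f = dir.resolve("hello-world.cc");
    Files.write(f, "Hello world\n".getBytes(StandardCharsets.UTF_8));

    Object before = fileKey(f);
    writeInPlace(f, "Hello world\n");
    System.out.println("in-place write keeps file identity: "
        + Objects.equals(before, fileKey(f)));

    writeViaRename(f, "Hello world\n");
    System.out.println("save-by-rename keeps file identity: "
        + Objects.equals(before, fileKey(f)));
  }
}
```

On a typical Linux file system I would expect the first line to report true and the second false, which matches the stat output above: touch keeps the inode, while a save-by-rename replaces it.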

Now let’s change the file for real: I am going to change “world” to “world!”, and this time Bazel actually rebuilds it.


$bazel-bin/examples/cpp/hello-world
Hello world

$vim examples/cpp/hello-world.cc

$bazel build examples/cpp:hello-world
INFO: Analyzed target //examples/cpp:hello-world (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples/cpp:hello-world up-to-date:
  bazel-bin/examples/cpp/hello-world
INFO: Elapsed time: 0.255s, Critical Path: 0.21s
INFO: 2 processes: 2 processwrapper-sandbox.
INFO: Build completed successfully, 3 total actions

$bazel-bin/examples/cpp/hello-world
Hello world!

Note that the output above no longer says “1 total action” but “3 total actions”; the 2 additional actions are compiling and linking the updated hello-world.cc.

If you run “bazel build -s examples/cpp:hello-world”, you will actually see “gcc” showing up in the printed command lines.

So what happened here? Bazel seems to be pretty smart about figuring out when “gcc” is actually needed.

Let’s dive into the details.

First Level Dirtiness Check

In the first round, Bazel walks the dependency graph to find all the dependencies of the target specified on the command line, which in this case is //examples/cpp:hello-world.

After all the dependencies are identified, Bazel runs the dirtiness check and finds all the dirty nodes.

It then evaluates those dirty nodes and comes up with the actions that need to be performed.
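To make the “find dirty nodes, then re-evaluate what depends on them” idea concrete, here is a toy sketch. It is not Bazel's Skyframe code: the class DirtyGraphSketch, its methods, and the node names are hypothetical, and the real graph has many more node types than these three files.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class DirtyGraphSketch {
  // Reverse edges: node -> nodes that depend on it (toy graph, not Skyframe).
  private final Map<String, List<String>> reverseDeps = new HashMap<>();

  void addDependency(String dependent, String dependency) {
    reverseDeps.computeIfAbsent(dependency, k -> new ArrayList<>()).add(dependent);
  }

  // Breadth-first walk from the changed leaves up through everything that
  // transitively depends on them.
  Set<String> nodesToReevaluate(List<String> changedLeaves) {
    Set<String> dirty = new LinkedHashSet<>(changedLeaves);
    Deque<String> queue = new ArrayDeque<>(changedLeaves);
    while (!queue.isEmpty()) {
      String node = queue.removeFirst();
      for (String parent : reverseDeps.getOrDefault(node, Collections.<String>emptyList())) {
        if (dirty.add(parent)) {
          queue.addLast(parent);
        }
      }
    }
    return dirty;
  }

  public static void main(String[] args) {
    DirtyGraphSketch g = new DirtyGraphSketch();
    g.addDependency("hello-world.o", "hello-world.cc");
    g.addDependency("hello-world", "hello-world.o");
    g.addDependency("hello-lib.o", "hello-lib.cc");
    g.addDependency("hello-world", "hello-lib.o");
    // prints [hello-world.cc, hello-world.o, hello-world]
    System.out.println(g.nodesToReevaluate(Arrays.asList("hello-world.cc")));
  }
}
```

Only the nodes reachable from the changed file are revisited; hello-lib.cc and hello-lib.o are left alone.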

Below is the relevant code that deals with changed files:


  private void handleChangedFiles(
      Collection<Root> diffPackageRootsUnderWhichToCheck,
      Diff diff,
      boolean managedDirectoriesChanged) {
    int numWithoutNewValues = diff.changedKeysWithoutNewValues().size();
    Iterable<SkyKey> keysToBeChangedLaterInThisBuild = diff.changedKeysWithoutNewValues();
    Map<SkyKey, SkyValue> changedKeysWithNewValues = diff.changedKeysWithNewValues();

    // If managed directories settings changed, do not inject any new values, just invalidate
    // keys of the changed values. {@link #invalidateCachedWorkspacePathsStates()}
    if (managedDirectoriesChanged) {
      numWithoutNewValues += changedKeysWithNewValues.size();
      keysToBeChangedLaterInThisBuild =
          Iterables.concat(keysToBeChangedLaterInThisBuild, changedKeysWithNewValues.keySet());
      changedKeysWithNewValues = ImmutableMap.of();
    }

    logDiffInfo(
        diffPackageRootsUnderWhichToCheck,
        keysToBeChangedLaterInThisBuild,
        numWithoutNewValues,
        changedKeysWithNewValues);

    recordingDiffer.invalidate(keysToBeChangedLaterInThisBuild);
    recordingDiffer.inject(changedKeysWithNewValues);
    modifiedFiles += getNumberOfModifiedFiles(keysToBeChangedLaterInThisBuild);
    modifiedFiles += getNumberOfModifiedFiles(changedKeysWithNewValues.keySet());
    incrementalBuildMonitor.accrue(keysToBeChangedLaterInThisBuild);
    incrementalBuildMonitor.accrue(changedKeysWithNewValues.keySet());
  }

The call stack of handleChangedFiles, preceded by a debug print of changedKeysWithNewValues, looks like this:


changedKeysWithNewValues={FILE_STATE:[/home/codywu/bazel]/[examples/cpp/hello-world.cc]=RegularFileStateValue{digest=null, size=749, contentsProxy=ctime of 1578250746334 and nodeId of 170525141}}
com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutor.handleChangedFiles(SequencedSkyframeExecutor.java:558)
com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutor.handleDiffsWithMissingDiffInformation(SequencedSkyframeExecutor.java:536)
com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutor.handleDiffs(SequencedSkyframeExecutor.java:373)
com.google.devtools.build.lib.skyframe.SequencedSkyframeExecutor.sync(SequencedSkyframeExecutor.java:262)
com.google.devtools.build.lib.runtime.CommandEnvironment.setupPackageCache(CommandEnvironment.java:598)
com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:536)
com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:215)
com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:603)
com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$2(GrpcServerImpl.java:654)
io.grpc.Context$1.run(Context.java:595)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.base/java.lang.Thread.run(Unknown Source)

The diffs are calculated by the function getDirtyValues, which applies a dirtiness checker to those files:


481   private BatchDirtyResult getDirtyValues(ValueFetcher fetcher,
482       Iterable<SkyKey> keys, final SkyValueDirtinessChecker checker,
483       final boolean checkMissingValues) throws InterruptedException {
484     ExecutorService executor =
485         Executors.newFixedThreadPool(
486             DIRTINESS_CHECK_THREADS,
487             new ThreadFactoryBuilder().setNameFormat("FileSystem Value Invalidator %d").build());
488
489     final BatchDirtyResult batchResult = new BatchDirtyResult();
490     ThrowableRecordingRunnableWrapper wrapper =
491         new ThrowableRecordingRunnableWrapper("FilesystemValueChecker#getDirtyValues");
492     final AtomicInteger numKeysScanned = new AtomicInteger(0);
493     final AtomicInteger numKeysChecked = new AtomicInteger(0);
494     ElapsedTimeReceiver elapsedTimeReceiver =
495         elapsedTimeNanos -> {
496           if (elapsedTimeNanos > 0) {
497             logger.info(
498                 String.format(
499                     "Spent %d ms checking %d filesystem nodes (%d scanned)",
500                     TimeUnit.MILLISECONDS.convert(elapsedTimeNanos, TimeUnit.NANOSECONDS),
501                     numKeysChecked.get(),
502                     numKeysScanned.get()));
503           }
504         };
505     try (AutoProfiler prof = AutoProfiler.create(elapsedTimeReceiver)) {
506       for (final SkyKey key : keys) {
507         numKeysScanned.incrementAndGet();
508         if (!checker.applies(key)) {
509           continue;
510         }
511         Preconditions.checkState(
512             key.functionName().getHermeticity() == FunctionHermeticity.NONHERMETIC,
513             "Only non-hermetic keys can be dirty roots: %s",
514             key);
515         executor.execute(
516             wrapper.wrap(
517                 () -> {
518                   SkyValue value;
519                   try {
520                     value = fetcher.get(key);
521                   } catch (InterruptedException e) {
522                     // Exit fast. Interrupt is handled below on the main thread.
523                     return;
524                   }
525                   if (!checkMissingValues && value == null) {
526                     return;
527                   }
528
529                   numKeysChecked.incrementAndGet();
530                   DirtyResult result = checker.check(key, value, tsgm);
531                   if (result.isDirty()) {
532                     batchResult.add(key, value, result.getNewValue());
533                   }
534                 }));
535       }

The most critical part of the code above is line #530, where check() examines each node and the key is added to the batch of changed files if the result is “dirty”.

The dirtiness checker is a UnionDirtinessChecker that combines any custom checkers with ExternalDirtinessChecker and MissingDiffDirtinessChecker:


      diff =
          fsvc.getDirtyKeys(
              memoizingEvaluator.getValues(),
              new UnionDirtinessChecker(
                  Iterables.concat(
                      customDirtinessCheckers,
                      ImmutableList.of(
                          new ExternalDirtinessChecker(tmpExternalFilesHelper, fileTypesToCheck),
                          new MissingDiffDirtinessChecker(diffPackageRootsUnderWhichToCheck)))));
    }

In this case the actual check is done by MissingDiffDirtinessChecker, which uses FileDirtinessChecker::createNewValue() to create a new value to compare with the value saved in Bazel’s cache.


    @Override
    @Nullable
    public SkyValue createNewValue(SkyKey key, @Nullable TimestampGranularityMonitor tsgm) {
      RootedPath rootedPath = (RootedPath) key.argument();
      try {
        return FileStateValue.create(rootedPath, tsgm);
      } catch (IOException e) {
        // TODO(bazel-team): An IOException indicates a failure to get a file digest or a symlink
        // target, not a missing file. Such a failure really shouldn't happen, so failing early
        // may be better here.
        return null;
      }
    }
  }

The returned SkyValue is a RegularFileStateValue created by the function below:


  public static FileStateValue create(
      RootedPath rootedPath,
      FilesystemCalls syscallCache,
      @Nullable TimestampGranularityMonitor tsgm)
      throws InconsistentFilesystemException, IOException {
    Path path = rootedPath.asPath();
    Dirent.Type type = syscallCache.getType(path, Symlinks.NOFOLLOW);
    if (type == null) {
      return NONEXISTENT_FILE_STATE_NODE;
    }
    switch (type) {
      case DIRECTORY:
        return DIRECTORY_FILE_STATE_NODE;
      case SYMLINK:
        return new SymlinkFileStateValue(path.readSymbolicLinkUnchecked());
      case FILE:
      case UNKNOWN:
        {
          FileStatus stat = syscallCache.statIfFound(path, Symlinks.NOFOLLOW);
          Preconditions.checkNotNull(
              stat, "File %s found in directory, but stat failed", rootedPath);
          return createWithStatNoFollow(rootedPath, FileStatusWithDigestAdapter.adapt(stat), tsgm);
        }
      default:
        throw new IllegalStateException(type.toString());
    }
  }

RegularFileStateValue defines the following equals() to compare old and new values:


    @Override
    public boolean equals(Object obj) {
      if (obj == this) {
        return true;
      }
      if (!(obj instanceof RegularFileStateValue)) {
        return false;
      }
      RegularFileStateValue other = (RegularFileStateValue) obj;
      return size == other.size
          && Arrays.equals(digest, other.digest)
          && Objects.equals(contentsProxy, other.contentsProxy);
    }

The digest is null in this case, and FileContentsProxy compares only the change time (ctime) and the inode ID:


  @Override
  public boolean equals(Object other) {
    if (other == this) {
      return true;
    }

    if (!(other instanceof FileContentsProxy)) {
      return false;
    }

    FileContentsProxy that = (FileContentsProxy) other;
    return ctime == that.ctime && nodeId == that.nodeId;
  }

So we end up comparing 3 things in the first-level dirtiness check:

  • File Size
  • Change Time
  • Inode Id

Touching a file changes its “Change Time”, so touching does make the file dirty at this level, which causes the actions that take hello-world.cc as an input to be queued for evaluation.
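The three-field comparison can be sketched as a tiny value class. This is a simplified stand-in of my own (FileStateSketch and isDirty are hypothetical names), not Bazel's RegularFileStateValue, and the ctime values are made-up numbers; only the size and inode values echo the stat output earlier in the post.

```java
import java.util.Objects;

final class FileStateSketch {
  final long size;
  final long ctimeMillis;
  final long nodeId;

  FileStateSketch(long size, long ctimeMillis, long nodeId) {
    this.size = size;
    this.ctimeMillis = ctimeMillis;
    this.nodeId = nodeId;
  }

  // Two snapshots are equal only if size, ctime and inode all match.
  @Override
  public boolean equals(Object obj) {
    if (obj == this) {
      return true;
    }
    if (!(obj instanceof FileStateSketch)) {
      return false;
    }
    FileStateSketch other = (FileStateSketch) obj;
    return size == other.size && ctimeMillis == other.ctimeMillis && nodeId == other.nodeId;
  }

  @Override
  public int hashCode() {
    return Objects.hash(size, ctimeMillis, nodeId);
  }

  // First-level check: any difference between cached and fresh snapshot is "dirty".
  static boolean isDirty(FileStateSketch cached, FileStateSketch fresh) {
    return !cached.equals(fresh);
  }

  public static void main(String[] args) {
    FileStateSketch cached = new FileStateSketch(747L, 1000L, 50383878L);
    FileStateSketch touched = new FileStateSketch(747L, 2000L, 50383878L); // new ctime
    FileStateSketch resaved = new FileStateSketch(747L, 3000L, 13229799L); // new ctime and inode
    System.out.println("touched is dirty: " + FileStateSketch.isDirty(cached, touched));
    System.out.println("re-saved is dirty: " + FileStateSketch.isDirty(cached, resaved));
    System.out.println("unchanged is dirty: "
        + FileStateSketch.isDirty(cached, new FileStateSketch(747L, 1000L, 50383878L)));
  }
}
```

Note that file content never enters this comparison, which is why a touch is enough to mark the file dirty at the first level.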

Second Level Digest Check

After the first stage has identified the “dirty” nodes, we get into the second-level check, where the “ActionCache” and file digests are used to prevent unnecessary compiling and linking.

The entry point of the second level is the call to the function mustExecute(), which checks whether the action really needs to run.

The caller first fetches the cache entry for the action and then calls into mustExecute():


    ActionCache.Entry entry = getCacheEntry(action);
    if (mustExecute(
        action,
        entry,
        handler,
        metadataHandler,
        actionInputs,
        clientEnv,
        remoteDefaultPlatformProperties)) {
      if (entry != null) {
        removeCacheEntry(action);
      }
      return new Token(getKeyString(action));
    }

mustExecute() does a few checks; the most important to us are validateArtifacts on line #326 and the comparison of ActionKey on line #330.


301   protected boolean mustExecute(
302       Action action,
303       @Nullable ActionCache.Entry entry,
304       EventHandler handler,
305       MetadataHandler metadataHandler,
306       Iterable<Artifact> actionInputs,
307       Map<String, String> clientEnv,
308       Map<String, String> remoteDefaultPlatformProperties) {
309     // Unconditional execution can be applied only for actions that are allowed to be executed.
310     if (unconditionalExecution(action)) {
311       Preconditions.checkState(action.isVolatile());
312       reportUnconditionalExecution(handler, action);
313       actionCache.accountMiss(MissReason.UNCONDITIONAL_EXECUTION);
314       return true;
315     }
316     if (entry == null) {
317       reportNewAction(handler, action);
318       actionCache.accountMiss(MissReason.NOT_CACHED);
319       return true;
320     }
321
322     if (entry.isCorrupted()) {
323       reportCorruptedCacheEntry(handler, action);
324       actionCache.accountMiss(MissReason.CORRUPTED_CACHE_ENTRY);
325       return true;
326     } else if (validateArtifacts(entry, action, actionInputs, metadataHandler, true)) {
327       reportChanged(handler, action);
328       actionCache.accountMiss(MissReason.DIFFERENT_FILES);
329       return true;
330     } else if (!entry.getActionKey().equals(action.getKey(actionKeyContext))) {
331       reportCommand(handler, action);
332       actionCache.accountMiss(MissReason.DIFFERENT_ACTION_KEY);
333       return true;
334     }
335     Map<String, String> usedEnvironment =
336         computeUsedEnv(action, clientEnv, remoteDefaultPlatformProperties);
337     if (!Arrays.equals(entry.getUsedClientEnvDigest(), DigestUtils.fromEnv(usedEnvironment))) {
338       reportClientEnv(handler, action, usedEnvironment);
339       actionCache.accountMiss(MissReason.DIFFERENT_ENVIRONMENT);
340       return true;
341     }
342
343     entry.getFileDigest();
344     actionCache.accountHit();
345     return false;
346   }

First, let’s briefly cover the ActionKey comparison on line #330, even though it is not our focus here.

ActionKey is a digest created from an Action’s inputs/outputs and various other arguments.

The goal is to check whether an action is forced to re-run because of changes to command-line options, build files, and so on.

In our example we only modified the file contents, so it does not apply to us anyway.
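As a rough illustration of why a key like this changes when the command line changes but not when file contents change, here is a sketch that hashes a mnemonic plus the argument list. The class ActionKeySketch and its actionKey method are my own invention for illustration; Bazel computes its real action keys differently and from more information.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

final class ActionKeySketch {
  // Hash the mnemonic and every argument. Each field is length-prefixed so that
  // ["ab", "c"] and ["a", "bc"] do not produce the same key.
  static String actionKey(String mnemonic, List<String> argv) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      update(md, mnemonic);
      for (String arg : argv) {
        update(md, arg);
      }
      return hex(md.digest());
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  private static void update(MessageDigest md, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
    md.update(bytes);
  }

  private static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String plain = actionKey("CppCompile", Arrays.asList("gcc", "-c", "hello-world.cc"));
    String same = actionKey("CppCompile", Arrays.asList("gcc", "-c", "hello-world.cc"));
    String optimized = actionKey("CppCompile", Arrays.asList("gcc", "-O2", "-c", "hello-world.cc"));
    System.out.println("same command line, same key: " + plain.equals(same));
    System.out.println("extra flag, same key: " + plain.equals(optimized));
  }
}
```

The key depends only on how the action is invoked, so editing hello-world.cc leaves it untouched; that case is caught by the file digest check instead.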

The more interesting one is validateArtifacts, which ends up using the SHA-256 digest of the file:


  /**
   * Validate metadata state for action input or output artifacts.
   *
   * @param entry cached action information.
   * @param action action to be validated.
   * @param actionInputs the inputs of the action. Normally just the result of action.getInputs(),
   *     but if this action doesn't yet know its inputs, we check the inputs from the cache.
   * @param metadataHandler provider of metadata for the artifacts this action interacts with.
   * @param checkOutput true to validate output artifacts, Otherwise, just validate inputs.
   * @return true if at least one artifact has changed, false - otherwise.
   */
  private static boolean validateArtifacts(
      ActionCache.Entry entry,
      Action action,
      Iterable<Artifact> actionInputs,
      MetadataHandler metadataHandler,
      boolean checkOutput) {
    Map<String, FileArtifactValue> mdMap = new HashMap<>();
    if (checkOutput) {
      for (Artifact artifact : action.getOutputs()) {
        mdMap.put(artifact.getExecPathString(), getMetadataMaybe(metadataHandler, artifact));
      }
    }
    for (Artifact artifact : actionInputs) {
      mdMap.put(artifact.getExecPathString(), getMetadataMaybe(metadataHandler, artifact));
    }
    return !Arrays.equals(DigestUtils.fromMetadata(mdMap), entry.getFileDigest());
  }

As we can see, both input and output files are added to mdMap, a single digest is generated from the map, and this final digest is compared with the cached digest.

DigestUtils simply XORs together the digests of all entries in the map (SHA-256 digests in this case):


  /**
   * @param mdMap A collection of (execPath, FileArtifactValue) pairs. Values may be null.
   * @return an order-independent digest from the given "set" of (path, metadata) pairs.
   */
  public static byte[] fromMetadata(Map<String, FileArtifactValue> mdMap) {
    byte[] result = new byte[1]; // reserve the empty string
    // Profiling showed that MessageDigest engine instantiation was a hotspot, so create one
    // instance for this computation to amortize its cost.
    Fingerprint fp = new Fingerprint();
    for (Map.Entry<String, FileArtifactValue> entry : mdMap.entrySet()) {
      result = xor(result, getDigest(fp, entry.getKey(), entry.getValue()));
    }
    return result;
  }
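The reason XOR gives an order-independent digest can be seen in a small standalone sketch. It is a simplification of my own, not Bazel's DigestUtils: XorDigestSketch, entryDigest and combined are hypothetical names, the per-entry digest is just SHA-256 over the content digest and the path, and the result is fixed at 32 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class XorDigestSketch {
  static byte[] sha256(byte[]... parts) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      for (byte[] part : parts) {
        md.update(part);
      }
      return md.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Digest of one (path, content) pair: hash of the content digest followed by the path.
  static byte[] entryDigest(String path, byte[] content) {
    return sha256(sha256(content), path.getBytes(StandardCharsets.UTF_8));
  }

  // XOR all per-entry digests. XOR is commutative and associative, so the
  // iteration order of the map does not affect the result.
  static byte[] combined(Map<String, byte[]> contentsByPath) {
    byte[] result = new byte[32]; // SHA-256 output size
    for (Map.Entry<String, byte[]> e : contentsByPath.entrySet()) {
      byte[] d = entryDigest(e.getKey(), e.getValue());
      for (int i = 0; i < result.length; i++) {
        result[i] ^= d[i];
      }
    }
    return result;
  }

  private static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    Map<String, byte[]> a = new LinkedHashMap<>();
    a.put("hello-world.cc", bytes("Hello world"));
    a.put("hello-lib.h", bytes("// header"));

    Map<String, byte[]> reordered = new LinkedHashMap<>();
    reordered.put("hello-lib.h", bytes("// header"));
    reordered.put("hello-world.cc", bytes("Hello world"));

    Map<String, byte[]> edited = new LinkedHashMap<>(a);
    edited.put("hello-world.cc", bytes("Hello world!"));

    System.out.println("reordered matches: " + Arrays.equals(combined(a), combined(reordered)));
    System.out.println("edited matches: " + Arrays.equals(combined(a), combined(edited)));
  }
}
```

Reordering the entries leaves the combined digest unchanged, while editing the content of a single file changes it.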

The file digest is calculated when evaluating the source file hello-world.cc, with a stack trace like the one below:


getDigest java.base/java.lang.Thread.getStackTrace(Unknown Source)
getDigest com.google.devtools.build.lib.vfs.FileSystem.getDigest(FileSystem.java:320)
getDigest com.google.devtools.build.lib.unix.UnixFileSystem.getDigest(UnixFileSystem.java:415)
getDigest com.google.devtools.build.lib.vfs.Path.getDigest(Path.java:775)
getDigest com.google.devtools.build.lib.actions.cache.DigestUtils.getDigestInternal(DigestUtils.java:163)
getDigest com.google.devtools.build.lib.actions.cache.DigestUtils.getDigestOrFail(DigestUtils.java:252)
getDigest com.google.devtools.build.lib.actions.FileArtifactValue.create(FileArtifactValue.java:240)
getDigest com.google.devtools.build.lib.actions.FileArtifactValue.createForSourceArtifact(FileArtifactValue.java:184)
getDigest com.google.devtools.build.lib.skyframe.ArtifactFunction.createSourceValue(ArtifactFunction.java:279)
getDigest com.google.devtools.build.lib.skyframe.ArtifactFunction.compute(ArtifactFunction.java:89)
getDigest com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:445)
getDigest com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:399)
getDigest java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
getDigest java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
getDigest java.base/java.lang.Thread.run(Unknown Source)

The digest itself is computed by getDigest() below:


  /**
   * Returns the digest of the file denoted by the path, following symbolic links.
   *
   * <p>Subclasses may (and do) optimize this computation for a particular digest functions.
   *
   * @return a new byte array containing the file's digest
   * @throws IOException if the digest could not be computed for any reason
   */
  protected byte[] getDigest(final Path path) throws IOException {
    return new ByteSource() {
      @Override
      public InputStream openStream() throws IOException {
        return getInputStream(path);
      }
    }.hash(digestFunction.getHashFunction()).asBytes();
  }

Currently two hash functions are registered as digest functions:


  public static final DigestHashFunction SHA1 = register(Hashing.sha1(), "SHA-1", "SHA1");
  public static final DigestHashFunction SHA256 = register(Hashing.sha256(), "SHA-256", "SHA256");

This is why Bazel is able to detect whether the file content has actually changed, and why “touching” a file cannot trigger a rebuild.
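The whole effect can be imitated in a few lines of Java. This is a minimal sketch under my own assumptions, not Bazel's ActionCache: TouchDigestSketch, contentDigest and this simplified mustExecute are hypothetical, and the “cache” is just a map from an action name to the last content digest seen.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class TouchDigestSketch {
  // Toy action cache: action name -> digest of the input seen when it last ran.
  private final Map<String, byte[]> cache = new HashMap<>();

  static byte[] contentDigest(Path p) throws IOException {
    try {
      return MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(p));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Returns true when the action has to run; records the digest as if it then ran.
  boolean mustExecute(String actionName, byte[] inputDigest) {
    byte[] cached = cache.get(actionName);
    if (cached != null && Arrays.equals(cached, inputDigest)) {
      return false; // cache hit: same content as last time
    }
    cache.put(actionName, inputDigest);
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path f = Files.createTempFile("hello-world", ".cc");
    Files.write(f, "Hello world\n".getBytes(StandardCharsets.UTF_8));
    TouchDigestSketch actionCache = new TouchDigestSketch();

    System.out.println("first build runs: " + actionCache.mustExecute("compile", contentDigest(f)));

    // Equivalent of touch: only the timestamp changes, the bytes stay the same.
    Files.setLastModifiedTime(f, FileTime.fromMillis(System.currentTimeMillis() + 60_000));
    System.out.println("after touch runs: " + actionCache.mustExecute("compile", contentDigest(f)));

    Files.write(f, "Hello world!\n".getBytes(StandardCharsets.UTF_8));
    System.out.println("after edit runs: " + actionCache.mustExecute("compile", contentDigest(f)));
  }
}
```

The touch changes the timestamp but not the bytes, so the digest still matches the cached one and the action is skipped; only the real edit produces a new digest.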

About codywu2010

a programmer