[clang module] Current Working Directory Pruning #124786
@llvm/pr-subscribers-clang
Author: Qiongsi Wu (qiongsiwu)
Full diff: https://github.com/llvm/llvm-project/pull/124786.diff
5 Files Affected:
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
index 4a343f2872d8d9..9ad8e68c33eb10 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
@@ -63,7 +63,10 @@ enum class ScanningOptimizations {
/// Canonicalize -D and -U options.
Macros = 8,
- DSS_LAST_BITMASK_ENUM(Macros),
+ /// Ignore the compiler's working directory if it is safe.
+ IgnoreCWD = 0x10,
+
+ DSS_LAST_BITMASK_ENUM(IgnoreCWD),
Default = All
};
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index 2e97cac0796cee..714efb86fa3796 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -397,9 +397,92 @@ void ModuleDepCollector::applyDiscoveredDependencies(CompilerInvocation &CI) {
}
}
+static bool isSafeToIgnoreCWD(const CowCompilerInvocation &CI) {
+ // Check if the command line input uses relative paths.
+ // It is not safe to ignore the current working directory if any of the
+ // command line inputs use relative paths.
+#define IF_RELATIVE_RETURN_FALSE(PATH) \
+ do { \
+ if (!PATH.empty() && !llvm::sys::path::is_absolute(PATH)) \
+ return false; \
+ } while (0)
+
+#define IF_ANY_RELATIVE_RETURN_FALSE(PATHS) \
+ do { \
+ if (std::any_of(PATHS.begin(), PATHS.end(), [](const auto &P) { \
+ return !P.empty() && !llvm::sys::path::is_absolute(P); \
+ })) \
+ return false; \
+ } while (0)
+
+ // Header search paths.
+ const auto &HeaderSearchOpts = CI.getHeaderSearchOpts();
+ IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.Sysroot);
+ for (auto &Entry : HeaderSearchOpts.UserEntries)
+ if (Entry.IgnoreSysRoot)
+ IF_RELATIVE_RETURN_FALSE(Entry.Path);
+ IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.ResourceDir);
+ IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.ModuleCachePath);
+ IF_RELATIVE_RETURN_FALSE(HeaderSearchOpts.ModuleUserBuildPath);
+ for (auto I = HeaderSearchOpts.PrebuiltModuleFiles.begin(),
+ E = HeaderSearchOpts.PrebuiltModuleFiles.end();
+ I != E;) {
+ auto Current = I++;
+ IF_RELATIVE_RETURN_FALSE(Current->second);
+ }
+ IF_ANY_RELATIVE_RETURN_FALSE(HeaderSearchOpts.PrebuiltModulePaths);
+ IF_ANY_RELATIVE_RETURN_FALSE(HeaderSearchOpts.VFSOverlayFiles);
+
+ // Preprocessor options.
+ const auto &PPOpts = CI.getPreprocessorOpts();
+ IF_ANY_RELATIVE_RETURN_FALSE(PPOpts.MacroIncludes);
+ IF_ANY_RELATIVE_RETURN_FALSE(PPOpts.Includes);
+ IF_RELATIVE_RETURN_FALSE(PPOpts.ImplicitPCHInclude);
+
+ // Frontend options.
+ const auto &FrontendOpts = CI.getFrontendOpts();
+ for (const FrontendInputFile &Input : FrontendOpts.Inputs) {
+ if (Input.isBuffer())
+ continue; // FIXME: Can this happen when parsing command-line?
+
+ IF_RELATIVE_RETURN_FALSE(Input.getFile());
+ }
+ IF_RELATIVE_RETURN_FALSE(FrontendOpts.CodeCompletionAt.FileName);
+ IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ModuleMapFiles);
+ IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ModuleFiles);
+ IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ModulesEmbedFiles);
+ IF_ANY_RELATIVE_RETURN_FALSE(FrontendOpts.ASTMergeFiles);
+ IF_RELATIVE_RETURN_FALSE(FrontendOpts.OverrideRecordLayoutsFile);
+ IF_RELATIVE_RETURN_FALSE(FrontendOpts.StatsFile);
+
+ // Filesystem options.
+ const auto &FileSystemOpts = CI.getFileSystemOpts();
+ IF_RELATIVE_RETURN_FALSE(FileSystemOpts.WorkingDir);
+
+ // Codegen options.
+ const auto &CodeGenOpts = CI.getCodeGenOpts();
+ IF_RELATIVE_RETURN_FALSE(CodeGenOpts.DebugCompilationDir);
+ IF_RELATIVE_RETURN_FALSE(CodeGenOpts.CoverageCompilationDir);
+
+ // Sanitizer options.
+ IF_ANY_RELATIVE_RETURN_FALSE(CI.getLangOpts().NoSanitizeFiles);
+
+ // Coverage mappings.
+ IF_RELATIVE_RETURN_FALSE(CodeGenOpts.ProfileInstrumentUsePath);
+ IF_RELATIVE_RETURN_FALSE(CodeGenOpts.SampleProfileFile);
+ IF_RELATIVE_RETURN_FALSE(CodeGenOpts.ProfileRemappingFile);
+
+ // Dependency output options.
+ for (auto &ExtraDep : CI.getDependencyOutputOpts().ExtraDeps)
+ IF_RELATIVE_RETURN_FALSE(ExtraDep.first);
+
+ return true;
+}
+
static std::string getModuleContextHash(const ModuleDeps &MD,
const CowCompilerInvocation &CI,
bool EagerLoadModules,
+ bool IgnoreCWD,
llvm::vfs::FileSystem &VFS) {
llvm::HashBuilder<llvm::TruncatedBLAKE3<16>, llvm::endianness::native>
HashBuilder;
@@ -410,7 +493,7 @@ static std::string getModuleContextHash(const ModuleDeps &MD,
HashBuilder.add(getClangFullRepositoryVersion());
HashBuilder.add(serialization::VERSION_MAJOR, serialization::VERSION_MINOR);
llvm::ErrorOr<std::string> CWD = VFS.getCurrentWorkingDirectory();
- if (CWD)
+ if (CWD && !IgnoreCWD)
HashBuilder.add(*CWD);
// Hash the BuildInvocation without any input files.
@@ -443,8 +526,11 @@ static std::string getModuleContextHash(const ModuleDeps &MD,
void ModuleDepCollector::associateWithContextHash(
const CowCompilerInvocation &CI, ModuleDeps &Deps) {
- Deps.ID.ContextHash = getModuleContextHash(
- Deps, CI, EagerLoadModules, ScanInstance.getVirtualFileSystem());
+ bool IgnoreCWD = any(OptimizeArgs & ScanningOptimizations::IgnoreCWD) &&
+ isSafeToIgnoreCWD(CI);
+ Deps.ID.ContextHash =
+ getModuleContextHash(Deps, CI, EagerLoadModules, IgnoreCWD,
+ ScanInstance.getVirtualFileSystem());
bool Inserted = ModuleDepsByID.insert({Deps.ID, &Deps}).second;
(void)Inserted;
assert(Inserted && "duplicate module mapping");
diff --git a/clang/test/ClangScanDeps/modules-context-hash-cwd.c b/clang/test/ClangScanDeps/modules-context-hash-cwd.c
new file mode 100644
index 00000000000000..45be72301c635d
--- /dev/null
+++ b/clang/test/ClangScanDeps/modules-context-hash-cwd.c
@@ -0,0 +1,123 @@
+// Test current directory pruning when computing the context hash.
+
+// REQUIRES: shell
+
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb0.json.in > %t/cdb0.json
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb1.json.in > %t/cdb1.json
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb2.json.in > %t/cdb2.json
+// RUN: clang-scan-deps -compilation-database %t/cdb0.json -format experimental-full > %t/result0.json
+// RUN: clang-scan-deps -compilation-database %t/cdb1.json -format experimental-full > %t/result1.json
+// RUN: clang-scan-deps -compilation-database %t/cdb2.json -format experimental-full -optimize-args=header-search,system-warnings,vfs,canonicalize-macros > %t/result2.json
+// RUN: cat %t/result0.json %t/result1.json | FileCheck %s
+// RUN: cat %t/result0.json %t/result2.json | FileCheck %s -check-prefix=SKIPOPT
+
+//--- cdb0.json.in
+[{
+ "directory": "DIR",
+ "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/include/ -o DIR/tu.o",
+ "file": "DIR/tu.c"
+}]
+
+//--- cdb1.json.in
+[{
+ "directory": "DIR/a",
+ "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/include/ -o DIR/tu.o",
+ "file": "DIR/tu.c"
+}]
+
+//--- cdb2.json.in
+[{
+ "directory": "DIR/a/",
+ "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -IDIR/include/ -o DIR/tu.o",
+ "file": "DIR/tu.c"
+}]
+
+//--- include/module.modulemap
+module mod {
+ header "mod.h"
+}
+
+//--- include/mod.h
+
+//--- tu.c
+#include "mod.h"
+
+// Check that result0 and result1 compute the same hash with optimization
+// on. The only difference between result0 and result1 is the compiler's
+// working directory.
+// CHECK: {
+// CHECK-NEXT: "modules": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "clang-module-deps": [],
+// CHECK: "context-hash": "[[HASH:.*]]",
+// CHECK: }
+// CHECK: "translation-units": [
+// CHECK: {
+// CHECK: "commands": [
+// CHECK: {
+// CHECK-NEXT: "clang-context-hash": "{{.*}}",
+// CHECK-NEXT: "clang-module-deps": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "context-hash": "[[HASH]]",
+// CHECK-NEXT: "module-name": "mod"
+// CHECK: }
+// CHECK: ],
+// CHECK: {
+// CHECK-NEXT: "modules": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "clang-module-deps": [],
+// CHECK: "context-hash": "[[HASH]]",
+// CHECK: }
+// CHECK: "translation-units": [
+// CHECK: {
+// CHECK: "commands": [
+// CHECK: {
+// CHECK-NEXT: "clang-context-hash": "{{.*}}",
+// CHECK-NEXT: "clang-module-deps": [
+// CHECK-NEXT: {
+// CHECK-NEXT: "context-hash": "[[HASH]]",
+// CHECK-NEXT: "module-name": "mod"
+// CHECK: }
+// CHECK: ],
+
+// Check that result0 and result2 compute different hashes because
+// the working directory optimization is turned off for result2.
+// SKIPOPT: {
+// SKIPOPT-NEXT: "modules": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NEXT: "clang-module-deps": [],
+// SKIPOPT: "context-hash": "[[HASH0:.*]]",
+// SKIPOPT: }
+// SKIPOPT: "translation-units": [
+// SKIPOPT: {
+// SKIPOPT: "commands": [
+// SKIPOPT: {
+// SKIPOPT-NEXT: "clang-context-hash": "{{.*}}",
+// SKIPOPT-NEXT: "clang-module-deps": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NEXT: "context-hash": "[[HASH0]]",
+// SKIPOPT-NEXT: "module-name": "mod"
+// SKIPOPT: }
+// SKIPOPT: ],
+// SKIPOPT: {
+// SKIPOPT-NEXT: "modules": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NEXT: "clang-module-deps": [],
+// SKIPOPT-NOT: "context-hash": "[[HASH0]]",
+// SKIPOPT: "context-hash": "[[HASH2:.*]]",
+// SKIPOPT: }
+// SKIPOPT: "translation-units": [
+// SKIPOPT: {
+// SKIPOPT: "commands": [
+// SKIPOPT: {
+// SKIPOPT-NEXT: "clang-context-hash": "{{.*}}",
+// SKIPOPT-NEXT: "clang-module-deps": [
+// SKIPOPT-NEXT: {
+// SKIPOPT-NOT: "context-hash": "[[HASH0]]",
+// SKIPOPT-NEXT: "context-hash": "[[HASH2]]"
+// SKIPOPT-NEXT: "module-name": "mod"
+// SKIPOPT: }
+// SKIPOPT: ],
+
diff --git a/clang/test/ClangScanDeps/working-dir.m b/clang/test/ClangScanDeps/working-dir.m
index a04f8c2486b98d..c6b7b1988d3cf7 100644
--- a/clang/test/ClangScanDeps/working-dir.m
+++ b/clang/test/ClangScanDeps/working-dir.m
@@ -2,7 +2,7 @@
// RUN: split-file %s %t
// RUN: sed -e "s|DIR|%/t|g" %t/build/compile-commands.json.in > %t/build/compile-commands.json
// RUN: clang-scan-deps -compilation-database %t/build/compile-commands.json \
-// RUN: -j 1 -format experimental-full --optimize-args=all > %t/deps.db
+// RUN: -j 1 -format experimental-full --optimize-args=header-search,system-warnings,vfs,canonicalize-macros > %t/deps.db
// RUN: cat %t/deps.db | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t
// Check that there are two separate modules hashes. One for each working dir.
diff --git a/clang/tools/clang-scan-deps/ClangScanDeps.cpp b/clang/tools/clang-scan-deps/ClangScanDeps.cpp
index 709dc513be2811..8d429534a20073 100644
--- a/clang/tools/clang-scan-deps/ClangScanDeps.cpp
+++ b/clang/tools/clang-scan-deps/ClangScanDeps.cpp
@@ -164,6 +164,8 @@ static void ParseArgs(int argc, char **argv) {
.Case("system-warnings", ScanningOptimizations::SystemWarnings)
.Case("vfs", ScanningOptimizations::VFS)
.Case("canonicalize-macros", ScanningOptimizations::Macros)
+ .Case("ignore-current-working-dir",
+ ScanningOptimizations::IgnoreCWD)
.Case("all", ScanningOptimizations::All)
.Default(std::nullopt);
if (!Optimization) {
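The gating in the diff above can be illustrated with a small standalone sketch. This is not the code from the PR: `ScanOpt`, `hasOpt`, `contextHashSketch`, and `hashFor` are hypothetical names, `std::hash` stands in for the BLAKE3-based `HashBuilder`, and only two of the enumerators are reproduced. It shows that the working directory drops out of the hash only when the `IgnoreCWD` bit is set and the safety check passes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical mirror of two bit values from the diff; other enumerators omitted.
enum class ScanOpt : std::uint8_t { None = 0, Macros = 8, IgnoreCWD = 0x10 };

// True if Flag is set in Set (plays the role of `any(OptimizeArgs & Flag)`).
inline bool hasOpt(ScanOpt Set, ScanOpt Flag) {
  return (static_cast<std::uint8_t>(Set) & static_cast<std::uint8_t>(Flag)) != 0;
}

// Toy context hash: mixes in the working directory unless told to ignore it.
inline std::size_t contextHashSketch(const std::string &Invocation,
                                     const std::string &CWD, bool IgnoreCWD) {
  std::size_t H = std::hash<std::string>{}(Invocation);
  if (!IgnoreCWD)
    H ^= std::hash<std::string>{}(CWD) + 0x9e3779b9u + (H << 6) + (H >> 2);
  return H;
}

// The CWD is ignored only if the optimization is enabled AND it is safe.
inline std::size_t hashFor(ScanOpt Enabled, bool SafeToIgnore,
                           const std::string &Invocation,
                           const std::string &CWD) {
  bool IgnoreCWD = hasOpt(Enabled, ScanOpt::IgnoreCWD) && SafeToIgnore;
  return contextHashSketch(Invocation, CWD, IgnoreCWD);
}
```

With the bit set and a safe invocation, two different working directories yield the same toy hash. Clearing the bit, or failing the safety check, makes the hashes diverge again, which is the behaviour the `CHECK` and `SKIPOPT` prefixes in the new test look for.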
✅ With the latest revision this PR passed the C/C++ code formatter.
This looks like the right approach, but I think it would be good to have a test for the relative path checking. Not every option needs a test, just one should be fine.
Would it be better for this optimization to happen very early in the process, since you only visit the options in CI? In that case, you could just reset the CurrentWorkingDirectory in the VFS so that all searching is done without a CWD. This avoids hard-to-debug issues if some option is not handled (it needs the CWD but is not checked); the trade-off is more explicit errors during scanning.
It has to happen after the header search optimization, in case that optimization removes relative header search paths.
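The ordering point can be made concrete with a minimal sketch. The helpers below (`isRelative`, `pruneUnused`, `allAbsolute`) are hypothetical and not part of clang. The sketch assumes the header search optimization drops search paths that were never used, and it treats a path as absolute only if it starts with `/` (POSIX-style, unlike `llvm::sys::path::is_absolute`).

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified relative-path test: non-empty and not starting with '/'.
inline bool isRelative(const std::string &P) {
  return !P.empty() && P.front() != '/';
}

// Hypothetical pruning step: keep only the search paths that were used.
inline std::vector<std::string>
pruneUnused(const std::vector<std::string> &Paths,
            const std::set<std::string> &Used) {
  std::vector<std::string> Kept;
  for (const std::string &P : Paths)
    if (Used.count(P))
      Kept.push_back(P);
  return Kept;
}

// The CWD check passes only if no remaining path is relative.
inline bool allAbsolute(const std::vector<std::string> &Paths) {
  return std::none_of(Paths.begin(), Paths.end(), isRelative);
}
```

If the check ran before pruning, an unused relative search path would block the optimization even though it cannot affect the module. Running it after pruning lets such an invocation qualify.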
…ugh -working-directory.
Gentle ping for review. Thanks!
Looks good, thanks!
LGTM with some small style comments.
When computing the context hash, `clang` always includes the compiler's working directory. As a result, when the only difference between two compilations is the working directory, different module variants are generated. These variants are redundant. This PR implements an optimization that ignores the working directory when computing the context hash, when it is safe to do so. Specifically, `clang` checks whether it is safe to ignore the working directory in `isSafeToIgnoreCWD`. The check goes through the compile command options to see whether any of the specified paths are relative. A path counts as relative here if it is not empty and `llvm::sys::path::is_absolute` returns false for it. If none of the examined paths is relative, `clang` considers it safe to ignore the current working directory and leaves it out of the context hash. (cherry picked from commit 54acda2)
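As a rough illustration of the relative-path rule described above, here is a simplified standalone sketch. It is not the PR's implementation: `InvocationPaths`, `isRelativePath`, and `isSafeToIgnoreCWDSketch` are hypothetical names, only three option fields are modelled, and a leading `/` stands in for `llvm::sys::path::is_absolute` (so it only handles POSIX-style paths).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a small subset of the options the check inspects.
struct InvocationPaths {
  std::string Sysroot;
  std::string WorkingDir;
  std::vector<std::string> Includes;
};

// "Relative" in the PR's sense: the path is non-empty and not absolute.
// Assumption: absolute means "starts with '/'" (POSIX-style only).
inline bool isRelativePath(const std::string &Path) {
  return !Path.empty() && Path.front() != '/';
}

// Safe only if none of the inspected paths is relative.
inline bool isSafeToIgnoreCWDSketch(const InvocationPaths &Opts) {
  if (isRelativePath(Opts.Sysroot) || isRelativePath(Opts.WorkingDir))
    return false;
  for (const std::string &P : Opts.Includes)
    if (isRelativePath(P))
      return false;
  return true;
}
```

Note that an empty path does not count as relative, so unset options never block the optimization. A single relative entry anywhere is enough to keep the working directory in the hash.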