VCS resolving nested provenances fails for monorepo packages when using WorkingTreeCache #10996

@sasa-boros-cp

Describe the bug

ORT scanner fails to resolve nested provenances for packages in monorepos (e.g., Babel) with git checkout errors when WorkingTreeCache reuses the same cached repository clone for multiple packages.

Root Cause:

WorkingTreeCache reuses the same cached repository clone for multiple packages from the same monorepo (the cache key is type|url, excluding path and revision), but does not clean the working tree between git checkout operations.
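The key collision described above can be illustrated with a minimal sketch. This is not ORT's actual implementation: `VcsInfo`, `cacheKey`, the repository URL, the package paths, and the second revision are hypothetical placeholders, and only the type|url key shape is taken from the description.

```kotlin
// Hypothetical stand-in for the VCS coordinates of a package; not an ORT class.
data class VcsInfo(val type: String, val url: String, val revision: String, val path: String)

// Assumed key shape from the description above: path and revision are NOT part of the key.
fun cacheKey(vcs: VcsInfo): String = "${vcs.type}|${vcs.url}"

fun main() {
    // Illustrative values only; URL, paths, and the second revision are placeholders.
    val first = VcsInfo("Git", "https://example.com/babel.git", "75767d87cb147709b9bd9b99bf44daa6688874a9", "packages/a")
    val second = VcsInfo("Git", "https://example.com/babel.git", "<another revision>", "packages/b")

    // A toy cache that maps a key to a clone directory.
    val cache = mutableMapOf<String, String>()
    val dirA = cache.getOrPut(cacheKey(first)) { "/tmp/clone-${cache.size}" }
    val dirB = cache.getOrPut(cacheKey(second)) { "/tmp/clone-${cache.size}" }

    // Both packages resolve to the same clone, so whatever state the first
    // checkout leaves behind is what the second checkout has to deal with.
    println(dirA == dirB)
}
```

With a key like this, two packages that differ only in revision or path always share one clone. Any cleanup therefore has to happen per checkout rather than per cache entry.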

To Reproduce

This is not easy to reproduce, because it requires multiple runs starting from a clear cache:

  1. Run ORT analyze on a repo that contains multiple packages from the same monorepo (e.g., Babel)
  2. Run ORT scan
  3. Run ORT scan again
  4. "Could not resolve nested provenance for package" errors appear in the logs and in scan-result.yml for the monorepo packages
  5. The provenances are not included in the final report

Expected behavior

Scan completes without issue for monorepo packages and they are therefore included in the final reports.

Console / log output

Example error message for a Babel package:

  Could not resolve nested provenance for package 'NPM:@babel:runtime:7.7.4':
  IOException: Running 'git checkout 75767d87cb147709b9bd9b99bf44daa6688874a9' in
  '/tmp/ort-DefaultWorkingTreeCache10381812055788114988' failed with exit code 1:
  error: Your local changes to the following files would be overwritten by checkout:
    packages/babel-plugin-transform-async-to-generator/test/fixtures/async-to-generator/async-arrow-in-method/actual.js
    packages/babel-plugin-transform-async-to-generator/test/fixtures/async-to-generator/object-method-with-arrows/actual.js
    packages/babel-plugin-transform-async-to-generator/test/fixtures/async-to-generator/object-method-with-arrows/expected.js
  Please commit your changes or stash them before you switch branches.
  Aborting.

Environment

  • ORT version: 70.1 (LTS)
  • Java version: 21
  • OS: Linux (Docker)

Additional context

  • Scanner fails to process packages from monorepos
  • Errors occur intermittently depending on package processing order

I already have a proposed solution for the issue which tries to clean the working copy before checkout. Looking forward to discussing it further:

  1. Add cleanWorkingTree() helper method in Git.kt:
  private fun cleanWorkingTree(workingTree: GitWorkingTree) {
      runCatching {
          workingTree.runGit("reset", "--hard", "HEAD")
          workingTree.runGit("clean", "-fd")
      }.onFailure {
          logger.warn { "Failed to clean working tree: ${it.collectMessages()}" }
      }
  }
  2. Call cleanup before checkout at line 273:
  }.mapCatching { fetchResult ->
      // Clean the working tree before checkout to avoid conflicts
      cleanWorkingTree(workingTree)

      workingTree.runGit("checkout", revision)
      // ... rest of existing code

The working copy ends up modified because a submodule update (git submodule update --init --recursive) is executed before the nested provenance is resolved.
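The failure mode and the effect of the proposed cleanup can be checked in isolation. This is a sketch that assumes a `git` executable on the PATH. It does not use ORT's `GitWorkingTree` or `runGit`: the `git` and `checkoutExitCodes` helpers and the file names are hypothetical, and the dirty state is simulated with a plain file edit rather than a submodule update.

```kotlin
import java.io.File
import java.nio.file.Files

// Hypothetical helper: runs git in `dir`, returns exit code and combined stdout/stderr.
fun git(dir: File, vararg args: String): Pair<Int, String> {
    val process = ProcessBuilder(listOf("git") + args)
        .directory(dir)
        .redirectErrorStream(true)
        .start()
    val output = process.inputStream.bufferedReader().readText()
    return process.waitFor() to output
}

// Stages everything and commits with a throwaway identity.
fun commit(dir: File, message: String) {
    git(dir, "add", "-A")
    git(
        dir, "-c", "user.name=demo", "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false", "commit", "-q", "-m", message
    )
}

// Returns the checkout exit codes (on a dirty tree, after cleaning).
fun checkoutExitCodes(): Pair<Int, Int> {
    val dir = Files.createTempDirectory("dirty-checkout-demo").toFile()
    val tracked = File(dir, "actual.js")

    git(dir, "init", "-q")
    tracked.writeText("first version\n")
    commit(dir, "first")
    val firstRevision = git(dir, "rev-parse", "HEAD").second.trim()
    tracked.writeText("second version, longer\n")
    commit(dir, "second")

    // Simulate leftovers from an earlier operation on the shared clone.
    tracked.writeText("uncommitted local change\n")
    File(dir, "untracked.txt").writeText("leftover\n")

    // Expected to be refused: the local change would be overwritten.
    val dirty = git(dir, "checkout", "-q", firstRevision).first

    // The cleanup proposed in this issue, then the same checkout again.
    git(dir, "reset", "--hard", "HEAD")
    git(dir, "clean", "-fd")
    val clean = git(dir, "checkout", "-q", firstRevision).first

    dir.deleteRecursively()
    return dirty to clean
}

fun main() {
    val (dirty, clean) = checkoutExitCodes()
    println("checkout on dirty tree: exit code $dirty")
    println("checkout after reset + clean: exit code $clean")
}
```

The first checkout should be refused with a non-zero exit code, and the second should succeed once the tree has been reset and cleaned.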
