Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dirty-water report of Spoon: analysis and mitigation #37

Open
monperrus opened this issue Nov 7, 2024 · 12 comments
Open

dirty-water report of Spoon: analysis and mitigation #37

monperrus opened this issue Nov 7, 2024 · 12 comments

Comments

@monperrus
Copy link
Collaborator

thanks a lot @randomicecube for preparing the dirty-water report of Spoon
https://gist.github.com/monperrus/34663084981de3c56f3120f932e0a4b7

let's fix everything

where do the 4 404 github_url come from? (extracted? computed)

index package_name github_url github_exists
1 org.slf4j:[email protected] https://github.com/qos-ch/slf4j/slf4j-parent/slf4j-api False
2 org.assertj:[email protected] https://github.com/assertj/assertj/assertj-parent/assertj-core False
3 com.google.guava:[email protected] https://github.com/google/guava/guava False
4 ch.qos.logback:[email protected] https://github.com/qos-ch/logback/logback-classic False
@randomicecube
Copy link
Collaborator

randomicecube commented Nov 7, 2024

@monperrus the URLs are computed from each dependency, using the following command:

mvn help:evaluate -Dexpression=project.scm.url -Dartifact={group_id}:{artifact_id}:{version} -q -DforceStdout

I believe that when researching, I got the idea to use project.scm.url from here: https://maven.apache.org/plugins/maven-project-info-reports-plugin/scm-mojo.html

As an example, if you have this call for the first package in this table, you get the following interaction:

❯ mvn help:evaluate -Dexpression=project.scm.url -Dartifact=org.slf4j:slf4j-api:2.0.16 -q -DforceStdout
https://github.com/qos-ch/slf4j/slf4j-parent/slf4j-api

You can also see that in maven's central repository for this dependency, the source control URL on the tab at the bottom right of the page leads to a URL which doesn't exist: https://central.sonatype.com/artifact/org.slf4j/slf4j-api/2.0.16

I'm not quite sure how this URL is actually acquired, since the pom for this dependency doesn't actually have an SCM tag, only its supermodule does (and that one has a correct URL) -- perhaps this is dynamically acquired from the supermodule? And if they (hypothetically) just append the project's name to the URL, it's bound for these errors to happen, I'd say.

Overall, it seems like this issue is probably related with maintainers not writing the SCM tag for submodules of a supermodule. What do you think?

@monperrus
Copy link
Collaborator Author

Interesting. There are probably two mitigations:

  1. PR those projects to add project.scm.url.
  2. We probably found a bug Maven itself on the dynamic generation of project.scm.url. We can PR Maven itself to fix it.

Option 1 is the easiest, could you give it a try?

Thanks!

@randomicecube
Copy link
Collaborator

@monperrus regarding the second point, I found a thread which might be useful for us here: https://stackoverflow.com/questions/50321827/which-urls-does-maven-resolve-with-a-subpath-in-child-poms

The URL to the project's browsable SCM repository, such as ViewVC or Fisheye.
Default value is: parent value [+ path adjustment] + artifactId
So, the default value for all of these elements includes both path adjustment and artifactId.

So essentially, this default value is the root cause here, since the SCM tag is not defined in some child modules, as with slf4j-api. Being a default value, it might even be the case that this is intended behavior (?) but I could investigate further.
Regardless, I'll try to open PRs for these projects soon. I'll be away from my computer until Monday morning, I'll come back to this by then.

@monperrus
Copy link
Collaborator Author

I'll try to open PRs for these projects soon.

great, thanks @randomicecube

@monperrus
Copy link
Collaborator Author

monperrus commented Nov 8, 2024

citing the Stackoverflow post:

As it turns out, project.directory overrides artifactId for all three URLs. This is unexpected and is not covered in the documentation.

This is exactly the bug we're talking about. It would be great to firt report it and then fix it.

@monperrus
Copy link
Collaborator Author

@MartinWitt suggests that there is a bug because we have way more than 17 dependencies in the supply chain.

@randomicecube could you have a look? thanks!

@randomicecube
Copy link
Collaborator

@MartinWitt are you sure? I'm saying this because looking strictly at spoon's pom.xml file, it looks like every dependency is being correctly parsed. Each sub-project (e.g., spoon-visualization) has a further pom.xml, with more dependencies, of course, but those aren't to be directly parsed here, I think.

One think which would be fair, though, is that I don't think the dependencies would be, for now, correctly parsed for the child projects -- that is, for spoon-visualization, for instance, we'd parse that child project's pom.xml, but the parent's dependencies would be ignored (and they shouldn't, I think).

Do you agree?

cc @monperrus

@I-Al-Istannen
Copy link

Spoon has 17 direct dependencies, but also inherits some from its parent. Even if your tool only analyzes direct dependencies, it should be 23:

❯ mvn dependency:tree -DoutputType=json -DoutputFile=/dev/stdout | grep -v "\[INFO" | jq '.children | length'
23

But the first line in your readme states (emphasis mine)

Dirty-waters automatically finds software supply chain issues in software projects by analyzing the available metadata of all dependencies, transitively

If you include those it gets closer to 45:

❯ mvn dependency:tree -DoutputType=json -DoutputFile=/dev/stdout | grep -v "\[INFO" | jq '[.children | .. | {groupId, artifactId}?] | length'
45

And then you also have omitted dependencies:

❯ mvn dependency:tree -Dverbose -DoutputType=json -DoutputFile=/dev/stdout | grep -v "\[INFO" | jq '[.children | .. | {version, groupId, artifactId}?] | unique | length'
53

@randomicecube
Copy link
Collaborator

Thanks for the detailed feedback, I'll take a look!

@I-Al-Istannen
Copy link

I-Al-Istannen commented Nov 12, 2024

Martin Witt also rightly complained that this does not include plugins and their transitive dependencies. I omitted those as there isn't a good and easy way to fetch them. But this should come close (though it overcounts because it includes conflicts):

❯ mvn dependency:resolve-plugins | sed -n '/The following plugins/,$p' | tail +2 | head -n -8 | sed 's/\[INFO\] *//' | sort | uniq | wc -l
311

Quite a few more, but not all of them necessarily have much input in the final build.

randomicecube added a commit that referenced this issue Nov 20, 2024
TODO: missing plugin support, as referenced in #37 (comment)
@monperrus
Copy link
Collaborator Author

improved report at https://gist.github.com/monperrus/34663084981de3c56f3120f932e0a4b7

now 352 dependencies in the Spoon supply chain

@algomaster99
Copy link
Member

algomaster99 commented Nov 27, 2024

I did not follow all the conversation but at least I got "I'm not quite sure how this URL is actually acquired, since the pom for this dependency doesn't actually have an SCM tag, only its supermodule does (and that one has a correct URL)"

See Inheritance Assembly. In short, the child module appends the parent's URLs with artifact ID. If parent-module is org.example:parent and URL is https://github.com/org/example, the child module would be org.example:child and URL https://github.com/org/example/child. Whether artifact ID is appended or not can be controlled using the property child.scm.url.inherit.append.path.

Trivia: This inheritance algorithm worked well for Apache Subversion based repositories but not anymore because majority are hosted on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants