Investigate regression cases found in integration tests after updating ScanCode #1183

Open
qtomlinson opened this issue Aug 23, 2024 · 5 comments

Comments

@qtomlinson
Collaborator

qtomlinson commented Aug 23, 2024

This comes from the discussion on the PR to integrate the new ScanCode, specifically about the license differences seen in integration tests before and after integrating ScanCode v32.

  1. nuget/nuget/-/NuGet.Protocol/6.7.1. See the root-cause discussion at Add new summarizer for recent ScanCode versions #1056 (comment).
    • expected: {"path":"clearlydefined/downloaded/LICENSE","license":"Apache-2.0", ...}
    • actual: {"path":"clearlydefined/downloaded/LICENSE","license":"Apache-2.0 AND (ECL-2.0 AND Apache-2.0)", ...}
    • clearlydefined/downloaded/LICENSE is the license text obtained from https://licenses.nuget.org/Apache-2.0 (the licenseUrl in the component manifest).
  2. pypi/pypi/-/sdbus/0.12.0. The root cause still needs to be investigated and fixed.
    • expected: declared: 'GPL-2.0 AND LGPL-2.0-or-later AND LGPL-2.1-or-later'
    • actual: declared: 'GPL-1.0-or-later AND GPL-2.0 AND LGPL-2.0-or-later AND LGPL-2.1-only AND LGPL-2.1-or-later AND Python-2.0'
@qtomlinson
Collaborator Author

@elrayle @yashkohli88

@qtomlinson changed the title from "Investigate regression cases found in integration test after integrating v32 ScanCode" to "Investigate regression cases found in integration tests after updating ScanCode" on Aug 23, 2024
@yashkohli88
Contributor

I implemented a change to resolve the license issue for the NuGet.Protocol coordinate: the code now considers only matches with a score greater than 80%. However, this caused other components to fail, as listed below.

  1. pypi/pypi/-/platformdirs/4.2.0 -
    ScanCode reports LicenseRef-scancode-unknown-license-reference in the PKG-INFO file with a score of 100. This adds
    LicenseRef-scancode-unknown-license-reference to the list of discovered licenses.

ScanCode result:

{
  "license_expression": "unknown-license-reference",
  "license_expression_spdx": "LicenseRef-scancode-unknown-license-reference",
  "from_file": "cd-aYG6pL/platformdirs-4.2.0/PKG-INFO",
  "start_line": 11,
  "end_line": 11,
  "matcher": "2-aho",
  "score": 100,
  "matched_length": 3,
  "match_coverage": 100,
  "rule_relevance": 100,
  "rule_identifier": "unknown-license-reference_see_license_at_manifest_1.RULE",
  "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/unknown-license-reference_see_license_at_manifest_1.RULE",
  "matched_text": "License-File: LICENSE",
  "matched_text_diagnostics": "License-File: LICENSE"
}

Below is the licensed section of the definition produced by the changed code.

"licensed": {
        "declared": "MIT",
        "toolScore": {
            "total": 45,
            "declared": 30,
            "discovered": 0,
            "consistency": 0,
            "spdx": 15,
            "texts": 0
        },
        "facets": {
            "core": {
                "attribution": {
                    "unknown": 22
                },
                "discovered": {
                    "unknown": 19,
                    "expressions": [
                        "LicenseRef-scancode-unknown-license-reference AND MIT",
                        "MIT"
                    ]
                },
                "files": 22
            }
        },
        "score": {
            "total": 45,
            "declared": 30,
            "discovered": 0,
            "consistency": 0,
            "spdx": 15,
            "texts": 0
        }
    }
  2. 'conda/conda-forge/linux-aarch64/numpy/1.16.6-py36hdc1b780_0' - In many instances the 'NOASSERTION' keyword has been replaced by 'LicenseRef-scancode-unknown-license-reference'. In some places this unknown license expression has been newly added. Below is the comparison from the integration test:
expected: {"path":"info/about.json","license":"BSD-3-Clause","hashes":{"sha1":"75bee71c98128117d0a567f2ad35cd01f75750e0","sha256":"5f961516903bac3ca1dd9111c72a858f852b6112da3fda7829bf5d825cd25b37"}}
actual:   {"path":"info/about.json","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","hashes":{"sha1":"75bee71c98128117d0a567f2ad35cd01f75750e0","sha256":"5f961516903bac3ca1dd9111c72a858f852b6112da3fda7829bf5d825cd25b37"}}
-------------------
expected: {"path":"info/recipe/meta.yaml","license":"BSD-3-Clause","hashes":{"sha1":"f1022538c9bd0fb683318f39954ae2a085d73a10","sha256":"3d2a25d96d805e0c5b0cab0615118d8bcb860ef92611b188534da32a301be623"}}
actual:   {"path":"info/recipe/meta.yaml","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","hashes":{"sha1":"f1022538c9bd0fb683318f39954ae2a085d73a10","sha256":"3d2a25d96d805e0c5b0cab0615118d8bcb860ef92611b188534da32a301be623"}}
-------------------
expected: {"path":"info/recipe/meta.yaml.template","license":"BSD-3-Clause","hashes":{"sha1":"4312867c86b5c46e98b65ed788975a530fd3236a","sha256":"8902b1e3e0205039794cd2702848b717055a0f5dbed0697249c8a4ddffc0543f"}}
actual:   {"path":"info/recipe/meta.yaml.template","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","hashes":{"sha1":"4312867c86b5c46e98b65ed788975a530fd3236a","sha256":"8902b1e3e0205039794cd2702848b717055a0f5dbed0697249c8a4ddffc0543f"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/distutils/fcompiler/absoft.py","attributions":["Copyright Absoft Corporation","Copyright Absoft Corporation 1994-2002 Absoft Pro FORTRAN","Copyright Absoft Corporation 1994-1998 mV2 Cray Research, Inc. 1994-1996 CF90"],"hashes":{"sha1":"af8d91b136b5a80ae20f9a7245809be4cc852420","sha256":"00a6e3e6e1abf1da460cbcd12096dd5275d702d17fe64e09aa7ab04d6bf2fad4"}}
actual:   {"path":"lib/python3.6/site-packages/numpy/distutils/fcompiler/absoft.py","attributions":["Copyright Absoft Corporation","Copyright Absoft Corporation 1994-2002 Absoft Pro FORTRAN","Copyright Absoft Corporation 1994-1998 mV2 Cray Research, Inc."],"hashes":{"sha1":"af8d91b136b5a80ae20f9a7245809be4cc852420","sha256":"00a6e3e6e1abf1da460cbcd12096dd5275d702d17fe64e09aa7ab04d6bf2fad4"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/f2py2e.py","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 1999 2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"a6c6f2bbc8cd3bed85610cf122cd6264c949dae3","sha256":"c3dcd2246ded9c23323ab81926a8598845280279c8ee853ad64619cefb0b75fa"}}
actual:   {"path":"lib/python3.6/site-packages/numpy/f2py/f2py2e.py","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","attributions":["Copyright 1999-2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"a6c6f2bbc8cd3bed85610cf122cd6264c949dae3","sha256":"c3dcd2246ded9c23323ab81926a8598845280279c8ee853ad64619cefb0b75fa"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/setup.py","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"0f3d561e9548e842b8694b5fa479ebe718245ce1","sha256":"a8d088a913dca445212418e286d11711ee088a5e170d8551008fec666ef16613"}}
actual:   {"path":"lib/python3.6/site-packages/numpy/f2py/setup.py","license":"BSD-3-Clause AND LicenseRef-scancode-free-unknown","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"0f3d561e9548e842b8694b5fa479ebe718245ce1","sha256":"a8d088a913dca445212418e286d11711ee088a5e170d8551008fec666ef16613"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/f2py2e.cpython-36.pyc","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 1999 2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"13f8ab8f760195b5599f66f4be8c8381f68ecad8","sha256":"50297551bfc28e1e9d91879accc23544a05b2446f2f121ee32dc30acc87a8fa0"}}
actual:   {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/f2py2e.cpython-36.pyc","license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference","attributions":["Copyright 1999-2011 Pearu Peterson","Copyright 1999 - 2011 Pearu Peterson"],"hashes":{"sha1":"13f8ab8f760195b5599f66f4be8c8381f68ecad8","sha256":"50297551bfc28e1e9d91879accc23544a05b2446f2f121ee32dc30acc87a8fa0"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/setup.cpython-36.pyc","license":"BSD-3-Clause AND NOASSERTION","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"f5b2d8b039f675eb7b28c52b936a39c092832f61","sha256":"0c23abb7e046eb20beab087ae9d791a957fc553c191811c94c6ada2d08121a21"}}
actual:   {"path":"lib/python3.6/site-packages/numpy/f2py/__pycache__/setup.cpython-36.pyc","license":"BSD-3-Clause AND LicenseRef-scancode-free-unknown","attributions":["Copyright 2001-2005 Pearu Peterson"],"hashes":{"sha1":"f5b2d8b039f675eb7b28c52b936a39c092832f61","sha256":"0c23abb7e046eb20beab087ae9d791a957fc553c191811c94c6ada2d08121a21"}}
-------------------
expected: {"path":"lib/python3.6/site-packages/numpy-1.16.6.dist-info/METADATA","license":"BSD-3-Clause AND NOASSERTION","hashes":{"sha1":"854d9701eb6441931a7916c8780a5e74bedd5831","sha256":"f8f6b36613e999ecc1fe61cea6ba132d66708aeb7c132c69ce587a0fd25f1b9b"}}
actual:   {"path":"lib/python3.6/site-packages/numpy-1.16.6.dist-info/METADATA","license":"BSD-3-Clause AND LicenseRef-scancode-free-unknown","hashes":{"sha1":"854d9701eb6441931a7916c8780a5e74bedd5831","sha256":"f8f6b36613e999ecc1fe61cea6ba132d66708aeb7c132c69ce587a0fd25f1b9b"}} 
  3. pypi/pypi/-/sdbus/0.12.0 - We are discussing whether to raise a ticket with ScanCode about its license findings for this coordinate.
  4. pod/cocoapods/-/SoftButton/0.1.0 – The license in the Readme.MD file is detected by the new code; it was not detected by the earlier version.
  5. crate/cratesio/-/ratatui/0.26.0 – The test case fails due to a change in the repo namespace. Everything else works as before.
  6. npm/npmjs/-/redis/0.1.0 – The declared license is now populated, a notice is generated, and the scores improved.
  7. NuGet.Protocol/6.7.1 – NOASSERTION and ECL have been taken care of. The test case fails due to a change in the score.
  8. deb/debian/-/mini-httpd/1.30-0.2_arm64 – Passed
  9. debsrc/debian/-/mini-httpd/1.30-0.2 – Passed
  10. pod/cocoapods/-/xcbeautify/0.9.1 – Passed
  11. maven/mavencentral/org.apache.httpcomponents/httpcore/4.4.16 – Passed
  12. maven/mavengoogle/android.arch.lifecycle/common/1.0.1 – Passed
  13. go/golang/rsc.io/quote/v1.3.0 – Passed
  14. composer/packagist/symfony/polyfill-mbstring/v1.28.0 – Passed
  15. gem/rubygems/-/sorbet/0.5.11226 – Passed
  16. git/github/ratatui-org/ratatui/bcf43688ec4a13825307aef88f3cdcd007b32641 – Passed

Here are the code changes related to this - yashkohli88#5

In my opinion, in the 'LicenseRef-scancode-unknown-license-reference' cases, the license match is triggered specifically by the 'License' keyword present in those files.
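
For readers following along, the score filter described at the start of this comment can be sketched roughly as below. This is only an illustration in Python, not the actual change in yashkohli88#5: `filter_matches` and `MIN_SCORE` are hypothetical names, and the second sample match (including its score) is made up. It shows that a threshold on its own keeps the platformdirs PKG-INFO match, because that match scores 100.

```python
from typing import Any

# Hypothetical threshold, taken from the "greater than 80%" wording above.
MIN_SCORE = 80


def filter_matches(matches: list[dict[str, Any]],
                   min_score: float = MIN_SCORE) -> list[dict[str, Any]]:
    """Keep only the matches whose score is strictly greater than min_score."""
    return [m for m in matches if m.get("score", 0) > min_score]


matches = [
    # Abridged from the platformdirs PKG-INFO match quoted in this comment.
    {"license_expression_spdx": "LicenseRef-scancode-unknown-license-reference",
     "score": 100},
    # Made-up low-scoring match, for illustration only.
    {"license_expression_spdx": "ECL-2.0", "score": 50},
]

kept = filter_matches(matches)
print([m["license_expression_spdx"] for m in kept])
```

With this sample input only the unknown-license-reference match is kept, so a score threshold would need to be combined with some other rule if that match is to be excluded.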

@yashkohli88
Contributor

Most of the differences occur because of the presence of the 'License' keyword in a file. The new ScanCode reports 'LicenseRef-scancode-unknown-license-reference' whenever a license keyword is found in the file. I observed this behavior in both of the failed scenarios above. The attached screenshots show the 'matched_text' field from the ScanCode results, which contains the text where the match was found.

'pypi/pypi/-/platformdirs/4.2.0' -
'LicenseRef-scancode-unknown-license-reference' is reported in the discovered licenses.
image

'conda/conda-forge/linux-aarch64/numpy/1.16.6-py36hdc1b780_0' -
Difference 1 - "path":"info/about.json" -
Expected - "license":"BSD-3-Clause"
Actual - "license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"
LicenseRef-scancode-unknown-license-reference is detected because of the keyword 'License.txt'. This can be verified from the screenshot below.

image

Difference 2 - "path":"info/recipe/meta.yaml" -
Expected - "license":"BSD-3-Clause"
Actual - "license":"BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"

image
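
One way to triage differences like the ones above is to list the license identifiers that appear in the actual expression but not in the expected one. The sketch below is a naive illustration in Python: `license_keys` and `added_keys` are hypothetical helpers that split on whitespace and parentheses, which is an assumption that holds for the simple expressions in this thread but is not a full SPDX expression parser.

```python
import re

# Operators that separate identifiers in the expressions quoted in this thread.
OPERATORS = {"AND", "OR", "WITH"}


def license_keys(expression: str) -> set[str]:
    """Naively split a license expression into its set of identifiers."""
    tokens = re.split(r"[\s()]+", expression)
    return {t for t in tokens if t and t.upper() not in OPERATORS}


def added_keys(expected: str, actual: str) -> set[str]:
    """Identifiers present in the actual expression but not in the expected one."""
    return license_keys(actual) - license_keys(expected)


# Values copied from Difference 1 (info/about.json) above.
expected = "BSD-3-Clause"
actual = "BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"
print(added_keys(expected, actual))
```

For the info/about.json case the only added identifier is the unknown license reference, which matches the observation in this comment.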

@qtomlinson
Collaborator Author

qtomlinson commented Oct 11, 2024

@yashkohli88 Thanks for the detailed explanation! I have summarized the findings of adding filtering below:
Pros:

  1. Fixed 1 out of 2 license detection differences in nuget/nuget/-/NuGet.Protocol/6.7.1.
  2. Reduced the number of license detection differences for conda/conda-forge/linux-aarch64/numpy/1.16.6-py36hdc1b780_0.
    • Prior to filtering, license detection differences were observed in 12 files.
    • After filtering was added, this number dropped to 9.
  3. Fixed the license detection difference in git/github/ratatui-org/ratatui/bcf43688ec4a13825307aef88f3cdcd007b32641. The definition is now the same as in the production deployment.

Cons:

  1. Regression: license detection for the file platformdirs-4.2.0/PKG-INFO in pypi/pypi/-/platformdirs/4.2.0 now includes LicenseRef-scancode-unknown-license-reference.
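
File counts like the "12 files" and "9 files" above can be obtained by comparing the per-file license values of the expected and actual definitions. The sketch below is a hypothetical Python illustration, not the integration test code: `count_license_differences` is an invented helper, and the two sample entries are abridged from the numpy comparison earlier in this thread (hashes and attributions omitted).

```python
from typing import Any, Optional


def count_license_differences(expected_files: list[dict[str, Any]],
                              actual_files: list[dict[str, Any]]) -> int:
    """Count files present in both lists whose 'license' values differ."""
    expected_by_path: dict[str, Optional[str]] = {
        f["path"]: f.get("license") for f in expected_files
    }
    return sum(
        1
        for f in actual_files
        if f["path"] in expected_by_path
        and expected_by_path[f["path"]] != f.get("license")
    )


# Abridged entries from the numpy comparison (other fields omitted).
expected_files = [
    {"path": "info/about.json", "license": "BSD-3-Clause"},
    {"path": "lib/python3.6/site-packages/numpy/distutils/fcompiler/absoft.py"},
]
actual_files = [
    {"path": "info/about.json",
     "license": "BSD-3-Clause AND LicenseRef-scancode-unknown-license-reference"},
    {"path": "lib/python3.6/site-packages/numpy/distutils/fcompiler/absoft.py"},
]

difference_count = count_license_differences(expected_files, actual_files)
print(difference_count)
```

Only info/about.json is counted here, because absoft.py differs in its attributions rather than in its license.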

@qtomlinson
Collaborator Author

As per our discussion, we need to update the fixtures and track the cases with regressions in documentation in the operations repo.
