Improve Ruby Package Ecosystem/Datafile Handler to tag key_files properly #3881

Open
swastkk opened this issue Aug 5, 2024 · 2 comments

@swastkk
Collaborator

swastkk commented Aug 5, 2024

Description

The Ruby package ecosystem handling fails to tag key_files properly. This affects how attributes are populated at the package level and, in turn, the license_clarity_score.

Example

Scanning https://github.com/inspec/inspec/archive/refs/tags/v6.8.2.zip gives a license_clarity_score of 0. The LICENSE is at inspec-bin/LICENSE rather than at the root, so it is not tagged as a key file:

{
      "path": "inspec-6.8.2.tar.gz-extract/inspec-6.8.2/inspec-bin/LICENSE",
      "type": "file",
      "name": "LICENSE",
      "base_name": "LICENSE",
      "extension": "",
      "size": 590,
      "date": "2024-08-02",
      "sha1": "f7fbb40d12aae4849b657cc27937e3a0f2b3dbad",
      "md5": "81b0e16be045534c5330969d1e542bb4",
      "sha256": "7f93f3fbf47c2b8129a7c1524f2fc9ed0b18e8cd0d21ab8f66dad6928ce43172",
      "mime_type": "text/plain",
      "file_type": "ASCII text",
      "programming_language": null,
      "is_binary": false,
      "is_text": true,
      "is_archive": false,
      "is_media": false,
      "is_source": false,
      "is_script": false,
      "package_data": [],
      "for_packages": [],
      "is_legal": true,
      "is_manifest": false,
      "is_readme": false,
      "is_top_level": false,
      "is_key_file": false,
      "detected_license_expression": "apache-2.0",
      "detected_license_expression_spdx": "Apache-2.0",
      "license_detections": [
        {
          "license_expression": "apache-2.0",
          "license_expression_spdx": "Apache-2.0",
          "matches": [
            {
              "license_expression": "apache-2.0",
              "spdx_license_expression": "Apache-2.0",
              "from_file": "inspec-6.8.2.tar.gz-extract/inspec-6.8.2/inspec-bin/LICENSE",
              "start_line": 3,
              "end_line": 13,
              "matcher": "2-aho",
              "score": 100.0,
              "matched_length": 85,
              "match_coverage": 100.0,
              "rule_relevance": 100,
              "rule_identifier": "apache-2.0_7.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/apache-2.0_7.RULE",
              "matched_text": "   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License.",
              "matched_text_diagnostics": "Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License."
            }
          ],
          "detection_log": [],
          "identifier": "apache_2_0-c4e30bcd-ccfd-bbc3-d2f1-196ab911e47d"
        }
      ],
      "license_clues": [],
      "percentage_of_license_text": 93.41,
      "copyrights": [
        {
          "copyright": "Copyright (c) 2019 Chef Software Inc.",
          "start_line": 1,
          "end_line": 1
        }
      ],
      "holders": [
        {
          "holder": "Chef Software Inc.",
          "start_line": 1,
          "end_line": 1
        }
      ],
      "authors": [],
      "emails": [],
      "urls": [
        {
          "url": "http://www.apache.org/licenses/LICENSE-2.0",
          "start_line": 7,
          "end_line": 7
        }
      ],
      "files_count": 0,
      "dirs_count": 0,
      "size_count": 0,
      "scan_errors": []
    }

https://rubygems.org/gems/inspec-bin

Consequently, package attributes such as copyright and holder are not populated properly, and the license_clarity_score comes out as 0.
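To spot other affected files in a scan, something like the following could help. It is only a rough sketch: it assumes the file entries of the JSON scan output have already been loaded into a list of dicts, it relies only on the `type`, `is_legal`, `is_key_file` and `path` fields shown in the entry above, and the function name `untagged_legal_files` and the second sample path are made up for illustration.

```python
def untagged_legal_files(files):
    """Return paths of file entries flagged as legal but not tagged as key files."""
    return [
        entry["path"]
        for entry in files
        if entry.get("type") == "file"
        and entry.get("is_legal")
        and not entry.get("is_key_file")
    ]


# Trimmed-down entries; the second path is a hypothetical example.
sample = [
    {
        "path": "inspec-6.8.2.tar.gz-extract/inspec-6.8.2/inspec-bin/LICENSE",
        "type": "file",
        "is_legal": True,
        "is_key_file": False,
    },
    {
        "path": "inspec-6.8.2.tar.gz-extract/inspec-6.8.2/lib/example.rb",
        "type": "file",
        "is_legal": False,
        "is_key_file": False,
    },
]

for path in untagged_legal_files(sample):
    print(path)
```

Any legal file listed this way is a candidate for a key file that the handler missed.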

@Ripoohann

Hi, I am new to contributing and would like to work on this issue. Could you please elaborate?

@swastkk
Collaborator Author

swastkk commented Aug 8, 2024

Hey @Ripoohann, this issue involves scanning a monorepo that contains several RubyGems packages. As #3792 states, a package-level summary is to be computed, and as part of it we calculate the license_clarity_score and populate top-level package attributes such as copyright, holder, other_license_expression and notice_text. In this monorepo, and in the RubyGems package ecosystem more generally, the key_files are not tagged properly. Since the license clarity score and the package attributes are computed from those key files, they come out wrong or empty. So we need to implement something in the datafile handler that tags the key_files properly.
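To make the idea concrete, here is a minimal sketch of the kind of logic I mean. It is not the actual datafile handler API: `find_gem_roots`, `tag_key_files` and `KEY_FILE_STEMS` are hypothetical names, the sample paths other than inspec-bin/LICENSE are assumptions, and treating every directory that holds a `.gemspec` as a package root is a simplification.

```python
from pathlib import PurePosixPath

# Assumption: file name stems that usually mark a key file.
KEY_FILE_STEMS = {"license", "copying", "notice", "readme"}


def find_gem_roots(paths):
    """Treat every directory that directly contains a .gemspec as a package root."""
    return {str(PurePosixPath(p).parent) for p in paths if p.endswith(".gemspec")}


def tag_key_files(paths):
    """Map each path to True when it is a key file of some gem in the codebase."""
    roots = find_gem_roots(paths)
    tagged = {}
    for p in paths:
        path = PurePosixPath(p)
        in_package_root = str(path.parent) in roots
        is_manifest = path.suffix == ".gemspec"
        looks_like_key_file = path.stem.lower() in KEY_FILE_STEMS
        tagged[p] = in_package_root and (is_manifest or looks_like_key_file)
    return tagged


# Hypothetical monorepo layout with one nested gem.
paths = [
    "inspec-6.8.2/inspec-bin/inspec-bin.gemspec",
    "inspec-6.8.2/inspec-bin/LICENSE",
    "inspec-6.8.2/inspec-bin/lib/example.rb",
    "inspec-6.8.2/docs/LICENSE",
]
result = tag_key_files(paths)
```

The point is that "top level" is decided relative to each gem's own root instead of the root of the scanned codebase, so inspec-bin/LICENSE would count as a key file of the inspec-bin package.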
