Validate "or later" licenses #9
* This is an alternative approach to dealing with "or later", where these are recognized as their own token.
* As a side effect, a symbol needs to have an "or later" attribute, and render() calls need to accept an "or later" template for proper rendering.
* License keys can no longer contain a "+" or "or later", though this could be relaxed, as the automaton-based tokenizer could also handle this if needed.

Link: #9
Signed-off-by: Philippe Ombredanne <[email protected]>
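A minimal sketch of the first commit's idea: the symbol carries an "or later" flag, and rendering takes a separate template for that case. The `Symbol` class and its `render()` parameters are hypothetical illustrations, not the library's actual `LicenseSymbol` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical license symbol with an "or later" flag (not the library's LicenseSymbol)."""

    key: str
    or_later: bool = False

    def render(self, template="{symbol.key}", or_later_template="{symbol.key}+"):
        # Use the "or later" template only when the flag is set.
        chosen = or_later_template if self.or_later else template
        return chosen.format(symbol=self)


print(Symbol("GPL-2.0", or_later=True).render())  # GPL-2.0+
print(Symbol("GPL-2.0", or_later=True).render(or_later_template="{symbol.key}-or-later"))  # GPL-2.0-or-later
print(Symbol("MIT").render())  # MIT
```

Keeping "or later" as an attribute rather than baking it into the key means the same symbol can be rendered as either `GPL-2.0+` or `GPL-2.0-or-later` just by swapping the template.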
* tokenize() now handles two cases: the Licensing was created either with or without symbols. In the first case, the automaton-based tokenizer is used; otherwise, a plain regex-based splitter is used and more constraints are enforced on license symbols: they cannot contain spaces, and only "+" is recognized as "or later".

Link: #9
Signed-off-by: Philippe Ombredanne <[email protected]>
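The regex-based fallback described in the second commit could look roughly like this. It is a sketch under the stated constraints (no spaces in keys, a trailing "+" as the only "or later" marker); `split_tokens`, `TOKEN_RE`, and the tuple format are hypothetical names, not the library's tokenizer.

```python
import re

# Either a single parenthesis or a run of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
KEYWORDS = {"AND", "OR", "WITH"}


def split_tokens(expression):
    """Split a license expression into (kind, value, or_later) tuples.

    Hypothetical sketch of a regex-based splitter: keys cannot contain
    spaces, and a single trailing "+" is the only "or later" marker.
    """
    tokens = []
    for text in TOKEN_RE.findall(expression):
        if text in ("(", ")"):
            tokens.append(("paren", text, False))
        elif text.upper() in KEYWORDS:
            tokens.append(("keyword", text.upper(), False))
        else:
            or_later = text.endswith("+")
            key = text[:-1] if or_later else text
            # A "+" anywhere else in the key (or a bare "+") is rejected.
            if not key or "+" in key:
                raise ValueError(f"invalid license key: {text!r}")
            tokens.append(("license", key, or_later))
    return tokens


print(split_tokens("GPL-2.0+ OR (MIT and Apache-2.0)"))
```

Because keys cannot contain spaces here, this splitter can only recognize the trailing "+" form, not a spelled-out "or later".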
#11 has been merged. Is there anything left to do here?
Based on Annex D ("SPDX license expressions") of the SPDX specification, shouldn't I be able to parse "Apache-2.0+"?

    >>> from license_expression import get_spdx_licensing
    >>> licensing = get_spdx_licensing()
    >>> licensing.parse("Apache-2.0+", validate=True, strict=True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<redacted>/python3.9/site-packages/license_expression/__init__.py", line 559, in parse
        self.validate_license_keys(expression)
      File "<redacted>/python3.9/site-packages/license_expression/__init__.py", line 466, in validate_license_keys
        raise ExpressionError(msg)
    license_expression.ExpressionError: Unknown license key(s): Apache-2.0+
@RazerM Thank you for chiming in! We use the bundled SPDX license list (derived from ScanCode) to validate strictly, and we do not treat non-catalogued uses of the trailing + sign as a special construct: if you (really) want to support some …

You will note that even at SPDX, the trailing + is generally out of favor when not flat-out deprecated, as it is for GPL/LGPL in the "Deprecated License Identifiers" section of https://spdx.org/licenses/, in favor of proper dedicated license keys such as GPL-2.0-or-later. So technically the spec would allow … We do support all common usage of a …

In earnest, I have only seen an explicit "Apache-2.0 or later" license notice in the wild a handful of times. It is a minor point in the grand scheme of things, at least until there is another version of the Apache license, which is fairly unlikely in the next few years. This said, we still have a handful of rules in ScanCode, and we report Apache 2.0 in these cases, so this is not an entirely unknown quantity, albeit odd and rare in the wild:
So to recap:
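If older expressions must be accepted, one option is to rewrite the deprecated GPL/LGPL "+" identifiers into the dedicated "-or-later" keys mentioned in the reply above before parsing. This is a hypothetical pre-processing helper (`modernize_gpl_plus` is not part of license-expression); it deliberately leaves other "+" uses such as "Apache-2.0+" alone, since those have no dedicated SPDX key.

```python
import re

# Deprecated GPL/LGPL "+" ids, e.g. "GPL-2.0+" or "LGPL-2.1+".
# Sketch only: the version number is not checked against the SPDX list.
DEPRECATED_PLUS_RE = re.compile(r"\b(L?GPL-\d\.\d)\+")


def modernize_gpl_plus(expression):
    """Rewrite deprecated GPL/LGPL "+" ids to their "-or-later" equivalents."""
    return DEPRECATED_PLUS_RE.sub(r"\1-or-later", expression)


print(modernize_gpl_plus("GPL-2.0+ OR MIT"))  # GPL-2.0-or-later OR MIT
print(modernize_gpl_plus("(LGPL-2.1+ AND Apache-2.0+)"))  # (LGPL-2.1-or-later AND Apache-2.0+)
```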
Just to explain further, here I am adding "Apache-2.0+" as an alias of the existing apache-2.0 license:

    >>> from license_expression import *
    >>> index = get_license_index()
    >>> apache = [l for l in index if l["license_key"] == "apache-2.0"][0]
    >>> apache["other_spdx_license_keys"].append("Apache-2.0+")
    >>> spdx_licensing = build_spdx_licensing(index)
    >>> spdx_licensing.parse("Apache-2.0+", validate=True, strict=True)
    LicenseSymbol('Apache-2.0', aliases=('Apache-2.0+',), is_exception=False)

or with a new symbol:

    >>> index = get_license_index()
    >>> a2plus = {'license_key': 'apache-2.0-plus', 'spdx_license_key': 'Apache-2.0+', 'other_spdx_license_keys': [], 'is_exception': False, 'is_deprecated': False, 'json': 'TBD', 'yml': 'TBD', 'html': 'TBD', 'text': 'TBD'}
    >>> index.append(a2plus)
    >>> spdx_licensing = build_spdx_licensing(index)
    >>> spdx_licensing.parse("Apache-2.0+", validate=True, strict=True)
    LicenseSymbol('Apache-2.0+', is_exception=False)
@pombredanne thank you for the thorough response! I didn't see … Given the workarounds that are available if this does come up, it should be fine.
@RazerM you wrote:
I can state (without a hint of bias 👼) that this is likely the finest license-expression parsing library available in Python (and I think it is also the only one :] ) and likely one of the nicest anywhere! More seriously, keep us posted with your evaluation. This library is used in a few other projects such as AboutCode Toolkit, ScanCode Toolkit, ScanCode.io, FSFE REUSE and other places. It is supposed to be decently good at the small things it knows how to do, and it is backed by a not-too-shabby boolean engine by @bastikr that I co-maintain. If there is something that does not work as it should, I would like it fixed, so your feedback is mucho welcome!
I've used license-expression as part of pycargoebuild to parse license strings from Rust packages. Unfortunately, quite a few people unpredictably use …
Could you explain? I find that specification very hard to read, but I couldn't find anything saying that.
@mgorny do you have a list of all these cases?
So here, we could effectively parse MPL-2.0+ as MPL-2.0 with a proper alias in the licensing symbols, IMHO.
@mgorny The deprecation of "+" in favor of "-or-later" may not be in the spec, but it is in the (long) history of SPDX discussions. The "+" symbol was kept for backward compatibility.
I don't have a list (this is the first time it was reported to me, with
Well, then, perhaps it should be in the spec and propagated more widely, because right now new projects are using it (whether it's meaningless or not).
https://crates.io/crates/smartstring
I don't. I suppose you could grab the database dump (warning: 275 MB; linked from the "data access" page).
@mgorny for reference, I just had a quick look at the versions.csv file of the database dump: once filtered, it yields these 482 unique license statements, a good number of which are NOT valid SPDX expressions in the first place:
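Collecting unique license statements from a CSV dump and flagging the ones that use a trailing "+" can be sketched with the standard `csv` module. The column name "license" is an assumption about the crates.io versions.csv layout, and `unique_license_statements` and `with_trailing_plus` are hypothetical helpers; a small in-memory sample stands in for the real file.

```python
import csv
import io
import re

# A license id immediately followed by "+" and then a space, parenthesis, or the end.
PLUS_RE = re.compile(r"[^\s()]\+(?=[\s()]|$)")


def unique_license_statements(csv_file, column="license"):
    """Return the sorted, de-duplicated, non-empty values of `column`.

    Assumes a header row; the "license" column name is an assumption
    about the crates.io versions.csv dump.
    """
    reader = csv.DictReader(csv_file)
    values = {(row.get(column) or "").strip() for row in reader}
    values.discard("")
    return sorted(values)


def with_trailing_plus(statements):
    """Keep only statements where some license id ends with "+"."""
    return [s for s in statements if PLUS_RE.search(s)]


sample = io.StringIO(
    "id,license\n"
    "1,MIT OR Apache-2.0\n"
    "2,MPL-2.0+\n"
    "3,MIT OR Apache-2.0\n"
    "4,\n"
    "5,GPL-3.0+ AND MIT\n"
)
statements = unique_license_statements(sample)
print(len(statements), with_trailing_plus(statements))
```

For the real dump, open the file with `newline=""`; very large fields may also require raising `csv.field_size_limit()`.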
The master branch implementation treats "or later" licenses as separate license keys, possibly with aliases. The alternate-or-later-handling branch implementation treats "or later" as a keyword rather than as part of separate license keys. If this latter implementation ends up the winner, it would make sense to add validation to license symbols that checks whether a license supports an "or later" version, to avoid nonsense like "MIT or later".