License header formatting #983
thank you for this!! added some inline suggestions. a few comments more generally:
- there should be no references to "server" here. Any dynamic/repo-specific behavior should be handled via arguments or the config file - we would like checks in checks-superstaq to generalize outside of our repos as much as possible (within reason)
- once we correct outdated headers once, it seems like we shouldn't need to keep checking them? in which case maybe the "outdated" functionality doesn't need to live in this script - we can save the code you use to make these initial corrections somewhere internally, and then use this script to check headers from here on out
- if we still need to explicitly check for the string "Infleqtion" in places, maybe we could add a "licensee" value to the config file in addition to the header, instead of hard coding it?
- similarly, i can maybe see why it's unavoidable but i feel like the hard-coded "apache" checks somewhat defeat the purpose of saving the header in the config. do you think there's an easy way to check if the headers are ~the same, up to licensee/year? maybe we could allow the header in pyproject.toml to include {YEAR} and {LICENSEE} tags, which we could convert to wildcards when comparing against existing licenses
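A rough sketch of how the {YEAR} and {LICENSEE} tags could be converted to wildcards. The helper name `header_to_pattern`, the sample header, and the choice of `\d{4}` and `.+?` as wildcards are assumptions for illustration, not part of the PR.

```python
import re


def header_to_pattern(template: str) -> "re.Pattern[str]":
    """Turn a header template with {YEAR}/{LICENSEE} tags into a regex (hypothetical helper)."""
    # Escape the whole template first so characters such as "." or "(" only match themselves.
    escaped = re.escape(template)
    # re.escape also escapes the braces of the tags, so replace the escaped form of each tag.
    escaped = escaped.replace(re.escape("{YEAR}"), r"\d{4}")
    escaped = escaped.replace(re.escape("{LICENSEE}"), r".+?")
    return re.compile(escaped)


# Hypothetical template, as it might be stored in pyproject.toml.
template = "# Copyright {YEAR} {LICENSEE}\n# Licensed under the Apache License, Version 2.0"
pattern = header_to_pattern(template)

existing = "# Copyright 2023 Example Corp.\n# Licensed under the Apache License, Version 2.0"
print(pattern.fullmatch(existing) is not None)
```

Escaping before substituting matters: if the wildcards were inserted first, `re.escape` would turn them back into literal text.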
also fwiw it's also ok if this script doesn't handle every possible case perfectly - if it gets confused it can always just throw an error saying to fix the headers manually :)
"""
)
parser.add_argument(
    "--apply", action="store_true", help="Add the license header to files.", default=False
no need to set default=False with action="store_true"
- "--apply", action="store_true", help="Add the license header to files.", default=False
+ "--apply", action="store_true", help="Add the license header to files."
(ditto below)
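A quick check of the point above, using only the standard `argparse` behavior: a `store_true` flag is already `False` when it is absent.

```python
import argparse

parser = argparse.ArgumentParser()
# No explicit default: action="store_true" already implies default=False.
parser.add_argument("--apply", action="store_true", help="Add the license header to files.")

print(parser.parse_args([]).apply)           # False: flag absent
print(parser.parse_args(["--apply"]).apply)  # True: flag present
```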
license_header = ""
exceptions = ["# pylint:", "#!/", "# mypy:"]

with open(file, "r+") as f:
do we need the "+" here? if not:
- with open(file, "r+") as f:
+ with open(file, "r") as f:
try:
    data: dict[str, Any] = tomlkit.parse(Path("pyproject.toml").read_text())
    expected_license_header = str(data["tool"]["license_header_format"]["license_header"])
    in_server = "Apache" not in expected_license_header
we should put this in a function instead of executing it globally
raise KeyError(
    "Under [tool.license_header_format] add a license_header field with the license\
    header that should be added to source code files in the repository."
)
no need to raise an error in this case, we can just have an info message saying that no license header was found and then return as if it succeeded
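One way the two suggestions above (read the config inside a function, and treat a missing header as an informational no-op) might fit together. This is a sketch only: `get_expected_header` and `run` are hypothetical names, and the config is assumed to be already parsed into a mapping (for example by `tomlkit.parse`, as in the PR).

```python
from typing import Any, Mapping, Optional


def get_expected_header(pyproject: Mapping[str, Any]) -> Optional[str]:
    """Return the configured license header, or None if the config has none."""
    try:
        return str(pyproject["tool"]["license_header_format"]["license_header"])
    except KeyError:
        return None


def run(pyproject: Mapping[str, Any]) -> int:
    """Hypothetical entry point: returns a process exit code."""
    header = get_expected_header(pyproject)
    if header is None:
        # Informational message only; a missing header config is not a failure.
        print("No license header found under [tool.license_header_format]; skipping check.")
        return 0
    # ... the actual header checks would go here ...
    return 0
```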
return f"""
Beginning at line: {self.start_line_num}
Ending at line : {self.end_line_num}\n
{self.license_header}\n"""
nit
- return f"""
- Beginning at line: {self.start_line_num}
- Ending at line : {self.end_line_num}\n
- {self.license_header}\n"""
+ return (
+     f"Beginning at line: {self.start_line_num}\n"
+     f"Ending at line : {self.end_line_num}\n\n"
+     f"{self.license_header}\n"
+ )
That makes sense. We can also keep it but change the logic a bit. After the initial fix, instead of checking for ColdQuanta in the license header, we can check if it belongs to the licensee but is a different license. This can catch cases of changing the license provider.
I added a few more fields to replace the hard-coded variables. The cirq license header check pylint plugin does something similar. However, from what I have seen in the license headers, apache 2.0 licenses seem to come in 2 different formattings, and that would mess with the matching if we use the wildcard approach.
and license_header.start_line_num <= line_num + 1 < license_header.end_line_num
):
    if line[-2] == ",":
        prepend += line[:-1] + f" 2024 {licensee}.\n"
silly legal q: will we need to update the year in every file on 1/1/2025? or do they stay the same until if/when we update the file?
if the former we might want to replace 2024 with e.g. datetime.datetime.now().year. if the latter maybe we want to make the last two digits wildcards?
will we need to update the year in every file on 1/1/2025?
No. Given that, we can pick whichever path is easier.
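Since existing years do not have to be bumped, the year only matters at the moment a header is first written. A minimal sketch of filling it in at that point, assuming the {YEAR} and {LICENSEE} tags proposed earlier in this review; `render_header` and the template string are hypothetical.

```python
import datetime
from typing import Optional


def render_header(template: str, licensee: str, year: Optional[int] = None) -> str:
    """Fill the {YEAR} and {LICENSEE} tags when writing a brand-new header."""
    if year is None:
        # Use the year in which the header is added, instead of a hard-coded 2024.
        year = datetime.date.today().year
    return template.replace("{YEAR}", str(year)).replace("{LICENSEE}", licensee)


template = "# Copyright {YEAR} {LICENSEE}"  # hypothetical template
print(render_header(template, "Example Corp."))
```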
… are from the same license, wildcards in pyproject toml input
@richrines1 can you please take a look? The biggest change is that I am using
(partial review)
if re.search(copyright_pattern, line):
    copyright_line += line
    body = "\n".join(header_as_lst[idx + 1 :]).strip("#")
    break
else:
    copyright_line += line
- if re.search(copyright_pattern, line):
-     copyright_line += line
-     body = "\n".join(header_as_lst[idx + 1 :]).strip("#")
-     break
- else:
-     copyright_line += line
+ copyright_line += line
+ if re.search(copyright_pattern, line):
+     body = "\n".join(header_as_lst[idx + 1 :]).strip("#")
+     break
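For reference, the suggested form written out as a runnable function. The surrounding `for idx, line in enumerate(...)` loop and the function name `split_header` are assumptions, since the snippet under review only shows the loop body.

```python
import re
from typing import List, Tuple

copyright_pattern = re.compile(r"Copyright .*")


def split_header(header_as_lst: List[str]) -> Tuple[str, str]:
    """Split header lines into (lines up to the copyright line, remaining body)."""
    copyright_line = ""
    body = ""
    for idx, line in enumerate(header_as_lst):
        # Both branches of the original appended the line, so the append is hoisted.
        copyright_line += line
        if re.search(copyright_pattern, line):
            body = "\n".join(header_as_lst[idx + 1 :]).strip("#")
            break
    return copyright_line, body


print(split_header(["# Copyright 2024 Example Corp.", "# Licensed", "# more"]))
```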
for license_header in license_header_lst:
    similar_body = (
        difflib.SequenceMatcher(None, body, license_header.license_header).ratio() > 0.94
it looks like this is comparing the existing header to the part of the existing header below the copyright line. should it be comparing to the expected header instead?
we also might want to generalize this a bit to determine similarity, e.g. by comparing
"".join(line.lstrip("#").strip().lower() for line in header.splitlines()),
"".join(line.lstrip("#").strip().lower() for line in expected_header.splitlines()),
so that licenses will always get marked as similar if they only differ by case/whitespace/comment style/etc
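A small sketch of that normalized comparison, built from the expression in the comment above. The function names `normalize` and `similarity` are made up for illustration.

```python
import difflib


def normalize(header: str) -> str:
    """Drop leading comment markers, surrounding whitespace, and case from each line."""
    return "".join(line.lstrip("#").strip().lower() for line in header.splitlines())


def similarity(header: str, expected_header: str) -> float:
    """Similarity ratio in [0, 1] between the normalized headers."""
    return difflib.SequenceMatcher(None, normalize(header), normalize(expected_header)).ratio()


existing = "#   LICENSED UNDER THE APACHE LICENSE  \n#  Version 2.0"
expected = "# Licensed under the Apache License\n# Version 2.0"
print(similarity(existing, expected))
```

Headers that differ only in case, padding, or the number of `#` characters normalize to the same string, so a threshold such as the 0.94 used in the PR only has to absorb real wording differences.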
target = (
    expected_license_header.replace("{YEAR}", r"20\d{2}")
    .replace("{LICENSEE}", licensee)
    .replace("\n", "")
why are we removing the newlines?
.replace("(", r"\(")
.replace(")", r"\)")
.replace(".", r"\.")
.replace("'", r"\'")
.replace('"', r"\"")
can we use re.escape() to do this?
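To illustrate why `re.escape()` is the safer choice: a hand-written chain only covers the characters someone remembered to list. The sample text below is invented, and the manual chain is a shortened stand-in for the one under review.

```python
import re

text = 'Licensed under the "License" (v2.0) [see LICENSE+NOTICE]'

# Hand-rolled escaping: only the listed characters are handled.
manual = text.replace("(", r"\(").replace(")", r"\)").replace(".", r"\.")

# re.escape handles every regex metacharacter, including "[", "]" and "+".
escaped = re.escape(text)

print(re.fullmatch(escaped, text) is not None)  # True
print(re.fullmatch(manual, text) is not None)   # False: "[...]" is read as a character class
```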
    license_header.header_type = HeaderType.VALID
    valid = True
elif similar_body and re.search(appended_pattern, license_header.license_header):
    license_header.header_type = HeaderType.VALID
we might want to treat this differently - if the licenses are similar but not exactly the same we should probably still rewrite them to match the expected formatting
"""
copyright_line = ""
body = ""
copyright_pattern = re.compile(r"Copyright .*")
should we make this one case insensitive? e.g.
- copyright_pattern = re.compile(r"Copyright .*")
+ copyright_pattern = re.compile(r"Copyright .*", flags=re.IGNORECASE)
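A short check of what the flag changes, using an invented all-caps header line:

```python
import re

strict = re.compile(r"Copyright .*")
relaxed = re.compile(r"Copyright .*", flags=re.IGNORECASE)

line = "# COPYRIGHT 2024 Example Corp."
print(strict.search(line) is None)       # True: the case-sensitive pattern misses it
print(relaxed.search(line) is not None)  # True: IGNORECASE finds it
```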
@richrines1 can you take one last look at this please? I have responded to the comments.
There are inconsistencies in when license headers are added to source code (and whether the added ones are the same). This PR solves this with a checker that checks if the source code has a license header and, if it does, whether it is the correct one. It accounts for shebang lines, comments at the beginning of files unrelated to license headers, and pylint and mypy disable lines. This check has been added to all_.py.
./checks/license_header_format_.py from the root directory.
-i can be used to perform an incremental check. This is not enabled by default.
*.py.
--no-header flag is used to handle only cases where no header is found. By default, it is False.
--bad-header flag is used to handle only cases where an incorrect header is found. By default, it is False.
--apply flag is used to fix the header problems. By default, it only checks if the headers are correct and does not make any fixes.