Add trusted publisher release workflow #13048

Open
wants to merge 11 commits into base: main
6 changes: 5 additions & 1 deletion .github/dependabot.yml
@@ -3,8 +3,12 @@ updates:
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
-      interval: "monthly"
+      interval: "weekly"
+    groups:
+      github-actions:
+        patterns:
+          - "*"
   - package-ecosystem: "pip"
     directory: "/"
     schedule:
       interval: "weekly"
44 changes: 44 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,44 @@
name: Publish Python 🐍 distribution 📦 to PyPI

on:
  push:
    tags:
      - "*"
Member
This is unnecessary, it's the same by default:

Suggested change (remove the line):
- "*"

Member Author
When I remove this line, VS Code complains. I could put an empty sequence, but I'm not sure it is easier to read.

Member
I'm not sure that an empty sequence has the same semantics. I think null or ~ might be equivalent, though.
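For reference, a sketch of the spellings under discussion (an assumption based on the GitHub Actions workflow syntax, not verified against a schema validator):

```yaml
on:
  push:
    # What this PR uses: an explicit glob matching every tag.
    tags:
      - "*"

# Believed-equivalent alternatives (assumptions):
#   tags:        # bare key (null value) -- what removing the line yields
#   tags: ~      # explicit null
# whereas an empty sequence arguably differs:
#   tags: []     # an empty list would match no tags at all
```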


jobs:
  build:
    name: Build distribution 📦
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
        with:
          persist-credentials: false
      - name: Build a binary wheel and a source tarball
        run: ./build-project.py
      - name: Store the distribution packages
        uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08 # v4
        with:
          name: python-package-distributions
          path: dist/

  publish-to-pypi:
    name: >-
      Publish Python 🐍 distribution 📦 to PyPI
    needs:
      - build
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/project/pip/${{ github.ref_name }}
    permissions:
      id-token: write # IMPORTANT: mandatory for trusted publishing

    steps:
      - name: Download all the dists
        uses: actions/download-artifact@fa0a91b85d4f404e444e00e005971372dc801d16 # v4
        with:
          name: python-package-distributions
          path: dist/
      - name: Publish distribution 📦 to PyPI
        uses: pypa/gh-action-pypi-publish@67339c736fd9354cd4f8cb0b744f2b82a74b5c70 # release/v1
Member
Just realized that I missed the comment here...

release/v1 is a branch, i.e. a moving pointer to commits. If you want to implement pinning, do that through regular tags. I'm pretty sure dependabot will get confused here: when attempting to bump, it will likely be unable to deduce the previously pinned version or update the comment.

Member Author
I've never used dependabot for pinned GitHub actions, so I don't know.

But which tag, then? As I understand it, other actions have a "stable" tag that moves with each release. For instance, at the moment, the actions/checkout tags v4 and v4.2.2 both point to the same commit. Should there be some similar mechanism for pypa/gh-action-pypi-publish?

Member
Ugh, I really don't like having commit IDs like this - it feels annoyingly unmaintainable. I guess, if it's the recommended approach for securely using code from github URLs, then we have to do it, but I can't lie, I think it sucks.

If there's another, sufficiently secure, way of doing this that uses proper symbolic names, I'd far rather we used that. But if not, I'll live with this (I'll just ignore it and let someone else deal with keeping it up to date).

Member Author
@sbidoul Jan 14, 2025
I was following the recommendation from @sethmlarson, who may confirm it's the best approach.

I personally don't find the hashes in the GitHub action worse than those in build-requirements.txt.

But, the worry I have is with reviewing a PR that would update those hashes. I would not know how to check efficiently that such a PR is not malicious, short of running the update locally myself and comparing, which at that point would be more work than doing the update myself. So my hope is that we can trust dependabot to do the update PRs (both for the GitHub action and build-requirements.txt), but if that does not work then I'm not sure I will be comfortable with the hashes.

Member
Hashes in requirement files supplement the version number. A mismatch is an error. In this case, the version is a comment, and there's no check that the commit ID and the version match. Dependabot might well cover this, but I'm not comfortable relying solely on an external tool for security here.
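To make the concern concrete, here is a minimal local sketch (an illustration, not an official tool; the regex and the sample workflow text are assumptions) that flags SHA-pinned `uses:` entries whose version comment is missing — the kind of SHA/version bookkeeping being discussed:

```python
import re

# Match "uses: owner/repo@<40-hex-sha>" with an optional "# vX" comment.
PIN_RE = re.compile(
    r"uses:\s*(?P<action>[\w./-]+)@(?P<ref>[0-9a-f]{40})"
    r"(?:\s*#\s*(?P<version>\S+))?"
)


def unlabeled_pins(workflow_text: str) -> list[str]:
    """Return actions pinned to a SHA without a version comment."""
    return [
        m["action"]
        for m in PIN_RE.finditer(workflow_text)
        if m["version"] is None
    ]


sample = """
    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683  # v4
    - uses: actions/upload-artifact@65c4c4a1ddee5b72f698fdd19549f0f0fb45cf08
"""
print(unlabeled_pins(sample))  # → ['actions/upload-artifact']
```

Note this only detects a *missing* label; checking that the SHA actually belongs to the commented tag still requires querying the upstream repository, which is the part being delegated to dependabot/zizmor here.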

Member
@ichard26 Jan 15, 2025
While that's a neat feature, unless the update is small, we are not equipped to review large diffs of JS projects, especially if they check in their built JS code. I think what @sethmlarson was implying is that GitHub itself would check that any SHAs are linked to the versions they purport to be for edits to GHA workflows. That's a pipe dream (and maybe infeasible for other reasons, too).

Member
Yeah, I also dislike having to stick hashes everywhere (from the UX/DX perspective). While I use the branch in all the examples in the README and the PyPUG guide, I think I left a few hints there that using SHAs is more secure in general.

Dependabot does infer the tags from SHAs it's auto-bumping. In that sense, it's already on GH not to mess it up. I'm not super happy with Dependabot on a few things (like it'd assume SemVer / tag semantics and would be rather noisy for my taste). But in general, this is what people use. Having zizmor could be a good middle ground for detecting mismatches between the SHAs and the tags.

On one hand, you could also decide to trust that I won't break the moving-pointer branch and just keep that. But this would not be setting a great example. Plus, it's still a good idea to be in control of reproducibility regardless. But with that, you'd go for bumping the tags already. And since the tool that does the bumping can update both tags (in comments) and hashes, it's probably fine to delegate that to it.

Note that Dependabot also tries to extract the commit list and change logs of things it's offering to update. They are usually collapsed but are available to read before merging its PRs. I personally set the updates to be as rare as possible, which is especially noticeable when maintaining multiple projects.

One possible future solution to this might be a maintained reusable workflow. I have something in my mind already, but unfortunately, trusted publishing does not yet support these. But in general, provided that there'd be reusable workflows with everything pinned, they could reduce the number of places where those pins would need to be updated/checked. That's a separate discussion topic, though.

Contributor
@potiuk Jan 15, 2025
> For checking the validity of what's been done, either CodeQL or Zizmor are the way to go, even if it's done locally during development of the action.

Oh absolutely - we use CodeQL in CI and Zizmor in pre-commits and that combo works great.

> Is this what you are expecting? If not, yeah, absolutely we can contribute to dependabot what is required :)

Yep. I think the easier you make it to take a look at the action change/diff, the better.

> While that's a neat feature, unless the update is small, we are not equipped to review large diffs of JS projects,

I think what we did is not a complete review, more like an "eyeball check" that there is nothing suspicious. If you blindly bump the commit hash without any kind of verification of what's coming in it (and no tooling really does that now), the commit hash gives you a false sense of security, because you are just slowing the rate of updates. Simply not reviewing what's coming, at least to get a sense of "yeah, looks legit", is as insecure as having a moving tag.

Without actually reviewing or "knowing" what the new tag has (when, for example, you contributed your own code), using commit hashes is mostly a "cargo cult".

Member Author
@sbidoul Jan 15, 2025
Thanks everyone.

My takeaway is that the PR is good as is, and we'll see how dependabot behaves.

Given the difficulty of actually verifying the hashes, we are going to trust dependabot to not be a bad actor, and assume it proposes hashes that actually correspond to the branch/tag it writes in the comment.

If we merge the dependabot PRs at the beginning of the quarterly pip release cycle, it leaves 2-3 months for a possible compromise in a dependency to be detected before it actually causes harm in a pip release.

Compared to the process before this PR, the reproducibility of the build is improved.

Regarding security, I see it as essentially moving the trust from the integrity of the maintainer machine to the integrity of the GitHub runner, dependabot and 4 GitHub actions.

Member
> Regarding security, I see it as essentially moving the trust from the integrity of the maintainer machine to the integrity of the GitHub runner, dependabot and 4 GitHub actions.

Agreed.

To summarise my position on this (and to support the "we won't review diffs from action updates" position), I take the view that we build our release workflow around "release X of action Y". We trust the authors of action Y to have delivered the functionality they claim in release X, and to have ensured that what they publish as release X actually is release X. As such, we care about the release number, and translating that into a commit ID (either to reference in our workflow, or to check the ID that an automated tool like Dependabot used) is error-prone busywork.

But to be clear, if our position is now that we trust the runner, the actions (and their authors) and dependabot, then that's fine with me, as it means we can validly accept dependabot commits (because we trust dependabot) without worrying about the commit IDs, and just basing the decision on what the new version release notes say.

4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -5,6 +5,10 @@ include README.rst
 include SECURITY.md
 include pyproject.toml

+include build-requirements.in
+include build-requirements.txt
+include build-project.py
+
 include src/pip/_vendor/README.rst
 include src/pip/_vendor/vendor.txt
 recursive-include src/pip/_vendor *LICENSE*
67 changes: 67 additions & 0 deletions build-project.py
Member
Ugh, this file is pretty ugly IMO. If it were a nox session, it'd be so simple. However, nox does introduce its own not-insignificant attack surface, so I guess this is the price we pay.

Member Author
Yes, we don't want to have to pin nox and all its dependencies. This is invoked with nox -s build-release, though.

@@ -0,0 +1,67 @@
#!/usr/bin/env python3
"""Build pip using pinned build requirements."""

import subprocess
import tempfile
import venv
from os import PathLike
from pathlib import Path
from types import SimpleNamespace


class EnvBuilder(venv.EnvBuilder):
    """A subclass of venv.EnvBuilder that exposes the python executable command."""

    def ensure_directories(
        self, env_dir: str | bytes | PathLike[str] | PathLike[bytes]
    ) -> SimpleNamespace:
        context = super().ensure_directories(env_dir)
        self.env_exec_cmd = context.env_exec_cmd
        return context


def get_git_head_timestamp() -> str:
    return subprocess.run(
        [
            "git",
            "log",
            "-1",
            "--pretty=format:%ct",
        ],
        text=True,
        check=True,
        stdout=subprocess.PIPE,
    ).stdout.strip()
Comment on lines +23 to +33
Member
I know that I literally used to maintain Black, but I never liked how Black splits long lists into sprawling messes like this. I would be in favour of using # fmt: off/on (or # fmt: skip) here.

Member Author
That would make for at least one very long line, and I chose to present the calls in a uniform way across the file. I personally find the fmt comments would make things worse. A matter of taste, I guess.



def main() -> None:
    with tempfile.TemporaryDirectory() as build_env:
        env_builder = EnvBuilder(with_pip=True)
        env_builder.create(build_env)
        subprocess.run(
            [
                env_builder.env_exec_cmd,
                "-Im",
                "pip",
                "install",
                "--no-deps",
                "--only-binary=:all:",
                "--require-hashes",
                "-r",
                Path(__file__).parent / "build-requirements.txt",
            ],
            check=True,
        )
        subprocess.run(
Member
May I suggest invoking check_call() instead?

Member Author
@sbidoul Jan 14, 2025
Well, that's what I used to use, but I kind of understood check_call and check_output were considered old style, so I was trying to be a good boy and obey the Python documentation :)

[screenshot of the Python subprocess documentation page]

Member
Oh... I think it's talking about run() vs. Popen() there. The former is newer and higher level, while the latter is low-level and more difficult to work with (it's still used under the hood, though). I'd say Popen() is old-style in this case.

check_call() and check_output() are just convenience shortcuts for two typical uses of run(), AFAIU.
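A tiny sketch of the equivalence being described (using `sys.executable` so the child process is self-contained):

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('ok')"]

# Convenience wrapper: raises CalledProcessError on a nonzero exit status.
subprocess.check_call(cmd)

# The same behaviour via the newer, more general API.
result = subprocess.run(cmd, check=True, text=True, stdout=subprocess.PIPE)
print(result.stdout.strip())  # → ok
```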

Member Author
To be honest, over time I came to like using run over check_call and check_output. I find it equally easy to read and easier to teach and remember.

            [
                env_builder.env_exec_cmd,
                "-Im",
                "build",
                "--no-isolation",
            ],
            check=True,
            env={"SOURCE_DATE_EPOCH": get_git_head_timestamp()},
        )


if __name__ == "__main__":
    main()
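As a sketch of why the commit timestamp helps reproducibility: SOURCE_DATE_EPOCH is the Unix timestamp of the HEAD commit, which build backends can use for deterministic archive timestamps, so two builds of the same commit agree. The epoch value below is hypothetical:

```python
import datetime

# Hypothetical output of `git log -1 --pretty=format:%ct`
epoch = "1736812800"
stamp = datetime.datetime.fromtimestamp(int(epoch), tz=datetime.timezone.utc)
print(stamp.isoformat())  # → 2025-01-14T00:00:00+00:00
```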
2 changes: 2 additions & 0 deletions build-requirements.in
@@ -0,0 +1,2 @@
build
setuptools
24 changes: 24 additions & 0 deletions build-requirements.txt
@@ -0,0 +1,24 @@
#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
# pip-compile --allow-unsafe --generate-hashes build-requirements.in
Member
Side note: these options can be set in a config file that pip-tools supports. Consider doing so in a follow-up if more of pip-tools eventually ends up being used.
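For instance, a hypothetical pyproject.toml section (option names assumed from pip-tools' config-file support; verify against the pip-tools documentation):

```toml
[tool.pip-tools]
allow-unsafe = true
generate-hashes = true
```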

#
build==1.2.2.post1 \
--hash=sha256:1d61c0887fa860c01971625baae8bdd338e517b836a2f70dd1f7aa3a6b2fc5b5 \
--hash=sha256:b36993e92ca9375a219c99e606a122ff365a760a2d4bba0caa09bd5278b608b7
# via -r build-requirements.in
packaging==24.2 \
--hash=sha256:09abb1bccd265c01f4a3aa3f7a7db064b36514d2cba19a2f694fe6150451a759 \
--hash=sha256:c228a6dc5e932d346bc5739379109d49e8853dd8223571c7c5b55260edc0b97f
# via build
pyproject-hooks==1.2.0 \
--hash=sha256:1e859bd5c40fae9448642dd871adf459e5e2084186e8d2c2a79a824c970da1f8 \
--hash=sha256:9e5c6bfa8dcc30091c74b0cf803c81fdd29d94f01992a7707bc97babb1141913
# via build

# The following packages are considered to be unsafe in a requirements file:
setuptools==75.8.0 \
--hash=sha256:c5afc8f407c626b8313a86e10311dd3f661c6cd9c09d4bf8c15c0e11f9f2b0e6 \
--hash=sha256:e3982f444617239225d675215d51f6ba05f845d4eec313da4418fdbb56fb27e3
# via -r build-requirements.in
7 changes: 2 additions & 5 deletions docs/html/development/release-process.rst
@@ -146,11 +146,8 @@ Creating a new release
    This will update the relevant files and tag the correct commit.
 #. Submit the ``release/YY.N`` branch as a pull request and ensure CI passes.
    Merge the changes back into ``main`` and pull them back locally.
-#. Build the release artifacts using ``nox -s build-release -- YY.N``.
-   This will checkout the tag, generate the distribution files to be
-   uploaded and checkout the main branch again.
-#. Upload the release to PyPI using ``nox -s upload-release -- YY.N``.
-#. Push the tag created by ``prepare-release``.
+#. Push the tag created by ``prepare-release``. This will trigger the release
+   workflow on GitHub and publish to PyPI.
 #. Regenerate the ``get-pip.py`` script in the `get-pip repository`_ (as
    documented there) and commit the results.
 #. Submit a Pull Request to `CPython`_ adding the new version of pip
3 changes: 3 additions & 0 deletions news/13048.process.rst
@@ -0,0 +1,3 @@
Started releasing to PyPI from a GitHub Actions CI/CD workflow that implements trusted publishing and bundles :pep:`740` digital attestations.

In addition to being signed, the released distribution packages are now reproducible through the commit timestamp.
Member
Sorry, I should've flagged this directly. This is currently inaccurate due to the setuptools bug I mentioned. I think it may be better to simply drop this until we confirm with setuptools that they're reproducible-ready.

6 changes: 3 additions & 3 deletions noxfile.py
@@ -339,7 +339,7 @@ def build_release(session: nox.Session) -> None:
     )

     session.log("# Install dependencies")
-    session.install("build", "twine")
+    session.install("twine")

     with release.isolated_temporary_checkout(session, version) as build_dir:
         session.log(
@@ -375,11 +375,11 @@ def build_dists(session: nox.Session) -> List[str]:
     )

     session.log("# Build distributions")
-    session.run("python", "-m", "build", silent=True)
+    session.run("python", "build-project.py", silent=True)
     produced_dists = glob.glob("dist/*")

     session.log(f"# Verify distributions: {', '.join(produced_dists)}")
-    session.run("twine", "check", *produced_dists, silent=True)
+    session.run("twine", "check", "--strict", *produced_dists, silent=True)

     return produced_dists
