Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add PexKeyringConfigurationRequest API for configuring Pex/Pip keyring #21852

Closed
wants to merge 9 commits into from

Conversation

tdyas
Copy link
Contributor

@tdyas tdyas commented Jan 17, 2025

Add an API to allow plugins to provide credentials to be used by Pex/Pip when accessing secured Python package indexes.

@tdyas tdyas changed the title add PexKeyringConfigurationRequest API for configuring Pex/Pip keyring add PexKeyringConfigurationRequest API for configuring Pex/Pip keyring Jan 17, 2025
@tdyas
Copy link
Contributor Author

tdyas commented Jan 18, 2025

This PR is the base for the AWS CodeArtifact support to be added by #21853.

@tdyas tdyas requested review from huonw and benjyw January 19, 2025 02:47
@tdyas tdyas marked this pull request as ready for review January 19, 2025 02:47
Comment on lines +306 to +307
# TODO: Consider removing the run ID so that this value does not contribute to the cache key
# and cause unneeded invalidation of build actions as it changes for each run.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a main issue of this PR. Where do we store the credentials out of band in a way that does not cause unnecessary invalidation of processes across different Pants runs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern is that the credentials file gets rewritten on concurrent Pants runs. Maybe it should be stored per-pantsd process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One alternative solution might install the keyring package somewhere (whether manually by the user or Pants doing it) and then set the password by calling keyring set SITE USER PASSWORD. Pants would no longer use a shim binary in that case.

An advantages is that we would no longer need an environment variable to point at the data file. This would prevent invalidating for changes to that filename (e.g., for different run/session IDs).

It assumes that updating the password is idempotent in the sense that any caller of keyring get SITE USER will be able to use that password. This should in fact be the case given how keyring works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also have the shim binary be for the entire workspace and have the data file name embedded in it. This would also avoid having the data file name in an environment variable for a Process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can pants not depend on keyring and use it programmatically in-process?

Copy link
Contributor Author

@tdyas tdyas Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the trade off made was, we cannot assume the in-process Pip support works under Pex and we cannot assume the user has installed a global keyring binary, so with this PR, Pants will try to inject a keyring binary for Pip to see and Pip will need to be in --keyring-provider=subprocess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, maybe the trade off of requiring an installed keyring provider will actually be the better solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yeah, requiring keyring as a system dependency outside of Pants seems reasonable for a first cut at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case, then this PR would reduce to just a --pex-cli-keyring-provider option instead of a plugin API? The PR stacked on top of this one to add AWS CodeArtifact token renewal is the driving force for this PR.

What are your thoughts on such token auto-renewal being driven from within a Pants plugin (which can be invoked on a per-session basis)?

An alternative solution would give users the ability to invoke a runnable during session startup as configured by a new target type. Then the user could define the aws token logic in a BUiLD file. Then there would be no need for most of the current solution direction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the simplest possible solution, that punts installing, locking down and configuring a keychain binary to the user, is fine...

# the `keyring` package. Pip will be configured to use its `subprocess` mode which will obtain
# credentials by invoking this "keyring" binary from the PATH.
#
# Credentials are stored in the file stored in the `__PANTS_KEYRING_DATA` environment variable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would emphasize (via name and comment) that the value in the env var is a path, not actual data content.

# and cause unneeded invalidation of build actions as it changes for each run.
keyring_data_path = build_root.pathlib_path / ".pants.d" / "keyring" / str(run_id) / "data.sh"
keyring_data_path.parent.mkdir(parents=True, exist_ok=True)
keyring_data_path.write_bytes(content.getvalue().encode())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not ensure that this file has no greater than 0600 perms?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And should we not delete it when the process returns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment at #21852 (comment) about concurrent runs. Would love your input on that question.

@tdyas
Copy link
Contributor Author

tdyas commented Jan 23, 2025

Going to close this for now since the easiest way to configure this support is likely to just use --pex-cli-global-args.

@tdyas tdyas closed this Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants