Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rocq Package Metadata: Piggybacking on Findlib. #101

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
267 changes: 267 additions & 0 deletions text/000-rocq-findlib.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,267 @@
- Title: Rocq Package Metadata: Piggybacking on Findlib

- Drivers: Rodolphe Lepigre

----

# Summary

The current installation scheme for Rocq theories, which consists in mingling
files from various packages under a single `user-contrib` folder, poses many
modularity problems. To solve them, we propose to piggyback on existing OCaml
Findlib infrastructure, which would support a much better installation scheme,
targeting different directories for different packages (as is typically done
in other languages, for example OCaml). This proposal is in part motivated by
the desire to improve the Rocq support in the `dune` build system.

# Motivation

Rocq packages, typically in the form of Opam packages, currently adhere to the
following installation scheme (focusing on plugins and theories only). If
there are plugins, they are simply installed as standard OCaml libraries. This
builds on existing OCaml infrastructure, and works very well in practice since
the move to the `findlib` method for loading plugins. Rocq theories, however,
are all installed under `$PREFIX/lib/rocq/user-contrib/`, independently of the
package they originate from.

The `user-contrib` setup poses several modularity issues:
1. The connection between theories and the packages they originate from is
lost. In particular, this makes it non-trivial to uninstall a package, and
extra metadata needs to be kept in a package-specific way. For example, the
`dune` build system relies on its `dune-package` files to remember exactly
what files belong to a package.
2. Two packages can overlap in the directory paths they define, but the setup
prevents them from defining theories with the same name (they would end up
at exactly the same file path). For instance, this forbids one to provide
alternative packages defining theories with the same interface, which can
be useful for configuration options, or alternative implementations when
there are various trade-offs to choose from.
3. It is not possible to infer the transitive dependencies of a given package
from its installed theories. As a consequence, build systems are left with
two alternatives:
- Always give access to the whole of user-contrib, which is what happens
with typical `rocq_makefile` setups.
- Only give access to user-selected theories (as with the `dune` setup),
but require users to make all transitive dependencies explicit, which
clearly does not scale.
4. It is not possible to know what plugins a given package might want to
load, which again poses problems for build systems like `dune`.

# Detailed design

The main idea is to piggyback on the OCaml Findlib infrastructure. It is used
for plugins already, and that should remain unchanged. In particular, plugins
should continue to be installed in the same way.

The proposal is to install Rocq theories similarly to OCaml packages. Here,
the notion of package is that of Findlib, not that of Opam, even though they
are related (an Opam package generally consists in a collection of Findlib
packages). Findlib packages are declared in `META` files, which typically
look like the following.
```
version = "dev"
description = "..."
requires = "str unix yojson"
...

package "subpackage1" (
directory = "subpackage1"
version = "dev"
description = "..."
requires = "str unix yojson"
...
)

package "subpackage2" (
directory = "package2"

package "subsubpackage1" (
directory = "subsubpackage1"
version = "dev"
description = "..."
requires = "str unix"
...
)

package "subsubpackage2" (
...
)
)
```
Assuming that the above `META` file is placed in a folder `example`, then
it declares four packages named `example`, `example.subpackage1`,
`example.subpackage2.subsubpackage1`, and
`example.subpackage2.subsubpackage2`.
Each package has associated metadata, including a list of the other packages
it depends on (`requires` field), and a directory (`directory` field). The
directory for package `example` is the `example/` directory holding the `META`
file. The directory for `example.subpackage1` is `example/subpackage1/`. In
effect, the nesting of packages reflects a nesting of directories, and it is
expected for packages to place their files (which, or OCaml would be `.cma`,
`.cmxa`, and `.cmxs` files, for example) in their respective directories.

For Rocq theories, the same structure can be preserved: the package name can
be used to require the corresponding theory in build systems, and the package
directory can be used to store the theory's files (`.v`, `.vo`, `.glob`). The
`requires` field can also be used to specify dependencies on both plugins, and
other theories uniformly. The only missing piece is the expected directory
path for the theory. For that, we can simply rely on a new `rocqpath` field,
whose presence also indicates what findlib package is a Rocq theory.

## Package structure

A Rocq theory Findlib package contains the following information:
- The package name built in the usual Findlib way. Note that the package name
may (and often will) differ from the directory path of the theory. Indeed,
the contrary would be way too constraining (and would also prevent fixing
modularity issue 2). For example, the findlib package for the Rocq stdlib
would likely be named `rocq-stdlib`, while its directory path is `Stdlib`.
- Field `rocqpath` specifying the directory path associated to the package's
theory. The presence of this field also distinguishes Rocq theory packages
from standard OCaml package (including Rocq plugins).
- Standard field `directory`, indicating the directory under which the files
for the package are expected to be found. Unlike OCaml packages, theories do
not generally have a flat directory structure. To guard against collisions
between package names and Rocq path members, theory files should be stored
under an additional `rocq.d` directory. Note that since `rocq.d` contains a
dot, it does not constitute a valid package name, which should prevent name
clashes in practice. (Generally, (sub-)package names coincide with their
directory field, but that is not a requirement of the META format. If that
convention was to be adopted, clashes would become impossible.)
- Standard field `requires`, giving a list of dependencies, in the form of
package names. These include both plugins and other theories, as they can
be handled uniformly. Note that a package can always be identified as a
theory by looking for the `rocqpath` field.
- Other standard metadata fields like `version` or `description`.

## Requiring a package

When a package `my_package` is required via a build system, the metadata can
be used to compute a set of transitive dependencies for it. Using `ocamlfind`,
this can be done as follows.
```
ocamlfind query p -r -format "%p"
```

The obtained packages can then be sorted into two lists: one containing OCaml
libraries (including Rocq plugins), and one containing Rocq theories. One may
run the following command to determine whether a package is a theory or not.
```
ocamlfind query some_dep -format "%(rocqpath)"
```
The result is empty if `some_dep` is a standard OCaml package (possibly a Rocq
plugin), and it gives a directory path for Rocq theories.

The mapping for each theory package `some_theory` can then be computed using
the following command.
```
ocamlfind query some_theory -format "-Q %d/rocq.d %(rocqpath)"
```

## Example: stdpp

The packages for [stdpp](https://gitlab.mpi-sws.org/iris/stdpp) could end up
being installed under `<PREFIX>/lib/rocq-stdpp/`, which would contains the
following `META` file.
```
version = "dev"
description = "An extended \"Standard Library\" for Rocq."
rocqpath = "stdpp"
requires = "rocq-stdlib"

package "bitvector" (
directory = "bitvector"
version = "dev"
description = "A library for bitvectors based on std++."
rocqpath = "stdpp.bitvector"
requires = "rocq-stdpp"
)

package "unstable" (
directory = "unstable"
version = "dev"
description = "Unfinished std++ libraries."
rocqpath = "stdpp.unstable"
requires = "rocq-stdpp"
)
```

Differences with the current `stdpp` setup:
- Rocq-ified package names.
- Use of sub-packages form `stdpp.bitvector` and `stdpp.unstable`, instead of
different Opam packages.

## Example: coq-elpi

```
rocqpath = "elpi"
requires = "rocq-stdlib coq-elpi.plugin"
...

package "plugin" (
directory = "plugin"
requires = "rocq-core.plugins.ltac rocq-core.vernac elpi"
...
)

package "apps" (
directory "apps"

package "derive" (
directory "derive"
rocqpath = "elpi.apps.derive"
requires = "rocq-elpi rocq-elpi.apps.derive.elpi"
...

package "elpi" (
directory "elpi"
rocqpath = "elpi.apps.derive.elpi"
requires = "rocq-elpi"
...
)
)

package "tc" (
directory "tc"
rocqpath = "elpi.apps.tc"
requires = "rocq-elpi rocq-elpi.apps.tc.elpi rocq-elpi.apps.tc.plugin"
...

package "plugin" (
directory = "plugin"
requires = "rocq-core.plugins.ltac rocq-core.vernac rocq-elpi.plugin"
...
)

package "elpi" (
directory "elpi"
rocqpath = "elpi.apps.tc.elpi"
requires = "rocq-elpi"
...
)
)

...
)
```

# Drawbacks

None, except the eventual need to port all projects to use a build system that
respects the new installation scheme. However, as a first step, it is possible
Comment on lines +250 to +251
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if the build systems are ported, maybe there will be no porting needed on project side.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think that porting will be required regardless: the build systems will need to be provided with different information (basically a list of package names the project depends on). For dune, this information is already required in some form, but that might need to change. In particular, theories that are dependencies currently need to be listed as directory paths, and these would need to become package names. (Furthermore, the (plugins ...) and (theories ...) fields of the coq.theory stanza could be merged in a single (deps ...) field.)

In any case, care needs to be taken to ensure that a smooth transition is possible.

Copy link
Member

@jfehrle jfehrle Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best transition is one that allows using either the old or the new build method for a transition period. That may be difficult to achieve. Requiring immediate but very simple changes to every interesting Rocq package in opam could be a rocky transition.

to keep using the current installation scheme at the same time.

# Alternatives

All reasonable systems have a modular installation scheme. An alternative
implementation would be to start from scratch and not reuse the Findlib file
format, but given the close integration of Rocq in the OCaml ecosystem that
probably would not make sense.

# Unresolved questions
Copy link
Member

@jfehrle jfehrle Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dune is enhanced, perhaps we can also make progress on these related issues:

  • From within Rocq, there's currently no way to learn what packages are installed. The installation process should save the opam package name, version number, description and the associated Rocq load path names (-R -Q etc, determining the module name) so that Rocq can show this information.

  • Rocq should be able to show package dependencies for installed packages. Non-developer users won't want to wade through a bunch of configuration files to get this info. If they can get readily get it from the OS command line in an understandable format, that would be reasonable provided we document how to do this.

  • There is no mechanism to ensure that two packages don't use the same load path, in which case you can't use both simultaneously without modifying one of the packages. This could be addressed with a registry of package load paths for Rocq opam packages and checking that new packages have unique load paths. This would be an opam change rather than dune.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that most of these things could be done by querying opam and using ocamlfind (or the underlying findlib OCaml library). In any case, a tool could be developed to easily query that information if people find that is useful.

About your the last point: I believe there should be no such mechanism. It is perfectly fine if two independent packages use the same load path, even if two files end up a the same module path. This is even sometimes desirable, for example for configuration packages not meant to be used at the same time. If two packages define the same module, then the order of dependencies should be good enough for disambiguation.

Copy link
Member

@jfehrle jfehrle Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc section I mention above lists some useful opam commands. I'm not aware of how you get the full list of dependencies for a package. Also, there's no clean way to get the Coq load path (logical name) for an opam package. Non-developer users will want to know the dependencies.

For load paths, there could be a way to skip the check for the case you mention.

If two packages define the same module, then the order of dependencies should be good enough for disambiguation.

Where does this order of dependencies come from? You shouldn't have to specify -R on the command line for installed packages. Also it would be strange IINM to have a subdirectory that's in the overridden package that doesn't exist in the overriding package show up in the midst of the overriding package in Print LoadPath. Don't you think that breaks the appearance of modularity? Seems like this would be confusing for users.


Can we adapt `rocq_makefile` so that it expects dependencies to be installed
following the considered installation scheme, and itself installs libraries
appropriately?
Comment on lines +263 to +265
Copy link
Member

@jfehrle jfehrle Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having rocq_makefile install packages may be nice as an option. However, developers modifying a package should probably still be able to build it without installing anything (instead reporting errors for missing packages).

If you're modifying a package, the code you're modifying should not be in opam or Rocq directories (good development practice). So there should be a way to build in a user specified directory (downloading source with git). Developers may want to modify a package that's a dependency of their main package, testing by running their main package. How would that work? There's also the more obscure (hypothetical?) case where you're modifying multiple packages at once--all of which should be outside of opam/Rocq directories.

One difficulty with modifying rocq_makefile is that IINM GUIs know and parse the file. (A better design would be for only Rocq to parse the file and provide an API the GUIs can call to get the details in a processed format.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the typical workflow for maintaining packages is to create an opam switch with all dependencies installed in the appropriate version, and then building the project from its sources. For example, I know that this is how most Iris devs do things, and they even use some infrastructure for pinning the right versions of their dependencies in opam.

Alternatives exist, like using dune and it's composed builds, which lets you compile a project and its dependencies in the same workspace, without having to install said dependencies. What you are talking about above seems to be similar to that if I understand correction. In any case, a mechanism like that is very useful, but rather orthogonal to the particular installation scheme / package metadata.

I have never heard of anything like GUIs parsing rocq_makefile files. Do you have pointers to that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orthogonal to the particular installation scheme / package metadata.

My first paragraph above responds directly to your comments about installing libraries. Also what I said about using a non-opam, non-Rocq directory for packages that you're modifying. That is directly related to the installation scheme, isn't it?

Currently, the typical workflow for maintaining packages is to create an opam switch

Perhaps, but I expect that requires considerable familiarity with opam. The procedure should be documented rather than forcing everyone to figure it out on their own. Last I checked, about 2 years ago, the opam doc was still quite incomplete. Pretty much the same with the dune documentation and its dependencies. Does this procedure let you put the code you want to modify in a non-opam directory?

I wrote this section about 2 years ago. Some of the procedures are quite awkward due to missing features.

I have never heard of anything like GUIs parsing rocq_makefile files. Do you have pointers to that?

The dependency is on _CoqProject, my mistake.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orthogonal to the particular installation scheme / package metadata.

My first paragraph above responds directly to your comments about installing libraries. Also what I said about using a non-opam, non-Rocq directory for packages that you're modifying. That is directly related to the installation scheme, isn't it?

I'm sorry, I realize now that I did not quite understand what you meant in your first paragraph, and in fact I still don't. Clearly, I'm failing to communicate what I have in mind. What kind of situation do you have in mind exactly when writing "developers modifying a package"?

In case that helps, let me try to clarify what I meant to communicate in the CEP paragraph we are discussing: I have not yet worked out how rocq_makefile can be made to work with the discussed installation scheme. That roughly means figuring out two things:

  1. How does a developer communicate to rocq_makefile the theory dependencies of the project in practice? Currently there is no need to do so because everything that is installed is always available, but that will eventually need to change when we stop putting theories in user-contrib. Instead, we will need to be explicit about what packages we depend on, and -Q bindings will need to be somehow added for all transitive deps.
  2. How do we make it so that rocq_makefile produces a Makefile that, when invoked with make install, has the effect of installing the theories as expected for the new installation scheme?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of situation do you have in mind exactly when writing "developers modifying a package"?

Primarily the author or a maintainer of the package who needs to modify the code in the package while doing maintenance, perhaps preparing PRs, but potentially could also be a developer who wants to tweak the package locally, e.g. to work around a limitation. Is that specific enough?


We need to confirm that this is good enough for good `dune` support.