Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3Packages.torcheval: init at 0.0.7 #376129

Merged
merged 1 commit into from
Feb 10, 2025

Conversation

bengsparks
Copy link

@bengsparks bengsparks commented Jan 23, 2025

A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training and tools for PyTorch model evaluations.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux and removed 10.rebuild-linux: 1-10 labels Jan 23, 2025
@osbm
Copy link
Member

osbm commented Jan 23, 2025

Fixes: #376031

@bengsparks bengsparks force-pushed the torcheval branch 2 times, most recently from 12f241b to 53b19a6 Compare January 24, 2025 14:16
@bengsparks bengsparks removed the 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin label Jan 24, 2025
@osbm
Copy link
Member

osbm commented Jan 24, 2025

i think torchtnt should be declared as on its own library so that nixpkgs users have access to both libraries

@github-actions github-actions bot added the 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin label Jan 24, 2025
@bengsparks bengsparks removed the 2.status: work-in-progress This PR isn't done label Jan 24, 2025
@bengsparks
Copy link
Author

@osbm That version of torchtnt is unlikely to be of great to use outside of this package.
I've updated the inclusion of torchtnt with a comment explaining how this came to be.

@bengsparks
Copy link
Author

bengsparks commented Jan 24, 2025

Alternatively, I could patch / correct a codepath that makes use of the breaking change in torchtnt, then nixpkgs' torchtnt can be used.
The diff in question is a

--- a/torcheval/tools/module_summary.py
+++ b/torcheval/tools/module_summary.py
@@ -29,7 +29,7 @@ from torch.utils._pytree import PyTree, tree_flatten
 from torch.utils.hooks import RemovableHandle
 
 from torchtnt.utils import Timer
-from torchtnt.utils.version import is_torch_version_geq_1_13
+from torchtnt.utils.version import is_torch_version_geq
 from typing_extensions import Literal
 
 _ATTRIB_TO_COL_HEADER = {
@@ -242,7 +242,7 @@ def _get_module_flops_and_activation_sizes(
     module_kwargs = module_kwargs or {}
     flops_forward = None
     flops_backward = None
-    if not is_torch_version_geq_1_13():
+    if not is_torch_version_geq("1.13.0"):
         warnings.warn(
             "Please install PyTorch 1.13 or higher to compute FLOPs: https://pytorch.org/get-started/locally/"
         )

@osbm
Copy link
Member

osbm commented Jan 24, 2025

It would actually be awesome to open pull requests on these upstream repositories first, if they get added quickly nixpkgs would never have to deal with these. Can you open these PRs or should i? I am talking about the both patches of torcheval and your patch of torchtnt.

@bengsparks
Copy link
Author

bengsparks commented Jan 24, 2025

@osbm unfortunately, while this patch does let the tests run, it reveals further breaking changes down the line, including the removal of torchtnt.utils.timer.py'sTimer.start and Timer.stop functions, with the only remaining alternative being the @contextmanager-annotated function Timer.action.

The usage of Timer.start and Timer.stop is complex enough (tools/module_summary.py:210) that I don't see how to replace the calls to Timer.start / Timer.stop with Timer.action, as _register_hooks makes use of register_module_forward_pre_hook to start and stop the timers.

forward_elapsed_time_handles = _register_hooks(
    module,
    [
        (_forward_time_pre_hook(forward_timer_mapping), _HookType.FORWARD_PRE_HOOK), # call to Timer.start
        (
            _forward_time_hook(forward_timer_mapping, forward_elapsed_times_sec), # call to Timer.stop
            _HookType.FORWARD_HOOK,
        ),
    ],
)

...

for hook_handle in forward_elapsed_time_handles:
    hook_handle.remove()

At that point it would be better to wait for torcheval to be tagged with 0.0.7 on GitHub, and bump this package accordingly (which I would gladly do as maintainer).
This PR, which made it into the GitHub repository 10 days before torcheval 0.0.7 was released on Pypi, removed torchtnt as a runtime dependency, which should make this easier.

Until then, I think I'll submit this package.
If you have a suitable codebase, could you import and call some functions of this package to see if things work as expected?

@osbm
Copy link
Member

osbm commented Jan 24, 2025

I have some scripts i can test it. But for that you need to add this package to the top-level/python-packages.nix list.

If you can also add me to the maintainer list to this package i can help update/maintain it in the future.

@github-actions github-actions bot added the 6.topic: flakes The experimental Nix feature label Jan 24, 2025
@github-actions github-actions bot removed the 6.topic: flakes The experimental Nix feature label Jan 24, 2025
@nix-owners nix-owners bot requested a review from natsukium January 24, 2025 23:21
@github-actions github-actions bot added 10.rebuild-linux: 1 10.rebuild-linux: 1-10 and removed 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux labels Jan 24, 2025
@bengsparks
Copy link
Author

@osbm I've added the relevant entry to top-level/python-packages.nix. Let me know if something breaks

@bengsparks
Copy link
Author

bengsparks commented Jan 26, 2025

The newest push stems from an updated comment that documents why the tests disabled in 27fb1ae cannot work, even if they are granted filesystem access.
The resulting derivation remains identical.

@bengsparks bengsparks force-pushed the torcheval branch 2 times, most recently from b609a14 to fb05836 Compare February 3, 2025 17:08
@bengsparks
Copy link
Author

bengsparks commented Feb 3, 2025

Using torchtnt-nightly directly as input causes tests to fail immediately

{ torchtnt-nightly }:
buildPythonPackage {
  # ...
  dependencies = [
    torchtnt-nightly
    typing-extensions
  ];
}
> =========================== short test summary info ============================
> ERROR tests/tools/test_flops.py
> ERROR tests/tools/test_module_summary.py
> !!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!
> ================== 1 deselected, 1 warning, 2 errors in 1.63s ==================
error: build of '/nix/store/1jx7cmyhhrpx52ryq93463m718sdnki6-python3.12-torcheval-0.0.6.drv' on 'ssh-ng://rosetta-builder' failed: builder for '/nix/store/1jx7cmyhhrpx52ryq93463m718sdnki6-python3.12-torcheval-0.0.6.drv' failed with exit code 2;
       last 25 log lines:
       > E   ImportError: cannot import name 'is_torch_version_geq_1_13' from 'torchtnt.utils.version' (/nix/store/mhzrnmh20f4kild55i233yzg71ywy24d-python3.12-torchtnt-nightly-2024.8.1/lib/python3.12/site-packages/torchtnt/utils/version.py)

When overriden to use the version I had picked out, the tests pass.

{ torchtnt-nightly }:
let
  torchtnt = torchtnt-nightly.overridePythonAttrs rec {
    pname = "torchtnt-nightly";
    version = "2023.1.25";
    src = fetchPypi {
      inherit pname version;
      hash = "sha256-eFouZgdVQaqrXIKLY7geLV8GEVPGu6IHKIPK6aLJH8w=";
    };
  };
in
buildPythonPackage {
  # ...
  dependencies = [
    torchtnt
    typing-extensions
  ];
}
$ nix-build -A python3Packages.torcheval --system aarch64-linux                 
/nix/store/sxcy8l7isk1ynxhg406zvkymv58d1lx6-python3.12-torcheval-0.0.6

@wegank wegank removed the 12.approvals: 1 This PR was reviewed and approved by one reputable person label Feb 4, 2025
@GaetanLepage
Copy link
Contributor

GaetanLepage commented Feb 10, 2025

@bengsparks have you tried reaching upstream?
This packaging endeavor would be made significantly easier if the versions of the packages were matching.
Maybe you can open issues to ask torchtnt to make a new release?

@bengsparks
Copy link
Author

bengsparks commented Feb 10, 2025

@GaetanLepage I think waiting for torcheval 0.0.7 on GitHub is the correct move then. (I have, since my work here, come to understand that Nix cannot fix package conflicts in Python).
This version of torcheval uses torchtnt as a dev dependency, which would make this process trivial. The only dependency would be typing extensions.

Alternatively I could package torcheval 0.0.7 from PyPI, but there are no tests there.
pytorch/torcheval#215 has seen no activity, and the repository doesn't seem particularly active.

@GaetanLepage
Copy link
Contributor

@GaetanLepage I think waiting for torcheval 0.0.7 on GitHub is the correct move then. (I have, since my work here, come to understand that Nix cannot fix package conflicts in Python). This version of torcheval uses torchtnt as a dev dependency, which would make this process trivial. The only dependency would be typing extensions.

Alternatively I could package torcheval 0.0.7 from PyPI, but there are no tests there. pytorch/torcheval#215 has seen no activity, and the repository doesn't seem particularly active.

Well, I think you could wait for a long time.
Instead, I suggest to use src.rev to manually fetch this commit.
Set the version to 0.0.7 and we should be good to go.

@GaetanLepage
Copy link
Contributor

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 376129


x86_64-linux

❌ 2 packages failed to build:
  • python312Packages.torcheval
  • python312Packages.torcheval.dist

aarch64-linux

❌ 2 packages failed to build:
  • python312Packages.torcheval
  • python312Packages.torcheval.dist

@GaetanLepage GaetanLepage changed the title python3Packages.torcheval: init at 0.0.6 python3Packages.torcheval: init at 0.0.7 Feb 10, 2025
@GaetanLepage
Copy link
Contributor

Tests are missing the skimage dependency (it's scikit-image I think).

@bengsparks bengsparks force-pushed the torcheval branch 2 times, most recently from 27fffba to 6f0f318 Compare February 10, 2025 11:00
@bengsparks
Copy link
Author

nixpkgs-review is failing on my system due to reasons unrelated to this PR, so I'm currently unable to test locally.

@GaetanLepage
Copy link
Contributor

GaetanLepage commented Feb 10, 2025

nixpkgs-review is failing on my system due to reasons unrelated to this PR, so I'm currently unable to test locally.

As this is a leaf package, you don't really need nixpkgs-review. Simply ensure that you can build python31[2-3]Packages.torcheval on the systems you have access to :)

@bengsparks bengsparks force-pushed the torcheval branch 2 times, most recently from 6aa89f1 to bb541cc Compare February 10, 2025 20:52
Copy link
Contributor

@GaetanLepage GaetanLepage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@GaetanLepage
Copy link
Contributor

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 376129


x86_64-linux

✅ 4 packages built:
  • python312Packages.torcheval
  • python312Packages.torcheval.dist
  • python313Packages.torcheval
  • python313Packages.torcheval.dist

aarch64-linux

✅ 4 packages built:
  • python312Packages.torcheval
  • python312Packages.torcheval.dist
  • python313Packages.torcheval
  • python313Packages.torcheval.dist

@GaetanLepage GaetanLepage merged commit 051837d into NixOS:master Feb 10, 2025
25 of 27 checks passed
@bengsparks bengsparks deleted the torcheval branch February 10, 2025 22:56
@osbm osbm mentioned this pull request Feb 11, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: python 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants