-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add llvm-openmp to host deps on osx #159
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
@conda-forge-admin, please rerender |
Fixes + python ./test/run_test.py Traceback (most recent call last): File "/Users/runner/miniforge3/conda-bld/pytorch-recipe_1673797381436/test_tmp/./test/run_test.py", line 22, in <module> from torch.testing._internal.common_utils import ( File "/Users/runner/miniforge3/conda-bld/pytorch-recipe_1673797381436/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/lib/python3.11/site-packages/torch/testing/_internal/common_utils.py", line 57, in <module> import expecttest
Do we really not care at all if the tests pass? Why do we bother running them in the CI then anyhow? pytorch-cpu-feedstock/recipe/meta.yaml Line 136 in 2be0b38
|
I think it is just that pytorch is so heavy. It is hard to get them to pass on limited resources. |
Ok that makes sense |
i mean, i tried to debug them a while back. it is just very hard to disentangle:
We did it for scipy and numpy. You can try to run the tests. That might be good task to tackle to better understand how we are doing. I haven't heard of too many people complaining of segfaults, so i'm crossing my fingers that we are in a good place. |
@conda-forge-admin, please rerender |
…nda-forge-pinning 2023.01.24.14.49.06
recipe/conda_build_config.yaml
Outdated
@@ -1,5 +1,5 @@ | |||
MACOSX_SDK_VERSION: # [osx] | |||
- 12.3 # [osx] | |||
- 11.3 # [osx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I think this was useful to ensure that certain hardware optimizations were available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I think this was useful to ensure that certain hardware optimizations were available.
Because CI fails .. It can't download the SDK image.
12.3 SDK should be available in the newest OSX images on azure and github actions, that was our workaround. At any rate, having 12.3+ is critical for MPS (metal) support, specifically for enabling it automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct though that the 12.3 SDK is not available from the regular repo we use...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Workaround --- use macOS-12 images to get 12.3 SDK:
pytorch-cpu-feedstock/conda-forge.yml
Lines 1 to 4 in 2be0b38
azure: | |
settings_osx: | |
pool: | |
vmImage: macos-12 |
Because CI fails .. It can't download the SDK image. |
@conda-forge-admin, please rerender |
…nda-forge-pinning 2023.01.28.03.27.41
I seem to be missing something .. any clues @ngam @hmaarrfk?
It tries to download the 12.3 SDK from https://github.com/phracker/MacOSX-SDKs/ but it is not available. Are there any other variables I need to set? |
It may be that we had a workaround for this in conda-smithy. Could you also add:
during the debugging process |
I will look if conda-smithy changed something recently or if the azure image changed substantially edit:
|
Ah, I think the problem is that 12.3 is housed elsewhere, not under Xcode 14.2. Let me try to double check the details. It used to be that all these SDKs got linked here: |
Basically, this is why:
A quick solution is to simply specify 13.1 instead of 12.3 for the time being. We need to resolve this issue at some point system-wide. The repo we rely on for SDKs is no longer maintained: https://github.com/phracker/MacOSX-SDKs. @Tobias-Fischer how about the following changes:
and then 12.3 to 13.1 to test. |
@conda-forge-admin, please rerender |
…nda-forge-pinning 2023.01.28.17.33.52
I feel like they pulled something |
That's 11.3, we are after either 12.3 or 13.1 (depending on which one you're following) |
Is there anything we can do here in the meantime to push this PR forward? |
Co-authored-by: ngam <[email protected]>
Do we know if this will actually fix things? Can we test? This PR seems fine to me without tests and it should likely be okay to add it from the perspective of PyTorch, though I am not sure about your goals... Let's wait for @hmaarrfk to finalize it, in particular make a final call on #159 (comment) |
Good point - I am reasonably confused now. From the 1.13.1 build logs on osx-64 python 3.10 (https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=642018&view=logs&jobId=c35410bf-a820-56fd-e231-d3cd4bead63c&j=a8dee1cf-d17f-5cb0-b53f-4a93dfdbfb0c&t=929f6101-935e-5679-3be5-15b073490bac):
But then when trying to build norse (https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=650727&view=logs&jobId=15f8dc28-9f6f-5e3a-5b3c-2b169071e5be&j=15f8dc28-9f6f-5e3a-5b3c-2b169071e5be&t=63fa899e-c160-5185-75c4-c5fd30c90374):
I have no idea what's going on there?!? |
Things like
are a sign of some other underlying issue. Did the problems start in conda-forge/norse-feedstock#11? |
Something weird is going on:
|
|
Okay, the problem is that we have double pins on llvm-openmp somehow... osx-64/pytorch-1.13.1-cpu_py38hc21d861_0.condaNo Description
|
I think what you did here will fix it!
|
@conda-forge-admin, please rerender |
Actually, let's also check the artifacts. Give me a second to download them. |
…nda-forge-pinning 2023.01.31.03.10.11
Ooops, redenering deleted everything! Anyway, I believe we will be good to go after this PR. Here's the would-be printout:
|
@conda-forge-admin, please rerender |
Many thanks for checking @ngam - sounds like this will do the job :). Is there anything else I can do here? |
…nda-forge-pinning 2023.02.02.17.24.02
@@ -1,5 +1,6 @@ | |||
MACOSX_SDK_VERSION: # [osx] | |||
- 12.3 # [osx] | |||
- 13.1 # [osx and arm64] | |||
- 11.0 # [osx and x86_64] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to modify this to make osx64 pass. Currently it is failing with something MPS related, and my guess is that this needs to go back 12.3+, perhaps 13.1, for both... but let me check.
Checklist
0
(if the version changed)conda-smithy
(Use the phrase@conda-forge-admin, please rerender
in a comment in this PR for automated rerendering)llvm-openmp should be pinned to 14. However, for some reason the llvm-openmp build version (not host version) is used for the version pin at the moment. This should resolve this issue.
Issue originally encountered in conda-forge/norse-feedstock#11 where:
Both osx-64 and osx-arm64 have implicitly already pulled in llvm-openmp in the host deps in previous builds.