Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable macos universal2 build with AVX2/SSE2 optimisation for x86_64 target #347

Merged
merged 2 commits into from
Oct 24, 2022

Conversation

t20100
Copy link
Contributor

@t20100 t20100 commented Oct 20, 2022

This PR proposes a fix to allow building c-blosc as part of a Python package that is built as a universal2 binary package on macos with enabled AVX2/SSE2 optimisation for the x86_64 target only.

To achieve this, c-blosc should support defining SHUFFLE_AVX2_ENABLED and SHUFFLE_SSE2_ENABLED and be able to build all files (including *-avx2.c and *-sse2.c) even when AVX2/SSE2 are not supported.

This is done by toggling AVX2/SSE2 with both SHUFFLE_AVX2_ENABLED, SHUFFLE_SSE2_ENABLED and __AVX2__, __SSE2__.

Related to #334
Related to silx-kit/hdf5plugin#186

@t20100
Copy link
Contributor Author

t20100 commented Oct 21, 2022

CI's broken for CMake / Windows GCC but it looks to be the same for recent PRs.

@kalvdans
Copy link
Contributor

kalvdans commented Oct 21, 2022

So if I understand universal2 correctly, a single invocation of the apple clang compiler preprocess each input file twice, once with __AVX2__ and __SSE2__ defined, and once with them undefined (to build arm binary)?

(If that is the case, the changes to shuffle.c are unnecessary)

@FrancescAlted
Copy link
Member

In addition to @kalvdans comment, I'd suggest to use (or plan to adopt) C-Blosc2 for creating universal binaries, as it has native support for NEON in ARM machines. C-Blosc2 brings other interesting features, and is backward compatible with C-Blosc1.

@FrancescAlted
Copy link
Member

CI's broken for CMake / Windows GCC but it looks to be the same for recent PRs.

Yeah, that's unfortunate and should be fixed somehow.

@t20100
Copy link
Contributor Author

t20100 commented Oct 24, 2022

So if I understand universal2 correctly, a single invocation of the apple clang compiler preprocess each input file twice, once with AVX2 and SSE2 defined, and once with them undefined (to build arm binary)?

Yes a single call to clang with -arch arm64 -arch x86_64 produces a single .o file with the build for both architectures and the macros __AVX2__, __SSE2__ have different values according to the architecture.

If that is the case, the changes to shuffle.c are unnecessary

Something needs to be done in shuffle.c because, unless I missed something with universal2, SHUFFLE_AVX2_ENABLED and SHUFFLE_SSE2_ENABLED can only be defined once for both architectures.
We want to define them for x86, but in the case of ARM, blosc_internal_*_avx2 and blosc_internal_*_sse2 functions do not exists but would then be referred to to fill impl_avx2 and impl_sse2 structures.
An alternative could be to define blosc_internal_*_avx2 and blosc_internal_*_sse2 functions even when __AVX2__/__SSE2__ are not defined with a body calling e.g., abort().

@t20100
Copy link
Contributor Author

t20100 commented Oct 24, 2022

I'd suggest to use (or plan to adopt) C-Blosc2

+1!
But I face this issue for packaging the blosc(1) HDF5 filter for ease of use with h5py in the hdf5plugin Python package, and according to Blosc/hdf5-blosc#29 using Blosc-2 should be possible but would produce compressed chunks that are incompatible with previous versions of the filter.

@kalvdans
Copy link
Contributor

kalvdans commented Oct 24, 2022

in the case of ARM, blosc_internal_*_avx2 and blosc_internal_*_sse2 functions do not exists but would then be referred to to fill impl_avx2 and impl_sse2 structures.

I would argue that those structures should not exist on ARM.

@t20100
Copy link
Contributor Author

t20100 commented Oct 24, 2022

I would argue that those structures should not exist on ARM.

Thus the use of defined(SHUFFLE_[AVX2|SSE2]_ENABLED) && defined(__[AVX2|SSE2]__) in shuffle.c since it is where they are defined.

@kalvdans
Copy link
Contributor

Thus the use of defined(SHUFFLE_[AVX2|SSE2]_ENABLED) && defined(__[AVX2|SSE2]__) in shuffle.c since it is where they are defined.

Got you! I now understand your PR and I'm in favour of the changes.

@@ -19,8 +19,8 @@

/* Make sure AVX2 is available for the compilation target and compiler. */
#if !defined(__AVX2__)
#error AVX2 is not supported by the target architecture/platform and/or this compiler.
#endif
#warning AVX2 is not supported by the target architecture/platform and/or this compiler.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't print a warning, it is annoying to have to ignore it and it fails to compile with -Werror

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed them.

@FrancescAlted
Copy link
Member

I'd suggest to use (or plan to adopt) C-Blosc2

+1! But I face this issue for packaging the blosc(1) HDF5 filter for ease of use with h5py in the hdf5plugin Python package, and according to Blosc/hdf5-blosc#29 using Blosc-2 should be possible but would produce compressed chunks that are incompatible with previous versions of the filter.

It occurs to me that it should be relatively easy for C-Blosc to understand chunks from C-Blosc2 (besides the additional 16 bytes in the Blosc2 header, they are largely compatible). Would that help your migration plans?

@FrancescAlted FrancescAlted merged commit 2b90d3c into Blosc:main Oct 24, 2022
@t20100 t20100 deleted the macos-universal branch October 24, 2022 12:17
@t20100
Copy link
Contributor Author

t20100 commented Oct 24, 2022

hdf5plugin is only a way to package HDF5 filters for use with h5py and so to ease their installation/usage.
So we'll follow what is done in https://github.com/Blosc/hdf5-blosc ;)

Would that help your migration plans?

Not sure, or as a transition (but does it worth it? and is it the way to go with HDF5 filters?).
I added more comments here: Blosc/hdf5-blosc#29 (comment)

@t20100
Copy link
Contributor Author

t20100 commented Oct 26, 2022

Hi,

Do you plan on tagging a version of c-blosc in the near future?
Otherwise I'll reference the merge commit from hdf5plugin.

@FrancescAlted
Copy link
Member

Hi,

Do you plan on tagging a version of c-blosc in the near future? Otherwise I'll reference the merge commit from hdf5plugin.

Yes, I'd like to cut a release of C-Blosc soon. But no need to wait for me; go ahead and just reference the merge commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants