Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use `permute' or operator[] as the name of permutation functions? #55

Open
danieltowner opened this issue Mar 15, 2023 · 12 comments
Open
Assignees
Milestone

Comments

@danieltowner
Copy link
Collaborator

Re: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2664r1.pdf

The basic proposal is to add 3 permute overloads to cover the common permutation cases:

outputSimd = permute<4>(inputSimd, [](auto idx) { return idx % 1; }); // Statically generated permute
outputSimd = permute(inputSimd, indexSimd);                           // simd lookup by runtime index
outputSimd = permute(inputSimd, mask);                                // Extract active elements and store contiguously

The first use probably has to stay since there needs to be some way of passing in the size of the desired output simd.

Should the other two uses overload operator[]?

outputSimd = inputSimd[indexes];
outputSimd = inputSimd[simd_mask];

If they do change operator[] then the changes would need to be rolled into std::simd, but providing permute instead would make this an extension on top of std::simd.

@danieltowner danieltowner added this to the P2664R2 milestone Mar 15, 2023
@danieltowner danieltowner moved this to In Progress in std::simd in C++26 Mar 15, 2023
@danieltowner danieltowner self-assigned this Mar 15, 2023
@mattkretz
Copy link
Owner

mattkretz commented Mar 15, 2023

I hope we'll get a simd<T> operator[](contiguous_range auto const&, simd<U> idxs). In that case I believe we want to have the following consistency:

void f(std::vector<float> const& data, simd<float> v, simd<int> idxs) {
  simd x = data[idxs];
  simd y = v[idxs];
}

So simd<T>::operator[](simd<U>) makes a lot of sense to me. It seems to be a must-have.

OTOH, do we want the following?

void f(std::vector<float> const& data, simd<float> v, simd_mask<float> mask) {
  simd x = data[mask];
  simd y = v[mask];
}

Subscripting a contiguous range with a simd_mask? That doesn't seem right. Also subscripting via bool is not an analogue operation we could refer to. I think a "compress" operation should neither be called permute nor operator[].

I have a question that your paper doesn't appear to address: What about inserting zeros into the output? Like if I want to permute [1, 2, 3, 4] into [2, 0, 4, 0]? I can always blend with a mask. But it's not uncommon that the positions of the zeros is an intrinsic property of the algorithm and could trivially be specified in the index mapping function. E.g. permute(v, [](int idx) -> int { return bool(idx & 1) ? idx ^ 1 : std::simd_permute_zero; }), where simd_permute_zero could be INT_MIN?

Another question: Have you considered indexing from the end? E.g. permute(v, [](int idx) -> int { return -1 - idx; }) would return reversed v. I'm not saying that I want it. But I believe it's part of the design space that the paper should cover.

@danieltowner
Copy link
Collaborator Author

danieltowner commented Mar 15, 2023

I think subscripting a contiguous_iterator with a mask should be handled by an overload with a better name, like std::copy_if.

At the moment I handle the insertion of zeros likes this:

permute<simd<float>::size>(concat(values, simd<float>()),
                           [](auto idx) { return idx % 2 ? idx : simd<float>::size;});

I don't like making the indexes have special values to mean different things, like insert-a-zero or index-from-the-end.

@danieltowner
Copy link
Collaborator Author

For convenience, is it worth also having a C++-23 multi-index overload of operator[] to allow this to be written:

outputSimd = inputSimd[3, 5, 1, 7, 5];

This would be equivalent to:

constexpr simd indexes = [3, 5, 1, 7, 5];
outputSimd = inputSimd[simd_indexes];

(or with variables in there too)

@mattkretz
Copy link
Owner

is it worth also having a C++-23 multi-index overload of operator[]

This is a good question and belongs into the paper. As well as an exploration and your/our recommendation.

At this point [1, 2] has only one meaning in standard C++: that of mdspan. I.e. it returns a single element. Using the same syntax to mean shuffle and potentially at some other place to mean subrange is going to be inconsistent. inputSimd[simd{3, 5, 1, 7, 5}] would work (given #24) and still rather concise.

@mattkretz
Copy link
Owner

In order to write generic permutation mappings the mapping function needs the size of the input simd.
E.g. to write a generic broadcast_last mapping:

constexpr auto broadcast_last = [](unsigned idx) { return size - 1; };

But size isn't known at this point as it depends on the first permute parameter. We could add a second (optional) function argument for size. What do you think?

@danieltowner
Copy link
Collaborator Author

I had thought about doing that, but generally I've found that the size of the first parameter is often readily available in the context of its use anyway. For example, my reverse function:

template<typename _T, typename _Abi>
auto reverse(const simd<_T, _Abi>& v)
{
    return permute(v, [](auto idx) { return (simd_size_v<_T, _Abi> - 1) - idx; });
}

To start with I think we leave it without a size parameter and get more user feedback.

@mattkretz
Copy link
Owner

Right, but how do you write a generic / named index mapping function for reverse? Like the dupEven etc. examples in P2664?

@mattkretz
Copy link
Owner

I've just prototyped the permute functions and a few pre-defined index mappings: https://github.com/mattkretz/simd-prototyping/blob/main/permute.h
It detects whether the mapping function accepts a second size parameter and optionally passes the simd::size.

@mattkretz
Copy link
Owner

Oh, and for good measure, I also implemented setting negative indexing (from the end) and setting values to zero via the mapping. It comes at no runtime cost, so it's really just a question of usefulness / complexity.

@mattkretz
Copy link
Owner

Just realized: since I allow negative indexing, the size parameter is not necessary for any of the mappings I pre-defined. However, if you want to define a "interleave low and high halves" index mapping, then you need to know size/2. So there's still a use case.

@danieltowner
Copy link
Collaborator Author

Right, but how do you write a generic / named index mapping function for reverse? Like the dupEven etc. examples in P2664?

Do we want to provide pre-defined utility lambdas that we can hand over to permute:

permute(value, std::simd_getEven);

or do we define a top-level function which does invokes permute with a local lambda which can get the information about size from its own context:

auto getEven(simdable v) {
    return permute<decltype(v)::size/2>(v, [](auto idx) { return idx * 2; });
}

If we want the former, than having the size parameter is useful, and gives something very flexible. But the second creates a much tighter interface which makes it more obvious how to use it, and consequently harder for mistakes to creep in (e.g., getEven needs an output size which is no greater than half the input size, but a stand-alone lambda couldn't enforce that).

The function called getEven might not be a good example. Maybe better examples would be if we created functions which mirror their existing counterparts, like std::reverse, std::shift_left, std::rotate, and so on, overloaded for simd, and which are implemented using permute with a local lambda. I think that most users would want something which does the whole job (i.e., named functions), rather than a pre-defined utility lambda that they need to figure out how to call properly.

@danieltowner
Copy link
Collaborator Author

Oh, and for good measure, I also implemented setting negative indexing (from the end) and setting values to zero via the mapping. It comes at no runtime cost, so it's really just a question of usefulness / complexity.

This is what is in P2664R3, but I have had users feed back that they don't like negative indexes. Errors in computing the index (e.g., off-by-one) can result in apparently valid negative indexes which don't get flagged up at compile time and have to be caught with run-time testing. Also, having the size passed in (also in P2664R3) means redundancy in having two mechanisms for getting indexes from the end.

I propose that we keep the optional size parameter but drop negative indexes. Indexes will only ever be in the range [0..size), or the special zero and uninitialised constants, which is easy to check at compile time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

2 participants