refactor `<cuda/std/cstring>`
update docs

update docs

add `memcmp`, `memmove` and `memchr` implementations (see the sketch below)

implement tests
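
A minimal usage sketch of the refactored header; it assumes the new functions are device-callable with their C-standard signatures (the kernel below is illustrative, not from the commit):

```
#include <cuda/std/cstring>

// Compare two n-byte keys on the device, where host-only ::memcmp is
// unavailable; cuda::std::memmove and cuda::std::memchr follow the same pattern.
__global__ void compare_keys(const char* lhs, const char* rhs, size_t n, int* out)
{
  *out = cuda::std::memcmp(lhs, rhs, n);
}
```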

Use cuda::std::min/max in Thrust (NVIDIA#3364)

Implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16` (NVIDIA#3361)

* implement `cuda::std::numeric_limits` for `__half` and `__nv_bfloat16`
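
A usage sketch for the new specializations, assuming the standard `<cuda/std/limits>` interface and that the FP16/BF16 headers are available:

```
#include <cuda/std/limits>

#include <cuda_bf16.h>
#include <cuda_fp16.h>

__host__ __device__ void probe_limits()
{
  // The usual numeric_limits queries now work for the extended FP types.
  [[maybe_unused]] __half h_eps        = cuda::std::numeric_limits<__half>::epsilon();
  [[maybe_unused]] __nv_bfloat16 b_max = cuda::std::numeric_limits<__nv_bfloat16>::max();
}
```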

Cleanup util_arch (NVIDIA#2773)

Deprecate thrust::null_type (NVIDIA#3367)

Deprecate cub::DeviceSpmv (NVIDIA#3320)

Fixes: NVIDIA#896

Improves `DeviceSegmentedSort` test run time for a large number of items and segments (NVIDIA#3246)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* fixes spelling

* adds tests for a large number of segments

* fixes narrowing conversion in tests

* addresses review comments

* fixes includes

Compile basic infra test with C++17 (NVIDIA#3377)

Adds support for a large number of items and a large number of segments to `DeviceSegmentedSort` (NVIDIA#3308)

* fixes segment offset generation

* switches to analytical verification

* switches to analytical verification for pairs

* addresses review comments

* introduces segment offset type

* adds tests for a large number of segments

* adds support for a large number of segments

* drops segment offset type

* fixes thrust namespace

* removes about-to-be-deprecated cub iterators

* no exec specifier on defaulted ctor

* fixes gcc7 linker error

* uses local_segment_index_t throughout

* determine offset type based on the type returned by the segment begin/end iterators

* minor style improvements
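
A sketch of the use case this enables, using CUB's usual two-phase calling convention; the 64-bit offset type below is illustrative, since the actual offset type is determined by the segment begin/end iterators (see the bullets above):

```
#include <cub/device/device_segmented_sort.cuh>

#include <cuda_runtime.h>

#include <cstddef>
#include <cstdint>

void sort_many_segments(const float* d_keys_in, float* d_keys_out,
                        std::int64_t num_items, std::int64_t num_segments,
                        const std::int64_t* d_begin_offsets,
                        const std::int64_t* d_end_offsets,
                        cudaStream_t stream)
{
  void* d_temp_storage   = nullptr;
  std::size_t temp_bytes = 0;
  // First call sizes the temporary storage; second call sorts.
  cub::DeviceSegmentedSort::SortKeys(d_temp_storage, temp_bytes, d_keys_in, d_keys_out,
                                     num_items, num_segments, d_begin_offsets,
                                     d_end_offsets, stream);
  cudaMalloc(&d_temp_storage, temp_bytes);
  cub::DeviceSegmentedSort::SortKeys(d_temp_storage, temp_bytes, d_keys_in, d_keys_out,
                                     num_items, num_segments, d_begin_offsets,
                                     d_end_offsets, stream);
  cudaFree(d_temp_storage);
}
```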

Exit with error when RAPIDS CI fails. (NVIDIA#3385)

cuda.parallel: Support structured types as algorithm inputs (NVIDIA#3218)

* Introduce gpu_struct decorator and typing

* Enable `reduce` to accept arrays of structs as inputs

* Add test for reducing arrays-of-struct

* Update documentation

* Use a numpy array rather than a ctypes object

* Change zeros -> empty for output array and temp storage

* Add a TODO for typing GpuStruct

* Documentation updates

* Remove test_reduce_struct_type from test_reduce.py

* Revert to `to_cccl_value()` accepting ndarray + GpuStruct

* Bump copyrights

---------

Co-authored-by: Ashwin Srinath <[email protected]>
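
A hedged sketch of the new input style: the `gpu_struct` decorator comes from the bullets above, while the import path and field layout are assumptions:

```
import numpy as np

# Import path is an assumption; the decorator name is from the PR.
from cuda.parallel.experimental import gpu_struct


@gpu_struct
class Pixel:
    r: np.int32
    g: np.int32
    b: np.int32
```

Arrays of such structs can then be passed to `reduce_into`, with the user-defined operator combining whole struct values.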

Deprecate thrust::async (NVIDIA#3324)

Fixes: NVIDIA#100

Review/Deprecate CUB `util.ptx` for CCCL 2.x (NVIDIA#3342)

Fix broken `_CCCL_BUILTIN_ASSUME` macro (NVIDIA#3314)

* add compiler-specific path
* fix device code path
* add _CCCL_ASSUME

Deprecate thrust::numeric_limits (NVIDIA#3366)

Replace `typedef` with `using` in libcu++ (NVIDIA#3368)

Deprecate thrust::optional (NVIDIA#3307)

Fixes: NVIDIA#3306

Upgrade to Catch2 3.8 (NVIDIA#3310)

Fixes: NVIDIA#1724

refactor `<cuda/std/cstdint>` (NVIDIA#3325)

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Update CODEOWNERS (NVIDIA#3331)

* Update CODEOWNERS

* Update CODEOWNERS

* Update CODEOWNERS

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Fix sign-compare warning (NVIDIA#3408)

Implement more cmath functions to be usable on host and device (NVIDIA#3382)

* Implement more cmath functions to be usable on host and device

* Implement math roots functions

* Implement exponential functions
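
A sketch of what host/device-callable cmath usage looks like after this change; the roots and exponential functions are named in the bullets above, the helper itself is illustrative:

```
#include <cuda/std/cmath>

// Callable from both host and device, unlike the plain ::expf / ::sqrtf.
__host__ __device__ float decayed_norm(float x, float y, float t)
{
  return cuda::std::exp(-t) * cuda::std::sqrt(x * x + y * y);
}
```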

Redefine and deprecate thrust::remove_cvref (NVIDIA#3394)

* Redefine and deprecate thrust::remove_cvref

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Fix assert definition for NVHPC due to constexpr issues (NVIDIA#3418)

NVHPC cannot decide at compile time where the code will run, so `_CCCL_ASSERT` inside a constexpr function breaks it.

Fix this by always using the host definition, which should also work on device.

Fixes NVIDIA#3411

Extend CUB reduce benchmarks (NVIDIA#3401)

* Rename max.cu to custom.cu, since it uses a custom operator
* Extend types covered by min.cu to all fundamental types
* Add some notes on how to collect tuning parameters

Fixes: NVIDIA#3283

Update upload-pages-artifact to v3 (NVIDIA#3423)

* Update upload-pages-artifact to v3

* Empty commit

---------

Co-authored-by: Ashwin Srinath <[email protected]>

Replace and deprecate thrust::cuda_cub::terminate (NVIDIA#3421)

`std::linalg` accessors and `transposed_layout` (NVIDIA#2962)

Add round up/down to multiple (NVIDIA#3234)
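
A hedged sketch, assuming the new helpers are `cuda::round_up` / `cuda::round_down` in `<cuda/cmath>` (names and header inferred from the PR title, not confirmed here):

```
#include <cuda/cmath>

__host__ __device__ int padded_size(int n)
{
  // Round a length up to the next multiple of 16; cuda::round_down(n, 16)
  // would truncate to the previous multiple instead.
  return cuda::round_up(n, 16);
}
```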

[FEA]: Introduce Python module with CCCL headers (NVIDIA#3201)

* Add cccl/python/cuda_cccl directory and use from cuda_parallel, cuda_cooperative

* Run `copy_cccl_headers_to_cuda_include()` before `setup()`

* Create python/cuda_cccl/cuda/_include/__init__.py, then simply import cuda._include to find the include path.

* Add cuda.cccl._version exactly as for cuda.cooperative and cuda.parallel

* Bug fix: cuda/_include only exists after shutil.copytree() ran.

* Use `f"cuda-cccl @ file://{cccl_path}/python/cuda_cccl"` in setup.py

* Remove CustomBuildCommand, CustomWheelBuild in cuda_parallel/setup.py (they are equivalent to the default functions)

* Replace := operator (needs Python 3.8+)

* Fix oversights: remove `pip3 install ./cuda_cccl` lines from README.md

* Restore original README.md: `pip3 install -e` now works on first pass.

* cuda_cccl/README.md: FOR INTERNAL USE ONLY

* Remove `$pymajor.$pyminor.` prefix in cuda_cccl _version.py (as suggested under NVIDIA#3201 (comment))

Command used: ci/update_version.sh 2 8 0

* Modernize pyproject.toml, setup.py

Trigger for this change:

* NVIDIA#3201 (comment)

* NVIDIA#3201 (comment)

* Install CCCL headers under cuda.cccl.include

Trigger for this change:

* NVIDIA#3201 (comment)

Unexpected accidental discovery: cuda.cooperative unit tests pass even without any CCCL headers.

* Factor out cuda_cccl/cuda/cccl/include_paths.py

* Reuse cuda_cccl/cuda/cccl/include_paths.py from cuda_cooperative

* Add missing Copyright notice.

* Add missing __init__.py (cuda.cccl)

* Add `"cuda.cccl"` to `autodoc.mock_imports`

* Move cuda.cccl.include_paths into function where it is used. (Attempt to resolve Build and Verify Docs failure.)

* Add # TODO: move this to a module-level import

* Modernize cuda_cooperative/pyproject.toml, setup.py

* Convert cuda_cooperative to use hatchling as build backend.

* Revert "Convert cuda_cooperative to use hatchling as build backend."

This reverts commit 61637d6.

* Move numpy from [build-system] requires -> [project] dependencies

* Move pyproject.toml [project] dependencies -> setup.py install_requires, to be able to use CCCL_PATH

* Remove copy_license() and use license_files=["../../LICENSE"] instead.

* Further modernize cuda_cccl/setup.py to use pathlib

* Trivial simplifications in cuda_cccl/pyproject.toml

* Further simplify cuda_cccl/pyproject.toml, setup.py: remove inconsequential code

* Make cuda_cooperative/pyproject.toml more similar to cuda_cccl/pyproject.toml

* Add taplo-pre-commit to .pre-commit-config.yaml

* taplo-pre-commit auto-fixes

* Use pathlib in cuda_cooperative/setup.py

* CCCL_PYTHON_PATH in cuda_cooperative/setup.py

* Modernize cuda_parallel/pyproject.toml, setup.py

* Use pathlib in cuda_parallel/setup.py

* Add `# TOML lint & format` comment.

* Replace MANIFEST.in with `[tool.setuptools.package-data]` section in pyproject.toml

* Use pathlib in cuda/cccl/include_paths.py

* pre-commit autoupdate (EXCEPT clang-format, which was manually restored)

* Fixes after git merge main

* Resolve warning: AttributeError: '_Reduce' object has no attribute 'build_result'

```
=========================================================================== warnings summary ===========================================================================
tests/test_reduce.py::test_reduce_non_contiguous
  /home/coder/cccl/python/devenv/lib/python3.12/site-packages/_pytest/unraisableexception.py:85: PytestUnraisableExceptionWarning: Exception ignored in: <function _Reduce.__del__ at 0x7bf123139080>

  Traceback (most recent call last):
    File "/home/coder/cccl/python/cuda_parallel/cuda/parallel/experimental/algorithms/reduce.py", line 132, in __del__
      bindings.cccl_device_reduce_cleanup(ctypes.byref(self.build_result))
                                                       ^^^^^^^^^^^^^^^^^
  AttributeError: '_Reduce' object has no attribute 'build_result'

    warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================= 1 passed, 93 deselected, 1 warning in 0.44s ==============================================================
```

* Move `copy_cccl_headers_to_cuda_cccl_include()` functionality to `class CustomBuildPy`

* Introduce cuda_cooperative/constraints.txt

* Also add cuda_parallel/constraints.txt

* Add `--constraint constraints.txt` in ci/test_python.sh

* Update Copyright dates

* Switch to https://github.com/ComPWA/taplo-pre-commit (the other repo has been archived by the owner on Jul 1, 2024)

For completeness: the other repo took a long time to install into the pre-commit cache; so long that it led to timeouts in the CCCL CI.

* Remove unused cuda_parallel jinja2 dependency (noticed by chance).

* Remove constraints.txt files, advertise running `pip install cuda-cccl` first instead.

* Make cuda_cooperative, cuda_parallel testing completely independent.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Fix sign-compare warning (NVIDIA#3408) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Try using another runner (because V100 runners seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]"

This reverts commit ea33a21.

Error message: NVIDIA#3201 (comment)

* Try using A100 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Also show cuda-cooperative site-packages, cuda-parallel site-packages (after pip install) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Try using l4 runner (because V100 runners still seem to be stuck) [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Restore original ci/matrix.yaml [skip-rapids]

* Use for loop in test_python.sh to avoid code duplication.

* Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]

* Comment out taplo-lint in pre-commit config [skip-rapids][skip-matx][skip-docs][skip-vdc]

* Revert "Run only test_python.sh [skip-rapids][skip-matx][skip-docs][skip-vdc][skip pre-commit.ci]"

This reverts commit ec206fd.

* Implement suggestion by @shwina (NVIDIA#3201 (review))

* Address feedback by @leofang

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>
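
For illustration, a hedged sketch of how a consumer could resolve header locations through the new module; only the module path `cuda/cccl/include_paths.py` comes from the bullets above, the helper name is hypothetical:

```
# Hypothetical usage; the real helper exposed by include_paths.py may differ.
from cuda.cccl.include_paths import get_include_paths

paths = get_include_paths()  # e.g. locations of cub/, thrust/, libcudacxx/ headers
print(paths)
```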

cuda.parallel: Add optional stream argument to reduce_into() (NVIDIA#3348)

* Add optional stream argument to reduce_into()

* Add tests to check for reduce_into() stream behavior

* Move protocol related utils to separate file and rework __cuda_stream__ error messages

* Fix synchronization issue in stream test and add one more invalid stream test case

* Rename cuda stream validation function after removing leading underscore

* Unpack values from __cuda_stream__ instead of indexing

* Fix linting errors

* Handle TypeError when unpacking invalid __cuda_stream__ return

* Use stream to allocate cupy memory in new stream test
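
A sketch of the stream protocol these bullets rely on: the stream argument can be any object whose `__cuda_stream__` returns a `(protocol_version, stream_pointer)` pair, which the validator unpacks rather than indexes. The wrapper below is illustrative:

```
import cupy as cp


class StreamWrapper:
    """Adapts a CuPy stream to the __cuda_stream__ protocol."""

    def __init__(self, stream: cp.cuda.Stream):
        self._stream = stream

    def __cuda_stream__(self):
        # (protocol version, raw CUDA stream pointer)
        return (0, self._stream.ptr)
```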

Upgrade to actions/deploy-pages@v4 (from v2), as suggested by @leofang (NVIDIA#3434)

Deprecate `cub::{min, max}` and replace internal uses with those from libcu++ (NVIDIA#3419)

* Deprecate `cub::{min, max}` and replace internal uses with those from libcu++

Fixes NVIDIA#3404
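
A migration sketch for downstream code; it assumes `cuda::std::min` / `cuda::std::max` are the drop-in replacements and that they are reachable via `<cuda/std/algorithm>` (the exact header is an assumption):

```
#include <cuda/std/algorithm> // assumed home of cuda::std::min / cuda::std::max

__host__ __device__ int clamp_value(int v, int lo, int hi)
{
  // was: cub::min(cub::max(v, lo), hi)
  return cuda::std::min(cuda::std::max(v, lo), hi);
}
```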

Fix CI issues (NVIDIA#3443)

Remove deprecated `cub::min` (NVIDIA#3450)

* Remove deprecated `cub::{min,max}`

* Drop unused `thrust::remove_cvref` file

Fix typo in builtin (NVIDIA#3451)

Moves agents to `detail::<algorithm_name>` namespace (NVIDIA#3435)

Uses unsigned offset types in thrust's scan dispatch (NVIDIA#3436)

Default transform_iterator's copy ctor (NVIDIA#3395)

Fixes: NVIDIA#2393

Turn C++ dialect warning into error (NVIDIA#3453)

Uses unsigned offset types in thrust's sort algorithm calling into `DispatchMergeSort` (NVIDIA#3437)

* uses thrust's dynamic dispatch for merge_sort

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Refactor allocator handling of contiguous_storage (NVIDIA#3050)

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Drop thrust::detail::integer_traits (NVIDIA#3391)

Add cuda::is_floating_point supporting half and bfloat (NVIDIA#3379)

Co-authored-by: Michael Schellenberger Costa <[email protected]>
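
A usage sketch, assuming the trait is exposed as `cuda::is_floating_point` in `<cuda/type_traits>` and extends the std trait to the CUDA extended floating-point types:

```
#include <cuda/type_traits>

#include <cuda_fp16.h>

static_assert(cuda::is_floating_point<float>::value);
// True for __half as well, unlike cuda::std::is_floating_point.
static_assert(cuda::is_floating_point<__half>::value);
```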

Improve docs of std headers (NVIDIA#3416)

Drop C++11 and C++14 support for all of cccl (NVIDIA#3417)

* Drop C++11 and C++14 support for all of cccl

---------

Co-authored-by: Bernhard Manfred Gruber <[email protected]>

Deprecate a few CUB macros (NVIDIA#3456)

Deprecate thrust universal iterator categories (NVIDIA#3461)

Fix launch args order (NVIDIA#3465)

Add `--extended-lambda` to the list of removed clangd flags (NVIDIA#3432)

add `_CCCL_HAS_NVFP8` macro (NVIDIA#3429)

Add `_CCCL_BUILTIN_PREFETCH` (NVIDIA#3433)
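
For context, a sketch of what a builtin-wrapping prefetch macro typically expands to; the definition below is a hypothetical stand-in, not the actual `_CCCL_BUILTIN_PREFETCH` definition:

```
// Hypothetical stand-in; the real CCCL macro may differ.
#if defined(__GNUC__) || defined(__clang__)
#  define MY_PREFETCH(ptr) __builtin_prefetch((ptr), 0 /*read*/, 3 /*high locality*/)
#else
#  define MY_PREFETCH(ptr) ((void) 0) // no-op where the builtin is unavailable
#endif
```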

Drop universal iterator categories (NVIDIA#3474)

Ensure that headers in `<cuda/*>` can be built with a C++-only compiler (NVIDIA#3472)

Specialize __is_extended_floating_point for FP8 types (NVIDIA#3470)

Also ensure that we can actually enable FP8, given the FP16 and BF16 requirements

Co-authored-by: Michael Schellenberger Costa <[email protected]>

Moves CUB kernel entry points to a detail namespace (NVIDIA#3468)

* moves emptykernel to detail ns

* second batch

* third batch

* fourth batch

* fixes cuda parallel

* concatenates nested namespaces

Deprecate block/warp algo specializations (NVIDIA#3455)

Fixes: NVIDIA#3409

Refactor CUB's util_debug (NVIDIA#3345)
davebayer committed Jan 22, 2025
1 parent 6a0f48b commit 224a155
Showing 481 changed files with 11,206 additions and 5,285 deletions.
1 change: 1 addition & 0 deletions .clangd
@@ -51,6 +51,7 @@ CompileFlags:
     # strip CUDA flags unknown to clang
     - "-ccbin*"
     - "--compiler-options*"
+    - "--extended-lambda"
     - "--expt-extended-lambda"
     - "--expt-relaxed-constexpr"
     - "-forward-unknown-to-host-compiler"
33 changes: 21 additions & 12 deletions .github/CODEOWNERS
@@ -1,20 +1,29 @@
# general codeowners for all files
# (Order matters. This needs to be at the top)
* @nvidia/cccl-codeowners

# Libraries
thrust/ @nvidia/cccl-thrust-codeowners @nvidia/cccl-codeowners
cub/ @nvidia/cccl-cub-codeowners @nvidia/cccl-codeowners
libcudacxx/ @nvidia/cccl-libcudacxx-codeowners @nvidia/cccl-codeowners
thrust/ @nvidia/cccl-thrust-codeowners
cub/ @nvidia/cccl-cub-codeowners
libcudacxx/ @nvidia/cccl-libcudacxx-codeowners
cudax/ @nvidia/cccl-cudax-codeowners
c/ @nvidia/cccl-c-codeowners
python/ @nvidia/cccl-python-codeowners

# Infrastructure
.github/ @nvidia/cccl-infra-codeowners @nvidia/cccl-codeowners
ci/ @nvidia/cccl-infra-codeowners @nvidia/cccl-codeowners
.devcontainer/ @nvidia/cccl-infra-codeowners @nvidia/cccl-codeowners
.github/ @nvidia/cccl-infra-codeowners
ci/ @nvidia/cccl-infra-codeowners
.devcontainer/ @nvidia/cccl-infra-codeowners
.pre-commit-config.yaml @nvidia/cccl-infra-codeowners
.clang-format @nvidia/cccl-infra-codeowners
.clangd @nvidia/cccl-infra-codeowners
c2h/ @nvidia/cccl-infra-codeowners
.vscode @nvidia/cccl-infra-codeowners

# cmake
**/CMakeLists.txt @nvidia/cccl-cmake-codeowners @nvidia/cccl-codeowners
**/cmake/ @nvidia/cccl-cmake-codeowners @nvidia/cccl-codeowners
**/CMakeLists.txt @nvidia/cccl-cmake-codeowners
**/cmake/ @nvidia/cccl-cmake-codeowners

# benchmarks
benchmarks/ @nvidia/cccl-benchmark-codeowners
**/benchmarks @nvidia/cccl-benchmark-codeowners

# docs
docs/ @nvidia/cccl-docs-codeowners
examples/ @nvidia/cccl-docs-codeowners
2 changes: 1 addition & 1 deletion .github/actions/docs-build/action.yml
@@ -54,4 +54,4 @@ runs:
       # Upload docs as pages artifacts
       - name: Upload artifact
         if: ${{ inputs.upload_pages_artifact == 'true' }}
-        uses: actions/upload-pages-artifact@v2
+        uses: actions/upload-pages-artifact@v3
2 changes: 1 addition & 1 deletion .github/workflows/build-docs.yml
@@ -45,4 +45,4 @@ jobs:
     steps:
       - name: Deploy to GitHub Pages
         id: deployment
-        uses: actions/deploy-pages@v2
+        uses: actions/deploy-pages@v4
6 changes: 6 additions & 0 deletions .github/workflows/build-rapids.yml
@@ -134,6 +134,12 @@ jobs:
           sccache --show-adv-stats
         done
       done
+      # Exit with error if any failures occurred
+      if test ${#failures[@]} -ne 0; then
+        exit 1
+      fi
       EOF
       chmod +x "$RUNNER_TEMP"/ci{,-entrypoint}.sh
11 changes: 11 additions & 0 deletions .pre-commit-config.yaml
@@ -43,6 +43,17 @@ repos:
     hooks:
       - id: ruff # linter
       - id: ruff-format # formatter
+
+  # TOML lint & format
+  - repo: https://github.com/ComPWA/taplo-pre-commit
+    rev: v0.9.3
+    hooks:
+      # See https://github.com/NVIDIA/cccl/issues/3426
+      # - id: taplo-lint
+      #   exclude: "^docs/"
+      - id: taplo-format
+        exclude: "^docs/"
+
   - repo: https://github.com/codespell-project/codespell
     rev: v2.3.0
     hooks:
106 changes: 0 additions & 106 deletions CMakePresets.json
@@ -73,8 +73,6 @@
         "CUB_ENABLE_DIALECT_CPP20": true,
         "THRUST_ENABLE_MULTICONFIG": true,
         "THRUST_MULTICONFIG_WORKLOAD": "LARGE",
-        "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP11": true,
-        "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP14": true,
         "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP17": true,
         "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP20": true,
         "THRUST_MULTICONFIG_ENABLE_SYSTEM_CPP": true,
@@ -128,28 +126,6 @@
         "LIBCUDACXX_ENABLE_LIBCUDACXX_TESTS": true
       }
     },
-    {
-      "name": "libcudacxx-cpp11",
-      "displayName": "libcu++: C++11",
-      "inherits": "libcudacxx-base",
-      "cacheVariables": {
-        "CMAKE_CXX_STANDARD": "11",
-        "CMAKE_CUDA_STANDARD": "11",
-        "LIBCUDACXX_TEST_STANDARD_VER": "c++11",
-        "CCCL_IGNORE_DEPRECATED_CPP_11": true
-      }
-    },
-    {
-      "name": "libcudacxx-cpp14",
-      "displayName": "libcu++: C++14",
-      "inherits": "libcudacxx-base",
-      "cacheVariables": {
-        "CMAKE_CXX_STANDARD": "14",
-        "CMAKE_CUDA_STANDARD": "14",
-        "LIBCUDACXX_TEST_STANDARD_VER": "c++14",
-        "CCCL_IGNORE_DEPRECATED_CPP_14": true
-      }
-    },
     {
       "name": "libcudacxx-cpp17",
       "displayName": "libcu++: C++17",
Expand Down Expand Up @@ -179,28 +155,6 @@
"CMAKE_CUDA_ARCHITECTURES": "70"
}
},
{
"name": "libcudacxx-nvrtc-cpp11",
"displayName": "libcu++ NVRTC: C++11",
"inherits": "libcudacxx-nvrtc-base",
"cacheVariables": {
"CMAKE_CXX_STANDARD": "11",
"CMAKE_CUDA_STANDARD": "11",
"LIBCUDACXX_TEST_STANDARD_VER": "c++11",
"CCCL_IGNORE_DEPRECATED_CPP_11": true
}
},
{
"name": "libcudacxx-nvrtc-cpp14",
"displayName": "libcu++ NVRTC: C++14",
"inherits": "libcudacxx-nvrtc-base",
"cacheVariables": {
"CMAKE_CXX_STANDARD": "14",
"CMAKE_CUDA_STANDARD": "14",
"LIBCUDACXX_TEST_STANDARD_VER": "c++14",
"CCCL_IGNORE_DEPRECATED_CPP_14": true
}
},
{
"name": "libcudacxx-nvrtc-cpp17",
"displayName": "libcu++ NVRTC: C++17",
@@ -261,8 +215,6 @@
         "THRUST_MULTICONFIG_ENABLE_SYSTEM_CUDA": true,
         "THRUST_MULTICONFIG_ENABLE_SYSTEM_OMP": true,
         "THRUST_MULTICONFIG_ENABLE_SYSTEM_TBB": true,
-        "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP11": false,
-        "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP14": false,
         "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP17": false,
         "THRUST_MULTICONFIG_ENABLE_DIALECT_CPP20": false
       }
@@ -420,22 +372,6 @@
         "libcudacxx.test.atomics.ptx"
       ]
     },
-    {
-      "name": "libcudacxx-nvrtc-cpp11",
-      "hidden": false,
-      "inherits": [
-        "libcudacxx-nvrtcc"
-      ],
-      "configurePreset": "libcudacxx-nvrtc-cpp11"
-    },
-    {
-      "name": "libcudacxx-nvrtc-cpp14",
-      "hidden": false,
-      "inherits": [
-        "libcudacxx-nvrtcc"
-      ],
-      "configurePreset": "libcudacxx-nvrtc-cpp14"
-    },
     {
       "name": "libcudacxx-nvrtc-cpp17",
       "hidden": false,
@@ -452,20 +388,6 @@
       ],
       "configurePreset": "libcudacxx-nvrtc-cpp20"
     },
-    {
-      "name": "libcudacxx-cpp11",
-      "configurePreset": "libcudacxx-cpp11",
-      "inherits": [
-        "libcudacxx-base"
-      ]
-    },
-    {
-      "name": "libcudacxx-cpp14",
-      "configurePreset": "libcudacxx-cpp14",
-      "inherits": [
-        "libcudacxx-base"
-      ]
-    },
     {
       "name": "libcudacxx-cpp17",
       "configurePreset": "libcudacxx-cpp17",
@@ -572,20 +494,6 @@
         "outputOnFailure": false
       }
     },
-    {
-      "name": "libcudacxx-lit-cpp11",
-      "configurePreset": "libcudacxx-cpp11",
-      "inherits": [
-        "libcudacxx-lit-base"
-      ]
-    },
-    {
-      "name": "libcudacxx-lit-cpp14",
-      "configurePreset": "libcudacxx-cpp14",
-      "inherits": [
-        "libcudacxx-lit-base"
-      ]
-    },
     {
       "name": "libcudacxx-lit-cpp17",
       "configurePreset": "libcudacxx-cpp17",
@@ -607,20 +515,6 @@
         "libcudacxx-lit-base"
       ]
     },
-    {
-      "name": "libcudacxx-nvrtc-cpp11",
-      "configurePreset": "libcudacxx-nvrtc-cpp11",
-      "inherits": [
-        "libcudacxx-nvrtc-base"
-      ]
-    },
-    {
-      "name": "libcudacxx-nvrtc-cpp14",
-      "configurePreset": "libcudacxx-nvrtc-cpp14",
-      "inherits": [
-        "libcudacxx-nvrtc-base"
-      ]
-    },
     {
       "name": "libcudacxx-nvrtc-cpp17",
       "configurePreset": "libcudacxx-nvrtc-cpp17",
4 changes: 2 additions & 2 deletions c/parallel/src/reduce.cu
@@ -160,7 +160,7 @@ std::string get_single_tile_kernel_name(
   check(nvrtcGetTypeName<op_wrapper>(&reduction_op_t));

   return std::format(
-    "cub::DeviceReduceSingleTileKernel<{0}, {1}, {2}, {3}, {4}, {5}, {6}>",
+    "cub::detail::reduce::DeviceReduceSingleTileKernel<{0}, {1}, {2}, {3}, {4}, {5}, {6}>",
     chained_policy_t,
     input_iterator_t,
     output_iterator_t,
@@ -192,7 +192,7 @@ std::string get_device_reduce_kernel_name(cccl_op_t op, cccl_iterator_t input_it
   check(nvrtcGetTypeName<cuda::std::__identity>(&transform_op_t));

   return std::format(
-    "cub::DeviceReduceKernel<{0}, {1}, {2}, {3}, {4}, {5}>",
+    "cub::detail::reduce::DeviceReduceKernel<{0}, {1}, {2}, {3}, {4}, {5}>",
     chained_policy_t,
     input_iterator_t,
     offset_t,
5 changes: 2 additions & 3 deletions c/parallel/test/test_main.cpp
@@ -12,8 +12,7 @@

 #include <iostream>

-#define CATCH_CONFIG_RUNNER
-#include <catch2/catch.hpp>
+#include <catch2/catch_session.hpp>

 int device_guard(int device_id)
 {
@@ -40,7 +39,7 @@ int main(int argc, char* argv[])
   int device_id{};

   // Build a new parser on top of Catch's
-  using namespace Catch::clara;
+  using namespace Catch::Clara;
   auto cli = session.cli() | Opt(device_id, "device")["-d"]["--device"]("device id to use");
   session.cli(cli);
4 changes: 3 additions & 1 deletion c/parallel/test/test_util.h
@@ -22,7 +22,9 @@
 #include <type_traits>
 #include <vector>

-#include <catch2/catch.hpp>
+#include <catch2/catch_template_test_macros.hpp>
+#include <catch2/catch_test_macros.hpp>
+#include <catch2/generators/catch_generators_all.hpp>
 #include <cccl/c/reduce.h>
 #include <nvrtc.h>
12 changes: 4 additions & 8 deletions c2h/include/c2h/catch2_main.h
@@ -36,13 +36,9 @@
 //! executable, this header is included into each test. On the other hand, when all the tests are compiled into a single
 //! executable, this header is excluded from the tests and included into catch2_runner.cpp

-#ifdef CUB_CONFIG_MAIN
-#  define CATCH_CONFIG_RUNNER
-#endif
-
-#include <catch2/catch.hpp>
+#include <catch2/catch_session.hpp>

-#if defined(CUB_CONFIG_MAIN)
+#ifdef CUB_CONFIG_MAIN
 # if THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA
 # include <c2h/catch2_runner_helper.h>

@@ -59,7 +55,7 @@ int main(int argc, char* argv[])
   int device_id{};

   // Build a new parser on top of Catch's
-  using namespace Catch::clara;
+  using namespace Catch::Clara;
   auto cli = session.cli() | Opt(device_id, "device")["-d"]["--device"]("device id to use");
   session.cli(cli);

@@ -73,4 +69,4 @@ int main(int argc, char* argv[])
 # endif // THRUST_DEVICE_SYSTEM == THRUST_DEVICE_SYSTEM_CUDA
   return session.run(argc, argv);
 }
-#endif
+#endif // CUB_CONFIG_MAIN
