|
| 1 | +# Thrust 1.10.0 (NVIDIA HPC SDK 20.9) |
| 2 | + |
| 3 | +## Summary |
| 4 | + |
| 5 | +Thrust 1.10.0 is the major release accompanying the NVIDIA HPC SDK 20.9 release. |
| 6 | +It drops support for C++03, GCC < 5, Clang < 6, and MSVC < 2017. |
| 7 | +It also overhauls CMake support. |
| 8 | +Finally, we now have a Code of Conduct for contributors: |
| 9 | +https://github.com/thrust/thrust/blob/main/CODE_OF_CONDUCT.md |
| 10 | + |
| 11 | +## Breaking Changes |
| 12 | + |
| 13 | +- C++03 is no longer supported. |
| 14 | +- GCC < 5, Clang < 6, and MSVC < 2017 are no longer supported. |
| 15 | +- C++11 is deprecated. |
| 16 | + Using this dialect will generate a compile-time warning. |
| 17 | + These warnings can be suppressed by defining |
| 18 | + `THRUST_IGNORE_DEPRECATED_CPP_DIALECT` or `THRUST_IGNORE_DEPRECATED_CPP_11`. |
| 19 | + Suppression is only a short-term solution. |
| 20 | + We will be dropping support for C++11 in the near future. |
| 21 | +- Asynchronous algorithms now require C++14. |
| 22 | +- CMake < 3.15 is no longer supported. |
| 23 | +- The default branch on GitHub is now called `main`. |
| 24 | +- Allocator and vector classes have been replaced with alias templates. |
| 25 | + |
| 26 | +## New Features |
| 27 | + |
| 28 | +- Contributor documentation: https://github.com/thrust/thrust/blob/main/CONTRIBUTING.md |
| 29 | +- thrust/thrust#1159: CMake multi-config support, which allows multiple |
| 30 | + combinations of host and device systems to be built and tested at once. |
| 31 | + More details can be found here: https://github.com/thrust/thrust/blob/main/CONTRIBUTING.md#multi-config-cmake-options |
| 32 | +- CMake refactoring: |
| 33 | + - Added install targets to CMake builds. |
| 34 | + - Added support for CUB tests and examples. |
| 35 | + - Thrust can be added to another CMake project by calling `add_subdirectory` |
| 36 | + with the Thrust source root (see thrust/thrust#976). |
| 37 | + An example can be found here: |
| 38 | + https://github.com/thrust/thrust/blob/main/examples/cmake/add_subdir/CMakeLists.txt |
| 39 | + - CMake < 3.15 is no longer supported. |
| 40 | + - Dialects are now configured through target properties. |
| 41 | + A new `THRUST_CPP_DIALECT` option has been added for single config mode. |
| 42 | + Logic that modified `CMAKE_CXX_STANDARD` and `CMAKE_CUDA_STANDARD` has been |
| 43 | + eliminated. |
| 44 | + - Testing related CMake code has been moved to `testing/CMakeLists.txt`. |
| 45 | + - Example related CMake code has been moved to `examples/CMakeLists.txt`. |
| 46 | + - Header testing related CMake code has been moved to `cmake/ThrustHeaderTesting.cmake`. |
| 47 | + - CUDA configuration CMake code has been moved to `cmake/ThrustCUDAConfig.cmake`. |
| 48 | + - We now explicitly `include(cmake/*.cmake)` files rather than searching |
| 49 | + `CMAKE_MODULE_PATH`, so that only the modules in the repo are used. |
| 50 | +- `thrust::transform_input_output_iterator`, a variant of transform iterator |
| 51 | + adapter that works as both an input iterator and an output iterator. |
| 52 | + The given input function is applied after reading from the wrapped iterator |
| 53 | + while the output function is applied before writing to the wrapped iterator. |
| 54 | + Thanks to Trevor Smith for this contribution. |
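
The read and write behavior described for `thrust::transform_input_output_iterator` can be pictured with a small stand-in. This is only a sketch in plain C++, not Thrust's implementation: `io_element_sketch`, `times_two`, `plus_one`, `read_through`, and `write_through` are hypothetical names introduced for illustration, and the sketch assumes both functions map a value type to itself.

```cpp
// Hypothetical stand-in for one dereferenced element of such an iterator.
// Not Thrust code; it only mirrors the read/write order described above.
template <class T, class InputFunction, class OutputFunction>
struct io_element_sketch {
    T* slot;                         // element of the wrapped range
    InputFunction input_function;    // applied after reading
    OutputFunction output_function;  // applied before writing

    // Reading: fetch the stored value, then apply the input function.
    T read() const { return input_function(*slot); }

    // Writing: apply the output function, then store the result.
    void write(const T& value) { *slot = output_function(value); }
};

// Example functions (assumptions chosen for illustration only).
inline int times_two(int x) { return x * 2; }
inline int plus_one(int x) { return x + 1; }

// Reads `stored` through the sketch; the caller sees the transformed value.
inline int read_through(int stored) {
    io_element_sketch<int, int (*)(int), int (*)(int)> e{&stored, times_two, plus_one};
    return e.read();
}

// Writes `value` through the sketch and returns what ends up in storage.
inline int write_through(int value) {
    int stored = 0;
    io_element_sketch<int, int (*)(int), int (*)(int)> e{&stored, times_two, plus_one};
    e.write(value);
    return stored;
}
```

With these example functions, reading a stored `5` yields `10`, and writing `5` stores `6`.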
| 55 | + |
| 56 | +## Other Enhancements |
| 57 | + |
| 58 | +- Support for all combinations of host and device systems. |
| 59 | +- C++17 support. |
| 60 | +- thrust/thrust#1221: Allocator and vector classes have been replaced with |
| 61 | + alias templates. |
| 62 | + Thanks to Michael Francis for this contribution. |
| 63 | +- thrust/thrust#1186: Use placeholder expressions to simplify the definitions |
| 64 | + of a number of algorithms. |
| 65 | + Thanks to Michael Francis for this contribution. |
| 66 | +- thrust/thrust#1170: More conforming semantics for scan algorithms: |
| 67 | + - Follow P0571's guidance regarding intermediate types. |
| 68 | + - https://wg21.link/P0571 |
| 69 | + - The accumulator's type is now: |
| 70 | + - The type of the user-supplied initial value (if provided), or |
| 71 | + - The input iterator's value type if no initial value. |
| 72 | + - Follow C++ standard guidance for default binary operator type. |
| 73 | + - https://eel.is/c++draft/exclusive.scan#1 |
| 74 | + - Thrust binary/unary functors now specialize a default void template |
| 75 | + parameter. |
| 76 | + Types are deduced and forwarded transparently. |
| 77 | + - Updated the scan's default binary operator to the new `thrust::plus<>` |
| 78 | + specialization. |
| 79 | + - The `thrust::intermediate_type_from_function_and_iterators` helper is no |
| 80 | + longer needed and has been removed. |
| 81 | +- thrust/thrust#1255: Always use `cudaStreamSynchronize` instead of |
| 82 | + `cudaDeviceSynchronize` if the execution policy has a stream attached to it. |
| 83 | + Thanks to Rong Ou for this contribution. |
| 84 | +- thrust/thrust#1201: Tests for correct handling of legacy and per-thread |
| 85 | + default streams. |
| 86 | + Thanks to Rong Ou for this contribution. |
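
The accumulator rule from the thrust/thrust#1170 entry above matters most when the initial value and the input elements have different types. The following is a minimal sketch in plain C++, not Thrust code: `exclusive_scan_sketch`, `scan_with_int_init`, and `scan_with_double_init` are hypothetical helpers that only mimic the rule "the accumulator has the type of the user-supplied initial value".

```cpp
#include <vector>

// Hypothetical helper: exclusive scan whose accumulator type is the type of
// the initial value, following the guidance in P0571.
template <class InputIt, class OutputIt, class T>
OutputIt exclusive_scan_sketch(InputIt first, InputIt last, OutputIt out, T init) {
    T acc = init;
    for (; first != last; ++first, ++out) {
        // The sum is converted back to T on every step.
        T next = static_cast<T>(acc + *first);
        *out = acc;
        acc = next;
    }
    return out;
}

// int initial value: the accumulator is int, so fractional parts are dropped.
inline std::vector<double> scan_with_int_init(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    exclusive_scan_sketch(in.begin(), in.end(), out.begin(), 0);
    return out;
}

// double initial value: the accumulator is double, so nothing is lost.
inline std::vector<double> scan_with_double_init(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    exclusive_scan_sketch(in.begin(), in.end(), out.begin(), 0.0);
    return out;
}
```

For the input `{0.5, 0.5, 0.5, 0.5}`, the `int` version produces all zeros while the `double` version produces `{0.0, 0.5, 1.0, 1.5}`, so the literal passed as the initial value should be chosen with the intended accumulator type in mind.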
| 87 | + |
| 88 | +## Bug Fixes |
| 89 | + |
| 90 | +- thrust/thrust#1260: Fix `thrust::transform_inclusive_scan` with heterogeneous |
| 91 | + types. |
| 92 | + Thanks to Rong Ou for this contribution. |
| 93 | +- thrust/thrust#1258, NVC++ FS #28463: Ensure the CUDA radix sort backend |
| 94 | + synchronizes before returning; otherwise, copies from temporary storage will |
| 95 | + race with destruction of said temporary storage. |
| 96 | +- thrust/thrust#1264: Evaluate `CUDA_CUB_RET_IF_FAIL` macro argument only once. |
| 97 | + Thanks to Jason Lowe for this contribution. |
| 98 | +- thrust/thrust#1262: Add missing `<stdexcept>` header. |
| 99 | +- thrust/thrust#1250: Restore some `THRUST_DECLTYPE_RETURNS` macros in async |
| 100 | + test implementations. |
| 101 | +- thrust/thrust#1249: Use `std::iota` in `CUDATestDriver::target_devices`. |
| 102 | + Thanks to Michael Francis for this contribution. |
| 103 | +- thrust/thrust#1244: Check for macro collisions with system headers during |
| 104 | + header testing. |
| 105 | +- thrust/thrust#1224: Remove unnecessary SFINAE contexts from asynchronous |
| 106 | + algorithms. |
| 107 | +- thrust/thrust#1190: Make `out_of_memory_recovery` test trigger faster. |
| 108 | +- thrust/thrust#1187: Eliminate superfluous iterators specific to the CUDA |
| 109 | + backend. |
| 110 | +- thrust/thrust#1181: Various fixes for GoUDA. |
| 111 | + Thanks to Andrei Tchouprakov for this contribution. |
| 112 | +- thrust/thrust#1178, thrust/thrust#1229: Use transparent functionals in |
| 113 | + placeholder expressions, fixing issues with `thrust::device_reference` and |
| 114 | + placeholder expressions and `thrust::find` with asymmetric equality |
| 115 | + operators. |
| 116 | +- thrust/thrust#1153: Switch to placement new instead of assignment to |
| 117 | + construct items in uninitialized memory. |
| 118 | + Thanks to Hugh Winkler for this contribution. |
| 119 | +- thrust/thrust#1050: Fix compilation of asynchronous algorithms when RDC is |
| 120 | + enabled. |
| 121 | +- thrust/thrust#1042: Correct return type of |
| 122 | + `thrust::detail::predicate_to_integral` from `bool` to `IntegralType`. |
| 123 | + Thanks to Andreas Hehn for this contribution. |
| 124 | +- thrust/thrust#1009: Avoid returning uninitialized allocators. |
| 125 | + Thanks to Zhihao Yuan for this contribution. |
| 126 | +- thrust/thrust#990: Add missing `<thrust/system/cuda/memory.h>` include to |
| 127 | + `<thrust/system/cuda/detail/malloc_and_free.h>`. |
| 128 | + Thanks to Robert Maynard for this contribution. |
| 129 | +- thrust/thrust#966: Fix spurious MSVC conversion with loss of data warning in |
| 130 | + sort algorithms. |
| 131 | + Thanks to Zhihao Yuan for this contribution. |
| 132 | +- Add more metadata to mock specializations for testing iterator in |
| 133 | + `testing/copy.cu`. |
| 134 | +- Add missing include to shuffle unit test. |
| 135 | +- Specialize `thrust::wrapped_function` for `void` return types because MSVC is |
| 136 | + not a fan of the pattern `return static_cast<void>(expr);`. |
| 137 | +- Replace deprecated `tbb/tbb_thread.h` with `<thread>`. |
| 138 | +- Fix overcounting of initial value in TBB scans. |
| 139 | +- Use `thrust::advance` instead of `+=` for generic iterators. |
| 140 | +- Wrap the OMP flags in `-Xcompiler` for NVCC. |
| 141 | +- Extend `ASSERT_STATIC_ASSERT` skip for the OMP backend. |
| 142 | +- Add missing header caught by `tbb.cuda` configs. |
| 143 | +- Fix "unsafe API" warnings in examples on MSVC: `s/fopen/fstream/`. |
| 144 | +- Various C++17 fixes. |
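
The thrust/thrust#1153 entry above turns on a general C++ rule: assignment needs an already constructed object on the left-hand side, whereas uninitialized memory holds no object yet. A minimal sketch of the distinction in plain C++ follows. It is not the Thrust patch itself, and `construct_copies`, `destroy_range`, and `round_trip` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Constructs `count` copies of `value` in raw storage starting at `first`.
// Writing `first[i] = value` here would be undefined behavior for types such
// as std::string, because no object exists yet in that storage.
template <class T>
void construct_copies(T* first, std::size_t count, const T& value) {
    for (std::size_t i = 0; i < count; ++i) {
        ::new (static_cast<void*>(first + i)) T(value);  // placement new
    }
}

// Ends the lifetime of the objects created by construct_copies.
template <class T>
void destroy_range(T* first, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        first[i].~T();
    }
}

// Allocates raw storage, constructs into it, reads it back, and cleans up.
// Exception safety is ignored to keep the sketch short.
inline std::string round_trip(std::size_t count, const std::string& value) {
    std::allocator<std::string> alloc;
    std::string* storage = alloc.allocate(count);  // raw, uninitialized memory
    construct_copies(storage, count, value);

    std::string joined;
    for (std::size_t i = 0; i < count; ++i) {
        joined += storage[i];
    }

    destroy_range(storage, count);
    alloc.deallocate(storage, count);
    return joined;
}
```

Calling `round_trip(3, "ab")` returns `"ababab"`.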
| 145 | + |
1 | 146 | # Thrust 1.9.10-1 (NVIDIA HPC SDK 20.7, CUDA Toolkit 11.1) |
2 | 147 |
|
3 | 148 | ## Summary |
@@ -1076,11 +1221,14 @@ Support for TBB allows Thrust programs to integrate more naturally into |
1076 | 1221 | - `set_operations` |
1077 | 1222 |
|
1078 | 1223 | ## Other Enhancements |
1079 | | -- thrust::for_each now returns the end of the input range similar to most other algorithms |
1080 | | -- thrust::pair and thrust::tuple have swap functionality |
1081 | | -- All CUDA algorithms now support large data types |
1082 | | -- Iterators may be dereferenced in user __device__ or __global__ functions |
1083 | | -- The safe use of different backend systems is now possible within a single binary |
| 1224 | + |
| 1225 | +- `thrust::for_each` now returns the end of the input range similar to most |
| 1226 | + other algorithms. |
| 1227 | +- `thrust::pair` and `thrust::tuple` have swap functionality. |
| 1228 | +- All CUDA algorithms now support large data types. |
| 1229 | +- Iterators may be dereferenced in user `__device__` or `__global__` functions. |
| 1230 | +- The safe use of different backend systems is now possible within a single |
| 1231 | + binary.
1084 | 1232 |
|
1085 | 1233 | ## Bug Fixes |
1086 | 1234 |
|