|
| 1 | +- Feature Name: `repr_scalable` |
| 2 | +- Start Date: 2025-07-07 |
| 3 | +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) |
| 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Extends Rust's existing SIMD infrastructure, `#[repr(simd)]`, with a |
| 10 | +complementary scalable representation, `#[repr(scalable)]`, to support scalable |
| 11 | +vector types, such as Arm's Scalable Vector Extension (SVE), or RISC-V's Vector |
| 12 | +Extension (RVV). |
| 13 | + |
| 14 | +Like the existing `repr(simd)` representation, `repr(scalable)` is internal |
| 15 | +compiler infrastructure that will be used only in the standard library to |
| 16 | +introduce scalable vector types which can then be stabilised. Only the |
| 17 | +infrastructure to define these types is introduced in this RFC, not the types |
| 18 | +or intrinsics that use it. |
| 19 | + |
| 20 | +This RFC builds on Rust's existing SIMD infrastructure, introduced in |
| 21 | +[rfcs#1199: SIMD Infrastructure][rfcs#1199]. It depends on |
| 22 | +[rfcs#3729: Hierarchy of Sized traits][rfcs#3729]. |
| 23 | + |
| 24 | +SVE is used in examples throughout this RFC, but the proposed features should be |
| 25 | +sufficient to enable support for similar extensions in other architectures, such |
| 26 | +as RISC-V's V Extension. |
| 27 | + |
| 28 | +# Motivation |
| 29 | +[motivation]: #motivation |
| 30 | + |
| 31 | +SIMD types and instructions are a crucial element of high-performance Rust |
| 32 | +applications and allow for operating on multiple values in a single instruction. |
| 33 | +Many processors have SIMD registers of a known fixed length and provide |
| 34 | +intrinsics which operate on these registers. For example, Arm's Neon extension |
| 35 | +is well-supported by Rust and provides 128-bit registers and a wide range of |
| 36 | +intrinsics. |
| 37 | + |
| 38 | +Instead of releasing more extensions with ever increasing register bit widths, |
| 39 | +AArch64 has introduced a Scalable Vector Extension (SVE). Similarly, RISC-V has |
| 40 | +a Vector Extension (RVV). These extensions have vector registers whose width |
| 41 | +depends on the CPU implementation and bit-width-agnostic intrinsics for |
| 42 | +operating on these registers. By using scalable vectors, code won't need to be |
| 43 | +re-written using new architecture extensions with larger registers, new types |
| 44 | +and intrinsics, but instead will work on newer processors with different vector |
| 45 | +register lengths and performance characteristics. |
| 46 | + |
| 47 | +Scalable vectors have interesting and challenging implications for Rust, |
| 48 | +introducing value types with sizes that can only be known at runtime, requiring |
| 49 | +significant changes to the language's notion of sizedness - this support is |
| 50 | +being proposed in [rfcs#3729]. |
| 51 | + |
| 52 | +Hardware is generally available with SVE, and key Rust stakeholders want to be |
| 53 | +able to use these architecture features from Rust. In a [recent discussion on |
| 54 | +SVE, Amanieu, co-lead of the library team, said][quote_amanieu]: |
| 55 | + |
| 56 | +> I've talked with several people in Google, Huawei and Microsoft, all of whom |
| 57 | +> have expressed a rather urgent desire for the ability to use SVE intrinsics in |
| 58 | +> Rust code, especially now that SVE hardware is generally available. |
| 59 | + |
| 60 | +Without support in the compiler, leveraging the |
| 61 | +[*Hierarchy of Sized traits*][rfcs#3729] proposal, it is not possible to |
| 62 | +introduce intrinsics and types exposing the scalable vector support in hardware. |
| 63 | + |
| 64 | +# Guide-level explanation |
| 65 | +[guide-level-explanation]: #guide-level-explanation |
| 66 | + |
| 67 | +None of the infrastructure proposed in this RFC is intended to be used directly |
| 68 | +by Rust users. |
| 69 | + |
| 70 | +`repr(scalable)` as described later in |
| 71 | +[*Reference-level explanation*][reference-level-explanation] is perma-unstable |
| 72 | +and exists only to enable scalable vector types to be defined in the standard |
| 73 | +library. The specific vector types are intended to eventually be stabilised, but |
| 74 | +none are proposed in this RFC. |
| 75 | + |
| 76 | +## Using scalable vectors |
| 77 | +[using-scalable-vectors]: #using-scalable-vectors |
| 78 | + |
| 79 | +Scalable vector types correspond to vector registers in hardware whose size is |
| 80 | +unknown at compile time. However, the size is known and fixed at runtime. |
| 81 | +Additional properties could be known during compilation, depending on the |
| 82 | +architecture, such as a minimum or maximum size or that the size must be a |
| 83 | +multiple of some factor. |
| 84 | + |
| 85 | +As previously described, users will not define their own scalable vector types |
| 86 | +and instead use intrinsics from `std::arch`, and this RFC is not proposing any |
| 87 | +such intrinsics, just the infrastructure. However, to illustrate how the types |
| 88 | +and intrinsics that this infrastructure will enable can be used, consider the |
| 89 | +following example that sums two input vectors: |
| 90 | + |
| 91 | +```rust |
| 92 | +fn sve_add(in_a: Vec<f32>, in_b: Vec<f32>, out_c: &mut Vec<f32>) { |
| 93 | +    let len = in_a.len(); |
| 94 | +    unsafe { |
| 95 | +        // `svcntw` returns the actual number of elements that are in a 32-bit |
| 96 | +        // element vector |
| 97 | +        let step = svcntw() as usize; |
| 98 | +        for i in (0..len).step_by(step) { |
| 99 | +            let a = in_a.as_ptr().add(i); |
| 100 | +            let b = in_b.as_ptr().add(i); |
| 101 | +            // `out_c` must already hold at least `len` elements |
| 102 | +            let c = out_c.as_mut_ptr().add(i); |
| 103 | + |
| 104 | +            // `svwhilelt_b32` generates a mask based on comparing the current |
| 105 | +            // index against the `len` |
| 106 | +            let pred = svwhilelt_b32(i as _, len as _); |
| 107 | + |
| 108 | +            // `svld1_f32` loads a vector register with the data from address |
| 109 | +            // `a`, zeroing any elements in the vector that are masked out |
| 110 | +            let sva = svld1_f32(pred, a); |
| 111 | +            let svb = svld1_f32(pred, b); |
| 112 | + |
| 113 | +            // `svadd_f32_m` adds `a` and `b`, any lanes that are masked out will |
| 114 | +            // keep the value of `a` |
| 115 | +            let svc = svadd_f32_m(pred, sva, svb); |
| 116 | + |
| 117 | +            // `svst1_f32` will store the result without accessing any memory |
| 118 | +            // locations that are masked out |
| 119 | +            svst1_f32(svc, pred, c); |
| 120 | +        } |
| 121 | +    } |
| 122 | +} |
| 123 | +``` |
| 124 | + |
| 125 | +From a user's perspective, writing code for scalable vectors isn't too different |
| 126 | +from writing code for fixed-size vectors. |
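
To make the role of the predicate concrete, the loop above can be modelled in plain scalar Rust. This is only a sketch and not part of the proposal: `add_predicated` and its `step` parameter are hypothetical names, with `step` standing in for the runtime lane count that `svcntw` would return, and the inner `active` test standing in for the mask that `svwhilelt_b32` would generate.

```rust
/// Scalar stand-in for the predicated loop. `step` is a hypothetical
/// parameter playing the role of the runtime lane count.
fn add_predicated(in_a: &[f32], in_b: &[f32], out_c: &mut [f32], step: usize) {
    assert!(step > 0, "a vector always has at least one lane");
    let len = in_a.len();
    assert!(in_b.len() >= len && out_c.len() >= len);

    for i in (0..len).step_by(step) {
        for lane in 0..step {
            // Model of the predicate: lane `lane` is active only while
            // `i + lane < len`, so the final iteration is partially masked.
            let active = i + lane < len;
            if active {
                // Active lanes load, add and store; inactive lanes touch
                // no memory at all.
                out_c[i + lane] = in_a[i + lane] + in_b[i + lane];
            }
        }
    }
}
```

The result is the same for every value of `step`, which is the property that lets the same code run on hardware with different vector lengths.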
| 127 | + |
| 128 | +# Reference-level explanation |
| 129 | +[reference-level-explanation]: #reference-level-explanation |
| 130 | + |
| 131 | +Types annotated with the `#[repr(simd)]` attribute contain either an array |
| 132 | +field or multiple fields to indicate the intended size of the SIMD vector that |
| 133 | +the type represents. |
| 134 | + |
| 135 | +Similarly, a `scalable` repr is introduced to define a scalable vector type. |
| 136 | +`scalable` accepts an integer to determine the minimum number of elements the |
| 137 | +vector contains. For example: |
| 138 | + |
| 139 | +```rust |
| 140 | +#[repr(simd, scalable(4))] |
| 141 | +pub struct svfloat32_t { _ty: [f32], } |
| 142 | +``` |
| 143 | + |
| 144 | +As with the existing `repr(simd)`, `_ty` is purely a type marker, used to get |
| 145 | +the element type for the codegen backend. |
| 146 | + |
| 147 | +## Properties of scalable vectors |
| 148 | +[properties-of-scalable-vector-types]: #properties-of-scalable-vectors |
| 149 | + |
| 150 | +Scalable vectors are necessarily non-`const Sized` (from [rfcs#3729]) as they |
| 151 | +behave like value types but the exact size cannot be known at compilation time. |
| 152 | + |
| 153 | +[rfcs#3729] allows these types to implement `Clone` (and consequently `Copy`) as |
| 154 | +`Clone` only requires an implementation of `Sized`, irrespective of constness. |
| 155 | + |
| 156 | +Scalable vector types have some further restrictions due to limitations of the |
| 157 | +codegen backend: |
| 158 | + |
| 159 | +- Cannot be stored in compound types (structs, enums, etc.) |
| 160 | +  - Including coroutines, so these types cannot be held across |
| 161 | +    an await boundary in async functions |
| 162 | +- Cannot be used in arrays |
| 163 | +- Cannot be the type of a static variable |
| 164 | + |
| 165 | +Some of these limitations may be lifted in the future, depending on what |
| 166 | +is supported by rustc's codegen backends. |
| 167 | + |
| 168 | +## ABI |
| 169 | +[abi]: #abi |
| 170 | + |
| 171 | +Rust currently always passes SIMD vectors on the stack to avoid ABI mismatches |
| 172 | +between functions annotated with `target_feature` - where the relevant vector |
| 173 | +register is guaranteed to be present - and those without - where the relevant |
| 174 | +vector register might not be present. |
| 175 | + |
| 176 | +However, this approach will not work for scalable vector types as the relevant |
| 177 | +target feature must be present to use the instruction that can allocate the |
| 178 | +correct size on the stack for the scalable vector. |
| 179 | + |
| 180 | +Therefore, there is an additional restriction that these types cannot be used in |
| 181 | +the argument or return types of functions unless those functions are annotated |
| 182 | +with the relevant target feature. |
| 183 | + |
| 184 | +## Target features |
| 185 | +[target-features]: #target-features |
| 186 | + |
| 187 | +Similarly to the issues with the ABI of scalable vectors, without the relevant |
| 188 | +target features, few operations can actually be performed on scalable vectors - |
| 189 | +causing issues for the use of scalable vectors in generic code and with traits. |
| 190 | +For example, implementations of traits like `Clone` would not be able to |
| 191 | +actually perform a clone, and generic functions that are instantiated with |
| 192 | +scalable vectors would fail during instruction selection in the codegen backend. |
| 193 | + |
| 194 | +When a scalable vector is instantiated into a generic function during |
| 195 | +monomorphisation, or a trait method is being implemented for a scalable vector, |
| 196 | +then the relevant target feature will be added to the function. |
| 197 | + |
| 198 | +For example, when instantiating `std::mem::size_of_val` with a scalable vector |
| 199 | +during monomorphisation, the relevant target feature will be added to `size_of_val` |
| 200 | +for codegen. |
| 201 | + |
| 202 | +## Implementing `repr(scalable)` |
| 203 | +[implementing-repr-scalable]: #implementing-reprscalable |
| 204 | + |
| 205 | +Implementing `repr(scalable)` largely involves lowering scalable vectors to the |
| 206 | +appropriate type in the codegen backend. LLVM has robust support for scalable |
| 207 | +vectors and is the default backend, so this section will focus on implementation |
| 208 | +in the LLVM codegen backend. Other codegen backends can implement support when |
| 209 | +scalable vectors are supported by the backend. |
| 210 | + |
| 211 | +Most of the complexity of SVE is handled by LLVM: lowering Rust's scalable |
| 212 | +vectors to the correct type in LLVM and the `vscale` modifier that is applied to |
| 213 | +LLVM's vector types. |
| 214 | + |
| 215 | +LLVM's scalable vector type is of the form `<vscale x elements x type>`: |
| 216 | + |
| 217 | +- `elements` multiplied by `size_of::<$ty>` gives the smallest allowed register |
| 218 | + size and the increment size |
| 219 | +- `vscale` is a runtime constant that is used to determine the actual vector |
| 220 | + register size |
| 221 | + |
| 222 | +For example, with SVE, the scalable vector register (`Z` register) size has to |
| 223 | +be a multiple of 128 bits and a power of 2. Only the value of `elements` can be |
| 224 | +chosen by the compiler. For `f32`, `elements` must always be four, as with the |
| 225 | +minimum `vscale` of one, `1 * 4 * sizeof(f32)` is the 128-bit minimum register |
| 226 | +size. |
| 227 | + |
| 228 | +At runtime `vscale` could then be any power of two, which would result in |
| 229 | +register sizes of 128, 256, 512, 1024 and 2048 bits. `vscale` could be any value |
| 230 | +provided it gives a legal vector register size for the architecture. |
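
The arithmetic above can be checked with a short sketch. The names `MIN_REGISTER_BITS`, `min_elements` and `register_bits` are hypothetical helpers introduced for illustration only; they are not part of rustc or of this proposal, and the 128-bit constant is the SVE minimum quoted above.

```rust
/// Minimum SVE vector register size in bits, as quoted in the text.
/// Hypothetical constant, for illustration only.
const MIN_REGISTER_BITS: usize = 128;

/// `elements` for element type `T`: the lane count that exactly fills the
/// minimum register when `vscale` is one.
fn min_elements<T>() -> usize {
    MIN_REGISTER_BITS / (std::mem::size_of::<T>() * 8)
}

/// Register size in bits of `<vscale x elements x T>` for a given runtime
/// `vscale`.
fn register_bits<T>(vscale: usize) -> usize {
    vscale * min_elements::<T>() * std::mem::size_of::<T>() * 8
}
```

For `f32` this gives four elements, and `vscale` values of 1, 2, 4, 8 and 16 give the 128 to 2048-bit register sizes listed above.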
| 231 | + |
| 232 | +`repr(scalable)` expects the number of `elements` to be provided rather than |
| 233 | +calculating it. This avoids needing to teach the compiler how to calculate the |
| 234 | +required `elements` count, particularly as some of these scalable types can have |
| 235 | +different element counts. For instance, the predicates used in SVE have |
| 236 | +different element counts depending on the types they are a predicate for. |
| 237 | + |
| 238 | +While it is possible to change the vector length at runtime using a |
| 239 | +[`prctl()`][prctl] call to the kernel, this would require that `vscale` change, |
| 240 | +which is unsupported. As Rust cannot prevent users from doing this, it will be |
| 241 | +documented as undefined behaviour, consistent with C and C++. |
| 242 | + |
| 243 | +# Drawbacks |
| 244 | +[drawbacks]: #drawbacks |
| 245 | + |
| 246 | +- `repr(scalable)` inherently adds complexity to the language, despite |
| 247 | + being largely hidden from users. |
| 248 | + |
| 249 | +# Rationale and alternatives |
| 250 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 251 | + |
| 252 | +Without support for scalable vectors in the language and compiler, it is not |
| 253 | +possible to leverage hardware with scalable vectors from Rust. As extensions |
| 254 | +with scalable vectors are available in architectures as either the only or |
| 255 | +recommended way to do SIMD, lack of support in Rust would severely limit Rust's |
| 256 | +suitability on these architectures compared to other systems programming |
| 257 | +languages. |
| 258 | + |
| 259 | +By aligning with the approach taken by C (discussed in |
| 260 | +[*Prior art*][prior-art] below), most of the documentation that already exists |
| 261 | +for scalable vector intrinsics in C should still be applicable to Rust. |
| 262 | + |
| 263 | +# Prior art |
| 264 | +[prior-art]: #prior-art |
| 265 | + |
| 266 | +There are not many languages with support for scalable vectors: |
| 267 | + |
| 268 | +- SVE in C takes a similar approach to this proposal by using sizeless |
| 269 | + incomplete types to represent scalable vectors. However, sizeless types are |
| 270 | + not part of the C specification and Arm's C Language Extensions (ACLE) provide |
| 271 | + [an edit to the C standard][acle_sizeless] which formally defines "sizeless |
| 272 | + types". |
| 273 | +- [.NET 9 has experimental support for SVE][dotnet], but as a managed language, |
| 274 | + the design and implementation considerations in .NET are quite different to |
| 275 | + Rust. |
| 276 | + |
| 277 | +[rfcs#3268] was a previous iteration of this RFC. |
| 278 | + |
| 279 | +# Unresolved questions |
| 280 | +[unresolved-questions]: #unresolved-questions |
| 281 | + |
| 282 | +There are currently no unresolved questions. |
| 283 | + |
| 284 | +# Future possibilities |
| 285 | +[future-possibilities]: #future-possibilities |
| 286 | + |
| 287 | +There are a handful of future possibilities enabled by this RFC: |
| 288 | + |
| 289 | +## General mechanism for target-feature-affected types |
| 290 | +[general-mechanism-target-feature-types]: #general-mechanism-for-target-feature-affected-types |
| 291 | + |
| 292 | +A more general mechanism for enforcing that SIMD types are only used in |
| 293 | +`target_feature`-annotated functions would be useful, as this would enable SVE |
| 294 | +types to have fewer distinct restrictions than other SIMD types, and would |
| 295 | +enable SIMD vectors to be passed by-register, a performance improvement. |
| 296 | + |
| 297 | +Such a mechanism would need to be introduced gradually to existing SIMD types with |
| 298 | +a forward compatibility lint. This will be addressed in a forthcoming RFC. |
| 299 | + |
| 300 | +## Relaxed restrictions |
| 301 | +[relaxed-restrictions]: #relaxed-restrictions |
| 302 | + |
| 303 | +Some of the restrictions on these types (e.g. use in compound types) could be |
| 304 | +relaxed at a later time either by extending rustc's codegen or leveraging newly |
| 305 | +added support in LLVM. |
| 306 | + |
| 307 | +However, as C also has these restrictions and scalable vectors are nevertheless used in |
| 308 | +production code, it is unlikely there will be much demand for those restrictions |
| 309 | +to be relaxed. |
| 310 | + |
| 311 | +## Portable SIMD |
| 312 | +[portable-simd]: #portable-simd |
| 313 | + |
| 314 | +Given that there are significant differences between scalable vectors and |
| 315 | +fixed-length vectors, and that `std::simd` is unstable, it is worth |
| 316 | +experimenting with architecture-specific support and implementation initially. |
| 317 | +Later, there are a variety of approaches that could be taken to incorporate |
| 318 | +support for scalable vectors into Portable SIMD. |
| 319 | + |
| 320 | +[acle_sizeless]: https://arm-software.github.io/acle/main/acle.html#formal-definition-of-sizeless-types |
| 321 | +[dotnet]: https://github.com/dotnet/runtime/issues/93095 |
| 322 | +[prctl]: https://www.kernel.org/doc/Documentation/arm64/sve.txt |
| 323 | +[rfcs#1199]: https://rust-lang.github.io/rfcs/1199-simd-infrastructure.html |
| 324 | +[rfcs#3268]: https://github.com/rust-lang/rfcs/pull/3268 |
| 325 | +[rfcs#3729]: https://github.com/rust-lang/rfcs/pull/3729 |
| 326 | +[quote_amanieu]: https://github.com/rust-lang/rust/pull/118917#issuecomment-2202256754 |