Commit a7f4da6
[Pytorch] Improve conversion to bfloat16 on aarch64/NEON (pytorch#166958)
Summary:
Autovectorization of casts to bfloat16_t is broken in clang-17 through clang-20 and fixed in clang-21.
We add a vectorized workaround, which speeds up conversion from smaller integer data types.
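For reference, the scalar semantics that a vectorized conversion has to reproduce can be sketched as below. This is a portable illustration only, not the NEON code added by this commit: the names `float_to_bf16_bits` and `convert_to_bf16` are hypothetical, and widening through float with round-to-nearest-even is an assumption about the intended rounding behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the commit): round a finite float to
// bfloat16 bits with round-to-nearest-even. NaN is not handled here,
// because integer inputs can never produce one.
inline uint16_t float_to_bf16_bits(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // bit-exact reinterpretation
  // Add 0x7fff, plus 1 when the lowest kept bit is odd, so ties go to even.
  const uint32_t rounding_bias = 0x7fffu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// Hypothetical scalar loop: widen each integer to float, then narrow to
// bfloat16. A vectorized version would process several lanes per iteration.
template <typename Int>
void convert_to_bf16(const Int* src, uint16_t* dst, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    dst[i] = float_to_bf16_bits(static_cast<float>(src[i]));
  }
}
```

With this sketch, 255 converts exactly (0x437F), while 257 is a tie between 256 and 258 and rounds to the even mantissa, 256 (0x4380).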
We observed the following performance improvements when compiling with clang-19 and targeting armv9a+sve2:
before:
uint8->bfloat16_t ===> 319.433us
int8->bfloat16_t ===> 320.216us
int16->bfloat16_t ===> 326.899us
int32->bfloat16_t ===> 327.925us
after:
uint8->bfloat16_t ===> 185.189us -----> 72% higher throughput
int8->bfloat16_t ===> 169.790us -----> 89% higher throughput
int16->bfloat16_t ===> 180.744us -----> 81% higher throughput
int32->bfloat16_t ===> 185.129us -----> 77% higher throughput
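The throughput percentages above follow from the timings, assuming the same amount of work in both runs: gain = before / after - 1. A minimal sketch of that arithmetic (the helper name `throughput_gain_percent` is hypothetical):

```cpp
#include <cmath>

// Hypothetical helper: percent throughput gain for a fixed workload,
// given the run time before and after the change (same units).
inline long throughput_gain_percent(double before_us, double after_us) {
  // Throughput is inversely proportional to time, so the ratio of
  // throughputs is before / after.
  return std::lround((before_us / after_us - 1.0) * 100.0);
}
```

For example, `throughput_gain_percent(319.433, 185.189)` yields 72, matching the uint8 row.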
Test Plan:
Correctness:
buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch
Performance:
buck2 run mode/opt //caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test
Differential Revision: D86207189
Pull Request resolved: pytorch#166958
Approved by: https://github.com/mcfi1
Parent: 7e98c4c
1 file changed: 56 additions, 0 deletions (new lines 226-281)