Commit 101514f
Implement custom scan group function for generic binary operator
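The commit title refers to a custom work-group scan parameterized by a generic binary operator. The actual kernel lives in the C++/SYCL headers this commit touches, and its code is not reproduced on this page. The following is only a plain-Python sketch of the general technique, assuming a Hillis-Steele (log-step) inclusive scan inside each simulated work-group plus a sequential carry between groups. The names `group_inclusive_scan`, `blocked_inclusive_scan` and `logaddexp` are hypothetical and do not come from dpctl.

```python
import math
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b)), the operator behind cumulative_logsumexp."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log1p(math.exp(-abs(a - b)))


def group_inclusive_scan(vals: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele inclusive scan over one simulated work-group.

    Each pass models every work-item reading the state left by the previous
    pass (as if after a barrier), then work-item i combining the value
    `offset` lanes below it with its own. The earlier element is always the
    left operand, so `op` only needs to be associative, not commutative.
    """
    cur = list(vals)
    offset = 1
    while offset < len(cur):
        prev = cur[:]  # snapshot: values visible after the barrier
        for i in range(offset, len(cur)):
            cur[i] = op(prev[i - offset], prev[i])
        offset *= 2
    return cur


def blocked_inclusive_scan(
    data: Sequence[T], op: Callable[[T, T], T], wg_size: int
) -> List[T]:
    """Scan each block of `wg_size` elements, then fold in the running carry."""
    out: List[T] = []
    carry: Optional[T] = None
    for start in range(0, len(data), wg_size):
        block = group_inclusive_scan(data[start:start + wg_size], op)
        if carry is not None:
            # Combine the total of all previous blocks into this block's results.
            block = [op(carry, v) for v in block]
        carry = block[-1]
        out.extend(block)
    return out
```

For example, `blocked_inclusive_scan([1.0] * 10, logaddexp, 4)` yields values equal to `1 + log(k + 1)` up to rounding, and passing an addition lambda turns the same code into an ordinary prefix sum. Keeping the operator a parameter is what lets one scan routine serve sums, products, maxima and log-sum-exp alike.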
Benchmarked with the following script:
```
import dpctl.tensor as dpt
import dpctl
x = dpt.ones(2048000, dtype="f4")
q_prof = dpctl.SyclQueue(x.sycl_context, x.sycl_device, property="enable_profiling")
xx = x.to_device(q_prof)
mm = dpt.cumulative_logsumexp(xx)
timer = dpctl.SyclTimer(device_timer="order_manager", time_scale=1e9)
with timer(q_prof):
    for _ in range(250):
        dpt.cumulative_logsumexp(xx, out=mm)
print(f"dpctl.__version__ = {dpctl.__version__}")
print(f"Device: {x.sycl_device}")
print(f"host_dt={timer.dt.host_dt/250}, device_dt={timer.dt.device_dt/250}")
```
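For reference, the quantity being benchmarked can be reproduced with NumPy (assuming NumPy is installed; this check is not part of the commit). Cumulative log-sum-exp is an inclusive scan with the log-add-exp operator, `y[k] = log(sum(exp(x[:k+1])))`, so for the all-ones input used above every entry has the closed form `1 + log(k + 1)`.

```python
import numpy as np

# Same all-ones input as the benchmark, but small and in float64 for a tight comparison.
x = np.ones(10, dtype=np.float64)

# Inclusive cumulative log-sum-exp via the logaddexp ufunc's accumulate method.
y = np.logaddexp.accumulate(x)

# Closed form for an all-ones input: log((k + 1) * e) = 1 + log(k + 1).
expected = 1.0 + np.log(np.arange(1, x.size + 1))
```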
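Tested on an Intel Iris Xe GPU under WSL. Reported times are per-call averages in nanoseconds (the script uses `time_scale=1e9`).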
This branch:
```
$ python ~/cumlogsumexp.py
dpctl.__version__ = 0.19.0dev0+351.gffd26092a0.dirty
Device: <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Graphics [0x9a49]] at 0x7f37a8f995f0>
host_dt=1059589.7079911083, device_dt=1154782.72
```
vs. main branch:
```
$ python cumlogsumexp.py
dpctl.__version__ = 0.19.0dev0+307.g04a8228748
Device: <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Graphics [0x9a49]] at 0x7ff6147d3cf0>
host_dt=2721938.803792, device_dt=10048323.168
```
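The speed-up follows from the ratio of the per-call averages reported by the two runs. This small calculation simply reuses the numbers printed above (nanoseconds, given `time_scale=1e9`); the dictionary names are illustrative only.

```python
# Per-call averages in nanoseconds, copied from the two runs above.
branch = {"host_dt": 1059589.7079911083, "device_dt": 1154782.72}
main = {"host_dt": 2721938.803792, "device_dt": 10048323.168}

device_speedup = main["device_dt"] / branch["device_dt"]
host_speedup = main["host_dt"] / branch["host_dt"]

# Convert nanoseconds to milliseconds for readability.
print(f"device: {main['device_dt'] / 1e6:.2f} ms -> {branch['device_dt'] / 1e6:.2f} ms, {device_speedup:.1f}x")
print(f"host:   {main['host_dt'] / 1e6:.2f} ms -> {branch['host_dt'] / 1e6:.2f} ms, {host_speedup:.1f}x")
```

Device time drops from about 10.05 ms to 1.15 ms per call (roughly 8.7x), while host-side time improves by about 2.6x.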
So this is roughly an 8.7x speed-up in device time (about 2.6x in host time).

1 parent ffd2609, commit 101514f
File tree: 2 files changed, +73 -15 lines
- dpctl/tensor/libtensor/include
  - kernels
  - utils