==================================
Long running workloads and compute
==================================

Long running workloads (compute) are workloads that will not complete in 10
seconds (roughly the time a user is willing to wait before reaching for the
power button). This means that other techniques have to be used to manage
those workloads, since dma-fences cannot be used.

Some hardware may schedule compute jobs without any way to preempt them, or to
have their memory swapped out from under them. Or userspace may simply want
its workload not to be preempted or swapped out at all.

This means that the handling of such workloads differs from what is described
in driver-api/dma-buf.rst.

As with normal compute jobs, dma-fences may not be used at all, in this case
not even to force preemption. The driver is then simply forced to unmap a BO
from the long running compute job's address space immediately on unbind,
without waiting for the workload to complete. Effectively this terminates the
workload when there is no hardware support to recover.
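
A very rough sketch of such an unbind path is shown below, for illustration
only. All my_* names are hypothetical driver internals, not existing DRM
helpers; the only point is that the long running case has no dma-fence to
wait on::

  struct my_bo;                     /* driver buffer object, hypothetical */

  struct my_vm {
          bool long_running;        /* true for long running compute VMs */
          /* ... */
  };

  /* Hypothetical driver internals, implemented elsewhere. */
  void my_vm_remove_mapping(struct my_vm *vm, struct my_bo *bo);
  void my_vm_wait_idle(struct my_vm *vm, struct my_bo *bo);

  static void my_vm_unbind_bo(struct my_vm *vm, struct my_bo *bo)
  {
          if (vm->long_running) {
                  /*
                   * No dma-fence was installed for this workload, so
                   * there is nothing to wait on. Tear the mapping down
                   * immediately; without hardware support to recover,
                   * this terminates the workload.
                   */
                  my_vm_remove_mapping(vm, bo);
                  return;
          }

          /*
           * Normal dma-fence based path: wait for the jobs still using
           * the mapping to finish, then remove it.
           */
          my_vm_wait_idle(vm, bo);
          my_vm_remove_mapping(vm, bo);
  }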

Since this is undesirable, there need to be mitigations to prevent a workload
from being terminated. There are several possible approaches, each with their
own advantages and drawbacks.

The first approach you will likely try is to pin all buffers used by compute.
This guarantees that the job will run uninterrupted, but it also opens up an
easy denial of service attack: by pinning as much memory as possible, a client
can hog all GPU memory, and possibly a huge chunk of CPU memory.
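
For a TTM based driver, the pinning variant essentially boils down to a
ttm_bo_pin() call while the buffer is reserved. In the sketch below,
my_bind_for_compute() is a hypothetical driver function; ttm_bo_reserve(),
ttm_bo_pin(), ttm_bo_unpin() and ttm_bo_unreserve() are existing TTM
helpers::

  static int my_bind_for_compute(struct ttm_buffer_object *bo)
  {
          int ret;

          ret = ttm_bo_reserve(bo, true, false, NULL);
          if (ret)
                  return ret;

          /*
           * A pinned BO is never evicted, so an unbind can no longer
           * kill the job, but the memory is also not reclaimable until
           * the driver calls ttm_bo_unpin().
           */
          ttm_bo_pin(bo);
          ttm_bo_unreserve(bo);

          return 0;
  }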

A second approach that works slightly better on its own is adding an option
not to evict when creating a new job (of any kind). If all of userspace opts
in to this flag, it prevents cooperating userspace from forcibly terminating
older compute jobs in order to start a new one.
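
Such an opt-in could be a flag on the bind or job creation ioctl, paired with
a check in the driver's eviction path. The sketch below is purely
hypothetical; neither the flag nor the structure exists in any uapi today::

  #define MY_BO_FLAG_NO_EVICT     (1 << 0)    /* hypothetical flag */

  struct my_bo {
          u32 flags;                /* MY_BO_FLAG_NO_EVICT, ... */
          /* ... */
  };

  /* Called by the (hypothetical) eviction walk when picking victims. */
  static bool my_bo_evictable(struct my_bo *bo)
  {
          /*
           * If every client opts in to the flag, a new job can no
           * longer force older compute jobs to be terminated by
           * evicting their buffers.
           */
          return !(bo->flags & MY_BO_FLAG_NO_EVICT);
  }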

If job preemption and recoverable pagefaults are not available, those are the
only approaches possible. But even with those, you want a separate way of
controlling resources. The standard kernel way of doing so is cgroups.

This creates a third option: using cgroups to prevent eviction. Both GPU and
driver-allocated CPU memory would be accounted to the correct cgroup, and
eviction would be made cgroup aware. This allows the GPU to be partitioned
into cgroups, so that jobs can run next to each other without interference.

The interface to the cgroup would be similar to the current CPU memory
interface, with similar semantics for min/low/high/max, if eviction can
be made cgroup aware.

It should be noted that each memory region (tiled memory for example) should
have its own accounting.

The key is set to the region id chosen by the driver, for example "tile0".
For the value of $card, drmGetUnique() is used.
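
Purely as an illustration, and not as a settled interface, such a per-cgroup
file could end up looking roughly like the sketch below. Every name in it
(the cgroup path, the "drm.memory.max" and "drm.memory.high" files, the
exact key syntax and the limits) is made up for the example; the only
intended points are that the card identifier comes from drmGetUnique() and
that each region ("tile0", "tile1") gets its own entry::

  $ cat /sys/fs/cgroup/compute-partition/drm.memory.max
  pci:0000:03:00.0 tile0=8589934592 tile1=8589934592

  $ echo "pci:0000:03:00.0 tile0=4294967296" > \
        /sys/fs/cgroup/compute-partition/drm.memory.high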