| 1 | +mm: zswap: shrink until can accept |
| 2 | + |
| 3 | +jira LE-4623 |
| 4 | +Rebuild_History Non-Buildable kernel-4.18.0-553.81.1.el8_10 |
| 5 | +commit-author Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
| 6 | +commit e0228d590beb0d0af345c58a282f01afac5c57f3 |
| 7 | +Empty-Commit: Cherry-Pick Conflicts during history rebuild. |
| 8 | +Will be included in final tarball splat. Ref for failed cherry-pick at: |
| 9 | +ciq/ciq_backports/kernel-4.18.0-553.81.1.el8_10/e0228d59.failed |
| 10 | + |
| 11 | +This update addresses an issue with the zswap reclaim mechanism, which |
| 12 | +hinders the efficient offloading of cold pages to disk, thereby |
| 13 | +compromising the preservation of the LRU order and consequently |
| 14 | +diminishing, if not inverting, its performance benefits. |
| 15 | + |
| 16 | +The functioning of the zswap shrink worker was found to be inadequate, as |
| 17 | +shown by a basic benchmark test. For the test, a kernel build was used |
| 18 | +as a reference, with its memory confined to 1G via a cgroup and a 5G swap |
| 19 | +file provided. The results below are averages of three runs without |
| 20 | +the use of zswap: |
| 21 | + |
| 22 | +real 46m26s |
| 23 | +user 35m4s |
| 24 | +sys 7m37s |
| 25 | + |
| 26 | +With zswap (zbud) enabled and max_pool_percent set to 1 (in a 32G |
| 27 | +system), the results changed to: |
| 28 | + |
| 29 | +real 56m4s |
| 30 | +user 35m13s |
| 31 | +sys 8m43s |
| 32 | + |
| 33 | +written_back_pages: 18 |
| 34 | +reject_reclaim_fail: 0 |
| 35 | +pool_limit_hit: 1478 |
| 36 | + |
| 37 | +Besides the evident regression, one thing to notice from this data is the |
| 38 | +extremely low number of written_back_pages and pool_limit_hit. |
| 39 | + |
| 40 | +The pool_limit_hit counter, which is incremented in zswap_frontswap_store |
| 41 | +when zswap is completely full, doesn't account for a particular scenario: |
| 42 | +once zswap hits its limit, zswap_pool_reached_full is set to true; with |
| 43 | +this flag on, zswap_frontswap_store rejects pages if zswap is still above |
| 44 | +the acceptance threshold. Once we include the rejections due to |
| 45 | +zswap_pool_reached_full && !zswap_can_accept(), the number goes from 1478 |
| 46 | +to a significant 21578266. |
| 47 | + |
| 48 | +Zswap is stuck in an undesirable state where it rejects pages because it's |
| 49 | +above the acceptance threshold, yet fails to attempt memory reclamation. |
| 50 | +This happens because the shrink work is only queued when |
| 51 | +zswap_frontswap_store detects that it's full and the work itself only |
| 52 | +reclaims one page per run. |
| 53 | + |
| 54 | +This state results in hot pages getting written directly to disk, while |
| 55 | +cold ones remain in memory, waiting only to be invalidated. The LRU order is |
| 56 | +completely broken and zswap ends up being just an overhead without |
| 57 | +providing any benefits. |
| 58 | + |
| 59 | +This commit applies 2 changes: a) the shrink worker is set to reclaim |
| 60 | +pages until the acceptance threshold is met and b) the task is also |
| 61 | +enqueued when zswap is not full but still above the threshold. |
| 62 | + |
| 63 | +Testing this suggested update showed much better numbers: |
| 64 | + |
| 65 | +real 36m37s |
| 66 | +user 35m8s |
| 67 | +sys 9m32s |
| 68 | + |
| 69 | +written_back_pages: 10459423 |
| 70 | +reject_reclaim_fail: 12896 |
| 71 | +pool_limit_hit: 75653 |
| 72 | + |
| 73 | +Link: https://lkml.kernel.org/r/20230526183227.793977-1-cerasuolodomenico@gmail.com |
| 74 | +Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit") |
| 75 | + Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> |
| 76 | + Acked-by: Johannes Weiner <hannes@cmpxchg.org> |
| 77 | + Reviewed-by: Yosry Ahmed <yosryahmed@google.com> |
| 78 | + Reviewed-by: Vitaly Wool <vitaly.wool@konsulko.com> |
| 79 | + Cc: Dan Streetman <ddstreet@ieee.org> |
| 80 | + Cc: Seth Jennings <sjenning@redhat.com> |
| 81 | + Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| 82 | +(cherry picked from commit e0228d590beb0d0af345c58a282f01afac5c57f3) |
| 83 | + Signed-off-by: Jonathan Maple <jmaple@ciq.com> |
| 84 | + |
| 85 | +# Conflicts: |
| 86 | +# mm/zswap.c |
| 87 | +diff --cc mm/zswap.c |
| 88 | +index a85668159422,bcb82e09eb64..000000000000 |
| 89 | +--- a/mm/zswap.c |
| 90 | ++++ b/mm/zswap.c |
| 91 | +@@@ -43,6 -36,9 +43,12 @@@ |
| 92 | + #include <linux/pagemap.h> |
| 93 | + #include <linux/workqueue.h> |
| 94 | + |
| 95 | +++<<<<<<< HEAD |
| 96 | +++======= |
| 97 | ++ #include "swap.h" |
| 98 | ++ #include "internal.h" |
| 99 | ++ |
| 100 | +++>>>>>>> e0228d590beb (mm: zswap: shrink until can accept) |
| 101 | + /********************************* |
| 102 | + * statistics |
| 103 | + **********************************/ |
| 104 | +* Unmerged path mm/zswap.c |
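The store-path behaviour described in the commit message (the `zswap_pool_reached_full` hysteresis and change b) can be modelled in a few lines of C. This is a minimal user-space sketch, not the kernel code: `struct toy_pool`, `toy_is_full()`, `toy_can_accept()` and `toy_store()` are hypothetical stand-ins for `zswap_is_full()`, `zswap_can_accept()` and `zswap_frontswap_store()`, sizes are counted in pages, and queuing the shrink work is reduced to setting a `shrink_queued` flag. The `queue_when_above_thr` parameter switches between the pre-fix behaviour and change b.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the zswap store-path hysteresis. Every name here is
 * hypothetical; sizes are counted in pages.
 */
struct toy_pool {
	size_t used_pages;		/* pages currently held by the pool */
	size_t max_pages;		/* limit derived from max_pool_percent */
	unsigned int accept_thr_percent; /* e.g. 90: accept again below 90% */
	bool reached_full;		/* stands in for zswap_pool_reached_full */
	bool shrink_queued;		/* true once shrink work would be queued */
	unsigned long pool_limit_hit;	/* rejections because the pool is full */
	unsigned long reject_above_thr;	/* rejections caused by the hysteresis */
};

/* Stand-in for zswap_is_full(); the boundary comparison is simplified. */
static bool toy_is_full(const struct toy_pool *p)
{
	return p->used_pages >= p->max_pages;
}

/* Stand-in for zswap_can_accept(): usage must drop below thr% of the limit. */
static bool toy_can_accept(const struct toy_pool *p)
{
	/* Multiply instead of dividing to avoid integer rounding. */
	return p->used_pages * 100 < p->max_pages * p->accept_thr_percent;
}

/*
 * Returns true if the page is stored. queue_when_above_thr == false models
 * the behaviour before the fix; true models change b), which also queues
 * the shrink work when a page is rejected by the hysteresis.
 */
static bool toy_store(struct toy_pool *p, bool queue_when_above_thr)
{
	if (toy_is_full(p)) {
		p->pool_limit_hit++;
		p->reached_full = true;
		p->shrink_queued = true;
		return false;
	}
	if (p->reached_full) {
		if (!toy_can_accept(p)) {
			p->reject_above_thr++;
			if (queue_when_above_thr)
				p->shrink_queued = true;
			return false;
		}
		p->reached_full = false;	/* back under the threshold */
	}
	p->used_pages++;
	return true;
}
```

With the pool at 95 of 100 pages and a 90% threshold, the pre-fix path rejects every page without queuing any reclaim and without touching `pool_limit_hit`, which is the stuck state the commit message describes.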
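Change a) turns the shrink worker from a single reclaim attempt per run into a loop that keeps going until the pool is back under the acceptance threshold. The sketch below models both variants in user space under stated assumptions: `struct toy_lru`, `toy_writeback_one()`, `toy_shrink_once()`, `toy_shrink_until_accept()` and `TOY_MAX_RECLAIM_RETRIES` are hypothetical, writeback of the LRU tail is simulated by decrementing a counter, and every failure is treated as retryable (the actual kernel loop may handle some error codes differently).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry bound; the real constant and its value may differ. */
#define TOY_MAX_RECLAIM_RETRIES 16

struct toy_lru {
	size_t used_pages;		/* pages currently held by the pool */
	size_t max_pages;		/* pool limit in pages */
	unsigned int accept_thr_percent; /* acceptance threshold, e.g. 90 */
	size_t unreclaimable;		/* next N writeback attempts fail */
	unsigned long written_back;	/* plays the role of written_back_pages */
	unsigned long reclaim_fail;	/* plays the role of reject_reclaim_fail */
};

static bool toy_lru_can_accept(const struct toy_lru *p)
{
	return p->used_pages * 100 < p->max_pages * p->accept_thr_percent;
}

/* Simulate writing back one page from the LRU tail; 0 on success. */
static int toy_writeback_one(struct toy_lru *p)
{
	if (p->unreclaimable > 0) {
		p->unreclaimable--;	/* e.g. a busy page: this attempt fails */
		return -1;
	}
	if (p->used_pages == 0)
		return -1;		/* nothing left to write back */
	p->used_pages--;
	p->written_back++;
	return 0;
}

/* Before the fix: each queued run reclaims at most one page. */
static void toy_shrink_once(struct toy_lru *p)
{
	if (toy_writeback_one(p))
		p->reclaim_fail++;
}

/* After the fix: reclaim until the pool can accept, bounded by retries. */
static void toy_shrink_until_accept(struct toy_lru *p)
{
	int failures = 0;

	do {
		if (toy_writeback_one(p)) {
			p->reclaim_fail++;
			if (++failures == TOY_MAX_RECLAIM_RETRIES)
				break;
		}
	} while (!toy_lru_can_accept(p));
}
```

The retry bound keeps the loop from spinning forever when tail pages repeatedly fail writeback, while a full 100-page pool with a 90% threshold is drained to 89 pages in a single run instead of one page per queued work item.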