Commit 9cee7e8
mm: memcg: optimize parent iteration in memcg_rstat_updated()
In memcg_rstat_updated(), we iterate the memcg being updated and its
parents to update memcg->vmstats_percpu->stats_updates in the fast path
(i.e. no atomic updates). According to my math, this is 3 memory loads
(and potentially 3 cache misses) per memcg:
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
- Load the address of the parent memcg.
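
To make the three loads listed above concrete, here is a minimal user-space sketch of the old walk. The names `toy_memcg`, `toy_percpu` and `toy_rstat_updated_old` are hypothetical stand-ins, not the kernel's definitions, and a single CPU is modelled, so the percpu offset calculation is left out. The function returns the number of dependent loads it models rather than measuring real cache behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-CPU stats block. */
struct toy_percpu {
    unsigned int stats_updates;
};

/* Hypothetical stand-in for a memcg: only the fields the walk touches. */
struct toy_memcg {
    struct toy_percpu *vmstats_percpu; /* per-CPU in the kernel, one slot here */
    struct toy_memcg *parent;          /* NULL for the root */
};

/*
 * Old-style walk: every level goes through the memcg itself.
 * Returns the number of loads this model counts (3 per memcg).
 */
static unsigned int toy_rstat_updated_old(struct toy_memcg *memcg,
                                          unsigned int val)
{
    unsigned int loads = 0;

    assert(memcg != NULL);
    for (; memcg; memcg = memcg->parent) {   /* load 3: parent address */
        struct toy_percpu *pc = memcg->vmstats_percpu; /* load 1 */

        pc->stats_updates += val;            /* load 2: the counter itself */
        loads += 3;
    }
    return loads;
}
```

For a chain of three memcgs (leaf, middle, root) this model counts 9 loads, matching the 3-per-memcg estimate.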
Avoid most of the cache misses by caching a pointer from each struct
memcg_vmstats_percpu to its parent on the corresponding CPU. In this
case, for the first memcg we have 2 memory loads (same as above):
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
Then for each additional memcg, we need a single load to get the
parent's stats_updates directly. This reduces the number of loads from
about 3N to about 2+N, where N is the number of memcgs we need to iterate.
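
The cached-parent walk described above can be sketched in user space as follows. `toy_pc_cached`, `toy_memcg_cached` and `toy_rstat_updated_new` are hypothetical names, a single CPU is modelled, and the load count is a model that assumes the parent pointer sits next to `stats_updates`, so following it costs no extra cache line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU block that also caches its parent's per-CPU block. */
struct toy_pc_cached {
    unsigned int stats_updates;
    struct toy_pc_cached *parent; /* same CPU's block of the parent memcg */
};

/* Hypothetical memcg: only needed to find the first per-CPU block. */
struct toy_memcg_cached {
    struct toy_pc_cached *vmstats_percpu;
};

/*
 * New-style walk: touch the memcg once, then hop between per-CPU blocks.
 * Returns the number of loads this model counts:
 * 2 for the first memcg, 1 for each additional one.
 */
static unsigned int toy_rstat_updated_new(struct toy_memcg_cached *memcg,
                                          unsigned int val)
{
    struct toy_pc_cached *pc;
    unsigned int loads = 1; /* load of memcg->vmstats_percpu */

    assert(memcg != NULL);
    for (pc = memcg->vmstats_percpu; pc; pc = pc->parent) {
        pc->stats_updates += val; /* one load per level */
        loads += 1;
    }
    return loads;
}
```

For the same three-level chain this model counts 4 loads (2 for the leaf, then 1 each for the two ancestors) instead of 9.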
Additionally, stash a pointer to memcg->vmstats in each struct
memcg_vmstats_percpu such that we can access the atomic counter that all
CPUs fold into, memcg->vmstats->stats_updates.
memcg_should_flush_stats() is changed to memcg_vmstats_needs_flush() to
accept a struct memcg_vmstats pointer accordingly.
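
A rough user-space sketch of why the stashed `vmstats` pointer helps: once the walk no longer holds a memcg pointer, the per-CPU block has to lead to the shared atomic counter on its own. Everything below is an assumption for illustration: the `toy_*` names, the batch size `TOY_BATCH`, the CPU count `TOY_NCPUS`, and the exact folding policy are not taken from the commit, and C11 atomics stand in for the kernel's atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH 64u /* hypothetical per-CPU batch before folding */
#define TOY_NCPUS 4u  /* hypothetical number of online CPUs */

/* Hypothetical shared state: the counter all CPUs fold into. */
struct toy_vmstats {
    atomic_ullong stats_updates;
};

/* Hypothetical per-CPU block with both cached pointers. */
struct toy_statc {
    unsigned int stats_updates;  /* CPU-local, updated without atomics */
    struct toy_statc *parent;    /* parent's block on the same CPU */
    struct toy_vmstats *vmstats; /* owning memcg's shared state */
};

/* Takes the shared state directly, so no memcg pointer is needed. */
static bool toy_vmstats_needs_flush(struct toy_vmstats *vmstats)
{
    return atomic_load(&vmstats->stats_updates) >
           (unsigned long long)TOY_BATCH * TOY_NCPUS;
}

static void toy_rstat_updated(struct toy_statc *statc, unsigned int val)
{
    assert(statc != NULL);
    for (; statc; statc = statc->parent) {
        statc->stats_updates += val;
        if (statc->stats_updates < TOY_BATCH)
            continue; /* fast path: no atomic operation */

        /* Slow path: fold the local batch unless a flush is already due. */
        if (!toy_vmstats_needs_flush(statc->vmstats))
            atomic_fetch_add(&statc->vmstats->stats_updates,
                             statc->stats_updates);
        statc->stats_updates = 0;
    }
}
```

In this model, small updates stay in the CPU-local counter, and the atomic counter is only touched once a full batch has accumulated.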
In struct memcg_vmstats_percpu, make sure both pointers together with
stats_updates live on the same cacheline. Finally, update
mem_cgroup_alloc() to take in a parent pointer and initialize the new
cache pointers on each CPU. The percpu loop in mem_cgroup_alloc() may
look concerning, but there are multiple similar loops in the cgroup
creation path (e.g. cgroup_rstat_init()), most of which are hidden
within alloc_percpu().
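
The per-CPU initialization loop mentioned above might look roughly like this in a user-space model. The `toy_*` names, the fixed `TOY_NCPUS`, the assumed 64-byte cache line, and the embedded (rather than separately allocated) `vmstats` are all assumptions for illustration. The static assertion only checks that the three hot fields fit within one cache-line-sized window from the start of the struct; real code would also have to align the struct itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NCPUS 4      /* hypothetical CPU count */
#define TOY_CACHELINE 64 /* assumed cache line size in bytes */

struct toy_vmstats {
    unsigned long long stats_updates;
};

struct toy_statc {
    /* Hot fields first so they can share a cache line. */
    unsigned int stats_updates;
    struct toy_statc *parent;
    struct toy_vmstats *vmstats;
    /* Stand-in for the colder per-CPU counters. */
    long state[8];
};

_Static_assert(offsetof(struct toy_statc, vmstats) +
               sizeof(struct toy_vmstats *) <= TOY_CACHELINE,
               "hot fields must fit in one cache line");

struct toy_memcg {
    struct toy_vmstats vmstats;
    struct toy_statc *percpu; /* TOY_NCPUS slots, one per CPU */
};

/* Allocate a memcg and wire up the cached pointers on every CPU. */
static struct toy_memcg *toy_memcg_alloc(struct toy_memcg *parent)
{
    struct toy_memcg *memcg = calloc(1, sizeof(*memcg));
    int cpu;

    if (!memcg)
        return NULL;
    memcg->percpu = calloc(TOY_NCPUS, sizeof(*memcg->percpu));
    if (!memcg->percpu) {
        free(memcg);
        return NULL;
    }
    for (cpu = 0; cpu < TOY_NCPUS; cpu++) {
        struct toy_statc *statc = &memcg->percpu[cpu];

        /* The root has no parent; its pointer stays NULL from calloc(). */
        if (parent)
            statc->parent = &parent->percpu[cpu];
        statc->vmstats = &memcg->vmstats;
    }
    return memcg;
}

static void toy_memcg_free(struct toy_memcg *memcg)
{
    if (!memcg)
        return;
    free(memcg->percpu);
    free(memcg);
}
```

Each CPU's block points at the parent's block for the same CPU, so a walk that starts on one CPU never touches another CPU's data.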
According to Oliver's testing [1], this fixes multiple 30-38%
regressions in vm-scalability, will-it-scale-tlb_flush2, and
will-it-scale-fallocate1. This comes at a cost of 2 more pointers per
CPU (<2KB on a machine with 128 CPUs).
[1] https://lore.kernel.org/lkml/ZbDJsfsZt2ITyo61@xsang-OptiPlex-9020/
[yosryahmed@google.com: fix struct memcg_vmstats_percpu size and alignment]
Link: https://lkml.kernel.org/r/20240203044612.1234216-1-yosryahmed@google.com
Link: https://lkml.kernel.org/r/20240124100023.660032-1-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Fixes: 8d59d22 ("mm: memcg: make stats flushing threshold per-memcg")
Tested-by: kernel test robot <oliver.sang@intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Greg Thelen <gthelen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 67b8bcb commit 9cee7e8
1 file changed, +35 -21 lines changed
[Diff body not recoverable: only line numbers survived, with hunks around lines 621-781 and 5477-5569 of the changed file.]