@philnik777 philnik777 commented Oct 21, 2025

This patch optimizes how for_each iterates over trees by using recursion and storing pointers to the next nodes on the stack. This avoids pointer chasing through the __parent_ pointer, which reduces cache misses. It also lets the compiler tail-call optimize the recursive function, which removes the back-tracking that the iterators otherwise have to do.
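
To make the difference concrete, here is a minimal sketch of the two traversal strategies on a toy binary tree. The names `Node`, `set_children`, `next_in_order`, `for_each_by_successor` and `for_each_recursive` are hypothetical and only illustrate the idea; libc++'s actual node layout and helper functions differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy node; not libc++'s real __tree node type.
struct Node {
  int value;
  Node* left   = nullptr;
  Node* right  = nullptr;
  Node* parent = nullptr;
};

// Test helper: attach children and set their parent pointers.
inline void set_children(Node& p, Node* l, Node* r) {
  p.left  = l;
  p.right = r;
  if (l) l->parent = &p;
  if (r) r->parent = &p;
}

// Iterator-style step: once a right subtree is exhausted, climb back up
// through parent pointers until we leave a left subtree.
inline Node* next_in_order(Node* n) {
  if (n->right) {
    n = n->right;
    while (n->left) n = n->left;
    return n;
  }
  while (n->parent && n == n->parent->right) n = n->parent;
  return n->parent; // nullptr after the largest node
}

template <class Func>
void for_each_by_successor(Node* root, Func& func) {
  if (!root) return;
  Node* n = root;
  while (n->left) n = n->left; // start at the smallest node
  for (; n; n = next_in_order(n)) func(n->value);
}

// Recursive traversal: pending ancestors live in stack frames, so parent
// pointers are never read. The call on the right subtree is in tail position.
template <class Func>
void for_each_recursive(Node* root, Func& func) {
  if (!root) return;
  for_each_recursive(root->left, func);
  func(root->value);
  for_each_recursive(root->right, func);
}
```

Both functions visit the values in ascending order; the recursive one trades the parent-pointer loads for stack frames.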

Benchmark                                             89eef941c4ed    b96071c259fb    Difference    % Difference
--------------------------------------------------  --------------  --------------  ------------  --------------
rng::for_each(set<int>)/32                                   34.61           26.27         -8.34         -24.10%
rng::for_each(set<int>)/50                                   63.97           39.65        -24.32         -38.02%
rng::for_each(set<int>)/8                                     4.56            6.52          1.96          42.95%
rng::for_each(set<int>)/8192                              19102.12         8406.37     -10695.75         -55.99%
rng::for_each(set<int>::iterator)/32                         34.61           27.76         -6.85         -19.80%
rng::for_each(set<int>::iterator)/50                         63.98           41.98        -22.00         -34.38%
rng::for_each(set<int>::iterator)/8                           4.47            5.81          1.34          29.95%
rng::for_each(set<int>::iterator)/8192                    19055.30         8711.55     -10343.76         -54.28%
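
For reading the last column, the following sketch assumes that "% Difference" is the change relative to the first commit, i.e. (new - old) / old. The function name `percent_difference` is hypothetical, and recomputing from the rounded values printed in the table can differ from the listed percentages in the last digits.

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, in percent (assumed formula).
inline double percent_difference(double before, double after) {
  return (after - before) / before * 100.0;
}
```

Under that assumption, the /32 row of the first benchmark (34.61 to 26.27) comes out at roughly -24.1, in line with the table.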

github-actions bot commented Oct 21, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions ,h,cpp -- libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.associative.pass.cpp libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/ranges.for_each.associative.pass copy.cpp libcxx/include/__algorithm/for_each.h libcxx/include/__algorithm/ranges_for_each.h libcxx/include/__algorithm/specialized_algorithms.h libcxx/include/__tree libcxx/include/map libcxx/include/set libcxx/test/benchmarks/algorithms/nonmodifying/for_each.bench.cpp libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/libcxx/include/__tree b/libcxx/include/__tree
index 5559862e2..6da62c365 100644
--- a/libcxx/include/__tree
+++ b/libcxx/include/__tree
@@ -681,7 +681,7 @@ __tree_iterate_subrange(_NodeIter __first_it, _NodeIter __last_it, _Func& __func
   using _Reference = _NodeIter::reference;
 
   auto __first = __first_it.__ptr_;
-  auto __last = __last_it.__ptr_;
+  auto __last  = __last_it.__ptr_;
 
   while (true) {
     if (__first == __last)

@philnik777 philnik777 force-pushed the optimize_tree_iteration branch 3 times, most recently from a31b2f2 to 8477cf3 Compare October 22, 2025 09:56
@ldionne ldionne marked this pull request as ready for review October 22, 2025 14:41
@ldionne ldionne requested a review from a team as a code owner October 22, 2025 14:41
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Oct 22, 2025

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes
Benchmark                                             89eef941c4ed    b96071c259fb    Difference    % Difference
--------------------------------------------------  --------------  --------------  ------------  --------------
rng::for_each(set<int>)/32                                   34.61           26.27         -8.34         -24.10%
rng::for_each(set<int>)/50                                   63.97           39.65        -24.32         -38.02%
rng::for_each(set<int>)/8                                     4.56            6.52          1.96          42.95%
rng::for_each(set<int>)/8192                              19102.12         8406.37     -10695.75         -55.99%
rng::for_each(set<int>::iterator)/32                         34.61           27.76         -6.85         -19.80%
rng::for_each(set<int>::iterator)/50                         63.98           41.98        -22.00         -34.38%
rng::for_each(set<int>::iterator)/8                           4.47            5.81          1.34          29.95%
rng::for_each(set<int>::iterator)/8192                    19055.30         8711.55     -10343.76         -54.28%

Patch is 22.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164405.diff

10 Files Affected:

  • (modified) libcxx/include/CMakeLists.txt (+1)
  • (modified) libcxx/include/__algorithm/for_each.h (+14)
  • (modified) libcxx/include/__algorithm/ranges_for_each.h (+9-1)
  • (added) libcxx/include/__algorithm/specialized_algorithms.h (+35)
  • (modified) libcxx/include/__tree (+103)
  • (modified) libcxx/include/map (+39)
  • (modified) libcxx/include/module.modulemap.in (+1)
  • (modified) libcxx/include/set (+37)
  • (modified) libcxx/test/benchmarks/algorithms/nonmodifying/for_each.bench.cpp (+47-7)
  • (modified) libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp (+58-3)
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index dd1e71380e7fc..f27e6f2ce4a14 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -194,6 +194,7 @@ set(files
   __algorithm/simd_utils.h
   __algorithm/sort.h
   __algorithm/sort_heap.h
+  __algorithm/specialized_algorithms.h
   __algorithm/stable_partition.h
   __algorithm/stable_sort.h
   __algorithm/swap_ranges.h
diff --git a/libcxx/include/__algorithm/for_each.h b/libcxx/include/__algorithm/for_each.h
index 6fb66d25a2462..222b2e88fc14c 100644
--- a/libcxx/include/__algorithm/for_each.h
+++ b/libcxx/include/__algorithm/for_each.h
@@ -11,6 +11,7 @@
 #define _LIBCPP___ALGORITHM_FOR_EACH_H
 
 #include <__algorithm/for_each_segment.h>
+#include <__algorithm/specialized_algorithms.h>
 #include <__config>
 #include <__functional/identity.h>
 #include <__iterator/segmented_iterator.h>
@@ -44,6 +45,19 @@ __for_each(_SegmentedIterator __first, _SegmentedIterator __last, _Func& __func,
   });
   return __last;
 }
+
+template <class _InputIterator,
+          class _Func,
+          class _Proj,
+          __enable_if_t<__specialized_algorithm<_Algorithm::__for_each,
+                                                __iterator_pair<_InputIterator, _InputIterator>>::__has_algorithm,
+                        int> = 0>
+_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _InputIterator
+__for_each(_InputIterator __first, _InputIterator __last, _Func& __func, _Proj& __proj) {
+  __specialized_algorithm<_Algorithm::__for_each, __iterator_pair<_InputIterator, _InputIterator>>()(
+      __first, __last, __func, __proj);
+  return __last;
+}
 #endif // !_LIBCPP_CXX03_LANG
 
 template <class _InputIterator, class _Func>
diff --git a/libcxx/include/__algorithm/ranges_for_each.h b/libcxx/include/__algorithm/ranges_for_each.h
index e9c84e8583f87..bc618442b9791 100644
--- a/libcxx/include/__algorithm/ranges_for_each.h
+++ b/libcxx/include/__algorithm/ranges_for_each.h
@@ -12,6 +12,7 @@
 #include <__algorithm/for_each.h>
 #include <__algorithm/for_each_n.h>
 #include <__algorithm/in_fun_result.h>
+#include <__algorithm/specialized_algorithms.h>
 #include <__concepts/assignable.h>
 #include <__config>
 #include <__functional/identity.h>
@@ -20,6 +21,7 @@
 #include <__ranges/access.h>
 #include <__ranges/concepts.h>
 #include <__ranges/dangling.h>
+#include <__type_traits/remove_cvref.h>
 #include <__utility/move.h>
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
@@ -71,7 +73,13 @@ struct __for_each {
             indirectly_unary_invocable<projected<iterator_t<_Range>, _Proj>> _Func>
   _LIBCPP_HIDE_FROM_ABI constexpr for_each_result<borrowed_iterator_t<_Range>, _Func>
   operator()(_Range&& __range, _Func __func, _Proj __proj = {}) const {
-    return __for_each_impl(ranges::begin(__range), ranges::end(__range), __func, __proj);
+    using _SpecialAlg = __specialized_algorithm<_Algorithm::__for_each, remove_cvref_t<_Range>>;
+    if constexpr (_SpecialAlg::__has_algorithm) {
+      auto [__iter, __func2] = _SpecialAlg()(__range, std::move(__func), std::move(__proj));
+      return {std::move(__iter), std::move(__func2)};
+    } else {
+      return __for_each_impl(ranges::begin(__range), ranges::end(__range), __func, __proj);
+    }
   }
 };
 
diff --git a/libcxx/include/__algorithm/specialized_algorithms.h b/libcxx/include/__algorithm/specialized_algorithms.h
new file mode 100644
index 0000000000000..45078e2dfc209
--- /dev/null
+++ b/libcxx/include/__algorithm/specialized_algorithms.h
@@ -0,0 +1,35 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_SPECIALIZED_ALGORITHMS_H
+#define _LIBCPP___ALGORITHM_SPECIALIZED_ALGORITHMS_H
+
+#include <__config>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+// FIXME: This should really be an enum
+namespace _Algorithm {
+  struct __for_each {};
+} // namespace _Algorithm
+
+template <class, class>
+struct __iterator_pair {};
+
+template <class _Alg, class _Range>
+struct __specialized_algorithm {
+  static const bool __has_algorithm = false;
+};
+
+_LIBCPP_END_NAMESPACE_STD
+
+#endif // _LIBCPP___ALGORITHM_SPECIALIZED_ALGORITHMS_H
diff --git a/libcxx/include/__tree b/libcxx/include/__tree
index 0738c8c6a5e2b..d8e4a6da4f40a 100644
--- a/libcxx/include/__tree
+++ b/libcxx/include/__tree
@@ -11,6 +11,7 @@
 #define _LIBCPP___TREE
 
 #include <__algorithm/min.h>
+#include <__algorithm/specialized_algorithms.h>
 #include <__assert>
 #include <__config>
 #include <__fwd/pair.h>
@@ -717,6 +718,59 @@ private:
   friend class __tree_const_iterator;
 };
 
+template <class _Reference, class _EndNodePtr, class _NodePtr, class _Func, class _Proj>
+_LIBCPP_HIDE_FROM_ABI bool __tree_iterate_from_root(_EndNodePtr __last, _NodePtr __root, _Func& __func, _Proj& __proj) {
+  if (__root->__left_) {
+    if (std::__tree_iterate_from_root<_Reference>(__last, static_cast<_NodePtr>(__root->__left_), __func, __proj))
+      return true;
+  }
+  if (__root == __last)
+    return true;
+  __func(static_cast<_Reference>(__root->__get_value()));
+  if (__root->__right_)
+    return std::__tree_iterate_from_root<_Reference>(__last, static_cast<_NodePtr>(__root->__right_), __func, __proj);
+  return false;
+}
+
+template <class _Reference, class _NodePtr, class _EndNodePtr, class _Func, class _Proj>
+_LIBCPP_HIDE_FROM_ABI void
+__tree_iterate_from_begin(_EndNodePtr __first, _EndNodePtr __last, _Func& __func, _Proj& __proj) {
+  while (true) {
+    if (__first == __last)
+      return;
+    auto __nfirst = static_cast<_NodePtr>(__first);
+    __func(static_cast<_Reference>(__nfirst->__get_value()));
+    if (__nfirst->__right_) {
+      if (std::__tree_iterate_from_root<_Reference>(__last, static_cast<_NodePtr>(__nfirst->__right_), __func, __proj))
+        return;
+    }
+    if (std::__tree_is_left_child(__nfirst)) {
+      __first = __nfirst->__parent_;
+    } else {
+      do {
+        __first = __nfirst->__parent_;
+      } while (!std::__tree_is_left_child(__nfirst));
+    }
+  }
+}
+
+#ifndef _LIBCPP_CXX03_LANG
+template <class _Tp, class _NodePtr, class _DiffType>
+struct __specialized_algorithm<
+    _Algorithm::__for_each,
+    __iterator_pair<__tree_iterator<_Tp, _NodePtr, _DiffType>, __tree_iterator<_Tp, _NodePtr, _DiffType>>> {
+  static const bool __has_algorithm = true;
+
+  using __iterator _LIBCPP_NODEBUG = __tree_iterator<_Tp, _NodePtr, _DiffType>;
+
+  template <class _Func, class _Proj>
+  _LIBCPP_HIDE_FROM_ABI static void operator()(__iterator __first, __iterator __last, _Func& __func, _Proj& __proj) {
+    std::__tree_iterate_from_begin<typename __iterator::reference, _NodePtr>(
+        __first.__ptr_, __last.__ptr_, __func, __proj);
+  }
+};
+#endif
+
 template <class _Tp, class _NodePtr, class _DiffType>
 class __tree_const_iterator {
   using _NodeTypes _LIBCPP_NODEBUG = __tree_node_types<_NodePtr>;
@@ -780,8 +834,28 @@ private:
 
   template <class, class, class>
   friend class __tree;
+
+  friend struct __specialized_algorithm<_Algorithm::__for_each,
+                                        __iterator_pair<__tree_const_iterator, __tree_const_iterator> >;
 };
 
+#ifndef _LIBCPP_CXX03_LANG
+template <class _Tp, class _NodePtr, class _DiffType>
+struct __specialized_algorithm<
+    _Algorithm::__for_each,
+    __iterator_pair<__tree_const_iterator<_Tp, _NodePtr, _DiffType>, __tree_const_iterator<_Tp, _NodePtr, _DiffType>>> {
+  static const bool __has_algorithm = true;
+
+  using __iterator = __tree_const_iterator<_Tp, _NodePtr, _DiffType>;
+
+  template <class _Func, class _Proj>
+  _LIBCPP_HIDE_FROM_ABI static void operator()(__iterator __first, __iterator __last, _Func& __func, _Proj& __proj) {
+    std::__tree_iterate_from_begin<typename __iterator::reference, _NodePtr>(
+        __first.__ptr_, __last.__ptr_, __func, __proj);
+  }
+};
+#endif
+
 template <class _Tp, class _Compare>
 #ifndef _LIBCPP_CXX03_LANG
 _LIBCPP_DIAGNOSE_WARNING(!__is_invocable_v<_Compare const&, _Tp const&, _Tp const&>,
@@ -1466,7 +1540,36 @@ private:
 
     return __dest;
   }
+
+  friend struct __specialized_algorithm<_Algorithm::__for_each, __tree>;
+};
+
+#if _LIBCPP_STD_VER >= 14
+template <class _Tp, class _Compare, class _Allocator>
+struct __specialized_algorithm<_Algorithm::__for_each, __tree<_Tp, _Compare, _Allocator> > {
+  static const bool __has_algorithm = true;
+
+  using __node_pointer _LIBCPP_NODEBUG = typename __tree<_Tp, _Compare, _Allocator>::__node_pointer;
+
+  template <class _Func, class _Proj>
+#ifndef _LIBCPP_COMPILER_GCC
+  _LIBCPP_HIDE_FROM_ABI
+#endif
+  static void __impl(__node_pointer __root, _Func& __func, _Proj& __proj) {
+    if (__root->__left_)
+      __impl(static_cast<__node_pointer>(__root->__left_), __func, __proj);
+    __func(__root->__get_value());
+    if (__root->__right_)
+      __impl(static_cast<__node_pointer>(__root->__right_), __func, __proj);
+  }
+
+  template <class _Tree, class _Func, class _Proj>
+  _LIBCPP_HIDE_FROM_ABI static auto operator()(_Tree&& __range, _Func __func, _Proj __proj) {
+    __impl(__range.__root(), __func, __proj);
+    return std::make_pair(__range.end(), std::move(__func));
+  }
 };
+#endif
 
 // Precondition:  __size_ != 0
 template <class _Tp, class _Compare, class _Allocator>
diff --git a/libcxx/include/map b/libcxx/include/map
index 3ff849afcde09..99bda570295ae 100644
--- a/libcxx/include/map
+++ b/libcxx/include/map
@@ -577,6 +577,7 @@ erase_if(multimap<Key, T, Compare, Allocator>& c, Predicate pred);  // C++20
 #  include <__algorithm/equal.h>
 #  include <__algorithm/lexicographical_compare.h>
 #  include <__algorithm/lexicographical_compare_three_way.h>
+#  include <__algorithm/specialized_algorithms.h>
 #  include <__assert>
 #  include <__config>
 #  include <__functional/binary_function.h>
@@ -1375,6 +1376,8 @@ private:
 #  ifdef _LIBCPP_CXX03_LANG
   _LIBCPP_HIDE_FROM_ABI __node_holder __construct_node_with_key(const key_type& __k);
 #  endif
+
+  friend struct __specialized_algorithm<_Algorithm::__for_each, map>;
 };
 
 #  if _LIBCPP_STD_VER >= 17
@@ -1427,6 +1430,23 @@ map(initializer_list<pair<_Key, _Tp>>, _Allocator)
     -> map<remove_const_t<_Key>, _Tp, less<remove_const_t<_Key>>, _Allocator>;
 #  endif
 
+#  if _LIBCPP_STD_VER >= 14
+template <class _Key, class _Tp, class _Compare, class _Allocator>
+struct __specialized_algorithm<_Algorithm::__for_each, map<_Key, _Tp, _Compare, _Allocator>> {
+  using __map _LIBCPP_NODEBUG = map<_Key, _Tp, _Compare, _Allocator>;
+
+  static const bool __has_algorithm = true;
+
+  // set's begin() and end() are identical with and without const qualification
+  template <class _Map, class _Func>
+  _LIBCPP_HIDE_FROM_ABI static auto operator()(_Map&& __map, _Func __func) {
+    auto [_, __func2] = __specialized_algorithm<_Algorithm::__for_each, typename __map::__base>()(
+        __map.__tree_, std::move(__func));
+    return std::make_pair(__map.end(), std::move(__func2));
+  }
+};
+#  endif
+
 #  ifndef _LIBCPP_CXX03_LANG
 template <class _Key, class _Tp, class _Compare, class _Allocator>
 map<_Key, _Tp, _Compare, _Allocator>::map(map&& __m, const allocator_type& __a)
@@ -1940,6 +1960,8 @@ private:
 
   typedef __map_node_destructor<__node_allocator> _Dp;
   typedef unique_ptr<__node, _Dp> __node_holder;
+
+  friend struct __specialized_algorithm<_Algorithm::__for_each, multimap>;
 };
 
 #  if _LIBCPP_STD_VER >= 17
@@ -1992,6 +2014,23 @@ multimap(initializer_list<pair<_Key, _Tp>>, _Allocator)
     -> multimap<remove_const_t<_Key>, _Tp, less<remove_const_t<_Key>>, _Allocator>;
 #  endif
 
+#  if _LIBCPP_STD_VER >= 14
+template <class _Key, class _Tp, class _Compare, class _Allocator>
+struct __specialized_algorithm<_Algorithm::__for_each, multimap<_Key, _Tp, _Compare, _Allocator>> {
+  using __map _LIBCPP_NODEBUG = multimap<_Key, _Tp, _Compare, _Allocator>;
+
+  static const bool __has_algorithm = true;
+
+  // set's begin() and end() are identical with and without const qualification
+  template <class _Map, class _Func>
+  _LIBCPP_HIDE_FROM_ABI static auto operator()(_Map&& __map, _Func __func) {
+    auto [_, __func2] = __specialized_algorithm<_Algorithm::__for_each, typename __map::__base>()(
+        __map.__tree_, std::move(__func));
+    return std::make_pair(__map.end(), std::move(__func2));
+  }
+};
+#  endif
+
 #  ifndef _LIBCPP_CXX03_LANG
 template <class _Key, class _Tp, class _Compare, class _Allocator>
 multimap<_Key, _Tp, _Compare, _Allocator>::multimap(multimap&& __m, const allocator_type& __a)
diff --git a/libcxx/include/module.modulemap.in b/libcxx/include/module.modulemap.in
index a86d6c6a43d0e..bff35283f5fc8 100644
--- a/libcxx/include/module.modulemap.in
+++ b/libcxx/include/module.modulemap.in
@@ -838,6 +838,7 @@ module std [system] {
     module simd_utils                             { header "__algorithm/simd_utils.h" }
     module sort_heap                              { header "__algorithm/sort_heap.h" }
     module sort                                   { header "__algorithm/sort.h" }
+    module specialized_algorithms                 { header "__algorithm/specialized_algorithms.h" }
     module stable_partition                       { header "__algorithm/stable_partition.h" }
     module stable_sort {
       header "__algorithm/stable_sort.h"
diff --git a/libcxx/include/set b/libcxx/include/set
index 59ed0155c1def..fd8e63a967ff5 100644
--- a/libcxx/include/set
+++ b/libcxx/include/set
@@ -518,6 +518,7 @@ erase_if(multiset<Key, Compare, Allocator>& c, Predicate pred);  // C++20
 #  include <__algorithm/equal.h>
 #  include <__algorithm/lexicographical_compare.h>
 #  include <__algorithm/lexicographical_compare_three_way.h>
+#  include <__algorithm/specialized_algorithms.h>
 #  include <__assert>
 #  include <__config>
 #  include <__functional/is_transparent.h>
@@ -902,6 +903,9 @@ public:
     return __tree_.__equal_range_multi(__k);
   }
 #  endif
+
+  template <class, class>
+  friend struct __specialized_algorithm;
 };
 
 #  if _LIBCPP_STD_VER >= 17
@@ -948,6 +952,21 @@ template <class _Key, class _Allocator, class = enable_if_t<__is_allocator_v<_Al
 set(initializer_list<_Key>, _Allocator) -> set<_Key, less<_Key>, _Allocator>;
 #  endif
 
+#  if _LIBCPP_STD_VER >= 14
+template <class _Alg, class _Key, class _Compare, class _Allocator>
+struct __specialized_algorithm<_Alg, set<_Key, _Compare, _Allocator>> {
+  using __set _LIBCPP_NODEBUG = set<_Key, _Compare, _Allocator>;
+
+  static const bool __has_algorithm = __specialized_algorithm<_Alg, typename __set::__base>::__has_algorithm;
+
+  // set's begin() and end() are identical with and without const qualification
+  template <class... _Args>
+  _LIBCPP_HIDE_FROM_ABI static auto operator()(const __set& __set, _Args&&... __args) {
+    return __specialized_algorithm<_Alg, typename __set::__base>()(__set.__tree_, std::forward<_Args>(__args)...);
+  }
+};
+#  endif
+
 #  ifndef _LIBCPP_CXX03_LANG
 
 template <class _Key, class _Compare, class _Allocator>
@@ -1362,6 +1381,9 @@ public:
     return __tree_.__equal_range_multi(__k);
   }
 #  endif
+
+  template <class, class>
+  friend struct __specialized_algorithm;
 };
 
 #  if _LIBCPP_STD_VER >= 17
@@ -1409,6 +1431,21 @@ template <class _Key, class _Allocator, class = enable_if_t<__is_allocator_v<_Al
 multiset(initializer_list<_Key>, _Allocator) -> multiset<_Key, less<_Key>, _Allocator>;
 #  endif
 
+#  if _LIBCPP_STD_VER >= 14
+template <class _Alg, class _Key, class _Compare, class _Allocator>
+struct __specialized_algorithm<_Alg, multiset<_Key, _Compare, _Allocator>> {
+  using __set _LIBCPP_NODEBUG = multiset<_Key, _Compare, _Allocator>;
+
+  static const bool __has_algorithm = __specialized_algorithm<_Alg, typename __set::__base>::__has_algorithm;
+
+  // set's begin() and end() are identical with and without const qualification
+  template <class... _Args>
+  _LIBCPP_HIDE_FROM_ABI static auto operator()(const __set& __set, _Args&&... __args) {
+    return __specialized_algorithm<_Alg, typename __set::__base>()(__set.__tree_, std::forward<_Args>(__args)...);
+  }
+};
+#  endif
+
 #  ifndef _LIBCPP_CXX03_LANG
 
 template <class _Key, class _Compare, class _Allocator>
diff --git a/libcxx/test/benchmarks/algorithms/nonmodifying/for_each.bench.cpp b/libcxx/test/benchmarks/algorithms/nonmodifying/for_each.bench.cpp
index f58f336f8b892..0b42dec064ff8 100644
--- a/libcxx/test/benchmarks/algorithms/nonmodifying/for_each.bench.cpp
+++ b/libcxx/test/benchmarks/algorithms/nonmodifying/for_each.bench.cpp
@@ -23,7 +23,7 @@ int main(int argc, char** argv) {
 
   // {std,ranges}::for_each
   {
-    auto bm = []<class Container>(std::string name, auto for_each) {
+    auto sequence_bm = []<class Container>(std::string name, auto for_each) {
       using ElemType = typename Container::value_type;
       benchmark::RegisterBenchmark(
           name,
@@ -44,12 +44,52 @@ int main(int argc, char** argv) {
           ->Arg(50) // non power-of-two
           ->Arg(8192);
     };
-    bm.operator()<std::vector<int>>("std::for_each(vector<int>)", std_for_each);
-    bm.operator()<std::deque<int>>("std::for_each(deque<int>)", std_for_each);
-    bm.operator()<std::list<int>>("std::for_each(list<int>)", std_for_each);
-    bm.operator()<std::vector<int>>("rng::for_each(vector<int>)", std::ranges::for_each);
-    bm.operator()<std::deque<int>>("rng::for_each(deque<int>)", std::ranges::for_each);
-    bm.operator()<std::list<int>>("rng::for_each(list<int>)", std::ranges::for_each);
+    sequence_bm.operator()<std::vector<int>>("std::for_each(vector<int>)", std_for_each);
+    sequence_bm.operator()<std::deque<int>>("std::for_each(deque<int>)", std_for_each);
+    sequence_bm.operator()<std::list<int>>("std::for_each(list<int>)", std_for_each);
+    sequence_bm.operator()<std::vector<int>>("rng::for_each(vector<int>)", std::ranges::for_each);
+    sequence_bm.operator()<std::deque<int>>("rng::for_each(deque<int>)", std::ranges::for_each);
+    sequence_bm.operator()<std::list<int>>("rng::for_each(list<int>)", std::ranges::for_each);
+
+    auto associative_bm = []<class Container>(std::type_identity<Container>, std::string name, auto for_each) {
+      benchmark::RegisterBenchmark(
+          name,
+          [for_each](auto& st) {
+            Container c;
+            for (int64_t i = 0; i != st.range(0); ++i)
+              c.insert(i);
+
+            for (auto _ : st) {
+              benchmark::DoNotOptimize(c);
+              for_each(c.begin(), c.end(), [](auto v) { benchmark::DoNotOptimize(v); });
+            }
+          })
+          ->Arg(8)
+          ->Arg(32)
+          ->Arg(50) // non power-of-two
+          ->Arg(8192);
+    };
+    associative_bm(std::type_identity<std::set<int>>{}, "rng::for_each(set<int>::iterator)", std::ranges::for_each);
+
+    auto associative_ranges_bm = []<class Container>(std::type_identity<Container>, std::string name, auto for_each) {
+      benchmark::RegisterBenchmark(
+          name,
+          [for_each](auto& st) {
+            Container c;
+            for (int64_t i = 0; i != st.range(0); ++i)
+              c.insert(i);
+
+            for (auto _ : st) {
+              benchmark::DoNotOptimize(c);
+              for_each(c, [](auto v) { benchmark::DoNotOptimize(v); });
+            }
+          })
+          ->Arg(8)
+          ->Arg(32)
+          ->Arg(50) // non power-of-two
+          ->Arg(8192);
+    };
+    associative_ranges_bm(std::type_identity<std::set<int>>{}, "rng::for_each(set<int>)", std::ranges::for_each);
   }
 
   // {std,ranges}::for_each for join_view
diff --git a/libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp b/libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp
index 3db0bde75abd7..6a68aa7702c21 100644
--- a/libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp
+++ b/libcxx/test/std/algorithms/alg.nonmodifying/alg.foreach/for_each.pass.cpp
@@ -15,9 +15,9 @@
 #include <algorithm>
 #include <cassert>
 #include <deque>
-#if __has_include(<ranges>)
-#  include <ranges>
-#endif
+#includ...
[truncated]

@philnik777 philnik777 force-pushed the optimize_tree_iteration branch 2 times, most recently from fae64c3 to f32b24d Compare November 27, 2025 14:56

In the commit message, let's explain the gist of what this optimization does.

@philnik777 philnik777 changed the title from "[libc++] Optimize ranges::for_each for iterating over __trees" to "[libc++] Optimize {std,ranges}::for_each for iterating over __trees" Dec 2, 2025
[libc++] Optimize std::for_each for __tree iterators
@philnik777 philnik777 force-pushed the optimize_tree_iteration branch from f32b24d to 9d8f36b Compare December 2, 2025 14:52

# if _LIBCPP_STD_VER >= 14
template <class _Key, class _Tp, class _Compare, class _Allocator>
struct __specialized_algorithm<_Algorithm::__for_each, __single_range<map<_Key, _Tp, _Compare, _Allocator>>> {

Did you also intend to have a specialization for the two-iterator version for map and multimap?


// template<InputIterator Iter, class Function>
// constexpr Function // constexpr since C++20
// for_each(Iter first, Iter last, Function f);

Please add a plain English comment explaining what this is testing (i.e. how it's different from for_each.pass.cpp).


// <algorithm>

// template<InputIterator Iter, class Function>

Please add a plain English comment explaining what this is testing (i.e. how it's different from ranges.for_each.pass.cpp).


// template<InputIterator Iter, class Function>
// constexpr Function // constexpr since C++20
// for_each(Iter first, Iter last, Function f);

This is the wrong signature.

};

#ifndef _LIBCPP_CXX03_LANG
template <class _Tp, class _NodePtr, class _DiffType>

We need to add a comment explaining that this handles std::set::iterator and std::multiset::iterator in addition to __tree::iterator, since that's not obvious.

Same below for const_iterator.
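
A minimal sketch of why a single specialization keyed on the tree iterator type can also cover the containers built on top of the tree: if a container's iterator typedef is an alias for the tree's iterator, the same specialization is selected for it. All names here (`tree_iterator`, `toy_set`, `toy_multiset`, `specialized_for_each`, `uses_fast_path`) are hypothetical stand-ins, not libc++'s actual declarations.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the tree and the containers built on it.
template <class T> struct tree_iterator {};
template <class T> struct tree { using iterator = tree_iterator<T>; };
template <class T> struct toy_set { using iterator = typename tree<T>::iterator; };
template <class T> struct toy_multiset { using iterator = typename tree<T>::iterator; };

// Primary template: no specialized algorithm is available.
template <class Iter>
struct specialized_for_each {
  static constexpr bool has_algorithm = false;
};

// One specialization for the tree iterator. Because the set-like iterator
// typedefs above alias the same type, they are handled by it as well.
template <class T>
struct specialized_for_each<tree_iterator<T>> {
  static constexpr bool has_algorithm = true;
};

// True when the specialized path would be selected for Iter.
template <class Iter>
constexpr bool uses_fast_path() {
  return specialized_for_each<Iter>::has_algorithm;
}
```

In this sketch, `uses_fast_path<toy_set<int>::iterator>()` is true while `uses_fast_path<int*>()` is false, which is the non-obvious coupling a source comment would need to spell out.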

associative_ranges_bm(
std::type_identity<std::set<int>>{},
std::false_type{},
"rng::for_each(set<int>::iterator)",

These benchmark names are wrong, they should be rng::for_each(set<int>).

sequence_bm.operator()<std::deque<int>>("rng::for_each(deque<int>)", std::ranges::for_each);
sequence_bm.operator()<std::list<int>>("rng::for_each(list<int>)", std::ranges::for_each);

auto associative_bm =

Let's move these benchmarks to their own scope, like we did for // {std,ranges}::for_each for join_view.

That way you can also avoid renaming bm to sequence_bm above.


auto associative_bm =
[]<class Container, bool IsMapLike>(
std::type_identity<Container>, std::bool_constant<IsMapLike>, std::string name, auto for_each) {

I don't think you actually need auto for_each to be a parameter to this function.

}
{
int invoke_count = 0;
std::ranges::for_each(c, [&c, &invoke_count](const value_type& i) {

We need to test on containers with different sizes, at least 0, 1, 2 and then N.
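
One possible shape for such a size-parameterized check is sketched below. The helper name `for_each_visits_all` is hypothetical, and the sketch uses the iterator-pair `std::for_each` so that it compiles before C++20; the `std::ranges::for_each` form would be analogous.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Returns true if std::for_each over a set holding 0..n-1 visits every
// element exactly once and in ascending order.
inline bool for_each_visits_all(int n) {
  std::set<int> c;
  for (int i = 0; i != n; ++i)
    c.insert(i);

  std::vector<int> seen;
  std::for_each(c.begin(), c.end(), [&seen](const int& v) { seen.push_back(v); });

  return seen.size() == c.size() && std::equal(seen.begin(), seen.end(), c.begin());
}
```

Calling it with 0, 1, 2 and some larger N covers the empty tree, a lone root, a root with a single child, and a tree with several levels.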

{ // Make sure that an empty range works
{
int invoke_count = 0;
std::ranges::for_each(c.begin(), c.begin(), [&c, &invoke_count](const value_type& i) {

I think we should also test for_each(c.end(), c.end()).
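
A sketch of what covering both empty ranges could look like, again with the iterator-pair `std::for_each`; the helper name `count_calls_on_empty_ranges` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Counts how often the callback runs for the empty ranges [begin, begin)
// and [end, end). The expected result is 0 for any container.
inline int count_calls_on_empty_ranges(const std::set<int>& c) {
  int calls  = 0;
  auto count = [&calls](const int&) { ++calls; };
  std::for_each(c.begin(), c.begin(), count);
  std::for_each(c.end(), c.end(), count);
  return calls;
}
```

The [end, end) case is worth its own check because the end iterator does not refer to a regular value-holding node, so a specialized traversal must return before dereferencing it.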
