-
Notifications
You must be signed in to change notification settings - Fork 15.3k
Speed up compilation of common uses of std::visit() #164196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Speed up compilation of common uses of std::visit() #164196
Conversation
|
@llvm/pr-subscribers-libcxx Author: None (higher-performance) Changes
This PR hard-codes common cases to speed up compilation by roughly ~8x for them. Full diff: https://github.com/llvm/llvm-project/pull/164196.diff 1 Files Affected:
diff --git a/libcxx/include/variant b/libcxx/include/variant
index 9beef146f203c..ef5bca4c2fda0 100644
--- a/libcxx/include/variant
+++ b/libcxx/include/variant
@@ -1578,11 +1578,48 @@ _LIBCPP_HIDE_FROM_ABI constexpr void __throw_if_valueless(_Vs&&... __vs) {
}
}
-template < class _Visitor, class... _Vs, typename>
-_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor, _Vs&&... __vs) {
- using __variant_detail::__visitation::__variant;
- std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
- return __variant::__visit_value(std::forward<_Visitor>(__visitor), std::forward<_Vs>(__vs)...);
+template <class _Visitor, class... _Vs, typename>
+_LIBCPP_HIDE_FROM_ABI constexpr decltype(auto) visit(_Visitor&& __visitor,
+ _Vs&&... __vs) {
+#define _XDispatchIndex(_I) \
+ case _I: \
+ if constexpr (__variant_size::value > _I) { \
+ return __visitor( \
+ __variant::__get_alt<_I>(std::forward<_Vs>(__vs)...).__value); \
+ } \
+ [[__fallthrough__]]
+#define _XDispatchMax 7 // Speed up compilation for the common cases
+ if constexpr (sizeof...(_Vs) == 1) {
+ if constexpr (variant_size<__remove_cvref_t<_Vs>...>::value <=
+ _XDispatchMax) {
+ using __variant_detail::__access::__variant;
+ using __variant_size = variant_size<__remove_cvref_t<_Vs>...>;
+ const size_t __indexes[] = {__vs.index()...};
+ switch (__indexes[0]) {
+ _XDispatchIndex(_XDispatchMax - 7);
+ _XDispatchIndex(_XDispatchMax - 6);
+ _XDispatchIndex(_XDispatchMax - 5);
+ _XDispatchIndex(_XDispatchMax - 4);
+ _XDispatchIndex(_XDispatchMax - 3);
+ _XDispatchIndex(_XDispatchMax - 2);
+ _XDispatchIndex(_XDispatchMax - 1);
+ _XDispatchIndex(_XDispatchMax - 0);
+ default:
+ __throw_bad_variant_access();
+ }
+ } else {
+ static_assert(
+ variant_size<__remove_cvref_t<_Vs>...>::value > _XDispatchMax,
+ "forgot to add dispatch case");
+ }
+ } else {
+ using __variant_detail::__visitation::__variant;
+ std::__throw_if_valueless(std::forward<_Vs>(__vs)...);
+ return __variant::__visit_value(std::forward<_Visitor>(__visitor),
+ std::forward<_Vs>(__vs)...);
+ }
+#undef _XDispatchMax
+#undef _XDispatchIndex
}
# if _LIBCPP_STD_VER >= 20
|
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
6cfb3c0 to
cc3f6cd
Compare
350f45c to
7c3c8e6
Compare
|
Could someone please take a look at this? @philnik777 or anybody else involved? |
philnik777
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this basically fixes #62648. I'm not sure how to proceed here. Previously there were problems with this approach generating a lot of code, which is why it was reverted. This patch is a lot more conservative at least. I guess I'd really like if we had reflection here, since that would allow us to generate the perfect switch/case for any variant. I think I'd be fine with this as a temporary solution. I'd really like you to check compile time and code generation overhead of this though, since, as mentioned, this was a big problem previously.
Please run the variant benchmarks as well and share the results.
Sounds good, thanks. |
ed6da80 to
186c480
Compare
186c480 to
cb49b31
Compare
|
Done. First, re: the runtime benchmarks, I had to run them a bit ad-hoc via googlebenchmark since I don't have the official setup handy, but regardless -- they actually indicate a speedup for < 8 elements: Before: After: As for compile-time benchmarking, I also tested it like this: Under
My setup/system is a bit different from last time, so it's not quite 8x here, but still, it's a huge win. tl;dr: it's a strict win on every axis I measure. @philnik777 |
ab3fa02 to
d4b0d1e
Compare
d4b0d1e to
5d47da1
Compare
std::visiton my machine costs roughly 10 milliseconds per unique invocation to compile, measurable as follows:This PR hard-codes common cases to speed up compilation by roughly ~8x for them.