Adding different types of parallelism to the elementwise layer #222

AndreySorokin7 · 2025-11-04T09:24:41Z

No description provided.

allnes · 2025-11-08T11:46:08Z

include/layers/EWLayer.hpp

  EWLayerImpl() = delete;
  EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F,
-              float beta = 0.0F);
+              float beta = 0.0F, int type_parall = 0);


Use a strongly-typed backend enum instead of int for readability and safety.

enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };

Propagate ParBackend through the API instead of a raw int.
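A minimal sketch of what propagating the enum could look like. `EWLayerSketch` is a hypothetical stand-in for `EWLayerImpl` (the `Shape` parameter is left out so the snippet stands alone); only `ParBackend` is taken from the suggestion above.

```cpp
#include <string>
#include <utility>

// Backend enum as suggested in the review comment.
enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };

// Hypothetical stand-in for EWLayerImpl; not the class from the PR.
class EWLayerSketch {
 public:
  explicit EWLayerSketch(std::string function, float alpha = 0.0F,
                         float beta = 0.0F,
                         ParBackend backend = ParBackend::Seq)
      : function_(std::move(function)),
        alpha_(alpha),
        beta_(beta),
        backend_(backend) {}

  const std::string& function() const { return function_; }
  float alpha() const { return alpha_; }
  float beta() const { return beta_; }
  ParBackend backend() const { return backend_; }

 private:
  std::string function_;
  float alpha_;
  float beta_;
  ParBackend backend_;
};
```

With a scoped enum in the signature, a call such as `EWLayerSketch("relu", 0.0F, 0.0F, 2)` no longer compiles, because an `int` does not convert implicitly to `ParBackend`.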

allnes · 2025-11-08T13:08:39Z

include/layers/EWLayer.hpp

+  int available_threads = -1;
+  if (type_parall_ == 0) available_threads = 1;
+  if (type_parall_ == 1)
+    available_threads = std::thread::hardware_concurrency();
+  if (type_parall_ == 2)
+    available_threads = oneapi::tbb::info::default_concurrency();
+  if (type_parall_ == 3) available_threads = omp_get_max_threads();


Please wrap this in a common function that returns the thread count for a given backend.
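One way such a helper might look. The name `available_threads` is hypothetical, and the `HAS_OPENMP` / `HAS_TBB` flags follow the guarded-include suggestion in this review; they are assumed to be defined by the build system when the corresponding backend is enabled.

```cpp
#include <thread>
#ifdef HAS_OPENMP
#include <omp.h>
#endif
#ifdef HAS_TBB
#include <oneapi/tbb/info.h>
#endif

enum class ParBackend { Seq = 0, Threads = 1, TBB = 2, OMP = 3 };

// Hypothetical helper: how many workers the given backend may use.
// Backends that were not compiled in fall back to a single thread.
inline int available_threads(ParBackend backend) {
  switch (backend) {
    case ParBackend::Seq:
      return 1;
    case ParBackend::Threads: {
      // hardware_concurrency() may return 0 when the value is unknown.
      const unsigned hw = std::thread::hardware_concurrency();
      return hw == 0 ? 1 : static_cast<int>(hw);
    }
    case ParBackend::TBB:
#ifdef HAS_TBB
      return oneapi::tbb::info::default_concurrency();
#else
      return 1;
#endif
    case ParBackend::OMP:
#ifdef HAS_OPENMP
      return omp_get_max_threads();
#else
      return 1;
#endif
  }
  return 1;  // only reached for values outside the enum
}
```

Compared with the chain of `if` statements in the diff, the switch never leaves the result at `-1`, and it handles the case where `std::thread::hardware_concurrency()` reports 0.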

allnes · 2025-11-09T10:52:50Z

include/layers/Layer.hpp

@@ -1,5 +1,11 @@
 #pragma once
+#include <omp.h>


Guard the OpenMP/TBB includes and add <thread>; otherwise non-OpenMP builds fail.

#ifdef HAS_OPENMP
#include <omp.h>
#endif
#include <thread>
#ifdef HAS_TBB
#include <oneapi/tbb/blocked_range.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/info.h>
#endif

allnes · 2025-11-09T10:57:25Z

include/layers/EWLayer.hpp

  EWLayerImpl() = delete;
  EWLayerImpl(const Shape& shape, std::string function, float alpha = 0.0F,
-              float beta = 0.0F);
+              float beta = 0.0F, int type_parall = 0);


Propagate ParBackend through the API instead of a raw int.

allnes · 2025-11-09T11:02:59Z

include/layers/Layer.hpp

 };

+template <typename Func>
+inline void parallel_for(int count, Func func, int mode = 0) {


Suggested change

-inline void parallel_for(int count, Func func, int mode = 0) {
+inline void parallel_for(int count, Func func, int mode = 0) {
+  if (count <= 0) return;
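One reason the guard is useful: chunked backends typically clamp the worker count to `count` and then divide by it, which is a division by zero for an empty range. The sketch below shows only the partitioning step; `split_range` is a hypothetical helper, not code from the PR, and each returned range would be handed to one worker.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, count) into at most `workers` contiguous
// half-open ranges [begin, end).
inline std::vector<std::pair<int, int>> split_range(int count, int workers) {
  std::vector<std::pair<int, int>> ranges;
  if (count <= 0 || workers <= 0) return ranges;  // same guard as suggested

  workers = std::min(workers, count);  // never more chunks than items
  const int chunk = (count + workers - 1) / workers;  // ceiling division
  for (int begin = 0; begin < count; begin += chunk) {
    ranges.emplace_back(begin, std::min(count, begin + chunk));
  }
  return ranges;
}
```

For `split_range(10, 4)` the chunk size is 3, so the ranges are [0, 3), [3, 6), [6, 9) and [9, 10).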

allnes · 2025-11-09T11:10:54Z

include/layers/Layer.hpp

@@ -1,5 +1,11 @@
 #pragma once
+#include <omp.h>


Move all backend headers (OpenMP/TBB/Threads) and implementation details into a small parallel module. Expose a single, inline header API so call sites incur no extra call/indirection.

include/parallel/parallel.hpp (inline API)

include/parallel/backends.hpp (backend helpers; guarded includes)

No <omp.h>/TBB headers leaking into layer headers.

Example:

// include/parallel/parallel.hpp
#pragma once
#include <cstddef>

enum class ParBackend { Auto, Seq, Threads, TBB, OMP };

struct ParOptions {
  ParBackend backend = ParBackend::Auto;
  int max_threads = 0;                // 0 = runtime default
  std::size_t min_parallel_n = 4096;  // small tasks stay sequential
  std::size_t grain = 1024;           // backend-specific chunk hint
};

// Header-only: one branch + inlined backend
template <class F>
inline void parallel_for(std::size_t n, F&& f, const ParOptions& opt) {
  if (n == 0) return;
  const ParBackend b = select_backend(opt, n);  // inline, cheap
  switch (b) {
    case ParBackend::Seq: return impl_seq(n, f);
    case ParBackend::Threads: return impl_threads(n, f, opt);
    case ParBackend::TBB: return impl_tbb(n, f, opt);
    case ParBackend::OMP: return impl_omp(n, f, opt);
    case ParBackend::Auto: return impl_seq(n, f);  // unreachable
  }
}
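The `impl_*` helpers are not shown in the example. As a rough sketch of what the OpenMP one might look like: `impl_omp_sketch` is a hypothetical name, it ignores `ParOptions`, and it keys off the standard `_OPENMP` macro (defined by the compiler when OpenMP is enabled) rather than a project-specific flag. Without OpenMP it degrades to a plain loop, so the header still compiles.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical backend helper: calls f(i) for every i in [0, n).
// f must be safe to call concurrently for distinct indices.
template <class F>
inline void impl_omp_sketch(std::size_t n, F&& f) {
#ifdef _OPENMP
  // Signed loop index: accepted by every OpenMP version.
  const std::int64_t count = static_cast<std::int64_t>(n);
#pragma omp parallel for
  for (std::int64_t i = 0; i < count; ++i) {
    f(static_cast<std::size_t>(i));
  }
#else
  // OpenMP not enabled: run sequentially.
  for (std::size_t i = 0; i < n; ++i) {
    f(i);
  }
#endif
}
```

Keeping the pragma inside a helper like this is what lets `<omp.h>` and the OpenMP-specific code stay out of the layer headers.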

allnes · 2025-11-09T11:12:22Z

include/layers/Layer.hpp

@@ -1,5 +1,11 @@
 #pragma once
+#include <omp.h>
+


Avoid re-evaluating “Auto” logic every call. Resolve once (feature flags + environment + problem size) and cache in the layer/context.

// Called once per layer or first use
inline ParBackend resolve_auto_once(const ParOptions& opt,
                                    std::size_t n) noexcept {
#if defined(HAS_OMP)
  if (n >= opt.min_parallel_n) return ParBackend::OMP;
#elif defined(HAS_TBB)
  if (n >= opt.min_parallel_n) return ParBackend::TBB;
#elif defined(HAS_THREADS)
  if (n >= opt.min_parallel_n) return ParBackend::Threads;
#endif
  return ParBackend::Seq;
}

inline ParBackend select_backend(const ParOptions& opt,
                                 std::size_t n) noexcept {
  if (opt.backend != ParBackend::Auto) return opt.backend;
  static ParBackend cached = resolve_auto_once(opt, n);  // or store in the layer
  return cached;
}
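A function-local `static` is initialized once per process, so with the `static` variant the first caller's `n` and options decide the backend for every later call. A sketch of the "store in the layer" alternative follows; `CachedBackendLayer` and `resolve_auto` are hypothetical names, and the choice is reduced to Threads vs. Seq so the snippet does not depend on build flags.

```cpp
#include <cstddef>

enum class ParBackend { Auto, Seq, Threads, TBB, OMP };

struct ParOptions {
  ParBackend backend = ParBackend::Auto;
  std::size_t min_parallel_n = 4096;  // small tasks stay sequential
};

// Simplified stand-in for resolve_auto_once: an explicit backend wins,
// otherwise the problem size decides.
inline ParBackend resolve_auto(const ParOptions& opt, std::size_t n) noexcept {
  if (opt.backend != ParBackend::Auto) return opt.backend;
  return n >= opt.min_parallel_n ? ParBackend::Threads : ParBackend::Seq;
}

// Hypothetical layer: resolves the backend once in the constructor and
// reuses the stored value on every forward pass.
class CachedBackendLayer {
 public:
  CachedBackendLayer(std::size_t element_count, const ParOptions& opt)
      : backend_(resolve_auto(opt, element_count)) {}

  ParBackend backend() const noexcept { return backend_; }

 private:
  ParBackend backend_;
};
```

Each layer then carries its own decision, so a small layer and a large layer in the same network can end up on different backends.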

aobolensk

I think we can leave the rest of the solution basically as is. The OpenMP slowdown is reproducible, but I suggest focusing on parallel_for itself. In any case, the effect is not very visible on matrix multiplication workloads. For further investigation we will look at the compilation details (what code it has been lowered to). For now we can proceed as is.

1

f14195d

AndreySorokin7 requested review from allnes and aobolensk as code owners November 4, 2025 09:24

AndreySorokin7 and others added 6 commits November 5, 2025 19:17

fix

b19191f

Merge branch 'main' into AndreySorokin7/Add_parall_ew_layer

9efc295

fix

30a33ff

fix

4a3d16e

fix

406387d

fix

6f82796

allnes reviewed Nov 8, 2025


allnes reviewed Nov 9, 2025


aobolensk approved these changes Nov 9, 2025


AndreySorokin7 added 12 commits November 12, 2025 16:24

fix

3cb8263

fix

2d369b6

fix

0412b6a

fix

5521152

fix

4293356

fix

0ba7e1b

fix

f5f0f14

fix

8222f23

fix

0bb0d02

fix

66b3b93

fix

04c3815

fix

7430245

4 participants
