Skip to content

Conversation

@rocallahan
Copy link
Contributor

@rocallahan rocallahan commented Dec 3, 2025

If your work is part of a larger effort, please discuss your general plans on Discourse first to align your vision with maintainers.

https://yosyshq.discourse.group/t/parallel-optmergepass-implementation/

What are the reasons/motivation for this change?

Massive speedups on large modules. On a particular real-world flattened design with millions of cells, at 20 cores I get 14.5x speedup on an opt_merge that takes 15 iterations internally. On a 48-core machine speedup levels out at 25x.

Another way to get massive speedups would be to make opt_merge incremental, i.e. at each step only consider merging cells whose state or connected cells have changed since the last iteration of opt_merge. That would be more efficient, but also more invasive since information about what's changed would have to be kept up to date across passes. Anyway, see the discussion in the Discourse thread.

Explain how this is achieved.

This is the same algorithm presented in the Discourse thread, but over the last few months we've upgraded RTLIL to be thread-safe for read-only access, so there is no extra TRTLIL layer anymore and a lot less code needs to change here.

I'm not sure why but this is actually faster than existing opt_merge even with YOSYS_MAX_THREADS=1, for the jpeg synthesis test. 16.0s before, 15.5s after for end-to-end synthesis.

@widlarizer widlarizer self-assigned this Dec 3, 2025
@widlarizer widlarizer self-requested a review December 3, 2025 13:48
Copy link
Collaborator

@widlarizer widlarizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly enough, I see a tsan failure at opt_merge_init.ys with a setup that yields Yosys 0.57+244 (git sha1 329ff2687, ccache clang++ 16.0.6 -Og -fPIC -O1 -fno-omit-frame-pointer -fno-optimize-sibling-calls -fsanitize=thread)

WARNING: ThreadSanitizer: data race (pid=...)
  Write of size 4 at 0x... by thread T5:
    #0 Yosys::RTLIL::Const::is_fully_def() const ...yosys/kernel/rtlil.cc:725:2 (yosys+0xa57c79)
    #1 (anonymous namespace)::OptMergeThreadWorker::has_dont_care_initval(Yosys::TRTLIL::RCell) const ...yosys/passes/opt/opt_merge.cc:250:53 (yosys+0x...)
    #2 (anonymous namespace)::OptMergeThreadWorker::compute_cell_hashes((anonymous namespace)::ComputeCellHashes const&) const ...yosys/passes/opt/opt_merge.cc:275:23 (yosys+0x...)
    #3 (anonymous namespace)::OptMergeWorker::OptMergeWorker(Yosys::RTLIL::Module*, bool, bool, bool)::'lambda'(int)::operator()(int) const ...yosys/passes/opt/opt_merge.cc:386:56 (yosys+0x...)
...

  Previous write of size 4 at 0x... by thread T1:
    #0 Yosys::RTLIL::Const::is_fully_def() const ...yosys/kernel/rtlil.cc:725:2 (yosys+0xa57c79)
    #1 (anonymous namespace)::OptMergeThreadWorker::has_dont_care_initval(Yosys::TRTLIL::RCell) const ...yosys/passes/opt/opt_merge.cc:250:53 (yosys+0x...)
    #2 (anonymous namespace)::OptMergeThreadWorker::compute_cell_hashes((anonymous namespace)::ComputeCellHashes const&) const ...yosys/passes/opt/opt_merge.cc:275:23 (yosys+0x...)
    #3 (anonymous namespace)::OptMergeWorker::OptMergeWorker(Yosys::RTLIL::Module*, bool, bool, bool)::'lambda'(int)::operator()(int) const ...yosys/passes/opt/opt_merge.cc:386:56 (yosys+0x...)
...

  Location is global 'Yosys::RTLIL::Const::is_fully_def() const::__d' of size 32 at 0x... (yosys+0x...)

Haven't taken a close look at the code change yet

@widlarizer
Copy link
Collaborator

I guess this is missing some things from main? I can't easily rebase it since it doesn't build on single chunk sigspecs

I'm not sure why but this is actually faster than existing `opt_merge` even with
YOSYS_MAX_THREADS=1, for the jpeg synthesis test. 16.0s before, 15.5s after for
end-to-end synthesis.
@rocallahan
Copy link
Contributor Author

I guess this is missing some things from main? I can't easily rebase it since it doesn't build on single chunk sigspecs

Hmm, not sure what I did there. Should be fixed now.

Oddly enough, I see a tsan failure at opt_merge_init.ys

When I run make -j test locally with SANITIZER = thread the only failure I see is techmap/bug5495.sh, which seems to be a timeout due to TSAN being slow. Hopefully your race report was only due to the branch not including all the necessary RTLIL fixes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants