@hanno-becker hanno-becker commented Oct 31, 2025

  • Change mlk_polyvec back to struct { mlk_poly vec[MLKEM_K]; }
  • Change mlk_polymat to struct { mlk_polyvec vec[MLKEM_K]; }
  • Update all function signatures to use pointer style
  • Fix all implementations to use struct member access
  • Update tests, benchmarks, and CBMC harnesses
  • Add consistent const annotations
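A minimal sketch of what the struct-based layout and pointer-style signatures described above could look like. The `mlk_polyvec` and `mlk_polymat` definitions follow the bullets; the `mlk_poly` definition, the choice `MLKEM_K = 3`, and the two `*_sketch` helpers are assumptions made for illustration and are not taken from the PR.

```c
#include <stdint.h>

#define MLKEM_N 256 /* coefficients per polynomial in ML-KEM */
#define MLKEM_K 3   /* module rank; 3 (ML-KEM-768) is chosen here for illustration */

/* Assumed layout of a single polynomial (not shown in the PR description). */
typedef struct { int16_t coeffs[MLKEM_N]; } mlk_poly;

/* Struct definitions as described in the bullets above. */
typedef struct { mlk_poly vec[MLKEM_K]; } mlk_polyvec;
typedef struct { mlk_polyvec vec[MLKEM_K]; } mlk_polymat;

/* Hypothetical pointer-style function: r += b, coefficient-wise.
 * The input is const-qualified; the output is accessed via struct members. */
static void polyvec_add_sketch(mlk_polyvec *r, const mlk_polyvec *b)
{
  unsigned i, j;
  for (i = 0; i < MLKEM_K; i++)
  {
    for (j = 0; j < MLKEM_N; j++)
    {
      r->vec[i].coeffs[j] = (int16_t)(r->vec[i].coeffs[j] + b->vec[i].coeffs[j]);
    }
  }
}

/* Hypothetical accessor: matrix entry (row, col) via nested member access. */
static const mlk_poly *polymat_entry_sketch(const mlk_polymat *a, unsigned row,
                                            unsigned col)
{
  return &a->vec[row].vec[col];
}
```

With struct types, passing `mlk_polyvec *` keeps the element count in the type, whereas an array-typed parameter would decay to a plain pointer to its first element.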

@hanno-becker hanno-becker force-pushed the structured branch 3 times, most recently from f300f02 to f18ce9f November 2, 2025 05:36
@hanno-becker hanno-becker changed the title from "[WIP] Reintroduce struct definitions for mlk_poly{mat,vec}" to "Reintroduce struct definitions for mlk_poly{mat,vec}" Nov 2, 2025
@hanno-becker hanno-becker marked this pull request as ready for review November 2, 2025 05:36
@hanno-becker hanno-becker requested a review from a team as a code owner November 2, 2025 05:36
@hanno-becker hanno-becker force-pushed the structured branch 5 times, most recently from 32c1493 to 3dc4b2c November 5, 2025 19:53
@hanno-becker
Contributor Author

The runtime of polyvec_add further degrades here to >6min. That should certainly be addressed before this PR can be merged.

@hanno-becker hanno-becker force-pushed the structured branch 2 times, most recently from e9256dd to 478f245 November 7, 2025 12:12
@hanno-becker hanno-becker added the benchmark label (this PR should be benchmarked in CI) and removed the needs-work label Nov 7, 2025
@oqs-bot oqs-bot left a comment

Mac Mini (M1, 2020) benchmarks

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 12293 cycles | 12294 cycles | 1.00 |
| ML-KEM-512 encaps | 14952 cycles | 14952 cycles | 1 |
| ML-KEM-512 decaps | 19485 cycles | 19486 cycles | 1.00 |
| ML-KEM-768 keypair | 21348 cycles | 21349 cycles | 1.00 |
| ML-KEM-768 encaps | 23927 cycles | 23934 cycles | 1.00 |
| ML-KEM-768 decaps | 30495 cycles | 30501 cycles | 1.00 |
| ML-KEM-1024 keypair | 30334 cycles | 30333 cycles | 1.00 |
| ML-KEM-1024 encaps | 34542 cycles | 34540 cycles | 1.00 |
| ML-KEM-1024 decaps | 44142 cycles | 44142 cycles | 1 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 9636 cycles | 9660 cycles | 1.00 |
| ML-KEM-512 encaps | 11108 cycles | 11123 cycles | 1.00 |
| ML-KEM-512 decaps | 15078 cycles | 15126 cycles | 1.00 |
| ML-KEM-768 keypair | 16586 cycles | 16536 cycles | 1.00 |
| ML-KEM-768 encaps | 17723 cycles | 17836 cycles | 0.99 |
| ML-KEM-768 decaps | 23385 cycles | 23516 cycles | 0.99 |
| ML-KEM-1024 keypair | 22298 cycles | 22440 cycles | 0.99 |
| ML-KEM-1024 encaps | 24416 cycles | 24254 cycles | 1.01 |
| ML-KEM-1024 decaps | 32087 cycles | 31850 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 28525 cycles | 29001 cycles | 0.98 |
| ML-KEM-512 encaps | 36417 cycles | 35123 cycles | 1.04 |
| ML-KEM-512 decaps | 45540 cycles | 44660 cycles | 1.02 |
| ML-KEM-768 keypair | 48958 cycles | 47791 cycles | 1.02 |
| ML-KEM-768 encaps | 58392 cycles | 57583 cycles | 1.01 |
| ML-KEM-768 decaps | 70202 cycles | 69633 cycles | 1.01 |
| ML-KEM-1024 keypair | 73376 cycles | 71515 cycles | 1.03 |
| ML-KEM-1024 encaps | 85851 cycles | 83296 cycles | 1.03 |
| ML-KEM-1024 decaps | 101994 cycles | 99912 cycles | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 4th gen (c7i) (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 encaps | 36417 cycles | 35123 cycles | 1.04 |
| ML-KEM-1024 encaps | 85851 cycles | 83296 cycles | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.
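The alert above follows from simple arithmetic on the two cycle counts. The sketch below reproduces it under one assumption, labeled in the code: that the alert fires when the unrounded ratio current/previous strictly exceeds the threshold. This is consistent with the tables in this conversation but is not confirmed here as the tool's exact rule. The function names are hypothetical.

```c
#include <stdbool.h>

/* Value shown in the "Ratio" column: current cycles divided by previous cycles. */
static double bench_ratio(unsigned long current, unsigned long previous)
{
  return (double)current / (double)previous;
}

/* Assumption: an alert is raised when the unrounded ratio is strictly
 * greater than the threshold (1.03 in the comments of this PR). */
static bool bench_is_regression(unsigned long current, unsigned long previous,
                                double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

With the figures reported for 'Intel Xeon 4th gen (c7i) (no-opt)', ML-KEM-1024 encaps gives 85851 / 83296 ≈ 1.0307 and is flagged, while ML-KEM-1024 keypair gives 73376 / 71515 ≈ 1.0260, which is displayed as 1.03 after rounding but stays below the threshold.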

@oqs-bot oqs-bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 50616 cycles | 51095 cycles | 0.99 |
| ML-KEM-512 encaps | 58394 cycles | 59240 cycles | 0.99 |
| ML-KEM-512 decaps | 74713 cycles | 76953 cycles | 0.97 |
| ML-KEM-768 keypair | 87822 cycles | 86535 cycles | 1.01 |
| ML-KEM-768 encaps | 95617 cycles | 94269 cycles | 1.01 |
| ML-KEM-768 decaps | 118833 cycles | 117525 cycles | 1.01 |
| ML-KEM-1024 keypair | 129952 cycles | 130035 cycles | 1.00 |
| ML-KEM-1024 encaps | 142718 cycles | 141886 cycles | 1.01 |
| ML-KEM-1024 decaps | 174120 cycles | 173869 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 11761 cycles | 11882 cycles | 0.99 |
| ML-KEM-512 encaps | 13378 cycles | 13408 cycles | 1.00 |
| ML-KEM-512 decaps | 18229 cycles | 18325 cycles | 0.99 |
| ML-KEM-768 keypair | 20928 cycles | 20573 cycles | 1.02 |
| ML-KEM-768 encaps | 21788 cycles | 21548 cycles | 1.01 |
| ML-KEM-768 decaps | 28760 cycles | 28753 cycles | 1.00 |
| ML-KEM-1024 keypair | 28131 cycles | 27718 cycles | 1.01 |
| ML-KEM-1024 encaps | 29963 cycles | 29917 cycles | 1.00 |
| ML-KEM-1024 decaps | 39412 cycles | 39244 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 16894 cycles | 16846 cycles | 1.00 |
| ML-KEM-512 encaps | 18604 cycles | 18587 cycles | 1.00 |
| ML-KEM-512 decaps | 24017 cycles | 23921 cycles | 1.00 |
| ML-KEM-768 keypair | 28547 cycles | 28690 cycles | 1.00 |
| ML-KEM-768 encaps | 29924 cycles | 29693 cycles | 1.01 |
| ML-KEM-768 decaps | 37790 cycles | 37509 cycles | 1.01 |
| ML-KEM-1024 keypair | 41156 cycles | 41797 cycles | 0.98 |
| ML-KEM-1024 encaps | 43396 cycles | 44188 cycles | 0.98 |
| ML-KEM-1024 decaps | 53881 cycles | 54630 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 28321 cycles | 28301 cycles | 1.00 |
| ML-KEM-512 encaps | 34062 cycles | 34094 cycles | 1.00 |
| ML-KEM-512 decaps | 44316 cycles | 44394 cycles | 1.00 |
| ML-KEM-768 keypair | 48241 cycles | 48260 cycles | 1.00 |
| ML-KEM-768 encaps | 54244 cycles | 54137 cycles | 1.00 |
| ML-KEM-768 decaps | 68692 cycles | 68631 cycles | 1.00 |
| ML-KEM-1024 keypair | 70385 cycles | 70477 cycles | 1.00 |
| ML-KEM-1024 encaps | 78841 cycles | 78880 cycles | 1.00 |
| ML-KEM-1024 decaps | 98444 cycles | 98478 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 17649 cycles | 17658 cycles | 1.00 |
| ML-KEM-512 encaps | 20622 cycles | 20648 cycles | 1.00 |
| ML-KEM-512 decaps | 27061 cycles | 27099 cycles | 1.00 |
| ML-KEM-768 keypair | 30251 cycles | 30257 cycles | 1.00 |
| ML-KEM-768 encaps | 32982 cycles | 32953 cycles | 1.00 |
| ML-KEM-768 decaps | 42184 cycles | 42194 cycles | 1.00 |
| ML-KEM-1024 keypair | 43827 cycles | 43858 cycles | 1.00 |
| ML-KEM-1024 encaps | 48859 cycles | 48900 cycles | 1.00 |
| ML-KEM-1024 decaps | 61580 cycles | 61574 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 36214 cycles | 36418 cycles | 0.99 |
| ML-KEM-512 encaps | 42638 cycles | 42824 cycles | 1.00 |
| ML-KEM-512 decaps | 55417 cycles | 55736 cycles | 0.99 |
| ML-KEM-768 keypair | 59411 cycles | 59610 cycles | 1.00 |
| ML-KEM-768 encaps | 67646 cycles | 67703 cycles | 1.00 |
| ML-KEM-768 decaps | 84800 cycles | 84883 cycles | 1.00 |
| ML-KEM-1024 keypair | 88072 cycles | 87511 cycles | 1.01 |
| ML-KEM-1024 encaps | 98221 cycles | 98447 cycles | 1.00 |
| ML-KEM-1024 decaps | 120114 cycles | 120095 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 39857 cycles | 38475 cycles | 1.04 |
| ML-KEM-512 encaps | 48111 cycles | 47545 cycles | 1.01 |
| ML-KEM-512 decaps | 62502 cycles | 60958 cycles | 1.03 |
| ML-KEM-768 keypair | 65134 cycles | 63930 cycles | 1.02 |
| ML-KEM-768 encaps | 75372 cycles | 74866 cycles | 1.01 |
| ML-KEM-768 decaps | 93932 cycles | 92866 cycles | 1.01 |
| ML-KEM-1024 keypair | 95141 cycles | 94448 cycles | 1.01 |
| ML-KEM-1024 encaps | 109209 cycles | 108832 cycles | 1.00 |
| ML-KEM-1024 decaps | 132320 cycles | 131550 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a) (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 39857 cycles | 38475 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 16343 cycles | 16272 cycles | 1.00 |
| ML-KEM-512 encaps | 18549 cycles | 18582 cycles | 1.00 |
| ML-KEM-512 decaps | 25130 cycles | 25048 cycles | 1.00 |
| ML-KEM-768 keypair | 27542 cycles | 29620 cycles | 0.93 |
| ML-KEM-768 encaps | 29598 cycles | 29856 cycles | 0.99 |
| ML-KEM-768 decaps | 41443 cycles | 39363 cycles | 1.05 |
| ML-KEM-1024 keypair | 37564 cycles | 37801 cycles | 0.99 |
| ML-KEM-1024 encaps | 40338 cycles | 40405 cycles | 1.00 |
| ML-KEM-1024 decaps | 54071 cycles | 54083 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-768 decaps | 41443 cycles | 39363 cycles | 1.05 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 35444 cycles | 35391 cycles | 1.00 |
| ML-KEM-512 encaps | 40022 cycles | 40758 cycles | 0.98 |
| ML-KEM-512 decaps | 51165 cycles | 51511 cycles | 0.99 |
| ML-KEM-768 keypair | 58420 cycles | 58800 cycles | 0.99 |
| ML-KEM-768 encaps | 65442 cycles | 66075 cycles | 0.99 |
| ML-KEM-768 decaps | 79702 cycles | 80060 cycles | 1.00 |
| ML-KEM-1024 keypair | 87218 cycles | 87780 cycles | 0.99 |
| ML-KEM-1024 encaps | 97289 cycles | 97071 cycles | 1.00 |
| ML-KEM-1024 decaps | 115786 cycles | 116309 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 18684 cycles | 18685 cycles | 1.00 |
| ML-KEM-512 encaps | 21974 cycles | 22018 cycles | 1.00 |
| ML-KEM-512 decaps | 29047 cycles | 29017 cycles | 1.00 |
| ML-KEM-768 keypair | 31943 cycles | 31932 cycles | 1.00 |
| ML-KEM-768 encaps | 35051 cycles | 34994 cycles | 1.00 |
| ML-KEM-768 decaps | 45015 cycles | 45066 cycles | 1.00 |
| ML-KEM-1024 keypair | 46287 cycles | 46318 cycles | 1.00 |
| ML-KEM-1024 encaps | 51612 cycles | 51639 cycles | 1.00 |
| ML-KEM-1024 decaps | 65225 cycles | 65206 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 59323 cycles | 59278 cycles | 1.00 |
| ML-KEM-512 encaps | 68660 cycles | 68898 cycles | 1.00 |
| ML-KEM-512 decaps | 87376 cycles | 87495 cycles | 1.00 |
| ML-KEM-768 keypair | 99611 cycles | 99102 cycles | 1.01 |
| ML-KEM-768 encaps | 111130 cycles | 111265 cycles | 1.00 |
| ML-KEM-768 decaps | 135990 cycles | 136232 cycles | 1.00 |
| ML-KEM-1024 keypair | 149209 cycles | 148897 cycles | 1.00 |
| ML-KEM-1024 encaps | 165083 cycles | 164445 cycles | 1.00 |
| ML-KEM-1024 decaps | 196302 cycles | 195870 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 45856 cycles | 46140 cycles | 0.99 |
| ML-KEM-512 encaps | 54825 cycles | 54792 cycles | 1.00 |
| ML-KEM-512 decaps | 70082 cycles | 69955 cycles | 1.00 |
| ML-KEM-768 keypair | 76363 cycles | 75977 cycles | 1.01 |
| ML-KEM-768 encaps | 86740 cycles | 86806 cycles | 1.00 |
| ML-KEM-768 decaps | 107333 cycles | 106572 cycles | 1.01 |
| ML-KEM-1024 keypair | 112553 cycles | 110543 cycles | 1.02 |
| ML-KEM-1024 encaps | 125545 cycles | 124839 cycles | 1.01 |
| ML-KEM-1024 decaps | 151731 cycles | 150341 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 38862 cycles | 38795 cycles | 1.00 |
| ML-KEM-512 encaps | 44525 cycles | 44883 cycles | 0.99 |
| ML-KEM-512 decaps | 56649 cycles | 56662 cycles | 1.00 |
| ML-KEM-768 keypair | 64122 cycles | 64171 cycles | 1.00 |
| ML-KEM-768 encaps | 71985 cycles | 72703 cycles | 0.99 |
| ML-KEM-768 decaps | 87772 cycles | 87936 cycles | 1.00 |
| ML-KEM-1024 keypair | 95755 cycles | 95674 cycles | 1.00 |
| ML-KEM-1024 encaps | 106434 cycles | 106231 cycles | 1.00 |
| ML-KEM-1024 decaps | 126707 cycles | 126780 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 59718 cycles | 59677 cycles | 1.00 |
| ML-KEM-512 encaps | 67098 cycles | 67051 cycles | 1.00 |
| ML-KEM-512 decaps | 85851 cycles | 85797 cycles | 1.00 |
| ML-KEM-768 keypair | 101885 cycles | 101908 cycles | 1.00 |
| ML-KEM-768 encaps | 113104 cycles | 113148 cycles | 1.00 |
| ML-KEM-768 decaps | 140131 cycles | 139847 cycles | 1.00 |
| ML-KEM-1024 keypair | 155144 cycles | 155074 cycles | 1.00 |
| ML-KEM-1024 encaps | 172555 cycles | 172207 cycles | 1.00 |
| ML-KEM-1024 decaps | 208114 cycles | 208317 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 28346 cycles | 28331 cycles | 1.00 |
| ML-KEM-512 encaps | 34143 cycles | 34042 cycles | 1.00 |
| ML-KEM-512 decaps | 44427 cycles | 44301 cycles | 1.00 |
| ML-KEM-768 keypair | 48273 cycles | 48258 cycles | 1.00 |
| ML-KEM-768 encaps | 54147 cycles | 54204 cycles | 1.00 |
| ML-KEM-768 decaps | 68625 cycles | 68683 cycles | 1.00 |
| ML-KEM-1024 keypair | 70524 cycles | 70523 cycles | 1.00 |
| ML-KEM-1024 encaps | 78835 cycles | 78791 cycles | 1.00 |
| ML-KEM-1024 decaps | 98569 cycles | 98341 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks

| Benchmark suite | Current: 478f245 | Previous: 90fed62 | Ratio |
| --- | --- | --- | --- |
| ML-KEM-512 keypair | 155327 cycles | 155256 cycles | 1.00 |
| ML-KEM-512 encaps | 163181 cycles | 163119 cycles | 1.00 |
| ML-KEM-512 decaps | 206422 cycles | 206392 cycles | 1.00 |
| ML-KEM-768 keypair | 261310 cycles | 260945 cycles | 1.00 |
| ML-KEM-768 encaps | 276085 cycles | 275525 cycles | 1.00 |
| ML-KEM-768 decaps | 338416 cycles | 337755 cycles | 1.00 |
| ML-KEM-1024 keypair | 395622 cycles | 395102 cycles | 1.00 |
| ML-KEM-1024 encaps | 422736 cycles | 422177 cycles | 1.00 |
| ML-KEM-1024 decaps | 506687 cycles | 505892 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@hanno-becker hanno-becker added the enhancement (New feature or request) and CBMC labels Nov 19, 2025
- Change mlk_polyvec back to struct `{ mlk_poly vec[MLKEM_K]; }`
- Change mlk_polymat to struct `{ mlk_polyvec vec[MLKEM_K]; }`
- Update all function signatures to use pointer style
- Fix all implementations to use struct member access
- Update tests, benchmarks, and CBMC harnesses
- Add consistent const annotations

Somewhat surprisingly and dissatisfyingly, I could not salvage
the CBMC proof for the 'monolithic' polymat_permute_bitrev_to_custom_native
but had to break it into two functions. It would be good to resolve
this, as the split causes a lot of code overhead for an entirely
trivial function.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
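
To illustrate the kind of split the commit message describes, here is a hedged sketch in which a matrix-level function delegates to a row-level helper instead of looping over all entries itself. The names `poly_permute_placeholder`, `polyvec_permute_sketch`, and `polymat_permute_sketch` are hypothetical, the `mlk_poly` layout and `MLKEM_K = 3` are assumptions, and the per-polynomial operation is a stand-in (swapping adjacent coefficients), not the actual native permutation. The split made in the PR may look different.

```c
#include <stdint.h>

#define MLKEM_N 256 /* coefficients per polynomial in ML-KEM */
#define MLKEM_K 3   /* module rank, chosen here for illustration */

typedef struct { int16_t coeffs[MLKEM_N]; } mlk_poly; /* assumed layout */
typedef struct { mlk_poly vec[MLKEM_K]; } mlk_polyvec;
typedef struct { mlk_polyvec vec[MLKEM_K]; } mlk_polymat;

/* Stand-in for the per-polynomial permutation: swaps adjacent coefficients.
 * This is NOT the real bit-reversal-to-custom-order permutation. */
static void poly_permute_placeholder(mlk_poly *p)
{
  unsigned j;
  for (j = 0; j + 1 < MLKEM_N; j += 2)
  {
    int16_t t = p->coeffs[j];
    p->coeffs[j] = p->coeffs[j + 1];
    p->coeffs[j + 1] = t;
  }
}

/* First function of the split (hypothetical): handles one row. */
static void polyvec_permute_sketch(mlk_polyvec *v)
{
  unsigned i;
  for (i = 0; i < MLKEM_K; i++)
  {
    poly_permute_placeholder(&v->vec[i]);
  }
}

/* Second function of the split (hypothetical): handles the whole matrix
 * by calling the row helper, so each loop can be specified separately. */
static void polymat_permute_sketch(mlk_polymat *a)
{
  unsigned i;
  for (i = 0; i < MLKEM_K; i++)
  {
    polyvec_permute_sketch(&a->vec[i]);
  }
}
```

The cost noted in the commit message is visible even in this toy version: the row helper needs its own declaration, and in a verified code base its own contract and proof harness, for what is otherwise a two-level loop.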
Labels

benchmark (this PR should be benchmarked in CI), CBMC, enhancement (New feature or request)

3 participants