Skip to content

Conversation

@sipa
Copy link
Contributor

@sipa sipa commented Dec 30, 2022

Using some insights learned from #1058, this replaces the fixed-wnaf ecmult_const algorithm with a signed-digit based one. Conceptually both algorithms are very similar, in that they boil down to summing precomputed odd multiples of the input points. Practically however, the new algorithm is simpler because it's just using scalar operations, rather than relying on wnaf machinery with skew terms to guarantee odd multipliers.

The idea is that we can compute $q \cdot A$ as follows:

  • Let $s = f(q)$, for some function $f()$.
  • Compute $(s_1, s_2)$ such that $s = s_1 + \lambda s_2$, using secp256k1_scalar_lambda_split.
  • Let $v_1 = s_1 + 2^{128}$ and $v_2 = s_2 + 2^{128}$ (such that the $v_i$ are positive and $n$ bits long).
  • Computing the result as $$\sum_{i=0}^{n-1} (2v_1[i]-1) 2^i A + \sum_{i=0}^{n-1} (2v_2[i]-1) 2^i \lambda A$$ where $x[i]$ stands for the i'th bit of $x$, so summing positive and negative powers of two times $A$, based on the bits of $v_1.$

The comments in ecmult_const_impl.h show that if $f(q) = (q + (1+\lambda)(2^n - 2^{129} - 1))/2 \mod n$, the result will equal $q \cdot A$.

This last step can be performed in groups of multiple bits at once, by looking up entries in a precomputed table of odd multiples of $A$ and $\lambda A$, and then multiplying by a power of two before proceeding to the next group.

The result is slightly faster (I measure ~2% speedup), but significantly simpler as it only uses scalar arithmetic to determine the table lookup values. The speedup is due to the fact that no skew corrections at the end are needed, and less overhead to determine table indices. The precomputed table sizes are also made independent from the ecmult ones, after observing that the optimal table size is bigger here (which also gives a small speedup).

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa nice! The skew correction is really annoying to reason about...

I just don't know where to get all the review power from.

@sipa sipa force-pushed the 202212_sd_ecmult_const branch 2 times, most recently from a0c7934 to f16c500 Compare December 30, 2022 22:03
@sipa
Copy link
Contributor Author

sipa commented Dec 30, 2022

Added a commit to remove secp256k1_scalar_shr_int, which is now unused apart from tests. It's a net reduction diff now, even with all the comments it adds.

@sipa
Copy link
Contributor Author

sipa commented Jan 5, 2023

@peterdettman Randomly tagging you, you may find this interesting.

@apoelstra
Copy link
Contributor

Very cool! I wonder if the explanation might be clearer if you said something like "without any further optimizations, k0 would be 2^n - 1 with n = 256; however we are going to combine this scheme with the GLV endomorphism and do a windowed rather than bitwise approach. So instead we solve for k0" rather than "for some constant k0".

In the end I was able to understand what you were doing without much trouble, so what you've written is fine, but it was a bit intimidating to see k0, k1, k2 all introduced at once just as unknowns.

Copy link
Contributor

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK f16c500

@sipa sipa force-pushed the 202212_sd_ecmult_const branch from f16c500 to b4ff4eb Compare January 7, 2023 18:47
@sipa
Copy link
Contributor Author

sipa commented Jan 7, 2023

@apoelstra I've changed the explanation to introduce the offset terms a bit more gently.

I've also dropped the ecmult_const_get_scalar_bit_group function as secp256k1_scalar_get_bits_var can be used instead (the _var part is not an issue as it's only variable-time in the offset/length, not the scalar).

@apoelstra
Copy link
Contributor

@sipa this looks way better, thanks!

Copy link
Contributor

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK b4ff4eb

@sipa sipa force-pushed the 202212_sd_ecmult_const branch from b4ff4eb to 4de39ef Compare January 19, 2023 20:36
@sipa
Copy link
Contributor Author

sipa commented Jan 19, 2023

Rebased after merge of #1170 and #1190.

@sipa
Copy link
Contributor Author

sipa commented Jan 20, 2023

Mental note: instead of adding $2^{128}$ to the split scalars, I believe it's also possible to (a) conditionally negate the scalar (if it's negative) and then (b) swap the meaning of positive/negate in the table (alternatively, bitwise negate the integers read from the table). This would mean we only need enough table coverage for 128 bits rather than 129, which for window=4 means one fewer addition.

EDIT: no, doesn't work

@sipa sipa force-pushed the 202212_sd_ecmult_const branch from 4de39ef to 1c48861 Compare January 23, 2023 16:15
@sipa
Copy link
Contributor Author

sipa commented Jan 23, 2023

Oops, my previous rebase accidentally reverted the changes I made to address @apoelstra's comment. Re-rebased.

@sipa sipa force-pushed the 202212_sd_ecmult_const branch 2 times, most recently from 4436ce2 to a3b1a90 Compare February 10, 2023 21:57
@peterdettman
Copy link
Contributor

Mental note: instead of adding 2128 to the split scalars, I believe it's also possible to (a) conditionally negate the scalar (if it's negative) and then (b) swap the meaning of positive/negate in the table (alternatively, bitwise negate the integers read from the table). This would mean we only need enough table coverage for 128 bits rather than 129, which for window=4 means one fewer addition.

EDIT: no, doesn't work

Are you able to explain why that doesn't work?

@peterdettman
Copy link
Contributor

Nice idea, simple and clear.

  • I would prefer to use 2^128 as the offset, allowing the split to use the full two's complement range of [-2^128, 2^128). That generalizes better to other split schemes (I have in mind the basis reduction of https://ia.cr/2020/454).
  • The Straus ladder has the same need to deal with negative split outputs; should the mechanism be unified (one way or the other?)
  • I don't have a build to test the performance of the current _scalar_split_lambda, but it's doing quite a bit of unnecessary work compared to the "original" non-modular formula, which it seems could be done cmod 2^129 (i.e. 129-bit two's complement representation) - and e.g. avoid calculating the 384 bits that are shifted away. Not to belabor the point, but if the split outputs were in 129-bit two's complement, adding a 2^128 offset is just a bit flip (or maybe a single limb complement).

@sipa sipa force-pushed the 202212_sd_ecmult_const branch from a3b1a90 to dd21e2c Compare February 11, 2023 15:11
@sipa
Copy link
Contributor Author

sipa commented Feb 11, 2023

I've switched to offset $2^{128}$.

@peterdettman
Copy link
Contributor

peterdettman commented Feb 11, 2023

I think if _scalar_split_lambda just output 129-bit two's complement values (and then sign-extend to ECMULT_CONST_BITS), then they would be directly usable in the comb, no offset needed for s1, s2. Something for the future, perhaps.

@sipa sipa force-pushed the 202212_sd_ecmult_const branch from f773744 to c8eb787 Compare October 23, 2023 15:28
@sipa
Copy link
Contributor Author

sipa commented Oct 23, 2023

Rebased and addressed the comments above.

Copy link
Contributor

@jonasnick jonasnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK mod nits c8eb787

I added a couple of (micro-)nit fixups to my branch. Most notably, it tries to unify the first iteration in ecmult_const with the loop and adds an explanation for the index computation in ECMULT_CONST_TABLE_GET_GE.

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK mod these nits

also utACK on jonas' fixups (except a typo and one further comment here: jonasnick@cdf545a )

I believe I've done a thorough review, but I really have only nits.

Copy link
Contributor

@stratospher stratospher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c8eb787.

was able to follow the useful comments and code! also observed a 2% speedup on my machine.

@sipa sipa force-pushed the 202212_sd_ecmult_const branch from c8eb787 to 355bbdf Compare November 4, 2023 19:56
Copy link
Contributor

@jonasnick jonasnick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 355bbdf

Copy link
Contributor

@siv2r siv2r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 355bbdf

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 355bbdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants