Commit 90bb8ae

Merge pull request #219 from tomato42/speed-single
Speed up single-shot verify
2 parents 86718ad + 351c40b commit 90bb8ae

5 files changed: 179 additions & 73 deletions

README.md

Lines changed: 34 additions & 31 deletions
@@ -72,32 +72,35 @@ pip install ecdsa[gmpy]
 
 The following table shows how long this library takes to generate keypairs
 (`keygen`), to sign data (`sign`), to verify those signatures (`verify`),
-and to derive a shared secret (`ecdh`).
+to derive a shared secret (`ecdh`), and
+to verify the signatures with no key specific precomputation (`no PC verify`).
 All those values are in seconds.
 For convenience, the inverses of those values are also provided:
 how many keys per second can be generated (`keygen/s`), how many signatures
 can be made per second (`sign/s`), how many signatures can be verified
-per second (`verify/s`), and how many shared secrets can be derived per second
-(`ecdh/s`). The size in bytes of a raw signature (generally the smallest
+per second (`verify/s`), how many shared secrets can be derived per second
+(`ecdh/s`), and how many signatures with no key specific
+precomputation can be verified per second (`no PC verify/s`). The size of raw
+signature (generally the smallest
 way a signature can be encoded) is also provided in the `siglen` column.
 Use `tox -e speed` to generate this table on your own computer.
 On an Intel Core i7 4790K @ 4.0GHz I'm getting the following performance:
 
 ```
-                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s
-        NIST192p:     48   0.00035s   2893.02   0.00038s   2620.53   0.00069s   1458.92
-        NIST224p:     56   0.00043s   2307.11   0.00048s   2092.00   0.00088s   1131.33
-        NIST256p:     64   0.00056s   1793.70   0.00061s   1639.87   0.00113s    883.79
-        NIST384p:     96   0.00116s    864.33   0.00124s    806.29   0.00233s    429.87
-        NIST521p:    132   0.00221s    452.16   0.00234s    427.31   0.00460s    217.19
-       SECP256k1:     64   0.00056s   1772.65   0.00061s   1628.73   0.00110s    912.13
- BRAINPOOLP160r1:     40   0.00026s   3801.86   0.00029s   3401.11   0.00052s   1930.47
- BRAINPOOLP192r1:     48   0.00034s   2925.73   0.00038s   2634.34   0.00070s   1438.06
- BRAINPOOLP224r1:     56   0.00044s   2287.98   0.00048s   2083.87   0.00088s   1137.52
- BRAINPOOLP256r1:     64   0.00056s   1774.11   0.00061s   1628.25   0.00112s    890.71
- BRAINPOOLP320r1:     80   0.00081s   1238.18   0.00087s   1146.71   0.00151s    661.95
- BRAINPOOLP384r1:     96   0.00117s    855.47   0.00124s    804.56   0.00241s    414.83
- BRAINPOOLP512r1:    128   0.00223s    447.99   0.00234s    427.49   0.00437s    229.09
+                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s  no PC verify  no PC verify/s
+        NIST192p:     48   0.00033s   2991.13   0.00036s   2740.86   0.00067s   1502.11       0.00136s         737.54
+        NIST224p:     56   0.00042s   2360.67   0.00046s   2190.16   0.00083s   1201.83       0.00170s         587.79
+        NIST256p:     64   0.00053s   1872.02   0.00057s   1743.08   0.00103s    968.53       0.00219s         457.36
+        NIST384p:     96   0.00110s    907.45   0.00116s    861.63   0.00218s    459.38       0.00445s         224.92
+        NIST521p:    132   0.00214s    467.72   0.00223s    448.70   0.00430s    232.76       0.00888s         112.66
+       SECP256k1:     64   0.00054s   1841.11   0.00058s   1722.33   0.00111s    903.07       0.00216s         464.01
+ BRAINPOOLP160r1:     40   0.00026s   3780.81   0.00029s   3422.67   0.00054s   1863.09       0.00109s         914.93
+ BRAINPOOLP192r1:     48   0.00034s   2942.79   0.00037s   2710.56   0.00070s   1435.59       0.00138s         724.79
+ BRAINPOOLP224r1:     56   0.00044s   2278.35   0.00047s   2145.32   0.00090s   1115.34       0.00182s         549.72
+ BRAINPOOLP256r1:     64   0.00055s   1832.95   0.00059s   1704.50   0.00110s    911.02       0.00234s         427.22
+ BRAINPOOLP320r1:     80   0.00077s   1305.78   0.00082s   1222.47   0.00156s    640.27       0.00321s         311.56
+ BRAINPOOLP384r1:     96   0.00112s    893.07   0.00118s    849.32   0.00228s    438.75       0.00478s         209.35
+ BRAINPOOLP512r1:    128   0.00213s    470.08   0.00221s    451.98   0.00419s    238.70       0.00940s         106.44
 
                        ecdh     ecdh/s
         NIST192p:   0.00110s    910.70
@@ -118,20 +121,20 @@ On an Intel Core i7 4790K @ 4.0GHz I'm getting the following performance:
 To test performance with `gmpy2` loaded, use `tox -e speedgmpy2`.
 On the same machine I'm getting the following performance with `gmpy2`:
 ```
-                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s
-        NIST192p:     48   0.00017s   5945.50   0.00018s   5544.66   0.00033s   3002.54
-        NIST224p:     56   0.00021s   4742.14   0.00022s   4463.52   0.00044s   2248.59
-        NIST256p:     64   0.00024s   4155.73   0.00025s   3994.28   0.00047s   2105.34
-        NIST384p:     96   0.00041s   2415.06   0.00043s   2316.41   0.00085s   1177.18
-        NIST521p:    132   0.00072s   1391.14   0.00074s   1359.63   0.00140s    716.31
-       SECP256k1:     64   0.00024s   4216.50   0.00025s   3994.52   0.00047s   2120.57
- BRAINPOOLP160r1:     40   0.00014s   7038.99   0.00015s   6501.55   0.00029s   3397.79
- BRAINPOOLP192r1:     48   0.00017s   5983.18   0.00018s   5626.08   0.00035s   2843.62
- BRAINPOOLP224r1:     56   0.00021s   4727.54   0.00022s   4464.86   0.00043s   2326.84
- BRAINPOOLP256r1:     64   0.00024s   4221.00   0.00025s   4010.26   0.00049s   2046.40
- BRAINPOOLP320r1:     80   0.00032s   3142.14   0.00033s   3009.15   0.00061s   1652.88
- BRAINPOOLP384r1:     96   0.00041s   2415.98   0.00043s   2340.35   0.00083s   1198.77
- BRAINPOOLP512r1:    128   0.00064s   1567.27   0.00066s   1526.33   0.00127s    788.51
+                  siglen    keygen   keygen/s      sign     sign/s    verify   verify/s  no PC verify  no PC verify/s
+        NIST192p:     48   0.00017s   5878.39   0.00018s   5670.66   0.00034s   2971.38       0.00067s        1484.97
+        NIST224p:     56   0.00021s   4705.08   0.00022s   4587.19   0.00040s   2499.96       0.00088s        1140.97
+        NIST256p:     64   0.00024s   4252.73   0.00024s   4108.48   0.00049s   2038.80       0.00096s        1043.03
+        NIST384p:     96   0.00041s   2455.84   0.00042s   2406.31   0.00079s   1260.03       0.00172s         580.61
+        NIST521p:    132   0.00070s   1419.16   0.00072s   1392.50   0.00139s    719.35       0.00307s         325.96
+       SECP256k1:     64   0.00024s   4228.87   0.00024s   4086.32   0.00047s   2124.86       0.00096s        1037.53
+ BRAINPOOLP160r1:     40   0.00014s   6932.12   0.00015s   6678.36   0.00030s   3387.90       0.00056s        1784.02
+ BRAINPOOLP192r1:     48   0.00017s   5886.05   0.00017s   5720.63   0.00034s   2941.22       0.00067s        1490.87
+ BRAINPOOLP224r1:     56   0.00021s   4748.89   0.00022s   4638.15   0.00041s   2460.86       0.00089s        1128.91
+ BRAINPOOLP256r1:     64   0.00024s   4248.00   0.00024s   4135.19   0.00045s   2209.69       0.00099s        1006.45
+ BRAINPOOLP320r1:     80   0.00032s   3096.85   0.00033s   3012.43   0.00065s   1547.07       0.00137s         728.60
+ BRAINPOOLP384r1:     96   0.00041s   2436.12   0.00042s   2396.23   0.00083s   1211.13       0.00176s         568.39
+ BRAINPOOLP512r1:    128   0.00063s   1580.09   0.00064s   1562.78   0.00129s    778.09       0.00279s         358.12
 
                        ecdh     ecdh/s
         NIST192p:   0.00051s   1960.26
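
The `/s` columns are simply the reciprocals of the timing columns, and dividing a `no PC verify` time by the matching `verify` time shows what the key-specific precomputation saves. A minimal sketch of that arithmetic, using three rows copied from the first (pure Python) table in this README diff. The names `TIMINGS`, `rate` and `slowdown` are made up for illustration. Because the printed timings are rounded to five decimals, the recomputed rates only approximate the table's `/s` values.

```python
# (verify seconds, no-PC-verify seconds), copied from the pure-Python table.
TIMINGS = {
    "NIST192p": (0.00067, 0.00136),
    "NIST256p": (0.00103, 0.00219),
    "NIST521p": (0.00430, 0.00888),
}


def rate(seconds):
    """Operations per second for one operation taking `seconds`."""
    return 1.0 / seconds


def slowdown(verify, verify_no_pc):
    """How many times slower verification is without precomputation."""
    return verify_no_pc / verify


for curve, (verify, verify_no_pc) in TIMINGS.items():
    print(
        "{0:>9}: {1:8.2f} verify/s, {2:8.2f} no PC verify/s, {3:.2f}x slower".format(
            curve, rate(verify), rate(verify_no_pc), slowdown(verify, verify_no_pc)
        )
    )
```

For these three rows the ratio comes out at roughly two, so on this machine a one-off verification costs about twice as much as a verification with a key that has already been precomputed.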

speed.py

Lines changed: 7 additions & 1 deletion
@@ -19,7 +19,8 @@ def do(setup_statements, statement):
     "{name:>16}{sep:1} {siglen:>6} {keygen:>9{form}}{unit:1} "
     "{keygen_inv:>9{form_inv}} {sign:>9{form}}{unit:1} "
     "{sign_inv:>9{form_inv}} {verify:>9{form}}{unit:1} "
-    "{verify_inv:>9{form_inv}}"
+    "{verify_inv:>9{form_inv}} {verify_single:>13{form}}{unit:1} "
+    "{verify_single_inv:>14{form_inv}}"
 )
 
 print(
@@ -31,6 +32,8 @@ def do(setup_statements, statement):
         sign_inv="sign/s",
         verify="verify",
         verify_inv="verify/s",
+        verify_single="no PC verify",
+        verify_single_inv="no PC verify/s",
         name="",
         sep="",
         unit="",
@@ -54,6 +57,7 @@ def do(setup_statements, statement):
     keygen = do([S1], S2)
     sign = do([S1, S2, S3], S4)
     verf = do([S1, S2, S3, S4, S5, S6], S7)
+    verf_single = do([S1, S2, S3, S4, S5], S7)
     import ecdsa
 
     c = getattr(ecdsa, curve)
@@ -70,6 +74,8 @@ def do(setup_statements, statement):
             sign_inv=1.0 / sign,
             verify=verf,
             verify_inv=1.0 / verf,
+            verify_single=verf_single,
+            verify_single_inv=1.0 / verf_single,
             form=".5f",
             form_inv=".2f",
         )
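
The new `verf_single` measurement times the same statement `S7` as `verf`, but leaves `S6` out of the setup list. The definitions of `S1` to `S7` are not part of this diff; judging by the column name, `S6` is presumably the key precomputation step. The sketch below illustrates the general pattern with the standard `timeit` module: everything in the setup list runs once and is not timed, so moving a step out of setup changes what the timed statement has to pay for. The `time_statement` helper and the stand-in statements are hypothetical and only mimic the shape of `do()`; they are not the script's code.

```python
import timeit


def time_statement(setup_statements, statement, number=1000):
    """Return the average seconds per execution of `statement`.

    `setup_statements` run once, untimed, before the timed loop.
    """
    timer = timeit.Timer(stmt=statement, setup="\n".join(setup_statements))
    return timer.timeit(number) / number


# Stand-ins: building `table` plays the role of a one-off precomputation.
PREPARE = "data = list(range(2000))"
PRECOMPUTE = "table = set(data)"

# Precomputation done in setup: only the fast lookup is timed.
with_pc = time_statement([PREPARE, PRECOMPUTE], "1999 in table")
# No precomputation: the timed statement falls back to a linear search.
without_pc = time_statement([PREPARE], "1999 in data")

print("with precomputation:    {0:.7f}s".format(with_pc))
print("without precomputation: {0:.7f}s".format(without_pc))
```

The absolute numbers depend on the machine; the point is only that the two calls time the same kind of query against differently prepared state, which is how the `verify` and `no PC verify` columns differ.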

src/ecdsa/ellipticcurve.py

Lines changed: 76 additions & 40 deletions
@@ -39,7 +39,7 @@
     from gmpy2 import mpz
 
     GMPY = True
-except ImportError:
+except ImportError:  # pragma: no branch
     try:
         from gmpy import mpz
 
@@ -57,7 +57,7 @@
 class CurveFp(object):
     """Elliptic Curve over the field of integers modulo a prime."""
 
-    if GMPY:
+    if GMPY:  # pragma: no branch
 
         def __init__(self, p, a, b, h=None):
             """
@@ -75,7 +75,7 @@ def __init__(self, p, a, b, h=None):
             # gmpy with it
             self.__h = h
 
-    else:
+    else:  # pragma: no branch
 
         def __init__(self, p, a, b, h=None):
             """
@@ -164,12 +164,12 @@ def __init__(self, curve, x, y, z, order=None, generator=False):
         # since it's generally better (faster) to use scaled points vs unscaled
         # ones, use writer-biased RWLock for locking:
         self._update_lock = RWLock()
-        if GMPY:
+        if GMPY:  # pragma: no branch
             self.__x = mpz(x)
             self.__y = mpz(y)
             self.__z = mpz(z)
             self.__order = order and mpz(order)
-        else:
+        else:  # pragma: no branch
             self.__x = x
             self.__y = y
             self.__z = z
@@ -359,7 +359,8 @@ def from_affine(point, generator=False):
             point.curve(), point.x(), point.y(), 1, point.order(), generator
         )
 
-    # plese note that all the methods that use the equations from hyperelliptic
+    # please note that all the methods that use the equations from
+    # hyperelliptic
     # are formatted in a way to maximise performance.
     # Things that make code faster: multiplying instead of taking to the power
     # (`xx = x * x; xxxx = xx * xx % p` is faster than `xxxx = x**4 % p` and
@@ -389,7 +390,7 @@ def _double(self, X1, Y1, Z1, p, a):
         """Add a point to itself, arbitrary z."""
         if Z1 == 1:
             return self._double_with_z_1(X1, Y1, p, a)
-        if not Z1:
+        if not Y1 or not Z1:
             return 0, 0, 1
         # after:
         # http://hyperelliptic.org/EFD/g1p/auto-shortw-jacobian.html#doubling-dbl-2007-bl
@@ -579,11 +580,11 @@ def _naf(mult):
             if mult % 2:
                 nd = mult % 4
                 if nd >= 2:
-                    nd = nd - 4
-                ret += [nd]
+                    nd -= 4
+                ret.append(nd)
                 mult -= nd
             else:
-                ret += [0]
+                ret.append(0)
             mult //= 2
         return ret
 
@@ -621,15 +622,6 @@ def __mul__(self, other):
 
         return PointJacobi(self.__curve, X3, Y3, Z3, self.__order)
 
-    @staticmethod
-    def _leftmost_bit(x):
-        """Return integer with the same magnitude as x but only one bit set"""
-        assert x > 0
-        result = 1
-        while result <= x:
-            result = 2 * result
-        return result // 2
-
     def mul_add(self, self_mul, other, other_mul):
         """
         Do two multiplications at the same time, add results.
@@ -643,7 +635,7 @@ def mul_add(self, self_mul, other, other_mul):
         if not isinstance(other, PointJacobi):
             other = PointJacobi.from_affine(other)
         # when the points have precomputed answers, then multiplying them alone
-        # is faster (as it uses NAF)
+        # is faster (as it uses NAF and no point doublings)
         self._maybe_precompute()
         other._maybe_precompute()
         if self.__precompute and other.__precompute:
@@ -653,32 +645,76 @@ def mul_add(self, self_mul, other, other_mul):
             self_mul = self_mul % self.__order
             other_mul = other_mul % self.__order
 
-        i = self._leftmost_bit(max(self_mul, other_mul)) * 2
+        # (X3, Y3, Z3) is the accumulator
         X3, Y3, Z3 = 0, 0, 1
         p, a = self.__curve.p(), self.__curve.a()
-        self = self.scale()
-        # after scaling, point is immutable, no need for locking
-        X1, Y1 = self.__x, self.__y
-        other = other.scale()
-        X2, Y2 = other.__x, other.__y
-        both = self + other
-        if both is INFINITY:
-            X4, Y4 = 0, 0
-        else:
-            both.scale()
-            X4, Y4 = both.__x, both.__y
+
+        # as we have 6 unique points to work with, we can't scale all of them,
+        # but do scale the ones that are used most often
+        # (post scale() points are immutable so no need for locking)
+        self.scale()
+        X1, Y1, Z1 = self.__x, self.__y, self.__z
+        other.scale()
+        X2, Y2, Z2 = other.__x, other.__y, other.__z
+
         _double = self._double
         _add = self._add
-        while i > 1:
+
+        # with NAF we have 3 options: no add, subtract, add
+        # so with 2 points, we have 9 combinations:
+        # 0, -A, +A, -B, -A-B, +A-B, +B, -A+B, +A+B
+        # so we need 4 combined points:
+        mAmB_X, mAmB_Y, mAmB_Z = _add(X1, -Y1, Z1, X2, -Y2, Z2, p)
+        pAmB_X, pAmB_Y, pAmB_Z = _add(X1, Y1, Z1, X2, -Y2, Z2, p)
+        mApB_X, mApB_Y, mApB_Z = _add(X1, -Y1, Z1, X2, Y2, Z2, p)
+        pApB_X, pApB_Y, pApB_Z = _add(X1, Y1, Z1, X2, Y2, Z2, p)
+        # when the self and other sum to infinity, we need to add them
+        # one by one to get correct result but as that's very unlikely to
+        # happen in regular operation, we don't need to optimise this case
+        if not pApB_Y or not pApB_Z:
+            return self * self_mul + other * other_mul
+
+        # gmp object creation has cumulatively higher overhead than the
+        # speedup we get from calculating the NAF using gmp so ensure use
+        # of int()
+        self_naf = list(reversed(self._naf(int(self_mul))))
+        other_naf = list(reversed(self._naf(int(other_mul))))
+        # ensure that the lists are the same length (zip() will truncate
+        # longer one otherwise)
+        if len(self_naf) < len(other_naf):
+            self_naf = [0] * (len(other_naf) - len(self_naf)) + self_naf
+        elif len(self_naf) > len(other_naf):
+            other_naf = [0] * (len(self_naf) - len(other_naf)) + other_naf
+
+        for A, B in zip(self_naf, other_naf):
             X3, Y3, Z3 = _double(X3, Y3, Z3, p, a)
-            i = i // 2
 
-            if self_mul & i and other_mul & i:
-                X3, Y3, Z3 = _add(X3, Y3, Z3, X4, Y4, 1, p)
-            elif self_mul & i:
-                X3, Y3, Z3 = _add(X3, Y3, Z3, X1, Y1, 1, p)
-            elif other_mul & i:
-                X3, Y3, Z3 = _add(X3, Y3, Z3, X2, Y2, 1, p)
+            # conditions ordered from most to least likely
+            if A == 0:
+                if B == 0:
+                    pass
+                elif B < 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X2, -Y2, Z2, p)
+                else:
+                    assert B > 0
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X2, Y2, Z2, p)
+            elif A < 0:
+                if B == 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X1, -Y1, Z1, p)
+                elif B < 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, mAmB_X, mAmB_Y, mAmB_Z, p)
+                else:
+                    assert B > 0
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, mApB_X, mApB_Y, mApB_Z, p)
+            else:
+                assert A > 0
+                if B == 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, X1, Y1, Z1, p)
+                elif B < 0:
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, pAmB_X, pAmB_Y, pAmB_Z, p)
+                else:
+                    assert B > 0
+                    X3, Y3, Z3 = _add(X3, Y3, Z3, pApB_X, pApB_Y, pApB_Z, p)
 
         if not Y3 or not Z3:
             return INFINITY
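
The rewritten loop in `mul_add` is a joint double-and-add driven by the NAF digits of both multipliers: one doubling per digit position and at most one addition, chosen from the precomputed combinations of the two points. The sketch below shows the same control flow with plain integers modulo a small number standing in for curve points, so "doubling" is `2 * acc` and "adding" is `+`. It is an illustration under that assumption, not the library's code. `naf` mirrors the part of `_naf` visible in the diff (its opening lines are not shown there, so the loop header is assumed), and `mul_add_sketch` with its lookup table is a hypothetical simplification of the nine-way branch. Only non-negative multipliers are considered.

```python
def naf(mult):
    """Non-adjacent form of a non-negative integer, least significant first.

    Every digit is -1, 0 or 1 and no two neighbouring digits are non-zero.
    """
    ret = []
    while mult:
        if mult % 2:
            nd = mult % 4
            if nd >= 2:
                nd -= 4
            ret.append(nd)
            mult -= nd
        else:
            ret.append(0)
        mult //= 2
    return ret


def mul_add_sketch(a_point, a_mul, b_point, b_mul, modulus):
    """Compute a_mul * a_point + b_mul * b_point (mod modulus) in one pass.

    Integers modulo `modulus` stand in for elliptic curve points.
    """
    # All nine digit combinations, precomputed once (the diff keeps the
    # four two-point sums in separate variables instead of a table).
    table = {
        (da, db): (da * a_point + db * b_point) % modulus
        for da in (-1, 0, 1)
        for db in (-1, 0, 1)
    }
    # Most significant digit first, padded to equal length so that zip()
    # does not drop digits of the longer multiplier.
    a_naf = naf(a_mul)[::-1]
    b_naf = naf(b_mul)[::-1]
    width = max(len(a_naf), len(b_naf))
    a_naf = [0] * (width - len(a_naf)) + a_naf
    b_naf = [0] * (width - len(b_naf)) + b_naf

    acc = 0  # stand-in for the point at infinity
    for da, db in zip(a_naf, b_naf):
        acc = (2 * acc) % modulus  # one "doubling" per digit position
        acc = (acc + table[(da, db)]) % modulus  # at most one "addition"
    return acc


modulus = 1009
result = mul_add_sketch(5, 123, 7, 456, modulus)
print(naf(7))  # [-1, 0, 0, 1], i.e. 7 = 8 - 1
print(result == (5 * 123 + 7 * 456) % modulus)  # True
```

Because NAF digits are non-zero less often than plain binary digits, fewer iterations need an addition at all, which is what the comment "conditions ordered from most to least likely" in the diff relies on.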
