This commit optimizes the decompressPoint subroutine, used when
extracting compressed pubkeys and performing pubkey recovery. We do so
by replacing the use of big.Int.Exp with square-and-multiply
exponentiation of btcec's more optimized fieldVals, reducing the
overall latency and memory requirements of decompressPoint.
Instead of operating on individual bits of Q = (P+1)/4, the
exponentiation applies the square-and-multiply operations to full
bytes of Q. Since the secp256k1 prime P satisfies P mod 4 == 3,
raising x to the power Q yields a square root of x whenever one
exists. Compared to the bit-wise version, the improvement is roughly
10%.
A new pair of fieldVal methods, Sqrt and SqrtVal, is added; they
apply the square-and-multiply exponentiation using a precomputed
byte slice of the value Q.
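As a rough illustration of the byte-wise flow, here is a minimal,
self-contained sketch using math/big in place of the internal fieldVal
type; the helper name modSqrtByteWise and its variables are
hypothetical, not part of btcec:

// Byte-wise square-and-multiply square root sketch. Illustration
// only: the real methods operate on btcec's internal fieldVal type.
package main

import (
	"fmt"
	"math/big"
)

// modSqrtByteWise computes x^Q mod p by walking the bytes of Q from
// most to least significant, squaring once per bit and multiplying by
// x whenever the bit is set.
func modSqrtByteWise(x, p *big.Int, qBytes []byte) *big.Int {
	result := big.NewInt(1)
	for _, b := range qBytes {
		for bit := 7; bit >= 0; bit-- {
			result.Mul(result, result).Mod(result, p)
			if b&(1<<uint(bit)) != 0 {
				result.Mul(result, x).Mod(result, p)
			}
		}
	}
	return result
}

func main() {
	// secp256k1 field prime P and Q = (P+1)/4.
	p, _ := new(big.Int).SetString(
		"fffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f", 16)
	q := new(big.Int).Add(p, big.NewInt(1))
	q.Rsh(q, 2)

	x := big.NewInt(9) // a known quadratic residue mod P
	r := modSqrtByteWise(x, p, q.Bytes())
	fmt.Println(r) // 3 or P-3; both square to 9 mod P
}

In the real methods the squarings and multiplications are fieldVal
operations, and the byte slice of Q is precomputed rather than derived
on every call.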
Comparison of the big.Int sqrt against the square-and-multiply sqrt
over bytes of Q:

benchmark                         old ns/op     new ns/op     delta
BenchmarkParseCompressedPubKey-8  35545         23119         -34.96%

benchmark                         old allocs    new allocs    delta
BenchmarkParseCompressedPubKey-8  35            6             -82.86%

benchmark                         old bytes     new bytes     delta
BenchmarkParseCompressedPubKey-8  2777          256           -90.78%
This modifies the normalize function of the internal field value to
both optimize it and address an issue where the reduction could
lead to an incorrect result for a small range of values. It also adds
tests to ensure the behavior is correct.
The following benchmarks show the relative speedups resulting from
the optimization on my system. In particular, the changes yield
approximately a 14% speedup in Normalize, which ultimately translates
to a 2% speedup in signature verification.
benchmark                     old ns/op     new ns/op     delta
---------------------------------------------------------------
BenchmarkAddJacobian          1364          1289          -5.50%
BenchmarkAddJacobianNotZOne   3150          3091          -1.87%
BenchmarkScalarBaseMult       134117        132816        -0.97%
BenchmarkScalarBaseMultLarge  135067        132966        -1.56%
BenchmarkScalarMult           411218        402217        -2.19%
BenchmarkSigVerify            671585        657833        -2.05%
BenchmarkFieldNormalize       36.0          31.0          -13.89%
As noted in issue #706, the existing code had an issue where the
normalized result was > P when both the first and second words of the
field representation being normalized were greater than or equal to
the corresponding words of P. Although this condition is rare in
practice, it must be handled properly.
This resolves the issue by comparing the low words in the final
reduction step against the normalized low-order prime bits to ensure
the final subtraction occurs correctly any time the value is > P.
This approach also retains the constant-time property.
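To illustrate the constant-time style of such a final reduction, the
following is a simplified sketch, not btcec's actual normalize (which
operates on ten 26-bit words): it derives a mask from the borrow of
v - P and applies the subtraction under that mask instead of
branching on secret data.

// Constant-time conditional subtraction sketch. The subtraction is
// kept exactly when v >= p, decided by a mask rather than a branch.
package main

import "fmt"

// condSubtract replaces v with v-p iff v >= p. Both slices hold
// little-endian 32-bit limbs stored in uint64s.
func condSubtract(v, p []uint64) {
	diff := make([]uint64, len(v))
	var borrow uint64
	for i := range v {
		d := v[i] - p[i] - borrow
		borrow = d >> 63 // 1 iff the limb subtraction underflowed
		diff[i] = d & 0xffffffff
	}
	// mask is all ones when there is no final borrow (v >= p),
	// and zero otherwise.
	mask := borrow - 1
	for i := range v {
		v[i] = (v[i] &^ mask) | (diff[i] & mask)
	}
}

func main() {
	// Toy 64-bit value split into two limbs: v = p + 1, so it
	// must reduce to 1.
	p := []uint64{0xfffffc2f, 0xffffffff}
	v := []uint64{0xfffffc30, 0xffffffff}
	condSubtract(v, p)
	fmt.Printf("%#x %#x\n", v[0], v[1]) // 0x1 0x0
}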
This commit contains various modifications for code and comment
consistency in the btcec package:
- Call out references at the top and reference them by their identifier in
the other comments
- Remove a TODO that no longer applies
- Add comments to the fields in the KoblitzCurve struct and reorder them
slightly
- Wrap comments to 80 columns
- Clean up code that far exceeded column 80 (typically only function
  declarations do this)
- Extend block comments to use as much of the 80 columns as available
- Add a bit more explanation in a couple of places
- Update copyright year on secp256k1.go
- Fix a couple of typos in the comments