This commit optimizes the decompressPoint subroutine, used in extracting
compressed pubkeys and performing pubkey recovery. We do so by replacing
the use of big.Int.Exp with with square-and-multiply exponentiation of
btcec's more optimized fieldVals, reducing the overall latency and
memory requirements of decompressPoint.
Instead of operating on bits of Q = (P+1)/4, the exponentiation applies
the square-and-multiply operations on full bytes of Q. Compared to the
original speedup. Compared the bit-wise version, the improvement is
roughly 10%.
A new pair fieldVal methods called Sqrt and SqrtVal are added, which
applies the square-and-multiply exponentiation using precomputed
byte-slice of the value Q.
Comparison against big.Int sqrt and SAM sqrt over bytes of Q:
benchmark old ns/op new ns/op delta
BenchmarkParseCompressedPubKey-8 35545 23119 -34.96%
benchmark old allocs new allocs delta
BenchmarkParseCompressedPubKey-8 35 6 -82.86%
benchmark old bytes new bytes delta
BenchmarkParseCompressedPubKey-8 2777 256 -90.78%
This modifies the normalize function of the internal field value to
both optimize it and address an issue where the reduction could
lead to an incorrect result with a small range of values. It also adds
tests to ensure the behavior is correct.
The following benchmark shows the relative speedups as a result of the
optimization on my system. In particular, the changes result in
approximately a 14% speedup in Normalize, which ultimately translates to
a 2% speedup in signature verifies.
benchmark old ns/op new ns/op delta
--------------------------------------------------------------------
BenchmarkAddJacobian 1364 1289 -5.50%
BenchmarkAddJacobianNotZOne 3150 3091 -1.87%
BenchmarkScalarBaseMult 134117 132816 -0.97%
BenchmarkScalarBaseMultLarge 135067 132966 -1.56%
BenchmarkScalarMult 411218 402217 -2.19%
BenchmarkSigVerify 671585 657833 -2.05%
BenchmarkFieldNormalize 36.0 31.0 -13.89%
Putting the test code in the same package makes it easier for forks
since they don't have to change the import paths as much and it also
gets rid of the need for internal_test.go to bridge.
Also, remove the exception from the lint checks about returning the
unexported type since it is no longer required.
This adds new tests to the TestNormalize, TestMul, TestAdd2 functions
which trigger an issue with modular reduction that was fixed in the
prevous commit to prevent regressions.