This refactors the script engine to store and step through raw scripts
by making use of the new zero-allocation script tokenizer as opposed
to the less efficient method of storing and stepping through parsed
opcodes. It also improves several aspects while refactoring such as
optimizing the disassembly trace, showing all scripts in the trace in
the case of execution failure, and providing additional comments
describing the purpose of each field in the engine.
It should be noted that this is a step towards removing the parsed
opcode struct and associated supporting code altogether. However, in
order to ease the review process, this retains the struct and all
function signatures for opcode execution which make use of an individual
parsed opcode. Those will be updated in future commits.
The following is an overview of the changes:
- Modify internal engine scripts slice to use raw scripts instead of
  parsed opcodes
- Introduce a tokenizer to the engine to track the current script
- Remove no longer needed script offset parameter from the engine since
  that is tracked by the tokenizer
- Add an opcode index counter for disassembly purposes to the engine
- Update check for valid program counter to only consider the script
  index
  - Update tests for bad program counter accordingly
- Rework the NewEngine function
  - Store the raw scripts
  - Set up the initial tokenizer
  - Explicitly check against version 0 instead of DefaultScriptVersion
    which would break consensus if changed
  - Check the scripts parse according to version 0 semantics to retain
    current consensus rules
  - Improve comments throughout
- Rework the Step function
  - Use the tokenizer and raw scripts
  - Create a parsed opcode on the fly for now to retain existing
    opcode execution function signatures
  - Improve comments throughout
- Update the Execute function
  - Explicitly check against version 0 instead of DefaultScriptVersion
    which would break consensus if changed
  - Improve the disassembly tracing in the case of error
- Update the CheckErrorCondition function
  - Modify clean stack error message to make sense in all cases
  - Improve the comments
- Update the DisasmPC and DisasmScript functions on the engine
  - Use the tokenizer
  - Optimize construction via the use of strings.Builder
- Modify the subScript function to return the raw script bytes since the
  parsed opcodes are no longer stored
- Update the various signature checking opcodes to use the raw opcode
  data removal and signature hash calculation functions since the
  subscript is now a raw script
  - opcodeCheckSig
  - opcodeCheckMultiSig
  - opcodeCheckSigAlt
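To illustrate what stepping through a raw script with a tokenizer can
look like, the following is a minimal sketch and not the engine's actual
code. The tokenizer type, its fields, and the errTruncated error are
hypothetical stand-ins, and the opcode values assume the standard
Bitcoin data push encodings.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// Data push opcode boundaries (standard Bitcoin script values).
const (
	opData1     = 0x01 // pushes the next 1 byte
	opData75    = 0x4b // pushes the next 75 bytes
	opPushData1 = 0x4c // next byte holds the push length
	opPushData2 = 0x4d // next 2 bytes (little endian) hold the push length
	opPushData4 = 0x4e // next 4 bytes (little endian) hold the push length
)

var errTruncated = errors.New("script ends before the pushed data")

// tokenizer is a hypothetical, simplified stand-in for the script tokenizer.
// It keeps only a reference to the raw script and a few scalars, so stepping
// through a script does not allocate.
type tokenizer struct {
	script []byte
	offset int    // byte offset of the next opcode to parse
	op     byte   // most recently parsed opcode
	data   []byte // data pushed by op (a subslice of script), nil otherwise
	err    error
}

// Next parses the next opcode and reports whether one was available.
func (t *tokenizer) Next() bool {
	if t.err != nil || t.offset >= len(t.script) {
		return false
	}
	op := t.script[t.offset]
	rest := t.script[t.offset+1:]

	// Determine how many bytes encode the push length and, for the direct
	// push opcodes, the length itself.
	var lenBytes, dataLen int
	switch {
	case op >= opData1 && op <= opData75:
		dataLen = int(op)
	case op == opPushData1:
		lenBytes = 1
	case op == opPushData2:
		lenBytes = 2
	case op == opPushData4:
		lenBytes = 4
	}
	if len(rest) < lenBytes {
		t.err = errTruncated
		return false
	}
	switch lenBytes {
	case 1:
		dataLen = int(rest[0])
	case 2:
		dataLen = int(binary.LittleEndian.Uint16(rest))
	case 4:
		dataLen = int(binary.LittleEndian.Uint32(rest))
	}
	rest = rest[lenBytes:]
	if dataLen < 0 || dataLen > len(rest) {
		t.err = errTruncated
		return false
	}

	t.op, t.data = op, nil
	if op >= opData1 && op <= opPushData4 {
		t.data = rest[:dataLen]
	}
	t.offset += 1 + lenBytes + dataLen
	return true
}
```

With something along these lines, an engine only needs the raw scripts,
the index of the current script, and one tokenizer. A loop of the form
`for tok.Next() { ... }` visits each opcode in turn and `tok.err`
reports a truncated push once the loop ends.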
This converts the engine's current program counter disassembly to make
use of the standalone disassembly function to remove the dependency on
the parsed opcode struct.
It also updates the tests accordingly.
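As a rough sketch of a standalone disassembly function that works
directly on raw script bytes and builds its output with strings.Builder,
consider the following. The disasmScript name, the tiny opcodeNames
table, the OP_UNKNOWN placeholder, and the "[error]" marker are
assumptions made for illustration, and only the direct data pushes
(OP_DATA_1 through OP_DATA_75) are handled to keep it short.

```go
package main

import (
	"encoding/hex"
	"strings"
)

// opcodeNames is a tiny illustrative subset of an opcode name table.
var opcodeNames = map[byte]string{
	0x76: "OP_DUP",
	0x87: "OP_EQUAL",
	0x88: "OP_EQUALVERIFY",
	0xa9: "OP_HASH160",
	0xac: "OP_CHECKSIG",
}

// disasmScript renders direct data pushes as hex and every other opcode by
// name, separated by single spaces. It writes into one strings.Builder
// instead of concatenating strings for every opcode.
func disasmScript(script []byte) string {
	var sb strings.Builder
	for i := 0; i < len(script); {
		if i > 0 {
			sb.WriteByte(' ')
		}
		op := script[i]
		i++
		if op >= 0x01 && op <= 0x4b {
			end := i + int(op)
			if end > len(script) {
				// The push runs past the end of the script.
				sb.WriteString("[error]")
				break
			}
			sb.WriteString(hex.EncodeToString(script[i:end]))
			i = end
			continue
		}
		if name, ok := opcodeNames[op]; ok {
			sb.WriteString(name)
		} else {
			sb.WriteString("OP_UNKNOWN")
		}
	}
	return sb.String()
}
```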
This converts the checkMinimalDataPush function defined on a parsed
opcode to a standalone function which accepts an opcode and data slice
instead in order to make it more flexible for raw script analysis.
It also updates all callers accordingly.
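For illustration, a standalone minimal push check that takes an opcode
and its data might look like the sketch below. The isMinimalDataPush
name and the boolean return are assumptions made here (the real function
may well return an error instead), and the opcode values assume the
standard Bitcoin encodings.

```go
package main

const (
	op0         = 0x00 // pushes an empty item
	opPushData1 = 0x4c
	opPushData2 = 0x4d
	op1Negate   = 0x4f // pushes -1
	op1         = 0x51 // OP_1; OP_2 through OP_16 follow consecutively
)

// isMinimalDataPush reports whether opcode is the smallest encoding that can
// push data. It is intended to be called for data push opcodes only.
func isMinimalDataPush(opcode byte, data []byte) bool {
	dataLen := len(data)
	switch {
	case dataLen == 0:
		return opcode == op0
	case dataLen == 1 && data[0] >= 1 && data[0] <= 16:
		// Small integers have dedicated opcodes.
		return opcode == op1+data[0]-1
	case dataLen == 1 && data[0] == 0x81:
		return opcode == op1Negate
	case dataLen <= 75:
		// Direct pushes use the data length as the opcode.
		return int(opcode) == dataLen
	case dataLen <= 255:
		return opcode == opPushData1
	case dataLen <= 65535:
		return opcode == opPushData2
	}
	// Larger pushes have no shorter encoding available.
	return true
}
```

Because it only needs a byte and a byte slice, a check like this can be
called straight from a tokenizer loop without building a parsed opcode.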
This converts the isConditional function defined on a parsed opcode to a
standalone function named isOpcodeConditional which accepts an opcode as
a byte instead in order to make it more flexible for raw script
analysis.
It also updates all callers accordingly.
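A byte-based predicate of this kind can be sketched as follows. The
constant names are local to the example and the values assume the
standard Bitcoin opcode numbering.

```go
package main

const (
	opIf    = 0x63
	opNotIf = 0x64
	opElse  = 0x67
	opEndIf = 0x68
)

// isOpcodeConditional reports whether opcode is one of the flow control
// opcodes that must still be processed inside a non-executing branch.
func isOpcodeConditional(opcode byte) bool {
	switch opcode {
	case opIf, opNotIf, opElse, opEndIf:
		return true
	}
	return false
}
```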
This converts the alwaysIllegal function defined on a parsed opcode to a
standalone function named isOpcodeAlwaysIllegal which accepts an opcode
as a byte instead in order to make it more flexible for raw script
analysis.
It also updates all callers accordingly.
This converts the isDisabled function defined on a parsed opcode to a
standalone function which accepts an opcode as a byte instead in order
to make it more flexible for raw script analysis.
It also updates all callers accordingly.
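The disabled and always illegal checks follow the same shape once they
take a plain byte. The sketch below assumes the standard Bitcoin opcode
numbering, which may differ on other chains, and the isOpcodeDisabled
name is an assumption since the text above does not give the new name.

```go
package main

// isOpcodeDisabled reports whether opcode is one of the disabled opcodes,
// which make a script invalid even when they appear in a branch that is
// not executed.
func isOpcodeDisabled(opcode byte) bool {
	switch opcode {
	case 0x7e, 0x7f, 0x80, 0x81: // OP_CAT, OP_SUBSTR, OP_LEFT, OP_RIGHT
		return true
	case 0x83, 0x84, 0x85, 0x86: // OP_INVERT, OP_AND, OP_OR, OP_XOR
		return true
	case 0x8d, 0x8e: // OP_2MUL, OP_2DIV
		return true
	case 0x95, 0x96, 0x97: // OP_MUL, OP_DIV, OP_MOD
		return true
	case 0x98, 0x99: // OP_LSHIFT, OP_RSHIFT
		return true
	}
	return false
}

// isOpcodeAlwaysIllegal reports whether opcode is illegal wherever it
// appears in a script.
func isOpcodeAlwaysIllegal(opcode byte) bool {
	return opcode == 0x65 || opcode == 0x66 // OP_VERIF, OP_VERNOTIF
}
```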
This introduces a new function named removeOpcodeByDataRaw which accepts
the raw scripts and data to remove versus requiring the parsed opcodes
to both significantly optimize it as well as make it more flexible for
working with raw scripts.
There are several places in the rest of the code that currently only
have access to the parsed opcodes, so this only introduces the function
for use in the future and deprecates the existing one.
Note that, in practice, the script will never actually contain the data
that is intended to be removed since the function is only used during
signature verification to remove the signature itself which would
require some incredibly non-standard code to create.
Thus, as an optimization, it avoids allocating a new script unless there
is actually a match that needs to be removed.
Finally, it updates the tests to use the new function.
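The allocation avoidance described above can be sketched as follows.
This is a simplified illustration rather than the actual
removeOpcodeByDataRaw: the nextOp helper and the removeDataPushes name
are hypothetical, any push containing the data is removed without
further checks on the push encoding, and a script that fails to parse is
returned unchanged, which is a choice made only for this example.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// nextOp is a hypothetical helper that decodes the opcode at offset off. It
// returns the pushed data (nil for non-push opcodes), the offset of the next
// opcode, and false if the script ends before the push is complete.
func nextOp(script []byte, off int) (data []byte, next int, ok bool) {
	op := script[off]
	off++
	lenBytes, dataLen := 0, 0
	switch {
	case op >= 0x01 && op <= 0x4b: // OP_DATA_1 through OP_DATA_75
		dataLen = int(op)
	case op == 0x4c: // OP_PUSHDATA1
		lenBytes = 1
	case op == 0x4d: // OP_PUSHDATA2
		lenBytes = 2
	case op == 0x4e: // OP_PUSHDATA4
		lenBytes = 4
	default:
		return nil, off, true
	}
	if len(script)-off < lenBytes {
		return nil, 0, false
	}
	switch lenBytes {
	case 1:
		dataLen = int(script[off])
	case 2:
		dataLen = int(binary.LittleEndian.Uint16(script[off:]))
	case 4:
		dataLen = int(binary.LittleEndian.Uint32(script[off:]))
	}
	off += lenBytes
	if dataLen < 0 || dataLen > len(script)-off {
		return nil, 0, false
	}
	return script[off : off+dataLen], off + dataLen, true
}

// removeDataPushes returns script without the data pushes that contain
// target. The input slice itself is returned when nothing matches, so the
// common case performs no allocation.
func removeDataPushes(script, target []byte) []byte {
	if len(script) == 0 || len(target) == 0 {
		return script
	}
	var result []byte
	for prev := 0; prev < len(script); {
		data, next, ok := nextOp(script, prev)
		if !ok {
			return script // simplification: leave unparsable scripts alone
		}
		if bytes.Contains(data, target) {
			if result == nil {
				// First match: allocate once and copy what came before.
				result = make([]byte, 0, len(script)-(next-prev))
				result = append(result, script[:prev]...)
			}
		} else if result != nil {
			result = append(result, script[prev:next]...)
		}
		prev = next
	}
	if result == nil {
		return script
	}
	return result
}
```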
This converts SignTxOutput and supporting funcs, namely sign,
mergeScripts and mergeMultiSig, to make use of the new tokenizer as well
as some recently added funcs that deal with raw scripts in order to
remove the reliance on parsed opcodes as a step towards ultimately
removing them altogether. It also updates the comments to explicitly
call out the script version semantics.
It is worth noting that this has the side effect of optimizing the
function as well. However, since this change is not focused on the
optimization aspects, no benchmarks are provided.
This moves the function definition for mergeMultiSig so it is more
consistent with the preferred order used throughout the codebase. In
particular, the functions are defined before they're first used and
generally as close as possible to the first use when they're defined in
the same file.
This completes the process of converting the ExtractPkScriptAddrs
function to use the optimized extraction functions recently introduced
as part of the typeOfScript conversion.
In particular, this cleans up the final remaining case for non-standard
transactions. The method now returns NonStandardTy directly if no other
branch was taken.
The following is a before and after comparison of attempting to extract
pkscript addrs from a very large, non-standard script.
benchmark                               old ns/op     new ns/op     delta
BenchmarkExtractPkScriptAddrsLarge-8    60713         17.0          -99.97%
BenchmarkExtractPkScriptAddrs-8         289           17.0          -94.12%

benchmark                               old allocs    new allocs    delta
BenchmarkExtractPkScriptAddrsLarge-8    1             0             -100.00%
BenchmarkExtractPkScriptAddrs-8         1             0             -100.00%

benchmark                               old bytes     new bytes     delta
BenchmarkExtractPkScriptAddrsLarge-8    311299        0             -100.00%
BenchmarkExtractPkScriptAddrs-8         768           0             -100.00%
This continues the process of converting the ExtractPkScriptAddrs
function to use the optimized extraction functions recently introduced
as part of the typeOfScript conversion.
In particular, this converts the extraction for
witness-pay-to-script-hash scripts.
This continues the process of converting the ExtractPkScriptAddrs
function to use the optimized extraction functions recently introduced
as part of the typeOfScript conversion.
In particular, this converts the extraction for witness-pubkey-hash
scripts.
This continues the process of converting the ExtractPkScriptAddrs
function to use the optimized extraction functions recently introduced
as part of the typeOfScript conversion.
In particular, this converts the detection for nulldata scripts, removes
the slow path fallback code since it is the final case, and modifies the
comment to call out the script version semantics.
The following is a before and after comparison of analyzing both a
typical standard script and a very large non-standard script:
benchmark                           old ns/op     new ns/op     delta
-----------------------------------------------------------------------
BenchmarkExtractPkScriptAddrsLarge  132400        44.4          -99.97%
BenchmarkExtractPkScriptAddrs       1265          231           -81.74%

benchmark                           old allocs    new allocs    delta
-----------------------------------------------------------------------
BenchmarkExtractPkScriptAddrsLarge  1             0             -100.00%
BenchmarkExtractPkScriptAddrs       5             2             -60.00%

benchmark                           old bytes     new bytes     delta
-----------------------------------------------------------------------
BenchmarkExtractPkScriptAddrsLarge  466944        0             -100.00%
BenchmarkExtractPkScriptAddrs       1600          48            -97.00%
This continues the process of converting the ExtractPkScriptAddrs
function to use the optimized extraction functions recently introduced
as part of the typeOfScript conversion.
In particular, this converts the detection for multisig scripts.
Also, since the remaining slow path cases are all recursive calls,
the parsed opcodes are no longer used, so parsing is removed.
This continues the process of converting the ExtractPkScriptAddrs
function to use the optimized extraction functions recently introduced
as part of the typeOfScript conversion.
In particular, this converts the detection for pay-to-pubkey scripts.
This continues the process of converting the ExtractPkScriptAddrs
function to use the optimized extraction functions recently introduced
as part of the typeOfScript conversion.
In particular, this converts the detection for pay-to-pubkey-hash
scripts.
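To give a sense of why raw script analysis is so much cheaper than
parsing, a pay-to-pubkey-hash check reduces to a length test and a few
byte comparisons. The sketch below assumes the standard Bitcoin opcode
values, and the extractPubKeyHash name is an assumption made for the
example.

```go
package main

// extractPubKeyHash returns the 20-byte hash from a standard
// pay-to-pubkey-hash script, or nil when the script has any other form.
// The returned slice aliases the script, so nothing is allocated.
func extractPubKeyHash(script []byte) []byte {
	// OP_DUP OP_HASH160 OP_DATA_20 <20-byte hash> OP_EQUALVERIFY OP_CHECKSIG
	if len(script) == 25 &&
		script[0] == 0x76 && // OP_DUP
		script[1] == 0xa9 && // OP_HASH160
		script[2] == 0x14 && // OP_DATA_20
		script[23] == 0x88 && // OP_EQUALVERIFY
		script[24] == 0xac { // OP_CHECKSIG

		return script[3:23]
	}
	return nil
}
```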
This begins the process of converting the ExtractPkScriptAddrs function
to use the optimized extraction functions recently introduced as part of
the typeOfScript conversion.
In order to ease the review process, the detection of each script type
will be converted in a separate commit such that the script is only
parsed as a fallback for the cases that are not already converted to
more efficient variants.
In particular, this converts the detection for pay-to-script-hash
scripts.
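The incremental conversion strategy can be pictured with the sketch
below: converted script types are checked against the raw bytes first
and the expensive parse is only reached for everything else. The
classify function, the scriptClass values, and the slowPath callback
(standing in for the parse-based fallback) are hypothetical, and the
opcode values assume standard Bitcoin encodings.

```go
package main

type scriptClass int

const (
	nonStandardTy scriptClass = iota
	scriptHashTy
)

// extractScriptHash returns the 20-byte hash from a standard
// pay-to-script-hash script, or nil when the script has any other form.
func extractScriptHash(script []byte) []byte {
	// OP_HASH160 OP_DATA_20 <20-byte hash> OP_EQUAL
	if len(script) == 23 &&
		script[0] == 0xa9 && // OP_HASH160
		script[1] == 0x14 && // OP_DATA_20
		script[22] == 0x87 { // OP_EQUAL

		return script[2:22]
	}
	return nil
}

// classify tries the already converted fast checks first and only falls
// back to slowPath for scripts they do not recognize.
func classify(script []byte, slowPath func([]byte) scriptClass) scriptClass {
	if extractScriptHash(script) != nil {
		return scriptHashTy
	}
	return slowPath(script)
}
```

As more script types gain a raw check, fewer scripts reach the fallback,
and once every type is converted the fallback can be dropped entirely.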
This converts the ExtractAtomicSwapDataPushes function to make use of
the new tokenizer instead of the far less efficient parseScript thereby
significantly optimizing the function.
The new implementation is designed such that it should be fairly easy to
move the function into the atomic swap tools where it more naturally
belongs now that the tokenizer makes it possible to analyze scripts
outside of the txscript module. Consequently, this also deprecates the
function.
The following is a before and after comparison of attempting to extract
from both a typical atomic swap script and a very large non-atomic swap
script:
benchmark                                     old ns/op     new ns/op     delta
BenchmarkExtractAtomicSwapDataPushesLarge-8   61332         44.4          -99.93%
BenchmarkExtractAtomicSwapDataPushes-8        990           260           -73.74%

benchmark                                     old allocs    new allocs    delta
BenchmarkExtractAtomicSwapDataPushesLarge-8   1             0             -100.00%
BenchmarkExtractAtomicSwapDataPushes-8        2             1             -50.00%

benchmark                                     old bytes     new bytes     delta
BenchmarkExtractAtomicSwapDataPushesLarge-8   311299        0             -100.00%
BenchmarkExtractAtomicSwapDataPushes-8        3168          96            -96.97%
This renames the canonicalPush function to isCanonicalPush and converts
it to accept an opcode as a byte and the associated data as a byte slice
instead of the internal parsed opcode data struct in order to make it
more flexible for raw script analysis.
It also updates all callers and tests accordingly.
This converts the PushedData function to make use of the new tokenizer
instead of the far less efficient parseScript thereby significantly
optimizing the function.
Also, the comment is modified to explicitly call out the script version
semantics.
The following is a before and after comparison of extracting the data
from a very large script:
benchmark               old ns/op     new ns/op     delta
BenchmarkPushedData-8   64837         1790          -97.24%

benchmark               old allocs    new allocs    delta
BenchmarkPushedData-8   7             6             -14.29%

benchmark               old bytes     new bytes     delta
BenchmarkPushedData-8   312816        1520          -99.51%
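A tokenizer-style version of this kind of extraction might look roughly
like the sketch below, which walks the raw bytes and collects subslices
of the script rather than first building a parsed opcode list. The
pushedData name, the errTruncated error, and the choice to record OP_0
as a nil entry are assumptions of the example, and the opcode values
assume standard Bitcoin encodings.

```go
package main

import (
	"encoding/binary"
	"errors"
)

var errTruncated = errors.New("script ends before the pushed data")

// pushedData returns the data pushed by script in order. Each entry is a
// subslice of script, so only the outer slice is allocated.
func pushedData(script []byte) ([][]byte, error) {
	var pushes [][]byte
	for off := 0; off < len(script); {
		op := script[off]
		off++
		lenBytes, dataLen := 0, 0
		switch {
		case op == 0x00: // OP_0 pushes an empty item
			pushes = append(pushes, nil)
			continue
		case op <= 0x4b: // OP_DATA_1 through OP_DATA_75
			dataLen = int(op)
		case op == 0x4c: // OP_PUSHDATA1
			lenBytes = 1
		case op == 0x4d: // OP_PUSHDATA2
			lenBytes = 2
		case op == 0x4e: // OP_PUSHDATA4
			lenBytes = 4
		default:
			continue // not a data push
		}
		if len(script)-off < lenBytes {
			return nil, errTruncated
		}
		switch lenBytes {
		case 1:
			dataLen = int(script[off])
		case 2:
			dataLen = int(binary.LittleEndian.Uint16(script[off:]))
		case 4:
			dataLen = int(binary.LittleEndian.Uint32(script[off:]))
		}
		off += lenBytes
		if dataLen < 0 || dataLen > len(script)-off {
			return nil, errTruncated
		}
		pushes = append(pushes, script[off:off+dataLen])
		off += dataLen
	}
	return pushes, nil
}
```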
This converts the CalcMultiSigStats function to make use of the new
extractMultisigScriptDetails function instead of the far less efficient
parseScript thereby significantly optimizing the function.
The tests are also updated accordingly.
The following is a before and after comparison of analyzing a standard
multisig script:
benchmark                   old ns/op     new ns/op     delta
---------------------------------------------------------------
BenchmarkCalcMultiSigStats  972           79.5          -91.82%

benchmark                   old allocs    new allocs    delta
---------------------------------------------------------------
BenchmarkCalcMultiSigStats  1             0             -100.00%

benchmark                   old bytes     new bytes     delta
---------------------------------------------------------------
BenchmarkCalcMultiSigStats  2304          0             -100.00%
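As an illustration of how multisig details can be read from the raw
bytes without allocating, consider the sketch below. It is not the
actual extractMultisigScriptDetails: the multiSigStats and asSmallInt
names are hypothetical, only OP_1 through OP_16 and directly pushed
33-byte or 65-byte public keys are accepted, and no check is made that
the required signature count does not exceed the key count.

```go
package main

// asSmallInt converts OP_1 through OP_16 to the integers 1 through 16.
func asSmallInt(op byte) (int, bool) {
	if op >= 0x51 && op <= 0x60 {
		return int(op-0x51) + 1, true
	}
	return 0, false
}

// multiSigStats returns the number of public keys and the number of
// required signatures for a script of the form
//
//	OP_m <pubkey>... OP_n OP_CHECKMULTISIG
//
// The final return value is false for any other kind of script.
func multiSigStats(script []byte) (int, int, bool) {
	// The shortest candidate is OP_m OP_n OP_CHECKMULTISIG.
	if len(script) < 3 || script[len(script)-1] != 0xae {
		return 0, 0, false
	}
	numSigs, ok := asSmallInt(script[0])
	if !ok {
		return 0, 0, false
	}
	numPubKeys, ok := asSmallInt(script[len(script)-2])
	if !ok {
		return 0, 0, false
	}

	// Everything in between must be direct pushes of public keys.
	body := script[1 : len(script)-2]
	count := 0
	for len(body) > 0 {
		keyLen := int(body[0])
		if (keyLen != 33 && keyLen != 65) || len(body) < 1+keyLen {
			return 0, 0, false
		}
		body = body[1+keyLen:]
		count++
	}
	if count != numPubKeys {
		return 0, 0, false
	}
	return numPubKeys, numSigs, true
}
```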
This converts CalcScriptInfo and dependent expectedInputs to make use of
the new script tokenizer as well as several of the other recently added
raw script analysis functions in order to remove the reliance on parsed
opcodes as a step towards ultimately removing them altogether.
It is worth noting that this has the side effect of significantly
optimizing the function as well. However, since it is deprecated, no
benchmarks are provided.
This concludes the process of converting the typeOfScript function to
use a combination of raw script analysis and the new tokenizer instead
of the far less efficient parsed opcodes.
In particular, it converts the detection of witness script hash scripts
to use raw script analysis and the new tokenizer.
With all of the limbs now using optimized variants, the following is a
before and after comparison of calling GetScriptClass on a large script:
benchmark                   old ns/op     new ns/op     delta
BenchmarkGetScriptClass-8   61515         15.3          -99.98%

benchmark                   old allocs    new allocs    delta
BenchmarkGetScriptClass-8   1             0             -100.00%

benchmark                   old bytes     new bytes     delta
BenchmarkGetScriptClass-8   311299        0             -100.00%
This continues the process of converting the typeOfScript function to
use a combination of raw script analysis and the new tokenizer instead
of the far less efficient parsed opcodes.
In particular, it converts the detection of witness pubkey hash scripts
to use raw script analysis and the new tokenizer.
The following is a before and after comparison of analyzing a large
script:
benchmark                        old ns/op     new ns/op     delta
BenchmarkIsWitnessPubKeyHash-8   61688         62839         +1.87%

benchmark                        old allocs    new allocs    delta
BenchmarkIsWitnessPubKeyHash-8   1             1             +0.00%

benchmark                        old bytes     new bytes     delta
BenchmarkIsWitnessPubKeyHash-8   311299        311299        +0.00%