This implements an efficient and zero-allocation script tokenizer that
is exported to both provide a new capability to tokenize scripts to
external consumers of the API as well as to serve as a base for
refactoring the existing highly inefficient internal code.
It is important to note that this tokenizer is intended to be used in
consensus critical code in the future, so it must exactly follow the
existing semantics.
The current script parsing mechanism used throughout the txscript module
is to fully tokenize the scripts into an array of internal parsed
opcodes which are then examined and passed around in order to implement
virtually everything related to scripts.
While that approach does simplify the analysis of certain scripts and
thus provide some nice properties in that regard, it is both extremely
inefficient in many cases, and makes it impossible for external
consumers of the API to implement any form of custom script analysis
without manually implementing a bunch of error prone tokenizing code or,
alternatively, the script engine exposing internal structures.
For example, as shown by profiling the total memory allocations of an
initial sync, the existing script parsing code allocates a total of
around 295.12GB, which equates to around 50% of all allocations
performed. The zero-alloc tokenizer this introduces will allow that to
be reduced to virtually zero.
The following is a before and after comparison of tokenizing a large
script with a high opcode count using the existing code versus the
tokenizer this introduces for both speed and memory allocations:
benchmark old ns/op new ns/op delta
BenchmarkScriptParsing-8 63464 677 -98.93%
benchmark old allocs new allocs delta
BenchmarkScriptParsing-8 1 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkScriptParsing-8 311299 0 -100.00%
The following is an overview of the changes:
- Introduce new error code ErrUnsupportedScriptVersion
- Implement zero-allocation script tokenizer
- Add a full suite of tests to ensure the tokenizer works as intended
and follows the required consensus semantics
- Add an example of using the new tokenizer to count the number of
opcodes in a script
- Update README.md to include the new example
- Update script parsing benchmark to use the new tokenizer
The github markdown interpreter has been changed such that it no longer
allows spaces in between the brackets and parenthesis of links and now
requires a newline in between anchors and other formatting. This
updates all of the markdown files accordingly.
While here, it also corrects a couple of inconsistencies in some of the
README.md files.
First, it removes the documentation section from all the README.md files
and instead puts a web-based godoc badge and link at the top with the
other badges. This is being done since the local godoc tool no longer
ships with Go by default, so the instructions no longer work without
first installing godoc. Due to this, pretty much everyone uses the
web-based godoc these days anyways. Anyone who has manually installed
godoc won't need instructions.
Second, it makes sure the ISC license badge is at the top with the other
badges and removes the textual reference in the overview section.
Finally, it's modifies the Installation section to Installation and
Updating and adds a '-u' to the 'go get' command since it works for both
and thus is simpler.
This commit adds a new example to the txscript package that demonstrates
creating a new transaction which redeems funds and signing the referenced
transaction output the SignTxOutput function.
This commit contains the entire btcscript repository along with several
changes needed to move all of the files into the txscript directory in
order to prepare it for merging. This does NOT update btcd or any of the
other packages to use the new location as that will be done separately.
- All import paths in the old btcscript test files have been changed to the
new location
- All references to btcscript as the package name have been chagned to
txscript
This is ongoing work toward #214.