This commit drastically reduces the number of allocations needed to
deserialize a transaction and its scripts by using the combination of a
free list for initially deserializing the individual scripts along with
copying them into a single contiguous byte slice after the final size is
known and modifying each script in the transaction to point to its
location within the contiguous blob.
The end result is only a single allocation that holds all of the scripts
for a transaction regardless of the total number of scripts it has.
The script free list allows a maximum of 12,500 items with each buffer
being 512 bytes. This implies it will have a peak usage of 6.1MB. The
values were chosen based on profiling data and a desire to allow at
least 100 scripts per transaction to be simultaneously deserialized by
125 peers.
Also, while optimizing, decode directly into the existing previous
outpoint structure of each transaction input in order to avoid the extra
allocation per input that is otherwise caused when the local escapes to
the heap.
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark old allocs new allocs delta
-----------------------------------------------------------
ReadTxOut 1 0 -100.00%
ReadTxIn 2 0 -100.00%
DeserializeTxSmall 7 5 -28.57%
DeserializeTxLarge 11146 6 -99.95%
The current code involves a ton of small allocations which is harsh on
the garbage collector and in turn causes a lot of addition runtime
overhead both in terms of additional memory and processing time.
In order to improve the situation, this drasticially reduces the number
of allocations by creating contiguous slices of objects and
deserializing into them. Since the final data structures consist of
slices of pointers to the objects, they are constructed by pointing them
into the appropriate offset of the contiguous slice.
This could be improved upon even further by converting all of the data
structures provided the wire package to be slices of contiguous objects
directly, however that would be a major breaking API change and would
end up requiring updating a lot more code in every caller. I do think
that ultimately the API should be changed, but the changes in this
commit already makes a massive difference and it doesn't require
touching any of the callers, so it is a good place to begin.
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark old allocs new allocs delta
-----------------------------------------------------------
DeserializeTxLarge 16715 11146 -33.32%
DecodeGetHeaders 501 2 -99.60%
DecodeHeaders 2001 2 -99.90%
DecodeGetBlocks 501 2 -99.60%
DecodeAddr 3001 2002 -33.29%
DecodeInv 50003 3 -99.99%
DecodeNotFound 50002 3 -99.99%
DecodeMerkleBlock 107 3 -97.20%
Since the protocol encodes timestamps differently depending on the
message, the code currently decodes into a local variable and then
converts it to a time.Time. However, this causes an allocation due to
the local having to escape to the heap in order for the readElement
function to write to it.
So, in order to avoid that, this introduces two new types for a
timestamp named uint32Time and int64Time that are encoded as the
respective type on the read. When calling the readElements function,
the time.Time field in the message is cast to a pointer of the
appropriate type which effectively allows the allocations to be avoided.
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark old allocs new allocs delta
----------------------------------------------------------------------
ReadBlockHeader 1 0 -100.00%
DecodeHeaders 4001 2001 -49.99%
DecodeAddr 4001 3001 -24.99%
DecodeMerkleBlock 108 107 -0.93%
This introduces a new binary free list which provides a concurrent safe
list of unused buffers for the purpose of serializing and deserializing
primitive integers to their raw binary bytes.
For convenience, the type also provides functions for each of the
primitive unsigned integers that automatically obtain a buffer from the
free list, perform the necessary binary conversion, read from or write
to the given io.Reader or io.Writer, and return the buffer to the free
list.
A global instance of the type has been introduced with a maximum number
of 1024 items. Since each buffer is 8 bytes, it will consume a maximum
of 8KB. Theoretically, this value would only allow up to 1024 peers
simultaneously reading and writing without having to resort to burdening
the garbage collector with additional allocations. However, due to the
fact the code is designed in such a way that the buffers are quickly
used and returned to the free list, in practice it can support much more
than 1024 peers without involving the garbage collector since it is
highly unlikely every peer would need a buffer at the exact same time.
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark old allocs new allocs delta
-------------------------------------------------------------
WriteVarInt1 1 0 -100.00%
WriteVarInt3 1 0 -100.00%
WriteVarInt5 1 0 -100.00%
WriteVarInt9 1 0 -100.00%
ReadVarInt1 1 0 -100.00%
ReadVarInt3 1 0 -100.00%
ReadVarInt5 1 0 -100.00%
ReadVarInt9 1 0 -100.00%
ReadVarStr4 3 2 -33.33%
ReadVarStr10 3 2 -33.33%
WriteVarStr4 2 1 -50.00%
WriteVarStr10 2 1 -50.00%
ReadOutPoint 1 0 -100.00%
WriteOutPoint 1 0 -100.00%
ReadTxOut 3 1 -66.67%
WriteTxOut 2 0 -100.00%
ReadTxIn 5 2 -60.00%
WriteTxIn 3 0 -100.00%
DeserializeTxSmall 15 7 -53.33%
DeserializeTxLarge 33428 16715 -50.00%
SerializeTx 8 0 -100.00%
ReadBlockHeader 7 1 -85.71%
WriteBlockHeader 10 4 -60.00%
DecodeGetHeaders 1004 501 -50.10%
DecodeHeaders 18002 4001 -77.77%
DecodeGetBlocks 1004 501 -50.10%
DecodeAddr 9002 4001 -55.55%
DecodeInv 150005 50003 -66.67%
DecodeNotFound 150004 50002 -66.67%
DecodeMerkleBlock 222 108 -51.35%
TxSha 10 2 -80.00%
This adds two new flags, --txindex and --addrindex, to the addblock
utility which mirror the flags on btcd. They serve to to specify that
the transaction index and/or address index, respectively, should be
built while importing from the bootstrap file.
This is technically not 100% required since btcd will build the indexes
on the first load (when enabled) if they aren't already built, however
it is much faster to build the indexes as the blocks are being validated
(particularly for the address index), so this makes the capability
available.
This converts the project to allow btcd to be used with the glide
package manager in order to provide stable and reproducible builds
without the user having to jump through all of the hoops as they do
today.
It consists of adding a glide.yaml file which identifies the project
dependencies and locations along with a glide.lock file which contains
the complete dependency tree pinned to specific versions. Glide uses
these files to download the packages (or updates) to a local vendor
directory and checkout the correct pinned versions. The go tool, in
turn, is used to build/install btcd and will use the pinned versions in
the vendor directory.
This also updates TravisCI to build using glide, removes some of the
exceptions in the lint checks which are no longer required, and updates
the README.md with the new instructions needed to build the project with
glide.
This reduces the target ratio of freshly allocated data to live data to
10% in order to limit excessive overallocations by the garbage collector
during data bursts such as processing complex blocks or rapidly
receiving a lot of large transactions.
When an OS reboots or shuts down, it sends all processes SIGTERM before
sending SIGKILL. This allows btcd to do a proper shutdown which most
importantly closes the database.
This adds support for serving headers instead of inventory messages in
accordance with BIP0130. btcd itself does not yet make use of the
feature when receiving data.
This adds decode benchmarks for several of the messages that profiling
has identified to cause a lot of allocations in addition to those that
already exist. By adding these benchmarks, it makes it easier to get
allocation and speed statistics which can in turn be used to compare
future improvements.
The following bencharmarks have been added:
DecodeGetHeaders, DecodeHeaders, DecodeGetBlocks, DecodeAddr, DecodeInv,
DecodeNotFound, and DecodeMerkleBlock
For reference, here is the benchmark data as of this commit.
DecodeGetHeaders 93261 ns/op 24120 B/op 1004 allocs/op
DecodeHeaders 2071263 ns/op 368399 B/op 18002 allocs/op
DecodeGetBlocks 92486 ns/op 24120 B/op 1004 allocs/op
DecodeAddr 850608 ns/op 136202 B/op 9002 allocs/op
DecodeInv 17107172 ns/op 3601447 B/op 150004 allocs/op
DecodeNotFound 17522225 ns/op 3601444 B/op 150004 allocs/op
DecodeMerkleBlock 21062 ns/op 5192 B/op 222 allocs/op
This modifies the benchmarks in the wire package to avoid creating a new
reader for each iteration. This is useful since it means that showing
the memory allocations will only show the function under test instead of
the allocation for the benchmark setup as well.
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark old allocs new allocs delta
------------------------------------------------------------
ReadVarInt1 2 1 -50.00%
ReadVarInt3 2 1 -50.00%
ReadVarInt5 2 1 -50.00%
ReadVarInt9 2 1 -50.00%
ReadVarStr4 4 3 -25.00%
ReadVarStr10 4 3 -25.00%
ReadOutPoint 2 1 -50.00%
ReadTxOut 4 3 -25.00%
ReadTxIn 6 5 -16.67%
DeserializeTxSmall 16 15 -6.25%
DeserializeTxLarge 33430 33428 -0.01%
ReadBlockHeader 8 7 -12.50%
This adds a benchmark for deserializing a large transaction that is
often referred to as the megatransaction since it is the largest Bitcoin
transaction mined to date. It consists of 5569 inputs and 1 output and
its hash is:
bb41a757f405890fb0f5856228e23b715702d714d59bf2b1feb70d8b2b4e3e08.
This is being done so there is a benchmark that tests more of a
worst-case scenario which is a better candidate for identifying and
testing improvements.
The following benchmark results shows the how much more intensive this
transaction is over the existing mock transaction:
DeserializeTxSmall 1000000 1751 ns/op 376 B/op 16 allocs/op
DeserializeTxLarge 300 5093980 ns/op 1672829 B/op 33430 allocs/op
This removes the root field and all references to it from the BlockChain
since it is no longer required.
It was previously required because the chain state was not initialized
when the instance was created. However, that is no longer the case, so
there is no reason to keep it around any longer.
This changes the script template parsing function to use a pointer into
the constant global opcode array for parsed opcodes as opposed to making
a copy of the opcode entries which causes unnecessary allocations.
Profiling showed that after roughly 48 hours of operation, this
copy was the culprit of 207 million unnecessary allocations.
This removes the logging functions that are now implemented in the peer
package as they are no longer used by btcd itself and should have been
removed when they were copied into the peer package.
It is not the responsibility of mempool to relay transactions, so
return a slice of transactions accepted to the mempool due to the
passed transaction to the caller.
This improves the tests of the priority queue to include the secondary
sort ordering as well as adds some manual entries to ensure the edge
conditions are properly tested.
This also brings the priority queue test coverage up to 100%.
having 3 int32s above the uint64s in the struct
will cause misalignment for some 32-bit architectures.
see https://golang.org/pkg/sync/atomic/#pkg-note-BUG
This aligns bytesReceived and bytesSent.
Profiles discovered that lookups into the signature cache included an
expensive comparison to the stored `sigInfo` struct. This lookup had the
potential to be more expensive than directly verifying the signature
itself!
In addition, evictions were rather expensive because they involved
reading from /dev/urandom, or equivalent, for each eviction once the
signature cache was full as well as potentially iterating over every
item in the cache in the worst-case.
To remedy this poor performance several changes have been made:
* Change the lookup key to the fixed sized 32-byte signature hash
* Perform a full equality check only if there is a cache hit which
results in a significant speed up for both insertions and existence
checks
* Override entries in the case of a colliding hash on insert Add an
* .IsEqual() method to the Signature and PublicKey types in the
btcec package to facilitate easy equivalence testing
* Allocate the signature cache map with the max number of entries in
order to avoid unnecessary map re-sizes/allocations
* Optimize evictions from the signature cache Delete the first entry
* seen which is safe from manipulation due to
the pre image resistance of the hash function
* Double the default maximum number of entries within the signature
cache due to the reduction in the size of a cache entry
* With this eviction scheme, removals are effectively O(1)
Fixes#575.
The current code is needlessly checking the number of bytes needed to
serialize the unspentness bitmap in the utxo against a maximum value
that could never be returned because the function takes a uint32 output
index which is treated as a bit offset, and converts it bytes, which
will necessarily be less than a max uint32.
This check also causes a compile error on arm where native integers are
32 bits.
This simply removes the unneeded check.
This adds a test to ensure the priority queue works properly both for
sorting by fee per KB and priorities.
Thanks to @ceejep for the original test code and idea which was
subsequently modified and cleaned up a bit to the code here.
This introduces a new indexing infrastructure for supporting optional
indexes using the new database and blockchain infrastructure along with
two concrete indexer implementations which provide both a
transaction-by-hash and a transaction-by-address index.
The new infrastructure is mostly separated into a package named indexers
which is housed under the blockchain package. In order to support this,
a new interface named IndexManager has been introduced in the blockchain
package which provides methods to be notified when the chain has been
initialized and when blocks are connected and disconnected from the main
chain. A concrete implementation of an index manager is provided by the
new indexers package.
The new indexers package also provides a new interface named Indexer
which allows the index manager to manage concrete index implementations
which conform to the interface.
The following is high level overview of the main index infrastructure
changes:
- Define a new IndexManager interface in the blockchain package and
modify the package to make use of the interface when specified
- Create a new indexers package
- Provides an Index interface which allows concrete indexes to plugin
to an index manager
- Provides a concrete IndexManager implementation
- Handles the lifecycle of all indexes it manages
- Tracks the index tips
- Handles catching up disabled indexes that have been reenabled
- Handles reorgs while the index was disabled
- Invokes the appropriate methods for all managed indexes to allow
them to index and deindex the blocks and transactions
- Implement a transaction-by-hash index
- Makes use of internal block IDs to save a significant amount of
space and indexing costs over the old transaction index format
- Implement a transaction-by-address index
- Makes use of a leveling scheme in order to provide a good tradeoff
between space required and indexing costs
- Supports enabling and disabling indexes at will
- Support the ability to drop indexes if they are no longer desired
The following is an overview of the btcd changes:
- Add a new index logging subsystem
- Add new options --txindex and --addrindex in order to enable the
optional indexes
- NOTE: The transaction index will automatically be enabled when the
address index is enabled because it depends on it
- Add new options --droptxindex and --dropaddrindex to allow the indexes
to be removed
- NOTE: The address index will also be removed when the transaction
index is dropped because it depends on it
- Update getrawtransactions RPC to make use of the transaction index
- Reimplement the searchrawtransaction RPC that makes use of the address
index
- Update sample-btcd.conf to include sample usage for the new optional
index flags
This commit is the first stage of several that are planned to convert
the blockchain package into a concurrent safe package that will
ultimately allow support for multi-peer download and concurrent chain
processing. The goal is to update btcd proper after each step so it can
take advantage of the enhancements as they are developed.
In addition to the aforementioned benefit, this staged approach has been
chosen since it is absolutely critical to maintain consensus.
Separating the changes into several stages makes it easier for reviewers
to logically follow what is happening and therefore helps prevent
consensus bugs. Naturally there are significant automated tests to help
prevent consensus issues as well.
The main focus of this stage is to convert the blockchain package to use
the new database interface and implement the chain-related functionality
which it no longer handles. It also aims to improve efficiency in
various areas by making use of the new database and chain capabilities.
The following is an overview of the chain changes:
- Update to use the new database interface
- Add chain-related functionality that the old database used to handle
- Main chain structure and state
- Transaction spend tracking
- Implement a new pruned unspent transaction output (utxo) set
- Provides efficient direct access to the unspent transaction outputs
- Uses a domain specific compression algorithm that understands the
standard transaction scripts in order to significantly compress them
- Removes reliance on the transaction index and paves the way toward
eventually enabling block pruning
- Modify the New function to accept a Config struct instead of
inidividual parameters
- Replace the old TxStore type with a new UtxoViewpoint type that makes
use of the new pruned utxo set
- Convert code to treat the new UtxoViewpoint as a rolling view that is
used between connects and disconnects to improve efficiency
- Make best chain state always set when the chain instance is created
- Remove now unnecessary logic for dealing with unset best state
- Make all exported functions concurrent safe
- Currently using a single chain state lock as it provides a straight
forward and easy to review path forward however this can be improved
with more fine grained locking
- Optimize various cases where full blocks were being loaded when only
the header is needed to help reduce the I/O load
- Add the ability for callers to get a snapshot of the current best
chain stats in a concurrent safe fashion
- Does not block callers while new blocks are being processed
- Make error messages that reference transaction outputs consistently
use <transaction hash>:<output index>
- Introduce a new AssertError type an convert internal consistency
checks to use it
- Update tests and examples to reflect the changes
- Add a full suite of tests to ensure correct functionality of the new
code
The following is an overview of the btcd changes:
- Update to use the new database and chain interfaces
- Temporarily remove all code related to the transaction index
- Temporarily remove all code related to the address index
- Convert all code that uses transaction stores to use the new utxo
view
- Rework several calls that required the block manager for safe
concurrency to use the chain package directly now that it is
concurrent safe
- Change all calls to obtain the best hash to use the new best state
snapshot capability from the chain package
- Remove workaround for limits on fetching height ranges since the new
database interface no longer imposes them
- Correct the gettxout RPC handler to return the best chain hash as
opposed the hash the txout was found in
- Optimize various RPC handlers:
- Change several of the RPC handlers to use the new chain snapshot
capability to avoid needlessly loading data
- Update several handlers to use new functionality to avoid accessing
the block manager so they are able to return the data without
blocking when the server is busy processing blocks
- Update non-verbose getblock to avoid deserialization and
serialization overhead
- Update getblockheader to request the block height directly from
chain and only load the header
- Update getdifficulty to use the new cached data from chain
- Update getmininginfo to use the new cached data from chain
- Update non-verbose getrawtransaction to avoid deserialization and
serialization overhead
- Update gettxout to use the new utxo store versus loading
full transactions using the transaction index
The following is an overview of the utility changes:
- Update addblock to use the new database and chain interfaces
- Update findcheckpoint to use the new database and chain interfaces
- Remove the dropafter utility which is no longer supported
NOTE: The transaction index and address index will be reimplemented in
another commit.
mempoolPolicy contains the values that configure the mempool policy.
This decouples the values from the internals of btcd to move closer
to a mempool package.
Putting the test code in the same package makes it easier for forks
since they don't have to change the import paths as much and it also
gets rid of the need for internal_test.go to bridge.
This same thing should probably be done for the majority of the code
base.
This modifies the chaincfg package to register the default network
params via the init function instead of manually hard coding their data
into the maps. This is less error prone when adding new default
networks.
A new function named mustRegister has been introduced that panics if
there are any errors when registering the network that the new code
makes use of and appropriate tests have been added.
This optimizes the way in which the maps are limited by the block
manager.
Previously the code would read a cryptographically random value large
enough to construct a hash, find the first entry larger than that value,
and evict it.
That approach is quite inefficient and could easily become a bottleneck
when processing transactions due to the need to read from a source such
as /dev/urandom and all of the subsequent hash comparisons.
Luckily, strong cryptographic randomness is not needed here. The primary
intent of limiting the maps is to control memory usage with a secondary
concern of making it difficult for adversaries to force eviction of
specific entries.
Consequently, this changes the code to make use of the pseudorandom
iteration order of Go's maps along with the preimage resistance of the
hashing function to provide the desired functionality. It has
previously been discussed that the specific pseudorandom iteration order
is not guaranteed by the Go spec even though in practice that is how it
is implemented. This is not a concern however because even if the
specific compiler doesn't implement that, the preimage resistance of the
hashing function alone is enough.
Thanks to @Roasbeef for pointing out the efficiency concerns and the
fact that strong cryptographic randomness is not necessary.
This simply exports and adds some comments to the fields of the
BlockTemplate struct.
This is primarily being done as a step toward being able to separate the
mining code into its own package, but also it makes sense on its own
because code that requests new block template necessarily examines the
returned fields which implies they should be exported.
This prevents the node from repeatedly requesting and rejecting the
same transaction as different peers inv the same transaction.
Idea from Bitcoin Core commit 0847d9cb5fcd2fdd5a21bde699944d966cf5add9
Also, limit the number of both requested blocks and transactions.
The --blocksonly configuration option disables accepting transactions
from remote peers. It will still accept, relay, and rebroadcast
valid transactions sent via RPC or websockets.
This updates the Go versions using when running the TravisCI integration
tests to reflect the latest supported Go versions.
Also, the vet tool moved into the Go source tree as of Go 1.5. Its previous
location in the x/tools repo was deprecated at that time and has now
been removed.
Finally, old tool path is no longer needed, so it has been removed.
The vet tool moved into the Go source tree as of Go 1.5. Its previous
location in the x/tools repo was deprecated at that time and has now
been removed.
This commit updates the .travis.yml configuration to avoid fetching vet
from the old location and to simply use the version now available as
part of the standard Go install.
Also, while here, remove the check for changing the tool path since it
is no longer needed.
This modifies the peer package to add support for the sendheaders
protocol message introduced by BIP0030.
NOTE: This does not add support to btcd itself. That requires the server
and sync code to make use of the new functionality exposed by these
changes. As a result, btcd will still be using protocol version 70011.
This ensures the channel passed to QueueMessage is writable and that
QueueMessage will not read from the channel (write-only).
This change is merely a safety change. If a user of the API passes
a read-only channel to QueueMessage, it will now be caught at compile
time instead of panicking during runtime.
Also update internal functions.
When the RPC server is not running a buffered transaction notification
channel fills and eventually blocks. This commit ensures that the
channel continues to be drained irrespective of the RPC server status.