This does the minimum work necessary to refactor the mempool code into
its own package. The idea is that separating this code into its own
package will greatly improve its testability, allow independent
benchmarking and profiling, and open up some interesting opportunities
for future development related to the memory pool.
There are likely some areas related to policy that could be further
refactored, however it is better to do that in future commits in order
to keep the changeset as small as possible during this refactor.
Overview of the major changes:
- Create the new package
- Move several files into the new package:
  - mempool.go -> mempool/mempool.go
  - mempoolerror.go -> mempool/error.go
  - policy.go -> mempool/policy.go
  - policy_test.go -> mempool/policy_test.go
- Update mempool logging to use the new mempool package logger
- Rename mempoolPolicy to Policy (so it's now mempool.Policy)
- Rename mempoolConfig to Config (so it's now mempool.Config)
- Rename mempoolTxDesc to TxDesc (so it's now mempool.TxDesc)
- Rename txMemPool to TxPool (so it's now mempool.TxPool)
- Move defaultBlockPrioritySize to the new package and export it
- Export DefaultMinRelayTxFee from the mempool package
- Export the CalcPriority function from the mempool package
- Introduce a new RawMempoolVerbose function on the TxPool and update
  the RPC server to use it
- Update all references to the mempool to use the package
- Add a skeleton README.md
This reduces the mempool lock contention by removing an unnecessary
check when responding to a "mempool" request.
In particular, the code first gets a list of all transactions from the
mempool and then iterates them in order to construct the inventory
vectors and apply bloom filtering if it is enabled. Since it is
possible that the transaction was removed from the mempool by another
thread while that list is being iterated, the code was checking if each
transaction was still in the mempool. This is a pointless check because
the transaction might still be removed at any point after the check
anyways. For example, it might be removed after the mempool response
has been sent to the remote peer or even while the loop is still
iterating.
This refactors the notification state mutex out of the state itself to
the client. This is being done since the code makes a copy of the
notification state and accesses that copy immutably, and therefore there
is no need for it to have its own mutex.
This makes vet happy by manually copying the notification state fields
during the deep copy instead of copying the entire struct which contains
an embedded mutex.
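The two notification state changes above can be sketched as follows. This is a minimal illustration only: the type names, field names, and the snapshot helper are hypothetical stand-ins, not the actual client code. The state carries no mutex, the client owns the lock, and the deep copy assigns each field manually so no lock is ever copied.

```go
package main

import (
	"fmt"
	"sync"
)

// notificationState is a hypothetical stand-in for the client's
// notification state. It deliberately has no mutex of its own.
type notificationState struct {
	notifyBlocks   bool
	notifyReceived map[string]struct{}
}

// Copy returns a deep copy by copying each field manually instead of
// copying the struct as a whole.
func (s *notificationState) Copy() *notificationState {
	c := &notificationState{
		notifyBlocks:   s.notifyBlocks,
		notifyReceived: make(map[string]struct{}, len(s.notifyReceived)),
	}
	for addr := range s.notifyReceived {
		c.notifyReceived[addr] = struct{}{}
	}
	return c
}

// client owns the mutex that protects the live notification state.
type client struct {
	ntfnStateLock sync.Mutex
	ntfnState     *notificationState
}

// snapshot returns a copy of the state taken under the client's lock.
// The copy is only read afterwards, so it needs no lock.
func (c *client) snapshot() *notificationState {
	c.ntfnStateLock.Lock()
	defer c.ntfnStateLock.Unlock()
	return c.ntfnState.Copy()
}

func main() {
	c := &client{ntfnState: &notificationState{
		notifyBlocks:   true,
		notifyReceived: map[string]struct{}{"addr1": {}},
	}}
	snap := c.snapshot()

	// Mutating the live state does not affect the snapshot.
	c.ntfnStateLock.Lock()
	c.ntfnState.notifyReceived["addr2"] = struct{}{}
	c.ntfnStateLock.Unlock()

	fmt.Println(len(snap.notifyReceived), len(c.ntfnState.notifyReceived))
}
```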
This exposes a new function on the ScriptBuilder type named AddOps that
allows multiple opcodes to be added via a single call and adds tests to
exercise the new function.
Finally, it updates a couple of places in the signing code that were
abusing the interface by setting its private script directly to use the
new public function instead.
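As a rough illustration of the AddOps idea, the sketch below uses a simplified, hypothetical scriptBuilder rather than the real ScriptBuilder type (which also tracks errors and size limits). The opcode values are the standard Bitcoin script values for the named opcodes.

```go
package main

import "fmt"

// scriptBuilder is a simplified, hypothetical stand-in for the real
// builder type. It only accumulates raw script bytes.
type scriptBuilder struct {
	script []byte
}

// AddOp appends a single opcode to the script.
func (b *scriptBuilder) AddOp(opcode byte) *scriptBuilder {
	b.script = append(b.script, opcode)
	return b
}

// AddOps appends several opcodes with a single call, so callers do not
// need to reach into the private script field.
func (b *scriptBuilder) AddOps(opcodes []byte) *scriptBuilder {
	b.script = append(b.script, opcodes...)
	return b
}

// Script returns the script built so far.
func (b *scriptBuilder) Script() []byte {
	return b.script
}

func main() {
	const (
		opDup         = 0x76
		opHash160     = 0xa9
		opEqualVerify = 0x88
		opCheckSig    = 0xac
	)
	b := new(scriptBuilder)
	b.AddOps([]byte{opDup, opHash160}).AddOp(opEqualVerify).AddOp(opCheckSig)
	fmt.Printf("%x\n", b.Script()) // prints 76a988ac
}
```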
This rewrites the shutdown logic to simplify the shutdown signalling.
All cleanup is now run from deferred functions in the main function and
channels are used to signal shutdown either from OS signals or from
other subsystems such as the RPC server and windows service controller.
The RPC server has been modified to use a new channel for signalling
shutdown that is exposed via the RequestedProcessShutdown function
instead of directly calling Stop on the server as it previously did.
Finally, it adds a few checks for early termination during the main
start sequence so the process can be stopped without starting all the
subsystems if desired.
This is a backport of the equivalent logic from Decred with a few slight
modifications. Credits go to @jrick.
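The channel-based shutdown flow described above might look roughly like the following. The helper names interruptListener and interruptRequested, and the shutdownRequest channel, are assumptions made for this sketch and are not taken from the commit itself.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
)

// interruptListener returns a channel that is closed when either an OS
// signal arrives or another subsystem requests shutdown.
func interruptListener(osSignals <-chan os.Signal, shutdownRequest <-chan struct{}) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		select {
		case <-osSignals:
		case <-shutdownRequest:
		}
		close(done)
	}()
	return done
}

// interruptRequested reports whether shutdown has been signalled. It is
// the kind of check that allows early termination during startup.
func interruptRequested(interrupted <-chan struct{}) bool {
	select {
	case <-interrupted:
		return true
	default:
		return false
	}
}

func main() {
	osSignals := make(chan os.Signal, 1)
	signal.Notify(osSignals, os.Interrupt)

	// A subsystem such as an RPC server would close this channel instead
	// of stopping the server directly.
	shutdownRequest := make(chan struct{})
	interrupted := interruptListener(osSignals, shutdownRequest)

	// All cleanup runs from deferred functions in main.
	defer fmt.Println("cleanup complete")

	// Early termination check between startup steps.
	if interruptRequested(interrupted) {
		return
	}

	// Simulate a shutdown request coming from a subsystem.
	close(shutdownRequest)
	<-interrupted
	fmt.Println("shutdown requested")
}
```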
This moves several of the chain constants to the Params struct in the
chaincfg package which is intended for that purpose. This is mostly a
backport of the same modifications made in Decred along with a few
additional things cleaned up.
The following is an overview of the changes:
- Comment all fields in the Params struct definition
- Add locals to BlockChain instance for the calculated values based on
  the provided chain params
- Rename the following param fields:
  - SubsidyHalvingInterval -> SubsidyReductionInterval
  - ResetMinDifficulty -> ReduceMinDifficulty
- Add new Param fields:
  - CoinbaseMaturity
  - TargetTimePerBlock
  - TargetTimespan
  - BlocksPerRetarget
  - RetargetAdjustmentFactor
  - MinDiffReductionTime
This updates all code to make use of the new chainhash package since the
old wire.ShaHash type and related functions have been removed in favor
of the abstracted package.
Also, while here, rename all variables that included sha in their name
to include hash instead.
This is mostly a backport of some of the same modifications made in
Decred along with a few additional things cleaned up. In particular,
this updates the code to make use of the new chainhash package.
Also, since this required API changes anyways and the hash algorithm is
no longer tied specifically to SHA, all other functions throughout the
code base which had "Sha" in their name have been changed to Hash so
they are not incorrectly implying the hash algorithm.
The following is an overview of the changes:
- Remove the wire.ShaHash type
- Update all references to wire.ShaHash to the new chainhash.Hash type
- Rename the following functions and update all references:
  - wire.BlockHeader.BlockSha -> BlockHash
  - wire.MsgBlock.BlockSha -> BlockHash
  - wire.MsgBlock.TxShas -> TxHashes
  - wire.MsgTx.TxSha -> TxHash
  - blockchain.ShaHashToBig -> HashToBig
  - peer.ShaFunc -> peer.HashFunc
- Rename all variables that included sha in their name to include hash
  instead
- Update for function name changes in other dependent packages such as
  btcutil
- Update copyright dates on all modified files
- Update glide.lock file to use the required version of btcutil
This is a backport of the chainhash package made in Decred along with a
few additional things cleaned up, finished test coverage, and rewording
of some documentation to make it more generic.
In particular, the new package provides the definition of the hash type
and associated hashing functions which will allow the rest of the code to be
agnostic to the specific hash algorithm.
This only implements the package and does not change any of the code
base over to use it.
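To give a feel for what such a package provides, here is a minimal sketch of a hash type with an associated hashing function. It is an approximation written for illustration: the byte-reversed string form and the double SHA-256 helper follow common Bitcoin conventions, and the real package offers more than is shown here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// HashSize is the number of bytes in a hash.
const HashSize = 32

// Hash is a fixed-size hash value. Callers work with this type rather
// than with a type named after a specific algorithm.
type Hash [HashSize]byte

// String returns the hash as a hex string with the bytes reversed, the
// usual display convention for Bitcoin hashes. The receiver is a copy,
// so reversing it in place does not modify the caller's value.
func (h Hash) String() string {
	for i := 0; i < HashSize/2; i++ {
		h[i], h[HashSize-1-i] = h[HashSize-1-i], h[i]
	}
	return hex.EncodeToString(h[:])
}

// DoubleHashH hashes the input twice and returns the result as a Hash.
// Only this function knows which algorithm is used.
func DoubleHashH(b []byte) Hash {
	first := sha256.Sum256(b)
	return Hash(sha256.Sum256(first[:]))
}

func main() {
	h := DoubleHashH([]byte("hello"))
	fmt.Println(h)
}
```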
Putting the test code in the same package makes it easier for forks
since they don't have to change the import paths as much and it also
gets rid of the need for internal_test.go to bridge.
This same thing should probably be done for the majority of the code
base.
Rather than making the caller pass in the median time source on
ProcessBlock and IsCurrent, modify the Config struct to include the
median time source and associate it with the chain instance when it is
created.
This is being done because both the ProcessBlock and IsCurrent functions
require access to the blockchain state already, it is a little bit safer
to ensure the time source matches the chain instance state, it
simplifies the caller logic, and it also allows its use within the logic
of the blockchain package itself which will be required by upcoming
rule change warning logic that is part of BIP9.
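A simplified sketch of the resulting shape is shown below. The interface and struct definitions are pared-down stand-ins, fixedTimeSource is a hypothetical helper, and the 24 hour window in IsCurrent is an assumption of this example rather than the actual rule.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// MedianTimeSource is a pared-down stand-in for the time source interface.
type MedianTimeSource interface {
	AdjustedTime() time.Time
}

// fixedTimeSource is a hypothetical time source that always reports t.
type fixedTimeSource struct{ t time.Time }

func (f fixedTimeSource) AdjustedTime() time.Time { return f.t }

// Config carries the dependencies of a chain instance, now including
// the time source.
type Config struct {
	TimeSource MedianTimeSource
}

// BlockChain keeps the time source it was created with.
type BlockChain struct {
	timeSource MedianTimeSource
	tipTime    time.Time // timestamp of the best block (illustrative)
}

// New associates the time source with the chain instance at creation.
func New(cfg *Config) (*BlockChain, error) {
	if cfg.TimeSource == nil {
		return nil, errors.New("time source is nil")
	}
	return &BlockChain{timeSource: cfg.TimeSource}, nil
}

// IsCurrent no longer takes a time source argument. The 24 hour window
// is an assumption made for this sketch.
func (b *BlockChain) IsCurrent() bool {
	cutoff := b.timeSource.AdjustedTime().Add(-24 * time.Hour)
	return !b.tipTime.Before(cutoff)
}

func main() {
	now := time.Unix(1500000000, 0)
	chain, err := New(&Config{TimeSource: fixedTimeSource{now}})
	if err != nil {
		fmt.Println(err)
		return
	}
	chain.tipTime = now.Add(-time.Hour)
	fmt.Println(chain.IsCurrent()) // true
	chain.tipTime = now.Add(-48 * time.Hour)
	fmt.Println(chain.IsCurrent()) // false
}
```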
This commit drastically reduces the number of allocations needed to
deserialize a transaction and its scripts by using the combination of a
free list for initially deserializing the individual scripts along with
copying them into a single contiguous byte slice after the final size is
known and modifying each script in the transaction to point to its
location within the contiguous blob.
The end result is only a single allocation that holds all of the scripts
for a transaction regardless of the total number of scripts it has.
The script free list allows a maximum of 12,500 items with each buffer
being 512 bytes. This implies it will have a peak usage of 6.1MB. The
values were chosen based on profiling data and a desire to allow at
least 100 scripts per transaction to be simultaneously deserialized by
125 peers.
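The combination of a free list and a single contiguous allocation could be sketched as below. The constants come from the figures above (12,500 items of 512 bytes, or 100 scripts for each of 125 peers); the consolidate helper and the exact method shapes are assumptions of this example and may differ from the real code.

```go
package main

import "fmt"

const (
	freeListMaxScriptSize = 512   // bytes per pooled buffer
	freeListMaxItems      = 12500 // 100 scripts * 125 peers
)

// scriptFreeList is a concurrent safe list of reusable script buffers
// backed by a buffered channel.
type scriptFreeList chan []byte

// Borrow returns a buffer of the requested length, taken from the free
// list when possible. Oversized requests are allocated directly.
func (c scriptFreeList) Borrow(size uint64) []byte {
	if size > freeListMaxScriptSize {
		return make([]byte, size)
	}
	var buf []byte
	select {
	case buf = <-c:
	default:
		buf = make([]byte, freeListMaxScriptSize)
	}
	return buf[:size]
}

// Return puts a pooled buffer back, dropping it if the list is full or
// if the buffer did not come from the pool.
func (c scriptFreeList) Return(buf []byte) {
	if cap(buf) != freeListMaxScriptSize {
		return
	}
	select {
	case c <- buf:
	default:
	}
}

// consolidate copies all scripts into one contiguous allocation, points
// each script at its region of that blob, and returns the temporary
// buffers to the free list.
func consolidate(scripts [][]byte, fl scriptFreeList) {
	var total int
	for _, s := range scripts {
		total += len(s)
	}
	blob := make([]byte, total) // the single allocation
	offset := 0
	for i, s := range scripts {
		n := copy(blob[offset:], s)
		scripts[i] = blob[offset : offset+n : offset+n]
		offset += n
		fl.Return(s)
	}
}

func main() {
	fl := make(scriptFreeList, freeListMaxItems)
	scripts := [][]byte{fl.Borrow(3), fl.Borrow(2)}
	copy(scripts[0], []byte{0x51, 0x52, 0x53})
	copy(scripts[1], []byte{0x54, 0x55})
	consolidate(scripts, fl)
	fmt.Printf("%x %x, buffers in free list: %d\n", scripts[0], scripts[1], len(fl))

	// 12,500 * 512 bytes = 6,400,000 bytes, or about 6.1 MiB.
	fmt.Printf("peak free list size: %.1f MiB\n",
		float64(freeListMaxItems*freeListMaxScriptSize)/(1<<20))
}
```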
Also, while optimizing, decode directly into the existing previous
outpoint structure of each transaction input in order to avoid the extra
allocation per input that is otherwise caused when the local escapes to
the heap.
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark              old allocs   new allocs   delta
-----------------------------------------------------------
ReadTxOut              1            0            -100.00%
ReadTxIn               2            0            -100.00%
DeserializeTxSmall     7            5            -28.57%
DeserializeTxLarge     11146        6            -99.95%
The current code involves a ton of small allocations which is harsh on
the garbage collector and in turn causes a lot of additional runtime
overhead both in terms of additional memory and processing time.
In order to improve the situation, this drastically reduces the number
of allocations by creating contiguous slices of objects and
deserializing into them. Since the final data structures consist of
slices of pointers to the objects, they are constructed by pointing them
into the appropriate offset of the contiguous slice.
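The difference between allocating each object separately and pointing into one contiguous slice can be sketched as follows. The invVect type and both decode functions are simplified stand-ins for illustration, and the package-level sink exists only so the results escape to the heap and the allocations can be measured.

```go
package main

import (
	"fmt"
	"testing"
)

// invVect is a simplified stand-in for a decoded wire object.
type invVect struct {
	Type uint32
	Hash [32]byte
}

// sink forces results to escape to the heap so allocations are counted.
var sink []*invVect

// decodePerObject allocates every object individually.
func decodePerObject(n int) []*invVect {
	list := make([]*invVect, 0, n)
	for i := 0; i < n; i++ {
		iv := new(invVect)
		iv.Type = uint32(i)
		list = append(list, iv)
	}
	return list
}

// decodeContiguous allocates one backing slice of objects and builds
// the slice of pointers by pointing at offsets within it.
func decodeContiguous(n int) []*invVect {
	backing := make([]invVect, n)
	list := make([]*invVect, 0, n)
	for i := range backing {
		iv := &backing[i]
		iv.Type = uint32(i)
		list = append(list, iv)
	}
	return list
}

func main() {
	const n = 1000
	perObject := testing.AllocsPerRun(10, func() { sink = decodePerObject(n) })
	contiguous := testing.AllocsPerRun(10, func() { sink = decodeContiguous(n) })
	fmt.Printf("per object: %.0f allocs, contiguous: %.0f allocs\n",
		perObject, contiguous)
}
```

The callers still receive a slice of pointers in both cases, which is why this approach does not require any API change.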
This could be improved upon even further by converting all of the data
structures provided by the wire package to be slices of contiguous
objects directly, however that would be a major breaking API change and
would end up requiring updating a lot more code in every caller. I do
think that ultimately the API should be changed, but the changes in this
commit already make a massive difference and do not require touching
any of the callers, so it is a good place to begin.
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark              old allocs   new allocs   delta
-----------------------------------------------------------
DeserializeTxLarge     16715        11146        -33.32%
DecodeGetHeaders       501          2            -99.60%
DecodeHeaders          2001         2            -99.90%
DecodeGetBlocks        501          2            -99.60%
DecodeAddr             3001         2002         -33.29%
DecodeInv              50003        3            -99.99%
DecodeNotFound         50002        3            -99.99%
DecodeMerkleBlock      107          3            -97.20%
Since the protocol encodes timestamps differently depending on the
message, the code currently decodes into a local variable and then
converts it to a time.Time. However, this causes an allocation due to
the local having to escape to the heap in order for the readElement
function to write to it.
So, in order to avoid that, this introduces two new types for a
timestamp named uint32Time and int64Time that are encoded as the
respective type on the read. When calling the readElements function,
the time.Time field in the message is cast to a pointer of the
appropriate type which effectively allows the allocations to be avoided.
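The pointer cast described above can be illustrated with the following sketch. The blockHeader struct and the decodeTimestamp helper are hypothetical, and this simplified readElement does not attempt to be allocation free (it uses a plain local buffer); it only shows how casting a time.Time field lets the reader write the decoded value straight into the message.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"time"
)

// uint32Time and int64Time mark a time.Time that is encoded on the wire
// as a uint32 or int64 count of seconds since the Unix epoch.
type uint32Time time.Time
type int64Time time.Time

// readElement decodes directly into the pointed-to timestamp.
func readElement(r io.Reader, element interface{}) error {
	var buf [8]byte
	switch e := element.(type) {
	case *uint32Time:
		if _, err := io.ReadFull(r, buf[:4]); err != nil {
			return err
		}
		secs := binary.LittleEndian.Uint32(buf[:4])
		*e = uint32Time(time.Unix(int64(secs), 0))
		return nil
	case *int64Time:
		if _, err := io.ReadFull(r, buf[:8]); err != nil {
			return err
		}
		secs := binary.LittleEndian.Uint64(buf[:8])
		*e = int64Time(time.Unix(int64(secs), 0))
		return nil
	}
	return fmt.Errorf("unsupported element type %T", element)
}

// blockHeader is a hypothetical message with a time.Time field.
type blockHeader struct {
	Timestamp time.Time
}

// decodeTimestamp casts the address of the time.Time field to the
// marker type, so no intermediate local variable is needed.
func decodeTimestamp(raw []byte) (time.Time, error) {
	var h blockHeader
	err := readElement(bytes.NewReader(raw), (*uint32Time)(&h.Timestamp))
	return h.Timestamp, err
}

func main() {
	// 0x495fab29 in little endian byte order.
	ts, err := decodeTimestamp([]byte{0x29, 0xab, 0x5f, 0x49})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ts.Unix()) // prints 1231006505
}
```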
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark              old allocs   new allocs   delta
----------------------------------------------------------------------
ReadBlockHeader        1            0            -100.00%
DecodeHeaders          4001         2001         -49.99%
DecodeAddr             4001         3001         -24.99%
DecodeMerkleBlock      108          107          -0.93%
This introduces a new binary free list which provides a concurrent safe
list of unused buffers for the purpose of serializing and deserializing
primitive integers to their raw binary bytes.
For convenience, the type also provides functions for each of the
primitive unsigned integers that automatically obtain a buffer from the
free list, perform the necessary binary conversion, read from or write
to the given io.Reader or io.Writer, and return the buffer to the free
list.
A global instance of the type has been introduced with a maximum number
of 1024 items. Since each buffer is 8 bytes, it will consume a maximum
of 8KB. Theoretically, this value would only allow up to 1024 peers
simultaneously reading and writing without having to resort to burdening
the garbage collector with additional allocations. However, due to the
fact the code is designed in such a way that the buffers are quickly
used and returned to the free list, in practice it can support much more
than 1024 peers without involving the garbage collector since it is
highly unlikely every peer would need a buffer at the exact same time.
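A binary free list along these lines might look like the sketch below. Only the uint32 helpers are shown, the roundTrip helper is hypothetical, and the method names are assumptions of this example. The limits of 1024 items of 8 bytes each come from the description above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// binaryFreeListMaxItems bounds the list at 1024 * 8 bytes = 8KB.
const binaryFreeListMaxItems = 1024

// binaryFreeList is a concurrent safe list of 8 byte buffers backed by
// a buffered channel.
type binaryFreeList chan []byte

// Borrow returns an 8 byte buffer, reusing a pooled one when available.
func (l binaryFreeList) Borrow() []byte {
	var buf []byte
	select {
	case buf = <-l:
	default:
		buf = make([]byte, 8)
	}
	return buf[:8]
}

// Return puts the buffer back, dropping it if the list is full.
func (l binaryFreeList) Return(buf []byte) {
	select {
	case l <- buf:
	default:
	}
}

// Uint32 reads four bytes from r using a borrowed buffer.
func (l binaryFreeList) Uint32(r io.Reader, order binary.ByteOrder) (uint32, error) {
	buf := l.Borrow()[:4]
	if _, err := io.ReadFull(r, buf); err != nil {
		l.Return(buf)
		return 0, err
	}
	rv := order.Uint32(buf)
	l.Return(buf)
	return rv, nil
}

// PutUint32 writes val to w using a borrowed buffer.
func (l binaryFreeList) PutUint32(w io.Writer, order binary.ByteOrder, val uint32) error {
	buf := l.Borrow()[:4]
	order.PutUint32(buf, val)
	_, err := w.Write(buf)
	l.Return(buf)
	return err
}

// roundTrip is a hypothetical helper that writes and reads back a value.
func roundTrip(l binaryFreeList, val uint32) (uint32, error) {
	var b bytes.Buffer
	if err := l.PutUint32(&b, binary.LittleEndian, val); err != nil {
		return 0, err
	}
	return l.Uint32(&b, binary.LittleEndian)
}

func main() {
	binarySerializer := make(binaryFreeList, binaryFreeListMaxItems)
	got, err := roundTrip(binarySerializer, 0xdeadbeef)
	fmt.Printf("%#x %v, pooled buffers: %d\n", got, err, len(binarySerializer))
}
```

Because a buffer is held only for the duration of one read or write, a single pooled buffer is reused across both calls in the round trip.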
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark              old allocs   new allocs   delta
-------------------------------------------------------------
WriteVarInt1           1            0            -100.00%
WriteVarInt3           1            0            -100.00%
WriteVarInt5           1            0            -100.00%
WriteVarInt9           1            0            -100.00%
ReadVarInt1            1            0            -100.00%
ReadVarInt3            1            0            -100.00%
ReadVarInt5            1            0            -100.00%
ReadVarInt9            1            0            -100.00%
ReadVarStr4            3            2            -33.33%
ReadVarStr10           3            2            -33.33%
WriteVarStr4           2            1            -50.00%
WriteVarStr10          2            1            -50.00%
ReadOutPoint           1            0            -100.00%
WriteOutPoint          1            0            -100.00%
ReadTxOut              3            1            -66.67%
WriteTxOut             2            0            -100.00%
ReadTxIn               5            2            -60.00%
WriteTxIn              3            0            -100.00%
DeserializeTxSmall     15           7            -53.33%
DeserializeTxLarge     33428        16715        -50.00%
SerializeTx            8            0            -100.00%
ReadBlockHeader        7            1            -85.71%
WriteBlockHeader       10           4            -60.00%
DecodeGetHeaders       1004         501          -50.10%
DecodeHeaders          18002        4001         -77.77%
DecodeGetBlocks        1004         501          -50.10%
DecodeAddr             9002         4001         -55.55%
DecodeInv              150005       50003        -66.67%
DecodeNotFound         150004       50002        -66.67%
DecodeMerkleBlock      222          108          -51.35%
TxSha                  10           2            -80.00%
This adds two new flags, --txindex and --addrindex, to the addblock
utility which mirror the flags on btcd. They specify that the
transaction index and/or address index, respectively, should be built
while importing from the bootstrap file.
This is technically not 100% required since btcd will build the indexes
on the first load (when enabled) if they aren't already built, however
it is much faster to build the indexes as the blocks are being validated
(particularly for the address index), so this makes the capability
available.
This converts the project to allow btcd to be used with the glide
package manager in order to provide stable and reproducible builds
without the user having to jump through all of the hoops as they do
today.
It consists of adding a glide.yaml file which identifies the project
dependencies and locations along with a glide.lock file which contains
the complete dependency tree pinned to specific versions. Glide uses
these files to download the packages (or updates) to a local vendor
directory and checkout the correct pinned versions. The go tool, in
turn, is used to build/install btcd and will use the pinned versions in
the vendor directory.
This also updates TravisCI to build using glide, removes some of the
exceptions in the lint checks which are no longer required, and updates
the README.md with the new instructions needed to build the project with
glide.
This reduces the target ratio of freshly allocated data to live data to
10% in order to limit excessive overallocations by the garbage collector
during data bursts such as processing complex blocks or rapidly
receiving a lot of large transactions.
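In Go this kind of tuning is done through the standard runtime/debug package. The sketch below assumes the change amounts to a call to debug.SetGCPercent with a value of 10; the setGCTarget wrapper is a hypothetical helper for this example.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setGCTarget sets the garbage collection target percentage and returns
// the previous setting. A value of 10 triggers a collection once freshly
// allocated data reaches 10% of the live data left after the previous
// collection.
func setGCTarget(percent int) int {
	return debug.SetGCPercent(percent)
}

func main() {
	previous := setGCTarget(10)
	fmt.Printf("GC percent changed from %d to 10\n", previous)
}
```

The previous value is typically 100 unless the GOGC environment variable overrides it.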
When an OS reboots or shuts down, it sends all processes SIGTERM before
sending SIGKILL. This allows btcd to do a proper shutdown which most
importantly closes the database.
This adds support for serving headers instead of inventory messages in
accordance with BIP0130. btcd itself does not yet make use of the
feature when receiving data.
This adds decode benchmarks for several of the messages that profiling
has identified to cause a lot of allocations in addition to those that
already exist. By adding these benchmarks, it makes it easier to get
allocation and speed statistics which can in turn be used to compare
future improvements.
The following benchmarks have been added:
DecodeGetHeaders, DecodeHeaders, DecodeGetBlocks, DecodeAddr, DecodeInv,
DecodeNotFound, and DecodeMerkleBlock
For reference, here is the benchmark data as of this commit.
DecodeGetHeaders       93261 ns/op      24120 B/op      1004 allocs/op
DecodeHeaders          2071263 ns/op    368399 B/op     18002 allocs/op
DecodeGetBlocks        92486 ns/op      24120 B/op      1004 allocs/op
DecodeAddr             850608 ns/op     136202 B/op     9002 allocs/op
DecodeInv              17107172 ns/op   3601447 B/op    150004 allocs/op
DecodeNotFound         17522225 ns/op   3601444 B/op    150004 allocs/op
DecodeMerkleBlock      21062 ns/op      5192 B/op       222 allocs/op
This modifies the benchmarks in the wire package to avoid creating a new
reader for each iteration. This is useful since it means that showing
the memory allocations will only show the function under test instead of
the allocation for the benchmark setup as well.
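The effect of reusing the reader can be sketched with the standard testing package. The readUint32 function is a hypothetical stand-in for a function under test, and resetting the reader with Seek is an assumption about how the reuse might be done. The noinline directive only keeps the compiler from optimizing the call away in this small example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"testing"
)

// readUint32 is a hypothetical stand-in for the function under test.
//
//go:noinline
func readUint32(r io.Reader) (uint32, error) {
	var buf [4]byte
	if _, err := io.ReadFull(r, buf[:]); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint32(buf[:]), nil
}

var data = []byte{0x01, 0x00, 0x00, 0x00}

// allocsFreshReader counts allocations when every iteration creates a
// new reader, so the setup is counted along with the function.
func allocsFreshReader() float64 {
	return testing.AllocsPerRun(100, func() {
		r := bytes.NewReader(data)
		readUint32(r)
	})
}

// allocsReusedReader creates the reader once and rewinds it on each
// iteration, so only the function under test is counted.
func allocsReusedReader() float64 {
	r := bytes.NewReader(data)
	return testing.AllocsPerRun(100, func() {
		r.Seek(0, io.SeekStart)
		readUint32(r)
	})
}

// benchmarkReadUint32 shows the same idea in benchmark form.
func benchmarkReadUint32(b *testing.B) {
	r := bytes.NewReader(data)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		r.Seek(0, io.SeekStart)
		readUint32(r)
	}
}

func main() {
	fmt.Printf("fresh reader: %.0f allocs/op\n", allocsFreshReader())
	fmt.Printf("reused reader: %.0f allocs/op\n", allocsReusedReader())
	res := testing.Benchmark(benchmarkReadUint32)
	fmt.Printf("benchmark: %d allocs/op\n", res.AllocsPerOp())
}
```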
The following is a before and after comparison of the allocations
with the benchmarks that did not change removed:
benchmark              old allocs   new allocs   delta
------------------------------------------------------------
ReadVarInt1            2            1            -50.00%
ReadVarInt3            2            1            -50.00%
ReadVarInt5            2            1            -50.00%
ReadVarInt9            2            1            -50.00%
ReadVarStr4            4            3            -25.00%
ReadVarStr10           4            3            -25.00%
ReadOutPoint           2            1            -50.00%
ReadTxOut              4            3            -25.00%
ReadTxIn               6            5            -16.67%
DeserializeTxSmall     16           15           -6.25%
DeserializeTxLarge     33430        33428        -0.01%
ReadBlockHeader        8            7            -12.50%
This adds a benchmark for deserializing a large transaction that is
often referred to as the megatransaction since it is the largest Bitcoin
transaction mined to date. It consists of 5569 inputs and 1 output and
its hash is:
bb41a757f405890fb0f5856228e23b715702d714d59bf2b1feb70d8b2b4e3e08.
This is being done so there is a benchmark that tests more of a
worst-case scenario which is a better candidate for identifying and
testing improvements.
The following benchmark results show how much more intensive this
transaction is than the existing mock transaction:
DeserializeTxSmall     1000000   1751 ns/op      376 B/op       16 allocs/op
DeserializeTxLarge     300       5093980 ns/op   1672829 B/op   33430 allocs/op
This removes the root field and all references to it from the BlockChain
since it is no longer required.
It was previously required because the chain state was not initialized
when the instance was created. However, that is no longer the case, so
there is no reason to keep it around any longer.
This changes the script template parsing function to use a pointer into
the constant global opcode array for parsed opcodes as opposed to making
a copy of the opcode entries which causes unnecessary allocations.
Profiling showed that after roughly 48 hours of operation, this
copy was the culprit of 207 million unnecessary allocations.
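The difference between copying an opcode entry and pointing into the global array can be sketched as follows. The opcode and parsedOpcode types are simplified stand-ins, and both parse functions treat every byte as a single-byte opcode for brevity, which ignores data pushes.

```go
package main

import "fmt"

// opcode is a simplified stand-in for an opcode table entry.
type opcode struct {
	value  byte
	name   string
	length int
}

// opcodeArray is the global table of opcode details, populated once.
var opcodeArray [256]opcode

func init() {
	for i := range opcodeArray {
		opcodeArray[i] = opcode{
			value:  byte(i),
			name:   fmt.Sprintf("OP_UNKNOWN%d", i),
			length: 1,
		}
	}
	opcodeArray[0x76].name = "OP_DUP"
	opcodeArray[0xac].name = "OP_CHECKSIG"
}

// parsedOpcode pairs an opcode with any data that follows it.
type parsedOpcode struct {
	opcode *opcode
	data   []byte
}

// parseCopy copies the table entry for every parsed opcode. Taking the
// address of the copy forces it onto the heap.
func parseCopy(script []byte) []parsedOpcode {
	pops := make([]parsedOpcode, 0, len(script))
	for _, b := range script {
		op := opcodeArray[b]
		pops = append(pops, parsedOpcode{opcode: &op})
	}
	return pops
}

// parseShared points at the existing table entry instead, so parsing an
// opcode needs no per-opcode allocation.
func parseShared(script []byte) []parsedOpcode {
	pops := make([]parsedOpcode, 0, len(script))
	for _, b := range script {
		pops = append(pops, parsedOpcode{opcode: &opcodeArray[b]})
	}
	return pops
}

func main() {
	script := []byte{0x76, 0xac}
	shared := parseShared(script)
	copied := parseCopy(script)
	fmt.Println(shared[0].opcode.name, shared[1].opcode.name)
	fmt.Println(shared[0].opcode == &opcodeArray[0x76],
		copied[0].opcode == &opcodeArray[0x76]) // prints true false
}
```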
This removes the logging functions that are now implemented in the peer
package as they are no longer used by btcd itself and should have been
removed when they were copied into the peer package.
It is not the responsibility of mempool to relay transactions, so
return a slice of transactions accepted to the mempool due to the
passed transaction to the caller.
This improves the tests of the priority queue to include the secondary
sort ordering and adds some manual entries to ensure the edge
conditions are properly tested.
This also brings the priority queue test coverage up to 100%.