// Copyright (c) 2015-2016 The btcsuite developers
// Use of this source code is governed by an ISC
// license that can be found in the LICENSE file.
package treap
import (
"bytes"
"math/rand"
"sync"
)
// nodePool houses treap nodes that can be reused across treap versions in
// order to reduce allocation churn.
var nodePool = &sync.Pool{New: func() interface{} { return newTreapNode(nil, nil, 0) }}
// cloneTreapNode returns a shallow copy of the passed node, reusing a node
// from the pool when one is available.
func cloneTreapNode(node *treapNode) *treapNode {
clone := nodePool.Get().(*treapNode)
clone.key = node.key
clone.value = node.value
clone.priority = node.priority
clone.left = node.left
clone.right = node.right
return clone
}
// Immutable represents a treap data structure which is used to hold ordered
// key/value pairs using a combination of binary search tree and heap semantics.
// It is a self-organizing and randomized data structure that doesn't require
// complex operations to maintain balance. Search, insert, and delete
// operations are all O(log n). In addition, it provides O(1) snapshots for
// multi-version concurrency control (MVCC).
//
// All operations which result in modifying the treap return a new version of
// the treap with only the modified nodes updated. All unmodified nodes are
// shared with the previous version. This is extremely useful in concurrent
// applications since the caller only has to atomically replace the treap
// pointer with the newly returned version after performing any mutations. All
// readers can simply use their existing pointer as a snapshot since the treap
// it points to is immutable. This effectively provides O(1) snapshot
// capability with efficient memory usage characteristics since the old nodes
// only remain allocated until there are no longer any references to them.
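//
// A minimal usage sketch follows (illustrative only, not part of this
// package; it assumes Go 1.19+ for atomic.Pointer from sync/atomic):
//
//	var current atomic.Pointer[Immutable]
//	current.Store(NewImmutable())
//
//	// Writer: derive a new version and publish it atomically.
//	current.Store(current.Load().Put([]byte("key"), []byte("value")))
//
//	// Reader: a loaded pointer remains a consistent snapshot even if
//	// the writer publishes newer versions afterwards.
//	snapshot := current.Load()
//	_ = snapshot.Get([]byte("key"))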
type Immutable struct {
root *treapNode
count int
// totalSize is the best estimate of the total size of all data in
// the treap including the keys, values, and node sizes.
totalSize uint64
}
// newImmutable returns a new immutable treap given the passed parameters.
func newImmutable(root *treapNode, count int, totalSize uint64) *Immutable {
return &Immutable{root: root, count: count, totalSize: totalSize}
}
// Len returns the number of items stored in the treap.
func (t *Immutable) Len() int {
return t.count
}
// Size returns a best estimate of the total number of bytes the treap is
// consuming including all of the fields used to represent the nodes as well as
// the size of the keys and values. Shared values are not detected, so the
// returned size assumes each value is pointing to different memory.
func (t *Immutable) Size() uint64 {
return t.totalSize
}
// get returns the treap node that contains the passed key. It will return nil
// when the key does not exist.
func (t *Immutable) get(key []byte) *treapNode {
for node := t.root; node != nil; {
// Traverse left or right depending on the result of the
// comparison.
compareResult := bytes.Compare(key, node.key)
if compareResult < 0 {
node = node.left
continue
}
if compareResult > 0 {
node = node.right
continue
}
// The key exists.
return node
}
// A nil node was reached which means the key does not exist.
return nil
}
// Has returns whether or not the passed key exists.
func (t *Immutable) Has(key []byte) bool {
if node := t.get(key); node != nil {
return true
}
return false
}
// Get returns the value for the passed key. The function will return nil when
// the key does not exist.
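//
// Note that Put stores a nil value as an empty, non-nil byte slice, so a nil
// result from Get always means the key does not exist. For example (an
// illustrative sketch):
//
//	if v := t.Get([]byte("some key")); v != nil {
//		// The key exists; v may be empty but is never nil.
//	} else {
//		// The key does not exist.
//	}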
func (t *Immutable) Get(key []byte) []byte {
if node := t.get(key); node != nil {
return node.value
}
return nil
}
// Put inserts the passed key/value pair and returns the resulting treap. The
// original treap is left unmodified.
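//
// For example (an illustrative sketch):
//
//	t1 := NewImmutable()
//	t2 := t1.Put([]byte("k"), []byte("v"))
//	// t1 is unchanged and still has Len() == 0, while t2 contains the
//	// new pair and has Len() == 1.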
func (t *Immutable) Put(key, value []byte) *Immutable {
// Use an empty byte slice for the value when none was provided. This
// ultimately allows key existence to be determined from the value since
// an empty byte slice is distinguishable from nil.
if value == nil {
value = emptySlice
}
// The new node becomes the root of the tree when there isn't already one.
if t.root == nil {
root := newTreapNode(key, value, rand.Int())
return newImmutable(root, 1, nodeSize(root))
}
// Find the binary tree insertion point and construct a list of replaced
// parents while doing so. This is done because this is an immutable
// data structure so regardless of where in the treap the new key/value
// pair ends up, all ancestors up to and including the root need to be
// replaced.
//
// When the key matches an entry already in the treap, replace the node
// with a new one that has the new value set and return.
var parents parentStack
var compareResult int
for node := t.root; node != nil; {
// Clone the node and link its parent to it if needed.
nodeCopy := cloneTreapNode(node)
if oldParent := parents.At(0); oldParent != nil {
if oldParent.left == node {
oldParent.left = nodeCopy
} else {
oldParent.right = nodeCopy
}
}
parents.Push(nodeCopy)
// Traverse left or right depending on the result of comparing
// the keys.
compareResult = bytes.Compare(key, node.key)
if compareResult < 0 {
node = node.left
continue
}
if compareResult > 0 {
node = node.right
continue
}
// The key already exists, so update its value.
nodeCopy.value = value
// Return new immutable treap with the replaced node and
// ancestors up to and including the root of the tree.
newRoot := parents.At(parents.Len() - 1)
newTotalSize := t.totalSize - uint64(len(node.value)) +
uint64(len(value))
return newImmutable(newRoot, t.count, newTotalSize)
}
// Link the new node into the binary tree in the correct position.
node := nodePool.Get().(*treapNode)
node.key = key
node.value = value
node.priority = rand.Int()
parent := parents.At(0)
if compareResult < 0 {
parent.left = node
} else {
parent.right = node
}
// Perform any rotations needed to maintain the min-heap and replace
// the ancestors up to and including the tree root.
newRoot := parents.At(parents.Len() - 1)
for parents.Len() > 0 {
// There is nothing left to do when the node's priority is
// greater than or equal to its parent's priority.
parent = parents.Pop()
if node.priority >= parent.priority {
break
}
// Perform a right rotation if the node is on the left side or
// a left rotation if the node is on the right side.
if parent.left == node {
node.right, parent.left = parent, node.right
} else {
node.left, parent.right = parent, node.left
}
// Either set the new root of the tree when there is no
// grandparent or relink the grandparent to the node based on
// which side the old parent was on.
grandparent := parents.At(0)
if grandparent == nil {
newRoot = node
} else if grandparent.left == parent {
grandparent.left = node
} else {
grandparent.right = node
}
}
return newImmutable(newRoot, t.count+1, t.totalSize+nodeSize(node))
}
// Delete removes the passed key from the treap and returns the resulting
// treap. The original immutable treap is returned unchanged when the key
// does not exist.
func (t *Immutable) Delete(key []byte) *Immutable {
// Find the node for the key and construct a list of its parents while
// doing so.
var parents parentStack
var delNode *treapNode
for node := t.root; node != nil; {
parents.Push(node)
// Traverse left or right depending on the result of the
// comparison.
compareResult := bytes.Compare(key, node.key)
if compareResult < 0 {
node = node.left
continue
}
if compareResult > 0 {
node = node.right
continue
}
// The key exists.
delNode = node
break
}
// There is nothing to do if the key does not exist.
if delNode == nil {
return t
}
// When the only node in the tree is the root node and it is the one
// being deleted, there is nothing else to do besides removing it.
parent := parents.At(1)
if parent == nil && delNode.left == nil && delNode.right == nil {
return newImmutable(nil, 0, 0)
}
// Construct replaced copies of the parents and of the node to delete.
// This is done because this is an immutable data structure and
// therefore all ancestors of the node that will be deleted, up to and
// including the root, need to be replaced.
var newParents parentStack
for i := parents.Len(); i > 0; i-- {
node := parents.At(i - 1)
nodeCopy := cloneTreapNode(node)
if oldParent := newParents.At(0); oldParent != nil {
if oldParent.left == node {
oldParent.left = nodeCopy
} else {
oldParent.right = nodeCopy
}
}
newParents.Push(nodeCopy)
}
delNode = newParents.Pop()
parent = newParents.At(0)
// Perform rotations to move the node to delete to a leaf position while
// maintaining the min-heap, replacing the modified children along the way.
var child *treapNode
newRoot := newParents.At(newParents.Len() - 1)
for delNode.left != nil || delNode.right != nil {
// Choose the child with the higher priority.
var isLeft bool
if delNode.left == nil {
child = delNode.right
} else if delNode.right == nil {
child = delNode.left
isLeft = true
} else if delNode.left.priority >= delNode.right.priority {
child = delNode.left
isLeft = true
} else {
child = delNode.right
}
// Rotate left or right depending on which side the child node
// is on. This has the effect of moving the node to delete
// towards the bottom of the tree while maintaining the
// min-heap.
child = cloneTreapNode(child)
if isLeft {
child.right, delNode.left = delNode, child.right
} else {
child.left, delNode.right = delNode, child.left
}
// Either set the new root of the tree when there is no
// grandparent or relink the grandparent to the node based on
// which side the old parent was on.
//
// Since the node to be deleted was just moved down a level, the
// new grandparent is now the current parent and the new parent
// is the current child.
if parent == nil {
newRoot = child
} else if parent.left == delNode {
parent.left = child
} else {
parent.right = child
}
// The parent for the node to delete is now what was previously
// its child.
parent = child
}
// Delete the node, which is now a leaf node, by disconnecting it from
// its parent.
if parent.right == delNode {
parent.right = nil
} else {
parent.left = nil
}
return newImmutable(newRoot, t.count-1, t.totalSize-nodeSize(delNode))
}
// ForEach invokes the passed function with every key/value pair in the treap
// in ascending order. Iteration stops early when the passed function returns
// false.
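//
// For example (an illustrative sketch):
//
//	t.ForEach(func(k, v []byte) bool {
//		// Process the pair here; returning false instead would stop
//		// the iteration early.
//		return true
//	})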
func (t *Immutable) ForEach(fn func(k, v []byte) bool) {
// Add the root node and all children to the left of it to the list of
// nodes to traverse and loop until they, and all of their child nodes,
// have been traversed.
var parents parentStack
for node := t.root; node != nil; node = node.left {
parents.Push(node)
}
for parents.Len() > 0 {
node := parents.Pop()
if !fn(node.key, node.value) {
return
}
// Extend the nodes to traverse by all children to the left of
// the current node's right child.
for node := node.right; node != nil; node = node.left {
parents.Push(node)
}
}
}
// NewImmutable returns a new empty immutable treap ready for use. See the
// documentation for the Immutable structure for more details.
func NewImmutable() *Immutable {
return &Immutable{}
}
// Recycle returns all nodes in the treap to the shared node pool so they can
// be reused by future treaps. Neither the treap nor any other treap version
// that shares nodes with it may be used after calling Recycle.
func (t *Immutable) Recycle() {
// Traverse the treap iteratively, in the same manner as ForEach, returning
// each visited node to the pool.
var parents parentStack
for node := t.root; node != nil; node = node.left {
parents.Push(node)
}
for parents.Len() > 0 {
node := parents.Pop()
// Extend the nodes to traverse by all children to the left of
// the current node's right child.
for n := node.right; n != nil; n = n.left {
parents.Push(n)
}
node.Reset()
nodePool.Put(node)
}
}