lbcd/database2/doc.go

95 lines
4 KiB
Go
Raw Normal View History

database: Major redesign of database package. This commit contains a complete redesign and rewrite of the database package that approaches things in a vastly different manner than the previous version. This is the first part of several stages that will be needed to ultimately make use of this new package. Some of the reason for this were discussed in #255, however a quick summary is as follows: - The previous database could only contain blocks on the main chain and reorgs required deleting the blocks from the database. This made it impossible to store orphans and could make external RPC calls for information about blocks during the middle of a reorg fail. - The previous database interface forced a high level of bitcoin-specific intelligence such as spend tracking into each backend driver. - The aforementioned point led to making it difficult to implement new backend drivers due to the need to repeat a lot of non-trivial logic which is better handled at a higher layer, such as the blockchain package. - The old database stored all blocks in leveldb. This made it extremely inefficient to do things such as lookup headers and individual transactions since the entire block had to be loaded from leveldb (which entails it doing data copies) to get access. In order to address all of these concerns, and others not mentioned, the database interface has been redesigned as follows: - Two main categories of functionality are provided: block storage and metadata storage - All block storage and metadata storage are done via read-only and read-write MVCC transactions with both manual and managed modes - Support for multiple concurrent readers and a single writer - Readers use a snapshot and therefore are not blocked by the writer - Some key properties of the block storage and retrieval API: - It is generic and does NOT contain additional bitcoin logic such spend tracking and block linking - Provides access to the raw serialized bytes so deserialization is not forced for callers that don't need it - Support for fetching headers via independent functions which allows implementations to provide significant optimizations - Ability to efficiently retrieve arbitrary regions of blocks (transactions, scripts, etc) - A rich metadata storage API is provided: - Key/value with arbitrary data - Support for buckets and nested buckets - Bucket iteration through a couple of different mechanisms - Cursors for efficient and direct key seeking - Supports registration of backend database implementations - Comprehensive test coverage - Provides strong documentation with example usage This commit also contains an implementation of the previously discussed interface named ffldb (flat file plus leveldb metadata backend). Here is a quick overview: - Highly optimized for read performance with consistent write performance regardless of database size - All blocks are stored in flat files on the file system - Bulk block region fetching is optimized to perform linear reads which improves performance on spindle disks - Anti-corruption mechanisms: - Flat files contain full block checksums to quickly an easily detect database corruption without needing to do expensive merkle root calculations - Metadata checksums - Open reconciliation - Extensive test coverage: - Comprehensive blackbox interface testing - Whitebox testing which uses intimate knowledge to exercise uncommon failure paths such as deleting files out from under the database - Corruption tests (replacing random data in the files) In addition, this commit also contains a new tool under the new database directory named dbtool which provides a few basic commands for testing the database. It is designed around commands, so it could be useful to expand on in the future. Finally, this commit addresses the following issues: - Adds support for and therefore closes #255 - Fixes #199 - Fixes #201 - Implements and closes #256 - Obsoletes and closes #257 - Closes #247 once the required chain and btcd modifications are in place to make use of this new code
2016-02-03 18:42:04 +01:00
// Copyright (c) 2015-2016 The btcsuite developers
// Use of this source code is governed by an ISC
// license that can be found in the LICENSE file.
/*
Package database2 provides a block and metadata storage database.
Overview
As of July 2015, there are over 365,000 blocks in the Bitcoin block chain and
and over 76 million transactions (which turns out to be over 35GB of data).
This package provides a database layer to store and retrieve this data in a
simple and efficient manner.
The default backend, ffldb, has a strong focus on speed, efficiency, and
robustness. It makes use leveldb for the metadata, flat files for block
storage, and strict checksums in key areas to ensure data integrity.
A quick overview of the features database provides are as follows:
- Key/value metadata store
- Bitcoin block storage
- Efficient retrieval of block headers and regions (transactions, scripts, etc)
- Read-only and read-write transactions with both manual and managed modes
- Nested buckets
- Supports registration of backend databases
- Comprehensive test coverage
Database
The main entry point is the DB interface. It exposes functionality for
transactional-based access and storage of metadata and block data. It is
obtained via the Create and Open functions which take a database type string
that identifies the specific database driver (backend) to use as well as
arguments specific to the specified driver.
Namespaces
The Namespace interface is an abstraction that provides facilities for obtaining
transactions (the Tx interface) that are the basis of all database reads and
writes. Unlike some database interfaces that support reading and writing
without transactions, this interface requires transactions even when only
reading or writing a single key.
The Begin function provides an unmanaged transaction while the View and Update
functions provide a managed transaction. These are described in more detail
below.
Transactions
The Tx interface provides facilities for rolling back or committing changes that
database: Major redesign of database package. This commit contains a complete redesign and rewrite of the database package that approaches things in a vastly different manner than the previous version. This is the first part of several stages that will be needed to ultimately make use of this new package. Some of the reason for this were discussed in #255, however a quick summary is as follows: - The previous database could only contain blocks on the main chain and reorgs required deleting the blocks from the database. This made it impossible to store orphans and could make external RPC calls for information about blocks during the middle of a reorg fail. - The previous database interface forced a high level of bitcoin-specific intelligence such as spend tracking into each backend driver. - The aforementioned point led to making it difficult to implement new backend drivers due to the need to repeat a lot of non-trivial logic which is better handled at a higher layer, such as the blockchain package. - The old database stored all blocks in leveldb. This made it extremely inefficient to do things such as lookup headers and individual transactions since the entire block had to be loaded from leveldb (which entails it doing data copies) to get access. In order to address all of these concerns, and others not mentioned, the database interface has been redesigned as follows: - Two main categories of functionality are provided: block storage and metadata storage - All block storage and metadata storage are done via read-only and read-write MVCC transactions with both manual and managed modes - Support for multiple concurrent readers and a single writer - Readers use a snapshot and therefore are not blocked by the writer - Some key properties of the block storage and retrieval API: - It is generic and does NOT contain additional bitcoin logic such spend tracking and block linking - Provides access to the raw serialized bytes so deserialization is not forced for callers that don't need it - Support for fetching headers via independent functions which allows implementations to provide significant optimizations - Ability to efficiently retrieve arbitrary regions of blocks (transactions, scripts, etc) - A rich metadata storage API is provided: - Key/value with arbitrary data - Support for buckets and nested buckets - Bucket iteration through a couple of different mechanisms - Cursors for efficient and direct key seeking - Supports registration of backend database implementations - Comprehensive test coverage - Provides strong documentation with example usage This commit also contains an implementation of the previously discussed interface named ffldb (flat file plus leveldb metadata backend). Here is a quick overview: - Highly optimized for read performance with consistent write performance regardless of database size - All blocks are stored in flat files on the file system - Bulk block region fetching is optimized to perform linear reads which improves performance on spindle disks - Anti-corruption mechanisms: - Flat files contain full block checksums to quickly an easily detect database corruption without needing to do expensive merkle root calculations - Metadata checksums - Open reconciliation - Extensive test coverage: - Comprehensive blackbox interface testing - Whitebox testing which uses intimate knowledge to exercise uncommon failure paths such as deleting files out from under the database - Corruption tests (replacing random data in the files) In addition, this commit also contains a new tool under the new database directory named dbtool which provides a few basic commands for testing the database. It is designed around commands, so it could be useful to expand on in the future. Finally, this commit addresses the following issues: - Adds support for and therefore closes #255 - Fixes #199 - Fixes #201 - Implements and closes #256 - Obsoletes and closes #257 - Closes #247 once the required chain and btcd modifications are in place to make use of this new code
2016-02-03 18:42:04 +01:00
took place while the transaction was active. It also provides the root metadata
bucket under which all keys, values, and nested buckets are stored. A
transaction can either be read-only or read-write and managed or unmanaged.
Managed versus Unmanaged Transactions
A managed transaction is one where the caller provides a function to execute
within the context of the transaction and the commit or rollback is handled
automatically depending on whether or not the provided function returns an
error. Attempting to manually call Rollback or Commit on the managed
transaction will result in a panic.
An unmanaged transaction, on the other hand, requires the caller to manually
call Commit or Rollback when they are finished with it. Leaving transactions
open for long periods of time can have several adverse effects, so it is
recommended that managed transactions are used instead.
Buckets
The Bucket interface provides the ability to manipulate key/value pairs and
nested buckets as well as iterate through them.
The Get, Put, and Delete functions work with key/value pairs, while the Bucket,
CreateBucket, CreateBucketIfNotExists, and DeleteBucket functions work with
buckets. The ForEach function allows the caller to provide a function to be
called with each key/value pair and nested bucket in the current bucket.
Metadata Bucket
As discussed above, all of the functions which are used to manipulate key/value
pairs and nested buckets exist on the Bucket interface. The root metadata
bucket is the upper-most bucket in which data is stored and is created at the
same time as the database. Use the Metadata function on the Tx interface
to retrieve it.
Nested Buckets
The CreateBucket and CreateBucketIfNotExists functions on the Bucket interface
provide the ability to create an arbitrary number of nested buckets. It is
a good idea to avoid a lot of buckets with little data in them as it could lead
to poor page utilization depending on the specific driver in use.
*/
package database2