196962ff0 Add AcceleratedCRC32C to port_win.h 1bdf1c34c Merge upstream LevelDB v1.20 d31721eb0 Merge #17: Fixed file sharing errors fecd44902 Fixed file sharing error in Win32Env::GetFileSize(), Win32SequentialFile::_Init(), Win32RandomAccessFile::_Init() Fixed error checking in Win32SequentialFile::_Init() 5b7510f1b Merge #14: Merge upstream LevelDB 1.19 0d969fd57 Merge #16: [LevelDB] Do no crash if filesystem can't fsync c8c029b5b [LevelDB] Do no crash if filesystem can't fsync a53934a3a Increase leveldb version to 1.20. f3f139737 Separate Env tests from PosixEnv tests. eb4f0972f leveldb: Fix compilation warnings in port_posix_sse.cc on x86 (32-bit). d0883b600 Fixed path to doc file: index.md. 7fa20948d Convert documentation to markdown. ea175e28f Implement support for Intel crc32 instruction (SSE 4.2) 95cd743e5 Including <limits> for std::numeric_limits. 646c3588d Limit the number of read-only files the POSIX Env will have open. d40bc3fa5 Merge #13: Typo ebbd772d3 Typo a2fb086d0 Add option for max file size. The currend hard-coded value of 2M is inefficient in colossus. git-subtree-dir: src/leveldb git-subtree-split: 196962ff01c39b4705d8117df5c3f8c205349950
3.7 KiB
leveldb File format
<beginning_of_file>
[data block 1]
[data block 2]
...
[data block N]
[meta block 1]
...
[meta block K]
[metaindex block]
[index block]
[Footer] (fixed size; starts at file_size - sizeof(Footer))
<end_of_file>
The file contains internal pointers. Each such pointer is called a BlockHandle and contains the following information:
offset: varint64
size: varint64
See varints for an explanation of varint64 format.
-
The sequence of key/value pairs in the file are stored in sorted order and partitioned into a sequence of data blocks. These blocks come one after another at the beginning of the file. Each data block is formatted according to the code in
block_builder.cc
, and then optionally compressed. -
After the data blocks we store a bunch of meta blocks. The supported meta block types are described below. More meta block types may be added in the future. Each meta block is again formatted using
block_builder.cc
and then optionally compressed. -
A "metaindex" block. It contains one entry for every other meta block where the key is the name of the meta block and the value is a BlockHandle pointing to that meta block.
-
An "index" block. This block contains one entry per data block, where the key is a string >= last key in that data block and before the first key in the successive data block. The value is the BlockHandle for the data block.
-
At the very end of the file is a fixed length footer that contains the BlockHandle of the metaindex and index blocks as well as a magic number.
metaindex_handle: char[p]; // Block handle for metaindex index_handle: char[q]; // Block handle for index padding: char[40-p-q];// zeroed bytes to make fixed length // (40==2*BlockHandle::kMaxEncodedLength) magic: fixed64; // == 0xdb4775248b80fb57 (little-endian)
"filter" Meta Block
If a FilterPolicy
was specified when the database was opened, a
filter block is stored in each table. The "metaindex" block contains
an entry that maps from filter.<N>
to the BlockHandle for the filter
block where <N>
is the string returned by the filter policy's
Name()
method.
The filter block stores a sequence of filters, where filter i contains
the output of FilterPolicy::CreateFilter()
on all keys that are stored
in a block whose file offset falls within the range
[ i*base ... (i+1)*base-1 ]
Currently, "base" is 2KB. So for example, if blocks X and Y start in
the range [ 0KB .. 2KB-1 ]
, all of the keys in X and Y will be
converted to a filter by calling FilterPolicy::CreateFilter()
, and the
resulting filter will be stored as the first filter in the filter
block.
The filter block is formatted as follows:
[filter 0]
[filter 1]
[filter 2]
...
[filter N-1]
[offset of filter 0] : 4 bytes
[offset of filter 1] : 4 bytes
[offset of filter 2] : 4 bytes
...
[offset of filter N-1] : 4 bytes
[offset of beginning of offset array] : 4 bytes
lg(base) : 1 byte
The offset array at the end of the filter block allows efficient mapping from a data block offset to the corresponding filter.
"stats" Meta Block
This meta block contains a bunch of stats. The key is the name of the statistic. The value contains the statistic.
TODO(postrelease): record following stats.
data size
index size
key size (uncompressed)
value size (uncompressed)
number of entries
number of data blocks