%%% Title = "LBRY: A Decentralized Digital Content Marketplace" area = "Internet"

[seriesInfo] name = "Internet-Draft" value = "draft-grintsvayg-00" stream = "IETF" status = "informational"

date = 2018-08-21T00:00:00Z

author initials="A." surname="Grintsvayg" fullname="Alex Grintsvayg" %%%

LBRY: A Decentralized Digital Content Marketplace

A> Please excuse the unfinished state of this paper. It is being actively worked on. The content here is made available early because it contains useful information for developers.

A> For more technical information about LBRY, visit lbry.tech.

Introduction

LBRY is a protocol for accessing and publishing digital content in a global, decentralized marketplace. Clients can use LBRY to publish, host, find, download, and pay for content — books, movies, music, or anything else. Anyone can participate and no permission is required, nor can anyone be blocked from participating. The system is distributed, so no single entity has unilateral control, nor will the removal of any single entity prevent the system from functioning.

TODO:

  • why is it significant
  • whom does it help
  • why is it different/better than what existed before

Table of Contents

Overview

This document defines the LBRY protocol, its components, and how they fit together. At its core, LBRY consists of several discrete components that are used together in order to provide the end-to-end capabilities of the protocol. There are two distributed data stores (blockchain and DHT), a peer-to-peer protocol for exchanging data, and several specifications for data structure, transformation, and retrieval.

This document assumes that the reader is familiar with Bitcoin and blockchain technology. It does not attempt to document the Bitcoin protocol or explain how it works. The Bitcoin developer reference is recommended for anyone wishing to understand the technical details.

Conventions and Terminology

(Rather than this section, maybe we can use a syntax like brackets around keywords to inline key definitions?)

file
A single piece of content published using LBRY.
blob
The unit of data transmission on the data network. A published file is split into many blobs.
stream
A set of blobs that can be reassembled into a file. Every stream has a descriptor blob and one or more content blobs.
blob hash
The output of a cryptographic hash function applied to a blob. Hashes are used to uniquely identify blobs and to verify that the contents of the blob are correct. Unless otherwise specified, LBRY uses SHA384 as the hash function.
metadata
Information about the contents of a stream (e.g. creator, description, stream descriptor hash, etc). Metadata is stored in the blockchain.
claim
A single metadata entry in the blockchain.
name
A human-readable UTF-8 string that is associated with a published claim.
channel
The unit of pseudonymous publisher identity. Claims may be part of a channel.
URL
A reference to a claim that specifies how to retrieve it.

Blockchain

The LBRY blockchain is a public, proof-of-work blockchain. It serves three key purposes:

  1. An index of the content available on the network
  2. A payment system and record of purchases for priced content
  3. Trustful publisher identities

The LBRY blockchain is a fork of the Bitcoin blockchain, with substantial modifications. This document will not cover or specify any aspects of LBRY that are identical to Bitcoin, and will instead focus on the differences.

Claims

A claim is a single metadata entry in the blockchain. There are two types of claims:

stream
Declares the availability, access method, and publisher of a stream of bytes (typically a file).
channel
Creates a trustful pseudonym that can be used to identify the origin of stream claims.

Claim Properties

Claims have 4 properties:

claimId
A 20-byte hash unique among all claims. See [Claim Identifier Generation](#claim-identifier-generation).
name
A normalized UTF-8 string of up to 255 bytes used to address the claim. See [URLs](#urls) and [Normalization](#normalization).
amount
A quantity of tokens used to stake the claim. See [Controlling](#controlling).
value
Metadata about a stream or a channel. See [Metadata](#metadata).

Claim Example

Here is an example stream claim:

{
  "claimId": "fa3d002b67c4ff439463fcc0d4c80758e38a0aed",
  "name": "lbry",
  "amount": 100000000,
  "value": "{\"ver\": \"0.0.3\", \"description\": \"What is LBRY? An introduction with Alex Tabarrok\",
            \"license\": \"LBRY inc\", \"title\": \"What is LBRY?\", \"author\": \"Samuel Bryan\",
            \"language\": \"en\", \"sources\": {\"lbry_sd_hash\":
            \"e1e324bce7437540fac6707fa142cca44d76fc4e8e65060139a88ff7cdb218b4540cb9cff8bb3d5e06157ae6b08e5cb5\"},
            \"content_type\": \"video/mp4\", \"nsfw\": false, \"thumbnail\":
            \"https://s3.amazonaws.com/files.lbry.io/logo.png\"}",
  "txid": "53ed05d9dfd728a94bedf952d67783bbe9da5d2ab436a84338bb53f0b85301b5",
  "n": 0,
  "height": 146117
}

Claim Operations

There are three claim operations: create, update, and abandon.

create
Makes a new claim.
update
Changes the value or amount of an existing claim, without changing the claim ID.
abandon
Withdraws a claim, freeing the associated credits to be used for other purposes.

Supports

A support is an additional transaction type that lends its amount to an existing claim.

A support contains a claim ID, an amount, and nothing else. Supports function analogously to claims in terms of Claim Operations and Claim Statuses, with the exception that they cannot be updated.

Claimtrie

The claimtrie is the data structure used to store the set of all claims and prove the correctness of claim resolution.

The claimtrie is implemented as a Merkle tree that maps names to claims. Claims are stored as leaf nodes in the tree. Names are stored as the path from the root node to the leaf node.

The root hash is the hash of the root node. It is stored in the header of each block in the blockchain. Nodes in the LBRY network use the root hash to efficiently and securely validate the state of the claimtrie.

Multiple claims can exist for the same name. They are all stored in the leaf node for that name, sorted in decreasing order by the total amount of credits backing each claim.

For more details on the specific claimtrie implementation, see the source code.

Claim and Support Statuses

All claims and supports can have one or more of the following statuses at a given block.

Throughout this section, whenever we write claim, we refer to both claims and supports.

Accepted

An accepted claim is one that has been entered into the blockchain. This happens when the transaction containing it is included in a block.

Accepted claims do not appear in or affect the claimtrie state until they are Active.

The sum of the amount of a claim and all accepted supports is called the total amount.

Abandoned

An abandoned claim is one that was withdrawn by its creator or current owner. Spending the transaction output that contains a claim will cause that claim to become abandoned.

Abandoned claims are no longer stored in the claimtrie.

While data related to abandoned claims technically still resides in the blockchain, it is improper to use this data to fetch the associated content, and active claims signed by abandoned identities will no longer be reported as valid.

Active

An active claim is an accepted and non-abandoned claim that has been in the blockchain for an algorithmically determined number of blocks. This length of time is called the activation delay.

If the claim is an update to an already active claim, is the first claim for a name, or does not affect the sort order at the leaf for a name, the activation delay is 0 (i.e. the claim becomes active in the same block it is accepted).

Otherwise, the activation delay is determined by a formula covered in Claimtrie Transitions. The formula's variable inputs are the height of the current block, the height at which the claim was accepted, and the height at which the relevant claimtrie state for the name being considered last changed.

The sum of the amount of an active claim and all active supports is called its effective amount. Only the effective amount affects the sort order of a claimtrie leaf.

Controlling

A controlling claim is the active claim that is first in the sort order at a leaf. That is, it has the highest effective amount of all claims with the same name.

Only one claim can be controlling for a given name at a given block.

Claimtrie Transitions

To determine the sort order of a claimtrie leaf, the following algorithm is used:

  1. For each active claim for the name, add up the amount of the claim and the amount of all the active supports for that claim.

  2. If all of the claims for the name are in the same order as they were in the previous block (new claims may be appended to the end of the order), then nothing is changing.

  3. Otherwise, a takeover is occurring. Set the takeover height for this name to the current height, recalculate which claims and supports are now active, and return to step 1.

  4. At this point, the claim with the greatest total is the controlling claim at this block.

The purpose of step 3 is to handle the case when multiple competing claims are made on the same name in different blocks, and one of those claims becomes active but another still-inactive claim has the greatest amount. Step 3 causes the greater claim to also activate and become the controlling claim.
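As a rough illustration, here is a minimal sketch of how the effective amounts at a leaf might be computed and sorted, ignoring the takeover and re-activation loop of step 3 (the claim and support fields used here are assumptions for the example, not a defined data model):

```python
def effective_amount(claim, supports):
    # a claim's effective amount: its own amount plus all of its active supports
    return claim["amount"] + sum(
        s["amount"] for s in supports
        if s["claim_id"] == claim["claim_id"] and s["active"]
    )


def sort_leaf(claims, supports):
    # active claims at a leaf are ordered by decreasing effective amount;
    # the first claim in this order is the controlling claim for the name
    active = [c for c in claims if c["active"]]
    return sorted(active, key=lambda c: effective_amount(c, supports), reverse=True)
```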

Determining Active Claims

If a claim does not become active immediately, it becomes active at the block height determined by the following formula:

C + min(4032, floor((H-T) / 32))

Where:

  • C = claim height (height when the claim was accepted)
  • H = current height
  • T = takeover height (the most recent height at which the relevant claimtrie state for the name changed)

In written form, the delay before a claim becomes active is equal to the claim's height minus the height of the last takeover, divided by 32. The delay is capped at 4032 blocks, which is 7 days of blocks at 2.5 minutes per block (our target block time). The max delay is reached 224 (7x32) days after the last takeover.

The purpose of this delay function is to give long-standing claimants time to respond to changes, while still keeping takeover times reasonable and allowing recent or contentious claims to change state quickly.
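As a quick sanity check, here is a direct translation of the formula, evaluated at the claim's acceptance height as in the worked example in the next section:

```python
def activation_height(claim_height, takeover_height):
    # C + min(4032, floor((C - T) / 32)); the delay is capped at 4032 blocks (~7 days)
    delay = min(4032, (claim_height - takeover_height) // 32)
    return claim_height + delay


# Claim B in the example below: accepted at block 1001, last takeover at block 13
assert activation_height(1001, 13) == 1031
```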

Claim Transition Example

Here is a step-by-step example to illustrate the different scenarios. All claims are for the same name.

Block 13: Claim A for 10LBC is accepted. It is the first claim, so it immediately becomes active and controlling.
State: A(10) is controlling

Block 1001: Claim B for 20LBC is accepted. Its activation height is 1001 + min(4032, floor((1001-13) / 32)) = 1001 + 30 = 1031.
State: A(10) is controlling, B(20) is accepted.

Block 1010: Support X for 14LBC for claim A is accepted. Since it is a support for the controlling claim, it activates immediately.
State: A(10+14) is controlling, B(20) is accepted.

Block 1020: Claim C for 50LBC is accepted. The activation height is 1020 + min(4032, floor((1020-13) / 32)) = 1020 + 31 = 1051.
State: A(10+14) is controlling, B(20) is accepted, C(50) is accepted.

Block 1031: Claim B activates. It has 20LBC, while claim A has 24LBC (10 original + 14 from support X). There is no takeover, and claim A remains controlling.
State: A(10+14) is controlling, B(20) is active, C(50) is accepted.

Block 1040: Claim D for 300LBC is accepted. The activation height is 1040 + min(4032, floor((1040-13) / 32)) = 1040 + 32 = 1072.
State: A(10+14) is controlling, B(20) is active, C(50) is accepted, D(300) is accepted.

Block 1051: Claim C activates. It has 50LBC, while claim A has 24LBC, so a takeover is initiated. The takeover height for this name is set to 1051, and therefore the activation delay for all the claims becomes min(4032, floor((1051-1051) / 32)) = 0. All the claims become active. The totals for each claim are recalculated, and claim D becomes controlling because it has the highest total.
State: A(10+14) is active, B(20) is active, C(50) is active, D(300) is controlling.

Normalization

Names in the claimtrie are normalized to avoid confusion due to Unicode equivalence or casing. All names are converted using Unicode Normalization Form D (NFD), then lowercased using the en_US locale when possible.
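A minimal sketch of this normalization in Python follows. Note that str.lower() is not locale-aware, so it only approximates the en_US-locale lowercasing described above:

```python
import unicodedata


def normalize_name(name: str) -> str:
    # Unicode Normalization Form D, then lowercase (approximation of en_US lowercasing)
    return unicodedata.normalize("NFD", name).lower()


normalize_name("LBRY")  # -> "lbry"
```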

URLs

A URL is a name, optionally followed by one or more modifiers. A bare name on its own will resolve to the controlling claim at the latest block height. Common URL structures are:

Stream Claim Name: a basic claim for a name

lbry://meet-lbry

Channel Claim Name: a claim for a channel

lbry://@lbry

Channel Claim Name and Stream Claim Name: URLs with both a channel name and a stream claim name are resolved in two steps. First, the channel is resolved to get the appropriate claim for that channel. Then the stream claim name is resolved to get the appropriate claim from among the claims in that channel.

lbry://@lbry/meet-lbry

Claim ID: a claim for this name with this claim ID (does not have to be the controlling claim). Partial prefix matches are allowed (see Resolution).

lbry://meet-lbry#7a0aa95c5023c21c098
lbry://meet-lbry#7a
lbry://@lbry#3f/meet-lbry

Claim Sequence: the Nth claim for this name, in the order the claims entered the blockchain. N must be a positive number. This can be used to determine which claim came first, rather than which claim has the most support.

lbry://meet-lbry:1
lbry://@lbry:1/meet-lbry

Bid Position: the Nth claim for this name, in order of most support to least support. N must be a positive number. This is useful for resolving non-winning bids in bid order, e.g. if you want to list the top three winning claims in a voting contest or want to ignore the activation delay.

lbry://meet-lbry$2
lbry://meet-lbry$3
lbry://@lbry$2/meet-lbry

Query Params: extra parameters, reserved for future use

lbry://meet-lbry?arg=value+arg2=value2

Grammar

The full URL grammar is defined using XQuery EBNF notation:

URL ::= Scheme Path Query?

Scheme ::= 'lbry://'

Path ::=  StreamClaimNameAndModifier | ChannelClaimNameAndModifier ( '/' StreamClaimNameAndModifier )?

StreamClaimNameAndModifier ::= StreamClaimName Modifier?
ChannelClaimNameAndModifier ::= ChannelClaimName Modifier?

StreamClaimName ::= NameChar+
ChannelClaimName ::= '@' NameChar+

Modifier ::= ClaimID | ClaimSequence | BidPosition
ClaimID ::= '#' Hex+
ClaimSequence ::= ':' PositiveNumber
BidPosition ::= '$' PositiveNumber

Query ::= '?' QueryParameterList
QueryParameterList ::= QueryParameter ( '&' QueryParameterList )*
QueryParameter ::= QueryParameterName ( '=' QueryParameterValue )?
QueryParameterName ::= NameChar+
QueryParameterValue ::= NameChar+

PositiveDigit ::= [123456789]
Digit ::= '0' | PositiveDigit
PositiveNumber ::= PositiveDigit Digit*

HexAlpha ::= [abcdef]
Hex ::= (Digit | HexAlpha)+

NameChar ::= Char - [=&#:$@?/]  /* any character that is not reserved */
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
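For illustration, here is a rough Python translation of this grammar as a regular expression. It is a sketch rather than a validated parser, and query-string parsing is simplified:

```python
import re

NAME = r"[^=&#:$@?/]+"                                # NameChar+ (no reserved characters)
MOD = r"(?:#[0-9a-f]+|:[1-9][0-9]*|\$[1-9][0-9]*)"    # ClaimID | ClaimSequence | BidPosition

URL_RE = re.compile(
    r"^lbry://"
    rf"(?:(?P<channel>@{NAME})(?P<channel_mod>{MOD})?"    # ChannelClaimNameAndModifier
    rf"(?:/(?P<stream>{NAME})(?P<stream_mod>{MOD})?)?"    # optional '/' StreamClaimNameAndModifier
    rf"|(?P<name>{NAME})(?P<name_mod>{MOD})?)"            # or a bare StreamClaimNameAndModifier
    r"(?:\?(?P<query>.*))?$"                              # optional query parameters
)


def parse_url(url: str) -> dict:
    match = URL_RE.match(url)
    if not match:
        raise ValueError("invalid LBRY URL: " + url)
    return {k: v for k, v in match.groupdict().items() if v is not None}


# parse_url("lbry://@lbry#3f/meet-lbry")
# -> {'channel': '@lbry', 'channel_mod': '#3f', 'stream': 'meet-lbry'}
```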

Resolution

URL resolution is the process of translating a URL into its associated claim ID and metadata.

No Modifier

Return the controlling claim for the name. Stream claims and channel claims are resolved the same way.

ClaimID

Get all claims for the claim name whose IDs start with the given ClaimID. Sort the claims in ascending order by block height and position within the block. Return the first claim.

ClaimSequence

Get all claims for the claim name. Sort the claims in ascending order by block height and position within the block. Return the Nth claim, where N is the given ClaimSequence value.

BidPosition

Get all claims for the claim name. Sort the claims in descending order by total effective amount. Return the Nth claim, where N is the given BidPosition value.
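A condensed sketch of these resolution rules over an in-memory list of claims for a single name (the claim_id, height, position, effective_amount, and active fields are assumptions for the sake of the example):

```python
def resolve(claims, modifier=None):
    """Resolve a claim from the list of claims for one name (sketch; fields are assumed)."""
    if modifier is None:
        # no modifier: the controlling claim, i.e. the active claim
        # with the highest effective amount
        return max((c for c in claims if c["active"]), key=lambda c: c["effective_amount"])
    kind, value = modifier  # e.g. ("#", "7a"), (":", 1), ("$", 2)
    by_age = sorted(claims, key=lambda c: (c["height"], c["position"]))
    if kind == "#":  # ClaimID prefix: earliest matching claim
        return [c for c in by_age if c["claim_id"].startswith(value)][0]
    if kind == ":":  # ClaimSequence: Nth claim in the order they entered the blockchain
        return by_age[value - 1]
    if kind == "$":  # BidPosition: Nth claim by descending effective amount
        return sorted(claims, key=lambda c: c["effective_amount"], reverse=True)[value - 1]
    raise ValueError("unknown modifier")
```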

ChannelName and ClaimName

If the URL contains both a channel name and a claim name, resolution happens in two steps. First, the channel portion of the URL is resolved to obtain the channel claim. Then the claim name (and its modifier, if any) is resolved against the set of claims published in that channel, rather than against all claims for that name.

Data

Content on the LBRY network is encoded to facilitate distribution.

Blobs

The unit of data in the LBRY network is called a blob. A blob is an encrypted chunk of data up to 2MiB in size. Each blob is indexed by its blob hash, which is a SHA384 hash of the blob contents. Addressing blobs by their hash protects against naming collisions and ensures that the content you get is what you expect.

Blobs are encrypted using AES-256 in CBC mode and PKCS7 padding. In order to keep each encrypted blob at 2MiB max, a blob can hold at most 2097151 bytes (2MiB minus 1 byte) of plaintext data. The source code for the exact algorithm is available here. The encryption key and IV for each blob are stored as described below.
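As a small illustration, computing a blob hash and enforcing the size limit might look like the following sketch, using Python's standard hashlib:

```python
import hashlib

MAX_BLOB_SIZE = 2 * 1024 * 1024          # 2MiB of encrypted data per blob
MAX_PLAINTEXT_SIZE = MAX_BLOB_SIZE - 1   # 2097151 bytes of plaintext before padding


def blob_hash(blob: bytes) -> str:
    # blobs are addressed by the SHA384 hash of their (encrypted) contents
    if len(blob) > MAX_BLOB_SIZE:
        raise ValueError("blob exceeds maximum size")
    return hashlib.sha384(blob).hexdigest()
```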

Streams

Multiple blobs are combined into a stream. A stream may be a book, a movie, a CAD file, etc. All content on the network is shared as streams. Every stream begins with the manifest blob, followed by one or more content blobs. The content blobs hold the actual content of the stream. The manifest blob contains information necessary to find the content blobs and decode them into a file. This includes the hashes of the content blobs, their order in the stream, and cryptographic material for decrypting them.

The blob hash of the manifest blob is called the stream hash. It uniquely identifies each stream.

Manifest Contents

A manifest blob's contents are encoded using a canonical JSON encoding. The JSON encoding must be canonical to support consistent hashing and validation. The encoding is the same as standard JSON, but adds the following rules:

  • Object keys must be quoted and lexicographically sorted.
  • All strings are hex-encoded. Hex letters must be lowercase.
  • Whitespace before, after, or between tokens is not permitted.
  • Floating point numbers, leading zeros, and "minus 0" for integers are not permitted.
  • Trailing commas after the last item in an array or object are not permitted.
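A minimal sketch of such an encoder using Python's standard json module follows. It covers key sorting and whitespace; hex-encoding of string values and the integer restrictions are assumed to be enforced by the caller:

```python
import json


def canonical_json(obj: dict) -> bytes:
    # lexicographically sorted keys, no whitespace between tokens
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
```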

Here's an example manifest:

{"blobs":[{"blob_hash":"a6daea71be2bb89fab29a2a10face08143411a5245edcaa5efff48c2e459e7ec01ad20edfde6da43a932aca45b2cec61","iv":"ef6caef207a207ca5b14c0282d25ce21","length":2097152},{"blob_hash":"bf2717e2c445052366d35bcd58edb108cbe947af122d8f76b4856db577aeeaa2def5b57dbb80f7b1531296bd3e0256fc","iv":"a37b291a37337fc1ff90ae655c244c1d","length":2097152},...,{"blob_hash":"322973617221ddfec6e53bff4b74b9c21c968cd32ba5a5094d84210e660c4b2ed0882b114a2392a08b06183f19330aaf","iv": "a00f5f458695bdc9d50d3dbbc7905abc","length":600160}],"filename":"6b706a7977755477704d632e6d7034","key":"94d89c0493c576057ac5f32eb0871180","version":1}

Here's the same manifest, with whitespace added for readability:

{
  "blobs":[
    {
      "blob_hash":"a6daea71be2bb89fab29a2a10face08143411a5245edcaa5efff48c2e459e7ec01ad20edfde6da43a932aca45b2cec61",
      "iv":"ef6caef207a207ca5b14c0282d25ce21",
      "length":2097152
    },
    {
      "blob_hash":"bf2717e2c445052366d35bcd58edb108cbe947af122d8f76b4856db577aeeaa2def5b57dbb80f7b1531296bd3e0256fc",
      "iv":"a37b291a37337fc1ff90ae655c244c1d",
      "length":2097152
    },
    ...,
    {
      "blob_hash":"322973617221ddfec6e53bff4b74b9c21c968cd32ba5a5094d84210e660c4b2ed0882b114a2392a08b06183f19330aaf",
      "iv": "a00f5f458695bdc9d50d3dbbc7905abc",
      "length": 600160
    }  
  ],
  "filename":"6b706a7977755477704d632e6d7034",
  "key":"94d89c0493c576057ac5f32eb0871180",
  "version":1
}

The key field contains the key to decrypt the stream, and is optional. The key may be stored by a third party and made available to a client when presented with proof that the content was purchased. The version field is always 1. It is intended to signal structure changes in future versions of this protocol. The length field for each blob is the length of the encrypted blob, not the original file chunk.

Every stream must have at least two blobs - the manifest blob and a content blob. Consequently, zero-length streams are not allowed.

Stream Encoding

A file must be encoded into a stream before it can be published. Encoding involves breaking the file into chunks, encrypting the chunks into content blobs, and creating the manifest blob. Here are the steps:

Setup
  1. Generate a random 32-byte key for the stream.
Content Blobs
  1. Break the file into chunks of at most 2097151 bytes.
  2. Generate a random IV for each chunk.
  3. Pad each chunk using PKCS7 padding.
  4. Encrypt each chunk with AES-CBC using the stream key and the IV for that chunk.
  5. An encrypted chunk is a blob.
Manifest Blob
  1. Fill in the manifest data.
  2. Encode the data using the canonical JSON encoding described above.
  3. Compute the stream hash.
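A rough end-to-end sketch of the steps above, assuming the third-party cryptography package for AES-CBC and PKCS7. This follows the prose in this section, not any particular reference implementation:

```python
import hashlib
import json
import os

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

MAX_CHUNK = 2 * 1024 * 1024 - 1  # 2097151 bytes of plaintext per content blob


def encode_stream(file_bytes: bytes, filename: str):
    key = os.urandom(32)  # setup step 1: random 32-byte stream key
    blobs, manifest_entries = [], []
    for i in range(0, len(file_bytes), MAX_CHUNK):
        chunk = file_bytes[i:i + MAX_CHUNK]
        iv = os.urandom(16)  # random IV per chunk
        padder = padding.PKCS7(128).padder()  # PKCS7 pad to the AES block size
        padded = padder.update(chunk) + padder.finalize()
        encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        blob = encryptor.update(padded) + encryptor.finalize()
        blobs.append(blob)
        manifest_entries.append({
            "blob_hash": hashlib.sha384(blob).hexdigest(),
            "iv": iv.hex(),
            "length": len(blob),  # length of the encrypted blob
        })
    manifest = {
        "blobs": manifest_entries,
        "filename": filename.encode("utf-8").hex(),  # strings are hex-encoded
        "key": key.hex(),
        "version": 1,
    }
    # canonical JSON encoding, then hash to get the stream hash
    manifest_blob = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    stream_hash = hashlib.sha384(manifest_blob).hexdigest()
    return stream_hash, manifest_blob, blobs
```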

An implementation of this process is available here. fixme: this link is for v0, not v1. need to implement v1 or drop the link.

Stream Decoding

Decoding a stream is like encoding in reverse, with the added step of verifying that the expected blob hashes match the actual data.

  1. Verify that the manifest blob hash matches the stream hash you expect.
  2. Parse the manifest blob contents.
  3. Verify the hashes of the content blobs.
  4. Decrypt and remove the padding from each content blob using the key and IVs in the manifest.
  5. Concatenate the decrypted chunks in order.
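A matching decoder sketch under the same assumptions (content_blobs is a hypothetical mapping from blob hash to blob bytes, and the key is assumed to be present in the manifest):

```python
import hashlib
import json

from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def decode_stream(manifest_blob: bytes, content_blobs: dict, expected_stream_hash: str) -> bytes:
    # step 1: the manifest blob's hash must match the expected stream hash
    if hashlib.sha384(manifest_blob).hexdigest() != expected_stream_hash:
        raise ValueError("manifest blob does not match stream hash")
    manifest = json.loads(manifest_blob)  # step 2: parse the manifest
    key = bytes.fromhex(manifest["key"])  # assumes the key is stored in the manifest
    out = b""
    for entry in manifest["blobs"]:
        blob = content_blobs[entry["blob_hash"]]
        # step 3: verify each content blob against the hash listed in the manifest
        if hashlib.sha384(blob).hexdigest() != entry["blob_hash"]:
            raise ValueError("content blob does not match its hash")
        # step 4: decrypt and unpad using the stream key and this blob's IV
        decryptor = Cipher(algorithms.AES(key), modes.CBC(bytes.fromhex(entry["iv"]))).decryptor()
        padded = decryptor.update(blob) + decryptor.finalize()
        unpadder = padding.PKCS7(128).unpadder()
        out += unpadder.update(padded) + unpadder.finalize()
    # step 5: the decrypted chunks, concatenated in order, are the original file
    return out
```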

Announce

After a stream is encoded, it must be announced to the network. Announcing is the process of letting other nodes on the network know that you have content available for download. The LBRY network tracks announced content using a distributed hash table.

Distributed Hash Table

Distributed hash tables (or DHTs) have proven to be an effective way to build a decentralized content network. Our DHT implementation follows the Kademlia specification fairly closely, with some modifications.

A distributed hash table is a key-value store that is spread over multiple nodes in a network. Nodes may join or leave the network anytime, with no central coordination necessary. Nodes communicate with each other using a peer-to-peer protocol to advertise what data they have and what they are best positioned to store.

When a host connects to the DHT, it announces the hash for every blob it wishes to share. Downloading a blob from the network requires querying the DHT for a list of hosts that announced that blob's hash (called peers), then requesting the blob from the peers directly.

Announcing to the DHT

A host announces a hash to the DHT in two steps. First, the host looks for nodes that are closest to the target hash. Then the host asks those nodes to store the fact that the host has the target hash available for download.

Finding the closest nodes is done via iterative FindNode DHT requests. The host starts with the closest nodes it knows about and sends a FindNode(target_hash) request to each of them. If any of the requests return nodes that are closer to the target hash, the host sends FindNode requests to those nodes to try to get even closer. When the FindNode requests no longer return nodes that are closer, the search ends.

Once the search is over, the host takes the 8 closest nodes it found and sends a Store(target_hash) request to them. The nodes receiving this request store the fact that the host is a peer for the target hash.
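A simplified sketch of this iterative lookup and store, assuming hypothetical find_node and store RPC helpers and node objects with an integer id (Kademlia distance is the XOR of two IDs):

```python
def xor_distance(a: int, b: int) -> int:
    # Kademlia distance metric: XOR of the two IDs
    return a ^ b


def iterative_find_node(target: int, known_nodes, find_node, k=8):
    # start from the closest nodes we already know about
    shortlist = sorted(known_nodes, key=lambda n: xor_distance(n.id, target))[:k]
    queried = set()
    while True:
        to_query = [n for n in shortlist if n.id not in queried]
        if not to_query:
            return shortlist  # no new, closer nodes were returned; the search ends
        for node in to_query:
            queried.add(node.id)
            for candidate in find_node(node, target):  # FindNode RPC (assumed helper)
                if all(candidate.id != n.id for n in shortlist):
                    shortlist.append(candidate)
        # keep only the k closest nodes found so far
        shortlist = sorted(shortlist, key=lambda n: xor_distance(n.id, target))[:k]


def announce(target: int, known_nodes, find_node, store):
    # ask the closest nodes found to record us as a peer for the target hash
    for node in iterative_find_node(target, known_nodes, find_node):
        store(node, target)  # Store RPC (assumed helper)
```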

Download

A client wishing to download a stream must first query the DHT to find peers hosting the blobs in that stream, then contact those peers to download the blobs directly.

Querying the DHT

Querying works almost the same way as announcing. A client looking for a target hash will start by sending iterative FindValue(target_hash) requests to the nodes it knows that are closest to the target hash. If a node receives a FindValue request and knows of any peers for the target hash, it will respond with a list of those peers. Otherwise, it will respond with the closest nodes to the target hash that it knows about. The client then queries those closer nodes using the same FindValue call. This way, each call either finds the client some peers, or brings it closer to finding those peers. If no peers are found and no closer nodes are being returned, the client will determine that the target hash is not available and will give up.

Blob Exchange Protocol

Downloading a blob from a peer is governed by the Blob Exchange Protocol. It is used by hosts and clients to check blob availability, exchange blobs, and negotiate data prices. The protocol is an RPC protocol using Protocol Buffers and the gRPC framework. It has five types of requests.

fixme: protocol does not negotiate anything right now. It just checks the price. Should we include negotiation in v1?

PriceCheck

PriceCheck gets the price that the server is charging for data transfer. It returns the price in deweys per KB.

DownloadCheck

DownloadCheck checks whether the server has certain blobs available for download. For each hash in the request, the server returns a true or false to indicate whether the blob is available.

Download

Download requests the blob for a given hash. The response contains the blob, its hash, and the address where payment for the data transfer should be sent. If the blob is not available on the server, the response will instead contain an error.

UploadCheck

UploadCheck asks the server whether blobs can be uploaded to it. For each hash in the request, the server returns a true or false to indicate whether it would accept a given blob for upload. In addition, if any of the hashes in the request is a stream hash and the server has the manifest blob for that stream but is missing some content blobs, it may include the hashes of those content blobs in the response.

Upload

Upload sends a blob to the server. If uploading many blobs, the client should use the UploadCheck request to check which blobs the server actually needs. This avoids needlessly uploading blobs that the server already has. If a client tries to upload too many blobs that the server does not want, this may be considered a denial-of-service attack.

The protocol calls and message types are defined in detail here.

Reflectors and Data Markets

In order for a client to download content, there must be hosts online that have the content the client wants, when the client wants it. To incentivize the continued hosting of data, the blob exchange protocol supports data upload and payment for data. Reflectors are hosts that accept data uploads. They rehost (reflect) the uploaded data and charge for downloads. Using a reflector is optional, but most publishers will probably choose to use them. Doing so obviates the need for the publisher's server to be online and connectable, which can be especially useful for mobile clients or those behind a firewall.

The current version of the protocol does not support sophisticated price negotiation between clients and hosts. The host simply chooses the price it will charge. Clients check this price before downloading, and pay the price after the download is complete. Future protocol versions will include more options for price negotiation, as well as stronger proofs of payment.

