Table of Contents
Some links I found useful while doing DHT work
http://xlattice.sourceforge.net/components/protocol/kademlia/specs.html
http://www.maymounkov.org/papers/maymounkov-kademlia-lncs.pdf (https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf alternative as it was offline)
http://blog.notdot.net/2009/11/Implementing-a-DHT-in-Go-part-2
http://www.scs.stanford.edu/17au-cs244b/labs/projects/kaplan-nelson_ma_rachleff.pdf
http://www.bittorrent.org/beps/bep_0005.html
blog.libtorrent.org/2016/09/dht-bootstrap-node
https://github.com/lbryio/reflector-cluster/issues/41
torrent specific: https://wiki.theory.org/BitTorrentDraftDHTProtocol and torrent general: https://wiki.theory.org/BitTorrentSpecification
Iterative find algorithm
This is the algorithm used for traversing the DHT network looking for a key. There are two kinds of lookup:
- value: when looking for values associated with a key. RPC method is
findValue
, returns K closest nodes and the values associated. Supports pagination - node: when looking for the K closest nodes to a key. RPC method is
findNode
The same algorithm is used for iterating over the network, changing just which RPC method to call and how to handle results.
The shortlist
We start with a short list of K closest nodes in our own routing table. If the routing table is not yet populated, then we use a shortlist acquired manually (via bootstrap nodes, which are nodes that help us join the network, or local database nodes from a previous run). The main difference between shortlist nodes and normal nodes is that they can be in a questionable state by having:
- unknown node id
- happens with peers where the endpoint is known but were never contacted, like the ones used for bootstrap
- they are not added to the list of known peers, so they do not count as an answer to "find nodes" (this list is sorted on node id)
- skip straight to querying them directly
- unknown or bad metrics
- those come from the local routing table and can be stale, so the status being good or bad is meaningless. When we query them, their status is updated and bad ones are removed (this is important for maintenance of the routing table)
General overview
Concepts
K: constant in Kademlia. LBRY uses 8
Distance to the key: bitwise XOR between key and a node id
Active set: peers we discovered, ordered by distance to the key
Contacted set: contacted peers that replied to queries
Probe: the act of requesting a peer for values under a key or K closest nodes
Running probes: mapping of peers to ongoing queries
Concurrency limit: the limit of concurrent probes (alpha
in Kademlia)
Algorithm
The shortlist will add some peers to the active set or directly to running probes. On startup and after each probe finishes, a search round loop is executed.
The search round code iterates over the set of active nodes, which is ordered by distance. Then schedules new probes to until the concurrency limit is reached or it is too far (today we estimate if it is too far by checking if the index of the node being checked is higher than K + running probes size
)
If the search round does not add new probes and there aren't probes running, the search is done. It means we did not discover new nodes that are closer than the nodes we already know.
Trying to simplify, it looks like:
- for each active peer that is not contacted in the active set, ordered by distance
- probe, add results to active set and the peer being probed to contacted set
- stop when the K closest on the active set are in the contacted set
- (skip bad ones, control concurrency, etc)
Announcing
The act of storing new values in a distributed hash table. It starts by finding the K closest nodes to the key, then calling store
on each one of them. This is repeated on a time interval to ensure the value is there as nodes go offline.
The "value" here is just the contact information (node id, IP address and ports) so you one can use another protocol to properly request the full value (file or any kind of data). Note that other implementations supports arbitrary payloads directly in the DHT (torrent with BEP-44), but this is not implemented here.