Some links I found useful while doing DHT work:
torrent specific: [https://wiki.theory.org/BitTorrentDraftDHTProtocol](https://wiki.theory.org/BitTorrentDraftDHTProtocol) and torrent general: [https://wiki.theory.org/BitTorrentSpecification](https://wiki.theory.org/BitTorrentSpecification)
## Naive IterativeFind description

### Iterative find algorithm

This is the algorithm used for traversing the DHT network looking for a key.

There are two kinds of lookup:

- value: used when looking for the values associated with a key. The RPC method is `findValue`; it returns the K closest nodes and any associated values. Supports pagination.
- node: used when looking for the K closest nodes to a key. The RPC method is `findNode`.

_grin made this up_

The same algorithm is used for iterating over the network; only the RPC method being called and the handling of its results change.

naive iterativeFind description (bounded parallelism, not loose parallelism):
#### The shortlist

We start with a shortlist of the K nodes in our own routing table that are closest to the target. If the routing table is not yet populated, we use a shortlist acquired manually (via bootstrap nodes, which are nodes that help us join the network, or nodes saved in the local database from a previous run).

The main difference between shortlist nodes and normal nodes is that shortlist nodes can be in a questionable state by having:

- an unknown node id
  - happens with peers whose endpoint is known but that were never contacted, like the ones used for bootstrap
  - they are not added to the list of known peers, so they do not count as an answer to "find nodes" (that list is sorted by node id)
  - we skip straight to querying them directly
- unknown or bad metrics
  - these come from the local routing table and can be stale, so a status of good or bad is meaningless. When we query them, their status is updated and bad ones are removed (this is important for maintenance of the routing table)

`target` = the hash you're looking for

#### General overview

##### Concepts

- K: a constant in Kademlia. LBRY uses 8.
- Distance to the key: the bitwise XOR between the key and a node id.
- Active set: the peers we have discovered, ordered by distance to the key.
- Contacted set: the contacted peers that replied to queries.
- Probe: the act of asking a peer for the values under a key or for its K closest nodes.
- Running probes: a mapping from peers to ongoing queries.
- Concurrency limit: the limit on the number of concurrent probes (`alpha` in Kademlia).
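The distance metric can be sketched in a couple of lines (Python for illustration; node ids are shown as small integers here, while real ids are large hashes):

```python
def distance(key: int, node_id: int) -> int:
    # Distance to the key: bitwise XOR between the key and a node id.
    return key ^ node_id

# Toy ids: real ones are large hashes; small integers keep the math visible.
target = 0b1011
nodes = [0b0011, 0b1010, 0b1111]

# Sorting by XOR distance yields Kademlia's notion of "closest".
closest = sorted(nodes, key=lambda n: distance(target, n))
# distances: 1010 -> 0001, 1111 -> 0100, 0011 -> 1000
```

Note that this metric is unrelated to geography: two nodes on the same LAN can be "far" apart, and nodes on different continents can be "close".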
`activeNodes = []`, always sorted by distance to target

##### Algorithm

The shortlist adds some peers to the active set or directly to the running probes. On startup, and again after each probe finishes, a search round is executed.

The search round iterates over the set of active nodes, ordered by distance, and schedules new probes until either the concurrency limit is reached or the candidate is too far away (today we estimate "too far" by checking whether the index of the node being considered is greater than `K + <number of running probes>`).

If a search round adds no new probes and there are no probes still running, the search is done: it means we did not discover any new nodes closer than the ones we already know.

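A search round along these lines might look like the following sketch (not the actual implementation; `start_probe` and the container choices are assumptions made for illustration):

```python
K = 8       # Kademlia constant; LBRY uses 8
ALPHA = 3   # concurrency limit (alpha in Kademlia)

def search_round(active_nodes, contacted, running_probes, start_probe):
    """One search round: schedule probes over the distance-ordered active set.

    `active_nodes` must be sorted by XOR distance to the target.
    `start_probe` is a hypothetical callback that launches a query.
    Returns the number of probes scheduled this round.
    """
    scheduled = 0
    for index, node in enumerate(active_nodes):
        if len(running_probes) >= ALPHA:
            break  # concurrency limit reached
        if index > K + len(running_probes):
            break  # "too far" estimate: past the K closest plus in-flight probes
        if node in contacted or node in running_probes:
            continue  # already answered or already being probed
        running_probes[node] = start_probe(node)
        scheduled += 1
    return scheduled
```

If a round returns 0 and `running_probes` is empty, the search is finished.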
`toContact = []`, always sorted by distance to target

Trying to simplify, it looks like:

- for each peer in the active set that has not been contacted yet, ordered by distance
  - probe it, add the results to the active set and the probed peer to the contacted set
- stop when the K closest nodes in the active set are all in the contacted set
- (skip bad ones, control concurrency, etc.)

`contacted = {}`

start!

put all nodes in the routing table into the toContact list

run `ALPHA` workers

each worker will asynchronously:

- pop the first node in the toContact list, mark it as contacted, and contact it
- if the node does not reply within some time, drop it and move on to the next node
- if the node replies, insert it into activeNodes
- if the node replies with the value for a FindValue call, return that value and immediately end the whole process
- if the node replies with contacts, insert the contacts that have not been contacted yet into toContact
- quit if `len(toContact) == 0` (and there are no outstanding requests), or if `len(activeNodes) >= K && dist(target, activeNodes[K-1]) < dist(target, toContact[0])`

if FindValue found a value, return it; otherwise return the first K activeNodes

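The worker loop above can be condensed into a single-threaded sketch (Python for illustration; the `network` dict stands in for the findValue/findNode RPCs, and a real implementation would run `ALPHA` of these loops concurrently with timeouts):

```python
K = 8  # Kademlia constant; LBRY uses 8

def dist(a, b):
    # XOR distance between a node id and the target
    return a ^ b

def iterative_find(target, bootstrap, network):
    """Single-threaded sketch of the worker loop.

    `network` maps node id -> ("value", v) or ("contacts", [ids]) and
    stands in for findValue/findNode RPCs; a missing entry models a
    peer that never replies.
    """
    to_contact = sorted(bootstrap, key=lambda n: dist(target, n))
    active_nodes = []
    contacted = set()
    while to_contact:
        # Quit when the K closest active nodes beat every uncontacted node.
        if len(active_nodes) >= K and \
                dist(target, active_nodes[K - 1]) < dist(target, to_contact[0]):
            break
        node = to_contact.pop(0)   # pop the closest uncontacted node
        contacted.add(node)
        reply = network.get(node)
        if reply is None:
            continue               # timed out: drop it and move on
        kind, payload = reply
        if kind == "value":
            return payload         # FindValue hit: end the whole process
        active_nodes.append(node)
        active_nodes.sort(key=lambda n: dist(target, n))
        for peer in payload:       # insert contacts we have not seen yet
            if peer not in contacted and peer not in to_contact:
                to_contact.append(peer)
        to_contact.sort(key=lambda n: dist(target, n))
    return active_nodes[:K]        # no value found: first K active nodes
```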
caveats:

- if you're doing the optimization of caching FindValue results at the closest node that didn't have it, you can't simply stop the search when you get a value; you have to keep going so you can find the closest node that didn't return the value
- this algorithm doesn't stop early when it's unlikely to find closer nodes. To do that, you have to change two things:
  - only add the K closest nodes to the shortlist at the start of the search
  - for the termination condition, track the closest node seen so far; if responses from the shortlist are not getting you any closer to the target, stop searching

##### Announcing

Announcing is the act of storing new values in the distributed hash table. It starts by finding the K closest nodes to the key, then calling `store` on each one of them. This is repeated on a time interval to ensure the value stays available as nodes go offline.

The "value" here is just contact information (node id, IP address, and ports), so one can use another protocol to properly request the full value (a file or any other kind of data). Note that other implementations support arbitrary payloads directly in the DHT (BitTorrent with BEP-44), but that is not implemented here.
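A minimal announce loop under these definitions could look like this (a sketch only: the `find_node` and `store` callbacks are hypothetical stand-ins for the real RPCs, and the interval is a placeholder):

```python
import time

K = 8                           # store at the K closest nodes
REANNOUNCE_INTERVAL = 30 * 60   # placeholder interval, in seconds

def announce_once(key, my_contact, find_node, store):
    """Store our contact info at the K closest nodes to `key`.

    `find_node` runs the iterative lookup and returns nodes ordered by
    distance; `store` issues the store RPC. Both are hypothetical
    callbacks standing in for real network calls.
    """
    closest = find_node(key)[:K]
    for node in closest:
        # The stored "value" is just contact info: node id, IP and ports.
        store(node, key, my_contact)
    return closest

def announce_forever(key, my_contact, find_node, store):
    # Repeat on an interval so the value survives nodes going offline.
    while True:
        announce_once(key, my_contact, find_node, store)
        time.sleep(REANNOUNCE_INTERVAL)
```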