Table of Contents
- When do we not need to verify data?
- What data do we need to verify?
- How does one get and validate the blockchain headers?
- Does ElectrumX Server validate the blockchain headers?
- What transactions do we need to validate?
- How do we validate the transactions in a block?
- What claims need to be verified?
- How do I verify a claim that I own?
- How do I verify that my claim query returned legitimate results?
- How do I verify my streamed data?
When do we not need to verify data?
- We don't need to verify data received over SSL/HTTPS, assuming that we trust the server we are talking to.
- We don't need to verify data transferred on the loopback: the OS ensures its integrity (and if the user can modify one end, then it was their intention to modify the receiving end as well).
What data do we need to verify?
We need to verify data pulled from peer-to-peer networks. Any intermediate peer may modify the data in its own favor, so no single peer can be trusted. We do assume that the majority of peers can be trusted, provided that they are randomly distributed, unaffiliated, not all found through one source, etc.
There are at least four pieces of data that we need to verify:
- The blockchain headers, which contain the root hashes for the next two pieces.
- A transaction (and its position) in a block.
- A claim (and its position) in the claimtrie.
- That the streaming data matches what was registered.
Items #2 and #3 both rely on a Merkle-tree type of validation. The root hashes they require are contained in the blockchain headers, hence the need to validate the headers.
How does one get and validate the blockchain headers?
These can be acquired through the P2P network or through the RPC call getheaders. Some or all of them may be required. You can compute the block hashes up the chain to ensure that each header correctly references the hash of its predecessor. You will also need to speak to a large sampling of nodes to determine the democratically winning block hash. As this is well-documented elsewhere on the web, we won't go into it more here.
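The hash-linking check described above can be sketched as follows. This is a minimal illustration, not lbrycrd's or the wallet's actual code: it assumes a Bitcoin-style serialized header in which bytes 4 to 36 hold the previous block hash, and that the block hash is the double SHA-256 of the whole serialized header. The function names are hypothetical. It does not check proof of work or compare against other nodes.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin-style block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def block_hash(header: bytes) -> bytes:
    # Assumption: the block hash is the double SHA-256 of the serialized header.
    return double_sha256(header)


def headers_are_linked(headers: List[bytes]) -> bool:
    """Return True if every header references the hash of the one before it.

    Assumption: bytes 4..36 of each serialized header hold the previous
    block hash in the same byte order that block_hash() produces.
    """
    for previous, current in zip(headers, headers[1:]):
        if current[4:36] != block_hash(previous):
            return False
    return True
```

A header list that passes this check is only internally consistent; deciding whether it is the chain the rest of the network agrees on still requires the sampling step described above.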
Does ElectrumX Server validate the blockchain headers?
No, it does not. It expects you to configure it to point to a local, trusted blockchain instance. The client-side wallet needs to validate headers, or at least the ones that it cares about. Further reading: https://electrumx.readthedocs.io/en/latest/protocol-basics.html . For clarity, the connection between the client wallet and the ElectrumX server has traditionally been unsecured.
What transactions do we need to validate?
Validate all of the transactions in the blockchain that are associated with coins you have spent or earned, including any claims you have made. The client wallet does (or should do) this out of the box for coins associated with the local wallet addresses. (Note: I assume that this includes claims/supports as they are "spends", but I haven't verified that.)
The blockchain nodes allow you to query for transactions by address using a bloom filter (RPC method getdata). In other words, you can combine all the addresses that you care about into a single filter input, send the request for matching transactions, and get the result without ever having to reveal exact addresses.
Again, this is generally handled well by the wallet client already.
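To illustrate why a bloom filter hides the exact addresses, here is a toy filter. It is only a sketch of the data structure: the class name, sizes, and the use of SHA-256 to derive bit positions are assumptions made for this example, and real nodes use their own filter format and hash function.

```python
import hashlib
from typing import Iterator


class ToyBloomFilter:
    """A toy bloom filter; NOT the wire format used by blockchain nodes."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 5) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # True for every added item; may also be True for items never added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

The peer only ever sees the bit array. Because unrelated addresses can also match it (false positives), the peer cannot tell which matching addresses are really yours.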
How do we validate the transactions in a block?
If you have all the transactions for that block, you can compute the Merkle root over the whole set and check that it matches the one recorded in the block header. If you have only a partial set of transactions for a certain block, validate them using the Merkle branch that is returned as part of the request for transactions. This latter option is called Simplified Payment Verification (SPV).
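Both options can be sketched in a few lines. This is a minimal sketch assuming Bitcoin-style Merkle trees: double SHA-256 of concatenated pairs, with the last hash duplicated when a level has an odd count, and all hashes in internal byte order. The function names are hypothetical.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes: List[bytes]) -> bytes:
    """Full-block case: compute the root from every transaction hash."""
    if not tx_hashes:
        raise ValueError("a block has at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


def verify_branch(tx_hash: bytes, index: int, branch: List[bytes], root: bytes) -> bool:
    """SPV case: fold a transaction hash up through its sibling hashes.

    `index` is the transaction's position in the block; its low bit says
    whether the current node is a left (0) or right (1) child at each level.
    """
    current = tx_hash
    for sibling in branch:
        if index % 2 == 0:
            current = double_sha256(current + sibling)
        else:
            current = double_sha256(sibling + current)
        index //= 2
    return current == root
```

In both cases the computed value is compared against the Merkle root taken from a header that has already been validated.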
What claims need to be verified?
We will split this into two main categories:
- You want to verify that the claims that you own are in the claimtrie, that they are being served up as potential results to a query.
- You want to verify that the results of the query you just made match the data that is in the official claimtrie.
How do I verify a claim that I own?
For each CLAIM_NAME_OP transaction that you own:
- Derive the claimId from the transaction hash and output index (as found in the spec).
- Request the name proof for this claim with the RPC method getnameproof. Include the requested claimId. †The entire claimId is not required; a unique part of it will suffice.
- Under the theory that it's impossible/expensive to construct a Merkle tree that hashes to the known block hash, you can then hash your full claimId in with the results of the call and verify it against the known block claim trie root hash.
- Verify that the value returned (from where?) matches the value in the latest claim/update transaction that you have submitted.
The results of the proof method are structured as list<hash, bool>, where the bool is true for values that should go on the right of the current result.
† To be implemented in lbrycrd shortly, as part of PR https://github.com/lbryio/lbrycrd/pull/209 . The traditional proof process was Merkle in theory but not in practice: it didn't combine peers into a single hash. This has been rectified with the as-yet-unmerged PR #209.
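Folding a list<hash, bool> proof into a root hash could look roughly like this. It is a sketch under loud assumptions: the way the leaf hash is derived from the claimId and the use of double SHA-256 to combine pairs are placeholders chosen for illustration, not the format defined by lbrycrd or PR #209, and the function names are hypothetical. Only the left/right rule comes from the description above.

```python
import hashlib
from typing import List, Tuple


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def fold_proof(leaf_hash: bytes, proof: List[Tuple[bytes, bool]]) -> bytes:
    """Combine the leaf with each proof entry, from the leaf up to the root.

    Each entry is (hash, goes_right): when goes_right is True the entry's
    hash is placed to the right of the current result, otherwise to the left.
    """
    current = leaf_hash
    for sibling, goes_right in proof:
        if goes_right:
            current = double_sha256(current + sibling)
        else:
            current = double_sha256(sibling + current)
    return current


def verify_claim_proof(claim_id: bytes, proof: List[Tuple[bytes, bool]],
                       claim_trie_root: bytes) -> bool:
    # Assumption: the leaf is the double SHA-256 of the full claimId.
    return fold_proof(double_sha256(claim_id), proof) == claim_trie_root
```

The `claim_trie_root` argument must come from a block header that has already been validated; otherwise the proof only shows consistency with whatever the server sent.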
How do I verify that my claim query returned legitimate results?
For the following, assume that a URI query returns a simple { claimId: "...", metadata: "..." } structure.
Query formed as lbry://claim-name
- Find the latest transaction that updates the returned claimId (using the bloom filter).
- Verify the transaction returned in step 1 (same as wallet/coin transaction verification).
- Assert that the metadata returned by the query matches what is in the corresponding transaction. Metadata is now verified.
- Request the name proof from lbrycrd for the name in the query without passing in the claimId; this will return the proof for the winning claim. The RPC method is getnameproof.
- Use the returned proof data and the claim root hash in the block header to validate the claimId.
How do I verify that the URI query returned the right data when formed as lbry://claim-name$1 - the claim bid order lookup?
Query formed as lbry://claim-name:1 - the claim sequence lookup
- Run steps 1-3 of the general query verification above.
- †Call getnameproof with these parameters: claim-name sequence 1
- Use the returned proof data and the claim root hash in the block header to validate the claimId.
Query formed as lbry://claim-name$1 - the bid sequence lookup
- Run steps 1-3 of the general query verification above.
- †Call getnameproof with these parameters: claim-name bid 1
- Use the returned proof data and the claim root hash in the block header to validate the claimId.
Query formed as lbry://claim-name#deadbeef - the by-id claim lookup
- Run steps 1-3 of the general query verification above.
- Call getnameproof with these parameters: claim-name claim deadbeef
- Use the returned proof data and the claim root hash in the block header to validate the claimId.
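The four query forms above map onto getnameproof parameters in a regular way, which can be sketched as a small helper. The function name and the simplified URI pattern are hypothetical and written only for this example; the parameter lists are the ones given above, and the sequence and bid forms depend on the † note.

```python
import re
from typing import Tuple, Union

# Simplified pattern: a name, optionally followed by ':', '$' or '#' and a value.
_URI = re.compile(r"^lbry://(?P<name>[^:$#]+)(?:(?P<sep>[:$#])(?P<value>.+))?$")


def proof_params(uri: str) -> Tuple[Union[str, int], ...]:
    """Return the getnameproof parameters that correspond to a query URI."""
    match = _URI.match(uri)
    if match is None:
        raise ValueError(f"unrecognized URI: {uri!r}")
    name, sep, value = match.group("name", "sep", "value")
    if sep is None:
        return (name,)                           # winning claim
    if sep == ":":
        return (name, "sequence", int(value))    # claim sequence lookup
    if sep == "$":
        return (name, "bid", int(value))         # bid sequence lookup
    return (name, "claim", value)                # by-id lookup (may be partial)
```

For example, `proof_params("lbry://claim-name#deadbeef")` yields the name, the word claim, and the partial claimId, matching the by-id steps above.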
How do I verify my streamed data?
A hash of the uploaded data is included in the metadata. Once the metadata has been verified as outlined above, you can use that hash to verify the data that has been downloaded. For streaming, we may want to include an additional "first 100KB" hash, or something to that effect, so that we can verify the start of the data early on. I'd hate to have someone watch a two-hour movie and get a warning at the end about the content being other than what they paid for.
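A streaming check along these lines might look like the sketch below. Everything specific in it is an assumption made for illustration: the function name, the choice of SHA-384 as the default algorithm, and the idea that the metadata carries a separate hash of the first 100KB (which the text above only proposes).

```python
import hashlib
from typing import Iterable, Optional


def verify_stream(chunks: Iterable[bytes],
                  expected_hash_hex: str,
                  expected_prefix_hash_hex: Optional[str] = None,
                  prefix_size: int = 100 * 1024,
                  algorithm: str = "sha384") -> bool:
    """Hash a download incrementally; fail early if the prefix hash is wrong."""
    full = hashlib.new(algorithm)
    prefix = hashlib.new(algorithm)
    prefix_remaining = prefix_size
    prefix_checked = expected_prefix_hash_hex is None
    for chunk in chunks:
        full.update(chunk)
        if not prefix_checked:
            part = chunk[:prefix_remaining]
            prefix.update(part)
            prefix_remaining -= len(part)
            if prefix_remaining == 0:
                if prefix.hexdigest() != expected_prefix_hash_hex:
                    return False  # stop before the viewer gets any further
                prefix_checked = True
    # A stream shorter than prefix_size is checked against the prefix hash here.
    if not prefix_checked and prefix.hexdigest() != expected_prefix_hash_hex:
        return False
    return full.hexdigest() == expected_hash_hex
```

A real player would report the prefix result as soon as it is known rather than only returning at the end, but the early return shows where that decision point sits.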