Reimplement ClaimTrie as Patricia Trie #106

Closed
opened 2018-03-14 14:42:23 +01:00 by lyoshenka · 4 comments
lyoshenka commented 2018-03-14 14:42:23 +01:00 (Migrated from github.com)

Our Claimtrie is inefficient because it must store lots of empty nodes when storing a long name. The solution is to use this: https://github.com/ethereum/wiki/wiki/Patricia-Tree

Our Claimtrie is inefficient because it must store lots of empty nodes when storing a long name. The solution is to use this: https://github.com/ethereum/wiki/wiki/Patricia-Tree
shyba commented 2018-05-02 20:28:54 +02:00 (Migrated from github.com)

For the sake of having this documented, we should be mindful that there is a space efficiency optimization and a time efficiency one mixed in this change. If we are doing one or both is a pending discussion.
If we just change the trie without changing how proofs are calculated we could solve the memory issue without a hardfork. While changing the proof calculation to match the new data structure (skipping empty nodes) would require a hardfork.

For the sake of having this documented, we should be mindful that there is a space efficiency optimization and a time efficiency one mixed in this change. If we are doing one or both is a pending discussion. If we just change the trie without changing how proofs are calculated we could solve the memory issue without a hardfork. While changing the proof calculation to match the new data structure (skipping empty nodes) would require a hardfork.
BrannonKing commented 2018-09-08 01:28:36 +02:00 (Migrated from github.com)

I was just doing some testing on node counts for this. With a million random strings (standard 60 characters), I end up with 25 million nodes in traditional trie. With the collapsed prefixes, I end up with 1.5 million nodes after a million insertions. That's a substantial difference! Next test: the qp trie. We can talk about where to check in my tests and what to name them next week.

I was just doing some testing on node counts for this. With a million random strings (standard 60 characters), I end up with 25 million nodes in traditional trie. With the collapsed prefixes, I end up with 1.5 million nodes after a million insertions. That's a substantial difference! Next test: the qp trie. We can talk about where to check in my tests and what to name them next week.
BrannonKing commented 2018-09-18 23:41:43 +02:00 (Migrated from github.com)

I'm attaching my experiments on this. I originally concluded that the PrefixTrie is the right structure, and that it would drastically reduce our (present) memory usage. I also concluded that we don't need a separate library as it is not a lot of code. Since then a discussion with @shyba pointed out that std::map may be the best container. I did some testing and came to that conclusion as well. I'll leave it here on the shelf and hope that we can get c++11 support in lbrycrd in the very near future. I intend to bring it in without a hard fork (keep the hash calculation functioning the way it is).

lbry_tries.zip

I'm attaching my experiments on this. I originally concluded that the PrefixTrie is the right structure, and that it would drastically reduce our (present) memory usage. I also concluded that we don't need a separate library as it is not a lot of code. Since then a discussion with @shyba pointed out that std::map may be the best container. I did some testing and came to that conclusion as well. I'll leave it here on the shelf and hope that we can get c++11 support in lbrycrd in the very near future. I intend to bring it in without a hard fork (keep the hash calculation functioning the way it is). [lbry_tries.zip](https://github.com/lbryio/lbrycrd/files/2406126/lbry_tries.zip)
BrannonKing commented 2018-09-19 00:53:54 +02:00 (Migrated from github.com)

Last week I tested the code here with my same sample data: https://github.com/ytakano/radix_tree. It worked well enough, but it was not as fast as std::map. I didn't see a lot of other libraries that were easily accessible. I did not attempt to use Ethereum's trie code. It appeared complex, and I don't want to mix their code in with ours.

Last week I tested the code here with my same sample data: https://github.com/ytakano/radix_tree. It worked well enough, but it was not as fast as std::map. I didn't see a lot of other libraries that were easily accessible. I did not attempt to use Ethereum's trie code. It appeared complex, and I don't want to mix their code in with ours.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: LBRYCommunity/lbrycrd#106
No description provided.