From fb679ca5db39c0d50f447a462b8bdbb18fde3755 Mon Sep 17 00:00:00 2001
From: Alex Grin
Date: Thu, 14 Oct 2021 16:22:02 -0400
Subject: [PATCH] Updated The claim trie memory reduction (markdown)

---
 The-claim-trie-memory-reduction.md | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/The-claim-trie-memory-reduction.md b/The-claim-trie-memory-reduction.md
index a4ff0b6..f9e921b 100644
--- a/The-claim-trie-memory-reduction.md
+++ b/The-claim-trie-memory-reduction.md
@@ -43,11 +43,13 @@ class Node {
 With this kind of structure, I can walk from the root node to the leaf node for any search in O(len(name)) time. I can keep a set of nodes that need their hashes updated as I go. It works okay, but now consider the general inefficiencies of this approach. Example keys: superfluous, stupendous, and stupified. How does that look?
 
-## 🄴➙🄽➙🄳➙🄾➙πŸ…„βž™πŸ…‚
-  ➚
- ➚πŸ…ƒβž™πŸ…„βž™πŸ„Ώβž™πŸ„Έβž™πŸ„΅βž™πŸ„Έβž™πŸ„΄βž™πŸ„³
+```
+   🄴➙🄽➙🄳➙🄾➙πŸ…„βž™πŸ…‚
+  ➚
+ ➚ πŸ…ƒβž™πŸ…„βž™πŸ„Ώβž™πŸ„Έβž™πŸ„΅βž™πŸ„Έβž™πŸ„΄βž™πŸ„³
 πŸ…‚
- βž˜πŸ…„βž™πŸ„Ώβž™πŸ„΄βž™πŸ…βž™πŸ„΅βž™πŸ„»βž™πŸ…„βž™πŸ„Ύβž™πŸ…„βž™πŸ…‚
+ ➘ πŸ…„βž™πŸ„Ώβž™πŸ„΄βž™πŸ…βž™πŸ„΅βž™πŸ„»βž™πŸ…„βž™πŸ„Ύβž™πŸ…„βž™πŸ…‚
+```
 
 In other words, we're now using 25 nodes to hold three data points. All but two of those nodes have one or no child. 22 of the 25 nodes have an empty data member. This is RAM intensive and very wasteful.
 
@@ -61,15 +63,19 @@ Over the years there have been many proposals to improve this structure. I'm goi
 It ends up that idea #1 makes all the difference. You have to combine the nodes as much as possible.
 That turns the above trie into 5 nodes down from 25 becoming:
 
-## πŸ„΄πŸ„½πŸ„³πŸ„ΎπŸ…„πŸ…‚
-  ➚
- ➚πŸ…ƒπŸ…„πŸ„Ώβž™πŸ„ΈπŸ„΅πŸ„ΈπŸ„΄πŸ„³
+```
+   πŸ„΄πŸ„½πŸ„³πŸ„ΎπŸ…„πŸ…‚
+  ➚
+ ➚ πŸ…ƒπŸ…„πŸ„Ώβž™πŸ„ΈπŸ„΅πŸ„ΈπŸ„΄πŸ„³
 πŸ…‚
- ➘πŸ…„πŸ„ΏπŸ„΄πŸ…πŸ„΅πŸ„»πŸ…„πŸ„ΎπŸ…„πŸ…‚
+ ➘ πŸ…„πŸ„ΏπŸ„΄πŸ…πŸ„΅πŸ„»πŸ…„πŸ„ΎπŸ…„πŸ…‚
+```
 
 ## Section 2: the experiments
 
-[ Timed experiments for 1 million insertions of random data [a-zA-Z0-9]{1, 60}](https://www.notion.so/adecf55e97fb4c8080e5288bb44cd65d)
+Timed experiments for 1 million insertions of random data [a-zA-Z0-9]{1, 60}
+
+TABLE GOES HERE
 
 A few notes about the table:
 
@@ -110,7 +116,7 @@ set.end() == set.lower_bound("C" + char(1))
 
 The general find algorithm:
 
-[https://s3-us-west-2.amazonaws.com/secure.notion-static.com/8c57dec4-5909-4568-bc0e-0902e6c13952/untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/8c57dec4-5909-4568-bc0e-0902e6c13952/untitled)
+![](https://s3.us-west-2.amazonaws.com/secure.notion-static.com/8c57dec4-5909-4568-bc0e-0902e6c13952/untitled?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAT73L2G45O3KS52Y5%2F20211014%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20211014T202050Z&X-Amz-Expires=86400&X-Amz-Signature=fa216fce572dea2d2d63bd4216cbea6f35954476f0cd5c68599e8af78d39da84&X-Amz-SignedHeaders=host&response-content-disposition=filename%20%3D%22untitled%22)
 
 The general find algorithm in pseudo-code: