Memory improvements, claimtrie revamp #276

Merged
BrannonKing merged 1 commit from memory_improvements into master 2019-07-01 20:45:33 +02:00
BrannonKing commented 2019-05-11 04:51:30 +02:00 (Migrated from github.com)

This PR addresses the following issues:

Fixes #106 (Reimplement claimtrie without so many empty nodes, save RAM)
Fixes #108 (Separate data structure and data operations)
Fixes #133 (Trie serialization failure)
Fixes #138 (Rename nCurrentHeight)
Fixes #145 (Rename CClaimTrieCache)
Fixes #158 (Properly delete CClaimTrieNode)

It also includes these enhancements:

  1. BLOCK_HASH is no longer needlessly stored in the claimtrie DB. I had a theory that this was masking places that we should have been comparing the claimTrieHash, which theory proved correct.
  2. An issue with takeover height computation was detected and (temporarily) worked around.
  3. cachedTakeoverHeights was removed; we now use the takeoverHeight field on the nodes in the cache.
  4. The getQueue* functions now re-use code.
  5. effectiveAmount is updated when reading from disk (we should now ensure that it's not being needlessly computed elsewhere).

Although the commits have been squashed, much of this work was done by @bvbfan .

We experimented with keeping empty nodes truly empty -- allocating claim space for them as needed. This saved less than 10MB of RAM and complicated the code a bit; we ditched it. We also experimented with a similar approach where empty node hashes were kept in a separate map. This also was too complicated to get right.

This PR addresses the following issues: Fixes #106 (Reimplement claimtrie without so many empty nodes, save RAM) Fixes #108 (Separate data structure and data operations) Fixes #133 (Trie serialization failure) Fixes #138 (Rename nCurrentHeight) Fixes #145 (Rename CClaimTrieCache) Fixes #158 (Properly delete CClaimTrieNode) It also includes these enhancements: 1. BLOCK_HASH is no longer needlessly stored in the claimtrie DB. I had a theory that this was masking places that we should have been comparing the claimTrieHash, which theory proved correct. 2. An issue with takeover height computation was detected and (temporarily) worked around. 3. cachedTakeoverHeights was removed; we now use the takeoverHeight field on the nodes in the cache. 4. The getQueue* functions now re-use code. 5. effectiveAmount is updated when reading from disk (we should now ensure that it's not being needlessly computed elsewhere). Although the commits have been squashed, much of this work was done by @bvbfan . We experimented with keeping empty nodes truly empty -- allocating claim space for them as needed. This saved less than 10MB of RAM and complicated the code a bit; we ditched it. We also experimented with a similar approach where empty node hashes were kept in a separate map. This also was too complicated to get right.
lbrynaut (Migrated from github.com) reviewed 2019-05-11 04:51:30 +02:00
tiger5226 (Migrated from github.com) reviewed 2019-05-11 05:27:07 +02:00
tiger5226 (Migrated from github.com) commented 2019-05-11 05:27:07 +02:00

@lbrynaut Is this meant to be here?

@lbrynaut Is this meant to be here?
BrannonKing (Migrated from github.com) reviewed 2019-05-11 06:18:54 +02:00
BrannonKing (Migrated from github.com) commented 2019-05-11 06:18:53 +02:00

I'm glad you brought this up. I'm the one that changed the default and checked it in. I found it frustrating during our last hard fork that none of the log files had IP addresses in them. I suppose this is there because some people may be uncomfortable exposing their internal IP addresses. I think that concern is generally a thing of the past, though, and they aren't forced to send me their logs. I'm certainly open to arguments the other way. I think we should enable IP logging on all instances that we manage.

I'm glad you brought this up. I'm the one that changed the default and checked it in. I found it frustrating during our last hard fork that none of the log files had IP addresses in them. I suppose this is there because some people may be uncomfortable exposing their internal IP addresses. I think that concern is generally a thing of the past, though, and they aren't forced to send me their logs. I'm certainly open to arguments the other way. I think we should enable IP logging on all instances that we manage.
BrannonKing (Migrated from github.com) reviewed 2019-05-11 06:31:24 +02:00
@ -0,0 +305,4 @@
}
template <typename TKey, typename TData>
template <typename TDataUni>
BrannonKing (Migrated from github.com) commented 2019-05-11 06:24:13 +02:00

TDataUni allows us to have both lvalue and rvalue data inputs on this method. The alternative is to have two overloads of this method.

TDataUni allows us to have both lvalue and rvalue data inputs on this method. The alternative is to have two overloads of this method.
@ -0,0 +333,4 @@
auto name = it.key();
name.insert(name.end(), key.begin(), key.end());
auto& node = insert(key, shared);
copy = iterator{name, node};
BrannonKing (Migrated from github.com) commented 2019-05-11 06:26:11 +02:00

Not a true forward iterator -- that would require the full stack. This is a children-only iterator. It's true of all the places we create the iterator here. We are specifically choosing to not initialize them fully, and we're using the weak_ptr workaround for the place where we need the parent function but don't have it.

Not a true forward iterator -- that would require the full stack. This is a children-only iterator. It's true of all the places we create the iterator here. We are specifically choosing to not initialize them fully, and we're using the weak_ptr workaround for the place where we need the parent function but don't have it.
@ -0,0 +477,4 @@
using iterator = Trie::iterator;
using const_iterator = Trie::const_iterator;
template class CPrefixTrie<Key, Data>;
BrannonKing (Migrated from github.com) commented 2019-05-11 06:22:09 +02:00

This and the lines below it are a c++ nuance. You have to instantiate a specific type of a template or you need the full definition in the header file. As a native C# programmer, I prefer the latter. @bvbfan prefers the former (as it is here).

This and the lines below it are a c++ nuance. You have to instantiate a specific type of a template or you need the full definition in the header file. As a native C# programmer, I prefer the latter. @bvbfan prefers the former (as it is here).
tiger5226 (Migrated from github.com) reviewed 2019-05-11 17:33:49 +02:00
tiger5226 (Migrated from github.com) commented 2019-05-11 05:32:13 +02:00

without data

`without` data
tiger5226 (Migrated from github.com) commented 2019-05-11 05:30:15 +02:00

adding true everywhere doesn't help me understand what is being passed in here. Shouldn't we just assign this to a variable like the others?

adding true everywhere doesn't help me understand what is being passed in here. Shouldn't we just assign this to a variable like the others?
@ -36,3 +36,3 @@
//! Max memory allocated to block tree DB specific cache, if no -txindex (MiB)
static const int64_t nMaxBlockDBCache = 2;
static const int64_t nMaxBlockDBCache = 8;
//! Max memory allocated to block tree DB specific cache, if -txindex (MiB)
tiger5226 (Migrated from github.com) commented 2019-05-11 05:58:36 +02:00

Going to 8GB max is an important note for the release that contains this. Should be in the change log. Also I am interested in the mindbcache purpose.

Going to 8GB max is an important note for the release that contains this. Should be in the change log. Also I am interested in the mindbcache purpose.
tiger5226 (Migrated from github.com) commented 2019-05-11 05:54:27 +02:00

should this be higher for lbrycrd? blocks are just files on a disk at risk of malware manipulation. Bitcoin is 6 blocks but also a lot harder based on computational power requirements at 6. Maybe we should have something much more than 24 as a default? Just a thought since our hash power is minimal in comparison.

should this be higher for lbrycrd? blocks are just files on a disk at risk of malware manipulation. Bitcoin is 6 blocks but also a lot harder based on computational power requirements at 6. Maybe we should have something much more than 24 as a default? Just a thought since our hash power is minimal in comparison.
tiger5226 (Migrated from github.com) reviewed 2019-05-11 17:36:50 +02:00
tiger5226 (Migrated from github.com) commented 2019-05-11 17:36:50 +02:00

Agreed. If we find it useful we should enable it, but maybe disable it by default for privacy.

Agreed. If we find it useful we should enable it, but maybe disable it by default for privacy.
tiger5226 (Migrated from github.com) reviewed 2019-05-11 17:41:02 +02:00
tiger5226 (Migrated from github.com) commented 2019-05-11 17:41:02 +02:00

this is probably a very low priority attack vector, but 6 blocks with lbrycrd is not a lot to overwrite with malware. Most exchanges have min confirmations at 200 for LBC. Validating the top 200 blocks is probably a good default for right now. Again the likelihood that enough people catch the LBC malware that does this is probably near zero...just a thought though after researching the default value.

this is probably a very low priority attack vector, but 6 blocks with lbrycrd is not a lot to overwrite with malware. Most exchanges have min confirmations at 200 for LBC. Validating the top 200 blocks is probably a good default for right now. Again the likelihood that enough people catch the LBC malware that does this is probably near zero...just a thought though after researching the default value.
BrannonKing (Migrated from github.com) reviewed 2019-05-11 23:18:56 +02:00
BrannonKing (Migrated from github.com) commented 2019-05-11 23:18:56 +02:00

claimtrie.h::insertClaimIntoTrie utilizes a default parameter value even though it's a virtual method. This is non-standard c++, and whether or not inheritors use that default is undefined. I had intended to remove it, but I apparently only made it part way down that road.

claimtrie.h::insertClaimIntoTrie utilizes a default parameter value even though it's a virtual method. This is non-standard c++, and whether or not inheritors use that default is undefined. I had intended to remove it, but I apparently only made it part way down that road.
bvbfan (Migrated from github.com) reviewed 2019-05-13 08:43:53 +02:00
@ -0,0 +477,4 @@
using iterator = Trie::iterator;
using const_iterator = Trie::const_iterator;
template class CPrefixTrie<Key, Data>;
bvbfan (Migrated from github.com) commented 2019-05-13 08:43:53 +02:00

In some project is used *.impl file ext (not *.cpp) and gets include where it's needed, i'm ok with that too. But all in header makes things to look redundant and hard to read.

In some project is used *.impl file ext (not *.cpp) and gets include where it's needed, i'm ok with that too. But all in header makes things to look redundant and hard to read.
bvbfan (Migrated from github.com) reviewed 2019-05-13 08:46:41 +02:00
bvbfan (Migrated from github.com) commented 2019-05-13 08:46:40 +02:00

The comment is old, it should be removed at all, we have data on all nodes so it's safe to be used range-based for loop.

The comment is old, it should be removed at all, we have data on all nodes so it's safe to be used range-based for loop.
bvbfan (Migrated from github.com) reviewed 2019-05-13 08:55:43 +02:00
@ -0,0 +333,4 @@
auto name = it.key();
name.insert(name.end(), key.begin(), key.end());
auto& node = insert(key, shared);
copy = iterator{name, node};
bvbfan (Migrated from github.com) commented 2019-05-13 08:55:43 +02:00

That's by design, iterator is forward through children, i mean this is not an exception. Thus we don't have ability to obtain parent, that's nodes stand for. We can go for full iterator but i'm afraid we will fill iterator stack for nothing i.e. we will not use it.

That's by design, iterator is forward through children, i mean this is not an exception. Thus we don't have ability to obtain parent, that's nodes stand for. We can go for *full* iterator but i'm afraid we will fill iterator stack for nothing i.e. we will not use it.
lbrynaut (Migrated from github.com) reviewed 2019-05-15 14:55:21 +02:00
lbrynaut (Migrated from github.com) commented 2019-05-15 14:55:21 +02:00

Possibly unrelated to this PR, but that's a good point. Bitcoin core did backtrack this from 288 to 6, but it's likely not appropriate for us. Reverting to 288 is probably more sane.

Possibly unrelated to this PR, but that's a good point. Bitcoin core did backtrack this from 288 to 6, but it's likely not appropriate for us. Reverting to 288 is probably more sane.
lbrynaut (Migrated from github.com) reviewed 2019-05-15 15:41:19 +02:00
lbrynaut (Migrated from github.com) commented 2019-05-15 15:41:19 +02:00

I'd leave it disabled as well.

I'd leave it disabled as well.
BrannonKing (Migrated from github.com) reviewed 2019-05-15 18:29:12 +02:00
BrannonKing (Migrated from github.com) commented 2019-05-15 18:21:33 +02:00

I still feel like this method fits better on the base class.

I still feel like this method fits better on the base class.
BrannonKing (Migrated from github.com) commented 2019-05-15 18:21:51 +02:00

This method also fits better on the base class.

This method also fits better on the base class.
BrannonKing (Migrated from github.com) commented 2019-05-15 18:25:57 +02:00

24 is an hour -- same as bitcoin. Why would we need more than that?

24 is an hour -- same as bitcoin. Why would we need more than that?
tiger5226 (Migrated from github.com) reviewed 2019-05-16 04:20:14 +02:00
tiger5226 (Migrated from github.com) commented 2019-05-16 04:20:14 +02:00

I don't think time is the intention here. The intention is to confirm enough blocks such that it is statistically painful to overwrite like with a big reorg for a double spend attack. 6 blocks of bitcoin is a standard for confirmation not meant to be a time equivalence. Admittedly, it is nice that it evenly breaks down to an hour, however arbitrary, but should not be the correlation between cryptocurrencies. However, I could be misinterpreting it.

I don't think time is the intention here. The intention is to confirm enough blocks such that it is statistically painful to overwrite like with a big reorg for a double spend attack. 6 blocks of bitcoin is a standard for confirmation not meant to be a time equivalence. Admittedly, it is nice that it evenly breaks down to an hour, however arbitrary, but should not be the correlation between cryptocurrencies. However, I could be misinterpreting it.
tiger5226 (Migrated from github.com) reviewed 2019-06-01 03:09:22 +02:00
@ -78,0 +125,4 @@
{
sValue = HexStr(vvchParams[2].begin(), vvchParams[2].end());
return true;
}
tiger5226 (Migrated from github.com) commented 2019-06-01 03:09:21 +02:00

not that I am well informed on whats going on here, but @BrannonKing wouldn't it be better to handle this sort of thing in the processing of the request rather than modifying so many lines of code? Seems like a DRY violation.

not that I am well informed on whats going on here, but @BrannonKing wouldn't it be better to handle this sort of thing in the processing of the request rather than modifying so many lines of code? Seems like a DRY violation.
bvbfan (Migrated from github.com) approved these changes 2019-06-03 18:26:33 +02:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: LBRYCommunity/lbrycrd#276
No description provided.