Separate disk related functions in CClaimTrieDb #175

Closed
kaykurokawa wants to merge 301 commits from separate_disk_claim_rebased into master
kaykurokawa commented 2018-07-19 19:09:42 +02:00 (Migrated from github.com)

This is a rebased and squashed version of @bvbfan 's CClaimTrieDb commits which separates disk related function from the claimtrie into its own class (from https://github.com/lbryio/lbrycrd/pull/160)

I removed here commits which renamed CClaimTrieCache and nCurrentHeight, as well as the new management of CClaimTrieNode, they will be put into seperate PR's.

This PR is not downgrade compatible (once you upgrade to this commit, you will not be able to downgrade to previous versions without wiping your claim database) and should be accompanied by a version bump.

This is a rebased and squashed version of @bvbfan 's CClaimTrieDb commits which separates disk related function from the claimtrie into its own class (from https://github.com/lbryio/lbrycrd/pull/160) I removed here commits which renamed CClaimTrieCache and nCurrentHeight, as well as the new management of CClaimTrieNode, they will be put into seperate PR's. This PR is not downgrade compatible (once you upgrade to this commit, you will not be able to downgrade to previous versions without wiping your claim database) and should be accompanied by a version bump.
BrannonKing (Migrated from github.com) reviewed 2018-07-19 19:32:07 +02:00
BrannonKing (Migrated from github.com) commented 2018-07-19 19:26:13 +02:00

Why not an unordered_map? (or map if ordering matters). It appears from the method below that we commonly need to find by key. I suppose changing to a dictionary structure is not the intent of this story.

Why not an unordered_map? (or map if ordering matters). It appears from the method below that we commonly need to find by key. I suppose changing to a dictionary structure is not the intent of this story.
BrannonKing (Migrated from github.com) approved these changes 2018-07-19 19:32:41 +02:00
bvbfan (Migrated from github.com) reviewed 2018-07-19 19:43:28 +02:00
bvbfan (Migrated from github.com) commented 2018-07-19 19:43:28 +02:00

unordered_map is a hash_map, so we should provide a hash algorithm for some classes, like base_blob derived.

unordered_map is a hash_map, so we should provide a hash algorithm for some classes, like base_blob derived.
lbrynaut (Migrated from github.com) requested changes 2018-07-26 20:54:05 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-26 20:04:55 +02:00

No, this is broken.

No, this is broken.
@ -248,2 +195,3 @@
return current->haveClaim(outPoint);
const CClaimTrieNode* current = getNodeForName(name);
return current && current->haveClaim(outPoint);
}
lbrynaut (Migrated from github.com) commented 2018-07-26 20:05:06 +02:00

If this style change (and others like it) isn't pulled in from clang-format, remove it.

If this style change (and others like it) isn't pulled in from clang-format, remove it.
lbrynaut (Migrated from github.com) commented 2018-07-26 20:05:09 +02:00

If this style change (and others like it) isn't pulled in from clang-format, remove it. (Please apply to other cases as well -- alternatively, just run clang-format and rebase and comment that that's why it's here.)

If this style change (and others like it) isn't pulled in from clang-format, remove it. (Please apply to other cases as well -- alternatively, just run clang-format and rebase and comment that that's why it's here.)
@ -437,4 +364,3 @@
return false;
}
claimsForNameType CClaimTrie::getClaimsForName(const std::string& name) const
lbrynaut (Migrated from github.com) commented 2018-07-26 20:06:29 +02:00

Adding this else isn't needed.

Adding this else isn't needed.
@ -1072,3 +621,4 @@
return db.Sync();
}
bool CClaimTrie::InsertFromDisk(const std::string& name, CClaimTrieNode* node)
lbrynaut (Migrated from github.com) commented 2018-07-26 20:14:46 +02:00

name.empty() ?

name.empty() ?
@ -1320,3 +855,1 @@
}
else
{
} else {
lbrynaut (Migrated from github.com) commented 2018-07-26 20:06:57 +02:00

Again, unless it's clang-format specified, don't add lines that aren't needed.

Again, unless it's clang-format specified, don't add lines that aren't needed.
@ -1359,3 +893,3 @@
{
if (cachedNode != cache.end()) {
currentNode = cachedNode->second;
continue;
lbrynaut (Migrated from github.com) commented 2018-07-26 20:07:46 +02:00

Suggested style: if (current && !current->claims.empty())

Suggested style: if (current && !current->claims.empty())
@ -2520,2 +1796,3 @@
{
hashMapType::const_iterator cachedHash = cacheHashes.find(leafPosition.str());
if (cachedHash != cacheHashes.end()) {
return cachedHash->second;
lbrynaut (Migrated from github.com) commented 2018-07-26 20:15:51 +02:00

Alternative style:

CClaimTrieNode emptyNode;
db.updateQueueRow(itcache->first, pNode ? *pNode : emptyNode);

I'll stop commenting on style, perhaps it's pointless, or maybe clang-format will help (although not on things like this, which are adding unnecessary lines).

Alternative style: ``` CClaimTrieNode emptyNode; db.updateQueueRow(itcache->first, pNode ? *pNode : emptyNode); ``` I'll stop commenting on style, perhaps it's pointless, or maybe clang-format will help (although not on things like this, which are adding unnecessary lines).
lbrynaut (Migrated from github.com) commented 2018-07-26 20:14:25 +02:00

Why are we removing const from all of the methods? Are we also removing mutable members?

Why are we removing const from all of the methods? Are we also removing mutable members?
lbrynaut (Migrated from github.com) commented 2018-07-26 20:27:53 +02:00

This answers my previous question about removing mutable members. Disregard.

This answers my previous question about removing mutable members. Disregard.
lbrynaut (Migrated from github.com) commented 2018-07-26 20:34:01 +02:00

I don't think the queues get to be that large, but perhaps someday it's worth measuring the performance. As it is, you gain cache effects due to locality (I think this wins), but with a well defined hash and unordered_map you gain constant time retrieval. This is not urgent for this PR and it's fine as-is. I'd also advise (style) to just use std::find.std::find_if with a comparator if/when it's supported by the compiler (certainly after upstream merges).

I don't think the queues get to be that large, but perhaps someday it's worth measuring the performance. As it is, you gain cache effects due to locality (I think this wins), but with a well defined hash and unordered_map you gain constant time retrieval. This is not urgent for this PR and it's fine as-is. I'd also advise (style) to just use std::find.std::find_if with a comparator if/when it's supported by the compiler (certainly after upstream merges).
lbrynaut (Migrated from github.com) commented 2018-07-26 20:47:26 +02:00

As a compromise (and simplicity), perhaps it's worth considering 2 step hashes such that the above hash == some_map[K][V]? I see why you need these now, but we're jumping through some hoops to get there, no? What I proposed is probably less space efficient, but I want to be sure we're doing this in a sustainable way going forward.

As a compromise (and simplicity), perhaps it's worth considering 2 step hashes such that the above hash == some_map[K][V]? I see why you need these now, but we're jumping through some hoops to get there, no? What I proposed is probably less space efficient, but I want to be sure we're doing this in a sustainable way going forward.
lbrynaut (Migrated from github.com) commented 2018-07-26 20:36:48 +02:00

Where did these definitions come from? It should be documented because 1) Not all hashes are equally appropriate, and 2) I'd hope we can replace most/all of these with boost:: or std:: hash implementations. I don't think supporting these in our code forever is ideal.

Where did these definitions come from? It should be documented because 1) Not all hashes are equally appropriate, and 2) I'd hope we can replace most/all of these with boost:: or std:: hash implementations. I don't think supporting these in our code forever is ideal.
lbrynaut (Migrated from github.com) commented 2018-07-26 20:40:53 +02:00

If we really want to do this, there are plenty of template specialization varieties that can be made that resort to std::hash which eliminates the need for many of the custom hash variants.

If we really want to do this, there are plenty of template specialization varieties that can be made that resort to std::hash which eliminates the need for many of the custom hash variants.
lbrynaut (Migrated from github.com) commented 2018-07-26 20:50:34 +02:00

I really wish we could avoid this kind of test, but if we go this route, at least it helps locate regressions.

I really wish we could avoid this kind of test, but if we go this route, at least it helps locate regressions.
lbrynaut (Migrated from github.com) reviewed 2018-07-26 20:56:13 +02:00
@ -248,2 +195,3 @@
return current->haveClaim(outPoint);
const CClaimTrieNode* current = getNodeForName(name);
return current && current->haveClaim(outPoint);
}
lbrynaut (Migrated from github.com) commented 2018-07-26 20:56:13 +02:00

Update: Hopefully this has already been clang-formatting because a large portion of it is just mechanical style changes, and it's distracting for such a large commit. I realize that as we're requiring clang-format now this should be less an issue in the future. If these were added manually, I suggest an immediate format/rebase.

Update: Hopefully this has already been clang-formatting because a large portion of it is just mechanical style changes, and it's distracting for such a large commit. I realize that as we're requiring clang-format now this should be less an issue in the future. If these were added manually, I suggest an immediate format/rebase.
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:27:11 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 06:27:11 +02:00

What is broken?

What is broken?
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:28:06 +02:00
@ -248,2 +195,3 @@
return current->haveClaim(outPoint);
const CClaimTrieNode* current = getNodeForName(name);
return current && current->haveClaim(outPoint);
}
bvbfan (Migrated from github.com) commented 2018-07-30 06:28:06 +02:00

The style is recommended bitcoin style which @kaykurokawa describe days ago.

The style is recommended bitcoin style which @kaykurokawa describe days ago.
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:28:29 +02:00
@ -248,2 +195,3 @@
return current->haveClaim(outPoint);
const CClaimTrieNode* current = getNodeForName(name);
return current && current->haveClaim(outPoint);
}
bvbfan (Migrated from github.com) commented 2018-07-30 06:28:29 +02:00

See above.

See above.
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:28:49 +02:00
@ -437,4 +364,3 @@
return false;
}
claimsForNameType CClaimTrie::getClaimsForName(const std::string& name) const
bvbfan (Migrated from github.com) commented 2018-07-30 06:28:48 +02:00

Same as above.

Same as above.
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:29:43 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 06:29:42 +02:00

Yes.

Yes.
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:34:44 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 06:34:44 +02:00

Get it from here -> http://lolengine.net/blog/2011/12/20/cpp-constant-string-hash it should be unique.

Get it from here -> http://lolengine.net/blog/2011/12/20/cpp-constant-string-hash it *should* be unique.
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:35:46 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 06:35:46 +02:00

Yes, it's helper macro. There is no need to write class names by hand.

Yes, it's helper macro. There is no need to write class names by hand.
bvbfan (Migrated from github.com) reviewed 2018-07-30 06:39:36 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 06:39:36 +02:00

The test aims to guarantee that all distinct type has an unique id, when new type is added unit test should be updated as well. The algorithm should produce same results on all different compilers and all its versions, which is not guaranteed nor by std::type_id nor by boost::type_id.

The test aims to *guarantee* that all distinct type has an unique id, when new type is added unit test should be updated as well. The algorithm should produce same results on all different compilers and all its versions, which is *not* guaranteed nor by std::type_id nor by boost::type_id.
bvbfan commented 2018-07-30 06:41:23 +02:00 (Migrated from github.com)

Is there some stopper to merge my changes on master? I plan to rebase my last changes for wed meet.

Is there some stopper to merge my changes on master? I plan to rebase my last changes for wed meet.
BrannonKing (Migrated from github.com) reviewed 2018-07-30 17:43:32 +02:00
BrannonKing (Migrated from github.com) commented 2018-07-30 17:43:32 +02:00

I agree that this is a bad pattern. This should technically cause linker problems due to double definition.

I agree that this is a bad pattern. This should technically cause linker problems due to double definition.
BrannonKing (Migrated from github.com) reviewed 2018-07-30 17:51:04 +02:00
BrannonKing (Migrated from github.com) commented 2018-07-30 17:51:04 +02:00

I like the change. Saying that a removal method is const is misleading. The only place I would expect to see the mutable keyword is in an immutable object for the caching of expensive-to-compute fields.

I like the change. Saying that a removal method is `const` is misleading. The only place I would expect to see the `mutable` keyword is in an immutable object for the caching of expensive-to-compute fields.
bvbfan (Migrated from github.com) reviewed 2018-07-30 17:53:37 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 17:53:37 +02:00

It looks like but is not. Template has a external linking and it has few hacks to prevent problems:

  1. Write template in class body - this is ugly and is not a proper C++ fix
  2. Write a 3th file only with template implementation then include only. This adds an inconsistency between .h and .cpp with added .impl
  3. Mark normal functions as inline do the trick and it's the natural way to keep consistency of .h and .cpp files.
It looks like but is not. Template has a external linking and it has few *hacks* to prevent problems: 1. Write template in class body - this is ugly and is not a proper C++ fix 2. Write a 3th file only with template implementation then include only. This adds an inconsistency between .h and .cpp with added .impl 3. Mark normal functions as inline *do* the trick and it's the natural way to keep consistency of .h and .cpp files.
bvbfan (Migrated from github.com) reviewed 2018-07-30 18:28:15 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 18:28:14 +02:00

I made the file safety included in all conditions, that's needed when it has a templates in.

I made the file *safety* included in all conditions, that's needed when it has a templates in.
lbrynaut (Migrated from github.com) reviewed 2018-07-30 19:09:08 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-30 19:09:07 +02:00

Including a cpp file from another means incorrect/bad organization. That is what I meant by broken.

Including a cpp file from another means incorrect/bad organization. That is what I meant by broken.
lbrynaut (Migrated from github.com) reviewed 2018-07-30 19:09:45 +02:00
@ -248,2 +195,3 @@
return current->haveClaim(outPoint);
const CClaimTrieNode* current = getNodeForName(name);
return current && current->haveClaim(outPoint);
}
lbrynaut (Migrated from github.com) commented 2018-07-30 19:09:44 +02:00

This is produced from clang-format?

This is produced from clang-format?
lbrynaut commented 2018-07-30 19:12:11 +02:00 (Migrated from github.com)

Is there some stopper to merge my changes on master? I plan to rebase my last changes for wed meet.

This is under review. Feel free to rebase, but it won't be merged (i.e. don't merge it) until review is complete.

> Is there some stopper to merge my changes on master? I plan to rebase my last changes for wed meet. This is under review. Feel free to rebase, but it won't be merged (i.e. don't merge it) until review is complete.
bvbfan (Migrated from github.com) reviewed 2018-07-30 19:14:31 +02:00
bvbfan (Migrated from github.com) commented 2018-07-30 19:14:28 +02:00

So this is no correct, i don't want to opposite you but describe some pattern as incorrect by default is not serious.

So this is no correct, i don't want to opposite you but describe some pattern as *incorrect* by default is not serious.
lbrynaut (Migrated from github.com) reviewed 2018-07-30 19:15:52 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-30 19:15:50 +02:00

This link didn't work for me. It's also not really documenting why we want to use these hash values in our code base indefinitely, as opposed to re-organizing or sticking with existing boost/std ones (or some composition of).

This link didn't work for me. It's also not really documenting why we want to use these hash values in our code base indefinitely, as opposed to re-organizing or sticking with existing boost/std ones (or some composition of).
bvbfan (Migrated from github.com) reviewed 2018-07-30 19:16:48 +02:00
bvbfan (Migrated from github.com) left a comment

So it was accepted month ago, this is only separate review original was #140

So it was accepted month ago, this is only separate review original was #140
lbrynaut (Migrated from github.com) reviewed 2018-07-30 19:24:55 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-30 19:24:55 +02:00

This is not a strong argument for why we want/need to support our own hashing here. This seems like a case of NIH, which should be avoided (https://en.wikipedia.org/wiki/Not_invented_here#In_computing). I don't have an understanding of why we must go this route without considering other options. What should be an organizational PR (as the name states separating methods out), looks to be far larger, including a breaking database re-organization which is something else entirely.

@bvbfan If you've had discussions about this work that I was not a part of (surely you have), I'd like to chat with whoever had them with you :-) @kaykurokawa ? If it's just a matter of me not being caught-up, I'll adjust the review accordingly.

This is not a strong argument for why we want/need to support our own hashing here. This seems like a case of NIH, which should be avoided (https://en.wikipedia.org/wiki/Not_invented_here#In_computing). I don't have an understanding of why we *must* go this route without considering other options. What should be an organizational PR (as the name states separating methods out), looks to be far larger, including a breaking database re-organization which is something else entirely. @bvbfan If you've had discussions about this work that I was not a part of (surely you have), I'd like to chat with whoever had them with you :-) @kaykurokawa ? If it's just a matter of me not being caught-up, I'll adjust the review accordingly.
lbrynaut commented 2018-07-30 19:28:14 +02:00 (Migrated from github.com)

So it was accepted month ago, this is only separate review original was #140

I hadn't reviewed it then. Being a team that is responsible for maintaining this code might mean multiple rounds of reviews. If @kaykurokawa is going to merge it as-is, that's up to him, but my concerns are worth noting here anyway.

> So it was accepted month ago, this is only separate review original was #140 I hadn't reviewed it then. Being a team that is responsible for maintaining this code might mean multiple rounds of reviews. If @kaykurokawa is going to merge it as-is, that's up to him, but my concerns are worth noting here anyway.
bvbfan commented 2018-07-30 19:36:32 +02:00 (Migrated from github.com)

There is no problem to be reviewed but let be conceptional rather than a silly comments. I mean when you know better way to do it, describe it how to do.

There is no problem to be reviewed but let be conceptional rather than a silly comments. I mean when you know better way to do it, describe it how to do.
lbrynaut commented 2018-07-30 19:43:54 +02:00 (Migrated from github.com)

There is no problem to be reviewed but let be conceptional rather than a silly comments. I mean when you know better way to do it, describe it how to do.

What silly comment(s)? I'm mostly asking questions on or pointing out things I don't know, or don't look right to me. As you said, if this has already been reviewed despite my concerns, it'll be merged anyway.

I'm only reviewing it because I care about this code and wonder about future maintenance. This work was not assigned to me, so I don't know a better way to do all of it (and haven't thought about it as much as you and/or @kaykurokawa). If you've exhausted all alternative implementations and arrived here, it'll be merged and this is all disregarded anyway. Since I don't know, I've added my 2 cents. I don't know if that provides any value though. 2 "lbrynaut tokens" worth of input may not be worth much :-)

> There is no problem to be reviewed but let be conceptional rather than a silly comments. I mean when you know better way to do it, describe it how to do. What silly comment(s)? I'm mostly asking questions on or pointing out things I don't know, or don't look right to me. As you said, if this has already been reviewed despite my concerns, it'll be merged anyway. I'm only reviewing it because I care about this code and wonder about future maintenance. This work was not assigned to me, so I don't know a better way to do all of it (and haven't thought about it as much as you and/or @kaykurokawa). If you've exhausted all alternative implementations and arrived here, it'll be merged and this is all disregarded anyway. Since I don't know, I've added my 2 cents. I don't know if that provides any value though. 2 "lbrynaut tokens" worth of input may not be worth much :-)
bvbfan commented 2018-07-30 19:46:40 +02:00 (Migrated from github.com)

Only for clarification -> "Including a cpp file from another means incorrect/bad organization. That is what I meant by broken." is a silly comment. This not broke anything and it's not forbidden or anything else. Templates want it, adding an *impl files is not better about me, if you know better describe it.

Only for clarification -> "Including a cpp file from another means incorrect/bad organization. That is what I meant by broken." is a silly comment. This not broke anything and it's not forbidden or anything else. Templates want it, adding an *impl files is not better about me, if you know better describe it.
lbrynaut commented 2018-07-30 19:52:17 +02:00 (Migrated from github.com)

Only for clarification -> "Including a cpp file from another means incorrect/bad organization. That is what I meant by broken." is a silly comment. This not broke anything and it's not forbidden or anything else.

Of course it's not forbidden, but that doesn't mean that it's common or required either. In fact, it would be a first for this entire code base, which is why it stands out.

> Only for clarification -> "Including a cpp file from another means incorrect/bad organization. That is what I meant by broken." is a silly comment. This not broke anything and it's not forbidden or anything else. Of course it's not forbidden, but that doesn't mean that it's common or required either. In fact, it would be a first for this entire code base, which is why it stands out.
bvbfan commented 2018-07-30 19:58:13 +02:00 (Migrated from github.com)

There is not problem to be implemented in other way, but i'm against pattern to described to be bad by default. About me, only better pattern can make other to be marked as bad.

There is not problem to be implemented in other way, but i'm against pattern to described to be bad by default. About me, only better pattern can make other to be marked as bad.
lbrynaut commented 2018-07-30 20:09:08 +02:00 (Migrated from github.com)

There is not problem to be implemented in other way, but i'm against pattern to described to be bad by default. About me, only better pattern can make other to be marked as bad.

Ok. I don't agree though. Most times I've seen cpp files included from others was due to bad organization and/or laziness. Similar to "goto" statements. There's nothing wrong with them, and often you can achieve a lot using them. But I think they are also bad organizationally. What I mean by organization in this sense affects readability and comprehension. There are many ways to implement things that hide other issues (such as linker problems, or basic error checking), but that doesn't mean they're all 'good' to use, even if allowed. In most cases, a slight re-organization can improve readability/flow, which also helps comprehension, which leads to better maintenance, etc.

We may not agree on style, and I'm ok with that. But since we're both working on this code and this is not a common use for the project, I hesitate to introduce it just because we can. My opinion on introducing this is "it's broken", meaning it absolutely should not be introduced. But like I said, my token input may be worthless ;-)

> There is not problem to be implemented in other way, but i'm against pattern to described to be bad by default. About me, only better pattern can make other to be marked as bad. Ok. I don't agree though. Most times I've seen cpp files included from others was due to bad organization and/or laziness. Similar to "goto" statements. There's nothing wrong with them, and often you can achieve a lot using them. But I think they are also bad organizationally. What I mean by organization in this sense affects readability and comprehension. There are many ways to implement things that hide other issues (such as linker problems, or basic error checking), but that doesn't mean they're all 'good' to use, even if allowed. In most cases, a slight re-organization can improve readability/flow, which also helps comprehension, which leads to better maintenance, etc. We may not agree on style, and I'm ok with that. But since we're both working on this code and this is not a common use for the project, I hesitate to introduce it just because we can. My opinion on introducing this is "it's broken", meaning it absolutely should not be introduced. But like I said, my token input may be worthless ;-)
bvbfan commented 2018-07-30 20:30:38 +02:00 (Migrated from github.com)

As i say before this is not a regular cpp file, it's like a template implementation file, since i'm not prefer to separate cpp and impl files i've made a trick to combine them.

As i say before this is not a *regular* cpp file, it's like a template implementation file, since i'm not prefer to separate cpp and impl files i've made a *trick* to combine them.
BrannonKing (Migrated from github.com) reviewed 2018-08-03 16:41:40 +02:00
BrannonKing (Migrated from github.com) commented 2018-08-03 16:41:40 +02:00

@bvbfan , can you explain why these hash methods weren't necessary before but are now?

@bvbfan , can you explain why these hash methods weren't necessary before but are now?
bvbfan (Migrated from github.com) reviewed 2018-08-03 17:06:03 +02:00
bvbfan (Migrated from github.com) commented 2018-08-03 17:06:03 +02:00

Sure. Whole idea behind these hashes is to use as safe as possible db. Till now they a predefined chars 'n', 'r' etc. which was used in claimtrie.cpp now we separated it and calls can be type safe e.g. you can only declare type and don't care what is predefined char, hash or anything else. The first idea was to use boost::type_id but for sake of not generate same hashes in all its versions (e.g. in 1.59 and 1.64 are not same) i aim to provide a proven algorithm to generate same hashes in all compilers and OS'es. That's why unit test matters to prove any type that we need has a unique value.

Sure. Whole idea behind these hashes is to use as safe as possible db. Till now they a predefined chars 'n', 'r' etc. which was used in claimtrie.cpp now we separated it and calls can be type safe e.g. you can only declare type and don't care what is predefined char, hash or anything else. The first idea was to use boost::type_id but for sake of not generate same hashes in all its versions (e.g. in 1.59 and 1.64 are not same) i aim to provide a proven algorithm to generate same hashes in all compilers and OS'es. That's why unit test matters to prove any type that we need has a unique value.
bvbfan (Migrated from github.com) reviewed 2018-08-03 17:16:24 +02:00
bvbfan (Migrated from github.com) commented 2018-08-03 17:16:23 +02:00

Not only this, generic functions prevent code duplication and it does not need to update db implementation every time when new type is introduced. I've notice many code duplication in claimtrie i aim to remove it at all, this kind of implementation do it.

Not only this, generic functions prevent code duplication and it does not need to update db implementation every time when new type is introduced. I've notice many code duplication in claimtrie i aim to remove it at all, this kind of implementation do it.
kaykurokawa commented 2018-08-04 01:40:13 +02:00 (Migrated from github.com)

I think the linker problem where we have to include "claimtriedb.cpp" in claimtrie.cpp is just an unfortunate problem when working with template classes in C++. In one of my past project we put implementations in the header file to get around this. I believe you can also use forward declaration (not sure if that is the right term , see last solution here: http://www.cs.technion.ac.il/users/yechiel/c++-faq/separate-template-class-defn-from-decl.html ) but that looks like the worst solution in this case due to the number of functions and possible types.

So I think in this case, the solution chosen here seems fine given that the other two solutions are equally bad.

The point regarding the new hashing features @lbrynaut brought up is good. Maybe it was a bad idea to change the original simple hashing function (where each type was assigned a unique letter) to this one , considering that we could have avoided a database migration and the code would have been simpler. This is not @bvbvfan 's fault , I said that it would be OK to do this but in hindsight this was not the best idea (I can take responsibility and fix this ).

I think the linker problem where we have to include "claimtriedb.cpp" in claimtrie.cpp is just an unfortunate problem when working with template classes in C++. In one of my past project we put implementations in the header file to get around this. I believe you can also use forward declaration (not sure if that is the right term , see last solution here: http://www.cs.technion.ac.il/users/yechiel/c++-faq/separate-template-class-defn-from-decl.html ) but that looks like the worst solution in this case due to the number of functions and possible types. So I think in this case, the solution chosen here seems fine given that the other two solutions are equally bad. The point regarding the new hashing features @lbrynaut brought up is good. Maybe it was a bad idea to change the original simple hashing function (where each type was assigned a unique letter) to this one , considering that we could have avoided a database migration and the code would have been simpler. This is not @bvbvfan 's fault , I said that it would be OK to do this but in hindsight this was not the best idea (I can take responsibility and fix this ).
bvbfan commented 2018-08-04 10:43:34 +02:00 (Migrated from github.com)

I can propose a fix that will not introduce migration nor cpp include. It can be done by template implicit instantiation of every function like this:
template<> inline char hashType<std::string, CClaimTrieNode>() { return 'n'; } template CClaimTrieDb::getQueueRow<>(const std::string &key, CClaimTrieNode &row); // explicit instantiation
This can be done in claimtrie.db so it will not needed to be include anywhere, when we add new type new explicit instantiation should be provided. I can do that on master and make a new RR if it accept by all of us.

I can propose a fix that will not introduce migration nor cpp include. It can be done by template implicit instantiation of every function like this: `template<> inline char hashType<std::string, CClaimTrieNode>() { return 'n'; } template CClaimTrieDb::getQueueRow<>(const std::string &key, CClaimTrieNode &row); // explicit instantiation ` This can be done in claimtrie.db so it will not needed to be include anywhere, when we add new type new explicit instantiation should be provided. I can do that on master and make a new RR if it accept by all of us.
kaykurokawa commented 2018-08-06 11:44:15 +02:00 (Migrated from github.com)

I think the above new solution looks good to me

I think the above new solution looks good to me
lbrynaut commented 2018-08-06 15:45:23 +02:00 (Migrated from github.com)

So I think in this case, the solution chosen here seems fine given that the other two solutions are equally bad.

I'm aware of the issue, but if we really have to go this route, I think it should be made clear in a more standard way. For example, under the include dir having an impl directory with the implementation defined in .ipp files makes it clear what's going on at least. It's true it's more or less the same thing, but projects like boost chose to do it that way, and a lot of projects have followed with good reason (a de facto standard).

That said, I still don't like the idea of needing these methods in the first place and would prefer to see the solution that does not require the new types at all nor the DB migration. @kaykurokawa The one way migration makes me nervous because usually migrations are done for underlying reasons other than a migration for the sake of it. Second, if the migration simplified the code or allowed us to eliminate unnecessary code, I'd be all on board -- but instead this is the opposite, taking on responsibility for more untested code that adds complexity. If you see the need to go this route, let me know what's on your mind (I still think I'm missing some background on this). Otherwise, I want to avoid all unnecessary code to avoid the future maintenance costs.

> So I think in this case, the solution chosen here seems fine given that the other two solutions are equally bad. I'm aware of the issue, but if we really have to go this route, I think it should be made clear in a more standard way. For example, under the include dir having an impl directory with the implementation defined in .ipp files makes it clear what's going on at least. It's true it's more or less the same thing, but projects like boost chose to do it that way, and a lot of projects have followed with good reason (a de facto standard). That said, I still don't like the idea of needing these methods in the first place and would prefer to see the solution that does not require the new types at all nor the DB migration. @kaykurokawa The one way migration makes me nervous because usually migrations are done for underlying reasons other than a migration for the sake of it. Second, if the migration simplified the code or allowed us to eliminate unnecessary code, I'd be all on board -- but instead this is the opposite, taking on responsibility for more untested code that adds complexity. If you see the need to go this route, let me know what's on your mind (I still think I'm missing some background on this). Otherwise, I want to avoid all unnecessary code to avoid the future maintenance costs.
lbrynaut commented 2018-08-06 16:00:49 +02:00 (Migrated from github.com)

I can propose a fix that will not introduce migration nor cpp include. It can be done by template implicit instantiation of every function like this:

I'd like to review this solution as a PR.

> I can propose a fix that will not introduce migration nor cpp include. It can be done by template implicit instantiation of every function like this: I'd like to review this solution as a PR.
bvbfan commented 2018-08-06 16:00:49 +02:00 (Migrated from github.com)

There is no migrations or cpp/ipp include, see better. The worst case to me is separate directory with ipp files in.

There is no migrations or cpp/ipp include, see better. The worst case to me is separate directory with ipp files in.
bvbfan commented 2018-08-06 17:31:45 +02:00 (Migrated from github.com)

I push it in this branch, now it's able for review.

I push it in this branch, now it's able for review.
bvbfan commented 2018-08-07 06:20:38 +02:00 (Migrated from github.com)

I've perform some style fixes, more to come when CClaimCache is renamed and separated in its own file, in feature RR.

I've perform some style fixes, more to come when CClaimCache is renamed and separated in its own file, in feature RR.
lbrynaut commented 2018-08-07 19:43:40 +02:00 (Migrated from github.com)

@bvbfan I discussed this briefly with @kaykurokawa and here is where I'm at with it:

  1. This is hard to review because of all of the unnecessary style changes (as mentioned earlier, if this comes from clang-format, save formatting for a separate commit/PR, otherwise, this would benefit from reverting those).
  2. I think this an over-engineered solution for the issue at hand, and I'm of the opinion that this is the kind of project that requires minimalistic changes in preference to large sweeping ones to accomplish a particular goal. I'm assuming that you and @kaykurokawa had agreed on some scope of this work, so while it may exactly address what he had in mind, it's not for me to decide.
  3. I see no urgency in merging this for any reason (functionally, etc). While a re-organization would be nice, I don't see this as positively better than existing. There's some benefit to breaking up the code into separate files, and some benefit to templatizing some common functionality, but as-is, this PR doesn't strike me as code we need or want to merge right now.
  4. Instead of going around @kaykurokawa and re-defining the task or asking you to start over and simplify the scope, I'm noting that right now I'm focused on the normalization fork and the upstream code changes, so I'm punting on this and will revisit later if @kaykurokawa requests it -- otherwise, it's in his hands at the moment.

If anyone else thinks this work is urgently needed and @kaykurokawa is unable to add the required input, let me know and I'll re-prioritize.

@bvbfan I discussed this briefly with @kaykurokawa and here is where I'm at with it: 1) This is hard to review because of all of the unnecessary style changes (as mentioned earlier, if this comes from clang-format, save formatting for a separate commit/PR, otherwise, this would benefit from reverting those). 2) I think this an over-engineered solution for the issue at hand, and I'm of the opinion that this is the kind of project that requires minimalistic changes in preference to large sweeping ones to accomplish a particular goal. I'm assuming that you and @kaykurokawa had agreed on some scope of this work, so while it may exactly address what he had in mind, it's not for me to decide. 3) I see no urgency in merging this for any reason (functionally, etc). While a re-organization would be nice, I don't see this as positively better than existing. There's some benefit to breaking up the code into separate files, and some benefit to templatizing some common functionality, but as-is, this PR doesn't strike me as code we need or want to merge right now. 4) Instead of going around @kaykurokawa and re-defining the task or asking you to start over and simplify the scope, I'm noting that right now I'm focused on the normalization fork and the upstream code changes, so I'm punting on this and will revisit later if @kaykurokawa requests it -- otherwise, it's in his hands at the moment. If anyone else thinks this work is urgently needed and @kaykurokawa is unable to add the required input, let me know and I'll re-prioritize.
bvbfan commented 2018-08-07 20:12:01 +02:00 (Migrated from github.com)

For me, existing code base is redundant and it contains duplicate code fragments, RR-s, not only this, solve issues, later ones memory leaks. So after all it's not an over-engineering at all, furthermore it simplifies all logic.

For me, existing code base is redundant and it contains duplicate code fragments, RR-s, not only this, solve issues, later ones memory leaks. So after all it's not an over-engineering at all, furthermore it simplifies all logic.
bvbfan commented 2018-08-08 08:43:06 +02:00 (Migrated from github.com)

Rebase to master, @kaykurokawa if you have a comments or advice please share them otherwise it ready to go in.

Rebase to master, @kaykurokawa if you have a comments or advice please share them otherwise it ready to go in.
kaykurokawa commented 2018-08-20 20:00:40 +02:00 (Migrated from github.com)

Hey, this seems to have some problem syncing, seems to be a new problem that was not present in previous iterations of this PR.

After CClaimTrie::WriteToDisk() is called to write buffers to disk ..

this problem occurs:

2018-08-18 22:25:59 incrementBlock: nCurrentHeight (before increment): 33913
2018-08-18 22:25:59 ERROR: ConnectBlock() : the merkle root of the claim trie does not match (actual=0000000000000000000000000000000000000000000000000000000000000001 vs block=c4827743b4a73ca1b35275805f0564b1fe6d294ea316199173d9039f41fa3f40)
2018-08-18 22:25:59 Misbehaving: 103.100.44.62:9246 (0 -> 100) BAN THRESHOLD EXCEEDE

It seems like the merkle root of the claim trie gets calculated the same as an empty trie after CClaimTrie::WriteToDisk() is called. I'm investigating the problem.

Hey, this seems to have some problem syncing, seems to be a new problem that was not present in previous iterations of this PR. After CClaimTrie::WriteToDisk() is called to write buffers to disk .. this problem occurs: 2018-08-18 22:25:59 incrementBlock: nCurrentHeight (before increment): 33913 2018-08-18 22:25:59 ERROR: ConnectBlock() : the merkle root of the claim trie does not match (actual=0000000000000000000000000000000000000000000000000000000000000001 vs block=c4827743b4a73ca1b35275805f0564b1fe6d294ea316199173d9039f41fa3f40) 2018-08-18 22:25:59 Misbehaving: 103.100.44.62:9246 (0 -> 100) BAN THRESHOLD EXCEEDE It seems like the merkle root of the claim trie gets calculated the same as an empty trie after CClaimTrie::WriteToDisk() is called. I'm investigating the problem.
bvbfan commented 2018-08-20 21:36:38 +02:00 (Migrated from github.com)

You tests with some mock data, unit tests pass when i test them, maybe we should update them.

You tests with some mock data, unit tests pass when i test them, maybe we should update them.
bvbfan commented 2018-08-21 12:45:29 +02:00 (Migrated from github.com)

Can you provide steps to reproduce?

Can you provide steps to reproduce?
kaykurokawa commented 2018-08-22 17:04:49 +02:00 (Migrated from github.com)

I wiped the data directory and started lbrycrdd from scratch.

I wiped the data directory and started lbrycrdd from scratch.
bvbfan commented 2018-08-22 17:18:37 +02:00 (Migrated from github.com)

So i found what you mean i think, so
call WriteToDisk()
then not call ReadFromDisk
lead to not finding claims
I'll investigate if mean that.

So i found what you mean i think, so call WriteToDisk() then not call ReadFromDisk lead to not finding claims I'll investigate if mean that.
bvbfan commented 2018-08-22 20:36:41 +02:00 (Migrated from github.com)

I found the problem

diff --git a/src/claimtrie.cpp b/src/claimtrie.cpp
index e6d7250f..305455c2 100644
--- a/src/claimtrie.cpp
+++ b/src/claimtrie.cpp
@@ -630,7 +630,8 @@ bool CClaimTrie::WriteToDisk()
uint32_t num_claims = pNode ? pNode->claims.size() : 0;
LogPrintf("%s: Writing %s to disk with %d claims\n", __func__, itcache->first, num_claims);
if (pNode) {
-            db.updateQueueRow(itcache->first, *pNode);
+            CClaimTrieNode copy(*pNode);
+            db.updateQueueRow(itcache->first, copy);
} else {
CClaimTrieNode emptyNode;
db.updateQueueRow(itcache->first, emptyNode);

void CClaimTrieDb::updateQueueRow makes a swap on IN parameter (i don't want to change this behavior but it's not used on node) witch cause root to became empty, so with patch above issue is fixed and ready to land. Branch is updated with related patch.

I found the problem ``` diff --git a/src/claimtrie.cpp b/src/claimtrie.cpp index e6d7250f..305455c2 100644 --- a/src/claimtrie.cpp +++ b/src/claimtrie.cpp @@ -630,7 +630,8 @@ bool CClaimTrie::WriteToDisk() uint32_t num_claims = pNode ? pNode->claims.size() : 0; LogPrintf("%s: Writing %s to disk with %d claims\n", __func__, itcache->first, num_claims); if (pNode) { - db.updateQueueRow(itcache->first, *pNode); + CClaimTrieNode copy(*pNode); + db.updateQueueRow(itcache->first, copy); } else { CClaimTrieNode emptyNode; db.updateQueueRow(itcache->first, emptyNode); ``` void CClaimTrieDb::updateQueueRow makes a swap on IN parameter (i don't want to change this behavior but it's not used on node) witch cause root to became empty, so with patch above issue is fixed and ready to land. Branch is updated with related patch.
BrannonKing (Migrated from github.com) reviewed 2018-08-28 01:31:39 +02:00
@ -2705,15 +1848,12 @@ bool CClaimTrieCache::getProofForName(const std::string& name, CClaimTrieProof&
nodes.push_back(node);
current = nextCurrent;
}
BrannonKing (Migrated from github.com) commented 2018-08-28 01:18:06 +02:00

Why do we need to clone this? That looks like a good way to make a memory leak.

Why do we need to clone this? That looks like a good way to make a memory leak.
BrannonKing (Migrated from github.com) commented 2018-08-28 01:22:18 +02:00

I think this method needs to return a read-only collection. Are people going to add to modify that data thinking that they're updating the actual queue? In looking at the code, it appears that we do return the real buffer, but that's only one of three cases. The other two: we read something off disk or we use that createIfNotExists flag.

I think this method needs to return a read-only collection. Are people going to add to modify that data thinking that they're updating the actual queue? In looking at the code, it appears that we do return the real buffer, but that's only one of three cases. The other two: we read something off disk or we use that `createIfNotExists` flag.
BrannonKing (Migrated from github.com) commented 2018-08-28 01:27:27 +02:00

What does this line do if getQueueCacheRow actually called Read?

What does this line do if getQueueCacheRow actually called Read?
BrannonKing (Migrated from github.com) commented 2018-08-28 01:31:12 +02:00

Any way we can do these updates in a for loop instead? We have all the buffers in a collection, no?

Any way we can do these updates in a `for` loop instead? We have all the buffers in a collection, no?
BrannonKing (Migrated from github.com) commented 2018-08-28 01:03:18 +02:00

What if this inherited from std::map instead? You wouldn't need the find and insert methods below. You wouldn't need the dataType stuff. It seems like that would be less code.

What if this inherited from std::map instead? You wouldn't need the find and insert methods below. You wouldn't need the dataType stuff. It seems like that would be less code.
@ -0,0 +126,4 @@
itData = map->push_back(std::make_pair(key, V()));
}
std::swap(itData->second, row);
}
BrannonKing (Migrated from github.com) commented 2018-08-28 01:05:21 +02:00

If you were using a real map here, the above four lines would turn into a single map->at(key) = row;. That seems more straightforward to me. Are you intentionally trying to return the old row here? If so, it should use the method return. The way that's coded is going to lead to unintended consequences.

If you were using a real map here, the above four lines would turn into a single `map->at(key) = row;`. That seems more straightforward to me. Are you intentionally trying to return the old row here? If so, it should use the method return. The way that's coded is going to lead to unintended consequences.
bvbfan (Migrated from github.com) reviewed 2018-08-28 07:09:25 +02:00
bvbfan (Migrated from github.com) commented 2018-08-28 07:09:25 +02:00

Map is ordered i.e. it'll reorder every insert, list benefits speed over it O(log N) vs O(N)

Map is ordered i.e. it'll reorder every insert, list benefits speed over it O(log N) vs O(N)
bvbfan (Migrated from github.com) reviewed 2018-08-28 07:10:31 +02:00
@ -0,0 +126,4 @@
itData = map->push_back(std::make_pair(key, V()));
}
std::swap(itData->second, row);
}
bvbfan (Migrated from github.com) commented 2018-08-28 07:10:31 +02:00

Yeah, but i don't think it's worst, no?

Yeah, but i don't think it's worst, no?
bvbfan (Migrated from github.com) reviewed 2018-08-28 07:11:38 +02:00
@ -2705,15 +1848,12 @@ bool CClaimTrieCache::getProofForName(const std::string& name, CClaimTrieProof&
nodes.push_back(node);
current = nextCurrent;
}
bvbfan (Migrated from github.com) commented 2018-08-28 07:11:38 +02:00

It should, i'll provide memleak free RR next to this.

It should, i'll provide memleak free RR next to this.
bvbfan (Migrated from github.com) reviewed 2018-08-28 07:13:52 +02:00
bvbfan (Migrated from github.com) commented 2018-08-28 07:13:51 +02:00

It private one, so it safe to return iterator over second argument, which is map.

It private one, so it safe to return iterator over second argument, which is map.
bvbfan (Migrated from github.com) reviewed 2018-08-28 07:17:21 +02:00
bvbfan (Migrated from github.com) commented 2018-08-28 07:17:21 +02:00

CClaimTrieCache::update has a for loop.

CClaimTrieCache::update has a for loop.
bvbfan (Migrated from github.com) reviewed 2018-08-28 07:22:02 +02:00
bvbfan (Migrated from github.com) commented 2018-08-28 07:22:01 +02:00

Yeah, but when it's found it added to map then it's erased from cache map.

Yeah, but when it's found it added to map then it's erased from cache map.
BrannonKing (Migrated from github.com) reviewed 2018-08-28 15:15:33 +02:00
BrannonKing (Migrated from github.com) commented 2018-08-28 15:15:33 +02:00

Your current use of a list is O(1) insertion time; it's the lookup time that suffers. Once we get the upstream c++11, we could use an unordered_list, which is definitely faster. However, the number of buffers we have is small; none of these solutions are going to be a performance problem. Hence, I prefer the "less code" option.

Your current use of a list is O(1) insertion time; it's the lookup time that suffers. Once we get the upstream c++11, we could use an unordered_list, which is definitely faster. However, the number of buffers we have is small; none of these solutions are going to be a performance problem. Hence, I prefer the "less code" option.
BrannonKing (Migrated from github.com) reviewed 2018-08-28 15:16:14 +02:00
@ -0,0 +126,4 @@
itData = map->push_back(std::make_pair(key, V()));
}
std::swap(itData->second, row);
}
BrannonKing (Migrated from github.com) commented 2018-08-28 15:16:14 +02:00

I do think it's pretty bad; it's what caused the problem that Kay found, right?

I do think it's pretty bad; it's what caused the problem that Kay found, right?
BrannonKing (Migrated from github.com) reviewed 2018-08-28 15:17:57 +02:00
BrannonKing (Migrated from github.com) commented 2018-08-28 15:17:57 +02:00

CDBWrapper::Read does not automatically add to the cache as far as I can tell.

CDBWrapper::Read does not automatically add to the cache as far as I can tell.
bvbfan (Migrated from github.com) reviewed 2018-08-28 15:29:06 +02:00
@ -0,0 +126,4 @@
itData = map->push_back(std::make_pair(key, V()));
}
std::swap(itData->second, row);
}
bvbfan (Migrated from github.com) commented 2018-08-28 15:29:06 +02:00

I don't have good reason for swap, but follows current behavior.

I don't have good reason for swap, but follows current behavior.
bvbfan (Migrated from github.com) reviewed 2018-08-28 15:32:34 +02:00
bvbfan (Migrated from github.com) commented 2018-08-28 15:32:34 +02:00

getQueueCacheRow put it to cache, line:1079, this is safe, it ease from cache.

getQueueCacheRow put it to cache, line:1079, this is safe, it ease from cache.
bvbfan commented 2018-08-29 07:21:24 +02:00 (Migrated from github.com)

Update unit test from @kaykurokawa and adjust @lbrynaut fix in #197 + perform clang-format on affected files.

Update unit test from @kaykurokawa and adjust @lbrynaut fix in #197 + perform clang-format on affected files.
bvbfan commented 2018-08-31 12:36:53 +02:00 (Migrated from github.com)

Fix clang issue.

Fix clang issue.
bvbfan commented 2018-09-04 16:18:38 +02:00 (Migrated from github.com)

Rebase to resolve conflicts.

Rebase to resolve conflicts.
BrannonKing commented 2018-09-14 16:18:13 +02:00 (Migrated from github.com)

@lbrynaut has requested that we hold on this until his upstream merge changes come in.

@lbrynaut has requested that we hold on this until his upstream merge changes come in.

Pull request closed

Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: LBRYCommunity/lbrycrd#175
No description provided.