Fix incorrect usage of the claim index #197
No reviewers
Labels
No labels
area: devops
area: discovery
area: docs
area: livestream
area: proposal
consider soon
Epic
good first issue
hacktoberfest
hard fork
help wanted
icebox
Invalid
level: 0
level: 1
level: 2
level: 3
level: 4
needs: exploration
needs: grooming
needs: priority
needs: repro
needs: tech design
on hold
priority: blocker
priority: high
priority: low
priority: medium
resilience
soft fork
Tom's Wishlist
type: bug
type: discussion
type: improvement
type: new feature
type: refactor
type: task
type: testing
unplanned
work in progress
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: LBRYCommunity/lbrycrd#197
Loading…
Reference in a new issue
No description provided.
Delete branch "claimindex-fix"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fix incorrect usage of the claim index so that we can properly maintain state through re-orgs.
Props to @shyba, who created #195 with some code that made it easy to track this down.
So the claimId passed in here isn't always utilized/enforced?
@ -1357,7 +1362,6 @@ bool CClaimTrieCache::insertClaimIntoTrie(const std::string& name, CClaimValue c
{
fChanged = true;
currentNode->insertClaim(claim);
Is this working around a problem in the base class? Why not fix it there?
Do "updated" claims have the same problem?
@ -1357,7 +1362,6 @@ bool CClaimTrieCache::insertClaimIntoTrie(const std::string& name, CClaimValue c
{
fChanged = true;
currentNode->insertClaim(claim);
No.
Not sure what you're asking.
No.
I was wondering why we have to verify the returned claimId (just below this) when we requested a specific one here.
We don't. I added logging in case of corruption (as in current code). That shouldn't ever happen.
You change behavior lake that,
will result in
Why it is needed, can explain with example, it looks like you fix it in not right direction, about me.
It's working fine so far. No claims missing and fallback yet never triggered. I will keep updating in the following days, but the way it looks now seems fixed. Thank you!
What I noted there in my comment of how this is applied is just that if a block is disconnected and then re-connected, deleting before adding back is correct. If you add before removing in that case, it's a bug.
@bvbfan What you pointed out is incorrect. In the case that a block is disconnected and then connected, the order is
delete
,add
. If a block is connected and then disconnected, the order is stilladd
,delete
. Anadd
in general never prompts adelete
, so thinking of them simply being reversed is not right. Also, the actual bug has nothing to do with ordering, that was purely an implementation detail on the fix.@lyoshenka, the previous claim index code is definitely causing corruption. No worry of a fork, just missing claims from the index (they are still retrievable in other, slower ways). Most users won't notice this, but we should consider a release advising a re-index on the sooner side.
I'll give @kaykurokawa through Monday to review and then I'm going to merge this. @shyba After that, please don't forget to PR the unit test update on top.
hmm I'm a bit confused here .. what is the issue that is being corrected here? Can you describe in pseudo-code or write a unit test for it?
I was looking at this too and I think there's some problem where we are deciding to add stuff to the claim index and removing from the claim index. Don't we need to add a claim below this line :
abd9e02c7c/src/claimtrie.cpp (L1371)
?This unit test below (can be added to claimtriebranching_tests.cpp) shows the problem caused by the above line where if we make two claims to the same name, we spend it on the next block, and than decrement that block, we only get one claim back on that name.
Also, its a bit weired because claims get added to the claim index when they are queued up (in addClaimToQueues), and also when they are inserted into the claim trie. So the claim gets added twice to the claim index, could that cause a problem?
@kaykurokawa i've update my branch with your unit test, it's not affected by this issue, because i'm not writing index direct to levelDB but enqueued it
2d94bd6a2f (diff-f8129adb47c1b30eac7d2795b17883edR517)
and update with empty index
2d94bd6a2f (diff-f8129adb47c1b30eac7d2795b17883edR523)
then check against it
2d94bd6a2f (diff-f8129adb47c1b30eac7d2795b17883edR529)
Tests pass
@shyba you can try it on https://github.com/lbryio/lbrycrd/pull/175 or to checkout branch separate_disk_claim_rebased
@lbrynaut @kaykurokawa let me know what you decide once you've synced on this. we can do a release as soon as you're ready
@kaykurokawa Thanks, I'll take a look at this tomorrow, although it's unrelated to the bug/issue being fixed.
Without a mechanism to only apply the index state changes on flush, we permanently lose any removed elements from the index on every disconnect/re-org (non-flushed).
I don't understand this. How is it being added twice? It shouldn't be, so maybe I've missed something. Also, there is no issue if that was the case, other than performance/wasted cycles.
@kaykurokawa Ah, I see what you mean and will also look at. I think it could be simplified, but don't think it's incorrect.
@kaykurokawa This test appears to pass for me using the changes here. I'll look into further.
@kaykurokawa I added the test as you provided to sanity check against travis, which also appears to pass. I'll probably back it out in case you want to modify or clean-it up a bit though. Am I missing something on this?
ah sorry, yea that's not the right test ignore that... I think I've figured out what's the problem here
So the issue here I think is that we need to make a proper logic for claim addition and removal.
That I believe is the root of the problem, reorg/disconnects should not cause issues as long as we handle this.
So right now there is logic for removing/adding things from the claim index in the function insertIntoClaimTrie() and removeClaimFromTrie().
However, there are cases for this when the logic fails. For removeClaimFromTrie(), you don't want to remove claims from the claim index when a claim is moved from the trie into the claim queue (where claims are kept which have not activated due to activation delay). This happens when undoing a claim activation in a block decrement. We are not handling this properly and I think this is the reason why we are losing claims (see unit test below to reproduce this).
For insertClaimIntoTrie(), you don't want to add claims to the claim index when claims are moved from the queue into the claim trie (because we already add claims to the claim index when they are put into the claim queue). This happens for all claims when they are activated (due to activation delay).
So basically, the function removeClaimFromTrie() and insertClaimIntoTrie() should not be making decisions about whether to add things to the claim index or not. Below, I tried to describe where we should add and remove from the claim index.
When calling ConnectBlock() in main.cpp
When calling DisconnectBlock() in main.cpp:
Below is the unit test which shows that this PR does not handle properly the case where a claim is merely deactivated in a block decrement but is removed from the claim index.
@kaykurokawa Awesome, thanks, will review a bit more today and comment on this further.
@kaykurokawa I agree with the usage above -- nice work! The original claim index usage was broken for more cases than I thought, which that test clearly illustrates via your points on where it fails to do the right thing. However, this still doesn't fully address the issue being fixed here because just following this logic loses claims as well, during say VerifyDB calls where it does 'memory only' disconnects which actually loses real claims (as opposed to a true re-org which does DisconnectTip which includes a flush after disconnect). Maybe my original wording was bad or unclear?
A combination of our two approaches solves it in testing here.
To summarize:
I think you have pointed out that the claim index was being used incorrectly, which we agreed on, though you caught more broken cases than me. 👍 This is more correct now ... but ...
This PR was trying to address incorrect usage, but mainly in order to fix losing claims (at least that was my initial motivation from speaking with @shyba). To do that, it only applied claim index changes on flush (regardless of the logic that decided those changes).
It's not obvious to find in our current test setup, because we always decrement via Invalidate, which calls DisconnectTip.
This PR is updated with the combination approach I'm proposing (as well as both test cases you've written). If you don't agree or think I've overlooked something else, please let me know.
This is empty claim, assert points to nothing in release build, i.e. removeClaimFromTrie will not get called.
@bvbfan Unless you've modified the code and build process, this isn't true.
abd9e02c7c/src/main.cpp (L51)
I don't see it. Makes me confusing now.
@lbrynaut but can you make insertations inside insertClaimIntoTrie and removeClaimFromTrie otherwise it looks confusing, don't you think?
Hrm, will look at the travis issue later. It might be running slowly, as it hasn't tried to build the latest rebased commit but reports it as failed.
Please read the above discussion.
That's a c++11 construct.
It looks like the claim parameter is both an in and an out parameter, but you're not passing a valid one in. It appears that way to me also because the internal removeClaim does a swap, i.e, it expects valid data to be there. (This refers to the remofveClaimFromTrie call.)
I don't use the c++11 compiler option. C++11 could be like this:
Good catch, I'll look into further. Odd that it works here, probably compiler version dependent.
Ah, so the reason this works is because pre-C++11 std::swap is copy-constructable and assignable. With C++11, it would use move semantics (move-constructible and move-assignable) and you would be correct to see what you're seeing.
So good catch, but no immediate fix necessary. Agreed?
Yes, I think it is safe. In looking at it again, the swap in
CClaimTrieNode::removeClaim
doesn't actually need to be there as the claim is erased right after the swap. We could just as easily doclaim = *itClaims;
and avoid the confusion.Right, that's exactly what it does in place of the swap. There are several cases that need updating, not just that one. I'll make sure to update them in the upstream merge if I haven't already.
The source of problem are activated claims, as @kaykurokawa pointed out, they bother API when it's performed Increment / Decrement block and it can lie to you, i mean:
IncrementBlock
DecrementBlock is on opposite, this looks easy to misread. Should we make API more suggestive?
Over time, probably. Re-writing the claimtrie interface is not related to this PR.
Untested but LGTM,
yea, API can def be improved let's keep this in mind or start new issue.
I'd like to just clean up the tests (improve comments), but I can do that in a separate PR later.
Will merge pending testing from @shyba
Tested locally by adding the commit that verifies integrity as a whole and did some tests like restarting, worked. Removed the helper commit, deployed. So far so good! LGTM
It looks like the no-NDEBUG requirement hasn't yet changed upstream either. I'm surprised, as using asserts to accomplish real work is generally considered an anti-pattern.