Fix a crash bug on node restart when blocks are disconnected #109
No reviewers
Labels
No labels
area: devops
area: discovery
area: docs
area: livestream
area: proposal
consider soon
Epic
good first issue
hacktoberfest
hard fork
help wanted
icebox
Invalid
level: 0
level: 1
level: 2
level: 3
level: 4
needs: exploration
needs: grooming
needs: priority
needs: repro
needs: tech design
on hold
priority: blocker
priority: high
priority: low
priority: medium
resilience
soft fork
Tom's Wishlist
type: bug
type: discussion
type: improvement
type: new feature
type: refactor
type: task
type: testing
unplanned
work in progress
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: LBRYCommunity/lbrycrd#109
Loading…
Reference in a new issue
No description provided.
Delete branch "iter-bug"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
While looking at related parts, I ran into this uninitialized iterator bug. The expiration queue is empty on restart, so disconnecting a block forces a crash since we're using an uninitialized iterator in that case.
Hmm.. this shouldn't happen I think and means that there's some deeper problem. If itQueueRow != expirationQueueCache.end() as in line 1675 that means that we can't find the expiration entry for a claim which should always exist. If not in cache, it should be able to look into disk and find it CClaimTrieCache::getExpirationQueueCacheRow() will call CClaimTrie::getExpirationQueueRow() which will look on disk for entry if needed)
How would I reproduce this error, would the below step work?
1)on regtest make a claim
2) increment block
2)Exit lbrycrdd
3)Decrement the block
Thanks for finding this.
hmm couldn't reproduce this using the steps above, (used this script: https://gist.github.com/kaykurokawa/bdecc79ac36b3a8af148fb43bdaefb01 )
I wonder if this is something related to expirations not working properly...
This was found/fixed using mainnet master (no modifications). It's possible it's related to the chain movement of the time, but I'm fairly certain that if I re-build master, it won't be too long before I can hit this again. If I do, I'll post the stack trace.
There was a time, every startup crashed with this, across multiple rebuilds and syncs. Today, it's not showing this behaviour. I suspect it's a real bug (and the code is definitely unchecked/unsafe) without the fix, but I'm ok kicking the can on this for now.
I think adding additional tests for claim expiration in https://github.com/lbryio/lbrycrd/pull/112 may reveal something, I will work on that.
I will develop a test for this to reproduce on the block height around the time lbrynaut ran into this problem.
ship it
Come to find out: this change was already checked in back in March 2018. I vote we close this PR. I think what we were seeing was just the affect of the way it's coded: it uses a remove-the-old-if-it-exists-before-inserting-the-new approach. I don't think that's a problem.
Yes, the change was probably applied in one of the other branches that I built on top of. It fixes an issue, but we're not sure it alone fixes the related issue documented here (see #134 also).
Pull request closed