Fix a crash bug on node restart when blocks are disconnected #109

Closed
lbrynaut wants to merge 247 commits from iter-bug into master
lbrynaut commented 2018-03-21 18:26:31 +01:00 (Migrated from github.com)

While looking at related parts, I ran into this uninitialized iterator bug. The expiration queue is empty on restart, so disconnecting a block forces a crash since we're using an uninitialized iterator in that case.

kaykurokawa (Migrated from github.com) reviewed 2018-03-21 18:26:31 +01:00
kaykurokawa commented 2018-03-26 21:09:10 +02:00 (Migrated from github.com)

Hmm.. this shouldn't happen, I think, and it means there's some deeper problem. If itQueueRow == expirationQueueCache.end() (the check around line 1675), that means we can't find the expiration entry for a claim, which should always exist. If it's not in the cache, the lookup should be able to find it on disk (CClaimTrieCache::getExpirationQueueCacheRow() will call CClaimTrie::getExpirationQueueRow(), which will look on disk for the entry if needed).

How would I reproduce this error? Would the steps below work?

1) On regtest, make a claim
2) Increment the block
3) Exit lbrycrdd
4) Decrement the block

Thanks for finding this.

kaykurokawa commented 2018-04-01 22:48:01 +02:00 (Migrated from github.com)

Hmm, couldn't reproduce this using the steps above (used this script: https://gist.github.com/kaykurokawa/bdecc79ac36b3a8af148fb43bdaefb01 ).

I wonder if this is something related to expirations not working properly...

lbrynaut commented 2018-04-02 14:23:00 +02:00 (Migrated from github.com)

This was found/fixed using mainnet master (no modifications). It's possible it's related to the chain movement of the time, but I'm fairly certain that if I re-build master, it won't be too long before I can hit this again. If I do, I'll post the stack trace.

lbrynaut commented 2018-04-02 15:12:00 +02:00 (Migrated from github.com)

There was a time when every startup crashed with this, across multiple rebuilds and syncs. Today, it's not showing this behaviour. I suspect it's a real bug (and without the fix, the code is definitely unchecked/unsafe), but I'm OK kicking the can on this for now.

kaykurokawa commented 2018-04-02 15:56:28 +02:00 (Migrated from github.com)

I think adding additional tests for claim expiration in https://github.com/lbryio/lbrycrd/pull/112 may reveal something, I will work on that.

kaykurokawa commented 2018-04-13 16:25:26 +02:00 (Migrated from github.com)

I will develop a test for this to reproduce on the block height around the time lbrynaut ran into this problem.

BrannonKing (Migrated from github.com) approved these changes 2018-07-18 15:23:53 +02:00
BrannonKing (Migrated from github.com) left a comment

ship it

BrannonKing commented 2018-07-26 19:39:06 +02:00 (Migrated from github.com)

Come to find out: this change was already checked in back in March 2018. I vote we close this PR. I think what we were seeing was just the effect of the way it's coded: it uses a remove-the-old-if-it-exists-before-inserting-the-new approach. I don't think that's a problem.

lbrynaut commented 2018-07-26 19:45:19 +02:00 (Migrated from github.com)

Yes, the change was probably applied in one of the other branches that I built on top of. It fixes an issue, but we're not sure it alone fixes the related issue documented here (see #134 also).


Pull request closed

Reference: LBRYCommunity/lbrycrd#109