lbrycrdd block check performance degrades non-linearly with txset #389
Reference: LBRYCommunity/lbrycrd#389
Background
I run the Luxor mining pool. Since ~5/21 we've seen a larger number of transactions on the network, which is a good thing, but we've also seen it significantly impact the performance of lbrycrdd.
Issue
Blocks should not take >1s to connect. If they do, it causes a huge problem for miners, especially with a block time as short as LBRY's.
Expectation
Performance in the same order of magnitude as BTC:
Bench output from lbrycrdd for block 771809
Bench output for a similar block from btc (height 632057)
The most concerning problem is the tx validation speed. You'll notice it's 50x slower on LBRY than BTC.
Further, it seems to get worse as more transactions are added. Notice LBRY block 771819 has only 55 txins, yet still processes at 0.493 ms/txin:
Verify 55 txins: 27.14ms (0.493ms/txin)
This leads me to believe the cost scales non-linearly with the number of transactions.
Reproducer
I can reliably reproduce this issue by running the lbrycrdd daemon with -debug=bench and observing how long block Connect takes.
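For example, something like the following (the default ~/.lbrycrd datadir is assumed here; adjust the paths for your setup):

./lbrycrdd -debug=bench
# in a second shell, watch the per-block bench lines as blocks connect
tail -f ~/.lbrycrd/debug.log | grep -E "Connect|Verify"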
Version:
LBRYcrd Core Daemon version v0.17.3.2-be118de
Machine
2015 Macbook Pro, 2.8 GHz Quad-Core Intel Core i7
16 GB 1600 MHz DDR3
Config
Default config.
@nitronick600, I can look into this. In the meantime, will you please run the same timing test with v0.17.4.5 and v0.19.1.2? You will have to reindex once when going to 0.17.4.5, but not when going to v0.19 from there. Run with -dbcache=1200 on those.
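For the record, roughly what that looks like (-reindex is only needed for the first start on 0.17.4.5):

# one-time reindex when first starting v0.17.4.5
./lbrycrdd -reindex -dbcache=1200 -debug=bench
# subsequent runs, and v0.19.1.2, can skip -reindex
./lbrycrdd -dbcache=1200 -debug=bench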
Alright, I'll run the reindex now for 17.4.5
If I recall correctly, we need to run v0.17.3.2 because of a consensus issue in the later versions?
This is also causing the getblocktemplate RPC to take multiple seconds to return a value (a quick way to time it from the CLI is sketched below).

The consensus issue in 17.4.x is repaired in 17.4.5. It is marked as a pre-release so that we can get some more internal miles on it and increase our confidence in it.
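For reference, one way to time the RPC directly; the binary name and the segwit rules argument follow upstream Bitcoin conventions, so adjust if your build differs:

time ./lbrycrd-cli getblocktemplate '{"rules": ["segwit"]}'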
"CreateNewBlock() packages: 1.56ms (457 packages, 19 updated descendants), validity: 6329.01ms (total 6330.57ms)"
This block generation time is pretty common now.
I want to clarify a few things:
computeNodeHash is the most expensive function, along with getMerkleHash, according to Valgrind cachegrind. Also, if we move the connected-transactions bench log before the merkle hash check, we can see the merkle step is 2 to 10x slower; the Verify timing is essentially just the merkle hash computation. That's on v17_master with the sqlite backend.
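For anyone wanting to reproduce that profile, a typical run looks something like this (a short run, e.g. against regtest or a limited block range, keeps the output manageable; the exact daemon flags are just an example):

valgrind --tool=cachegrind ./lbrycrdd -regtest -debug=bench
# then look for computeNodeHash / getMerkleHash among the hot functions
cg_annotate cachegrind.out.<pid>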
If I recall my earlier attempt to do this in one query with a step/final aggregate function correctly, the step function is called in an unordered way even though the query itself is ordered; it looks like it runs before the actual result ordering. That forces you to use a map/flat_map to keep the children ordered, and then either convert the map to a vector or write another computation function that takes a map. So I abandoned that approach.
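For context, below is roughly the shape of the workaround being described. It is an illustrative sketch only, not lbrycrd code; the names ordered_concat and Collector are made up. The step callback keys each child by its position so the final callback can emit them in order, regardless of the order in which SQLite feeds the rows in:

#include <sqlite3.h>
#include <map>
#include <string>

// Children are keyed by position so they come back ordered even though
// xStep may not see rows in the query's ORDER BY order.
struct Collector { std::map<int, std::string> children; };

static void step(sqlite3_context* ctx, int argc, sqlite3_value** argv) {
    auto* slot = static_cast<Collector**>(
        sqlite3_aggregate_context(ctx, sizeof(Collector*)));
    if (!slot) { sqlite3_result_error_nomem(ctx); return; }
    if (!*slot) *slot = new Collector();  // context is zero-initialized on first call
    const int pos = sqlite3_value_int(argv[0]);
    const auto* txt = sqlite3_value_text(argv[1]);
    (*slot)->children[pos] = txt ? reinterpret_cast<const char*>(txt) : "";
}

static void final(sqlite3_context* ctx) {
    auto* slot = static_cast<Collector**>(sqlite3_aggregate_context(ctx, 0));
    std::string out;
    if (slot && *slot) {
        for (const auto& kv : (*slot)->children)  // map iteration restores child order
            out += kv.second;
        delete *slot;
    }
    sqlite3_result_text(ctx, out.c_str(), static_cast<int>(out.size()), SQLITE_TRANSIENT);
}

// registration, then e.g. SELECT ordered_concat(pos, hash) FROM children GROUP BY parent;
// sqlite3_create_function(db, "ordered_concat", 2, SQLITE_UTF8, nullptr,
//                         nullptr, step, final);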
cachegrind.zip
On block 771091 we spend 5.5 seconds on the merkle hash.
I backported my query from the new hash PR, which has a children-present check; it saves some time, but it is still slow. We should improve the merkle hash query somehow.
debug.log
I made a branch with the backported query, merkle_improve; @BrannonKing, you can check it.

17.4.5 doesn't seem any better:
Will try 19.1.2 next
@nitronick600 If you can, compile the merkle_improve branch (clone it and follow the instructions in the readme) and try it.

19.1.2 isn't any better either:
As an aside, it took almost 15 minutes to start up.
Have we been able to make any progress on this?
We have made progress: #390 was merged to v17_master, which makes things faster. You can compile it yourself or wait for a release. If you use an HDD, you may be interested in increasing the db cache, e.g. -dbcache=8192.
@nitronick600 you can test the new release: https://github.com/lbryio/lbrycrd/releases/tag/v0.17.4.6
We're running the merkle_improve branch now, but I don't have enough info to say whether this fixes the problem. It does seem that blocks are much faster to verify; will get back to you on submitblock.
The 0.17.4.6 release has one more improvement, namely indexing by claim id instead of claim name, which gives another boost. So the release is faster than merkle_improve.
Will this require a reindex?
No, just use the new executable. Memory usage is lowered by default; if you prefer to use more memory (which will be faster), pass -dbcache=4096 as a command line argument.
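For example (the conf file path assumes the default datadir; adjust if yours differs):

# one-off, on the command line
./lbrycrdd -dbcache=4096
# or persistently, in ~/.lbrycrd/lbrycrd.conf
dbcache=4096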