Slow responses and out of memory issues #277

Closed
opened 2019-05-19 17:07:57 +02:00 by cod3gen · 11 comments
cod3gen commented 2019-05-19 17:07:57 +02:00 (Migrated from github.com)

I updated to the latest master of lbrycrd a couple of days ago but have not been able to make it stable. The daemon shuts down after a couple of RPC calls to getclaimtrie and/or getblocktemplate, and all calls are generally very slow to execute, especially getclaimtrie, which takes 60000-80000 ms.

The daemon shuts down for no apparent reason. Every now and then I see Error: Out of memory. Terminating. as the last entry in the log; sometimes nothing in the logs indicates that the daemon has shut down.

2019-05-19T13:16:38Z addSupportToQueues: nValidAtHeight: 570138
2019-05-19T13:16:38Z UpdateTip: new best=d8cf51c89206d1ff70066c4b05074ac2174486bd0c65a0362a796aa8f91ad5ce height=570138 version=0x20000000 log2_work=72.799238 tx=4053388 date='2019-05-19T13:15:24Z' progress=0.989165 cache=120.5MiB(550771txo)
2019-05-19T13:16:38Z [default wallet] AddToWallet 922459c27b7a3cd0ab02a7fd6c943a873ab03b020154fb4368409f0d1f4fd485  update
2019-05-19T13:16:42Z Error: Out of memory. Terminating.

Sometimes the getblocktemplate response is error -1: createnewblock: testblockvalidity failed: bad-claim-merkle-hash (code 16), which is usually gone within a couple of seconds.

build: LBRYcrd Core version v0.17.1.0-55f5f2049 (release build)
os: Ubuntu 18.04.1 LTS
Available RAM is above 40 GB; server load is less than 1.

2019-05-19T13:20:29Z LBRYcrd Core version v0.17.1.0-55f5f2049 (release build)
2019-05-19T13:20:29Z InitParameterInteraction: parameter interaction: -whitelistforcerelay=1 -> setting -whitelistrelay=1
2019-05-19T13:20:29Z Assuming ancestors of block a6bbb48f5343eb9b0287c22f3ea8b29f36cf10794a37f8a925a894d6f4519913 have valid signatures.
2019-05-19T13:20:29Z Setting nMinimumChainWork=000000000000000000000000000000000000000000000000607ca7e806c4c1e9
2019-05-19T13:20:29Z Using the 'sse4(1way),sse41(4way),avx2(8way)' SHA256 implementation
2019-05-19T13:20:29Z Using RdRand as an additional entropy source
2019-05-19T13:20:29Z Default data directory /home/lbry/.lbrycrd
2019-05-19T13:20:29Z Using data directory /home/lbry/.lbrycrd
2019-05-19T13:20:29Z Using config file /home/lbry/.lbrycrd/lbrycrd.conf
2019-05-19T13:20:29Z Using at most 125 automatic connections (1024 file descriptors available)
2019-05-19T13:20:29Z Using 16 MiB out of 32/2 requested for signature cache, able to store 524288 elements
2019-05-19T13:20:29Z Using 16 MiB out of 32/2 requested for script execution cache, able to store 524288 elements
2019-05-19T13:20:29Z Using 12 threads for script verification
2019-05-19T13:20:29Z scheduler thread start
2019-05-19T13:20:29Z Binding RPC on address 0.0.0.0 port 23301 failed.
2019-05-19T13:20:29Z HTTP: creating work queue of depth 16
2019-05-19T13:20:29Z Config options rpcuser and rpcpassword will soon be deprecated. Locally-run instances may remove rpcuser to use cookie-based auth, or may be replaced with rpcauth. Please see share/rpcauth for rpcauth auth generation.
2019-05-19T13:20:29Z HTTP: starting 12 worker threads
2019-05-19T13:20:29Z Using wallet directory /home/lbry/.lbrycrd
2019-05-19T13:20:29Z init message: Verifying wallet(s)...
2019-05-19T13:20:29Z Using BerkeleyDB version Berkeley DB 4.8.30: (April  9, 2010)
2019-05-19T13:20:29Z Using wallet wallet.dat
2019-05-19T13:20:29Z BerkeleyEnvironment::Open: LogDir=/home/lbry/.lbrycrd/database ErrorFile=/home/lbry/.lbrycrd/db.log

For now I have switched back to release v0.12.4.0, which is stable.

tzarebczan commented 2019-05-19 17:13:03 +02:00 (Migrated from github.com)

Thanks for opening this issue! Can we show you some appreciation for your time? We'll take a look into this.

We recently merged lots of Bitcoin upstream changes and I'm guessing there are still a few kinks to work out. I'll give it a shot sometime this week also. Would you mind backing up your current data (besides the wallet) and syncing from scratch to see if that makes a difference? It would be a good test.

bvbfan commented 2019-05-19 17:25:36 +02:00 (Migrated from github.com)

We are aware of these issues; hopefully we will soon merge our work from the past couple of months. https://github.com/lbryio/lbrycrd/pull/276

BrannonKing commented 2019-05-20 18:46:36 +02:00 (Migrated from github.com)

@cod3gen, is getclaimtrie a method that you use often? We had planned to remove it in our next software release. It won't scale. The current RPC framework in lbrycrd (and its Bitcoin base) does not allow streaming output; it requires the entire JSON object in RAM.
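To illustrate the difference between holding the whole JSON object in RAM and streaming it, here is a minimal Python sketch. It is not lbrycrd code: `fake_claims` and its field names are made up for illustration, and the real RPC framework is C++.

```python
import io
import json


def fake_claims(n):
    # Hypothetical stand-in for walking the claim trie; field names are made up.
    for i in range(n):
        yield {"name": f"claim{i}", "height": i}


def dump_whole(claims):
    """Materialize every claim, then build the full JSON text in memory."""
    return json.dumps(list(claims))


def dump_streaming(claims, out):
    """Write a JSON array one element at a time to a file-like object.

    Only one claim is serialized at any moment, so peak memory does not
    grow with the number of claims (as long as `out` is not an in-memory buffer).
    """
    out.write("[")
    for i, claim in enumerate(claims):
        if i:
            out.write(", ")
        out.write(json.dumps(claim))
    out.write("]")


# A StringIO is used here only so the two outputs can be compared.
buf = io.StringIO()
dump_streaming(fake_claims(3), buf)
streamed = buf.getvalue()
whole = dump_whole(fake_claims(3))
```

Both functions produce the same text; the streaming variant only helps when the output goes straight to a socket or file rather than into a buffer.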

cod3gen commented 2019-05-20 19:58:00 +02:00 (Migrated from github.com)

Since you say you were going to remove it, I had to check what it is actually used for. I operate the powermining.pw mining pool, based on yiimp stratum. If claimtrie is missing from getblocktemplate, it will try to retrieve the claim hash from getclaimtrie, which in turn seems to dump all claims on the chain? No wonder it's taking ages to process. It's really not an optimal fallback and honestly should have just been removed. But for getclaimtrie to be requested at all, "claimtrie" must be missing from getblocktemplate. I did not check this before I reverted to the older version; I will check on a backup server to see if there are any changes there.
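The fallback described above can be sketched roughly like this in Python. This is not the yiimp source: `fetch_claimtrie` is a hypothetical callable standing in for the getclaimtrie RPC, and the assumed reply shape (a list of nodes whose first entry is the root and carries a "hash" field) is an assumption, not something confirmed in this thread.

```python
def claimtrie_hash(template, fetch_claimtrie):
    """Return (claim trie hash, used_fallback) for a decoded block template.

    template        -- dict decoded from a getblocktemplate reply
    fetch_claimtrie -- hypothetical callable wrapping the expensive
                       getclaimtrie RPC; only called when needed
    """
    root = template.get("claimtrie")
    if root:
        # Cheap path: the template already carries the hash.
        return root, False
    # Expensive path: dump the whole trie just to read the root hash.
    # Assumed reply shape: a list of nodes, root first, each with "hash".
    nodes = fetch_claimtrie()
    return nodes[0]["hash"], True


calls = []


def slow_fetch():
    # Records that the expensive call happened; the hash is a placeholder.
    calls.append("getclaimtrie")
    return [{"name": "", "hash": "00" * 32}]


with_field = claimtrie_hash({"claimtrie": "ab" * 32}, slow_fetch)
without_field = claimtrie_hash({"height": 570138}, slow_fetch)
```

With the field present the expensive call is never made, which is why the missing "claimtrie" entry is what triggers the slow path.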

BrannonKing commented 2019-05-20 20:07:02 +02:00 (Migrated from github.com)

Excellent explanation; you've pointed out that the claimtrie field is missing in the getblocktemplate return in the current master. That is a definite oversight on our part. I'll get that rectified shortly.

cod3gen commented 2019-05-20 20:55:56 +02:00 (Migrated from github.com)

Yeah, I started the latest version and claimtrie is missing from the block template, so once that's in place again it will most likely work as normal for pool operations. Besides getblocktemplate, what other commands would reveal the claimtrie hex?

I also tested the out-of-memory problem a bit more. After 1-3 runs of either getclaimtrie or getclaimsintrie, the daemon crashes complaining that it is out of memory. While I monitored RAM usage, the daemon consumed above 28 GB of physical memory before it crashed (at the same time lbrycrd-cli consumed up to 10 GB of physical memory), which adds up to close to 40 GB, which was what I had left. Turning on a larger swap space to compensate makes the daemon "survive", but the client ends with "error: couldn't parse reply from server".
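For anyone who wants to reproduce this kind of measurement on Linux, one way is to read VmRSS from /proc/<pid>/status while the RPC call runs. The Python sketch below is only one possible approach, not necessarily how the numbers above were taken; the sample text and its value are made up.

```python
def rss_kib(status_text):
    """Return the VmRSS value (in kB as reported by the kernel) or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:\t 29360128 kB"
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return rss_kib(f.read())


# Made-up sample of the relevant part of /proc/<pid>/status.
sample = "Name:\tlbrycrdd\nVmPeak:\t31457280 kB\nVmRSS:\t29360128 kB\n"
sample_rss = rss_kib(sample)
```

Calling `read_rss_kib` in a loop for both the daemon and the lbrycrd-cli process would show how the two grow together during a getclaimtrie call.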

cod3gen commented 2019-05-20 20:59:08 +02:00 (Migrated from github.com)

Will rebuild latest master once a fix has been applied :-)

BrannonKing commented 2019-05-20 21:03:07 +02:00 (Migrated from github.com)

I don't think there is any other way to get the value you are looking for. getblockheader includes a field named nameclaimroot that has the value you are looking for on mined data, but not on pre-mined data (which I assume is what you need the block template for).
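As a rough sketch of reading that field over JSON-RPC, the Python below builds a getblockheader request body and pulls nameclaimroot out of a reply. The block hash is the one from the UpdateTip log line earlier in this thread; the reply text and its nameclaimroot value are made up, and sending the request (URL, port, credentials) is left out.

```python
import json


def rpc_request(method, params, request_id=1):
    """Build a JSON-RPC request body in the style used by bitcoin-based daemons."""
    return json.dumps(
        {"jsonrpc": "1.0", "id": request_id, "method": method, "params": params}
    )


def name_claim_root(reply_text):
    """Extract nameclaimroot from a getblockheader reply, raising on RPC errors."""
    reply = json.loads(reply_text)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]["nameclaimroot"]


block_hash = "d8cf51c89206d1ff70066c4b05074ac2174486bd0c65a0362a796aa8f91ad5ce"
body = rpc_request("getblockheader", [block_hash])

# Made-up reply for illustration; a real header has many more fields.
fake_reply = json.dumps(
    {"result": {"hash": block_hash, "nameclaimroot": "cd" * 32}, "error": None, "id": 1}
)
root = name_claim_root(fake_reply)
```

This only works for blocks that already exist, which matches the point above: for a block still being assembled, the value has to come from the block template.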

BrannonKing commented 2019-05-20 21:06:18 +02:00 (Migrated from github.com)

I also want to confirm that @bvbfan and I were seeing the same behavior, where each call to getclaimsintrie or getclaimtrie leaks the entire return value in RAM.

tzarebczan commented 2019-05-22 03:32:15 +02:00 (Migrated from github.com)

Hey @cod3gen, thanks for pointing out this issue! Can we show you some appreciation?

BrannonKing commented 2019-05-28 23:50:51 +02:00 (Migrated from github.com)

@cod3gen , the current master now includes the correct data in the getblocktemplate method. You may test it. Be careful, though. We are planning a few more major changes and a lot of testing on it over the next few weeks as we prepare for the next software release. It's not well-tested at the moment. Treat it with alpha status. Don't put any serious money in it. Feel free to report any further issues that you see with it.

Reference: LBRYCommunity/lbrycrd#277