made two scripts to bring in many claims to regtest #198

Closed
BrannonKing wants to merge 294 commits from add_claim_import_scripts into master
BrannonKing commented 2018-08-27 23:48:32 +02:00 (Migrated from github.com)

Usage:

Scenario 1:

  1. With mainnet server running: ./lbrycrd-cli getclaimsintrie > intrie.txt
  2. With regtest server running: ../contrib/devtools/import_claims_from_claimsintrie_output.py ./lbrycrd/src/lbrycrd-cli < intrie.txt

Scenario 2:

  1. With regtest server running: ./import_claims_from_name_per_line.py ../../src/lbrycrd-cli < /usr/share/dict/american-english
lbrynaut (Migrated from github.com) reviewed 2018-08-27 23:48:32 +02:00
BrannonKing commented 2018-08-27 23:54:39 +02:00 (Migrated from github.com)

The proposed approach is not a quick copy. I spent quite a bit of time tracking through the performance issues associated with this but could see no obvious wins. See:

![screenshot from 2018-08-27 13-58-46](https://user-images.githubusercontent.com/1509322/44688351-3aa72a80-aa11-11e8-903b-ab8443b7a99b.png)

bvbfan commented 2018-08-28 07:01:36 +02:00 (Migrated from github.com)

To me, it looks like the slowdown functions are (based on your screenshot):
CCryptoKeyStore::HaveKey
isMine
CWallet::AvailableCoins
CWallet::CreateTransaction

bvbfan commented 2018-09-21 14:43:10 +02:00 (Migrated from github.com)

The problem is that in CWallet::AvailableCoins we can have potentially O(N*M) calls to IsMine, which calls into CBasicKeyStore::HaveKey, where we hit a recursive mutex. That slows look-ups down badly; furthermore, a recursive mutex is even slower than a normal one. We also call HaveKey in CWallet::GetKeyFromPool -> CWallet::ReserveKeyFromKeyPool. Since we don't already own the mutex, it is acquired and released on every call. One possible solution is to take cs_KeyStore earlier, before the first loop in AvailableCoins, with some kind of LOCK3 (which does not exist). With C++11, atomics would be a great improvement for plain variables, but still not for containers. We could use boost::shared_mutex for a multiple-readers / single-writer pattern.

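A minimal sketch of the multiple-readers / single-writer idea with boost::shared_mutex, assuming a simplified stand-in key store; the class, members, and HaveKey signature below are illustrative placeholders, not the actual CBasicKeyStore code.

```cpp
// Sketch only: a stand-in key store guarded by a shared (readers/writer) mutex.
// The real CBasicKeyStore uses a recursive mutex (cs_KeyStore); this illustrates
// the proposed alternative, not the existing lbrycrd implementation.
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>
#include <set>
#include <string>

class SketchKeyStore
{
    mutable boost::shared_mutex m_mutex;  // many concurrent readers, one writer
    std::set<std::string> m_keys;         // placeholder for the real key map

public:
    bool HaveKey(const std::string& id) const
    {
        boost::shared_lock<boost::shared_mutex> lock(m_mutex);  // shared: readers don't block each other
        return m_keys.count(id) > 0;
    }

    void AddKey(const std::string& id)
    {
        boost::unique_lock<boost::shared_mutex> lock(m_mutex);  // exclusive: a writer blocks everyone
        m_keys.insert(id);
    }
};
```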
BrannonKing commented 2018-09-21 16:37:30 +02:00 (Migrated from github.com)

The LOCK2 is just two LOCK calls; it's okay to lock a third right after. If we call LOCK on a recursive mutex that is already owned, is that faster?

What can we do to reduce the HaveKey time? Can we cache the results (per LOCK of cs_main)? Is it getting called multiple times with the same input?

bvbfan commented 2018-09-21 18:49:27 +02:00 (Migrated from github.com)

> The LOCK2 is just two LOCK calls;

But it shouldn't be; the idea behind it is to lock more than one mutex atomically, to avoid deadlock:
https://en.cppreference.com/w/cpp/thread/lock

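For reference, a small sketch of what acquiring two mutexes atomically looks like with std::lock (as in the cppreference link above); the mutex names are illustrative stand-ins, not the wallet's actual cs_main/cs_wallet.

```cpp
// Sketch only: taking two mutexes atomically, which is what a LOCK2-style helper
// should do instead of two independent lock() calls. Mutex names are illustrative.
#include <mutex>

std::mutex mutexA;  // stand-ins for e.g. cs_main / cs_wallet
std::mutex mutexB;

void WorkUnderBothLocks()
{
    std::unique_lock<std::mutex> lockA(mutexA, std::defer_lock);
    std::unique_lock<std::mutex> lockB(mutexB, std::defer_lock);
    // std::lock uses a deadlock-avoidance algorithm to acquire both,
    // no matter what order other threads take them in.
    std::lock(lockA, lockB);

    // ... do work; both locks release when the guards go out of scope.
}
```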
> it's okay to lock a third right after.

You can try it.

> If we call LOCK on a recursive mutex that is already owned, is that faster?

Sure; the slow part is acquiring it in the first place.

> What can we do to reduce the HaveKey time? Can we cache the results (per LOCK of cs_main)? Is it getting called multiple times with the same input?

Yes, you can minimize the calls to HaveKey by building a map<key, result> in AvailableCoins before the first loop; it can even be a local variable not guarded by any mutex.

If you want help with the implementation, I can give it a try.

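A hedged sketch of that local-map idea: cache each key's ownership answer once per AvailableCoins pass so the IsMine -> HaveKey chain runs at most once per distinct key. ScriptId, CheckIsMine, and AvailableCoinsSketch below are placeholders, not the wallet's real types or APIs.

```cpp
// Sketch only: memoize per-script ownership answers inside one AvailableCoins pass.
// ScriptId and CheckIsMine are placeholders for the real script identifier and the
// IsMine -> HaveKey path; the cache is a local variable, so it needs no mutex.
#include <map>
#include <string>
#include <vector>

using ScriptId = std::string;            // placeholder for a script/key identifier

bool CheckIsMine(const ScriptId& id)     // placeholder for the real IsMine -> HaveKey path
{
    return !id.empty();                  // dummy answer; the real check queries the key store
}

void AvailableCoinsSketch(const std::vector<ScriptId>& outputs)
{
    std::map<ScriptId, bool> isMineCache;  // local and unguarded: lives only for this call

    for (const auto& id : outputs) {
        auto it = isMineCache.find(id);
        if (it == isMineCache.end())
            it = isMineCache.emplace(id, CheckIsMine(id)).first;  // only lookup that hits HaveKey

        if (it->second) {
            // ... treat this output as spendable
        }
    }
}
```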
BrannonKing commented 2018-09-27 17:52:27 +02:00 (Migrated from github.com)

I don't think this approach is going to be sufficient; it's just too slow. Last night I exported 350k claims from mainnet, broke the file into quarters, and ran import scripts on all four in parallel. Ten hours later we had 100k blocks and about 100k claims imported. However, the import rate had slowed to about 200/minute, which means the remaining 250k claims need roughly another 21 hours, and it will probably keep slowing as the tree gets more nodes.

BrannonKing commented 2019-02-12 23:17:54 +01:00 (Migrated from github.com)

Not only is this approach insufficient from a performance standpoint, it's also insufficient in its data: it needs to bring in the real values from mainnet. To do that, it has to parse the metadata on mainnet and replace the claimIds with updated inserts.


Pull request closed

Reference: LBRYCommunity/lbrycrd#198