Separate disk related functions in CClaimTrieDb #140

2018-05-19T23:00:18+02:00

bvbfan commented

2018-05-19 23:00:18 +02:00

(Migrated from github.com)

Signed-off-by: Anthony Fieroni <bvbfan@abv.bg>

bvbfan commented

2018-05-21 07:40:17 +02:00

(Migrated from github.com)

@kaykurokawa, the OSX build fails, but it doesn't look related to this pull request.

kaykurokawa commented

2018-05-21 21:27:15 +02:00

(Migrated from github.com)

Yes, the OSX build on Travis is currently failing for all: https://github.com/lbryio/lbrycrd/issues/120

kaykurokawa (Migrated from github.com) reviewed 2018-05-23 18:15:35 +02:00

src/claimtrie.cpp

					
				@ -1,18 +1,25 @@

				#include "claimtrie.h"

kaykurokawa (Migrated from github.com) commented

2018-05-23 18:15:35 +02:00

I think these can go in claimtriedb.h

kaykurokawa (Migrated from github.com) reviewed 2018-05-23 18:25:05 +02:00

src/claimtrie.cpp

					
				@ -771,2 +936,2 @@

				        if (hasChild && nNextHeight < Params().GetConsensus().nMaxTakeoverWorkaroundHeight) {

				            removalWorkaround.insert(name);

				    success = currentNode->removeClaim(outPoint, claim);

				    base->removeFromClaimIndex(claim);

kaykurokawa (Migrated from github.com) commented

2018-05-23 18:25:05 +02:00

Ideally this block of code would reside in CClaimTrieDb. The motivation is that it would be much clearer exactly what is being written to and read from the database if we have it in one well-defined place. As you said, a more aggressive refactor could have CClaimTrieDb manage the dirty* variables, which would make this possible.

kaykurokawa commented

2018-05-23 18:34:04 +02:00

(Migrated from github.com)

Hi, as you mentioned in https://github.com/lbryio/lbrycrd/issues/136, a somewhat more aggressive refactor would be acceptable. To be specific, the various dirty* member variables that are currently managed by CClaimTrie could be managed by CClaimTrieDb. This would make it much clearer exactly what is being written to and read from disk, and how, and should also reduce the DRY problems involved in manipulating the dirty* member variables.

Please note, though, that I think you may have a slight misunderstanding about CClaimTrieCache: this class is NOT a cache of the CClaimTrie. CClaimTrieCache actually contains a cache of changes that must be applied to CClaimTrie upon block increment (removing claims, adding claims, etc.). Thus CClaimTrieCache should be minimally affected by this refactor.

bvbfan commented

2018-05-23 18:58:01 +02:00

(Migrated from github.com)

You're right about CClaimTrieCache. I wrote the comment before I saw the implementation, and the name misled me. Do you think it's a proper name?

bvbfan (Migrated from github.com) reviewed 2018-05-23 19:01:01 +02:00

src/claimtrie.cpp

					
				@ -771,2 +936,2 @@

				        if (hasChild && nNextHeight < Params().GetConsensus().nMaxTakeoverWorkaroundHeight) {

				            removalWorkaround.insert(name);

				    success = currentNode->removeClaim(outPoint, claim);

				    base->removeFromClaimIndex(claim);

bvbfan (Migrated from github.com) commented

2018-05-23 19:01:00 +02:00

I'll investigate moving the dirty queues, as well as the read/write code, into CClaimTrieDb.

bvbfan (Migrated from github.com) reviewed 2018-05-28 10:16:40 +02:00

src/claimtrie.cpp

					
				@ -1,18 +1,25 @@

				#include "claimtrie.h"

bvbfan (Migrated from github.com) commented

2018-05-28 10:16:40 +02:00

I've tried to implement it in an elegant way. A generic version in claimtriedb looks like this:

```
template<typename K, typename V>
bool getQueue(const K &key, V &value) const
{
    // Each std::map<K, V> instantiation is identified by the hash of its type.
    auto hash = typeid(std::map<K, V>).hash_code();
    auto it = map_hashes.find(hash);
    if (it == map_hashes.end()) return false;
    // The stored pointer is assumed to refer to a map of exactly this type.
    auto map = reinterpret_cast<std::map<K, V>*>(it->second);
    auto i = map->find(key);
    if (i != map->end()) { value = i->second; return true; }
    return false;
}
```

I mean that a hash can be used to store the generic queues (the hash, not a char, will be the key in the db), so all the definitions will go away, but we should provide a db migration, right?

bvbfan commented

2018-05-28 20:08:23 +02:00

(Migrated from github.com)

I force-pushed a new version, but it doesn't look updated here :)

bvbfan commented

2018-05-31 10:30:40 +02:00

(Migrated from github.com)

I can't figure out why CClaimTrieNode is not empty when I write it to leveldb, but is empty when I read it back. Do you have any advice? A test fails because of that:

```
void CCMap<K, V>::write(size_t, CClaimTrieDb*) [with K = std::basic_string<char>; V = CClaimTrieNode; size_t = long unsigned int], , 0
void CCMap<K, V>::write(size_t, CClaimTrieDb*) [with K = std::basic_string<char>; V = CClaimTrieNode; size_t = long unsigned int], t, 0
void CCMap<K, V>::write(size_t, CClaimTrieDb*) [with K = std::basic_string<char>; V = CClaimTrieNode; size_t = long unsigned int], te, 0
void CCMap<K, V>::write(size_t, CClaimTrieDb*) [with K = std::basic_string<char>; V = CClaimTrieNode; size_t = long unsigned int], tes, 0
void CCMap<K, V>::write(size_t, CClaimTrieDb*) [with K = std::basic_string<char>; V = CClaimTrieNode; size_t = long unsigned int], test, 0
bool CClaimTrieDb::SeekFirstKey(K&, V&) const [with K = std::basic_string<char>; V = CClaimTrieNode], , 1
bool CClaimTrieDb::SeekFirstKey(K&, V&) const [with K = std::basic_string<char>; V = CClaimTrieNode], t, 1
bool CClaimTrieDb::SeekFirstKey(K&, V&) const [with K = std::basic_string<char>; V = CClaimTrieNode], te, 1
bool CClaimTrieDb::SeekFirstKey(K&, V&) const [with K = std::basic_string<char>; V = CClaimTrieNode], tes, 1
bool CClaimTrieDb::SeekFirstKey(K&, V&) const [with K = std::basic_string<char>; V = CClaimTrieNode], test, 0
```

bvbfan commented

2018-05-31 22:37:39 +02:00

(Migrated from github.com)

OK, Kay, can you review it? The architecture aims to provide queues on demand (for now they are maps) through CClaimTrieDb. I plan to refactor CClaimTrieCache too, so that it benefits from the db cache, and to rename it, as we discussed. One test still fails; somehow, I can't see that this is my fault. I hope we can merge it into the refactor branch after you explain your remarks.
Regards

bvbfan commented

2018-06-05 05:37:32 +02:00

(Migrated from github.com)

Have you had time to take a look at this? I have one pending change: the latest implementation uses a vector under the hood in CClaimTrieDb, so that the cached queues are not reordered.

kaykurokawa commented

2018-06-06 17:21:00 +02:00

(Migrated from github.com)

Will start looking at this today

bvbfan commented

2018-06-10 08:38:58 +02:00

(Migrated from github.com)

Kay, I will change the hash calculation. I checked older versions of Boost (https://wandbox.org/ helped a lot): Boost.TypeIndex was introduced in version 1.56, but the problem is that around 1.64 its hash calculation changed, which is annoying. I will implement it in a different way.
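
One way to avoid depending on a library-computed type hash is to assign each key/value pair a fixed identifier by hand. The sketch below only illustrates that idea under stated assumptions; it is not necessarily the approach this pull request ended up using. `QueueId`, `queueId()`, `demoQueueId()` and the numeric values are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// The primary template is declared but never defined, so asking for the id
// of an unregistered key/value pair fails at compile time.
template <typename K, typename V>
struct QueueId;

// Hypothetical registrations. The numbers are arbitrary, but once data has
// been written under them they must never change.
template <>
struct QueueId<std::string, int> {
    static constexpr std::uint8_t value = 1;
};

template <>
struct QueueId<int, std::string> {
    static constexpr std::uint8_t value = 2;
};

// Returns the fixed identifier for the (K, V) pair. Unlike a hash derived
// from typeid or a type-index library, it does not depend on the compiler
// or library version.
template <typename K, typename V>
constexpr std::uint8_t queueId()
{
    return QueueId<K, V>::value;
}

// Small usage example (hypothetical helper).
inline void demoQueueId()
{
    assert((queueId<std::string, int>() == 1));
    assert((queueId<int, std::string>() == 2));
}
```

The trade-off is that every new queue type needs an explicit registration, but the on-disk key prefix then stays the same across toolchains.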

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 18:01:04 +02:00

src/claimtrie.h

					
				@ -308,0 +295,4 @@

				typedef std::pair<std::string, claimQueueValueType> claimQueueEntryType;

				typedef std::pair<std::string, supportQueueValueType> supportQueueEntryType;

kaykurokawa (Migrated from github.com) commented

2018-06-13 18:01:04 +02:00

This block of code (253 - 293) is a bit confusing; it's basically just to enable swap(), right?
Maybe adding some comments about what it's trying to achieve would be helpful.

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 18:19:30 +02:00

src/claimtriedb.h

					
				@ -0,0 +61,4 @@

				    CClaimTrieDb(bool fMemory = false, bool fWipe = false);

				    ~CClaimTrieDb();

kaykurokawa (Migrated from github.com) commented

2018-06-13 18:19:30 +02:00

Spelling : "durty" ->"dirty"

bvbfan (Migrated from github.com) reviewed 2018-06-13 18:27:22 +02:00

src/claimtrie.h

					
				@ -308,0 +295,4 @@

				typedef std::pair<std::string, claimQueueValueType> claimQueueEntryType;

				typedef std::pair<std::string, supportQueueValueType> supportQueueEntryType;

bvbfan (Migrated from github.com) commented

2018-06-13 18:27:22 +02:00

From 262 to 284: yes, it's for std::swap to work. The helpers and typedefs are there to make a distinct type when inserting into the db. Let's look at an example:

```
typedef Generic<outPointHeightType, queueNameRowHelper> queueNameRowValueType;
typedef Generic<outPointHeightType, supportQueueNameRowHelper> supportQueueNameRowValueType;
```

The helper is used only to make these 2 types different, i.e. so that they are stored under different keys in the db. I'll comment it.
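
The tag-type idea described above can be sketched in isolation. This is a minimal illustration, not the `Generic` template from the pull request: `Tagged`, `ClaimRowTag`, `SupportRowTag` and `demoTaggedSwap()` are hypothetical names, and `int` stands in for the real payload type.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Wraps a value of type T. The Tag parameter is never used at runtime; it
// only makes Tagged<T, A> and Tagged<T, B> distinct types.
template <typename T, typename Tag>
struct Tagged {
    T value;

    // Found by argument-dependent lookup, so `using std::swap; swap(a, b);`
    // picks this overload for Tagged values.
    friend void swap(Tagged& a, Tagged& b)
    {
        using std::swap;
        swap(a.value, b.value);
    }
};

// Hypothetical empty tag types, one per logical queue.
struct ClaimRowTag {};
struct SupportRowTag {};

using ClaimRow = Tagged<int, ClaimRowTag>;
using SupportRow = Tagged<int, SupportRowTag>;

// Same payload type, but the compiler treats the two rows as unrelated, so
// code that dispatches on the type can store them separately.
static_assert(!std::is_same<ClaimRow, SupportRow>::value,
              "tags must produce distinct types");

// Small usage example (hypothetical helper).
inline void demoTaggedSwap()
{
    ClaimRow a{1};
    ClaimRow b{2};
    using std::swap;
    swap(a, b);
    assert(a.value == 2 && b.value == 1);
}
```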

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 19:23:54 +02:00

src/claimtriedb.h

					
				@ -0,0 +51,4 @@

				    virtual ~CCBase() {}

				    virtual void write(const size_t key, CClaimTrieDb *db) = 0;

				};

kaykurokawa (Migrated from github.com) commented

2018-06-13 19:23:54 +02:00

I would like to see a better description of the CClaimTrieDb class, maybe something like this:
"This class implements key value storage for use by the CClaimTrie class. It allows for the storage of values of datatype V that can be retrieved using key datatype K. Changes to the key value storage are buffered until they are written to disk using writeQueues()."
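
The behaviour that the proposed description refers to (changes held in a buffer until `writeQueues()` is called) can be sketched as follows. This is a simplified illustration and not the CClaimTrieDb implementation: `BufferedStore`, `put()`, `get()`, `persistedCount()` and `bufferedCount()` are hypothetical names, a second `std::map` stands in for the on-disk database, and deletions are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

template <typename K, typename V>
class BufferedStore {
public:
    // Records a change in memory only; nothing is persisted yet.
    void put(const K& key, const V& value) { buffer_[key] = value; }

    // Looks in the buffer first, then falls back to the persisted data.
    bool get(const K& key, V& value) const
    {
        auto it = buffer_.find(key);
        if (it != buffer_.end()) { value = it->second; return true; }
        it = disk_.find(key);
        if (it != disk_.end()) { value = it->second; return true; }
        return false;
    }

    // Applies all buffered changes to the persisted data, then empties the buffer.
    void writeQueues()
    {
        for (const auto& entry : buffer_) disk_[entry.first] = entry.second;
        buffer_.clear();
    }

    std::size_t persistedCount() const { return disk_.size(); }
    std::size_t bufferedCount() const { return buffer_.size(); }

private:
    std::map<K, V> buffer_; // pending changes, memory only
    std::map<K, V> disk_;   // stand-in for the on-disk database
};

// Small usage example (hypothetical helper).
inline void demoBufferedStore()
{
    BufferedStore<std::string, int> store;
    store.put("test", 7);
    assert(store.persistedCount() == 0); // still only in the buffer
    store.writeQueues();
    assert(store.persistedCount() == 1); // now persisted
}
```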

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 19:26:30 +02:00

src/claimtriedb.h

					
				@ -0,0 +89,4 @@

				     */

				    template <typename K, typename V>

				    void updateQueueRow(const K &key, V &row);

kaykurokawa (Migrated from github.com) commented

2018-06-13 19:26:30 +02:00

Clarity and grammar for keyTypeEmpty() docstring:
"Check that there are no data stored under key datatype K and value datatype V. Checks both the buffer and disk."

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 19:59:19 +02:00

src/claimtriedb.h

					
				@ -0,0 +61,4 @@

				    CClaimTrieDb(bool fMemory = false, bool fWipe = false);

				    ~CClaimTrieDb();

kaykurokawa (Migrated from github.com) commented

2018-06-13 19:59:19 +02:00

Actually, we should probably stop using this "dirty" term as it is not very descriptive. Maybe:
"Write to disk the buffered changes to the key value storage"

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 20:01:33 +02:00

src/claimtriedb.h

					
				@ -0,0 +70,4 @@

				    /**

				     * Gets a map representation of K type / V type stored by their hash

				     * @param[out] map  key / value pairs readed from queues and disk

				     */

kaykurokawa (Migrated from github.com) commented

2018-06-13 20:01:33 +02:00

"key / value pairs readed from queues and disk" - > "key / value pairs read from disk with changes from the buffer applied"

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 20:05:19 +02:00

src/claimtriedb.h

					
				@ -0,0 +79,4 @@

				     * @param[in] key   key to looking for in dirty queues and disk

				     * @param[out] row  value which is found

				     */

				    template <typename K, typename V>

kaykurokawa (Migrated from github.com) commented

2018-06-13 20:05:18 +02:00

"key to looking for in dirty queues and disk" -> "key to look for in buffer and disk"

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 20:05:49 +02:00

src/claimtriedb.h

					
				@ -0,0 +86,4 @@

				     * Update value of type V by key of type K through their hash

				     * @param[in] key       key to looking for in dirty queues and disk

				     * @param[in/out] row   update value and gets its last value

				     */

kaykurokawa (Migrated from github.com) commented

2018-06-13 20:05:49 +02:00

" key to looking for in dirty queues and disk" - > "key to look for in buffer and disk"

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 20:06:30 +02:00

src/claimtriedb.h

					
				@ -0,0 +99,4 @@

				    /**

				     * Get a map representation of K type / V type stored by theirs hash

				     * @param[out] map  key / value pairs readed only from disk

				     */

kaykurokawa (Migrated from github.com) commented

2018-06-13 20:06:30 +02:00

"key / value pairs readed only from disk" -> "key/value pairs, read only from disk"

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 20:07:11 +02:00

src/claimtriedb.h

					
				@ -0,0 +118,4 @@

				    /**

				     * Represents dirty queues before stored to disk

				     */

kaykurokawa (Migrated from github.com) commented

2018-06-13 20:07:11 +02:00

"Represents buffer of changes"

bvbfan (Migrated from github.com) reviewed 2018-06-13 20:09:11 +02:00

src/claimtrie.h

					
				@ -308,0 +295,4 @@

				typedef std::pair<std::string, claimQueueValueType> claimQueueEntryType;

				typedef std::pair<std::string, supportQueueValueType> supportQueueEntryType;

bvbfan (Migrated from github.com) commented

2018-06-13 20:09:11 +02:00

line 253: /// Helpers to separate queue types from each other
line 271: /// Make std::swap to work with custom types
line 288: /// Each type will be stored in database separately

kaykurokawa (Migrated from github.com) reviewed 2018-06-13 20:27:19 +02:00

src/claimtrie.h

					
				@ -308,0 +295,4 @@

				typedef std::pair<std::string, claimQueueValueType> claimQueueEntryType;

				typedef std::pair<std::string, supportQueueValueType> supportQueueEntryType;

kaykurokawa (Migrated from github.com) commented

2018-06-13 20:27:19 +02:00

looks good

kaykurokawa commented

2018-06-13 22:03:55 +02:00

(Migrated from github.com)

Updated doc strings in 53cf80d99c2577582396612215b03bb1307b5a57.
See branch: seperate_disk_claim

Regarding terminology, I tried to change things so that "queue" refers to a series of data that is used by CClaimTrie (a "queue" can be on disk or in memory), while "buffer" refers to a set of changes to a "queue" that is stored in memory. I removed mentions of "dirty" or "dirty queue" as I think it's not descriptive enough.

bvbfan commented

2018-06-14 06:06:04 +02:00

(Migrated from github.com)

It looks good to me.

kaykurokawa (Migrated from github.com) reviewed 2018-06-19 22:00:16 +02:00

src/claimtrie.cpp

					
				@ -1439,0 +1899,4 @@

				bool CClaimTrieCache::forkForExpirationChange(bool increment)

				{

				    /*

				    If increment is True, we have forked to extend the expiration time, thus items in the expiration queue

kaykurokawa (Migrated from github.com) commented

2018-06-19 22:00:16 +02:00

Upon a fresh start (with an empty data directory), there won't be any nodes to load, so this will cause lbrycrdd to exit. See
https://github.com/lbryio/lbrycrd/blob/master/src/init.cpp#L1286

The previous ReadFromDisk() function would only fail if pcursor->GetValue(*node) somehow returned false.

kaykurokawa commented

2018-06-19 22:04:05 +02:00

(Migrated from github.com)

I found a problem: try syncing lbrycrdd from scratch (clear out the data directory before starting lbrycrdd).
See the comment above; it should be a simple fix.

bvbfan commented

2018-06-20 05:43:24 +02:00

(Migrated from github.com)

```
diff --git a/src/claimtriedb.cpp b/src/claimtriedb.cpp
index 0c0eccf7..198d6641 100644
--- a/src/claimtriedb.cpp
+++ b/src/claimtriedb.cpp
@@ -136,20 +136,17 @@ bool CClaimTrieDb::seekByKey(std::map<K, V, C> &map) const
     const size_t hash = hashType<K, V>();
     boost::scoped_ptr<CDBIterator> pcursor(const_cast<CClaimTrieDb*>(this)->NewIterator());
 
-    bool found = false;
-
     for (pcursor->SeekToFirst(); pcursor->Valid(); pcursor->Next()) {
         std::pair<size_t, K> key;
         if (pcursor->GetKey(key)) {
             if (hash == key.first) {
                 V value;
                 if (!pcursor->GetValue(value)) return false;
-                found = true;
                 map.insert(std::make_pair(key.second, value));
             }
         }
     }
-    return found;
+    return true;
 }
 
 template <typename K, typename V, typename C>
```
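
The effect of this change (an empty database is a success, and only a failed value read is an error) can be shown with a small stand-alone sketch. It does not use the real iterator API: `Record`, `loadAll()` and `demoLoadAll()` are hypothetical, and the `readable` flag simulates whether reading a stored value succeeds.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one database entry.
struct Record {
    std::string key;
    int value;
    bool readable; // simulates whether decoding the stored value succeeds
};

// Copies every record into `out`. Returns false only when a value cannot be
// read; finding no records at all is still a success.
inline bool loadAll(const std::vector<Record>& records, std::map<std::string, int>& out)
{
    for (const Record& record : records) {
        if (!record.readable) return false;
        out.insert(std::make_pair(record.key, record.value));
    }
    return true;
}

// Small usage example (hypothetical helper): a fresh, empty database loads fine.
inline void demoLoadAll()
{
    std::vector<Record> empty;
    std::map<std::string, int> out;
    assert(loadAll(empty, out));
    assert(out.empty());
}
```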

bvbfan commented

2018-06-20 05:47:50 +02:00

(Migrated from github.com)

So maybe we should update the test as well:
claimtriebranching_tests.cpp:319

```
BOOST_CHECK(pclaimTrie->ReadFromDisk(true));
```

kaykurokawa commented

2018-06-20 16:13:46 +02:00

(Migrated from github.com)

Yes, these changes look good. Also add a docstring to seekByKey():

"Returns false if database read fails."

Can you make these changes on your branch?
Also, add the docstring commit 53cf80d99c2577582396612215b03bb1307b5a57.
And then squash these commits (maybe one commit is OK?) to make it ready for merge to master.

bvbfan commented

2018-06-20 16:22:02 +02:00

(Migrated from github.com)

Also note that I have pending changes to that branch:
https://github.com/lbryio/lbrycrd/issues/145

kaykurokawa commented

2018-06-27 18:52:07 +02:00

(Migrated from github.com)

Closing as further work is continued in #160

Pull request closed


Reference: LBRYCommunity/lbrycrd#140
