The main normalization branch that needs review pronto #235

Merged
kaykurokawa merged 10 commits from normalization_1107_2 into master 2019-02-09 20:03:00 +01:00
kaykurokawa commented 2018-11-08 03:53:22 +01:00 (Migrated from github.com)

Some small commits on top of normalization PR #159.. The thread is getting quite lengthy so I thought I'd open a new PR... @BrannonKing can quickly review this here and move these commits in there when he's ready.

a0741b8
Added tests that checks the normalization only , without making claims. Contains tests for invalid utf8, some tests for NFD normalization in certain languages, and casings on non english languages.

c26f2c9
Just moved and clarified a comment on the normalizeClaimName function

2f61fb2
Added an RPC call checknormalization so user can check what the normalization function does from RPC/CLI

Some small commits on top of normalization PR #159.. The thread is getting quite lengthy so I thought I'd open a new PR... @BrannonKing can quickly review this here and move these commits in there when he's ready. a0741b8 Added tests that checks the normalization only , without making claims. Contains tests for invalid utf8, some tests for NFD normalization in certain languages, and casings on non english languages. c26f2c9 Just moved and clarified a comment on the normalizeClaimName function 2f61fb2 Added an RPC call checknormalization so user can check what the normalization function does from RPC/CLI
BrannonKing (Migrated from github.com) reviewed 2018-11-08 03:53:22 +01:00
BrannonKing commented 2018-11-08 18:09:26 +01:00 (Migrated from github.com)

This looks fine to me. I'll include it on the other branch shortly.

This looks fine to me. I'll include it on the other branch shortly.
BrannonKing (Migrated from github.com) reviewed 2018-11-10 01:07:00 +01:00
@ -1420,2 +1282,4 @@
namesToCheckForTakeover.insert(name);
CClaimTrieNode* rootNode = getRoot();
return recursivePruneName(rootNode, 0, name);
BrannonKing (Migrated from github.com) commented 2018-11-10 01:06:59 +01:00

I was trying to establish a boundary between the trie and the queues where we always normalize going into the trie. The same is true for the support map.

I was trying to establish a boundary between the trie and the queues where we always normalize going into the trie. The same is true for the support map.
BrannonKing (Migrated from github.com) reviewed 2018-11-10 01:07:45 +01:00
BrannonKing (Migrated from github.com) commented 2018-11-10 01:07:45 +01:00

@bvbfan , this situation is for you to look at.

@bvbfan , this situation is for you to look at.
bvbfan (Migrated from github.com) requested changes 2018-11-12 10:23:40 +01:00
bvbfan (Migrated from github.com) left a comment

I think that we should commit to master all work before this changes, without non normalized names. Then we can figure out easily special case that we want to handle. I was thinking for simple solution, we can add only non normalized names to claim index and undo blocks, but i should check it.

I think that we should commit to master all work before this changes, without non normalized names. Then we can figure out easily special case that we want to handle. I was thinking for simple solution, we can add only non normalized names to claim index and undo blocks, but i should check it.
@ -1692,3 +1556,3 @@
expirationQueueType::iterator CClaimTrieCache::getExpirationQueueCacheRow(int nHeight, bool createIfNotExists) const
expirationQueueType::iterator CClaimTrieCacheBase::getExpirationQueueCacheRow(int nHeight, bool createIfNotExists) const
{
bvbfan (Migrated from github.com) commented 2018-11-12 10:10:26 +01:00

Cache normalizeClaimName(name) before loop, we don't need to do it more that once.

Cache normalizeClaimName(name) before loop, we don't need to do it more that once.
bvbfan (Migrated from github.com) commented 2018-11-12 10:10:47 +01:00

Cache normalizeClaimName(name) before loop

Cache normalizeClaimName(name) before loop
@ -2017,7 +1876,7 @@ void CClaimTrieCache::removeSupportFromExpirationQueue(const std::string& name,
}
}
bvbfan (Migrated from github.com) commented 2018-11-12 10:11:18 +01:00

Cache normalizeClaimName(name) before loop

Cache normalizeClaimName(name) before loop
@ -2094,0 +1959,4 @@
// it will cause a double remove when rolling back -- something we don't handle correctly
// (same problem exists for the insertSupportUndo below)
//for (insertUndoType::reverse_iterator itInsertUndo = insertUndo.rbegin(); itInsertUndo != insertUndo.rend(); ++itInsertUndo)
// if (itInsertUndo->name == itEntry->first && itInsertUndo->outPoint == itEntry->second.outPoint)
bvbfan (Migrated from github.com) commented 2018-11-12 10:12:35 +01:00

Cache normalizeClaimName(name) before loop.

Cache normalizeClaimName(name) before loop.
bvbfan (Migrated from github.com) commented 2018-11-12 10:09:32 +01:00

You should return or make function void

You should return or make function void
bvbfan commented 2018-11-15 20:20:41 +01:00 (Migrated from github.com)

We have 2 major issues in last attempt to keep track of original name, they aren't handled by test case, cause reordering nodes after normalization:

  1. we loose ability to return same best claim
  2. takeover height can be away different

I've thinking about to minimize normalization effort by lower-casing on demand, note

std::string ut8_string = boost::locale::conv::to_utf<char>(name, "UTF-8", boost::locale::conv::stop);
if (utf8_string.empty())
    return name;
normalized = boost::locale::normalize(utf8_string, boost::locale::norm_nfd, utf8);

// lowering in case of real normalization
if (utf8_string != normalized)
    normalized = boost::locale::to_lower(normalized, utf8);
return normalized;

We can avoid lower-casing when we does not have real change.

We have 2 major issues in last attempt to keep track of original name, they aren't handled by test case, cause reordering nodes after normalization: 1. we loose ability to return same best claim 2. takeover height can be away different I've thinking about to minimize normalization effort by lower-casing on demand, note ``` std::string ut8_string = boost::locale::conv::to_utf<char>(name, "UTF-8", boost::locale::conv::stop); if (utf8_string.empty()) return name; normalized = boost::locale::normalize(utf8_string, boost::locale::norm_nfd, utf8); // lowering in case of real normalization if (utf8_string != normalized) normalized = boost::locale::to_lower(normalized, utf8); return normalized; ``` We can avoid lower-casing when we does not have *real* change.
BrannonKing (Migrated from github.com) reviewed 2018-11-19 16:36:28 +01:00
@ -1692,3 +1556,3 @@
expirationQueueType::iterator CClaimTrieCache::getExpirationQueueCacheRow(int nHeight, bool createIfNotExists) const
expirationQueueType::iterator CClaimTrieCacheBase::getExpirationQueueCacheRow(int nHeight, bool createIfNotExists) const
{
BrannonKing (Migrated from github.com) commented 2018-11-19 16:36:28 +01:00

I was expecting the outPoint to be reasonably unique, meaning that we would only call the normalization once or less on average.

I was expecting the outPoint to be reasonably unique, meaning that we would only call the normalization once or less on average.
BrannonKing (Migrated from github.com) reviewed 2018-11-19 16:36:52 +01:00
BrannonKing (Migrated from github.com) commented 2018-11-19 16:36:52 +01:00

Again here I was thinking that outPoint was unique.

Again here I was thinking that outPoint was unique.
BrannonKing commented 2018-11-19 16:46:26 +01:00 (Migrated from github.com)

they aren't handled by test case, cause reordering nodes after normalization...

I did write a new test case to ensure that the nodes are reordered going forwards or backwards. (But I didn't do anything to the test to ensure that they were out of order at any point during the test.)

we loose ability to return same best claim

I'm not sure what you're stating here exactly. We still return the best claim, but we now compare across a larger set of claims. It is true that previous best may come into competition with claims that it was not competing against before the fork.

takeover height can be away different

Only if the new competition causes it to change. Otherwise it should hold on to its previous value.

I've thinking about to minimize normalization effort by lower-casing on demand, note
std::string ut8_string = boost::locale::conv::to_utf(name, "UTF-8", boost::locale::conv::stop);
if (utf8_string.empty())
return name;
normalized = boost::locale::normalize(utf8_string, boost::locale::norm_nfd, utf8);
// lowering in case of real normalization
if (utf8_string != normalized)
normalized = boost::locale::to_lower(normalized, utf8);
return normalized;

This code doesn't make any sense to me. The goal was to make more names compete with each other. It's not just for normalization alone. We want to avoid having winners off by one letter casing, etc.

> they aren't handled by test case, cause reordering nodes after normalization... I did write a new test case to ensure that the nodes are reordered going forwards or backwards. (But I didn't do anything to the test to ensure that they were out of order at any point during the test.) > we loose ability to return same best claim I'm not sure what you're stating here exactly. We still return the best claim, but we now compare across a larger set of claims. It is true that previous best may come into competition with claims that it was not competing against before the fork. > takeover height can be away different Only if the new competition causes it to change. Otherwise it should hold on to its previous value. > I've thinking about to minimize normalization effort by lower-casing on demand, note > std::string ut8_string = boost::locale::conv::to_utf<char>(name, "UTF-8", boost::locale::conv::stop); > if (utf8_string.empty()) > return name; > normalized = boost::locale::normalize(utf8_string, boost::locale::norm_nfd, utf8); > // lowering in case of real normalization > if (utf8_string != normalized) > normalized = boost::locale::to_lower(normalized, utf8); > return normalized; This code doesn't make any sense to me. The goal was to make more names compete with each other. It's not just for normalization alone. We want to avoid having winners off by one letter casing, etc.
bvbfan commented 2018-11-19 18:08:25 +01:00 (Migrated from github.com)

It is true that previous best may come into competition with claims that it was not competing against before the fork.

I mean exactly that.

Only if the new competition causes it to change.

It's true if you check that there isn't claims after normalization will have same names and takeover height will override another one.

It's not just for normalization alone. We want to avoid having winners off by one letter casing, etc.

Yeah, in that case lower case makes sense,

> It is true that previous best may come into competition with claims that it was not competing against before the fork. I mean exactly that. > Only if the new competition causes it to change. It's true if you check that there isn't claims after normalization will have same names and takeover height will override another one. > It's not just for normalization alone. We want to avoid having winners off by one letter casing, etc. Yeah, in that case lower case makes sense,
BrannonKing (Migrated from github.com) reviewed 2018-11-22 17:01:53 +01:00
BrannonKing (Migrated from github.com) commented 2018-11-22 17:01:53 +01:00

fixed

fixed
BrannonKing commented 2018-11-22 17:05:07 +01:00 (Migrated from github.com)

I just pushed in some minor changes and fixes. You will now see normalizedName in the RPC calls. (I haven't done a major RPC field name refactor yet.) I yanked all the code to return pending supports. I couldn't get the normalized name correct there, and the whole idea seemed wrong after further reflection. A support is not a support until the block increments. Same with claims. Please review it. Thanks.

I just pushed in some minor changes and fixes. You will now see normalizedName in the RPC calls. (I haven't done a major RPC field name refactor yet.) I yanked all the code to return pending supports. I couldn't get the normalized name correct there, and the whole idea seemed wrong after further reflection. A support is not a support until the block increments. Same with claims. Please review it. Thanks.
bvbfan (Migrated from github.com) approved these changes 2018-11-23 14:07:55 +01:00
BrannonKing (Migrated from github.com) reviewed 2018-11-28 21:30:16 +01:00
BrannonKing (Migrated from github.com) commented 2018-11-28 21:30:16 +01:00

I think I understand how this is happening now. I have some ideas on how to address it.

I think I understand how this is happening now. I have some ideas on how to address it.
bvbfan (Migrated from github.com) reviewed 2018-11-28 21:39:44 +01:00
bvbfan (Migrated from github.com) commented 2018-11-28 21:39:44 +01:00

Is this still happen? When I correct unit test it does not throw.

Is this still happen? When I correct unit test it does not throw.
BrannonKing (Migrated from github.com) reviewed 2018-11-28 22:46:51 +01:00
BrannonKing (Migrated from github.com) commented 2018-11-28 22:46:51 +01:00

It fails for me in latest with this test: claimtriebranching_tests/claimtriecache_normalization

It fails for me in latest with this test: claimtriebranching_tests/claimtriecache_normalization
bvbfan (Migrated from github.com) reviewed 2018-11-29 09:14:11 +01:00
bvbfan (Migrated from github.com) commented 2018-11-29 09:14:10 +01:00

Did you try that patch:

diff --git a/src/test/claimtriebranching_tests.cpp b/src/test/claimtriebranching_tests.cpp
index 48c5ce4f..6072947a 100644
--- a/src/test/claimtriebranching_tests.cpp
+++ b/src/test/claimtriebranching_tests.cpp
@@ -1212,7 +1212,6 @@ BOOST_AUTO_TEST_CASE(claimtriecache_normalization)
     BOOST_CHECK(name != name_upper);
 
     // Add another set of unicode claims that will collapse after the fork
-    CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), name, "amilie", 2);
     CMutableTransaction tx2 = fixture.MakeClaim(fixture.GetCoinbase(), name_upper, "amelie", 2);
     fixture.MakeClaim(fixture.GetCoinbase(), "amelie1", "amelie", 2);
     fixture.IncrementBlocks(1);
@@ -1231,6 +1230,7 @@ BOOST_AUTO_TEST_CASE(claimtriecache_normalization)
     fixture.setNormalizationForkHeight(1);
     int currentHeight = chainActive.Height();
 
+    CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), name, "amilie", 2);
     fixture.IncrementBlocks(1);
     // Ok normalization fix the name problem
     BOOST_CHECK(pclaimTrie->getInfoForName(name_normd, nval1));
@@ -1252,9 +1252,8 @@ BOOST_AUTO_TEST_CASE(claimtriecache_normalization)
     BOOST_CHECK(trieCache.spendClaim(name_upper, COutPoint(tx2.GetHash(), 0), currentHeight, amelieValidHeight));
 
     BOOST_CHECK(!pclaimTrie->getInfoForName(name, nval1));
-    BOOST_CHECK(trieCache.getInfoForName(name, nval1));
+    BOOST_CHECK(!trieCache.getInfoForName(name, nval1));
     BOOST_CHECK(trieCache.addClaim(name, COutPoint(tx1.GetHash(), 0), ClaimIdHash(tx1.GetHash(), 0), CAmount(2), currentHeight + 1));
-    BOOST_CHECK(trieCache.getInfoForName(name, nval1));
     insertUndoType insertUndo;
     claimQueueRowType expireUndo;
     insertUndoType insertSupportUndo;
Did you try that patch: ``` diff --git a/src/test/claimtriebranching_tests.cpp b/src/test/claimtriebranching_tests.cpp index 48c5ce4f..6072947a 100644 --- a/src/test/claimtriebranching_tests.cpp +++ b/src/test/claimtriebranching_tests.cpp @@ -1212,7 +1212,6 @@ BOOST_AUTO_TEST_CASE(claimtriecache_normalization) BOOST_CHECK(name != name_upper); // Add another set of unicode claims that will collapse after the fork - CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), name, "amilie", 2); CMutableTransaction tx2 = fixture.MakeClaim(fixture.GetCoinbase(), name_upper, "amelie", 2); fixture.MakeClaim(fixture.GetCoinbase(), "amelie1", "amelie", 2); fixture.IncrementBlocks(1); @@ -1231,6 +1230,7 @@ BOOST_AUTO_TEST_CASE(claimtriecache_normalization) fixture.setNormalizationForkHeight(1); int currentHeight = chainActive.Height(); + CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), name, "amilie", 2); fixture.IncrementBlocks(1); // Ok normalization fix the name problem BOOST_CHECK(pclaimTrie->getInfoForName(name_normd, nval1)); @@ -1252,9 +1252,8 @@ BOOST_AUTO_TEST_CASE(claimtriecache_normalization) BOOST_CHECK(trieCache.spendClaim(name_upper, COutPoint(tx2.GetHash(), 0), currentHeight, amelieValidHeight)); BOOST_CHECK(!pclaimTrie->getInfoForName(name, nval1)); - BOOST_CHECK(trieCache.getInfoForName(name, nval1)); + BOOST_CHECK(!trieCache.getInfoForName(name, nval1)); BOOST_CHECK(trieCache.addClaim(name, COutPoint(tx1.GetHash(), 0), ClaimIdHash(tx1.GetHash(), 0), CAmount(2), currentHeight + 1)); - BOOST_CHECK(trieCache.getInfoForName(name, nval1)); insertUndoType insertUndo; claimQueueRowType expireUndo; insertUndoType insertSupportUndo; ```
BrannonKing (Migrated from github.com) reviewed 2018-12-11 09:13:45 +01:00
BrannonKing (Migrated from github.com) commented 2018-12-11 09:13:45 +01:00

@lbrynaut & @kaykurokawa , this is the line where I wanted you to verify that getDelayForName would return the same as in addClaim (aka, the same going forward as going backward).

@lbrynaut & @kaykurokawa , this is the line where I wanted you to verify that getDelayForName would return the same as in addClaim (aka, the same going forward as going backward).
BrannonKing (Migrated from github.com) reviewed 2018-12-12 16:32:18 +01:00
BrannonKing (Migrated from github.com) commented 2018-12-12 16:32:17 +01:00

This needs to be 'normalized_name'. After looking through the lbry codebase today, I saw how much they really want PEP8 names.

This needs to be 'normalized_name'. After looking through the lbry codebase today, I saw how much they really want PEP8 names.
BrannonKing (Migrated from github.com) reviewed 2018-12-12 16:32:53 +01:00
BrannonKing (Migrated from github.com) commented 2018-12-12 16:32:52 +01:00

I'm thinking to change this to always return the normalized name, not just after the fork. That we people can adjust to it sooner.

I'm thinking to change this to always return the normalized name, not just after the fork. That we people can adjust to it sooner.
lbrynaut (Migrated from github.com) reviewed 2018-12-13 15:07:32 +01:00
@ -738,1 +745,4 @@
AS_IF([test "x$ICU_PREFIX" != xauto], [
ICU_CPPFLAGS="-I$ICU_PREFIX/include"
ICU_LIBS="-L$ICU_PREFIX/lib -licui18n -licuuc -licudata -ldl"
lbrynaut (Migrated from github.com) commented 2018-12-13 15:07:32 +01:00

ICU detection is still broken. What this is doing is preferring the system ICU over the built dependency ICU version, which is not what we want. Our build must require --with-icu and prefer that path. I'll update with something later and you can see if it works for you.

ICU detection is still broken. What this is doing is preferring the system ICU over the built dependency ICU version, which is not what we want. Our build must require --with-icu and prefer that path. I'll update with something later and you can see if it works for you.
BrannonKing (Migrated from github.com) reviewed 2018-12-13 15:29:49 +01:00
@ -738,1 +745,4 @@
AS_IF([test "x$ICU_PREFIX" != xauto], [
ICU_CPPFLAGS="-I$ICU_PREFIX/include"
ICU_LIBS="-L$ICU_PREFIX/lib -licui18n -licuuc -licudata -ldl"
BrannonKing (Migrated from github.com) commented 2018-12-13 15:29:48 +01:00

I don't want it to require --with-icu. However, if --with-icu is provided it had darn well better use that one.

I don't want it to _require_ --with-icu. However, if --with-icu is provided it had darn well better use that one.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: LBRYCommunity/lbrycrd#235
No description provided.