Implement a hard fork to normalize claim names #159

Closed
lbrynaut wants to merge 323 commits from normalization-rewrite into master
lbrynaut commented 2018-06-12 16:11:18 +02:00 (Migrated from github.com)

This is a complete rewrite of #102 and replaces it, since it's much more complete and handles situations that the other does not.

This PR contains a lot of changes, and I expect a detailed review will take some time to ensure correctness. I also expect the reproducible_build script will need further modification (it likely won't work as-is due to the added ICU dependency).

@kaykurokawa The "claimtriebranching_hardfork_disktest" unit test does not seem to work with this PR; if you have a second to look at what's going on there, it would help. It came up after things were working and then I rebased to latest, so it's commented out for now.

EDIT: @kaykurokawa This last comment is no longer true, so it can be ignored now.
lbrynaut commented 2018-06-26 21:47:42 +02:00 (Migrated from github.com)

Addresses #65

~~WIP regarding travis/dependency addition.~~

Updated.
BrannonKing (Migrated from github.com) reviewed 2018-07-17 21:29:30 +02:00
BrannonKing (Migrated from github.com) commented 2018-07-17 21:08:27 +02:00

Delete it if it's obsolete.

BrannonKing (Migrated from github.com) commented 2018-07-17 21:22:18 +02:00

boost.locale does not require this? Why would we need to get this separately from boost?

BrannonKing (Migrated from github.com) commented 2018-07-17 21:29:07 +02:00

We default this to true after testing but before the release?

BrannonKing (Migrated from github.com) commented 2018-07-17 21:15:53 +02:00

This exists just to convert a char to a string?

lbrynaut (Migrated from github.com) reviewed 2018-07-17 21:33:41 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-17 21:33:41 +02:00

A normalized string, yes.

lbrynaut (Migrated from github.com) reviewed 2018-07-17 21:33:43 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-17 21:33:43 +02:00

Because we're using icu for normalization, which doesn't work through boost unless it's explicitly enabled.

lbrynaut (Migrated from github.com) reviewed 2018-07-17 21:35:06 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-17 21:35:05 +02:00

I don't know what you're asking. It's a flag specified by a caller.

BrannonKing (Migrated from github.com) reviewed 2018-07-17 21:53:52 +02:00
BrannonKing (Migrated from github.com) commented 2018-07-17 21:53:52 +02:00

So what is the gain with using boost's wrapper? Suppose we just make ICU calls directly?

lbrynaut (Migrated from github.com) reviewed 2018-07-17 21:56:44 +02:00
lbrynaut (Migrated from github.com) commented 2018-07-17 21:56:44 +02:00

It's C++ and style-compatible. I also have previous experience using it, for exactly this kind of purpose.

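For context on this trade-off, the same normalize-and-lower-case step written against ICU directly would look roughly like the sketch below (illustrative only; the function name is mine and the normalization form is an assumption, not necessarily what this PR uses):

```
#include <unicode/normalizer2.h>
#include <unicode/unistr.h>
#include <string>

// Illustrative sketch: NFC-normalize and lower-case a UTF-8 string with
// direct ICU calls, falling back to the original bytes on any ICU error.
std::string normalizeWithIcuDirectly(const std::string& name)
{
    UErrorCode status = U_ZERO_ERROR;
    const icu::Normalizer2* nfc = icu::Normalizer2::getNFCInstance(status);
    if (U_FAILURE(status))
        return name;

    icu::UnicodeString normalized = nfc->normalize(icu::UnicodeString::fromUTF8(name), status);
    if (U_FAILURE(status))
        return name;

    normalized.toLower();

    std::string out;
    normalized.toUTF8String(out);
    return out;
}
```

The boost.locale equivalent is a couple of calls that already match the surrounding string-handling style, which appears to be the point being made here.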
BrannonKing (Migrated from github.com) reviewed 2018-07-18 16:24:03 +02:00
BrannonKing (Migrated from github.com) commented 2018-07-18 16:22:32 +02:00

It appears that the `utf8` object is immutable. Can we move the construction of it out of our inline method? It seems we could just construct a static instance once and be done with it.
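A minimal sketch of that suggestion, assuming the locale comes from boost::locale (the helper name and locale id are illustrative, not the PR's):

```
#include <boost/locale.hpp>
#include <locale>
#include <string>

// Build the UTF-8 locale exactly once. A function-local static is initialized
// on first use and then reused, so the inline normalization helper no longer
// reconstructs it on every call.
static const std::locale& Utf8Locale()
{
    static const std::locale loc = boost::locale::generator()("en_US.UTF-8");
    return loc;
}
```

Callers would then pass `Utf8Locale()` into `boost::locale::normalize` and `boost::locale::to_lower` instead of building the locale inline.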
lbrynaut commented 2018-07-18 21:50:39 +02:00 (Migrated from github.com)

Updated.

lbrynaut (Migrated from github.com) reviewed 2018-07-26 21:38:51 +02:00
lbrynaut (Migrated from github.com) left a comment

@kaykurokawa Let's talk.

lbrynaut (Migrated from github.com) commented 2018-07-26 21:18:05 +02:00

Remove echo

lbrynaut (Migrated from github.com) commented 2018-07-26 21:18:18 +02:00

Remove step output

@ -26,7 +25,6 @@ function HELP {
exit 1
}
lbrynaut (Migrated from github.com) commented 2018-07-26 21:18:26 +02:00

Revert

lbrynaut (Migrated from github.com) commented 2018-07-26 21:19:08 +02:00

Note: This makes the build times accurate

@ -376,3 +394,4 @@
git fetch origin2
git checkout origin2/master contrib/devtools/clang-format-diff.py
git diff -U0 origin2/master -- '*.h' '*.cpp' | ./contrib/devtools/clang-format-diff.py -p1
}
lbrynaut (Migrated from github.com) commented 2018-07-26 21:19:29 +02:00

Note that travis builds will be parallelized by this

lbrynaut (Migrated from github.com) commented 2018-07-26 21:20:34 +02:00

All fork heights are deferred to @kaykurokawa

lbrynaut (Migrated from github.com) commented 2018-07-26 21:20:42 +02:00

All fork heights are deferred to @kaykurokawa

lbrynaut (Migrated from github.com) commented 2018-07-26 21:20:48 +02:00

All fork heights are deferred to @kaykurokawa

lbrynaut (Migrated from github.com) commented 2018-07-26 21:21:37 +02:00

Remove debug

lbrynaut (Migrated from github.com) commented 2018-07-26 21:21:53 +02:00

Remove debug

lbrynaut (Migrated from github.com) commented 2018-07-26 21:25:00 +02:00

To be clear, this fork implements normalization by removing the concept of a claimtrie of _chars_; instead we have a claimtrie of _code points_, represented as chars (via UTF-8).
@ -1426,3 +1361,3 @@
{
LogPrintf("%s: Removing a claim was unsuccessful. name = %s, txhash = %s, nOut = %d", __func__, name.c_str(), outPoint.hash.GetHex(), outPoint.n);
LogPrintf("%s: Removing a claim was unsuccessful. name = %s, txhash = %s, nOut = %d\n", __func__, name.c_str(), outPoint.hash.GetHex(), outPoint.n);
return false;
lbrynaut (Migrated from github.com) commented 2018-07-26 21:27:00 +02:00

Style: remove blank lines

lbrynaut (Migrated from github.com) commented 2018-07-26 21:28:09 +02:00

Style: revert unnecessary changes
lbrynaut (Migrated from github.com) commented 2018-07-26 21:33:47 +02:00

Remove fprintf

lbrynaut (Migrated from github.com) commented 2018-07-26 21:34:39 +02:00

@kaykurokawa Let's talk about this test at some point.

BrannonKing (Migrated from github.com) approved these changes 2018-07-26 22:51:22 +02:00
bvbfan (Migrated from github.com) reviewed 2018-09-17 18:57:13 +02:00
bvbfan (Migrated from github.com) commented 2018-09-17 18:57:13 +02:00

Do we need to copy the trie node?

```
dirtyNodes[normalizeClaimName(it->first)] = new CClaimTrieNode(*(it->second));
```
bvbfan (Migrated from github.com) reviewed 2018-09-17 19:05:21 +02:00
bvbfan (Migrated from github.com) commented 2018-09-17 19:05:21 +02:00

Returning const on a non-reference is nonsense in C++.
kaykurokawa commented 2018-09-17 21:54:42 +02:00 (Migrated from github.com)

I opened up a separate issue to discuss the issue of "how to do character normalization" here: https://github.com/lbryio/lbrycrd/issues/204 .

Seems like the method in this PR may not be optimal but we can discuss that separately since I think it will not affect the bulk of the changes here.

BrannonKing commented 2018-09-20 18:39:28 +02:00 (Migrated from github.com)

My first attempt at compiling this branch gave me this error:

```
../libtool: line 7485: cd: auto/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'auto/lib'
```

I am using system-installed boost (with the OpenSSL 1.1 PR) as I haven't been able to make the reproducible_build.sh work for me since July (on Bionic). After getting the above error, I validated that "auto" was the path for ICU, googled a bit, and read the configure.ac a bit. I decided to add this to the configure command:

```
--with-icu=`pkg-config icu-i18n --variable=prefix`
```

That did alleviate the issue. It seems that configure.ac is not correctly falling back to pkg-config in the situation where `--with-icu` is omitted.
lbrynaut (Migrated from github.com) reviewed 2018-09-21 21:22:43 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-21 21:22:43 +02:00

Agreed. So is inline.

lbrynaut commented 2018-09-21 21:25:37 +02:00 (Migrated from github.com)

> That did alleviate the issue. It seems that configure.ac is not correctly falling back to pkg-config in the situation where `--with-icu` is omitted.

Hrm, odd. It should be a separate case in the configure. I was focused on getting the reproducible builds going because we don't want boost or icu shared linkage (at least for the distributed binaries), as it's error prone and system dependent.
lbrynaut (Migrated from github.com) reviewed 2018-09-21 21:26:18 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-21 21:26:18 +02:00

No, we don't want the non-normalized value.

lbrynaut (Migrated from github.com) reviewed 2018-09-21 23:16:09 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-21 23:16:08 +02:00

I'm keeping inline in place of a storage specifier though, will remove the const.

lbrynaut commented 2018-09-22 00:07:25 +02:00 (Migrated from github.com)

@BrannonKing reported a performance issue, and it's true, this is slower than it needs to be. Correctness first, then performance is how I've always operated ;-)

Anyway, I've already got a partial fix, but will be updating when I think it's ready.

bvbfan (Migrated from github.com) reviewed 2018-09-22 05:59:59 +02:00
bvbfan (Migrated from github.com) commented 2018-09-22 05:59:59 +02:00

But why do we want an empty node? It can wipe the db in some cases.
bvbfan (Migrated from github.com) reviewed 2018-09-22 06:05:18 +02:00
bvbfan (Migrated from github.com) commented 2018-09-22 06:05:18 +02:00

Leaking this is not a problem right now, but if we use placement or an in-class operator new (which I have plans to do), it will be.

```
static CClaimTrie normalizedTrie(false, false, true);
normalizationForkStateInstance.normalizedTrie = &normalizedTrie;
```

That would be better.
lbrynaut (Migrated from github.com) reviewed 2018-09-24 19:23:28 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-24 19:23:27 +02:00

This doesn't address what's needed (and doesn't save any memory, just moves it). A future release will delete the non-normalized trie, which will reduce memory, but can't be done until we're safely past the fork.

BrannonKing commented 2018-09-24 23:07:57 +02:00 (Migrated from github.com)

My experience with this branch thus far (with fork height at 1M):

  1. I originally ran it with my existing mainnet data (including txindex). I was told that I needed to reindex.
  2. I let the reindex run over the weekend; it was soon obvious that it needed a lot of time. It took 50+ hours.
  3. I stopped the server so I could set the fork block height, recompiled, and reran. I was again told that I needed to reindex.
  4. I then fired up a lbrycrdd build from a different branch. It crashed with this error: `main.cpp:2124: bool DisconnectBlock(const CBlock&, CValidationState&, const CBlockIndex*, CCoinsViewCache&, CClaimTrieCache&, bool*): Assertion 'pindex->GetBlockHash() == trieCache.getBestBlock()' failed.`
bvbfan (Migrated from github.com) reviewed 2018-09-25 15:48:54 +02:00
bvbfan (Migrated from github.com) commented 2018-09-25 15:48:54 +02:00

My concern is that operator delete isn't called; it can do many things other than free memory.
lbrynaut (Migrated from github.com) reviewed 2018-09-26 13:42:54 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-26 13:42:54 +02:00

Understood, but it needs to remain in memory and intact past the fork. A future release will remove it (by deleting it) when it's clear that we can. If you're generally pointing out that the single pointer is leaked, it's not particularly helpful because we know there are leaks in the claimtrie already (and this one isn't one in particular to be concerned about, IMO). The proposed suggestion doesn't change that either. Upstream uses managed pointers, and while our first update to it will not, it will be much easier to transition after that update. I was originally planning to do that before the upstream release, but it makes sense to do it after since we can skip boost pointers altogether and just use std ones.

lbrynaut (Migrated from github.com) reviewed 2018-09-26 13:43:02 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-26 13:43:01 +02:00

Normalization can narrow the claimtrie name space.

lbrynaut commented 2018-09-26 13:48:13 +02:00 (Migrated from github.com)

> My experience with this branch thus far (with fork height at 1M):

@BrannonKing I dusted off the webs, please test and find more (fork related) issues.
BrannonKing commented 2018-09-26 18:16:29 +02:00 (Migrated from github.com)

The reindex issue is resolved. However, I was unable to run past the fork height. I modified the fork height to occur in the next few minutes. However, it never went past the block height one before my fork height. I waited about 20min. This is on mainnet. I'm seeing a lot of this error:

```
EXCEPTION: 15dbwrapper_error
Database I/O error
lbrycrd in ProcessMessages()
```
kaykurokawa (Migrated from github.com) reviewed 2018-09-26 18:19:10 +02:00
kaykurokawa (Migrated from github.com) commented 2018-09-26 18:19:09 +02:00

Looks like you were testing this and forgot to change it back?
kaykurokawa commented 2018-09-26 18:22:45 +02:00 (Migrated from github.com)

> The reindex issue is resolved

Yes, same for me. The previous PR told me I needed to reindex when starting with mainnet data.

> I modified the fork height to occur in the next few minutes. However, it never went past the block height one before my fork height. I waited about 20min. This is on mainnet.

Brannon, that sounds like expected behavior, because the fork happened in your codebase but not on mainnet.
BrannonKing commented 2018-09-26 18:36:01 +02:00 (Migrated from github.com)

If I try to run on mainnet with a fork height that is already past, it crashes with this error:

```
EXCEPTION: 15dbwrapper_error
Database I/O error
lbrycrd in AppInit()
```
lbrynaut (Migrated from github.com) reviewed 2018-09-26 18:56:10 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-26 18:56:10 +02:00

No, I left it open so it can be decided by all of us. Locale-aware or not. Choose wisely ;-)

kaykurokawa (Migrated from github.com) reviewed 2018-09-26 21:16:36 +02:00
kaykurokawa (Migrated from github.com) commented 2018-09-26 21:16:35 +02:00

According to http://charette.no-ip.com:81/programming/doxygen/boost/namespaceboost_1_1algorithm.html#a1a318a47f9b4482c28552a62d0297a27 , it seems to be using std::locale() which is whatever the global locale is.

And it seems like the global locale always defaults to the "C" locale for C++ programs? (At least that's what it is set to for me, and other sources seem to say that the global locale should always be "C".)
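A small example of the behavior being described, assuming boost::algorithm (this is not code from the PR): with the default global locale, only ASCII letters are lower-cased and multi-byte UTF-8 sequences pass through untouched.

```
#include <boost/algorithm/string/case_conv.hpp>
#include <cassert>
#include <string>

int main()
{
    // "Ö" is the two UTF-8 bytes 0xC3 0x96. Neither byte is an upper-case
    // character in the default "C" locale, so to_lower_copy leaves them alone
    // and only the ASCII letters change.
    const std::string name = "GR\xC3\x96SSE";
    assert(boost::algorithm::to_lower_copy(name) == "gr\xC3\x96sse");

    // An explicit std::locale can be passed as a second argument, but proper
    // UTF-8-aware lower-casing still needs ICU (e.g. boost::locale::to_lower),
    // which is why this PR adds the dependency.
    return 0;
}
```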
kaykurokawa (Migrated from github.com) reviewed 2018-09-26 22:04:18 +02:00
kaykurokawa (Migrated from github.com) commented 2018-09-26 22:04:18 +02:00

I don't understand exactly what can cause a string to shrink here?

lbrynaut (Migrated from github.com) reviewed 2018-09-26 22:11:53 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-26 22:11:53 +02:00

Right, C is usually the default, which is not UTF-8. The goal is to be systematic (i.e. consistently using normalization), and then applying some effort on lower casing (where a perfect system for all encodings doesn't exist). So long as we're consistent, it's best effort IMO. I'm open to other takes on it.

lbrynaut (Migrated from github.com) reviewed 2018-09-26 22:15:17 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-26 22:15:17 +02:00

That's for UTF-8 code points after normalization. For example, 0xc0a0 and 0x20 are encodings of the same thing, but collapsing the former (due to the lower casing) is not handled as equivalent. When there's a dispute, we use the original. Perhaps we could opt to use something else in that case, but again, that's for consistency.

lbrynaut (Migrated from github.com) reviewed 2018-09-26 22:31:40 +02:00
lbrynaut (Migrated from github.com) commented 2018-09-26 22:31:40 +02:00

> I don't understand exactly what can cause a string to shrink here?

Also, technically it isn't a string as you normally think of them anymore. It is a string of byte-length code points, which are variably sized to make up the characters. Sometimes a single encoded character may be N bytes, but after normalization it may be M.
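A concrete illustration of the byte-length changes being described, assuming boost::locale::normalize with the ICU backend (the normalization forms here are only examples, not necessarily the ones this PR uses):

```
#include <boost/locale.hpp>
#include <cassert>
#include <string>

int main()
{
    const std::locale loc = boost::locale::generator()("en_US.UTF-8");

    // U+212B ANGSTROM SIGN is 3 bytes in UTF-8 (0xE2 0x84 0xAB). NFC maps it
    // to U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE), which is 2 bytes
    // (0xC3 0x85), so the string shrinks.
    const std::string angstrom = "\xE2\x84\xAB";
    assert(boost::locale::normalize(angstrom, boost::locale::norm_nfc, loc).size() == 2);

    // U+00C5 is 2 bytes, but NFD decomposes it into U+0041 U+030A ('A' plus a
    // combining ring above), which is 3 bytes, so the string can also grow.
    const std::string a_ring = "\xC3\x85";
    assert(boost::locale::normalize(a_ring, boost::locale::norm_nfd, loc).size() == 3);
    return 0;
}
```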
kaykurokawa (Migrated from github.com) reviewed 2018-09-26 23:18:22 +02:00
kaykurokawa (Migrated from github.com) commented 2018-09-26 23:18:21 +02:00

Let me rephrase: I get why the size of the un-normalized string could shrink after normalization.

But I don't get why we are using that as the basis for determining whether we return the original un-normalized string or the normalized string. Is there something special about this rule? Can you elaborate on your specific example of 0x20 and 0xc0a0 here? What are those characters, and why aren't they handled as equivalent?
kaykurokawa commented 2018-09-27 01:28:11 +02:00 (Migrated from github.com)

I think we should discuss what happens to bytes that are invalid UTF-8. Because UTF-8 is variable width, not all byte values are valid UTF-8. It seems like the normalization function we have now will parse invalid bytes without throwing an error. I'm not sure exactly what the logic is, but if I try passing a byte that is invalid, like 0xFF, to the normalization function, it returns 0xFF back because it hits this line in the normalization function:

`return (normalized.size() >= name.size()) ? normalized : name;`

The variable "normalized" was actually empty when I inspected it. I created this unit test gist for testing purposes; you can check it out: https://gist.github.com/kaykurokawa/2c35843eb09da2bc8f31ffac46de4099

So two questions. The first question is whether we should attempt to normalize invalid bytes at all; perhaps we should reject them from ever entering the claimtrie? The second question is, if we are normalizing invalid bytes, then is this current method correct (is it possible that invalid bytes could be accidentally normalized into valid bytes)?
BrannonKing commented 2018-09-27 18:25:40 +02:00 (Migrated from github.com)

If our standard is UTF-8, we should reject anything that is not valid UTF-8. That's my vote.

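If that route were taken, one possible shape for the check is sketched below (not code from this PR; it leans on boost::locale's strict UTF-8 transcoding rather than a hand-rolled validator, and the helper name is hypothetical):

```
#include <boost/locale/encoding_utf.hpp>
#include <string>

// Hypothetical helper: true only if `name` is well-formed UTF-8.
// conv::stop makes utf_to_utf throw conversion_error on any invalid sequence,
// e.g. a lone 0xFF byte or an overlong encoding such as 0xC0 0xA0.
bool isValidUtf8(const std::string& name)
{
    try {
        boost::locale::conv::utf_to_utf<char>(name.data(), name.data() + name.size(),
                                              boost::locale::conv::stop);
        return true;
    } catch (const boost::locale::conv::conversion_error&) {
        return false;
    }
}
```

Claims failing such a check could then be rejected before they ever reach the claimtrie, which is the policy question being debated here.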
BrannonKing commented 2018-09-27 18:28:08 +02:00 (Migrated from github.com)

I tried to test this in a simple fashion this morning with no luck:

  1. Remove `~/.lbrycrd/regtest`
  2. Compile with regtest normalization fork height at 20.
  3. Run the daemon: `./src/lbrycrdd -server -txindex -regtest -walletbroadcast=0`
  4. Generate 30 blocks: `./lbrycrd/src/lbrycrd-cli -regtest generate 30`
  5. Get this error:

```
error code: -1
error message:
Database I/O error
```
lbrynaut commented 2018-10-03 18:23:38 +02:00 (Migrated from github.com)

> I tried to test this in a simple fashion this morning with no luck:
>
> 1. Remove `~/.lbrycrd/regtest`
> 2. Compile with regtest normalization fork height at 20.
> 3. Run the daemon: `./src/lbrycrdd -server -txindex -regtest -walletbroadcast=0`
> 4. Generate 30 blocks: `./lbrycrd/src/lbrycrd-cli -regtest generate 30`
> 5. Get this error:
>
> ```
> error code: -1
> error message:
> Database I/O error
> ```

@BrannonKing Thanks, give it a go again; this is corrected.
lbrynaut commented 2018-10-03 18:33:40 +02:00 (Migrated from github.com)

> So two questions. The first question is whether we should attempt to normalize invalid bytes at all; perhaps we should reject them from ever entering the claimtrie? The second question is, if we are normalizing invalid bytes, then is this current method correct (is it possible that invalid bytes could be accidentally normalized into valid bytes)?

@kaykurokawa Good questions. There are a few issues that need addressing:

  1. Normalization can fail
  2. Lower-casing can fail
    [ Generically, we call both steps normalization in the code ]

The general approach I was going for is that if either fails, fall back to our current behavior (which is to just treat the bytes as bytes in the claimtrie). Maybe that was a bad assumption about what we need, or maybe it's not working as intended (although it appears to be working to me).

Glad we're finally doing this kind of review, though, because I did expect this process to take some time and we're getting to the hard parts. Open to suggestions. I'm not sure rejecting any non-UTF-8 inputs is the correct solution, as it places a larger burden on the applications using lbrycrd, which really may not care about this feature.

Either way, I think we can all agree that consistency is the most important thing, in that the same byte or code-point sequences are stored consistently in the claimtrie.
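For reference, the fall-back approach described above might look roughly like this (an illustrative sketch assuming boost::locale; the normalization form and the exact failure handling in the PR may differ, though the final size comparison mirrors the line quoted earlier in this thread):

```
#include <boost/locale.hpp>
#include <string>

// Sketch of "normalize, lower-case, and fall back to the raw bytes on failure".
std::string normalizeClaimNameSketch(const std::string& name, const std::locale& loc)
{
    try {
        std::string normalized = boost::locale::normalize(name, boost::locale::norm_nfd, loc);
        normalized = boost::locale::to_lower(normalized, loc);
        // Invalid input (e.g. the 0xFF byte discussed above) tends to come back
        // empty or shortened; in that case keep the original bytes so the
        // claimtrie still stores a consistent sequence for that input.
        return (normalized.size() >= name.size()) ? normalized : name;
    } catch (const std::exception&) {
        return name;
    }
}
```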
BrannonKing (Migrated from github.com) requested changes 2018-10-08 21:49:57 +02:00
BrannonKing (Migrated from github.com) commented 2018-10-08 21:40:46 +02:00

What if the existing node had additional claims on it?

BrannonKing (Migrated from github.com) commented 2018-10-08 21:38:31 +02:00

This line is brutally expensive.

BrannonKing (Migrated from github.com) commented 2018-10-08 21:39:30 +02:00

Again, regenerating the hashes for the entire trie (x2) is too expensive for logging.

BrannonKing (Migrated from github.com) commented 2018-10-08 21:42:23 +02:00

This line looks suspicious to me; what if the previous node had claims on it? It appears that we don't keep those. Is this where we're losing claims? We need to change them so that all the original ones still complete; don't throw them away.

@ -1663,3 +1593,4 @@
{
name = normalizeClaimName(name);
return removeClaim(name, outPoint, nHeight, nValidAtHeight, true);
}
BrannonKing (Migrated from github.com) commented 2018-10-08 21:46:15 +02:00

I know this is a preexisting line, but it seems to make a harsh assumption about the order that data in the trie is read from disk (it must be "pre-order").

@ -2745,3 +2758,4 @@
}
int64_t nTime3 = GetTimeMicros(); nTimeConnect += nTime3 - nTime2;
LogPrint("bench", " - Connect %u transactions: %.2fms (%.3fms/tx, %.3fms/txin) [%.2fs]\n", (unsigned)block.vtx.size(), 0.001 * (nTime3 - nTime2), 0.001 * (nTime3 - nTime2) / block.vtx.size(), nInputs <= 1 ? 0 : 0.001 * (nTime3 - nTime2) / (nInputs-1), nTimeConnect * 0.000001);
BrannonKing (Migrated from github.com) commented 2018-10-08 21:35:44 +02:00

Is there an advantage to having this code to enable the normalization go here rather than in the CClaimTrieCache object (since the latter also gets IncrementBlock called)? What rule would guide that?

BrannonKing (Migrated from github.com) commented 2018-10-08 21:36:27 +02:00

Why do we need to make a new trieCache object here?

BrannonKing (Migrated from github.com) reviewed 2018-10-08 22:00:37 +02:00
BrannonKing (Migrated from github.com) commented 2018-10-08 22:00:30 +02:00

Thinking about this some: an alternative to this would be to simply walk the trie and add necessary changes to the insertion and deletion queues. As written this seems to make an assumption that the vast majority of the trie will have to change. If we modify the cache instead the persisted undo info would allow us to rollback without having to store the duplicated trie in RAM.

BrannonKing (Migrated from github.com) requested changes 2018-10-08 22:34:49 +02:00
BrannonKing (Migrated from github.com) commented 2018-10-08 22:34:43 +02:00

This line is missing the `normalizedClaimName` on `it->first`? And for all the items in this copy, do we not need to merge them instead of replacing them, as there might be conflicts?
lbrynaut (Migrated from github.com) reviewed 2018-10-08 23:30:47 +02:00
lbrynaut (Migrated from github.com) commented 2018-10-08 23:30:47 +02:00

In what way? Show me the measurements.

BrannonKing (Migrated from github.com) reviewed 2018-10-09 00:07:00 +02:00
BrannonKing (Migrated from github.com) commented 2018-10-09 00:07:00 +02:00

We presently have 5.5 million nodes in the trie. `getMerkleHash(true)` triggers a rehash on every one of those. I just measured the call at 3 microseconds each. This was on a non-optimized build, but even if you figure 1 microsecond per hash call, you're looking at 10 seconds of CPU hit for that one line.
bvbfan (Migrated from github.com) reviewed 2018-10-09 10:11:22 +02:00
@ -742,12 +749,20 @@ if test x$use_pkgconfig = xyes; then
AC_MSG_ERROR(pkg-config not found.)
fi
if test "${ICU_PREFIX}" != "auto"; then
bvbfan (Migrated from github.com) commented 2018-10-09 10:11:22 +02:00

I investigated why configure without `--with-icu` fails. This section is for pkg-config; you can replace lines 756-765 with

```
PKG_CHECK_MODULES([ICUI18N], [icu-i18n],, [AC_MSG_ERROR(icu-i18n  not found.)])
```

Then it will produce all the needed flags and libs for compilation; the `else` section (lines 792-802) looks correct.
BrannonKing (Migrated from github.com) reviewed 2018-10-09 19:08:17 +02:00
@ -742,12 +749,20 @@ if test x$use_pkgconfig = xyes; then
AC_MSG_ERROR(pkg-config not found.)
fi
if test "${ICU_PREFIX}" != "auto"; then
BrannonKing (Migrated from github.com) commented 2018-10-09 19:08:17 +02:00

I just tested this and still got the same error as before: `cannot determine absolute directory name of 'auto/lib'`. I cleaned my repository. I modified configure.ac as suggested above, then reran autogen.sh and configure (`CXXFLAGS="-g" ./configure --enable-cxx --enable-static --disable-shared --with-pic --with-gui=no`) and make.
bvbfan (Migrated from github.com) reviewed 2018-10-09 19:50:12 +02:00
@ -742,12 +749,20 @@ if test x$use_pkgconfig = xyes; then
AC_MSG_ERROR(pkg-config not found.)
fi
if test "${ICU_PREFIX}" != "auto"; then
bvbfan (Migrated from github.com) commented 2018-10-09 19:50:11 +02:00

That's the diff; it works here.

```
diff --git a/configure.ac b/configure.ac
index 3d9f513d9..243edf5b5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -753,15 +753,9 @@ if test x$use_pkgconfig = xyes; then
   m4_ifdef(
     [PKG_CHECK_MODULES],
     [
-      ICU_CPPFLAGS="-I$ICU_PREFIX/include"
-      CPPFLAGS="$ICU_CPPFLAGS $CPPFLAGS"
-      ICU_LIBS="-L$ICU_PREFIX/lib -licui18n -licuuc -licudata -dl"
-      AC_MSG_WARN([Using ICU_CPPFLAGS $ICU_CPPFLAGS])
-      AC_MSG_WARN([Using CPPFLAGS $CPPFLAGS])
+      PKG_CHECK_MODULES([ICUI18N], [icu-i18n],, [AC_MSG_ERROR(icu-i18n  not found.)])

       AC_CHECK_HEADER([unicode/errorcode.h],,AC_MSG_ERROR(libicu headers missing))
-      AC_CHECK_LIB([icudata], [main], ICU_LIBS="-L$ICU_PREFIX/lib -licui18n -licuuc -licudata -dl",
-        AC_CHECK_LIB([icu18n], [main],ICU_LIBS=$ICU_LIBS, ICU_LIBS="-L$ICU_PREFIX/lib -lsicudt -lsicuin -lsicuio -lsicule -lsiculx -lsicutest -lsicutu -lsicuuc"))

       PKG_CHECK_MODULES([SSL], [libssl],, [AC_MSG_ERROR(openssl  not found.)])
       PKG_CHECK_MODULES([CRYPTO], [libcrypto],,[AC_MSG_ERROR(libcrypto  not found.)])
```
BrannonKing commented 2018-10-12 18:49:40 +02:00 (Migrated from github.com)

I'm getting close to having a solution that doesn't require keeping two tries in memory. As part of that work, I've been narrowing in on two rules:

  1. You cannot normalize a string inside of a method that takes a partial key; you run the risk of losing data.
  2. You cannot normalize inside of a claimtrie method called from claimtriecache; they aren't guaranteed to be on the same block height. (It's possible that we could make this work by carefully choosing where to turn the normalization handling in the claimtrie on/off, but I haven't seen a need for it yet.)
lbrynaut (Migrated from github.com) reviewed 2018-10-19 15:28:14 +02:00
lbrynaut (Migrated from github.com) commented 2018-10-19 15:28:13 +02:00

Removal of this assert is fine (it's no longer needed using this approach).

bvbfan (Migrated from github.com) reviewed 2018-10-19 16:13:11 +02:00
@ -2077,0 +2068,4 @@
bool CClaimTrieCache::incrementBlock(insertUndoType& insertUndo, claimQueueRowType& expireUndo,
insertUndoType& insertSupportUndo, supportQueueRowType& expireSupportUndo,
std::vector<std::pair<std::string, int> >& takeoverHeightUndo) const
bvbfan (Migrated from github.com) commented 2018-10-19 16:13:11 +02:00

Why is this code commented out?
bvbfan (Migrated from github.com) reviewed 2018-10-19 16:45:41 +02:00
bvbfan (Migrated from github.com) commented 2018-10-19 16:45:41 +02:00

For OpenSSL 1.0 you can use EVP_MD_CTX_create, or

```
#if OPENSSL_VERSION_NUMBER >= 0x10000000L && OPENSSL_VERSION_NUMBER < 0x10100000L
#define EVP_MD_CTX_new EVP_MD_CTX_create
#define EVP_MD_CTX_free EVP_MD_CTX_destroy
#endif
```

Then you can use EVP_MD_CTX_new and EVP_MD_CTX_free unconditionally.
bvbfan (Migrated from github.com) reviewed 2018-10-19 16:49:01 +02:00
bvbfan (Migrated from github.com) commented 2018-10-19 16:49:01 +02:00

If this assert fails, it looks like it is comparing a normalized and a non-normalized name; that's pretty likely when it's used in incrementBlock.
BrannonKing (Migrated from github.com) reviewed 2018-10-19 21:18:32 +02:00
@ -2077,0 +2068,4 @@
bool CClaimTrieCache::incrementBlock(insertUndoType& insertUndo, claimQueueRowType& expireUndo,
insertUndoType& insertSupportUndo, supportQueueRowType& expireSupportUndo,
std::vector<std::pair<std::string, int> >& takeoverHeightUndo) const
BrannonKing (Migrated from github.com) commented 2018-10-19 21:18:31 +02:00

It was for a previous discussion and is now removed.
BrannonKing (Migrated from github.com) reviewed 2018-10-19 21:19:33 +02:00
BrannonKing (Migrated from github.com) commented 2018-10-19 21:19:33 +02:00

This wasn't meant to be part of this review; make sure #221 got it correct.
BrannonKing (Migrated from github.com) reviewed 2018-10-19 21:22:50 +02:00
BrannonKing (Migrated from github.com) commented 2018-10-19 21:22:50 +02:00

I found and fixed the problem here. Because I was reusing the InsertUndo I had to ensure that the decrementBlock only put appropriate items back into the QueueCache.

BrannonKing (Migrated from github.com) reviewed 2018-10-19 21:23:18 +02:00
@ -2745,3 +2758,4 @@
}
int64_t nTime3 = GetTimeMicros(); nTimeConnect += nTime3 - nTime2;
LogPrint("bench", " - Connect %u transactions: %.2fms (%.3fms/tx, %.3fms/txin) [%.2fs]\n", (unsigned)block.vtx.size(), 0.001 * (nTime3 - nTime2), 0.001 * (nTime3 - nTime2) / block.vtx.size(), nInputs <= 1 ? 0 : 0.001 * (nTime3 - nTime2) / (nInputs-1), nTimeConnect * 0.000001);
BrannonKing (Migrated from github.com) commented 2018-10-19 21:23:18 +02:00

removed in my changes

BrannonKing (Migrated from github.com) reviewed 2018-10-19 21:23:35 +02:00
@ -2745,3 +2758,4 @@
}
int64_t nTime3 = GetTimeMicros(); nTimeConnect += nTime3 - nTime2;
LogPrint("bench", " - Connect %u transactions: %.2fms (%.3fms/tx, %.3fms/txin) [%.2fs]\n", (unsigned)block.vtx.size(), 0.001 * (nTime3 - nTime2), 0.001 * (nTime3 - nTime2) / block.vtx.size(), nInputs <= 1 ? 0 : 0.001 * (nTime3 - nTime2) / (nInputs-1), nTimeConnect * 0.000001);
BrannonKing (Migrated from github.com) commented 2018-10-19 21:23:35 +02:00

resolved

BrannonKing (Migrated from github.com) reviewed 2018-10-19 21:24:36 +02:00
@ -742,12 +749,20 @@ if test x$use_pkgconfig = xyes; then
AC_MSG_ERROR(pkg-config not found.)
fi
if test "${ICU_PREFIX}" != "auto"; then
BrannonKing (Migrated from github.com) commented 2018-10-19 21:24:36 +02:00

included in the latest checkin

BrannonKing (Migrated from github.com) approved these changes 2018-10-19 21:24:58 +02:00
bvbfan (Migrated from github.com) reviewed 2018-10-22 08:26:43 +02:00
bvbfan (Migrated from github.com) commented 2018-10-22 08:26:41 +02:00

It should be static; inlining here is unwanted. Furthermore, the check for whether normalization should be performed should not be here. For now it should be callable from both CClaimTrie and CClaimTrieCache, depending on their heights.
BrannonKing commented 2018-10-22 16:42:11 +02:00 (Migrated from github.com)

@bvbfan , excellent analysis. With our work on #44 it appears that we achieved independence from methods taking a name in rpc/claimtrie.cpp. I think that is critical to your suggestion that we don't do string normalization in the CClaimTrie (instead, we do it all in the "cache"). I really like the suggestion! I think that puts a high priority on getting #44 merged and rebasing our normalization on top of that.

BrannonKing (Migrated from github.com) reviewed 2018-10-30 00:05:04 +01:00
BrannonKing (Migrated from github.com) commented 2018-10-30 00:05:04 +01:00

fixed in last push

BrannonKing (Migrated from github.com) reviewed 2018-10-30 00:07:31 +01:00
BrannonKing (Migrated from github.com) commented 2018-10-30 00:07:31 +01:00

Fixed in current push.

bvbfan (Migrated from github.com) reviewed 2018-10-30 14:25:56 +01:00
@ -2291,3 +2318,2 @@
{
if (*itNamesToCheck == itSupportQueue->first && itSupportQueue->second.outPoint == itSupportQueueName->outPoint && itSupportQueue->second.nValidAtHeight == itSupportQueueName->nHeight)
{
if (nameToCheck == itSupportQueue->first && itSupportQueue->second.outPoint == itSupportQueueName->outPoint && itSupportQueue->second.nValidAtHeight == itSupportQueueName->nHeight) {
bvbfan (Migrated from github.com) commented 2018-10-30 14:25:55 +01:00

I don't see a reason to make the param in/out.
BrannonKing (Migrated from github.com) reviewed 2018-10-30 14:29:35 +01:00
@ -2291,3 +2318,2 @@
{
if (*itNamesToCheck == itSupportQueue->first && itSupportQueue->second.outPoint == itSupportQueueName->outPoint && itSupportQueue->second.nValidAtHeight == itSupportQueueName->nHeight)
{
if (nameToCheck == itSupportQueue->first && itSupportQueue->second.outPoint == itSupportQueueName->outPoint && itSupportQueue->second.nValidAtHeight == itSupportQueueName->nHeight) {
BrannonKing (Migrated from github.com) commented 2018-10-30 14:29:35 +01:00

The out parameter is used in mining.cpp and main.cpp (to avoid duplicate work on that name). Out parameter usage is not very obvious, though. I'm open to other suggestions.

bvbfan (Migrated from github.com) reviewed 2018-10-30 15:03:18 +01:00
@ -2291,3 +2318,2 @@
{
if (*itNamesToCheck == itSupportQueue->first && itSupportQueue->second.outPoint == itSupportQueueName->outPoint && itSupportQueue->second.nValidAtHeight == itSupportQueueName->nHeight)
{
if (nameToCheck == itSupportQueue->first && itSupportQueue->second.outPoint == itSupportQueueName->outPoint && itSupportQueue->second.nValidAtHeight == itSupportQueueName->nHeight) {
bvbfan (Migrated from github.com) commented 2018-10-30 15:03:17 +01:00

Yeah, I see now. The right approach, to me, is to call normalize before spend; we can prevent double work if we have a function like isNormalized before the real normalization. But I don't think the double call will slow things down significantly.
BrannonKing (Migrated from github.com) requested changes 2018-11-02 23:23:48 +01:00
@ -368,0 +380,4 @@
CPPFLAGS="${CPPFLAGS}" LDFLAGS="${LDFLAGS}" \
./configure --without-gui ${OPTIONS} \
--with-boost="${BOOST_PREFIX}" \
--with-icu="${ICU_PREFIX}" >> "${LBRYCRD_LOG}" 2>&1
BrannonKing (Migrated from github.com) commented 2018-11-02 23:23:43 +01:00

This is not going to be a good enough substitute for static linking. When I run `ldd` on a build from this script I see `libicui18n.so.55 => not found, libicuuc.so.55 => not found`. It's possible that we could deploy some shared libraries alongside our executables.
bvbfan (Migrated from github.com) reviewed 2018-11-05 10:35:30 +01:00
@ -368,0 +380,4 @@
CPPFLAGS="${CPPFLAGS}" LDFLAGS="${LDFLAGS}" \
./configure --without-gui ${OPTIONS} \
--with-boost="${BOOST_PREFIX}" \
--with-icu="${ICU_PREFIX}" >> "${LBRYCRD_LOG}" 2>&1
bvbfan (Migrated from github.com) commented 2018-11-05 10:35:29 +01:00

Notice that line 333 has `--enable-shared`, so building with this option produces dynamic libraries (*.so); for a static build we want static ones (*.a), so it should be `--disable-shared --enable-static`.
bvbfan (Migrated from github.com) reviewed 2018-11-05 13:31:18 +01:00
bvbfan (Migrated from github.com) commented 2018-11-05 13:31:17 +01:00

The function's opening brace goes on a new line; that's the tidy expectation.
bvbfan (Migrated from github.com) reviewed 2018-11-05 13:31:37 +01:00
bvbfan (Migrated from github.com) commented 2018-11-05 13:31:37 +01:00

Opening brace on new line.

BrannonKing (Migrated from github.com) reviewed 2018-11-05 19:26:20 +01:00
BrannonKing (Migrated from github.com) commented 2018-11-05 19:26:19 +01:00

I haven't run the formatter on this yet. I assume that would pick this up?
bvbfan (Migrated from github.com) reviewed 2018-11-05 19:29:17 +01:00
bvbfan (Migrated from github.com) commented 2018-11-05 19:29:17 +01:00

Yes, it will.

BrannonKing (Migrated from github.com) reviewed 2018-11-07 01:05:19 +01:00
@ -368,0 +380,4 @@
CPPFLAGS="${CPPFLAGS}" LDFLAGS="${LDFLAGS}" \
./configure --without-gui ${OPTIONS} \
--with-boost="${BOOST_PREFIX}" \
--with-icu="${ICU_PREFIX}" >> "${LBRYCRD_LOG}" 2>&1
BrannonKing (Migrated from github.com) commented 2018-11-07 01:05:19 +01:00

fixed

BrannonKing (Migrated from github.com) approved these changes 2018-11-08 06:28:10 +01:00
BrannonKing commented 2018-11-08 06:48:55 +01:00 (Migrated from github.com)

I did finally get a green on the Linux compilation again. Some notes on my issues with the reproducible_build.sh:

  1. The ICU_PREFIX was being exported to the child makefiles but wasn't always set to a real value when relying on pkg-config. This was causing issues. I changed the child makefiles to rely wholly upon ICU_LIBS and ICU_CPPFLAGS, which should always be set.
  2. I could not get ICU v55 to compile statically. I upped it to v57. Our current version of Boost (v59) would not compile with newer versions of ICU.
  3. The script sets "-e", or maybe it's "-u", which makes it exit whenever any command returns nonzero. It also sets pipefail, which makes it keep the worst result from any command in a pipeline. The flag handling in the wait method was always returning 1 on my machine, causing the whole script to exit. I ditched the flag usage there in favor of a sub-terminal.
  4. The parameters for the boost b2 command weren't getting grouped appropriately. I put in quotes. This made it so that I could not make that call from the "background" function. I'm hoping someone else has a solution for that. The background function seems to run it fine, it just doesn't find the ICU deps when called from there. I could not figure that out. I did add a grep check to ensure that the ICU deps are found.
  5. Because I'm now compiling Boost without the background minute message, TravisCI times out at 10 minutes when compiling Boost afresh. I don't have a solution for this yet. It wants output every 10 minutes. It's ridiculous that the timeout is not configurable. Instead, we push the output to a log file for each dependency build. I'm not sure that's actually a desirable feature. The other side effect of that is that we hide any error reported by our script behind a post-dump of 200 or 1000 lines of the log file.
  6. The TravisCI caching keeps the build folder around. Our script was checking to see if the dependency's parent folder existed there. If it did exist, it would not run make on that dependency. Hence, if you fail on a dependency it won't get rebuilt because of the TravisCI cache. Not only that, but build/boost was always there -- even if I deleted the cached file from TravisCI. I had to change the script to always build dependencies, which is fast for all except openssl, which thinks it's cool to rewrite a whole bunch of source code files with every configure call.
  7. The OSX build now fails with this error: "Too many levels of symbolic links" while compiling openssl a second time. I don't have a solution for that yet.
  8. Statically linking in ICU requires fPIC. I had to add that to the build of ICU.
  9. The "clone" flag that didn't do anything is now gone.
  10. The reproducible_build originally used pkg-config to find ICU, but it would possibly return a system installation. It's now hard-coded to the one pulled by the script.

The advantage of Docker is that you don't have to recompile the dependencies every time. We can stop wasting time on that. I feel strongly that we need to move in that direction.
BrannonKing commented 2018-11-22 17:01:16 +01:00 (Migrated from github.com)

#235 supersedes this PR.


Pull request closed

Reference: LBRYCommunity/lbrycrd#159