Add encoding for binary data returned from RPC, make upstream handle it and UTF8 #236

Closed
opened 2018-11-08 03:56:16 +01:00 by kaykurokawa · 7 comments
kaykurokawa commented 2018-11-08 03:56:16 +01:00 (Migrated from github.com)

Three RPC/CLI issues related to the Normalization hard fork PR #159

I've had some problems being able to input utf8 strings into the CLI interface .. I'm not sure what the problem is but it does not seem to be parsing the string properly, I made sure my command line supported utf8. I don't think there is any changes we need to make to lbrycrd but I could be wrong..

For RPC, input works fine (as long as your JSON RPC supports it, the popular one does https://github.com/jgarzik/python-bitcoinrpc ). However, I realized that the RPC output actually encodes unicode in this weird way: https://github.com/lbryio/lbrycrd/blob/master/src/univalue/lib/univalue_write.cpp#L15 , so this is actually a bit annoying here as we need to decode this properly, or perhaps its ok to remove this encoding? I presume its there because some command lines may support only ASCII...

I think an RPC call that lets the user know what normalization does would be good:

checknormalization(name) -> returns normalized name

This function was added here: https://github.com/lbryio/lbrycrd/pull/235, but will probably need some additional return arguments based on what we decide for https://github.com/lbryio/lbrycrd/issues/234 (i.e, are we returning a normalized string with the casing preserved?)

Further I think existing RPC calls need to be modified depending on what we decide here https://github.com/lbryio/lbrycrd/issues/234 , i.e. are we normalizing the input "name" for calls like getvalueforname() and getclaimsforname() or are we expecting that the input is already normalized?

Three RPC/CLI issues related to the Normalization hard fork PR #159 1) I've had some problems being able to input utf8 strings into the CLI interface .. I'm not sure what the problem is but it does not seem to be parsing the string properly, I made sure my command line supported utf8. I don't think there is any changes we need to make to lbrycrd but I could be wrong.. For RPC, input works fine (as long as your JSON RPC supports it, the popular one does https://github.com/jgarzik/python-bitcoinrpc ). However, I realized that the RPC output actually encodes unicode in this weird way: https://github.com/lbryio/lbrycrd/blob/master/src/univalue/lib/univalue_write.cpp#L15 , so this is actually a bit annoying here as we need to decode this properly, or perhaps its ok to remove this encoding? I presume its there because some command lines may support only ASCII... 2) I think an RPC call that lets the user know what normalization does would be good: checknormalization(name) -> returns normalized name This function was added here: https://github.com/lbryio/lbrycrd/pull/235, but will probably need some additional return arguments based on what we decide for https://github.com/lbryio/lbrycrd/issues/234 (i.e, are we returning a normalized string with the casing preserved?) 3) Further I think existing RPC calls need to be modified depending on what we decide here https://github.com/lbryio/lbrycrd/issues/234 , i.e. are we normalizing the input "name" for calls like getvalueforname() and getclaimsforname() or are we expecting that the input is already normalized?
bvbfan commented 2018-11-08 09:16:43 +01:00 (Migrated from github.com)

Further I think existing RPC calls need to be modified depending on what we decide here #234 , i.e. are we normalizing the input "name" for calls like getvalueforname() and getclaimsforname() or are we expecting that the input is already normalized?

We have relation one to many, normalization name to claims and supports so we expect non-normalized string, then we find node by normalized it, results should be filter by non-normalized string again to show exactly desired claims. So we should make changes to these RPC methods as well, good point. But, we don't need such a method like checknormalization, let's make an example:
let's we have 3 claims Dog, DOg, DoG
checknormalization DOg => dog
getclaimsforname dog => will return nothing, why? because we have 3 claims but none of them is exactly lowercase. The user desire is DOg so calling checknormalization will lie him, instead:
getclaimsforname DOg => will return only 1 claim, exactly named DOg

Normalization should be transparent and makes sense in trie not outside so if you have before normalization one claim DOg you will have same result afterwards.

> Further I think existing RPC calls need to be modified depending on what we decide here #234 , i.e. are we normalizing the input "name" for calls like getvalueforname() and getclaimsforname() or are we expecting that the input is already normalized? We have relation one to many, normalization name to claims and supports so we expect non-normalized string, then we find node by normalized it, results should be filter by non-normalized string again to show exactly desired claims. So we should make changes to these RPC methods as well, good point. But, we don't need such a method like checknormalization, let's make an example: let's we have 3 claims Dog, DOg, DoG checknormalization DOg => dog getclaimsforname dog => will return nothing, why? because we have 3 claims but none of them is exactly lowercase. The user desire is DOg so calling checknormalization will lie him, instead: getclaimsforname DOg => will return only 1 claim, exactly named DOg Normalization should be transparent and makes sense in trie not outside so if you have before normalization one claim DOg you will have same result afterwards.
BrannonKing commented 2018-11-08 18:26:58 +01:00 (Migrated from github.com)

Responses to @kaykurokawa :

  1. I have no problem removing the output encoding. No one is running ascii-only shells. Let's do it.
  2. I think we've fully defined that our "normalization" includes a case change. We have two strings: the normalized form and the original. Let's not add a third state of "normalized and lower cased". I think #235 is sufficient.
  3. Yes, we normalize the input for any query. The user will need to see all claims that compete against each other. And the winning claim is against all names that normalize to the same thing.

Responses to @bvbfan :

  1. In your example, "getclaimsforname dog" should return all three. That's its existing behavior.
  2. "getclaimsforname" dog has to return the winning claim at the node. That's its existing behavior. The node lookup now requires that we normalize the input first.
  3. Yes, I want to keep the same behavior for names not in competition. However, normalizing the trie will cause a few additional competitions that were not pre-existing. That will be okay.
Responses to @kaykurokawa : 1. I have no problem removing the output encoding. No one is running ascii-only shells. Let's do it. 2. I think we've fully defined that our "normalization" includes a case change. We have two strings: the normalized form and the original. Let's not add a third state of "normalized and lower cased". I think #235 is sufficient. 3. Yes, we normalize the input for any query. The user will need to see all claims that compete against each other. And the winning claim is against all names that normalize to the same thing. Responses to @bvbfan : 1. In your example, "getclaimsforname dog" should return all three. That's its existing behavior. 2. "getclaimsforname" dog has to return the winning claim at the node. That's its existing behavior. The node lookup now requires that we normalize the input first. 3. Yes, I want to keep the same behavior for names not in competition. However, normalizing the trie will cause a few additional competitions that were not pre-existing. That will be okay.
bvbfan commented 2018-11-08 19:18:23 +01:00 (Migrated from github.com)

@BrannonKing, you are right for current normalization implementation, but i got the impression we will change it. Note test before normalization

BOOST_AUTO_TEST_CASE(claim_rpcs_claims_test)
{
    ClaimTrieChainFixture fixture;
    std::string sName1("DOg");
    std::string sName2("DoG");
    std::string sValue("test");

    rpcfn_type getclaimsforname = tableRPC["getclaimsforname"]->actor;

    UniValue claims;

    CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), sName1, sValue, 1);
    fixture.IncrementBlocks(1);

    CMutableTransaction tx2 = fixture.MakeClaim(fixture.GetCoinbase(), sName2, sValue, 2);
    fixture.IncrementBlocks(1);

    UniValue params1(UniValue::VARR);
    params1.push_back(UniValue(sName1)); // DOg

    claims = getclaimsforname(params1, false)["claims"];
    BOOST_CHECK(claims.size() == 1U);
    BOOST_CHECK(claims[0]["nEffectiveAmount"].get_int() == 1);

    UniValue params2(UniValue::VARR);
    params2.push_back(UniValue(sName2)); // DoG

    claims = getclaimsforname(params2, false)["claims"];
    BOOST_CHECK(claims.size() == 1U);
    BOOST_CHECK(claims[0]["nEffectiveAmount"].get_int() == 2);
}

It returns only one claim - the exactly case sensitive name, after normalization claims size will be 2. Since we can keep non-normalization name we can keep same behavior.

@BrannonKing, you are right for current normalization implementation, but i got the impression we will change it. Note test before normalization ``` BOOST_AUTO_TEST_CASE(claim_rpcs_claims_test) { ClaimTrieChainFixture fixture; std::string sName1("DOg"); std::string sName2("DoG"); std::string sValue("test"); rpcfn_type getclaimsforname = tableRPC["getclaimsforname"]->actor; UniValue claims; CMutableTransaction tx1 = fixture.MakeClaim(fixture.GetCoinbase(), sName1, sValue, 1); fixture.IncrementBlocks(1); CMutableTransaction tx2 = fixture.MakeClaim(fixture.GetCoinbase(), sName2, sValue, 2); fixture.IncrementBlocks(1); UniValue params1(UniValue::VARR); params1.push_back(UniValue(sName1)); // DOg claims = getclaimsforname(params1, false)["claims"]; BOOST_CHECK(claims.size() == 1U); BOOST_CHECK(claims[0]["nEffectiveAmount"].get_int() == 1); UniValue params2(UniValue::VARR); params2.push_back(UniValue(sName2)); // DoG claims = getclaimsforname(params2, false)["claims"]; BOOST_CHECK(claims.size() == 1U); BOOST_CHECK(claims[0]["nEffectiveAmount"].get_int() == 2); } ``` It returns only one claim - the exactly case sensitive name, after normalization claims size will be 2. Since we can keep non-normalization name we can keep same behavior.
BrannonKing commented 2018-11-29 10:52:13 +01:00 (Migrated from github.com)

Let's change this issue to just refer to the RPC output encoding. From our discussion yesterday, it sounded like people were in favor of holding off on changing the output encoding until the upstream merge is done. It doesn't have to be done as part of the normalization branch (and there are already many changes there). Part of that requires us to change how the value field is encoded. It will need to come out as hex or base64. Polls on that are ongoing.

Let's change this issue to just refer to the RPC output encoding. From our discussion yesterday, it sounded like people were in favor of holding off on changing the output encoding until the upstream merge is done. It doesn't have to be done as part of the normalization branch (and there are already many changes there). Part of that requires us to change how the value field is encoded. It will need to come out as hex or base64. Polls on that are ongoing.
kaykurokawa commented 2018-12-13 03:15:26 +01:00 (Migrated from github.com)

I tried changing https://github.com/lbryio/lbrycrd/blob/master/src/univalue/lib/univalue_write.cpp#L15 to not encode unicode in such format, but this change was not sufficient to print out utf8 characters through the CLI. The below error was returned,

error: couldn't parse reply from server

I tried changing https://github.com/lbryio/lbrycrd/blob/master/src/univalue/lib/univalue_write.cpp#L15 to not encode unicode in such format, but this change was not sufficient to print out utf8 characters through the CLI. The below error was returned, error: couldn't parse reply from server
BrannonKing commented 2018-12-13 09:58:42 +01:00 (Migrated from github.com)

@kaykurokawa , I did the same change originally and saw the same error. I realized that the univalue changes went all across the codebase. Hence, the only approach I had was to upgrade the whole package. That's what I did in the branch named "normalization_plus_base64". You're welcome to rebase that branch if needed.

@kaykurokawa , I did the same change originally and saw the same error. I realized that the univalue changes went all across the codebase. Hence, the only approach I had was to upgrade the whole package. That's what I did in the branch named "normalization_plus_base64". You're welcome to rebase that branch if needed.
BrannonKing commented 2019-05-08 17:39:57 +02:00 (Migrated from github.com)

Done in master. We now return hexadecimal for the value fields.

Done in master. We now return hexadecimal for the value fields.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: LBRYCommunity/lbrycrd#236
No description provided.