Normalization UX level discussion #234
Reference: LBRYCommunity/lbrycrd#234
Discuss here what should happen at the UX level for normalization. A lot of this will be done at the layers above lbrycrd, but some of the implementations described below will need some help from lbrycrd. I think the important thing to consider here is what an unnormalized claim name is. Is it just completely invalid? Or does it contain some information that we want to preserve?
a) After the normalization hard fork, what should happen when users attempt to make a claim with an unnormalized string (i.e. a user wants to make the claim DOG, but it will be normalized to dog)? Should we reject such claim attempts and tell them the correct normalized form? Should we allow them to make the claim, and then perhaps afterwards tell them what the proper normalized form is?
b) If a claim name is normalized into a different string, do we need to preserve the original unnormalized string for the user somehow? (i.e, a user makes a claim as DOG, we normalize it to dog, but we still store somewhere that the user wanted to make the claim as DOG). Do other users need to see the original unnormalized string as well (make it accessible via blockchain, instead of personal storage)?
c) After the normalization hard fork, what should happen when users attempt to search for an unnormalized string? Do we autocorrect to the normalized form? Or do we warn the user that it is an invalid string and give them the normalized form?
I might be missing some other UX considerations. Please list them if needed.
Personal opinions:
It seems like for a), it would be better to reject claim attempts instead of auto-normalizing, since the user may not be aware that normalization exists.
For b), I feel that it is not necessary to preserve the unnormalized string. Maybe the only downside is that users might be confused if they had claims before the normalization hard fork and see that their claims changed.
For c), I think this one does not make much of a difference, but it seems like autocorrect would work fine.
My vote:
a) making a claim with non-normalized characters will be unnoticeable to the user.
b) yes, the original string is always preserved. When running getClaimById you would always expect to see the original bytes. No user should ever care/know that we changed the case on DOG for our internal structure. RPC methods like getclaimtrie return original names/bytes for all claims.
c) searching for an unnormalized string will return the unnormalized string. We normalized it when we put it into the trie, we normalized the search text before we used it on the trie, and we pulled the original name bytes on the winning claim before returning.
My answers:
A) I think it is probably safest to reject these at the blockchain level but I'm somewhat ambivalent. Relatedly, normalization should be happening at other levels and I'm not sure this is filed. @BrannonKing if we do not have an epic for changes at other levels related to normalization, can you start one?
B) At the blockchain level, storing the non-normalized name would waste space. I am not concerned with losing historical names at this time.
C) Ambivalent. I would be okay with either auto-normalizing search terms or erroring but providing the correct one.
A and C could potentially be options with the default to be to normalize.
If we're doing any auto normalization, it would be good (necessary?) for the blockchain layer to expose methods that allow me to directly call normalization functions as well.
I may be missing some background on this but from a superficial point of view:
a) Magic bad. Fail fast. Reject.
b) I don't really understand this one. If a transaction is submitted with a non-normalized claim name in it, how would lbrycrd "normalize" the claim name in the transaction without modifying and thus invalidating the transaction?
c) Magic bad. Fail fast. Reject. Returning the normalized form in the error message would be nice though. From a UX perspective I see it working like this: user types stuff, app submits search to server, server responds with error and the correct normalized string, app updates the input box with the correct normalized form and then subsequently submits that normalized search string to server. Or app implements normalization itself (less appealing).
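The reject-and-retry flow described in c) could be sketched roughly as follows. This is only an illustration: `NotNormalizedError`, `search_strict`, and `client_search` are hypothetical names, not lbrycrd APIs, and NFD plus case folding is assumed as a stand-in for whatever normalization the fork actually specifies.

```python
import unicodedata


class NotNormalizedError(ValueError):
    """Hypothetical error that carries the normalized form back to the caller."""

    def __init__(self, given, normalized):
        super().__init__(f"{given!r} is not normalized; use {normalized!r}")
        self.normalized = normalized


def normalize(name):
    # Assumption: NFD decomposition followed by case folding.
    return unicodedata.normalize("NFD", name).casefold()


def search_strict(query, names):
    """Server side: fail fast on a non-normalized query."""
    normalized = normalize(query)
    if query != normalized:
        raise NotNormalizedError(query, normalized)
    return [n for n in names if n == query]


def client_search(query, names):
    """App side: on rejection, update the input box and resubmit."""
    try:
        return query, search_strict(query, names)
    except NotNormalizedError as err:
        return err.normalized, search_strict(err.normalized, names)
```

Here `client_search("DOG", ["dog", "cat"])` first gets rejected, then resubmits with the suggested string, so the app never needs its own normalization code.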
I don't think we should change the users' data: what they put into the system should be what people see. That's what they paid for. Normalization and case indifference are there to allow people to wholly own their brand, with other niceties to help people locate items without having to be perfectly specific. If it doesn't look right they can try it again; let's put that responsibility on the user. I've been thinking about this problem since @kaykurokawa's comment about how to avoid messing up SI units and acronyms. The only way to do it (at least at the lbrycrd level) is to return for display exactly what the user sent.
I'm more ambivalent on this than my original comment now that I better understand the issue.
I had never even considered continuing to display the names the way the user entered them. I had simply assumed we'd be dropping the old formatting.
For what it's worth, there is precedent for explicitly disallowing, and it's the current domain system. But that doesn't mean we can't do better.
Here's some pros/cons of each approach.
Keeping User Formatting
Dropping User Formatting
In thinking about this issue, please consider UI/UX all the way down to the user-interaction and browser level.
Currently domains are all lower case and this makes the choices in how to handle URLs at the browser level quite simple. If names can be mixed case but are only searched as lower-case, it may introduce some weird UX. For example, at what point do we normalize what the user has typed? Browsers replace any domain name with all lower-case as soon as I hit enter - what would a LBRY browser do if names are resolved as lower-case but can have upper-case when resolved?
First, an example. Consider 3Blue1Brown. On his domain he uses the lowercase version of that, but his channel names keep the casing. I don't know if he would be offended if we lower-cased his brand, but I can bet he would prefer that we don't.
Second, I can't think of a way that dropping the user formatting would save us space. We store the name for every claim in the DB either way.
Concerning the UX, consider this example:
A. Four claims are owned: BROWN at 1 LBC, Brown at 3 LBC, broWN at 2 LBC, and LeRoy_Brown.
B. Searching for brown, with any case variation, should return all four.
C. Opening lbry://BROWN (with any case variation) would immediately switch to lbry://Brown (the current node winner).
D. Opening lbry://brown$3 would immediately switch to lbry://BROWN$3.
Point being: the user didn't have to think about the casing.
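A minimal sketch of the behaviour in this example, under some assumptions: `ClaimIndex` is a hypothetical in-memory structure (not lbrycrd's claimtrie), NFD plus case folding stands in for the real normalization, the `$n` suffix is read as "the n-th claim by amount", and LeRoy_Brown is given an arbitrary amount of 1 since the example does not state one.

```python
import unicodedata


def normalize(name):
    # Assumption: NFD decomposition followed by case folding.
    return unicodedata.normalize("NFD", name).casefold()


class ClaimIndex:
    """Claims compete under their normalized name; original names are kept."""

    def __init__(self):
        self._by_key = {}  # normalized name -> list of (amount, original name)

    def add(self, name, amount):
        self._by_key.setdefault(normalize(name), []).append((amount, name))

    def resolve(self, name, rank=1):
        """Return the original name of the rank-th claim by amount, or None."""
        claims = sorted(self._by_key.get(normalize(name), []),
                        key=lambda claim: claim[0], reverse=True)
        return claims[rank - 1][1] if 1 <= rank <= len(claims) else None

    def search(self, text):
        """Return every original name whose normalized form contains the text."""
        needle = normalize(text)
        return [name for key, claims in self._by_key.items() if needle in key
                for _, name in claims]


index = ClaimIndex()
index.add("BROWN", 1)
index.add("Brown", 3)
index.add("broWN", 2)
index.add("LeRoy_Brown", 1)  # amount is an assumption for illustration
```

With this data, `index.resolve("BROWN")` gives `"Brown"`, `index.resolve("brown", rank=3)` gives `"BROWN"`, and `index.search("bRoWn")` lists all four claims, matching points B through D.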
Example 2:
A. One user owns Amélie (aka, "Ame\u0301lie", with a combining accent) and a second user owns Amélie (aka, "Am\u00e9lie", with a precomposed é).
B. Searching for either one shows both, since the search would be normalized on its way in.
C. Opening one or the other would switch to the current winner on that node with no obvious way to know that another very similar endpoint exists.
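To make Example 2 concrete, here is a small sketch using Python's standard `unicodedata` module. It assumes the second spelling is the precomposed U+00E9 form and that normalization is NFD plus case folding; `same_claim_name` is a hypothetical helper.

```python
import unicodedata

decomposed = "Ame\u0301lie"   # 'e' followed by COMBINING ACUTE ACCENT
precomposed = "Am\u00e9lie"   # single code point LATIN SMALL LETTER E WITH ACUTE


def normalize(name):
    # Assumption: NFD decomposition followed by case folding.
    return unicodedata.normalize("NFD", name).casefold()


def same_claim_name(a, b):
    """True when two names would compete for the same normalized name."""
    return normalize(a) == normalize(b)
```

The two strings render identically but differ as raw code points (7 versus 6), so a byte-wise comparison treats them as different names while `same_claim_name(decomposed, precomposed)` is true.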
To me, users may be frustrated by having their words lower-cased, but most sites and applications have simple rules for correct naming, and those rules do not hurt user acceptance.
a) Reject; we should not be guessing or correcting the user's input.
b) No. Simple rules make for a clean relationship.
c) Warn the user that we can only show the normalized form of their request.
OK, I'm fairly persuaded by the case to keep user formatting.
Will it be possible for claim updates to change the formatting?
@lyoshenka suggested that we don't change the structures for the output of the RPC calls; instead, we can add an original_name field to the claims. I like this; it keeps backwards compatibility on the RPC calls.
@kauffj, yes, it would be possible for claim updates to change the formatting. I like that plan!
Some premises:
With that in mind, I'm gonna answer these in reverse because my reasoning flows that way.
c) For a search, the search string should be normalized on the way in and search should be done against normalized names. Each search result should contain both the `name` (the original) and the `normalized_name` fields. Then upstream apps can do what is right for them.
One thing to consider here is that by returning both, we're forcing upstream app devs to understand what normalization is and to make a decision about which field they need. I'd like to avoid this (simple is better) but I'm not sure we can.
b) We're talking about storing in memory, right? Yes, store both. Is there a concern about the amount of data that needs to be in memory? In most cases the names will be the same (I assume), so we're not increasing memory consumption by that much. If I'm misunderstanding the tradeoff, please explain.
a) Allow non-normalized claims. I don't think this will be confusing to users. When they search for their intended claim name before making the claim, they will see all the claims that they are competing with. So when they make their claim, they will not be surprised about the result.
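The search behaviour proposed in c) might look something like the sketch below. `SearchResult` and `search` are hypothetical names chosen for illustration, and NFD plus case folding is an assumed stand-in for the real normalization; only the `name` and `normalized_name` field names come from the discussion.

```python
import unicodedata
from dataclasses import dataclass


def normalize(name):
    # Assumption: NFD decomposition followed by case folding.
    return unicodedata.normalize("NFD", name).casefold()


@dataclass(frozen=True)
class SearchResult:
    name: str             # exactly what the claimant submitted
    normalized_name: str  # the form used for comparison


def search(query, claimed_names):
    """Normalize the query on the way in and compare against normalized names."""
    key = normalize(query)
    return [SearchResult(name, normalize(name))
            for name in claimed_names
            if normalize(name) == key]
```

For instance, `search("DOG", ["Dog", "dog", "cat"])` returns results for both `Dog` and `dog`, each carrying `normalized_name="dog"`, so an upstream app can display one field and compare on the other.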
@grin for b), the tradeoff is mostly a matter of whether the unnormalized version of the string is preserved and whether it means anything to users.
If the unnormalized string means something (it is preserved and shown to the user), then it gives the user better expressiveness. If it means nothing, then I think the naming works more simply. So I think there is no right answer here, just whether we want to allow better expressiveness and functionality at the cost of simplicity and comprehensibility.
In general I would prefer only having one version of the string returned (either the original or the normalized version) because that's simpler and we should not be asking people to make decisions when we can make the decision for them. However I don't think we can do that here. The Unicode normalization FAQ says
I take that to mean that normalization is just for comparison. For other purposes (such as displaying the name), we should be using the original form. Since most upstream applications will want to do both (e.g. sorting claims by name requires comparisons), we have to either return both or ask all upstream apps to implement normalization themselves if they want to compare names.
One mistake I've been making is thinking of names as needing to be normalized "in general". I now realize that's not the right way to look at it. Normalization is just for comparison. For everything else (displaying, storing, etc), we use the original.
Does this make sense? It also leads to the following answers to the original questions:
a) This is fine. Nothing changes here.
b) Yes. Claim names are stored in their original form and returned to the user that way. We should also return the normalized form for convenience, unless we expect upstream users to implement normalization themselves.
c) Search involves comparisons. Anytime a comparison is made, the normalized form of the string should be used. You can search for any string you want, and it will be normalized before comparing.
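As a rough illustration of "normalize for comparison, keep the original for everything else", sorting is a case where an upstream app compares names but still displays the originals. `sort_claims` is a hypothetical helper, and NFD plus case folding is an assumed stand-in for the real normalization.

```python
import unicodedata


def normalize(name):
    # Assumption: NFD decomposition followed by case folding.
    return unicodedata.normalize("NFD", name).casefold()


def sort_claims(names):
    """Order by the normalized form, but return the original strings."""
    return sorted(names, key=normalize)


names = ["dog", "Cat", "apple", "DOG"]
by_raw = sorted(names)              # code point order puts upper case first
by_normalized = sort_claims(names)  # case-insensitive order, originals kept
```

Here `by_raw` is `['Cat', 'DOG', 'apple', 'dog']` while `by_normalized` is `['apple', 'Cat', 'dog', 'DOG']`; the normalized string is only ever used as the sort key.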
Data returned in the RPC commands typically looks like this:
The name of the node is the normalized name, and it's not "per claim". We can continue to return that name in addition to adding one more field inside the claim structures that have the original name.
I was making the same mistake and have updated my thinking.
I've been working on this. I need to understand one more part of this: when removing items from the activation and expiration queues we compare the name and the outpoint. Do we really need to compare the name in that situation?
Addendum: I said above that we should be using the original name for "everything else (displaying, storing, etc)". I think that's wrong in the case of storing names in the claimtrie, because the location of the claim in the trie depends on the name and claims in the same location compete for the same name. So we should be using the normalized name as the path in the claimtrie.
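The addendum can be sketched with a toy character trie in which the path is the normalized name and each node keeps the original names of the claims stored there. This is not lbrycrd's claimtrie: `TrieNode` and `ClaimTrie` are hypothetical, the path is walked per character for simplicity, and NFD plus case folding is an assumed normalization.

```python
import unicodedata


def normalize(name):
    # Assumption: NFD decomposition followed by case folding.
    return unicodedata.normalize("NFD", name).casefold()


class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.claims = []    # original names of claims stored at this node


class ClaimTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        """Walk the normalized path; store the original name at the end."""
        node = self.root
        for ch in normalize(name):
            node = node.children.setdefault(ch, TrieNode())
        node.claims.append(name)

    def claims_at(self, name):
        """Return the original names competing at this name's location."""
        node = self.root
        for ch in normalize(name):
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.claims)
```

After inserting `DOG`, `dog`, and `Dogs`, `claims_at("Dog")` returns `["DOG", "dog"]` because both share one location, while `Dogs` sits one node deeper.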
@BrannonKing I think a field called `name` should always contain the original name. If we are returning the normalized name, we should call it `normalized_name` to indicate that it's not the name they claimed. We can also use `original_name` for the original if we want to return both and be very clear about it.
For the RPC command you gave above, I'd recommend using `normalized_name` for the top-level name and `original_name` inside each claim object. Dropping `name` would be a BC break, but I think it's clearer and more consistent that way.
I'm open to something better than `normalized_name`, but it can't simply be `name`.
Or, an alternative change to the above would be to not return any names at the top level, and to simply return a list of claim objects, each of which has `name` and `normalized_name` as fields. This also lets us be more consistent: everywhere that a claim is returned by the API, each claim always has those two fields.
Back to the original post with conclusions:
a) We decided to use normalization for internal comparisons and competition. No normalization is required on the data before it is sent to lbrycrd.
b) Yes. We decided to preserve the original in the claimId table and to return that value in various RPC calls. We anticipate that those displaying the claim will use the returned original name.
c) Yes, we normalize all inputs and return all competitors that match that.