Autocomplete Query is not returning the proper results #55

Closed
opened 2018-03-17 18:14:37 +01:00 by tiger5226 · 9 comments
tiger5226 commented 2018-03-17 18:14:37 +01:00 (Migrated from github.com)

The autocomplete query suffers from the same problem that search did previously. The `value` section of the Elasticsearch document is of type nested, which means the query also needs to be a nested query. As a result, it is actually only searching the name of a claim instead of searching for the best autocomplete term across the main fields of title, description and author. Below is a current example:

https://lighthouse.lbry.io/autocomplete?s=test%20a

Result:

[
"make-a-test-tube-thunderstorm",
"Make a Test Tube Thunderstorm!",
"NurdRage",
"a-test-of-wills-charles-todd-mobi",
"A Test of Wills By Charles Todd Mobi Format",
"upload-test-12-11-17-a",
"spee.ch",
"how-to-detect-a-secret-nuclear-test",
"How To Detect A Secret Nuclear Test",
"minutephysics"
]

Now take the first result, `make-a-test-tube-thunderstorm`, and compare it with the same query against search:

https://lighthouse.lbry.io/search?s=test%20a

Result:

[
{"name":"test","claimId":"a607349ce83d3a86bac87a967ce7f9647e1ba736","value":{"claimType":"streamType","stream":{"metadata":{"preview":"","license":"Creative Commons Attribution 4.0 International","licenseUrl":"","thumbnail":"","nsfw":false,"author":"test","description":"test","language":"en","title":"test","version":"_0_1_0"},"source":{"sourceType":"lbry_sd_hash","source":"29ad218c61c599499b22c17228371b5fe9a6e725edc9ef691a7819b3e7406500467852920629fbcddbcacf13d31579d4","version":"_0_0_1","contentType":"image/png"},"version":"_0_0_1"},"version":"_0_0_1"}},
{"name":"make-a-test-tube-thunderstorm","claimId":"40d193673b0730907449a3dde387b2cdb0314eff",
...

You can see that the claim name `test` is first, but then `"name":"make-a-test-tube-thunderstorm"` is second. So it is only searching the name field.

The internal server error should be a separate issue. We should not hit an error by entering a query. I will create another issue for this.

Lastly, since the Elasticsearch query needs to be modified and getting this right takes some time, I don't think this is a level 1, so I increased it to a level 2.
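For illustration, here is a minimal sketch of what a nested autocomplete query body could look like. It is not the query used in lighthouse. The helper name `buildAutocompleteBody` is hypothetical, and the field paths are assumptions inferred from the `/search` response shown above. It only builds the request body and does not call Elasticsearch.

```javascript
// Hypothetical helper: builds an Elasticsearch request body for autocomplete.
// Assumption: `value` is mapped as a nested field and the metadata lives under
// value.stream.metadata, as the /search response in this issue suggests.
function buildAutocompleteBody(input, size = 10) {
  const metadataFields = [
    'value.stream.metadata.title',
    'value.stream.metadata.description',
    'value.stream.metadata.author',
  ];
  return {
    size,
    query: {
      bool: {
        // A claim qualifies if its name OR one of its nested metadata fields matches.
        should: [
          // Assumption: claim names are indexed lowercased.
          { prefix: { name: input.toLowerCase() } },
          {
            // Fields inside a nested mapping are only reachable through a nested query.
            nested: {
              path: 'value',
              query: {
                multi_match: {
                  query: input,
                  type: 'phrase_prefix',
                  fields: metadataFields,
                },
              },
            },
          },
        ],
        minimum_should_match: 1,
      },
    },
  };
}

console.log(JSON.stringify(buildAutocompleteBody('test a'), null, 2));
```

The `bool`/`should` wrapper keeps the existing name matching while adding the nested clause, so either kind of match can produce a suggestion.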

tiger5226 commented 2018-04-22 01:33:04 +02:00 (Migrated from github.com)

@tzarebczan Do you know how autocomplete is even used in the application? I don't think it is being leveraged right now. The API is no longer broken, but I cannot get what I would expect from an autocomplete feature, even if the results are not the best.

neb-b commented 2018-04-23 08:03:42 +02:00 (Migrated from github.com)

We are pretty much just splitting any string typed into the search box and sending that to the API. This is where we do it: https://github.com/lbryio/lbry-redux/blob/master/src/redux/actions/search.js#L72
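As a rough sketch of the client side (this is not the lbry-redux code, and it does not reproduce whatever splitting that code performs), turning search-box input into an autocomplete request URL could look like this. The base URL comes from the examples earlier in this issue. `buildAutocompleteUrl` is a hypothetical name.

```javascript
// Base URL taken from the example requests earlier in this issue.
const AUTOCOMPLETE_URL = 'https://lighthouse.lbry.io/autocomplete';

// Hypothetical helper: converts raw search-box input into a request URL.
// Returns null for empty input so the caller can skip the request.
function buildAutocompleteUrl(rawInput) {
  const query = rawInput.trim();
  if (query === '') return null;
  // encodeURIComponent turns the space in "test a" into %20.
  return `${AUTOCOMPLETE_URL}?s=${encodeURIComponent(query)}`;
}

console.log(buildAutocompleteUrl('test a'));
// prints "https://lighthouse.lbry.io/autocomplete?s=test%20a"
```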

tzarebczan commented 2018-04-23 15:10:01 +02:00 (Migrated from github.com)

@tiger5226 I talked to @seanyesmunt about this. I believe he was trying to use it at one point but then disabled it when it was broken. He'll look into it.

How much effort is it to make it work similar to the search function?

tiger5226 commented 2018-05-07 17:03:26 +02:00 (Migrated from github.com)

Ugh, sorry I missed this conversation. My notifications are all working great now, so I won't miss another one like this.

So it was a good amount of work getting the [query](https://github.com/lbryio/lighthouse/blob/master/server/controllers/lighthouse.js#L39-L143) for search right. I posted the link so you can see the complexity of the query. The good news is I did a lot of the work already and can reuse about 80% of it. Most of the work for autocomplete would be tweaking and testing. I could probably do it in half a day and have it ready for deployment.

tzarebczan commented 2018-05-07 20:47:07 +02:00 (Migrated from github.com)

@tiger5226, also, this is the response when searching for "url":

["ucb-UrLej5w0hJI","Japanese 7B - 2015-03-05: Novel: No Longer Human","UCBerkeley"]

`Japanese 7B - 2015-03-05: Novel: No Longer Human` is the title of `ucb-UrLej5w0hJI`. Do we want both the title and the URL returned here? The idea of autocomplete is to give the user suggestions for URLs plus suggested search terms (titles, perhaps, but based on a search within titles?) that match their search input. I guess the question is: do we want to return a title for a search term that matches the URL, or should it be searching titles for search term suggestions?

@seanyesmunt do you need an identifier to tell if a result is a URL or not? This is related to https://github.com/lbryio/lbry-app/issues/1454. We are already doing this for channels, but they are easier to identify.

tiger5226 commented 2018-05-07 22:56:36 +02:00 (Migrated from github.com)

So the results are not good. Below is where it collates the results:
https://github.com/lbryio/lighthouse/blob/master/server/controllers/lighthouse.js#L261

The query itself is effectively only querying against the claim names, not the fields you would expect. The results are built by adding the name, author and title of each hit to an array of options that is then returned.

Every result that we return should match the query. If it is not a match, we should not return it. I would be pretty weirded out by entering `url` and getting back `UCBerkeley` as the suggestion.
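To make the problem concrete, here is a simplified sketch of that kind of collation. It is not the code at the link above: the function name and the flat hit shape are hypothetical, and treating `UCBerkeley` as the author field is an assumption. The sample values come from the "url" response quoted earlier in this thread.

```javascript
// Hypothetical, simplified collation: every hit contributes its name, title
// and author, whether or not that particular field matched the query.
function collateAllFields(hits) {
  const suggestions = [];
  for (const hit of hits) {
    suggestions.push(hit.name, hit.title, hit.author);
  }
  return suggestions;
}

// One hit whose claim name matched the query "url".
const hits = [
  {
    name: 'ucb-UrLej5w0hJI',
    title: 'Japanese 7B - 2015-03-05: Novel: No Longer Human',
    author: 'UCBerkeley', // assumption: this value is the author field
  },
];

// The title and author come along even though neither contains "url".
console.log(collateAllFields(hits));
```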

I always default to the most familiar search tool because that is what the vast majority of users will be interested in and will most likely be their intent.

<img width="808" alt="screen shot 2018-05-07 at 4 55 22 pm" src="https://user-images.githubusercontent.com/3402064/39724719-907c9ad2-5217-11e8-8478-02466f8c4138.png">
tiger5226 commented 2018-05-10 04:10:37 +02:00 (Migrated from github.com)

So in looking at the logs of lighthouse, the autocomplete is being called quite often. So I presume this is from the app and it is in fact running.

@seanyesmunt - Does this sound right?

Either way, when testing in the app and typing one character at a time, I notice some lag. The logs show that the current autocomplete query takes around 12-25ms, which is very fast; I think the search query is what causes the lag. So we need to make sure autocomplete does not run the same query as search, because the delay would be too noticeable when typing into the search bar. I think we should stick with the prefix query and fix the problem in its current state, where it unintentionally searches only the claim name.

Additionally, we need to run a regex over each result to make sure we only send the ones that contain the query. I am not sure of the performance implications of this. I think it is a requirement, so we would have to do it regardless. I will certainly do testing around it.
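A minimal sketch of that regex pass, assuming the candidates are plain strings and that matching should be case-insensitive (both are assumptions, and the function names are hypothetical). The user input is escaped so that characters such as `.` or `(` are matched literally instead of being read as regex syntax.

```javascript
// Escape regex metacharacters so the user's input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Keep only the candidates that actually contain the query (case-insensitive).
function filterSuggestions(candidates, input) {
  // No "g" flag, so test() keeps no state between calls.
  const pattern = new RegExp(escapeRegExp(input), 'i');
  return candidates.filter(
    (candidate) => typeof candidate === 'string' && pattern.test(candidate)
  );
}

const candidates = [
  'ucb-UrLej5w0hJI',
  'Japanese 7B - 2015-03-05: Novel: No Longer Human',
  'UCBerkeley',
];
console.log(filterSuggestions(candidates, 'url'));
// prints [ 'ucb-UrLej5w0hJI' ]
```

The filter is a single linear pass over a short suggestion list, so its cost is likely to be small next to the query itself, though that would need measuring.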

tiger5226 commented 2018-05-27 05:11:15 +02:00 (Migrated from github.com)

I confirmed this was the case. The redesign is now using the autocomplete.

tiger5226 commented 2018-09-16 07:55:11 +02:00 (Migrated from github.com)

I have made the referenced changes and reworked the autocomplete query. It now returns results that actually mean something.

Query Example - "Shi"

Old results:

"shia2",
"spee.ch",
"shia4",
"shialabeouf",
"Shia LaBeouf Style, Fashion & Looks | Dressing like a Broke College Kid",
"shibewink",
"shibe wink",
"shiad"

New results:

"#ShistsuMassager Naturalico Shiatsu Massager Heated",
"[osu!taiko] Mew104 | Shiraishi - Shinsekai [Taiko] HDHRFL SS 363pp #1",
"Shimmer and Shine Halloween Farm Festival - Kid Friendly Gaming!",
"osu player84 | Chino - Shinsaku no Shiawase wa Kochira! [Happy~!] +HD,DT | FC 99.50% 730pp #1",
"Shipyard",
"Toy | Susumu Hirasawa - Yume no Shima Shinen Kouen [KIRBY Mix] +DT 97.04% FC 298pp #3"

Previously it was duplicating information for a claim: if the term matched the title, it would also pass the claim name, even if the name had nothing to do with the search. So we can get more results now.

Solved with https://github.com/lbryio/lighthouse/pull/116

Reference: LBRYCommunity/lighthouse.js#55