Search doesn't find content with exactly matching title #150

Closed
opened 2019-03-10 22:42:02 +01:00 by tzarebczan · 30 comments
tzarebczan commented 2019-03-10 22:42:02 +01:00 (Migrated from github.com)

@eggplantbren commented on Sun Mar 10 2019

Feel free to close this if I've done something silly or misunderstood something.

The Issue

Sometimes, when I search for something I know the exact title of, the search results cannot find that item (or it appears really low in the ranking).

Steps to Reproduce

  1. Search for Psychology of Redemption in Christianity (this is just an example, there are others)
  2. Observe that content with that exact title (lbry://jp-DtiRzQMgBDM#461f1b1b421ac2f2198e8a918a90a775978b9931) is not returned.

Suggested Solutions

I don't know anything about search, to be honest, but perhaps matches in the title should be prioritized in the results?

System Configuration

App | 0.29.3
Daemon (lbrynet) | 0.32.4
Connected Email | brendonbrewer@hotmail.com
Reward Eligible | Yes
Platform | Linux (Linux-4.19.24-1-MANJARO-x86_64-with-arch-Manjaro-Linux)
Installation ID | 5knbgNSAYuYYHFTsHSbX89mMHE7gNRhotpsLto6x8FXUqjTVb6w7ao3RMqCkTdxHRp

Screenshots

That first result looks dodgy at first but I think it's a bald guy's head, hahaha

[Screenshot of the search results]

(Original issue: https://github.com/lbryio/lbry-desktop/issues/2318)
tzarebczan commented 2019-03-10 22:45:11 +01:00 (Migrated from github.com)

Thanks for opening the issue @eggplantbren! Search results are weighted by a variety of factors, such as where a hit is found (title vs. description), how many times it occurs, the LBC on the claim, and possibly a few others. We can look into giving more weight to exact matches.

tiger5226 commented 2019-03-11 00:39:08 +01:00 (Migrated from github.com)

Unfortunately, we adjusted the search algorithm to remove common words. Otherwise they add weight that pushes unrelated claims to the top of search results.

So Psychology of Redemption in Christianity becomes Psychology Redemption Christianity: "of" and "in" are terms we exclude from search queries. Because there is no longer a "perfect match", search does not find that exact title, even though it does check for perfect matches.

I would suggest we remove the "washing" step from the match phrase sub-query and set its slop to 0, so that it only checks for an exact match. Currently it allows some slop, which largely defeats the purpose if we intend it to catch exact phrases. We could also add another sub-query just for an unwashed search.
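The Python sketch below makes the washing problem concrete. It assumes a toy stop-word list and a `title` field; neither is lighthouse's actual configuration, and the helper names are hypothetical. It shows that the washed query no longer matches the stored phrase. It also shows what an unwashed `match_phrase` sub-query with `slop: 0` would look like; `slop` is a standard Elasticsearch option.

```python
# Hypothetical stop-word list; lighthouse's real list is not shown in this thread.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}


def wash(query: str) -> str:
    """Remove common words from a query (the 'washing' step)."""
    return " ".join(w for w in query.split() if w.lower() not in STOP_WORDS)


def exact_title_clause(raw_query: str) -> dict:
    """Elasticsearch phrase sub-query on the UNWASHED input.

    slop=0 means the terms must appear consecutively and in order,
    so only an exact phrase match scores on this clause.
    """
    return {"match_phrase": {"title": {"query": raw_query, "slop": 0}}}


title = "Psychology of Redemption in Christianity"
washed = wash(title)
print(washed)  # Psychology Redemption Christianity

# Crude stand-in for a zero-slop phrase check: the washed phrase is no longer
# a contiguous run of words in the stored title, so it cannot match exactly.
print(washed in title)  # False

print(exact_title_clause(title))
```

One way to get both behaviors is to send the raw query only to this clause and keep washing for the broader term-matching clauses.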

eggplantbren commented 2019-03-11 00:56:42 +01:00 (Migrated from github.com)

I don't understand the technical parts, but even without "of" and "in", why wouldn't that be a good match?

tiger5226 commented 2019-03-11 02:36:19 +01:00 (Migrated from github.com)

Good question. Lots of claims have Psychology and Christianity in the name, title and description. Different weights are assigned to different sub-queries: the name is most valuable, then the title, then the description. However, a claim with Psychology in the description 4 times would score higher than a claim with it just once.

In its basic form, Elasticsearch works with what are called hits (we use more complex queries too). Hits are matches found for terms in a query, and each can be assigned a weight to signify its importance. The "formula", so to speak, gets pretty complex, and ours is still fairly immature. I have seen mature search queries that were well over 2K lines.

If you are really interested in how weights are assigned, the responsible function is linked below. Suggestions are always welcome! (I see you are a statistician 🥇)

https://github.com/lbryio/lighthouse/blob/4056fb86cf82230a15211452ece85c275bdf804b/server/controllers/lighthouse.js#L25-L251
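A toy Python model of weighted hits may help answer the question. It assumes hypothetical field weights (name 3, title 2, description 1) and scores each claim as weight times raw term count. Elasticsearch's default BM25 similarity dampens repeated terms, so this linear version overstates the effect. It still shows how repetition in a low-weight field can outscore a title hit. The claim `sermon-42` is invented for illustration.

```python
from typing import Dict, List

# Hypothetical field weights: name > title > description, as described above.
FIELD_WEIGHTS: Dict[str, float] = {"name": 3.0, "title": 2.0, "description": 1.0}


def toy_score(query_terms: List[str], claim: Dict[str, str]) -> float:
    """Sum of field weight times raw term count (no BM25 saturation)."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = claim.get(field, "").lower().split()
        for term in query_terms:
            score += weight * words.count(term.lower())
    return score


terms = ["Psychology", "Christianity"]
# The claim from this issue: unhelpful name, exact title.
titled = {"name": "jp-DtiRzQMgBDM",
          "title": "Psychology of Redemption in Christianity"}
# A hypothetical claim that just repeats one term in its description.
repetitive = {"name": "sermon-42",
              "description": "psychology psychology psychology psychology christianity"}

print(toy_score(terms, titled))      # 4.0
print(toy_score(terms, repetitive))  # 5.0
```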

eggplantbren commented 2019-03-13 23:19:18 +01:00 (Migrated from github.com)

It looks quite complex to weight the different factors in a search. However, for reference, Google, DuckDuckGo, Bing, and YouTube search all gave that video as the first result. I think searches that include a few uncommon keywords, all found in a single title, should generally return that title as a top result.

I can find other examples if you like.

tzarebczan commented 2019-03-13 23:35:46 +01:00 (Migrated from github.com)

Thanks for the feedback. It's the main reason we also added the thumbs-down button:
[Screenshot of the thumbs-down button]

(cc: @tiger5226 )

tiger5226 commented 2019-03-14 00:21:52 +01:00 (Migrated from github.com)

Bahahahahaha...I love this!

tiger5226 commented 2019-03-14 00:31:29 +01:00 (Migrated from github.com)

@eggplantbren I agree. Search is unfortunately very complex, especially when you get into weights, where an optimization needs to happen. Google had the novel idea of using backlinks to measure the importance of content. We have made great strides, but as you correctly point out, much more can still be done. Another idea I have been looking into is using a signal akin to backlinks, such as views, since views indicate the importance or relevance of content. Big search engines have their own proprietary software and algorithms, and they are really good at it.

Elasticsearch is really great and helps a lot to make this task possible. Hooray for open source. I would not venture to say we could become as good as them quickly, though. Elasticsearch basically uses something akin to what Google succeeded with: assigning a weight to a hit, which is a search term found in an Elasticsearch document.

Examples are always welcome too. They give us the equivalent of a unit test to base our expectations on after we change the query, and they make problems easy to pinpoint. Unfortunately, what happens is we adjust a weight to give us what we want in a particular result, but then many other undesired consequences arise.
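The queries reported in this issue could serve as those unit tests. The minimal Python sketch below assumes a hypothetical `search` callable that returns claim IDs in ranked order. `fake_search` is a stand-in so the check can run without a live index. The query strings and claim IDs are taken from this thread.

```python
from typing import Callable, Dict, List

# Query -> claim ID that reporters in this issue expected to see.
EXPECTED: Dict[str, str] = {
    "Psychology of Redemption in Christianity": "461f1b1b421ac2f2198e8a918a90a775978b9931",
    "levitation baby mackenzie": "441c77b2dcd6cc344904d3746f04060a02414a5c",
    "morgonaut": "118d5abf71473407d12eed67802daa3193d4b330",
}


def failing_examples(search: Callable[[str], List[str]],
                     expected: Dict[str, str],
                     top_k: int = 10) -> List[str]:
    """Return queries whose expected claim ID is missing from the top_k results.

    `search` is any callable returning claim IDs in ranked order.
    """
    return [query for query, claim_id in expected.items()
            if claim_id not in search(query)[:top_k]]


def fake_search(query: str) -> List[str]:
    """Stand-in for a real search call, used only to exercise the check."""
    if "mackenzie" in query:
        return ["441c77b2dcd6cc344904d3746f04060a02414a5c"]
    return []


print(failing_examples(fake_search, EXPECTED))
```

Running the same check before and after a weight change would surface the "undesired consequences" as newly failing queries.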

eggplantbren commented 2019-03-14 01:02:07 +01:00 (Migrated from github.com)

> Unfortunately, what happens is we adjust a weight to give us what we want in a particular result, but then many other undesired consequences arise.

That's unsurprising.

Sorry if this is a silly issue.

tiger5226 commented 2019-03-14 01:33:40 +01:00 (Migrated from github.com)

It certainly isn't. Not all scenarios end up that way. In fact, as I noted above, this particular case should be resolvable without side effects, which is great. Identifying an issue is the first step to making the software better! Thanks for reporting it!

eggplantbren commented 2019-06-01 21:39:35 +02:00 (Migrated from github.com)

Here's another pretty bizarre search result. "levitation baby mackenzie" fails to return lbry://six#441c77b2dcd6cc344904d3746f04060a02414a5c

tzarebczan commented 2019-06-02 18:26:31 +02:00 (Migrated from github.com)

Can confirm the above, but it does appear when searching for just "mackenzie". It also appears for "Mackenzie levitation" but not for "Mackenzie levitation baby".

ghost commented 2019-06-17 08:38:25 +02:00 (Migrated from github.com)

Here's another example:
If I search for "Final Fantasy VIII Remastered: Nintendo Switch trailer E3 2019", I should get a result leading to this claim: lbry://y2matecom-finalfantasyviiiremasterednintendoswitchtrailernintendoe32019ywNYKWQEbZI1080p#ca73086e3d897c8a77935fd63f7ef5f48b0d34f8

But that claim is nowhere to be found in the search results.

eggplantbren commented 2019-06-17 08:40:29 +02:00 (Migrated from github.com)

I just realised this (MH's example) could be because it's a recent publish and chainquery is playing up and I think search might use chainquery.

--
Dr Brendon J. Brewer
Department of Statistics, The University of Auckland, New Zealand
Ph: +64 27 500 1336
Web: https://www.brendonbrewer.com/

ghost commented 2019-06-22 23:09:44 +02:00 (Migrated from github.com)

Similar to the example above. If I search for "morgonaut", the channel @morgonaut should come up as a result (lbry://morgonaut/#118d5abf71473407d12eed67802daa3193d4b330). Instead, the channel @Hackintosh comes up (lbry://@Hackintosh#f07599446da48a01e6836c307cbfdbe5a547827c).

Both of these channels are made by the same person and have the same content on them. Not sure why only one of them comes up.

tzarebczan commented 2019-06-22 23:17:15 +02:00 (Migrated from github.com)

Mark is already looking into that one, thanks.

tiger5226 commented 2019-07-13 03:09:54 +02:00 (Migrated from github.com)

All of these cases listed are resolved except for the spent one. This is now fixed with the latest push to master. We have removed query washing for phrase matching and partial string comparison.

eggplantbren commented 2019-07-13 03:12:13 +02:00 (Migrated from github.com)

Fantastic, thanks!

eggplantbren commented 2019-07-16 03:49:18 +02:00 (Migrated from github.com)

"psychology of redemption in christianity" still doesn't return that JBP vid for me, nor does "psychology redemption christianity". "levitation baby mackenzie" works (it finds my new claim for the same content).

tiger5226 commented 2019-07-17 00:21:32 +02:00 (Migrated from github.com)

It does show up now: https://lighthouse.lbry.com/search?s=Psychology%20of%20Redemption%20in%20Christianity&size=225

However, it's not in the top 10 results. The claim name carries more weight than the title. I think this is just a case of SEO: they should claim a better name than jp-DtiRzQMgBDM. Thoughts?

eggplantbren commented 2019-07-17 00:26:08 +02:00 (Migrated from github.com)

Interesting, thanks. I am slightly surprised about the name counting more than the title. On the one hand I understand how important names are in the LBRY system, but I think people tend to put more thought into the title than the claim name. Might be worth experimenting with different weightings?

tiger5226 commented 2019-07-17 01:34:45 +02:00 (Migrated from github.com)

I increased the weighting of names because 2 other issues dealt with channels not being returned in search results. Weights are a very sensitive area for search, with a lot of give-and-take. For example, if we make the title more important, common words in titles will push channels down in the search results, because channels are matched on their name.

I am certainly open to tweaking the weights, though. Ideally, we would have a test to pass, and even then it can be nearly impossible to meet all the requirements. Maybe I should make the phrase match hold a lot more weight by giving the title phrase-match query the highest weight.
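One way to give the title phrase match the highest weight is a bool query whose should clauses carry different boosts. The Python sketch below builds the Elasticsearch request body as a dict. `match`, `match_phrase` and `boost` are standard Elasticsearch query DSL. The field names and boost values are hypothetical, not lighthouse's current settings.

```python
def build_weighted_query(raw_query: str) -> dict:
    """Hypothetical bool query: every clause is optional ("should"), and the
    exact title phrase carries the largest boost so it outranks term hits."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Exact title phrase on the unwashed query: highest boost.
                    {"match_phrase": {"title": {"query": raw_query, "slop": 0, "boost": 10}}},
                    # Individual term matches, name weighted above title and description.
                    {"match": {"name": {"query": raw_query, "boost": 5}}},
                    {"match": {"title": {"query": raw_query, "boost": 3}}},
                    {"match": {"description": {"query": raw_query, "boost": 1}}},
                ]
            }
        }
    }


query = build_weighted_query("Psychology of Redemption in Christianity")
print(query["query"]["bool"]["should"][0])
```

All clauses are optional should clauses, so a claim can still match on individual terms. The exact-phrase clause only adds a large bonus on top.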

kauffj commented 2019-07-17 05:35:49 +02:00 (Migrated from github.com)

@tiger5226 can channels and streams be given different treatments?

Either way, I think it would be a good idea to write a utility that searches the names and titles of the top n streams and channels and reports on what percentage appear in the top m results. This could then be used to tune parameters to find the values that hit the highest percent.
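Here is a minimal Python sketch of such a utility. It assumes a hypothetical `search` callable that returns claim IDs in rank order, and a sample of `(query, expected claim ID)` pairs, for example the names and titles of the top n streams and channels. `hit_rate` reports the fraction found in the top m results. `tune` is a plain grid search over candidate weights. The toy corpus and ranker exist only to make the example runnable.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Search = Callable[[str], List[str]]  # query -> ranked claim IDs
Sample = Tuple[str, str]             # (query text, expected claim ID)


def hit_rate(search: Search, samples: Sequence[Sample], m: int) -> float:
    """Fraction of samples whose expected claim appears in the top m results."""
    if not samples:
        return 0.0
    hits = sum(1 for query, claim_id in samples if claim_id in search(query)[:m])
    return hits / len(samples)


def tune(make_search: Callable[[Dict[str, float]], Search],
         grid: Dict[str, Sequence[float]],
         samples: Sequence[Sample], m: int) -> Tuple[Dict[str, float], float]:
    """Grid search: return the first weight setting with the highest hit rate."""
    best: Optional[Tuple[Dict[str, float], float]] = None
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        weights = dict(zip(keys, values))
        rate = hit_rate(make_search(weights), samples, m)
        if best is None or rate > best[1]:
            best = (weights, rate)
    if best is None:
        raise ValueError("every grid entry needs at least one value")
    return best


# Toy corpus and ranker, purely for illustration.
CLAIMS = {
    "a": {"name": "jp-DtiRzQMgBDM", "title": "psychology of redemption in christianity"},
    "b": {"name": "psychology", "title": "christianity"},
}


def make_toy_search(weights: Dict[str, float]) -> Search:
    def score(claim: Dict[str, str], terms: List[str]) -> float:
        return sum(w * claim[f].lower().split().count(t)
                   for f, w in weights.items() for t in terms)

    def search(query: str) -> List[str]:
        terms = query.lower().split()
        return sorted(CLAIMS, key=lambda cid: score(CLAIMS[cid], terms), reverse=True)

    return search


samples = [("psychology of redemption in christianity", "a")]
print(tune(make_toy_search, {"name": [1.0, 5.0], "title": [1.0, 5.0]}, samples, m=1))
```

Grid search is the simplest option but grows exponentially with the number of weights. With many parameters, a random or coordinate-wise search over the same `hit_rate` objective would be cheaper.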

tiger5226 commented 2019-07-18 04:37:22 +02:00 (Migrated from github.com)

So I did some research on this. We do leverage filters for the API. A bool query can hold a filter sub-query, so the additional queries we added can go inside a bool query with a should clause alongside a filter. The filter then ensures that the additional channel-name queries only search channels.

Regarding the utility... you are right, we should just bang this out quickly. We should keep the KPI simple too. What defines "top n streams"? n is the parameter, but what counts as "top"? We can use internal-apis to get the channels with the most subscribers on YouTube. We can also use views to get the top streams in the last 7 days so the KPI is dynamic. The results can then go to Slack so we see them every day.
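Here is a Python sketch of the nested bool idea, building the Elasticsearch request body as a dict. It assumes a hypothetical `claim_type` field with the value `channel`. `bool`, `should`, `filter`, `term` and `minimum_should_match` are standard query DSL. The filter restricts the inner name clause to channels without affecting scoring. In recent Elasticsearch versions, a bool that contains a filter treats its should clauses as optional unless `minimum_should_match` is set, so the sketch sets it to 1 explicitly.

```python
def channel_name_clause(raw_query: str, boost: float = 5.0) -> dict:
    """Name match that can only score on channel claims.

    The filter limits matches to channels (no effect on score); the explicit
    minimum_should_match makes the name match required inside this bool.
    """
    return {
        "bool": {
            "should": [{"match": {"name": {"query": raw_query, "boost": boost}}}],
            "minimum_should_match": 1,
            "filter": [{"term": {"claim_type": "channel"}}],  # hypothetical field/value
        }
    }


def build_search_query(raw_query: str) -> dict:
    """Top-level bool: a general title match OR the channel-only name match."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": raw_query}},
                    channel_name_clause(raw_query),
                ]
            }
        }
    }


print(build_search_query("morgonaut"))
```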

tiger5226 commented 2019-07-18 04:40:51 +02:00 (Migrated from github.com)

Maybe we scale the KPI score:

1 - does not appear in the results (max is 10K)
2 - appears in the top 500 results
3 - appears in the top 100 results
4 - appears in the top 50 results
5 - appears in the top 10 results

Then have a score for name and title.
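Here is a Python sketch of this 1-5 scale, with a hypothetical `rank_of` helper that finds a claim's 1-based position in ranked results. The thread does not say how ranks between 501 and the 10K cap should score, so the sketch assumes they score 1, like a miss. Scoring the name search and the title search separately gives the two scores suggested.

```python
from typing import List, Optional


def rank_of(claim_id: str, results: List[str]) -> Optional[int]:
    """1-based position of claim_id in ranked results, or None if absent."""
    try:
        return results.index(claim_id) + 1
    except ValueError:
        return None


def kpi_score(rank: Optional[int]) -> int:
    """Map a rank to the proposed 1-5 scale."""
    if rank is None:
        return 1
    for limit, score in ((10, 5), (50, 4), (100, 3), (500, 2)):
        if rank <= limit:
            return score
    return 1  # assumption: ranks past 500 score the same as "not found"


# One score for searching by claim name, one for searching by title.
results_by_name = ["x", "y", "target"]
results_by_title = ["x"] * 60 + ["target"]
print(kpi_score(rank_of("target", results_by_name)),
      kpi_score(rank_of("target", results_by_title)))
```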

kauffj commented 2019-07-18 13:09:28 +02:00 (Migrated from github.com)

@tiger5226 nice!

Yes, I was proposing to take the top YouTube creators by subscribers or views (maybe both) from our database.

In terms of where to output the KPI, I think it'd be better to write it to something readable by Metabase.

A more dynamic scoring is a cool idea, but I would weight it more strongly towards needing to be in the top, e.g.

10 - 1st result
8 - Top 3 results
5 - First page
2 - Second page
1 - First 3-5 pages

I forget the stats, but a very large percentage of users (over 90%) never page through search results.
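The top-heavy scale could be sketched in Python as below. The page size is not stated, so the sketch assumes 10 results per page. It also reads "first 3-5 pages" as pages 3 through 5, and gives 0 to anything beyond page 5 or not found; those cut-offs are assumptions. A mean over a sample of queries turns the per-query scores into a single KPI.

```python
from typing import Optional, Sequence

PAGE_SIZE = 10  # assumption: results per page


def position_score(rank: Optional[int], page_size: int = PAGE_SIZE) -> int:
    """Score a 1-based rank using the top-heavy scale above."""
    if rank is None:
        return 0  # assumption: not found earns nothing
    if rank == 1:
        return 10
    if rank <= 3:
        return 8
    if rank <= page_size:      # first page
        return 5
    if rank <= 2 * page_size:  # second page
        return 2
    if rank <= 5 * page_size:  # pages 3-5
        return 1
    return 0                   # assumption: past page 5 earns nothing


def mean_position_score(ranks: Sequence[Optional[int]]) -> float:
    """Average score over a sample of queries (0.0 for an empty sample)."""
    if not ranks:
        return 0.0
    return sum(position_score(r) for r in ranks) / len(ranks)


print(mean_position_score([1, 4, None]))
```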

eggplantbren commented 2019-07-19 01:08:45 +02:00 (Migrated from github.com)

Are you proposing using youtube subscribers as part of LBRY's search metric? That sounds terrible but hopefully I just misunderstood.

tiger5226 commented 2019-07-19 02:15:17 +02:00 (Migrated from github.com)

No, not as part of the search metric, but for the sampling. When taking a KPI (key performance indicator), we need a sample. Ideally we have a meaningful sample rather than a random sample. Most viewers are looking for specific creators on the platform and use search to find them. So drawing the sample from the creators with high views or subscribers gives us an idea of how likely someone is to search for a given creator. When sampling, I want to make sure that what people are likely to search for is actually being returned in the results.

eggplantbren commented 2019-07-19 02:16:47 +02:00 (Migrated from github.com)

That makes sense, thanks for clarifying.

tiger5226 commented 2019-07-19 02:19:10 +02:00 (Migrated from github.com)

@kauffj Expecting things to be in the top 10 will probably be volatile and less meaningful, because there are up to 10K results and it makes the KPI very binary. However, I see it's where we want to be, so we should probably try it first. If it does turn out to be volatile, we can easily tweak it.

Reference: LBRYCommunity/lighthouse.js#150