Filters #49

Closed
opened 2018-03-09 21:15:58 +01:00 by btzr-io · 12 comments
btzr-io commented 2018-03-09 21:15:58 +01:00 (Migrated from github.com)

Filtering server side

Is there any news / plans for filter integration,
such as searching only for a specific content type (audio / video / files),
or by channel and tags?
(if the daemon ever supports it) 🙃

https://github.com/lbryio/lbry-app/issues/664#issuecomment-335684392

filipnyquist commented 2018-03-10 07:00:04 +01:00 (Migrated from github.com)

Yes, we have this metadata in the search database. Once chainquery is added, this will be possible to do!

tiger5226 commented 2018-03-15 12:12:56 +01:00 (Migrated from github.com)

We can extend the search API so it resembles a more common example. Right now we only accept one parameter, which is the search string.

We should have a set of arguments that are defined for advanced search capabilities. For example:

  1. Channel or Claim or Playlist, etc
  2. ClaimID (there are some lesser-known ways defined for the protocol)
  3. Active in claim trie
  4. language
  5. content type
  6. By Publisher
  7. controlling claims

There are probably a lot more ways to search, but we should be able to have a better API for this. I would love some feedback on an advanced search capability. Knowing what we need or expect will help in designing the best format for the API.
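
As a starting point for that feedback, here is a minimal sketch of what a multi-argument search request could look like. Every name in it (`AdvancedSearch`, `query`, `claim_type`, and the other fields) is hypothetical and only mirrors the list above; it is not lighthouse's actual API.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional


@dataclass
class AdvancedSearch:
    """Hypothetical argument set for an advanced search request."""

    query: str                             # free-text search string (the one argument accepted today)
    claim_type: Optional[str] = None       # e.g. "channel", "claim", "playlist"
    claim_id: Optional[str] = None
    active_in_trie: Optional[bool] = None
    language: Optional[str] = None         # ISO code, e.g. "en"
    content_type: Optional[str] = None     # e.g. "video/mp4"
    publisher: Optional[str] = None
    controlling: Optional[bool] = None

    def to_params(self) -> Dict[str, Any]:
        """Drop unset filters so a plain search stays a one-parameter request."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Leaving every filter optional keeps the single search box working unchanged, while power users can add arguments one at a time.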

@lyoshenka - Any suggestions here?

kauffj commented 2018-03-15 14:24:31 +01:00 (Migrated from github.com)

@tiger5226 I'm for providing this functionality, but the vast majority of users are going to be used to a single box where they type things and get what they want. I think it's important we prioritize this use-case over advanced searching which is more likely to be for power users.

lyoshenka commented 2018-03-16 13:59:30 +01:00 (Migrated from github.com)

agreed, not a priority right now

tiger5226 commented 2018-03-17 17:16:55 +01:00 (Migrated from github.com)

Ok all clear!

So, keeping the same search field for lighthouse, we could enhance the field to allow for tags. This also allows the feature to exist before it is released in the app.

So if a user types `contentType:` followed by a word (or a phrase in quotes), it will create a tag in the app that appears like `contentType: files`. This lets the user know they are searching only files with that tag. Below is an example of the tag system in OS X search.
[Screenshot: example tag in OS X search]

Our tags based on current information in the elastic document are as follows:
controllingClaims: <boolean>
channels: <boolean>
Author: "<phrase>"
Language: <isocode>
ActiveClaim: <boolean>
contentType: <type>

Below is an example document as stored in Elasticsearch:

{
  "name": "pok-mon-from-other-games-gabby-games",
  "claimId": "da73dabbc78e429254fec760d1dc97cc179b7878",
  "value": {
    "claimType": "streamType",
    "stream": {
      "metadata": {
        "preview": "",
        "license": "Copyrighted (contact author)",
        "licenseUrl": "",
        "thumbnail": "http:\/\/berk.ninja\/thumbnails\/nz7uDRZBKCo",
        "nsfw": false,
        "author": "Rerez",
        "description": "Nintendo needs new Pok\u00e9mon! Gabby is here to help!\nSubscribe to Rerez: https:\/\/goo.gl\/0qauGx\nBecome a Patron!: https:\/\/www.patreon.com\/rerez\nFollow Rerez on Twitter: https:\/\/twitter.com\/RerezTV\n\nCheck out Adam's work at: https:\/\/www.youtube.com\/user\/ItsaDogandGame\n\nRerez is a YouTube channel presented by Shane Luis all about video games. Presenting you the newest, strangest and most unique gaming topics. Featuring high quality reviews of new video games, consoles, previews, oddities, rare titles, unknown hardware, classic and retro games, and much more!\n\nCredits:\n...\nhttps:\/\/www.youtube.com\/watch?v=nz7uDRZBKCo",
        "language": "en",
        "title": "Pok\u00e9mon From Other Games! - Gabby Games - Rerez",
        "version": "_0_1_0"
      },
      "source": {
        "sourceType": "lbry_sd_hash",
        "source": "e427b1add5f93adcc21a6623c8664319cd9f2353bcf5499b32ca2d204c623d9f7014cf8c679bdb573c988e8e7a4bd983",
        "version": "_0_0_1",
        "contentType": "video\/mp4"
      },
      "version": "_0_0_1"
    },
    "publisherSignature": {
      "signature": "07c1825acf0a764a38ec92dc1c348d36eb9efc3126c821d452a80e0c75a13200923c7d8d91485e750e55577c826b8c22a5ef83ed136296ad2bcd039fd2b7af34",
      "certificateId": "6f36dfa66117b39500d6a74da38ddcb5a37301d8",
      "signatureType": "SECP256k1",
      "version": "_0_0_1"
    },
    "version": "_0_0_1"
  }
}

What we do here is check whether the first term in a search query is a keyword for a tag. If so, we process it as such. If the next two characters are a space and a double quote (` "`), we split the string to get the phrase associated with the tag. The rest is the search query.

We can start off allowing one tag, then extend to allow n tags.

This should keep it independent of the app's features and allow it to be used straight away.
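
The parsing steps above could be sketched roughly as follows. This is a minimal illustration in Python, not lighthouse's implementation: `split_tag`, `TAG_KEYWORDS`, the regular expression, and the case-insensitive keyword matching are all assumptions, and it handles only a single leading tag.

```python
import re
from typing import Optional, Tuple

# Tag keywords from the list in this comment; lower-cased so matching
# is case-insensitive (an assumption, the thread does not specify this).
TAG_KEYWORDS = {
    "controllingclaims", "channels", "author",
    "language", "activeclaim", "contenttype",
}

# keyword, colon, optional whitespace, then a quoted phrase or a single word,
# followed by whatever remains of the query.
_TAG_RE = re.compile(r'^\s*(\w+):\s*(?:"([^"]*)"|(\S+))\s*(.*)$', re.DOTALL)


def split_tag(query: str) -> Tuple[Optional[Tuple[str, str]], str]:
    """Return ((tag, value), remaining_query), or (None, query) if no tag leads the query."""
    match = _TAG_RE.match(query)
    if match is None or match.group(1).lower() not in TAG_KEYWORDS:
        return None, query.strip()
    # group 2 is the quoted phrase, group 3 the bare word; exactly one is set
    value = match.group(2) if match.group(2) is not None else match.group(3)
    return (match.group(1), value), match.group(4).strip()
```

An unknown keyword such as `foo:` is left in the query untouched, so ordinary searches that happen to contain a colon still behave as before.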

btzr-io commented 2018-03-25 19:23:12 +02:00 (Migrated from github.com)

Tags

Would it be possible to read / add tags inside the description?
My proposal would be to let the user add the tags inside the claim description:

~Last line~

~Add the tags in the last line of the claim description:~
Not really necessary, see hash-tags implementation below:

Hash tags

This could be added in any line of the description:

This is a #video about #cats!
Please support my channel for more #awesome content: <http://click.bait.com>

// Get tags
getTags(description) -> [video, cats, awesome]

Limit

Also set a limit for tags so users don't abuse the tagging system.
Maybe fewer than 10 or 15 would be OK?
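
A minimal sketch of the hash-tag extraction with a cap, assuming Python and a regular expression. The function name `get_tags`, the default limit of 10, lower-casing, and de-duplication are assumptions made for illustration; the thread only suggests "less than 10 / 15".

```python
import re
from typing import List

MAX_TAGS = 10  # assumed cap, pick whatever limit is agreed on

# A '#' that is not glued to a preceding word character, followed by the tag text.
# This skips fragments such as "page#section" inside URLs.
_HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def get_tags(description: str, limit: int = MAX_TAGS) -> List[str]:
    """Collect unique hash tags in order of appearance, stopping at `limit`."""
    tags: List[str] = []
    for match in _HASHTAG_RE.finditer(description):
        if len(tags) >= limit:
            break
        tag = match.group(1).lower()
        if tag not in tags:
            tags.append(tag)
    return tags
```

Tags past the limit are simply ignored here; rejecting the publish instead would be an equally valid policy.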

App (UX)

We can also add a simple form to attach / add tags for better (UX) inside the publish page:
[Image: tag form on the publish page]

tiger5226 commented 2018-04-08 09:23:41 +02:00 (Migrated from github.com)

@btzr-io I understand what you are saying. Previously there was a much greater need to flag search terms or tags, as only claim names produced results in search and discovery. However, massive strides have been made in this area. Now Author, Title and Description are fully searchable. What does this mean? Well, it means everything is a #. Any word, phrase, prefix and suffix is searchable.

I don't see a need for a tagging system like that since we basically already have it now in the background (try searching an obscure term in a claim now 👍 ). However, there could be a use case for giving extra weight to certain search terms that are created by the community of publishers and are not part of the metadata. This could be abused via SEO, though: common tags could be flooded to get higher in the results. Then again, the same can happen now. There has to be a balancing factor, something that disincentivizes gaming the system. For example, Google skyrocketed to number 1 because search wasn't only about which web pages contained the most keywords; it was also about the relevance of the page. PageRank was a major boon because it stopped the bad players. The complementary side of that is: yes, terms are good, but how many people who arrive at the claim stream the content? Or better yet, how many claims were supported? That ratio is what should drive the weight of the tags. So dynamic weights would really help here as well. We would have a set of booster tags, and if the content is good, the factor of the boost is increased.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor
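
The dynamic weighting idea could map onto the `field_value_factor` function linked above. The sketch below builds such a request body as a plain Python dict. It assumes a numeric field holding the stream or support ratio has been indexed on each document; the name `streamRatio` is made up for illustration, and `boosted_query` is a hypothetical helper. The text fields come from the example document earlier in this thread, and the `factor`, `modifier`, and `boost_mode` values are placeholders to tune, not recommendations.

```python
from typing import Any, Dict


def boosted_query(text: str, ratio_field: str = "streamRatio") -> Dict[str, Any]:
    """Build a function_score request body that adds an engagement signal to text relevance."""
    return {
        "query": {
            "function_score": {
                # ordinary text relevance over the searchable fields
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": [
                            "name",
                            "value.stream.metadata.title",
                            "value.stream.metadata.author",
                            "value.stream.metadata.description",
                        ],
                    }
                },
                # engagement signal: log10(1 + factor * ratio)
                "field_value_factor": {
                    "field": ratio_field,
                    "factor": 1.2,
                    "modifier": "log1p",
                    "missing": 0,  # documents without the field get no boost
                },
                # add the signal to the text score instead of multiplying,
                # so a missing ratio does not zero out a good text match
                "boost_mode": "sum",
            }
        }
    }
```

Because the boost depends on how viewers behave rather than on what the publisher types, flooding a description with popular terms would not by itself raise this part of the score.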

I will leave the UI side to the app developers (I like your tag better than mine 👍 ).

kauffj commented 2018-08-20 19:01:02 +02:00 (Migrated from github.com)

@tiger5226 or @filipnyquist is this viable to pursue now that chainquery is integrated?

kauffj commented 2018-08-20 19:01:56 +02:00 (Migrated from github.com)

The most immediately useful filter would be filtering by file type.
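
For illustration, a file-type filter could key off the `contentType` value found under `value.stream.source` in the example Elasticsearch document earlier in this thread. The sketch below filters already-fetched documents in Python; `content_type_of` and `filter_by_file_type` are hypothetical helpers, and a real implementation would more likely push the filter into the Elasticsearch query itself.

```python
from typing import Any, Dict, Iterable, List, Optional


def content_type_of(doc: Dict[str, Any]) -> Optional[str]:
    """Read value.stream.source.contentType, or None if the document lacks it."""
    try:
        return doc["value"]["stream"]["source"]["contentType"]
    except (KeyError, TypeError):
        return None


def filter_by_file_type(docs: Iterable[Dict[str, Any]], wanted: str) -> List[Dict[str, Any]]:
    """Keep documents whose content type equals `wanted` ("video/mp4")
    or whose top-level media type equals it ("video")."""
    wanted = wanted.lower()
    kept: List[Dict[str, Any]] = []
    for doc in docs:
        ctype = content_type_of(doc)
        if ctype is None:
            continue  # non-stream claims carry no content type
        ctype = ctype.lower()
        if ctype == wanted or ctype.split("/", 1)[0] == wanted:
            kept.append(doc)
    return kept
```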

kauffj commented 2018-08-30 16:39:04 +02:00 (Migrated from github.com)

A lot of the maker community initiatives that @jamesbiller proposed could really use the ability to filter by file type.

I split out this aspect of filtering into a separate ticket: #111

tiger5226 commented 2018-08-31 00:42:17 +02:00 (Migrated from github.com)

> @tiger5226 or @filipnyquist is this viable to pursue now that chainquery is integrated?

I believe I missed this comment, but I don't think Chainquery is required here. This has been ready for a while, but it was put on the back burner because it was not urgent and was seen as a power-user feature. I like the idea of splitting out the file type stuff.

tiger5226 commented 2019-12-16 18:01:27 +01:00 (Migrated from github.com)

done

Reference: LBRYCommunity/lighthouse.js#49