Include full URLs in search results #31

Closed
opened 2017-10-11 15:35:06 +02:00 by kauffj · 19 comments
kauffj commented 2017-10-11 15:35:06 +02:00 (Migrated from github.com)

URLs are typically what will ultimately be resolved or used by a service or application utilizing the search API. So it'd probably make sense to just return these.

URLs are typically what will ultimately be resolved or used by a service or application utilizing the search API. So it'd probably make sense to just return these.
tiger5226 commented 2018-04-08 06:12:53 +02:00 (Migrated from github.com)

@kauffj I wanted to just get this one done, however, there could be dependencies, and there may be alternative uses of getting claim value details based on a search result. Either way it returns all that it does because the data is stored in elastic search as an elastic document that can be summed up as a json object of information. The results are for these documents. If we wanted to return the URLs it would just be some minor post processing.

Since there are external dependencies on this API it is probably best to create a new API:

current http://localhost:50005/search?s=
new http://localhost:50005/url/search?s=

This way we can have the service live before it is used live by the app and the current one can still be used unhindered until the adjustment in the app is ready. Also it reduces golive dependencies. The drawback is now we have 2 apis.

@kauffj I wanted to just get this one done, however, there could be dependencies, and there may be alternative uses of getting claim value details based on a search result. Either way it returns all that it does because the data is stored in elastic search as an elastic document that can be summed up as a json object of information. The results are for these documents. If we wanted to return the URLs it would just be some minor post processing. Since there are external dependencies on this API it is probably best to create a new API: current http://localhost:50005/search?s=<term> new http://localhost:50005/url/search?s=<term> This way we can have the service live before it is used live by the app and the current one can still be used unhindered until the adjustment in the app is ready. Also it reduces golive dependencies. The drawback is now we have 2 apis.
kauffj commented 2018-05-22 23:36:50 +02:00 (Migrated from github.com)

I was going through tickets assigned to me and realized I never saw above until now. Good catch of a real issue, but I'd propose the proper fix for this is likely to add a version parameter the endpoint to allow graceful migration, not diverge endpoints.

I was going through tickets assigned to me and realized I never saw above until now. Good catch of a real issue, but I'd propose the proper fix for this is likely to add a version parameter the endpoint to allow graceful migration, not diverge endpoints.
tiger5226 commented 2018-05-23 03:31:15 +02:00 (Migrated from github.com)

Yeah that makes sense! Will do

Yeah that makes sense! Will do
tiger5226 commented 2018-09-15 16:22:10 +02:00 (Migrated from github.com)

Looking here it appears we use the name and claim id to build a URI and that is what gets used. So we do not actually need another api, we can just return these things. However, for debugging the api, this additional information is valuable.

Looking [here]( https://github.com/lbryio/lbry-redux/blob/7507cfd2f77a0f27b6eb678481df80b54cfa49d4/src/redux/actions/search.js#L59-L62) it appears we use the name and claim id to build a URI and that is what gets used. So we do not actually need another api, we can just return these things. However, for debugging the api, this additional information is valuable.
tiger5226 commented 2018-09-15 16:22:53 +02:00 (Migrated from github.com)

New results without sending this information is a lot smaller.

screen shot 2018-09-15 at 10 21 56 am
New results without sending this information is a lot smaller. <img width="1245" alt="screen shot 2018-09-15 at 10 21 56 am" src="https://user-images.githubusercontent.com/3402064/45587243-4a5bb580-b8d1-11e8-8af2-c172d90fa9a0.png">
tiger5226 commented 2018-09-15 16:29:12 +02:00 (Migrated from github.com)

@seanyesmunt we have the PR ready, but I just wanted to double check with you that we are not using the search APIs anywhere else.

@seanyesmunt we have the PR ready, but I just wanted to double check with you that we are not using the search APIs anywhere else.
neb-b commented 2018-09-17 15:32:33 +02:00 (Migrated from github.com)

They are used in the android app. It would definitely be a smoother transition to include versioning in the api

They are used in the android app. It would definitely be a smoother transition to include versioning in the api
kauffj commented 2018-09-17 16:11:34 +02:00 (Migrated from github.com)

I'm not 100% on what I meant by this, but I think I just meant to actually include the URL so it doesn't have to be reconstructed, not to actually strip out the metadata (though that may be fine).

Ideally, it'd return the canonical url.

I'm not 100% on what I meant by this, but I _think_ I just meant to actually include the URL so it doesn't have to be reconstructed, not to actually strip out the metadata (though that may be fine). Ideally, it'd return [the canonical url](https://github.com/lbryio/lbry/issues/958).
tiger5226 commented 2018-09-18 01:26:04 +02:00 (Migrated from github.com)

@seanyesmunt Yeah I agree. However, in the link I posted for lbry-redux (both mobile and desktop) we are only using those two values. So I don't think this versioning work needs be done right now.

@kauffj I would think parsing these two pieces of information is more costly than concatenating where the api is used. I feel like the search api should just give the raw information (name and claimId) needed with as little work as possible. It makes the API more flexible for future usage. Also, showing the full claim information also makes it as flexible as possible. I would ultimately, be in favor of leaving the API unchanged in case we want to use this information in the future.

@seanyesmunt Yeah I agree. However, in the link I posted for lbry-redux (both mobile and desktop) we are only using those two values. So I don't think this versioning work needs be done right now. @kauffj I would think parsing these two pieces of information is more costly than concatenating where the api is used. I feel like the search api should just give the raw information (name and claimId) needed with as little work as possible. It makes the API more flexible for future usage. Also, showing the full claim information also makes it as flexible as possible. I would ultimately, be in favor of leaving the API unchanged in case we want to use this information in the future.
kauffj commented 2018-09-18 18:01:41 +02:00 (Migrated from github.com)

It's still not performant to display/determine the shortest possible canonical URL, but I'm fine waiting on the daemon to fix that.

That said, even without proper canonical URLs, I do think it makes sense to return the full URL. It'd be like if a search engine returned the host and TLD separately (technically host includes TLD but you get what I mean).

It's still not performant to display/determine the shortest possible canonical URL, but I'm fine waiting on the daemon to fix that. That said, even without proper canonical URLs, I do think it makes sense to return the full URL. It'd be like if a search engine returned the host and TLD separately (technically host includes TLD but you get what I mean).
neb-b commented 2018-09-18 18:55:46 +02:00 (Migrated from github.com)

@tiger5226 Whoops. You are right. I thought we were handling that data differently.

@tiger5226 Whoops. You are right. I thought we were handling that data differently.
tiger5226 commented 2018-09-19 05:03:17 +02:00 (Migrated from github.com)

All clear. I will leave the current fields alone and add one for the canonical url which should be:
<claim_name>#<claim_id> is this correct @kauffj ?

I reviewed the issue linked, but I don't see what exactly we want the canonical url to be. Sorry if I missed something.

All clear. I will leave the current fields alone and add one for the canonical url which should be: `<claim_name>#<claim_id>` is this correct @kauffj ? I reviewed the issue linked, but I don't see what exactly we want the canonical url to be. Sorry if I missed something.
kauffj commented 2018-09-19 16:58:21 +02:00 (Migrated from github.com)

@tiger5226 no, the canonical URL is more complex than that. Given full access to chainquery however, it should be possible to compute it in a single query/pass.

I updated https://github.com/lbryio/lbry/issues/958 to (hopefully) be significantly more clear

@tiger5226 no, the canonical URL is more complex than that. Given full access to chainquery however, it should be possible to compute it in a single query/pass. I updated https://github.com/lbryio/lbry/issues/958 to (hopefully) be significantly more clear
tiger5226 commented 2018-09-23 16:47:28 +02:00 (Migrated from github.com)

@kauffj This seems very complex. In a given query each row would have to compare against all other rows with a similar set of fields. Do you think this is something I should adjust Chainquery to calculate when a new claim comes in? Also what is the impact if a claim is abandoned? I presume that doesn't matter. If it existed in the blockchain at some point it impacts the canonical url. Is this correct? Adjusting Chainquery seems like a good idea. I could see someone using the API to gain quick access to the canonical URL or getting the claimid from a canonical url. If chainquery has already done these calculations it will be much faster. Also the adjustment to the lighthouse query becomes super simple as well. We just add a field to grab and push to elastic search.

@kauffj This seems very complex. In a given query each row would have to compare against all other rows with a similar set of fields. Do you think this is something I should adjust Chainquery to calculate when a new claim comes in? Also what is the impact if a claim is abandoned? I presume that doesn't matter. If it existed in the blockchain at some point it impacts the canonical url. Is this correct? Adjusting Chainquery seems like a good idea. I could see someone using the API to gain quick access to the canonical URL or getting the claimid from a canonical url. If chainquery has already done these calculations it will be much faster. Also the adjustment to the lighthouse query becomes super simple as well. We just add a field to grab and push to elastic search.
kauffj commented 2018-09-24 16:49:19 +02:00 (Migrated from github.com)

@tiger5226 canonical URLs cannot (and won't) change even if a claim is abandoned.

We can probably wait to handle this in lighthouse until it's been added at lower levels.

@tiger5226 canonical URLs cannot (and won't) change even if a claim is abandoned. We can probably wait to handle this in lighthouse until it's been added at lower levels.
tiger5226 commented 2018-09-25 04:18:56 +02:00 (Migrated from github.com)

ok I will put this in the icebox.

ok I will put this in the icebox.
alyssaoc commented 2018-09-26 16:45:24 +02:00 (Migrated from github.com)

@tiger5226 Is this ready for review or going to the icebox?

@tiger5226 Is this ready for review or going to the icebox?
tiger5226 commented 2018-09-27 04:33:42 +02:00 (Migrated from github.com)

icebox, corrected the pipeline to triaged.

icebox, corrected the pipeline to triaged.
lyoshenka commented 2018-10-17 15:38:23 +02:00 (Migrated from github.com)

closing until it's been added at a lower level

closing until it's been added at a lower level
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: LBRYCommunity/lighthouse.js#31
No description provided.