Local ytsync #112

Open
opened 2021-09-28 17:13:04 +02:00 by lyoshenka · 1 comment
lyoshenka commented 2021-09-28 17:13:04 +02:00 (Migrated from github.com)

We'd like to let users run their own ytsync process to sync their youtube channel to LBRY without relying on the Odysee service.

Assumptions

  • user has youtube-dl (https://github.com/yt-dlp/yt-dlp) installed locally
  • user has the SDK (https://github.com/lbryio/lbry-sdk) installed and running
  • user has a youtube channel and a LBRY channel

Corner cases

  • disk out of space
  • no channel created on LBRY yet
  • channel key not in wallet
  • not enough LBC
  • video already synced
  • failed to reflect
  • youtube throttling
  • there are probably many others. Comb through the existing code to identify more.
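
As one illustration, the "disk out of space" case can be caught before a download starts. This is a minimal sketch, not existing ytsync code: the function name `has_free_space` and the 1.5x safety margin are assumptions made for the example.

```python
import shutil


def has_free_space(path: str, required_bytes: int, margin: float = 1.5) -> bool:
    """Return True if the filesystem holding `path` can fit the download.

    `margin` is an assumed safety factor that leaves room for the thumbnail
    and any temporary files written while publishing.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes * margin
```

A caller would check this against the temp dir before downloading and get a human involved if it returns False.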

Iterations

v1 - Sync a single video

Get local sync working for a single video. Don't worry about any corner cases yet.

Inputs: youtube video ID, LBRY channel ID, path to temp dir
Outputs: a video is published to the LBRY channel

  • download the video and metadata
    • Important metadata for us includes title, description, bitrate and dimensions, release time, etc. I believe code already exists for all of this.
  • generate a thumbnail
  • publish the thumbnail and video to LBRY
  • make sure content is reflected
  • clean up temp data
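
The steps above could be sketched roughly as follows. This is a hedged outline, not the existing ytsync code. Assumptions: the `yt-dlp` binary is on the PATH, the SDK listens on its default local address, and the `publish` parameter names match the SDK version in use (check them against the SDK docs). The helper names `claim_name`, `download_command`, `publish_payload` and `call_sdk` are hypothetical. The sketch fetches YouTube's own thumbnail instead of generating one, takes the thumbnail URL as an input because uploading it is out of scope here, and does not cover checking that the content was reflected.

```python
from __future__ import annotations

import json
import re
import urllib.request
from pathlib import Path

SDK_URL = "http://localhost:5279"  # assumption: default address of the local SDK API


def claim_name(title: str) -> str:
    """Hypothetical naming rule: lowercase the title and keep only a-z, 0-9, '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def download_command(video_id: str, temp_dir: str) -> list[str]:
    """Build the yt-dlp call that saves the video, its metadata and its thumbnail."""
    return [
        "yt-dlp",
        "--write-info-json",   # metadata: title, description, dimensions, dates
        "--write-thumbnail",   # YouTube's thumbnail, instead of generating one
        "-o", str(Path(temp_dir) / "%(id)s.%(ext)s"),
        f"https://www.youtube.com/watch?v={video_id}",
    ]


def publish_payload(info: dict, file_path: str, channel_id: str,
                    thumbnail_url: str, bid: str = "0.01") -> dict:
    """Build the JSON-RPC body for the SDK's `publish` method."""
    return {
        "method": "publish",
        "params": {
            "name": claim_name(info["title"]),
            "bid": bid,  # assumed default bid, in LBC
            "file_path": file_path,
            "title": info["title"],
            "description": info.get("description", ""),
            "thumbnail_url": thumbnail_url,
            "release_time": int(info["timestamp"]),
            "channel_id": channel_id,
        },
    }


def call_sdk(payload: dict) -> dict:
    """POST a JSON-RPC request to the local SDK and return the decoded reply."""
    request = urllib.request.Request(
        SDK_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In use, the download command would be run with `subprocess.run(download_command(...), check=True)`, the resulting `.info.json` file loaded and passed to `publish_payload`, and the temp dir removed once the publish and reflection have been confirmed.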

v2 - Identify synced videos

Go through a youtube channel and figure out which videos have already been synced to a LBRY channel.

If a video has already been synced (by Odysee or by a previous run of local ytsync), it should never be synced again. Local ytsync should detect that, or in the worst case should make a strong guess or get a human involved.

Inputs: youtube channel ID, LBRY channel ID
Outputs: a list of youtube video IDs that have not been synced yet

There are several options:

  • assume everything before some video is synced
    • Useful for testing and picking up from a previous sync
  • get/put sync info from Odysee by proving you have private key for channel
    • Probably the easiest good way
  • scan yt channel and try to match up which videos are not synced yet
    • this would be heuristic-based but we have some strong signals to identify a video
    • video id is usually part of existing video descriptions or thumbnail names
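
The heuristic option might look something like the sketch below. It assumes each existing LBRY claim is available as a dict with `description` and `thumbnail_url` keys (hypothetical field names for this example) and treats a video as synced if its ID appears anywhere in either field. A real matcher would likely combine this with other signals such as title and release time.

```python
from typing import Iterable, List, Mapping


def unsynced_videos(youtube_ids: Iterable[str],
                    claims: Iterable[Mapping[str, str]]) -> List[str]:
    """Return the youtube IDs that do not appear in any claim's text fields.

    A plain substring search is used because the IDs are already known;
    an 11-character ID is unlikely to show up in unrelated text by accident.
    """
    haystack = "\n".join(
        f"{claim.get('description', '')}\n{claim.get('thumbnail_url', '')}"
        for claim in claims
    )
    return [video_id for video_id in youtube_ids if video_id not in haystack]
```

Videos that match only weakly, or not at all when a match was expected, are the ones to hand to a human.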

In addition we may want to update claim protobuf so you can explicitly tag videos as having been synced from youtube. This would be the best way to identify synced content in the future.

In any case, it's a good idea to keep a local cache of which videos have been synced successfully.
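
A local cache can be very small. The sketch below uses SQLite from the Python standard library. The class name `SyncCache` and the table layout are assumptions for illustration, not part of ytsync.

```python
import sqlite3


class SyncCache:
    """Remembers which youtube video IDs were synced, and to which claim."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to keep the cache across runs.
        self.db = sqlite3.connect(path)
        with self.db:
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS synced ("
                "video_id TEXT PRIMARY KEY, claim_id TEXT NOT NULL)"
            )

    def mark_synced(self, video_id: str, claim_id: str) -> None:
        # The `with` block commits on success and rolls back on error.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO synced (video_id, claim_id) VALUES (?, ?)",
                (video_id, claim_id),
            )

    def is_synced(self, video_id: str) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM synced WHERE video_id = ?", (video_id,)
        ).fetchone()
        return row is not None
```

Only successful syncs are recorded, so a crash mid-sync leaves the video eligible for another attempt.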

v3 - Continuous sync

  • run in the background
  • regularly check for new videos (via polling or websub or something else)
    • The faster a new video can go from "published to yt" to "synced to LBRY", the better
  • for each new video, sync it
    • before starting sync, check with Odysee that it has not started the sync yet
    • notify Odysee that a sync has been started, and then again when the sync is done
    • This requires new endpoints in the Odysee API
  • detect corner cases and get a human involved
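
One pass of that loop could be sketched as follows. The Odysee endpoints do not exist yet, so the `OdyseeClient` methods here are purely hypothetical stand-ins for whatever API ends up being added. `sync_one` and `on_error` are also placeholder callbacks.

```python
from typing import Callable, Iterable, List, Protocol


class OdyseeClient(Protocol):
    """Hypothetical client for Odysee endpoints that are still to be designed."""

    def sync_started(self, video_id: str) -> bool: ...
    def notify_started(self, video_id: str) -> None: ...
    def notify_done(self, video_id: str) -> None: ...


def sync_new_videos(new_ids: Iterable[str],
                    odysee: OdyseeClient,
                    sync_one: Callable[[str], None],
                    on_error: Callable[[str, Exception], None]) -> List[str]:
    """Sync each new video unless Odysee has already started on it."""
    synced = []
    for video_id in new_ids:
        if odysee.sync_started(video_id):
            continue  # Odysee got there first; never sync twice
        odysee.notify_started(video_id)
        try:
            sync_one(video_id)
        except Exception as exc:
            on_error(video_id, exc)  # corner case: get a human involved
            continue
        odysee.notify_done(video_id)
        synced.append(video_id)
    return synced
```

A background process would call this on a timer (polling) or from a WebSub callback, passing in whichever video IDs are new since the last run. Note that the check and the notification are two separate calls here, so a race with Odysee is still possible between them.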

v4 - Hardening and optimization

  • fix all corner cases
  • sync new content as fast as possible
  • ensure local list of already synced content is filled on startup
  • consider dropping reliance on the SDK. There is standalone Go code for most of the things it does (search, publish, reflect). If you understand how these things work, you could even do this as part of v1 and it may even make your life easier.

v5 - Contribute to Odysee

Figure out how to help Odysee with centralized ytsync. The simplest form is detecting new videos from other syncing channels. A harder one is participating in the sync for those channels.

v6 - Cluster

Run a local cluster of ytsync nodes that split the work of syncing content. They should share a queue and data store, keep lists of synced content up to date, etc.

Notes on Odysee API endpoints

  • start/stop Odysee ytsync
  • get list of synced videos
  • mark a video as synced

One gotcha is duplicate syncs (a race condition). Make sure you don't start syncing something at the same time that Odysee starts syncing it. We may need a way to mark a sync as in-progress.
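
One way to think about the in-progress marker is as an atomic claim: whoever asks first gets the video, and everyone else is refused. The sketch below models those semantics in memory. It is not an existing Odysee API. The class name `InProgressRegistry` is hypothetical, and the expiry time is an added assumption so that a crashed sync does not block a video forever.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class InProgressRegistry:
    """Tracks which party is currently syncing each video."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._claims: Dict[str, Tuple[str, float]] = {}  # video_id -> (owner, expiry)

    def try_claim(self, video_id: str, owner: str) -> bool:
        """Claim a video; fail if another owner holds an unexpired claim."""
        now = self._clock()
        with self._lock:  # check and write happen as one atomic step
            holder = self._claims.get(video_id)
            if holder is not None and holder[0] != owner and holder[1] > now:
                return False
            self._claims[video_id] = (owner, now + self._ttl)
            return True

    def release(self, video_id: str, owner: str) -> None:
        """Drop the claim once the sync finishes, if `owner` still holds it."""
        with self._lock:
            holder = self._claims.get(video_id)
            if holder is not None and holder[0] == owner:
                del self._claims[video_id]
```

The important property is that the check and the write are a single operation. A separate "has it started?" call followed by a "mark as started" call leaves a window in which both sides can start the same sync.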

Another gotcha is what happens if you start syncing a video but Odysee doesn't know about this video yet (maybe notification didn't arrive or was missed). How can you mark it as synced? Maybe once you turn sync off, you're just taking responsibility for it?

pseudoscalar commented 2021-11-17 17:09:57 +01:00 (Migrated from github.com)

Note for beyond v1:
Check for YouTube API request limits and halt the sync if they are exceeded.
Getting the release date right is pretty important, and any failure to determine it should probably stop the sync.
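
A strict release-date lookup might look like the sketch below. It assumes the metadata comes from a yt-dlp info JSON, which can carry `release_timestamp`, `timestamp` and `upload_date` (as `YYYYMMDD`). The order of preference between those fields and the `ReleaseDateError` name are assumptions for this example.

```python
from datetime import datetime, timezone


class ReleaseDateError(Exception):
    """Raised when no trustworthy release date can be determined."""


def release_time(info: dict) -> int:
    """Return the release time as a Unix timestamp, or raise to stop the sync."""
    for key in ("release_timestamp", "timestamp"):
        value = info.get(key)
        if isinstance(value, (int, float)) and value > 0:
            return int(value)

    upload_date = info.get("upload_date")
    if isinstance(upload_date, str):
        try:
            day = datetime.strptime(upload_date, "%Y%m%d")
        except ValueError as exc:
            raise ReleaseDateError(f"bad upload_date: {upload_date!r}") from exc
        # Only the day is known, so midnight UTC is used as the time.
        return int(day.replace(tzinfo=timezone.utc).timestamp())

    raise ReleaseDateError("no usable release date in metadata")
```

Raising instead of falling back to the current time means a video is never published with a made-up date.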

Reference: LBRYCommunity/ytsync#112