- Previously, we tried to solve the "file locked" problem by only making one retry after a super long delay. This was from an anecdote that it's more likely to lock up if the delay was short.
- This didn't help at all for our case, and Andrey has made some locking mechanism changes in the backend.
- The reduced number of retries probably increased the number of "failed to upload chunk" errors (not sure), which is supposedly a normal occurrence and we're expected to keep retrying.
Restoring the retry behavior and monitor...
There is anecdote that we need to wait up to 2 minutes to preven the locking scenario.
`https://github.com/tus/tusd/pull/667#issuecomment-1079647640`
## Change
Instead of multiple retries at short intervals, do a one-time retry after a 2-minute wait. We'll do this until the fix is available in tusd v2.
## Ticket
910
## Changes
- Change the "message" from a generic "tus-upload" to more specific ones like "tus: failed to resume upload". These are grouped as "Events" in sentry, so we can isolate and search for them easily.
- Pass more info to Sentry (previously only available from Slack). It is still good to send to both, since some browsers block Sentry even without blocker extensions.
- Reduce verbosity of Slack's
## Notes
- Was unable to change the "unknown" problem mentioned in the ticket. The API does not accept `new Error('xxx')`, even though that's being mentioned by many in the forums. It might be due to the version of Sentry that we are using.
- To search for tus issues, go to "Issues" and query `message:tus*`. Results are collapsed per event, so click on the item of interest, then click "Events" at the upper right to see all occurrences of the same problem.
* Move into getLocalStorageSummary + always log
- Move into getLocalStorageSummary to clean up the clutter.
- Always log the localStorage info to get a bigger picture of what's going on with the QuotaExceededError.
* Remove 'findPreviousUploads' - we use the url stored in Redux.
Something I forgot to remove in the past. It also reads from localStorage, so remove since we are trying to avoid touching localStorage.
* Ensure localStorage is not used when uploading
I don't think it's being written when `storeFingerprintForResuming` is disabled, but doing the suggestion nonetheless.
`https://github.com/tus/tus-js-client/issues/315#issuecomment-1046821112`
We no longer ask tus to save the upload URL since December, so there should no reason for it to be writing to localStorage.
Adding more logs to determine what is the actual cause -- localStorage being full, or not available. Neither should affect the upload but they are the only known causes for that error message, so try to narrow down the investigation path.
With the throughput tweaks at the backend, it seems like the number of "file is locked" errors have reduced.
The next thing to try is to reduce the chunk size, hoping that file writes would be faster, reducing the lock duration from causing a timeout.
Completely remove any assumptions of multi-tab uploading from server status (should have done it previously, but wanted to be conservative). This should make it less confusing to the user.
The real issue still remains -- the upload is somehow locked at the backend.
Also, when we override the error to present a user-friendly message to the user, pass the original error to the log (just in case it gives extra info).
From the logs, it seems like the second retry (5s) fixes the "normal" cases, so just remove the first retry (0s).
Also from the logs, if a retry doesn't work by the third attempt (10s), it's most likely the "locked" case and retrying further doesn't help. So, reduce one more useless retry attemp.
Removed logging since we've gathered enough data, and that this hook is expected to be hit a lot, so we don't want to clog the logs.
## Background
Per developer of `tus-js-client`, it is normal to occasionally encounter upload errors. The auto-retry mechanism is meant to address this.
While implementing tab-lock to prevent multiple uploads of the same file, 423_locked was used to detect this scenario. But 423_locked could also mean "the server is busy writing the chunk" (per discussion with Randy), so we kind of disabled the auto-retry mechanism accidentally.
Meanwhile, from a prior discussion with Randy, one of the chunk-writing duration took 3 minutes. Our current maximum of "retry after 15s" wouldn't help.
## Change
1. Given that tab-locking was improved recently and no longer reliant on the server error messages (we use secure storage to mark a file as locked), reverted the change to "skip retry on 409/423". This is now back to normal recommended behavior.
2. `tus-js-client` currently does not support variable retry delay, otherwise we could prolong the delay if the error was 423. Since we know it could take up to 3 minutes, and that we don't know if it's file-size dependant, just add another 30s retry and put a friendlier message asking the user to retry themselves after waiting a bit.
Our current chunk size is 25,000,000.
Google and S3 documentation suggests the chunk size to be multiples of 256KiB. MongoDB too. We aren't using any of those, but I guess no harm doing the same. From the logs, the values "25,000,000" and "50,000,000" seems to be common.
## Ticket
418 TUS: skip fingerprint storage
- Fingerprints for canceled uploads are not being cleared by tus-js-client. It's in localStorage, and there is limit for that.
- We are storing the confirmed fingerprint (from the backend) in redux anyway, so we don't need that functionality.
* Upload: fix redux key clash
## Issue
`params` is the "final" value that will be passed to the SDK and `channel` is not a valid argument (it should be `channel_name`). Also, it seems like we only pass the channel ID now and skip the channel name entirely.
For the anonymous case, a clash will still happen when since the channel part is hardcoded to `anonymous`.
## Approach
Generate a guid in `params` and use that as the key to handle all the cases above. We couldn't use the `uploadUrl` because v1 doesn't have it.
The old formula is retained to allow users to retry or cancel their existing uploads one last time (otherwise it will persist forever). The next upload will be using the new key.
* Upload: add tab-locking
## Issue
- The previous code does detect uploads from multiple tabs, but it was done by handling the CONFLICT error message from the backend. At certain corner-cases, this does not work well. A better way is to not allow resumption while the same file is being uploading from another tab.
- When an upload from 1 tab finishes, the GUI on the other tab does not remove the completed item. User either have to refresh or click Cancel. Clicking Cancel results in the 404 backend error. This should be avoided.
## Approach
- Added tab synchronization and locking by passing the "locked" and "removed" information through `localStorage`.
## Other considered approaches
- Wallet sync -- but decided not to pollute the wallet.
- 3rd-party redux tab syncing -- but decided it's not worth adding another module for 1 usage.
* Upload: check if locked before confirming delete
## Reproduce
Have 2 tabs + paused upload
Open "cancel" dialog in one of the tabs.
Continue upload in other tab
Confirm cancellation in first tab
Upload disappears from both tabs, but based on network traffic the upload keeps happening.
(If upload finishes the claim seems to get created)
## Issue
The TUS client automatically removes the upload fingerprint whenever there is a 4xx error. When we try to resume later, we couldn't find the the fingerprint and ended up creating a new upload ID.
## Changes
Since we are also storing the uploadUrl ourselves, provided that to override the tus client's default behavior of restarting a new session on 4xx errors.
The stalling behavior has changed a bit, probably with the removal of CF.
The stall difference between 10MB and 50MB is not too noticable, so picking 25MB as a start.
## Issue
The status = 0 is due to unresponsive backend right after the tus-upload. No root-cause found yet.
## Change
It may or may not help, but adding a delay to account for the unresponsive stage for now.
## Issue/Steps
From Randy:
- started the upload then open a new tab of the same page
- one of the tab finished the upload and successfully published the file, and the other tab received 404 error on patch and head request, because the file is already removed on the server
## Changes
Use the default onRetry code that ignores all 4xx, except for LOCKED and CONFLICT. Had to duplicate some code from tus because I still need to inject the 'retry' progress for the GUI to update the string.
## Issue
If you make 2 claims from the same source file, the second upload thinks it's trying to resume from the first one. They should be unique uploads.
## Approach
Stash the upload url for comparison when looking up existing uploads to resume.
Stash that in `params` to minimize code changes. We'll just need to ensure it is cleared before we generate the SDK payload.
* Publish button: use spinner instead of "Publishing..."
Looks better, plus the preview could take a while sometimes.
* Refactor `doPublish`. No functional change
This is to allow `doPublish` to accept a custom payload as an input (for resuming uploads), instead of always resolving it from the redux data.
* Add doPublishResume
* Support resume-able upload via tus
## Issue
38 Handle resumable file upload
## Notes
Since we can't serialize a File object, we'll need to the user to re-select the file to resume.
* Exclude "modified date" for Firefox/Android
## Issue
It appears that the modification date of the Android file changes when selected, so that file was deemed "different" when trying to resume upload.
## Change
Exclude modification date for now. Let's assume a smart user.
* Move 'currentUploads' to 'publish' reducer
`publish` is currently rehydrated, so we can ride on that and don't need to store the `currentUploads` in `localStorage` for persistence. This would allow us to store Markdown Post data too, as `localStorage` has a 5MB limit per app.
We could have also made `webReducer` rehydrate, but in this repo, there is no need to split it to another reducer. It also makes more sense to be part of publish anyway (at least to me).
This change is mostly moving items between files, with the exception of
1. An additional REHYDRATE in the publish reducer to clean up the tusUploader.
2. Not clearing `currentUploads` in CLEAR_PUBLISH.
* Restore v1 code for livestream replay, etc.
v2 (tus) does not handle `remote_url`, so the app still needs v1 for that. Since we'll still have v1 code, use v1 for previews as well.