Uploader unsuitable for folders of numerous small files

I just tried an upload test case: 3.6 GB, 770 folders, ~ 10k files.

I found the uploading process ridiculously slow. The reason seems to be that, even with a queue size of 3, the transaction overhead of almost half a second per file, paid for each and every KB-sized file, adds up to a dramatic slow-down that makes the current uploader unsuitable even for a folder containing a few hundred files.
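
To put rough numbers on it, here is a back-of-the-envelope sketch. The per-file overhead and queue size are the figures from my test above; the 100 Mbit/s uplink is purely an assumption for the sake of comparison:

```python
# Rough estimate of how fixed per-file overhead dominates for many small files.
files = 10_000                 # files in the test case
per_file_overhead_s = 0.5      # observed transaction overhead per file (seconds)
queue_size = 3                 # concurrent uploads in the web uploader
total_bytes = 3.6e9            # 3.6 GB payload
link_bytes_per_s = 100e6 / 8   # assumed 100 Mbit/s uplink (not measured)

overhead_s = files * per_file_overhead_s / queue_size
transfer_s = total_bytes / link_bytes_per_s

print(f"per-file overhead alone: ~{overhead_s / 60:.0f} min")  # ~28 min
print(f"raw data transfer:       ~{transfer_s / 60:.0f} min")  # ~5 min
# The fixed per-file cost is several times larger than the time needed to
# move the actual bytes, which is why the upload feels so slow.
```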

[screenshot: uploader output]

(I know such a comparison isn’t really fair to Pydio, but I feel I should mention that with a direct-to-storage s3cmd/rclone/rsync/ftp/… transfer we would not even have had the time to read the file names as they were uploaded.)

Uploader configuration

I don’t know how much of this is related to the S3 backend.

In the context of this test case, I found that the upload would simply stop for good, with no chance to resume, showing the "Uncaught (in promise) invalid token" console message after one of the uploads received an unexpected 403.

Like in "PutObjectPart has failed" (the S3 upload failure monster revival) - #4 by charles no trace of the 403 or any error got logged in the backend (with all the frustration this implies)

I think this case is related to the session expiring, which, in my opinion, is something that should never happen under any condition: ongoing uploads are sacred and must be preserved whatever the login-timeout policy is. Anyway, I’ll see whether tweaking accessTokenLifespan works (as advised in Unable to upload big files - #2 by sam0204), but if it does, that should definitely be added to the documentation, given the number of threads created around this very issue.

hi again raf,

I’m not sure you’re taking the right approach here. Did you try any other pure-HTTP uploader to send 10k files to a server? Any success with that? I doubt it. This is clearly not “suitable” for that use case.
The tools you are pointing at (s3cmd, rclone, rsync, ftp) all rely on different protocols (or at least on a clever, dedicated binary client) that we simply cannot re-invent in a silly, memory-limited browser.

Why not look at the cells-client, CellsSync, or simply upload a couple of archives containing your biggest folders?
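
For the archive route, here is a minimal sketch using only the Python standard library (the paths are placeholders, nothing Cells-specific) that packs each top-level folder into a single .tar.gz before uploading it through the web interface:

```python
# Bundle each top-level folder into one .tar.gz so the web uploader handles
# a few hundred large files instead of ~10k tiny ones.
import shutil
from pathlib import Path

SOURCE = Path("/path/to/local/tree")   # folder containing the many sub-folders
TARGET = Path("/path/to/archives")     # where the .tar.gz files are written
TARGET.mkdir(parents=True, exist_ok=True)

for folder in sorted(p for p in SOURCE.iterdir() if p.is_dir()):
    archive = shutil.make_archive(
        base_name=str(TARGET / folder.name),  # archive name without extension
        format="gztar",                       # produces <name>.tar.gz
        root_dir=folder,                      # archive this folder's contents
    )
    print("created", archive)
```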

May I add, regarding whether this is related to the S3 backend:

It is. When uploading to an S3 backend, we directly forward the HTTP PUT to the backend, making the overall process naturally slower.

And :pray: please do not reply that we could at least improve error logging :wink:! I know, it’s true, but believe me, I really doubt that knowing more about the error would help in that case.

