Uploader unsuitable for folders of numerous small files

I just tried an upload test case: 3.6 GB, 770 folders, ~ 10k files.

I found the uploading process ridiculously slow. The reason seems to be that, even with a queue size of 3, the transaction overhead of almost half a second per file, paid for each and every KB-sized file, adds up to a dramatic slow-down that makes the current uploader unsuitable even for a folder containing a few hundred files.
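
To put rough numbers on it, here is a back-of-the-envelope sketch. The per-file overhead and queue size are the figures from my test above; the 100 Mbit/s uplink is purely an assumption for the sake of comparison:

```python
# Rough estimate of how fixed per-file overhead dominates for many small files.
files = 10_000                 # files in the test case
per_file_overhead_s = 0.5      # observed transaction overhead per file (seconds)
queue_size = 3                 # concurrent uploads in the web uploader
total_bytes = 3.6e9            # 3.6 GB payload
link_bytes_per_s = 100e6 / 8   # assumed 100 Mbit/s uplink (not measured)

overhead_s = files * per_file_overhead_s / queue_size
transfer_s = total_bytes / link_bytes_per_s

print(f"per-file overhead alone: ~{overhead_s / 60:.0f} min")  # ~28 min
print(f"raw data transfer:       ~{transfer_s / 60:.0f} min")  # ~5 min
# The fixed per-file cost is several times larger than the time needed to
# move the actual bytes, which is why the upload feels so slow.
```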

[screenshot: uploader output]

(I know such a comparison isn’t really fair to Pydio, but I feel I should mention that with a direct-to-storage s3cmd/rclone/rsync/ftp/… transfer we would not even have had the time to read the file names as they were uploaded.)

Uploader configuration

I don’t know how much of this is related to the S3 backend.

In the context of this test case, I found that the upload would simply stop for good, with no chance to resume, showing the "Uncaught (in promise) invalid token" console message after one of the uploads received an unexpected 403.

Like in "PutObjectPart has failed" (the S3 upload failure monster revival) - #4 by charles no trace of the 403 or any error got logged in the backend (with all the frustration this implies)

I think this case is related to the session expiring, which, in my opinion, is something that should never happen under any condition: ongoing uploads are sacred and must be preserved whatever the login-timeout policy is. Anyway, I’ll see whether tweaking accessTokenLifespan works (as advised in Unable to upload big files - #2 by sam0204), but if it does, that should definitely be added to the documentation, given the number of threads created around this very issue.

hi again raf,

I’m not sure you’re taking the right approach here. Did you try any other pure-HTTP uploader to send 10k files to a server? Any success with that? I doubt it. This is clearly not “suitable” for that use case.
The tools you are pointing at (s3cmd, rclone, rsync, ftp) all rely on different protocols (or at least on a clever, dedicated binary client) that we simply cannot re-invent in a silly, memory-limited browser.

Why not look at the cells-client, CellsSync, or simply upload a couple of archives containing your biggest folders?
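
For the archive route, here is a minimal sketch using only the Python standard library (the paths are placeholders, nothing Cells-specific) that packs each top-level folder into a single .tar.gz before uploading it through the web interface:

```python
# Bundle each top-level folder into one .tar.gz so the web uploader handles
# a few hundred large files instead of ~10k tiny ones.
import shutil
from pathlib import Path

SOURCE = Path("/path/to/local/tree")   # folder containing the many sub-folders
TARGET = Path("/path/to/archives")     # where the .tar.gz files are written
TARGET.mkdir(parents=True, exist_ok=True)

for folder in sorted(p for p in SOURCE.iterdir() if p.is_dir()):
    archive = shutil.make_archive(
        base_name=str(TARGET / folder.name),  # archive name without extension
        format="gztar",                       # produces <name>.tar.gz
        root_dir=folder,                      # archive this folder's contents
    )
    print("created", archive)
```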

May I add, regarding whether this is related to the S3 backend:

It is. When uploading to an S3 backend, we directly forward the HTTP PUT to the backend, making the overall process naturally slower.

And :pray: please do not reply that we could at least improve error logging :wink:! I know, it’s true, but believe me, I really doubt that knowing more about the error would help in that case.

