I’m curious if anyone can shed some light on the specifics of the file hashing system in Cells:
Does the software itself ever compare the hash value generated on upload to a fresh one to establish file integrity, or does it rely on the storage to provide data with guaranteed integrity?
What exactly happens in Cells if a file’s checksum changes (due to corruption or other damage)?
Can the system flag files that fail an integrity check as “corrupted”, or otherwise make damaged files visible?
Currently, the hash is essentially used as an ETag, to easily detect any change to a file on the server. In particular, the Cells Sync client uses it to speed up change detection.
Your use case could indeed be interesting: since the hash is computed on the fly at upload time and then stored as metadata, it would be possible to recompute a file’s checksum by re-reading it from storage and comparing the two values. But nothing is currently implemented for that.
As a side note, be aware that v4 introduced a new hashing mechanism that computes hashes on fixed-size chunks (10MB), to ensure consistency between a file uploaded via multipart (where parts can arrive in any order) and one uploaded by a standard PUT request. This implies that multipart part sizes must be a multiple of 10MB.
Thanks for the insight, this is very useful information. Prior to making this forum post, I put together a quick system that generates client-side checksums using WebWorkers and spark-md5.js, with a chunking procedure similar to the one you describe in Cells. It works very smoothly so far, so I can confirm it looks like a valid approach.
I need to do some additional testing with extreme upload structures, but so far it seems the client can keep up nicely. I’ll keep you updated if I devise a system to replicate a “fixity”-style scheduled check, but it should be straightforward for me to implement an initial comparison between the pre- and post-upload values.