Checksum behaviour: what happens if a checksum is invalid?

Hey all,

I’m curious whether anyone can shed some light on a few specifics of the file hashing system in Cells:

  • Does the software itself ever compare the hash value generated on upload to a fresh one to establish file integrity, or does it rely on the storage to provide data with guaranteed integrity?

  • What exactly happens in Cells if a file’s checksum changes (due to corruption or other file damage)?

  • Is the system capable of flagging files that fail an integrity check as “corrupted”, or otherwise making damaged files visible?

Thanks all,

John.

Hi John

Currently, the hash is essentially used as an ETag, to easily detect any change to a file inside the server. In practice, the Cells Sync client uses it to speed up change detection.
Your use case is an interesting one: since the hash is computed on the fly at upload time and then stored as metadata, it would be possible to recompute a file’s checksum by re-reading it from storage and comparing the two values. But nothing is currently implemented for that.
As a side note, beware that v4 introduced a new hashing mechanism that computes hashes on fixed-size chunks (10MB), to ensure consistency between a file uploaded via multipart (parts can arrive in any order) and one uploaded by a standard PUT request. This implies that multipart part sizes must be a multiple of 10MB.
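
To illustrate why fixed-size chunks make the hash independent of how the file arrives, here is a minimal sketch in Python. It is not the actual Cells algorithm: the use of MD5, the way chunk digests are combined (hashing their concatenation), and the names `chunk_digests`, `combine`, `hash_whole` and `hash_multipart` are all assumptions made for illustration. It only shows that when every part except the last is a multiple of the chunk size, chunk boundaries line up and both upload paths produce the same value.

```python
import hashlib

# Assumption: "10MB" is taken here as 10 * 1024 * 1024 bytes.
CHUNK_SIZE = 10 * 1024 * 1024


def chunk_digests(data: bytes, chunk_size: int) -> list[bytes]:
    """Hash each fixed-size chunk of data independently."""
    return [
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def combine(digests: list[bytes]) -> str:
    """Fold the ordered chunk digests into one final hex digest (assumed scheme)."""
    return hashlib.md5(b"".join(digests)).hexdigest()


def hash_whole(data: bytes, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file received in one piece (e.g. a standard PUT)."""
    return combine(chunk_digests(data, chunk_size))


def hash_multipart(parts: list[tuple[int, bytes]],
                   chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file received as (part_number, body) pairs in any arrival order."""
    last = max(number for number, _ in parts)
    per_part: dict[int, list[bytes]] = {}
    for number, body in parts:  # iterate in arrival order, not file order
        # Every part except the last must end on a chunk boundary.
        if number != last and len(body) % chunk_size != 0:
            raise ValueError(
                f"part {number} is not a multiple of {chunk_size} bytes"
            )
        per_part[number] = chunk_digests(body, chunk_size)
    # Reassemble the chunk digests in file order before combining them.
    ordered = [d for number in sorted(per_part) for d in per_part[number]]
    return combine(ordered)
```

With a part that is not aligned to the chunk size, the chunks hashed on the multipart path would straddle different byte ranges than on the PUT path, which is why the sketch rejects such parts instead of hashing them.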

Hey Charles, great to hear from you.

Thanks for the insight, this is very useful information. Before making this forum post, I put together a quick system that generates client-side checksums using Web Workers and spark-md5.js, with a hash chunking procedure similar to the one you describe in Cells. It seems to work very smoothly, so I can confirm it looks like a valid approach.
I need to do some additional testing with extreme upload structures, but so far the client seems to keep up nicely. I’ll keep you updated if I devise a system to replicate a “fixity”-style scheduled check, but it should be straightforward for me to implement an initial comparison between the pre- and post-upload values.
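
For anyone following along, the comparison step can be sketched roughly as below. This is Python rather than the browser code described above, and it is only an outline under stated assumptions: `stream_digest` and `find_mismatches` are hypothetical names, MD5 over the whole stream stands in for whatever hash was recorded at upload time, and the entries are plain in-memory streams rather than anything read through a Cells API.

```python
import hashlib
import io
from typing import BinaryIO, Iterable


def stream_digest(stream: BinaryIO, block_size: int = 1024 * 1024) -> str:
    """Recompute a hex digest by reading the stream block by block."""
    h = hashlib.md5()
    while True:
        block = stream.read(block_size)
        if not block:
            break
        h.update(block)
    return h.hexdigest()


def find_mismatches(entries: Iterable[tuple[str, str, BinaryIO]]) -> list[str]:
    """Return the paths whose fresh digest differs from the stored one.

    Each entry is (path, stored_checksum, stream); all three are
    placeholders for whatever the real storage and metadata provide.
    """
    return [
        path
        for path, stored, stream in entries
        if stream_digest(stream) != stored
    ]


if __name__ == "__main__":
    original = b"example file content"
    stored = hashlib.md5(original).hexdigest()  # value recorded at upload
    entries = [
        ("intact.bin", stored, io.BytesIO(original)),
        ("damaged.bin", stored, io.BytesIO(b"example file c0ntent")),
    ]
    print(find_mismatches(entries))
```

A scheduled fixity check would run something like `find_mismatches` over all stored files at an interval and surface the returned paths. The only hard requirement is that the fresh digest is computed with exactly the same procedure as the stored one.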

John

Sounds really cool, and definitely an interesting add-on for validating upload integrity! Did you integrate it directly into the uploader.html plugin?
