"Scanning Contents" Phase of Synchronization Abysmally Slow

Describe your issue in detail

I’m currently importing a lot of data into my Pydio Cells instance. For the most part this works fine, especially for smaller files. Today, however, I am working on video files, and it takes about 15 minutes to “scan” even a single 350 MiB video file. Since I want to import over 500 movies, you can probably see why I am beginning to feel a bit concerned.
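For scale, the figures above can be turned into a rough estimate. This is a back-of-the-envelope sketch only, assuming every movie is about the same size as the 350 MiB sample and scans at the same rate (real movie files are often larger):

```python
# Rough estimate based on the numbers in this post.
# Assumption: all 500 movies are ~350 MiB and scan at the same rate.
SAMPLE_MIB = 350      # size of the sample video
SAMPLE_MINUTES = 15   # observed scan time for that video
MOVIES = 500          # number of movies to import

throughput_mib_s = SAMPLE_MIB / (SAMPLE_MINUTES * 60)  # MiB scanned per second
total_hours = MOVIES * SAMPLE_MINUTES / 60             # total scan time in hours

print(f"throughput: {throughput_mib_s:.2f} MiB/s")
print(f"estimated total: {total_hours:.0f} hours (~{total_hours / 24:.1f} days)")
```

At roughly 0.39 MiB/s, the whole library would take on the order of 125 hours of scanning, before counting any runs that fail and have to be restarted.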

To perform these tasks, I am using the legacy “Import Existing Data” DataSource type.

It’s worth noting that I have ticked the “Use Native Etags” box in the Advanced Options of the DataSource (primarily because I had issues with it unticked, but that is outside the scope of this topic), so I don’t really know what could be taking so long.

What I’m hoping to gain from this topic is more insight into what this “scanning” is, and whether it can be skipped or changed in some way. I’m new to Pydio Cells, so I apologize if my questions are answered elsewhere, but I’m unsure where else to look: I see no similar topics on the forum, no issues on GitHub, and no mentions in the documentation. I’m also hoping to fix another issue where the sync occasionally “cancels” itself midway through (although I should probably put this in another topic, and I probably will after I sleep).

To show what I mean: the current run has reached episode 19 of a show. I have been checking on it periodically to make sure it’s still working. Occasionally it fails with the error message “context canceled,” even though I never canceled it. Luckily, when I press Re-Synchronize, it seems to pick up where it left off, but it’s still rather annoying to have to check on it every so often just to make sure it’s still running. All in all, I’ve been waiting about 3 hours so far.

What version of Cells are you using?

Pydio Cells Home Edition 4.4.0 (02218f0a100cdc10c932e57c3bac9923bd799df1) linux/arm

What is the server OS? Database name/version??

OS: DietPi v9.2.1
Database: MariaDB Ver 15.1

What steps have you taken to resolve this issue already?

I have looked through the documentation to see what this could be. The only possibly relevant thing I can find in the documentation has to do with scanning for malware, but I am unsure whether that is the same thing.

Here’s an example of the “context canceled” error. Nothing shows up in the log, so I just hit “Re-Sync.”

EDIT: I was wrong; I did end up finding it in the scheduler logs, but it didn’t provide much useful info: Error while running action actions.tree.cells-hash - context canceled

To make it short, the scanning operation computes a unique hash (signature) of each file, which is required for subsequent resyncs. It requires some CPU, and it is parallelized across multiple CPUs… What hardware are you running this on? Is there any way to provide more resources to your server?
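To illustrate why scan time grows with file size, here is a minimal sketch of hashing files in parallel. It is not Cells’ actual implementation: SHA-256, the 4 MiB `CHUNK_SIZE`, and the helper names `hash_file` and `hash_all` are all hypothetical stand-ins, since the thread does not say which hash `cells-hash` uses or how it splits the work.

```python
from __future__ import annotations

import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical: read 4 MiB at a time


def hash_file(path: str) -> str:
    """Stream a whole file through SHA-256 (a stand-in algorithm)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Every byte must be read and hashed, so cost grows with file size.
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_all(paths: list[str], workers: int | None = None) -> dict[str, str]:
    """Hash several files concurrently, one file per worker thread."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(hash_file, paths)))
```

Parallelism here is per file: more cores let more files be hashed at once, but a single large video is still processed by one worker from start to finish, so on a low-powered board its scan time is bounded by single-core hashing speed and disk read speed.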
As for the “context canceled”, it should probably be read as a timeout. The resync operation has a default timeout to avoid getting stuck, so if scanning each video takes a very long time, this is not really surprising.
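The timeout behavior described above can be sketched as a hashing loop that checks a deadline between chunks. This is only an illustration under assumptions: `hash_with_deadline` and the `ContextCanceled` exception are hypothetical names chosen to mirror the logged message, and the real Cells timeout value and mechanism are not given in this thread.

```python
import hashlib
import io
import time


class ContextCanceled(Exception):
    """Hypothetical stand-in for the 'context canceled' error in the logs."""


def hash_with_deadline(stream, timeout_s: float, chunk_size: int = 1 << 20) -> str:
    """Hash a stream, giving up once `timeout_s` seconds have elapsed.

    The deadline is only checked between chunks, so a job that is simply
    slow (not stuck) still gets aborted once it runs past the limit.
    """
    deadline = time.monotonic() + timeout_s
    h = hashlib.sha256()
    while True:
        if time.monotonic() >= deadline:
            raise ContextCanceled("context canceled")
        chunk = stream.read(chunk_size)
        if not chunk:
            return h.hexdigest()
        h.update(chunk)


if __name__ == "__main__":
    # A generous limit lets the job finish.
    print(hash_with_deadline(io.BytesIO(b"x" * (8 << 20)), timeout_s=60))
    # A limit that is too short aborts it, even though nothing went wrong.
    try:
        hash_with_deadline(io.BytesIO(b"x"), timeout_s=0)
    except ContextCanceled as err:
        print("failed:", err)
```

A fixed limit sized for typical files can be exceeded by large videos on a slow CPU. If already-hashed files are skipped on the next run, each re-sync would make further progress, which would be consistent with the “picks up where it left off” behavior described in the question.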

Makes sense. That said, a hash does show up even for files that have not been scanned yet… I dunno.

Is it possible to lengthen the timeout? I am running it on a lower-end machine (a Pine H64), and this is the only issue I have with Cells itself.