Synchronizing / indexing system files?

I was hoping to use Cells as a basic file viewer for my CentOS server. All the files are being created by the system, not through Cells.

I created a data source pointing at my folder and added it to a new workspace. However, when I add files to the folder, they do not show up automatically in Cells; to make them appear, I have to re-synchronize the datasource. I used Pydio 8 Community a bit before this, and that seemed to handle this fine.

So, I have a few questions:

  1. Does synchronizing also index the files in the database, for faster searching?
  2. Is there a way to automate this, and if so, what is the best way?
  3. Do all the files need to be re-indexed every time this happens, or does it only need to synchronize the new files?

Question 3 is especially important because the folder is going to have tens of thousands of text files, and having to re-index them all every time likely won’t be practical, and I’ll have to find a different solution.

Thank you.

  1. Yes. Synchronizing is the indexation process; it does not create a copy of the data.
  2. Yes, you can automate it from the CLI (see the example just below):
    /home/cells data sync --service=pydio.grpc.data.sync.datasource_name --path=/
  3. If you change files directly on the file system, re-indexation is required. However, Cells will only sync the parts of the directory that changed.
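
To make the service flag concrete: for a datasource named mydata (a hypothetical name, substitute your own), the datasource name is appended to the service identifier, so the call would look like:

    /home/cells data sync --service=pydio.grpc.data.sync.mydata --path=/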

Oh, that’s great. So if I set up a cron task for ‘/home/cells data sync --service=pydio.grpc.data.sync.datasource_name --path=/’ (with my paths) to run every 5 minutes or so, it will check the folder for new files only and add just those new files to the database? That should mean performance stays alright even with folders holding 50,000 text files or so? Just confirming.

Is there a way to see which files are indexed, or which files are being indexed as the sync is running?

No way 🙂

A cron job is OK, but it should run as the “pydio” user. The interval should be longer than 5 minutes, depending on the size of the data; for 50k text files, I think 20 minutes is fine.
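
Putting that together, a minimal /etc/cron.d sketch (assuming, as above, the Cells binary at /home/cells and a datasource named datasource_name; both are placeholders for your setup):

    # /etc/cron.d/cells-sync: re-index the datasource every 20 minutes, as the "pydio" user
    */20 * * * * pydio /home/cells data sync --service=pydio.grpc.data.sync.datasource_name --path=/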

Why 20 minutes? If the bulk of the files are already indexed and it’s only scanning for new files, would it have that much of a performance impact?