CellsSync Slow Performance [Benchmark]

I’m evaluating a fresh Pydio Cells install inside VirtualBox and running into very slow sync speeds when trying to sync a folder with lots of small files. I’m wondering if someone can try running the same benchmark and report their results.

I started the sync 45 minutes ago and the current status is:

The folder it’s syncing has approximately 305,031 items and is 2.91 GB in size. The screenshot shows 4.55 GB, but that’s only because I tried a single 2.86 GB file first. Syncing one large file wasn’t slow.

The test is to sync the extracted contents of the Firefox source code.
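If anyone wants to run a comparable benchmark without downloading the Firefox source, a synthetic tree of many small files exercises the same workload. A minimal sketch (the path, file counts, and sizes are arbitrary choices of mine, not part of the original test):

```python
import os
import random
import string

def make_small_file_tree(root, n_files=1000, dirs=50, size=1024):
    """Spread `n_files` files of `size` bytes over `dirs` subdirectories."""
    random.seed(0)  # reproducible content
    for d in range(dirs):
        os.makedirs(os.path.join(root, f"dir{d:03d}"), exist_ok=True)
    for i in range(n_files):
        subdir = os.path.join(root, f"dir{i % dirs:03d}")
        payload = "".join(random.choices(string.ascii_letters, k=size))
        with open(os.path.join(subdir, f"file{i:05d}.txt"), "w") as fh:
            fh.write(payload)

# Example target folder (hypothetical path) — point CellsSync at it afterwards.
make_small_file_tree("/tmp/sync-bench", n_files=1000, dirs=50, size=1024)
```

Scale `n_files` up toward the 305k range to approximate the Firefox-source test.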

I installed Cells with Docker:

```yaml
version: '2'
services:
  cells:
    image: pydio/cells:latest
    restart: unless-stopped
    ports: ["8080:8080"]
    volumes:
      - /mnt2/cells-test:/var/cells
    environment:
      - CELLS_LOG_LEVEL=production
      - CELLS_BIND=
  mysql:
    image: mysql:5.7
    restart: unless-stopped
    ports: ["3306:3306"]
    environment:
      MYSQL_DATABASE: cells
      MYSQL_USER: pydio
      MYSQL_PASSWORD: P@ssw0rd
    command: [mysqld, --character-set-server=utf8mb4, --collation-server=utf8mb4_unicode_ci]
```

Task Manager of the VM host:

CellsSync is installed on a Mac in the same LAN. Both machines use a wired gigabit connection (no WiFi).

I remember trying several alternatives years ago, and only Seafile synced quickly for folders with many small files. Later I tried Resilio Sync, which is also fast. Is it possible to get good performance with folders like this using Pydio, or is Pydio not suited for this workload?

For comparison: Syncthing took 20 minutes to sync the whole directory.

Anyone willing to try and report the sync speed with your CellsSync setup?

Hi Jip-Hop
Thanks for your detailed report.
Indeed, syncing tons of small files is not the best use case for Cells Sync, and tools like Seafile or Syncthing are probably better suited for that.
That said, we are looking at a way to archive the whole folder and send it as a single bundle for the very first indexation. It’s on the roadmap…
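The archive-first idea makes sense because, with hundreds of thousands of files, fixed per-file round-trip cost dominates raw transfer time. A toy back-of-the-envelope model (the 0.05 s per-file overhead and 110 MB/s gigabit throughput are illustrative assumptions, not measured Cells numbers):

```python
def sync_time(n_files, total_mb, per_file_overhead_s, bandwidth_mb_s):
    """Toy model: one fixed-overhead round trip per file, plus transfer time."""
    return n_files * per_file_overhead_s + total_mb / bandwidth_mb_s

# Rough numbers from the report: ~305,031 files, ~2.91 GB (~2,980 MB), gigabit LAN.
per_file = sync_time(305_031, 2_980, per_file_overhead_s=0.05, bandwidth_mb_s=110)
archived = sync_time(1, 2_980, per_file_overhead_s=0.05, bandwidth_mb_s=110)
print(f"per-file: {per_file / 3600:.1f} h, single archive: {archived:.0f} s")
# → per-file: 4.2 h, single archive: 27 s
```

Even if the real per-file overhead is much smaller, the n_files multiplier explains why a single archive upload for the initial indexation could be dramatically faster.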
Also, the next version will bring a new type of datasource that makes the API more synchronous, which may drastically speed up the process, but we are still working on it in a dev branch (@next).