CellsSync Slow Performance [Benchmark]

I’m evaluating a fresh Pydio Cells install inside VirtualBox and running into very slow sync speeds when trying to sync a folder with lots of small files. I’m wondering if someone can try running the same benchmark and report their results.

I started the sync 45 minutes ago and the current status is:

The folder it’s syncing has approximately 305,031 items and is 2.91 GB in size. The screenshot shows 4.55 GB, but that’s only because I tried a single 2.86 GB file first. Syncing that one large file wasn’t nearly as slow.

The test is to sync the extracted contents of the Firefox source code.
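
For anyone repeating the test, a small Python sketch like the one below could be used to report the item count and total size of the extracted folder, so results are comparable. It assumes "items" means files plus folders (which is how a file manager typically counts them); the function name `folder_stats` is only illustrative.

```python
import os
import sys


def folder_stats(root):
    """Return (item_count, total_bytes) for everything below root."""
    items = 0
    total_bytes = 0
    # os.walk does not follow directory symlinks by default.
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: an "item" is any file or folder below root.
        items += len(dirnames) + len(filenames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # the link counts as an item, its target's size does not
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip its size
    return items, total_bytes


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    items, total_bytes = folder_stats(root)
    print(f"{items} items, {total_bytes / 1e9:.2f} GB")
```

Run it with the path to the extracted source tree as the only argument. The size is reported in decimal gigabytes, which may differ slightly from what Finder or Explorer shows.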

I installed Cells with Docker:

version: '2'
services:

  cells:
    image: pydio/cells:latest
    restart: unless-stopped
    ports: ["8080:8080"]
    volumes:
      - /mnt2/cells-test:/var/cells
    
    environment:
      - CELLS_LOG_LEVEL=production
      - CELLS_BIND=192.168.1.174:8080

  mysql:
    image: mysql:5.7
    restart: unless-stopped
    ports: ["3306:3306"]
    environment:
      MYSQL_ROOT_PASSWORD: P@ssw0rd
      MYSQL_DATABASE: cells
      MYSQL_USER: pydio
      MYSQL_PASSWORD: P@ssw0rd
    command: [mysqld, --character-set-server=utf8mb4, --collation-server=utf8mb4_unicode_ci]

Task Manager of the VM host:

CellsSync is installed on a Mac in the same LAN. Both machines use a wired gigabit connection (no WiFi).

I remember trying several alternatives years ago and only Seafile synced quickly for folders with many small files. Later I tried Resilio Sync, which is also fast. Is it possible to get good performance with folders like this using Pydio? Or is Pydio not suitable for this workload?

For comparison: Syncthing took 20 minutes to sync the whole directory.

Is anyone willing to try this and report the sync speed with their CellsSync setup?

Hi Jip-Hop
Thanks for your detailed report.
Indeed, syncing tons of small files is not the best use case for Cells Sync, and tools like Seafile or Syncthing are probably better suited for that.
That said, we are looking at a way to archive the whole folder and send it as a single bundle for the very first indexing. On the roadmap…
Also, the next version will bring a new type of datasource that will make the API more synchronous, which may drastically speed up the process, but we are still working on it in a dev branch (@next).
-c

Hi @charles. Has the recent v3 release brought any improvements for this use case (syncing tons of small files)?

I can’t repeat my previous benchmark exactly, but I just ran Pydio Cells Home Edition 3.0.3 with Docker Desktop on macOS. I installed CellsSync on the same Mac and tried to sync the same Firefox code repository I used in my previous benchmark. Since v3, the default storage uses Flat Datasources. I confirmed this is indeed the case for the personal datasource to which I’m syncing my test data.

I’ve been syncing for about 30 minutes now, and so far CellsSync has synced just 8,035 out of 284,036 files.
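
To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the sync rate stays constant, which it may well not; the helper name `sync_progress` is just for illustration.

```python
def sync_progress(done, total, elapsed_minutes):
    """Return (files_per_second, remaining_hours), assuming a constant rate."""
    elapsed_seconds = elapsed_minutes * 60
    rate = done / elapsed_seconds
    remaining_hours = (total - done) / rate / 3600
    return rate, remaining_hours


# Numbers from this run: 8,035 of 284,036 files after about 30 minutes.
rate, remaining = sync_progress(done=8035, total=284036, elapsed_minutes=30)
print(f"{rate:.1f} files/s, roughly {remaining:.0f} hours remaining")
```

That works out to about 4.5 files per second and roughly 17 more hours to finish. For comparison, Syncthing's 20 minutes for about 305,031 items in my earlier test is roughly 254 items per second, although that ran on a different setup.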

This is the Docker Compose file I used:

version: '3.7'
services:

  cells:
    image: pydio/cells:latest
    restart: unless-stopped
    ports: ["8080:8080"]
    environment:
      - CELLS_LOG_LEVEL=production
      - CELLS_BIND=:8080
    volumes:
      - data:/var/cells/data
      - cellsdir:/var/cells

  mysql:
    image: mysql:5.7
    restart: unless-stopped
    ports: ["3306:3306"]
    environment:
      MYSQL_ROOT_PASSWORD: P@ssw0rd
      MYSQL_DATABASE: cells
      MYSQL_USER: pydio
      MYSQL_PASSWORD: P@ssw0rd
    command: [mysqld, --character-set-server=utf8mb4, --collation-server=utf8mb4_unicode_ci]
    volumes:
      - mysqldir:/var/lib/mysql

volumes:
    data: {}
    cellsdir: {}
    mysqldir: {}

Can you confirm this slow speed is still to be expected with v3? Or is there something I’m missing, which would make this sync faster?

Cheers,
Jip

Hi @Jip-Hop, at this time v3 does not improve this scenario.