Goodbye, structured file storage? Oops

Describe your issue in detail

Cells v5.0.0 (CE) doesn’t seem to be able to open any files inside my structured storage.

The data sources exist and are properly configured; they are assigned to workspaces whose root folders show up when users browse the Web UI; and all of them worked flawlessly before I started testing the Pydio Cells v5 beta.

Now every workspace folder shows as empty on the Web UI, even though all the files are definitely still on the filesystem, as they should be.

Although the draft v5 documentation still mentions structured storage, and still explains its use cases, it seems to have been discontinued:

This “format” is decoupled from the storage technology : any datasource (Local FS, S3 or S3 compatible, etc…) supports both formats.

Well, not any longer.

Although I’m not quite happy about the sudden and unexpected change, this is v5 after all, and that means ‘things break’. However, I’d expect that at least the whole bulk of data would have been converted automatically in the background.

Allegedly, there is a way (actually two) to convert between the two formats:

To recap, there is one way of doing backups: start with a structured data source, do a snapshot; then the snapshot can be loaded directly into a flat file data source. But, of course, it means you need to have done such a snapshot using the “old” version of Cells… otherwise, it’s now too late!

And there is one way of ‘migrating’ from one format to the other: that requires using the migrate command from the CLI. This, however, assumes an existing, working, fully functional structured data source, i.e., one that is recognised by Cells as being a data source. Since that isn’t the case, I’m not quite sure how I should proceed.

More details, logs, errors…

Cells is being launched from systemd, and the journal is an endless stream of the same message.

Note: The log files are huge — too big for these forums — so, if you really wish to read them, I’ve uploaded them to Pastebin.

First 100+ lines of the log after a restart, slightly redacted for privacy reasons

Note that the memory consumption (2.1GB reported) is perfectly acceptable for my use case.

Looking under the services directory, I get the following: (oneworkspace and anotherworkspace are redacted names)

Contents of the services directory
├── pydio.gateway.data
├── pydio.grpc.activity
│   ├── activities.db
│   ├── fifo-idmChanges
│   │   ├── 000012.log
│   │   ├── CURRENT
│   │   ├── CURRENT.bak
│   │   ├── GOQUE
│   │   ├── LOCK
│   │   ├── LOG
│   │   └── MANIFEST-000013
│   ├── fifo-metaChanges
│   │   ├── 000001.log
│   │   ├── CURRENT
│   │   ├── GOQUE
│   │   ├── LOCK
│   │   ├── LOG
│   │   └── MANIFEST-000000
│   └── fifo-treeChanges
│       ├── 000002.ldb
│       ├── 000005.ldb
│       ├── 000006.log
│       ├── CURRENT
│       ├── CURRENT.bak
│       ├── GOQUE
│       ├── LOCK
│       ├── LOG
│       └── MANIFEST-000007
├── pydio.grpc.data.objects
│   └── default
│       ├── local1
│       │   └── config.json.deprecated
│       └── local2
│           └── config.json.deprecated
├── pydio.grpc.data.sync.oneworkspace
├── pydio.grpc.data.sync.cellsdata
├── pydio.grpc.data.sync.anotherworkspace
├── pydio.grpc.data.sync.maindatastorage
├── pydio.grpc.data.sync.personal
├── pydio.grpc.data.sync.pydiods1
├── pydio.grpc.docstore
│   ├── docstore.bleve
│   │   ├── index_meta.json
│   │   └── store
│   │       └── root.bolt
│   └── docstore.db
├── pydio.grpc.install
│   └── <no value>
├── pydio.grpc.jobs
│   ├── jobs.db
│   └── tasklogs.bleve
│       ├── index_meta.json
│       └── store
│           ├── 000000548d55.zap
│           └── root.bolt
├── pydio.grpc.log
│   └── syslog.bleve
│       ├── index_meta.json
│       └── store
│           └── root.bolt
├── pydio.grpc.mailer
│   └── queue.db
├── pydio.grpc.search
│   ├── fifo-search
│   │   ├── 000001.log
│   │   ├── CURRENT
│   │   ├── GOQUE
│   │   ├── LOCK
│   │   ├── LOG
│   │   └── MANIFEST-000000
│   └── searchengine.bleve
│       ├── index_meta.json
│       └── store
│           ├── 000000000002.zap
│           └── root.bolt
├── pydio.grpc.tasks
│   ├── fifo-jobs
│   │   ├── 000076.log
│   │   ├── 000078.ldb
│   │   ├── CURRENT
│   │   ├── CURRENT.bak
│   │   ├── GOQUE
│   │   ├── LOCK
│   │   ├── LOG
│   │   └── MANIFEST-000077
│   ├── fifo-topic.pydio.meta.nodes.changes
│   │   ├── 000001.log
│   │   ├── CURRENT
│   │   ├── GOQUE
│   │   ├── LOCK
│   │   ├── LOG
│   │   └── MANIFEST-000000
│   └── fifo-topic.pydio.tree.nodes.changes
│       ├── 000002.ldb
│       ├── 000005.ldb
│       ├── 000006.log
│       ├── CURRENT
│       ├── CURRENT.bak
│       ├── GOQUE
│       ├── LOCK
│       ├── LOG
│       └── MANIFEST-000007
├── pydio.grpc.update
│   ├── cells-v4.9.92-dev.20251121093310
│   └── cells-v4.9.94-alpha01
└── pydio.grpc.versions
    └── versions.db

Notes

  1. All entries under pydio.grpc.data.sync.* are empty.
  2. All permissions are absolutely correct for the user/group under which cells is running.
  3. The entry shown as <no value> is actually a file with that exact name! Its binary content is:
Hexdump of pydio.grpc.install/<no value>
00000000  00 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|
00000010  ed da 0c ed 02 00 00 00  00 10 00 00 00 00 00 00  |................|
00000020  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  02 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|
00000040  00 00 00 00 00 00 00 00  ee fd 89 46 11 6e 51 07  |...........F.nQ.|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000  01 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|
00001010  ed da 0c ed 02 00 00 00  00 10 00 00 00 00 00 00  |................|
00001020  03 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00001030  02 00 00 00 00 00 00 00  04 00 00 00 00 00 00 00  |................|
00001040  01 00 00 00 00 00 00 00  0f 48 79 51 1a 35 4c 26  |.........HyQ.5L&|
00001050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00002000  02 00 00 00 00 00 00 00  10 00 00 00 00 00 00 00  |................|
00002010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00003000  03 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  |................|
00003010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00004000
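
A side note on the hexdump above: the bytes ed da 0c ed at offset 0x10 match the magic number that BoltDB/bbolt writes into its meta pages (0xED0CDAED, stored little-endian), so this file looks like a freshly created, empty Bolt database rather than garbage. The name `<no value>` is also what Go's text/template prints for a missing value, so the odd filename may come from an unrendered template; that part is only a guess. Below is a minimal Python sketch that decodes the first 0x50 bytes of the dump, assuming the standard bbolt layout (a 16-byte page header followed by the meta struct). The function and field names are mine, not Cells'.

```python
import struct

BOLT_MAGIC = 0xED0CDAED   # magic number bbolt stores in every meta page
META_PAGE_FLAG = 0x04     # page-header flag bbolt uses for meta pages

# First 0x50 bytes of pydio.grpc.install/<no value>, copied from the hexdump.
FIRST_PAGE = bytes.fromhex(
    "00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00"
    "ed da 0c ed 02 00 00 00 00 10 00 00 00 00 00 00"
    "03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
    "02 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 ee fd 89 46 11 6e 51 07"
)


def parse_bolt_meta(page: bytes) -> dict:
    """Decode the header and meta struct of a bbolt meta page (little-endian)."""
    # Page header: page id (u64), flags (u16), count (u16), overflow (u32).
    page_id, flags, _count, _overflow = struct.unpack_from("<QHHI", page, 0)
    # Meta struct: magic, version, page size, flags (u32 each), followed by
    # root bucket page, root sequence, freelist page, high-water page id,
    # transaction id and checksum (u64 each).
    (magic, version, page_size, _meta_flags, root_page, _sequence,
     freelist_page, high_water_page, txid, checksum) = struct.unpack_from(
        "<IIIIQQQQQQ", page, 16)
    return {
        "page_id": page_id,
        "is_meta_page": bool(flags & META_PAGE_FLAG),
        "magic_ok": magic == BOLT_MAGIC,
        "version": version,
        "page_size": page_size,
        "root_page": root_page,
        "freelist_page": freelist_page,
        "high_water_page": high_water_page,
        "txid": txid,
        "checksum": checksum,
    }


for key, value in parse_bolt_meta(FIRST_PAGE).items():
    print(f"{key:16} {value}")
```

The decoded page size (4096) and high-water page id (4) agree with the dump ending at 0x4000, i.e. four 4 KiB pages with nothing stored beyond the bookkeeping pages.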

The errors, therefore, are considerably fewer than before. But are the files there? No, everything continues to be as empty as before, including the personal data files.

This gave me an idea: what happens when I upload some files to my personal data folder? It started empty (like the others). Via the Web UI, I uploaded a few folders of images that I knew I had never uploaded before. And, sure enough, they “appeared” there: on the Web UI side of things, or via WebDAV, they were clearly “there”.

But none of my “older” documents were.

Looking at the directories, I now found a lot of files with UUID names on the root directory of data/personal/, like this:

# ls -la data/personal/
total 19M
drwxr-xr-x   7 cellsuser cellsgroup 4.0K May 31 12:21 .
drwxr-xr-x  13 cellsuser cellsgroup 4.0K Nov 24  2025 ..
drwxr-xr-x  24 cellsuser cellsgroup  20K Nov 22  2025 gwynethllewelyn
drwxr-xr-x   2 cellsuser cellsgroup 4.0K Nov 22  2025 anotheruser
drwxr-xr-x   3 cellsuser cellsgroup 4.0K Nov 22  2025 otheruser
drwxr-xr-x 102 cellsuser cellsgroup  12K Nov 22  2025 stillanotheruser
-rw-r--r--   1 cellsuser cellsgroup 4.3K May 31 12:21 10280960-4a21-4642-b77f-e3652cd0b97f
-rw-r--r--   1 cellsuser cellsgroup 6.9M May 31 12:21 2fbb392f-a956-45b7-b4a6-4fbad8f71af0
-rw-r--r--   1 cellsuser cellsgroup  75K May 31 12:21 374053cc-01e7-45f4-a544-d52846e75800
-rw-r--r--   1 cellsuser cellsgroup 1.7M May 31 12:21 43621dde-23d2-49fa-aa18-a0d729ba26d5
[...]
-rw-r--r--   1 cellsuser cellsgroup 175K May 31 12:21 d5b66e04-6fbf-4ff2-906b-8f37d3fe8b8d
-rw-r--r--   1 cellsuser cellsgroup 1.7M May 31 12:21 f5e13d16-f432-46fd-90df-605f3590cdbd
-rw-r--r--   1 cellsuser cellsgroup  32K Nov 24  2025 snapshot.db

Oh, okay. It seems that Cells can only use flat file storage — no more structured filesystems!
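
For anyone who wants to check their own data sources the same way, here is a small Python sketch that sorts the top-level entries of a data source directory into UUID-named files (what the new uploads look like in the listing above), other files, and directories (the old per-user trees). It is a heuristic based only on that `ls` output, not on how Cells itself decides the format, and the function names are made up for illustration.

```python
import uuid
from pathlib import Path


def is_uuid_name(name: str) -> bool:
    """True if name is a canonically formatted UUID such as the blobs listed above."""
    try:
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


def summarize_root(root: Path) -> dict:
    """Group the direct children of a data source directory by kind."""
    summary = {"uuid_files": [], "other_files": [], "directories": []}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            summary["directories"].append(entry.name)
        elif is_uuid_name(entry.name):
            summary["uuid_files"].append(entry.name)
        else:
            summary["other_files"].append(entry.name)
    return summary
```

Calling `summarize_root(Path("data/personal"))` on the directory listed above should put the four user folders under `directories`, the new uploads under `uuid_files`, and `snapshot.db` under `other_files`. A mix of UUID files and ordinary folders is the situation described here: new flat uploads sitting next to an older structured tree.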

So that essentially

What version of Cells are you using?

Just upgraded to v5.0.0! (Community Edition)
(I was running the v5 beta)

What is the server OS? Database name/version? Browser name or mobile device description (if issue appears client-side)?

  • Ubuntu Linux (x86_64) 24.04.4 LTS (8 cores, gazillions of unused RAM and disk space)
  • mysqld Ver 10.11.14-MariaDB-0ubuntu0.24.04.1-log for debian-linux-gnu on x86_64 (Ubuntu 24.04)
  • Issue is browser-independent, but I’ve mostly tested with Brave and Safari on macOS Tahoe (arm64)

What steps have you taken to resolve this issue already?

Panic?..

The first thing I did, obviously, was to deal with the “path error”. Maybe the bleve database was corrupted…? So, I stopped cells again, moved the whole pydio.grpc.log directory to /tmp, allowing cells to recreate it from scratch, and launched cells again:

Logs after
May 31 12:03:09 myserver systemd[1]: Stopping cells.service - Pydio Cells...
May 31 12:03:09 myserver systemd[1]: cells.service: Deactivated successfully.
May 31 12:03:09 myserver systemd[1]: Stopped cells.service - Pydio Cells.
May 31 12:03:09 myserver systemd[1]: cells.service: Consumed 2min 25.600s CPU time, 395.8M memory peak.
May 31 12:04:18 myserver systemd[1]: Starting cells.service - Pydio Cells...
May 31 12:04:18 myserver systemd[1]: Started cells.service - Pydio Cells.
May 31 12:04:19 myserver cells[506030]: Starting Pydio Cells Home Edition
May 31 12:04:19 myserver cells[506030]: Version:        5.0.0 (0946aa462c58f04f559c1acfa1543362c2743d5f) built on 26 May26 07:07 +0000
May 31 12:04:19 myserver cells[506030]: Go Build:        go1.26.3 (amd64)
May 31 12:04:19 myserver cells[506030]: Working Dir:         /my/path/to/cells
May 31 12:04:19 myserver cells[506030]: Binding service to '127.0.0.1:8443'
May 31 12:04:19 myserver cells[506030]: Checking config at file:///my/path/to/cells/pydio.json?readOnly=true
May 31 12:04:19 myserver cells[506030]: ✔ Default datasource set. A config process has already been performed.
May 31 12:04:19 myserver cells[506030]: {"level":"info","ts":1780225459.1050253,"msg":"redirected default logger","from":"stderr","to":"caddy.logging.writers.cells"}
May 31 12:04:20 myserver cells[506030]: {"level":"error","ts":"2026-05-31T12:04:20+01:00","logger":"pydio.grpc.log","msg":"Cannot open index on /my/path/to/cells/services/pydio.grpc.log/syslog.bleve, original openError was timeout, next New was cannot create new index, path already exists ","tag":"broker"}
May 31 12:04:20 myserver cells[506030]: {"level":"error","ts":"2026-05-31T12:04:20+01:00","logger":"pydio.grpc.log","msg":"[GRPC]/log.LogRecorder/PutLog cannot create new index, path already exists","errorId":"65ec2877-6e21","ClientCaller":"github.com/pydio/cells/v5/common/telemetry/log/service/sync.go:115:service.(*LogSyncer).logSyncerClientReconnect()","error":"cannot create new index, path already exists","tag":"broker"}
May 31 12:04:20 myserver cells[506030]: {"level":"error","ts":"2026-05-31T12:04:20+01:00","logger":"pydio.grpc.log","msg":"Cannot open index on /my/path/to/cells/services/pydio.grpc.log/syslog.bleve, original openError was timeout, next New was cannot create new index, path already exists ","tag":"broker"}
May 31 12:04:20 myserver cells[506030]: {"level":"error","ts":"2026-05-31T12:04:20+01:00","logger":"pydio.grpc.log","msg":"[GRPC]/log.LogRecorder/PutLog cannot create new index, path already exists","errorId":"675ccd61-3d4d","ClientCaller":"github.com/pydio/cells/v5/common/telemetry/log/service/sync.go:115:service.(*LogSyncer).logSyncerClientReconnect()","error":"cannot create new index, path already exists","tag":"broker"}
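
Since the journal is mostly the same few messages repeated, a tally is easier to read than the raw stream. The Python sketch below assumes every interesting line looks like the ones above, i.e. a syslog-style prefix ending in `cells[PID]:` followed by a single JSON object. Plain-text lines such as the startup banner are skipped. The function name is hypothetical.

```python
import json
import re
from collections import Counter

# Captures the JSON payload that follows "cells[<pid>]: " in a journal line.
LINE_RE = re.compile(r"cells\[\d+\]:\s+(\{.*\})\s*$")


def tally_messages(lines) -> Counter:
    """Count (level, logger, msg) triples across an iterable of journal lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a JSON log line (banner, systemd message, ...)
        try:
            record = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # truncated or otherwise malformed payload
        key = (record.get("level"), record.get("logger"), record.get("msg"))
        counts[key] += 1
    return counts
```

With the output of `journalctl -u cells.service` piped into a script, `tally_messages(sys.stdin).most_common(5)` would list the five most frequent messages together with their counts.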

After the restart, the pydio.grpc.log directory was recreated from scratch, and rewritten exactly as before. pydio.grpc.log/syslog.bleve/index_meta.json, in particular, is identical to the one before and contains:

{"storage":"boltdb","index_type":"scorch"}

(no newline at the end)

Nevertheless, under the store directory, I now get the expected bolt archive: 000000000077.zap.

But of course no automatic ‘migration’ process was triggered whatsoever.

The next step was to attempt to run the manual migration process. As per the above-mentioned instructions page, I stopped Cells and started it in the ‘special’ mode:

cells start -x pydio.grpc.data.sync

I got a list of all my data sources, and, as expected, all were flagged as “flat”. I picked one that I could afford to completely destroy (if things went dramatically wrong!), which, unsurprisingly, is the personal data source, mostly used for experiments (I have backups elsewhere).

And then I ran:

$ ./cells admin datasource migrate

 **************************************************************************************
 * To run this command, please first make sure to **NOT** run Cells in a normal mode. *
 * You must exclude sync services, by running `./cells start -x pydio.grpc.data.sync` *
 **************************************************************************************

? Are you sure that sync services are NOT running? [y/N] y█
✔ personal (flat)
Migrating flat datasource personal to structured - Original bucket is personal
The following bucket will be used for migrating data, change the name if you want: personal-structured
Error: open registry.Registry: no scheme in URL "/registry"
Usage:
  ./cells admin datasource migrate [flags]

Flags:
  -d, --dry-run      Do not apply any changes
  -f, --force        Skip initial warning
  -h, --help         help for migrate
  -m, --move-files   Delete original files after copying to new bucket

Global Flags:
      --cluster string               Name of the cluster for the node (default "default")
      --config string                Configuration storage URL. Supported schemes: etcd|file|grpc|mem|vault|vaults|xds (default "file:///my/config/location/path/pydio.json")
      --grpc_client_timeout string   Default timeout for long-running GRPC calls, expressed as a golang duration (default "60m")

open registry.Registry: no scheme in URL "/registry"

I don’t know what exactly happened here or where to search for errors (Caddy’s log file — which might have something to do with registered endpoints — is absolutely empty).
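
Taken literally, the message says that something tried to open a registry from the string "/registry", which has a path but no scheme (no `mem://`, `grpc://` or similar in front). I have no idea how Cells actually resolves its registry, so the following Python sketch is only an illustration of that class of error: a hypothetical opener that dispatches on the URL scheme and fails in the same way when the scheme is missing.

```python
from urllib.parse import urlsplit


def open_by_scheme(kind: str, url: str, openers: dict):
    """Pick an opener callable by URL scheme (hypothetical, for illustration only)."""
    scheme = urlsplit(url).scheme
    if not scheme:
        # Same wording as the error printed by the migrate command.
        raise ValueError(f'open {kind}: no scheme in URL "{url}"')
    if scheme not in openers:
        raise ValueError(f'open {kind}: no opener registered for scheme "{scheme}"')
    return openers[scheme](url)
```

With this sketch, `open_by_scheme("registry.Registry", "/registry", {})` raises an error with exactly the text shown above, which suggests the command was handed a bare path where a full URL was expected.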

Currently, my best choice, therefore, is to use cells-client and to manually copy everything into the right place…
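
Before copying anything by hand, it helps to know exactly what has to be copied. The Python sketch below only reads the disk; it does not talk to Cells or cells-client. It walks the sub-directories of a data source (the old structured tree) and yields each file's relative path and size, ignoring the UUID blobs and `snapshot.db` at the top level simply because it never looks at top-level files. Skipping `.pydio` is an assumption on my part that structured data sources keep a hidden marker file of that name in each folder; adjust `skip_names` if yours differ.

```python
import os
from pathlib import Path


def legacy_manifest(root: Path, skip_names=(".pydio",)):
    """Yield (relative_path, size_in_bytes) for files below root's sub-directories."""
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        for dirpath, dirnames, filenames in os.walk(sub):
            dirnames.sort()  # walk sub-folders in a stable, alphabetical order
            for name in sorted(filenames):
                if name in skip_names:
                    continue  # assumed per-folder marker, not user data
                full = Path(dirpath) / name
                yield full.relative_to(root).as_posix(), full.stat().st_size
```

Something like `for path, size in legacy_manifest(Path("data/personal")): print(size, path)` gives a checklist that can be compared against what the Web UI shows after the copy.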

Hi,

Structured datasources are not deprecated yet, but they are no longer enabled by default. You can turn them back on by ticking the “Allow structured format storage” box in the “Admin Driver” plugin, under the “All plugins” section. The change requires a restart to take effect. I’ll update the documentation to reflect this.

As for the problem itself, it seems that the bleve logs driver has issues reopening a v4 index. I’ll have a look at that. Would it be OK to share that log directory with us?

And for the datasource migrate issue, I already have a fix that will be available in the next version.

Thanks,

Greg