Cells 3.0 Home: Flat storage: Generating Index snapshots: An opinion

Hi there
I jumped onto 3.0 today.

I believe there is an aspect too problematic to put into ED/CE or vice versa.

All of this is mentioned in the docs:
A “flat” FS with UUIDs is coupled to the hierarchy stored inside Cells (DB), and that is the nature of generic descriptors. Right.

However, if a DS is reconnected, that link is lost.
Only if we have a “snapshot.db” file with a DB dump are we able to essentially recover our implicit files again.

This can be done via bash scripting, cron jobs etc (manually is not a good plan).
I should mention, you have provided a migrate-function.

But the brain flow is this:
By the time the home user finds this out, he/she has a ton of data on the drive, or worse, in the cloud over a slim cable.
The poor soul goes into scripting, only to find out later that such a script must be monitored by a human to be safe. It is a frustrating process. It is error-prone. What if we create another DS?
Should critical features be teased and locked-in but half-usable, or better be optional for the advanced?

ergo
I believe you should give HOME the full auto-backup-schedule,
or make plain DS the default again.

Sorry for the criticism, I appreciate the idea, and I see the benefits,
but I believe it’s too dangerous in the current setup.

Best Regards
Manu

Hi @maweber

Thanks for the detailed feedback.

Honestly speaking, I have been trying to gather such opinions from the community for more than a month now, by advertising the various release candidate versions in the forum and so on, but without much success:

it is our burden as an Open Source driven development team: nobody ever tries the new versions until they are really out…

That said, we have already prepared a few options for those who are afraid of the new flat format:

  • there is an environment variable to use structured datasources as the default, like in the good ol’ days: just run export CELLS_DEFAULT_DS_STRUCT=true before launching the configuration process
  • as you said, there are some mechanisms to snapshot your index and also to migrate from flat to structured and back, see cells admin datasource --help
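The two options above could be sketched as follows. This is only an illustration, assuming a POSIX shell and a cells binary on the PATH; the guard around the help call is my addition so the snippet does not fail on a machine without Cells.

```shell
#!/bin/sh
# Option 1: make structured datasources the default.
# The variable has to be exported BEFORE the configuration process is launched,
# so that the process inherits it.
export CELLS_DEFAULT_DS_STRUCT=true

# Option 2: list the datasource admin subcommands (index snapshot, migration).
# Guarded with command -v (an addition for this sketch) so the script
# still runs where the binary is absent.
if command -v cells >/dev/null 2>&1; then
    cells admin datasource --help
else
    echo "cells binary not found on PATH" >&2
fi
```

After this, the configuration process would be launched from the same shell session so that it sees the exported variable.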

Then we will wait and see: if there is a lot of pressure from the community to use the structured DS as the default, we might reconsider this option in a future version. But the new format has a lot of advantages, especially for large setups, and is much more in line with the standards (S3…) we use to manage files.

So @maweber thanks again for the feedback. And to all others, any thought on this?

@bsinou
Thanks for your reply, and for the ENV.

I’m sorry to hear you got poor feedback in advance. I’m not sure I could help that way, unless I lay out my use case that is maybe uncommon. But the decision is yours.

I began testing Cells after years of Seafile, and started off by generating multiple 30 GB files (video transfer use case) and testing worst-case scenarios. Cells’ need to checksum files immediately overwhelms an HDD array (too many spawns), or creates high S3 egress. SSDs are not an option for video transfers anyway (wear, cost).

I believe the Flat index is the answer to that?
If so, then, of course there is no way around Flat.

My only fundamental problem with any object-based system nowadays (Flat kind of is one too) is that I must ask how I can leave the system in terms of backups. It was a rough ride on the Seafile side, realizing that it is not so easy to separate the important TBs from the unimportant TBs and move them to a different system. As much as I love the streamlined object idea, most of these systems are hermetic. I’m not sure how that helps the discussion, though. It is inherent in the technique. These Enterprise methods seem difficult to bridge.

M

Just for clarification :

  1. In fact, backing up your MySQL DB is enough to recover the file hierarchy. I guess any serious admin would set up at least a regular DB backup. The on-file snapshot adds another layer of security in case the DB is not recoverable.
  2. In the Home edition, we added the job that triggers that snapshot directly from within the scheduler. No scripting needed, it’s a one-click operation from the interface.
  3. But indeed, only the Enterprise edition currently provides a way to schedule that snapshot. At this point, we are not even sure whether people will be happy to run that job every night, and we want to make sure that they are able to edit this schedule as they wish.
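For point 1, a regular database dump could look like the sketch below. Everything specific in it is an assumption made for illustration and not taken from the Cells documentation: the database name cells, the backup directory, and the reliance on MySQL credentials being available to the client (for example in ~/.my.cnf).

```shell
#!/bin/sh
# Hypothetical nightly dump of the Cells database. All names are assumptions.
DB_NAME="${DB_NAME:-cells}"                        # assumed database name
BACKUP_DIR="${BACKUP_DIR:-/tmp/cells-db-backups}"  # assumed target directory

# Print a dated file name, e.g. <dir>/cells-2024-01-31.sql.gz
backup_file() {
    printf '%s/%s-%s.sql.gz\n' "$BACKUP_DIR" "$DB_NAME" "$(date +%Y-%m-%d)"
}

run_backup() {
    target="$(backup_file)"
    mkdir -p "$BACKUP_DIR" || return 1
    # Dump to a plain .sql file first so that a mysqldump failure is not
    # hidden behind a pipe. --single-transaction takes a consistent
    # snapshot of InnoDB tables without locking them.
    mysqldump --single-transaction "$DB_NAME" > "${target%.gz}" || return 1
    gzip -f "${target%.gz}"
}

if command -v mysqldump >/dev/null 2>&1; then
    run_backup || echo "backup failed" >&2
else
    echo "mysqldump not found; nothing dumped" >&2
fi
```

Saved under a path of your choice and called from a nightly crontab entry, this covers the database side only; the on-file snapshot from point 2 remains a separate safety net.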

But I totally understand your concerns 🙂