Cells 4.0.0 gets confused when a former directory is deleted and replaced with a file with the same name (Mac user here!) [solved*]

Here is a tricky edge case which I found and could not solve (so far): I’m a Mac user, and, as such, I’ve got a few Mac-specific files from Apple’s own suite of office applications, e.g. Pages, Numbers, Keynote and so forth.

These, and other Mac-native applications that I also own, such as OmniGraffle, used to have a weird way of creating ‘files’ on disk: instead of a regular file (which would be expected!), they created a directory and placed all the contents inside it. There might have been some mystical reason for that in the distant past (e.g. attached content, such as images, would be stored inside the directory as plain files). For those who don’t know the way Apple does their insane things, all applications still work that way, i.e. they are really directories and not single files, each with a complex filesystem structure inside holding all the application assets, which you can read and copy at will. The same applies to many components and plugins, and, as such, there is some sort of built-in mechanism at the GUI level which ‘knows’ that certain ‘files’ are, in fact, directories, but presents them to users as if they were files.

This is all very nice in the Mac-only universe, but, twenty years later, even Apple figured out that most Apple users don’t use Apple-only hardware/software, but rather a combination of brands.

A typical example: making plain backups by syncing your home folder remotely. If you use Apple’s iCloud, there will be no issues. But most people will use some other provider (… such as… Pydio Cells!!) for that purpose. The sync tools aren’t aware of the ‘special directories’ which macOS treats as ‘files’, so they will simply sync them as folders instead. In theory, if all you wish to do is to keep your backup, and never give access to it to anyone who doesn’t own a Mac, then all should be well: once you recover the backup, directories are transferred, but macOS will recognise them as being ‘files’ on your system and treat them as such, no matter where they originally came from. In other words: thankfully, these directories, from a low-level Unix point of view, are ‘regular directories’. It’s just their extension that signals their ‘special’ status to macOS.

However, this scenario is often confusing for third parties that want to support Apple hardware but have no idea which applications will use ‘files’ and which will use ‘directories-treated-as-files’.

At some point in the recent past (I cannot say exactly when, but my guess is that it was 1-2 years ago), Apple introduced a new file type for Pages, Numbers and Keynote, and now promptly asks users, when they load their ‘old’ content, to convert it to the new format (‘at the risk of losing new formatting options not supported before’). The ‘conversion’ is not perceptible to the Mac user: if they were working on mydocument.pages before (which was in reality a directory, but the user might not be aware of that fact, since the macOS Finder would simply display a ‘file’ named mydocument.pages), there would continue to be a mydocument.pages in that folder, just as before, with an icon showing that it had been written by Pages: no perceptible change whatsoever. Backing it up with Time Machine, iCloud or any other Apple-specific software would also make no difference: visually, it would be ‘the same file name’.

Under the hood, however, Apple did a naughty trick: henceforward, all those pseudo-directories have been secretly turned into ZIP files :wink:. There is no ‘directory’ any more. It’s just a plain ZIP file named (following the example above) mydocument.pages, just as before. There is no visual indication in the GUI of the underlying change in filesystem structure.
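For anyone wanting to check which of the two formats a given document actually uses on disk, here is a minimal Python sketch. The function name `classify_document` and its return labels are made up for illustration; the only assumption taken from the description above is that old-style documents are directories and new-style ones are single ZIP files.

```python
import os
import zipfile


def classify_document(path: str) -> str:
    """Return 'package-directory', 'zip-file', 'plain-file' or 'missing'.

    The labels are hypothetical; they only mirror the two on-disk
    layouts described in this post.
    """
    if os.path.isdir(path):
        # Old-style document: a real directory that Finder shows as a file.
        return "package-directory"
    if os.path.isfile(path):
        # New-style document: a single file. is_zipfile() inspects the
        # file contents, not the extension.
        return "zip-file" if zipfile.is_zipfile(path) else "plain-file"
    return "missing"
```

Running `classify_document("mydocument.pages")` on a copy from before and after the conversion should show the type flipping from a directory to a ZIP file while the name stays the same.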

In fact, thanks to this change, other providers such as Microsoft OneDrive can now correctly identify Apple-specific documents and flag them as such. They might not be displayable (yet…) on the Web version of OneDrive, but they are certainly correctly recognised as being a ‘file’ — while previously they were displayed as ‘folders’, since Microsoft’s OneDrive didn’t grasp this concept of ‘pseudo-directories seen as files’.

So far, so good, and the ‘new format’ is rather a blessing for those (like me!) who use several different external storage/sharing systems (like Pydio Cells!) and always had to explain, over and over again, that what everybody else in the non-Mac world ‘saw’ as a directory was actually a ‘file’ from my perspective… now they will see a file, even if their system doesn’t recognise the format.

OmniGraffle, being so closely integrated with the native Apple ecosystem, promptly did the same, i.e. they migrated their own former ‘pseudo-directory’ file format to a ‘new’ file format which is a ZIP file as well. I secretly suspect that Apple either explained to all their registered developers (those that pay $$$ for being considered Apple developers, that is) why they changed file formats, encouraging them to do the same; or simply changed their SDKs without telling anyone anything very specific, except that using the ‘new’ SDK libraries would require a change of file format (which might occur automatically, that is, the developer might not have to write their own code from scratch to deal with the different way of storing data).

There is a catch, though!

When using Cells as the storage for such backups, a name that used to belong to a directory, but now becomes a regular file (at the same directory level as before), utterly confuses Cells. The ‘confusion’ appears at several levels, and the interesting thing is that the data store (hierarchical — not flat!) may have the correctly named file (and no directory!), but the Web interface will ‘think’ it’s a directory — even after resyncing the storage or the workspace, relaunching Cells, using cells-client to remove the directory, etc. etc.

Nothing seems to be able to delete the directory — ever!
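To find out in advance which entries would hit this edge case, one option is to compare two listings of the local folder (one taken before the documents were converted, one after) and look for names that changed from directory to file. The following is only a sketch of that idea on the local side; it says nothing about how Cells tracks nodes internally, and the helper names `snapshot` and `dirs_replaced_by_files` are hypothetical.

```python
import os


def snapshot(root: str) -> dict[str, str]:
    """Map every path under root (relative to root) to 'dir' or 'file'."""
    kinds = {}
    for current, dirnames, filenames in os.walk(root):
        for name in dirnames:
            kinds[os.path.relpath(os.path.join(current, name), root)] = "dir"
        for name in filenames:
            kinds[os.path.relpath(os.path.join(current, name), root)] = "file"
    return kinds


def dirs_replaced_by_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were directories in `before` and are files in `after`."""
    return sorted(
        path for path, kind in before.items()
        if kind == "dir" and after.get(path) == "file"
    )
```

The earlier snapshot would have to be saved somewhere (for example as JSON) before the conversion happens; with both in hand, the returned list names exactly the entries that went from directory to same-named file.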

A few interesting and amusing side-effects:

  • When listing the directory’s properties on the web backend, there are no errors — the information is the correct one for the directory before it was replaced by a file with the same name.
  • The web backend also allows ‘entering’ the directory, which, predictably, is empty. Again, no errors.
  • When attempting to delete the directory, there is the customary message informing that it was moved to the recycling bin. No errors, but… there is nothing inside the recycling bin! Forcing it to get emptied does not produce any effect whatsoever (but no errors are thrown, either).
  • Uploading a (completely unrelated) file inside that pseudo-directory throws an error.
  • The same behaviour is consistent across the many ways of accessing the workspace: I tried cells-client, WebDAV, even S3 (with which I have several unrelated issues, though). They all show a directory, not a file; when deleting it, they give no error, but won’t delete it, either (note that I didn’t test the earlier case with those ‘alternative’ methods).
  • Uploading the file with the same name, using the web backend, will throw an error, saying that there is already an item with that name!
  • Uploading it with a different name from the web backend, and later renaming it to the same name as the ‘pseudo-directory’, is not allowed either.

The solution? As far as I know, only one: remove the storage and start from scratch.

I’m marking this as ‘solved’ with some caveats…

This very strange behaviour was actually creating conflicts all over the place: for some very weird reason, it managed to exhaust all system handles (file handles, socket handles, Internet socket handles, everything). Granted, the system didn’t stop working: the main daemons had already grabbed their fill of such handles and could happily work with the amount they got. Things would just start to misbehave when something new required a handle for some purpose; a typical example would be making a DNS request, which would fail mysteriously. At the application level it wouldn’t be clear what had failed; the application would just be told that the operating system couldn’t create that particular socket (or file, or whatever).

Retrying a bit later would mysteriously succeed again… for a while… and then stop working. This was a new situation, which I had never experienced before. Running out of file handles, well, yes, I had seen that once in a while; the errors in that case are less mysterious and clearer to read. Running out of sockets… well… I certainly expected a clearer error to be logged somewhere, something a bit more than ‘authentication failed’ or ‘could not resolve IP address’ or ‘could not find file’.
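Since the errors themselves say so little, it can help to look at descriptor usage directly. On Unix-like systems sockets and files draw from the same per-process pool of descriptors, so a rough check is to count what a process has open and compare that with its soft limit. The sketch below does this for the current process only; it assumes a Linux-style `/proc/self/fd` (falling back to `/dev/fd`), and the function names are made up for illustration. It is not a diagnosis of what Cells was doing.

```python
import os
import resource


def open_fd_count() -> int:
    """Count descriptors open in this process.

    The count includes the descriptor briefly used to list the directory.
    """
    # /proc/self/fd exists on Linux; /dev/fd is the closest equivalent
    # on macOS and the BSDs.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("no per-process descriptor directory found")


def fd_headroom() -> tuple[int, int]:
    """Return (open descriptors, soft RLIMIT_NOFILE) for this process."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft_limit
```

When the first number creeps close to the second, new sockets and files start failing in exactly the ‘mysterious’ way described above, while whatever was already open keeps working.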

Note that this is an oversimplification: there is actually something much deeper going on, namely, why some applications didn’t seem to be affected, while others (such as Cells or Postfix) were, and others still were only partially affected, in the sense that they would get an unknown error, wait a bit until a handle became available, retry, get a positive answer, and release the handle for other processes to use. Things like, say, PHP-FPM would deal gracefully with the situation: they might fail to spawn an instance to handle a request, but that would be ok, the monitor would just kill that instance (thus releasing a few handles back) and try again (with luck, this time it would work).

Miraculously, it could be traced to the issue in this thread. It doesn’t seem to bear any relation whatsoever to it, but… once I removed the storage and created it from scratch without the former-directory-now-file entries, everything went back to normal. Everything.

Why? Well, I seriously suspect a relationship with the ‘other’ issue I reported in the thread ‘Cells 4.0.0 RC is constantly writing memory-mapped files to disk, even when nothing is happening’ (post #2 there).

It still baffles me, though :slight_smile:

But I’m glad that it’s ‘fixed’ in my case now (even though I’m not sure why), and maybe this thread might be useful to someone else in the future, who knows…
