Hi!
I’m trying out Cells 4.0.0 RC (RC7 at the time of writing, but the same applies to previous RCs as well) and have noticed that for some mysterious reason, Cells is writing on the disk constantly.
While this is a production installation — not a mere test — it has very few actual users and exists mostly to give easy access to 10±year-old archives to a small team of people who use them very sporadically. Although there are tens of thousands of files there — including large files such as videos and old backups — my expectations are that most of the time, Cells is not being actively used by humans. The only ‘activity’ would come mostly from the many maintenance tasks scheduled by Cells itself.
Ordinarily I wouldn’t worry much about the constant writing — the disks are, after all, enterprise-class and designed to endure the load — but it doesn’t seem ‘right’. Compared to other software running on the same server (including the database server running MariaDB), Cells contributes almost 99% of all disk writes. The constant stream is not necessarily saturating the bus bandwidth towards the disks (at least, according to what I could figure out from its specs). But it is constant and non-stop, 24 hours a day.
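A rough way to check a figure like this, assuming a Linux kernel with per-task I/O accounting enabled, is to compare the write_bytes counter in /proc/<PID>/io for each process over a fixed interval. The sketch below is only an illustration: the function names are made up, and reading another user's /proc/<PID>/io normally requires root.

```shell
# Hypothetical helpers (names are illustrative, not part of Cells).

# Print the write_bytes counter from a /proc/<PID>/io style file.
io_write_bytes() {
  awk '$1 == "write_bytes:" { print $2 }' "$1"
}

# Print the average bytes per second written by a PID over an interval
# (default: 10 seconds).
write_rate() {
  pid=$1
  secs=${2:-10}
  before=$(io_write_bytes "/proc/$pid/io") || return 1
  sleep "$secs"
  after=$(io_write_bytes "/proc/$pid/io") || return 1
  echo $(( (after - before) / secs ))
}

# Usage (as root): write_rate <Cells PID> 60
```

Running the same measurement against the MariaDB PID over the same interval gives a like-for-like comparison.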
Note that the server running this has 64 GBytes of RAM and has been tweaked to keep as much data as it can in memory (there is not even swap). Admittedly, there may be some misconfiguration — in the recent past, for instance, due to a mistake I made, the system would not consume much more than 16 GBytes of RAM, which was strange, since, by default, Linux tries to use up as much memory as it safely can (there is no point in ‘wasting’ RAM, after all). In fact, one of the reasons for investigating why it wasn’t consuming more memory was the huge amount of write requests being made by Cells. I hoped that, with more RAM available, Cells could safely accommodate whatever it wanted in RAM and leave the disk in peace.
To give you an idea: the full data currently being served by Cells is less than 100 GBytes. In theory, almost all of it could be in memory! In practice, only a small percentage of those 100 GBytes might be in use at each time — and only during the short periods when an actual human is browsing through the files and actively uploading or downloading files, during which it is conceivable that Cells might, indeed, increase disk activity. But not all the time.
At first, I thought that this was just Cells writing constantly to some logs (because I might have left the settings at ‘trace’ level, for example). Cells is chatty with its logs, and there are database logs (even more disk writes!) and disk-based logs.
Apparently, the amount of data written to the logs is ‘just fine’, according to my expectations. Logging is not constant, but comes in bursts, every time a scheduled event fires up and/or a human accesses the system. This would be the expected behaviour and should not be significant anyway, at least when compared with the logs coming in from the web and mail servers (which are constant!), or even from systemd’s own log journals. And yet Cells is writing to disk about a hundred times more frequently than all other systems added together, and, on top of that, it does so constantly.
To say that this is intriguing is an understatement! Note that nothing of the sort happened with version 3; it just became apparent under the first version 4 release.
Also note that the system is not especially busy, in terms of overall activity; the server is deliberately overpowered for the amount of work it is supposed to do. In fact, although there was a significant rise in CPU consumption from version 3 to version 4, most of it comes from I/O waits. In other words, Cells is managing to acquire so many file handles on disk files that it starves off other processes (even Cells’ own subtasks!) attempting to do the same. Unix systems have, these days, excellently designed schedulers for practically everything, and this means that, in general, with a little tweaking, and having lots of RAM to spare, it’s possible to fit so much data in memory that the vast majority of the running applications do not really need to write frequently to disk. In effect, it’s mostly the software RAID manager that bears the burden of figuring out when things need to be written to disk.
However, it competes with Cells for access to the disk, to the point that Cells can effectively block the underlying user-space disk management subsystem from writing to disk. Again, as said, this does not have a dramatic effect overall. Linux is supposed to be able to handle those extreme scenarios, and it does so very effectively. Overall, I think that CPU consumption merely doubled when going from Cells 3 to 4, which could be interpreted as processes being interrupted twice as often as before to wait on I/O, and being taken out of the scheduling queue until Cells finishes whatever it is doing and releases a slice of disk I/O access time to other processes. Since all the others are not really writing a lot to disk, the scheduler is clever enough to let them write as much as they need without disturbing their functioning. Because most of these processes are working from RAM anyway, they manage to serve whatever services they are providing effectively without delays; eventually, though, they might need to write the odd file to disk (e.g. logging something to a file, saving an incoming email message on disk, etc.), and these disk writing events might expect a longer delay than before (but at the scale humans perceive things, it would not be significant). As mentioned, among the other processes the biggest single contender for I/O (especially writing to disk) is the MariaDB database management system: there is a limit to how much it can do in memory safely before it finally has to write something to disk.
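The I/O wait share can be estimated from two snapshots of the aggregate cpu line in /proc/stat, where the fifth value after the cpu label is the time spent in iowait. This is a minimal sketch with a hypothetical function name; note that the figure is system-wide and cannot, by itself, be attributed to Cells.

```shell
# Hypothetical helper: given two "cpu ..." lines from /proc/stat on stdin
# (an earlier and a later snapshot), print the percentage of CPU time
# spent in iowait between the two snapshots.
iowait_percent() {
  LC_ALL=C awk '$1 == "cpu" {
    total = 0
    # Sum user, nice, system, idle, iowait, irq, softirq and steal.
    for (i = 2; i <= 9 && i <= NF; i++) total += $i
    if (seen && total > prev_total)
      printf "%.1f\n", 100 * ($6 - prev_wait) / (total - prev_total)
    prev_wait = $6; prev_total = total; seen = 1
  }'
}

# Usage:
#   { grep '^cpu ' /proc/stat; sleep 10; grep '^cpu ' /proc/stat; } | iowait_percent
```

Taking one reading with Cells running and one with it stopped would show how much of the I/O wait disappears along with it.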
All right. This was just to give you an idea that this ‘constant disk writing’ is not crucial for the overall operation of this system — I can live with it being a teensy bit slower than before — but, nevertheless, something must be wrong with the absurd amount of constant disk writing done by Cells.
Ultimately, I managed to track it down to memory-mapped files.
This was by no means obvious to me, but the ‘new’ flat-file storage system that Cells introduced with 4 essentially works by slicing up the disk space into separate files (allegedly with the .zap extension) which are then managed from within BoltDB (how exactly this is accomplished is still a mystery to me; aye, I know I could read the code and get an answer, but Cells’ codebase is massive…). While the BoltDB database file format is essentially fixed and frozen (as is its API), nothing prevents making subtle changes to the (open-source) code, retaining API compatibility but implementing certain details quite differently; although I suspect that it’s the minio middleware/framework layer that ‘decides’ how buckets are stored (to disk or other locations).
Whatever layer or level of abstraction is responsible for the actual file layout, at some point there are memory-mapped files. A lot of them. Here is how I found out about them.
For convenience’s sake, I use systemd to manage each and every service running on my Ubuntu 22.04.1 LTS server (I’m actually agnostic about the merits and disadvantages of using a specific system manager; the point is that I really want a uniform way of doing such management, instead of having different ‘startup scripts’ scattered all over the filesystem; Unix is already historically very disorganised, almost by design, and I try to keep the chaos at the minimum manageable level…); using systemctl status cells, therefore, shows me the root PID of Cells.
And from there I can list all open files and sockets using lsof -p <Cells PID> (there are many other ways and tools, but I’m a fan of lsof myself, so…). I see that tons of files/sockets are being opened, especially because Cells is really a cluster of micro-services, all connecting to each other via Internet sockets, so that they can be deployed as a cluster on different servers, or containers, thus allowing load to be spread across a cloud. But of special interest (to me) were the memory-mapped files:
cells 3317464 <cells user> mem REG 9,2 14978 173147514 <path to cells>/services/pydio.grpc.log/syslog.bleve.0005/store/000000007782.zap
cells 3317464 <cells user> mem REG 9,2 149215 173147513 <path to cells>/services/pydio.grpc.log/syslog.bleve.0005/store/000000007781.zap
cells 3317464 <cells user> mem-W REG 9,2 65536 173147519 <path to cells>/services/pydio.grpc.log/syslog.bleve.0005/store/root.bolt
cells 3317464 <cells user> mem REG 9,2 14604 173147500 <path to cells>/services/pydio.grpc.log/syslog.bleve.0004/store/0000000078b3.zap
cells 3317464 <cells user> mem REG 9,2 11158 173147499 <path to cells>/services/pydio.grpc.log/syslog.bleve.0004/store/0000000078b2.zap
cells 3317464 <cells user> mem-W REG 9,2 65536 173147502 <path to cells>/services/pydio.grpc.log/syslog.bleve.0004/store/root.bolt
cells 3317464 <cells user> mem REG 9,2 11563 173016463 <path to cells>/services/pydio.grpc.log/syslog.bleve.0003/store/000000006585.zap
cells 3317464 <cells user> mem-W REG 9,2 65536 173016465 <path to cells>/services/pydio.grpc.log/syslog.bleve.0003/store/root.bolt
cells 3317464 <cells user> mem REG 9,2 16620 173016452 <path to cells>/services/pydio.grpc.log/syslog.bleve.0002/store/00000000d3ed.zap
cells 3317464 <cells user> mem REG 9,2 8265 173016451 <path to cells>/services/pydio.grpc.log/syslog.bleve.0002/store/00000000d3ec.zap
cells 3317464 <cells user> mem REG 9,2 11082 173016450 <path to cells>/services/pydio.grpc.log/syslog.bleve.0002/store/00000000d3eb.zap
cells 3317464 <cells user> mem REG 9,2 26789 173016449 <path to cells>/services/pydio.grpc.log/syslog.bleve.0002/store/00000000d3ea.zap
cells 3317464 <cells user> mem REG 9,2 1224448 173016448 <path to cells>/services/pydio.grpc.log/syslog.bleve.0002/store/00000000d3e4.zap
cells 3317464 <cells user> mem-W REG 9,2 65536 173016453 <path to cells>/services/pydio.grpc.log/syslog.bleve.0002/store/root.bolt
cells 3317464 <cells user> mem REG 9,2 12978 173016435 /<path to cells>/services/pydio.grpc.log/syslog.bleve.0001/store/000000008a14.zap
cells 3317464 <cells user> mem REG 9,2 8624 173016434 /<path to cells>/services/pydio.grpc.log/syslog.bleve.0001/store/000000008a13.zap
cells 3317464 <cells user> mem REG 9,2 8613 173016433 <path to cells>/services/pydio.grpc.log/syslog.bleve.0001/store/000000008a12.zap
cells 3317464 <cells user> mem REG 9,2 457424 173016430 <path to cells>/services/pydio.grpc.log/syslog.bleve.0001/store/000000008a0f.zap
[...]
There are about 200 of those; each, in turn, will have an open regular file descriptor — which is the one constantly writing to disk, non-stop, no matter what.
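For anyone wanting to reproduce the listing above, here is a minimal sketch. It assumes the systemd unit is named cells (as on this server) and that lsof prints its usual nine columns, the fourth of which (FD) starts with mem for memory-mapped files. The function names are made up for illustration.

```shell
# Print the main PID of a systemd unit.
# The default unit name "cells" is an assumption; adjust as needed.
unit_main_pid() {
  systemctl show --property MainPID --value "${1:-cells}"
}

# Keep only the memory-mapped entries (FD column starting with "mem",
# which also covers "mem-W") from lsof output read on stdin.
# Any mapped shared libraries would be listed too.
only_mapped() {
  awk '$4 ~ /^mem/'
}

# Usage (as root):
#   lsof -p "$(unit_main_pid cells)" | only_mapped | wc -l
```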
Now, each of those 200 files is not huge in size. They vary a bit in size — from as small as 11KBytes to as large as 11MBytes — and adding the sizes of all of them together is merely 3 GBytes. There is more than enough RAM for that. In other words: it doesn’t make a lot of sense to have the ‘constant writing to disk’ of memory-mapped files if they can all be kept in memory. After all, the whole purpose of memory-mapped files is exactly that: to keep as much of it in memory, act upon it as if it were memory, and just let the underlying operating system deal with eventual reads/writes from disk, if and when it is necessary.
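The count and the total size can be tallied directly from lsof output, where the seventh column (SIZE/OFF) holds the size in bytes for regular files. This is a sketch with a hypothetical function name:

```shell
# Hypothetical helper: count the memory-mapped regular files in lsof output
# read on stdin and add up their sizes (column 7, SIZE/OFF, in bytes).
mapped_total() {
  awk '$4 ~ /^mem/ && $5 == "REG" { n++; bytes += $7 }
       END { printf "%d files, %.0f bytes\n", n, bytes }'
}

# Usage (as root): lsof -p <Cells PID> | mapped_total
```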
As a consequence, Cells basically brings the whole system down, except, well, for Cells itself.
Without Cells running, the system runs at around 12-13% load, and that includes a few dozen low-traffic websites with an associated mail server (with several layers of spam filtering), a Git repository, a video streaming server, etc. In other words: it’s not very busy, but it’s not sitting prettily and idly twirling its digital fingers.
(In fact, most of the actual disk I/O comes from a stupid security plugin for WordPress which insists on constantly writing logs for all requests to the MariaDB database; since all websites really require that specific plugin, this forces the database server to essentially work as a glorified logging system, for which it’s clearly not the best choice; although I’ve contacted the plugin authors and grumbled a lot about it, they replied that ‘having the logs on the database is essential for providing our security services’. Without that plugin, I’d have a load of zero!)
That said, I believe that all the above comes essentially from the ‘new’ way that storage can be defined, i.e. as a ‘flat file’ database, which I do not use (mostly because I require the traditional hierarchical filesystem layout for other services — such as the above-mentioned video streaming service, which needs to access a ‘real’ file, not something virtualised inside an impenetrable container…). Nevertheless, I understand that a few crucial services have already migrated to the ‘new’ flat file format, and that’s why I see all these memory-mapped files, which previously didn’t exist (I think).
I’m pretty sure that these cannot be ‘turned off’, but I wonder if there is any way to mitigate their impact on the disk writes? Why are they constantly writing to disk, after all? And why don’t such writes use some sort of buffering? Surely, on a Cells installation which may have lots of files, but few users accessing them, most of the time, Cells will be inactive, except for the occasional scheduled task?
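On the buffering question: for pages dirtied through a memory mapping, Linux normally decides when to flush them according to the vm.dirty_* sysctls, unless the application forces the matter with msync or fsync. Whether Cells (or Bolt and Bleve underneath) forces such syncs is not established here, so the following sketch only shows how to look at the kernel side. The function name is made up.

```shell
# Hypothetical helper: print the kernel writeback tunables found in a
# directory (default: /proc/sys/vm). Tunables that are missing are skipped.
show_writeback_settings() {
  dir=${1:-/proc/sys/vm}
  for name in dirty_background_ratio dirty_ratio \
              dirty_expire_centisecs dirty_writeback_centisecs; do
    [ -r "$dir/$name" ] && printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
  done
  return 0
}

# Usage: show_writeback_settings
```

If the writes continue at the same pace regardless of these values, that would suggest explicit syncs from the application rather than ordinary kernel writeback.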
I also understand that the memory-mapped files are at the very bottom layer of the overall system: there is Minio to manage buckets and export them via an S3 API; there is Bleve to handle full-text indexing; and some of those services will ultimately use a Bolt database to manage everything. Decoupling those systems from each other is probably impossible. However, there might be a flag or configuration option at some level where it might be possible to tweak a setting or two…
That’s what I’m hoping for!
In the meantime, I have no other option but to keep Cells asleep.