MySQL connection: "closing bad idle connection"

Hi,

I have a very minor issue which has been baffling me for a while now. I'm currently running Pydio Cells Home Edition 4.2.1-rc1 with MariaDB 10.6.12 on Ubuntu 22.04.2 LTS, but the issue has been present at least as far back as October 2022; I just haven't been able to follow up on it until now.

Briefly, all my storages show the following errors in the log:

(some logs were partially redacted for privacy reasons)

{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.pydiods1","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.pydiods1","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.weirdworkspaces","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.weirdworkspaces","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.versions","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.thumbnails","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.thumbnails","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.maindatastorage","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.maindatastorage","msg":"[mysql] 2023/05/20 12:19:09 connection.go:173: driver: bad connection"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.REDACTED","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.REDACTED","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.REDACTED","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.cellsdata","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.cellsdata","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.personal","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.personal","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.versions","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:09+01:00","logger":"pydio.grpc.data.sync.versions","msg":"[mysql] 2023/05/20 12:19:09 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:10+01:00","logger":"pydio.grpc.data.sync.REDACTED","msg":"[mysql] 2023/05/20 12:19:10 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:10+01:00","logger":"pydio.grpc.data.sync.REDACTED","msg":"[mysql] 2023/05/20 12:19:10 connection.go:173: driver: bad connection"}
{"level":"error","ts":"2023-05-20T12:19:10+01:00","logger":"pydio.grpc.data.sync.REDACTED","msg":"[mysql] 2023/05/20 12:19:10 packets.go:123: closing bad idle connection: EOF"}
{"level":"error","ts":"2023-05-20T12:19:10+01:00","logger":"pydio.grpc.data.sync.REDACTED","msg":"[mysql] 2023/05/20 12:19:10 connection.go:173: driver: bad connection"}

… etc. etc. …

And, for roughly the same period, these are the MariaDB (i.e. MySQL) logs:

bt_pydio_cells is, of course, the database name I’m using for this particular installation.

2023-05-20 12:03:18 128789 [Warning] Aborted connection 128789 to db: 'bt_pydio_cells' user: 'REDACTED' host: 'localhost' (Got timeout reading communication packets)
2023-05-20 12:08:18 129064 [Warning] Aborted connection 129064 to db: 'bt_pydio_cells' user: 'REDACTED' host: 'localhost' (Got timeout reading communication packets)
2023-05-20 12:13:18 129287 [Warning] Aborted connection 129287 to db: 'bt_pydio_cells' user: 'REDACTED' host: 'localhost' (Got timeout reading communication packets)
2023-05-20 12:18:18 129421 [Warning] Aborted connection 129421 to db: 'bt_pydio_cells' user: 'REDACTED' host: 'localhost' (Got timeout reading communication packets)
2023-05-20 12:22:09 35575 [Warning] Aborted connection 35575 to db: 'bt_pydio_cells' user: 'REDACTED' host: 'localhost' (Got timeout reading communication packets)
2023-05-20 12:22:09 35577 [Warning] Aborted connection 35577 to db: 'bt_pydio_cells' user: 'REDACTED' host: 'localhost' (Got timeout reading communication packets)

To summarise:

  1. There are two errors that come straight from the MySQL driver layer, namely:
    • packets.go:123: closing bad idle connection: EOF
    • connection.go:173: driver: bad connection
  2. The logs show this happening in ‘bursts’, i.e. over a very short amount of time (a minute at most), and very occasionally.
  3. MariaDB shows the corresponding error, i.e. that there were ‘timeout issues’ and that the connection was aborted.
  4. This doesn't seem to have an overall impact on Pydio Cells, at least as far as I can (visually) determine.

It’s just annoying, and I happen to like ‘clean’ logs :slight_smile:

As mentioned above, the error in question does not come from Pydio Cells itself; it is just being bubbled up from the Go library that handles the MySQL driver/connection. Searching for the error message, I didn't find many consistent reports, and those I did find seem to be mostly related to communication timeouts: the default connection lifetime in Go's database/sql is 0, which means 'reuse this connection forever'. As far as I could tell from searching the Pydio Cells code, there is no directive changing that timeout, so I'm assuming it's left at the default (but maybe I didn't search 'deep' enough!.. :wink: ).
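For illustration only (this is not Pydio Cells' actual code, and the DSN is a placeholder), here is a minimal sketch of the pool knobs that Go's database/sql exposes for the go-sql-driver/mysql driver; the driver's own documentation recommends keeping the connection lifetime below the server's wait_timeout precisely to avoid tripping over connections the server has already killed:

// Minimal sketch, NOT Pydio Cells' code: database/sql pool settings for the
// go-sql-driver/mysql driver. With the defaults, ConnMaxLifetime and
// ConnMaxIdleTime are both 0, i.e. idle connections are kept and reused
// forever, which is exactly what collides with a server-side wait_timeout.
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN; substitute real credentials/socket.
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/bt_pydio_cells?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The driver's README recommends retiring connections on the client side
	// before the server does, i.e. ConnMaxLifetime < wait_timeout (180 s in
	// my case), plus sensible caps on open/idle connections.
	db.SetConnMaxLifetime(2 * time.Minute)
	db.SetMaxOpenConns(10)
	db.SetMaxIdleConns(10)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}

With settings like these on the client side, the driver would quietly close and reopen connections itself instead of stumbling over ones MariaDB has already timed out.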

MariaDB, however, has some configured timeouts, which might or might not be relevant to this communication error:

interactive_timeout     = 180
wait_timeout            = 180

This comes from some tuning I've made in the past: idle connections consume some memory (even though they don't consume CPU cycles or bandwidth; they're just idle, after all), and there is a limit to how many connections MySQL/MariaDB can handle at a time. That limit is a configurable parameter; I have max_connections set at 512 (I've got plenty of memory!) but have hardly ever gone above 30% of that. A good practice is to terminate idle connections after a while to preserve resources; the same applies to all sorts of servers/services, not only database connections.
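In case anyone wants to double-check the same things on their own setup, here is a small, hypothetical diagnostic snippet (again, not part of Pydio Cells; DSN and credentials are placeholders) that dumps those server-side limits along with the counters MariaDB bumps when it aborts connections:

// Hypothetical diagnostic helper, not Pydio Cells code: read the server-side
// timeouts/limits and the abort counters, so they can be correlated with the
// "closing bad idle connection" bursts in the Cells logs.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/bt_pydio_cells")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	show := func(query string) {
		rows, err := db.Query(query)
		if err != nil {
			log.Fatal(err)
		}
		defer rows.Close()
		for rows.Next() {
			var name, value string
			if err := rows.Scan(&name, &value); err != nil {
				log.Fatal(err)
			}
			fmt.Printf("%-25s %s\n", name, value)
		}
	}

	// wait_timeout / interactive_timeout = 180 and max_connections = 512 in my case.
	show("SHOW GLOBAL VARIABLES WHERE Variable_name IN ('wait_timeout', 'interactive_timeout', 'max_connections')")
	// Aborted_clients should grow with every "Got timeout reading communication
	// packets" warning; Threads_connected / Max_used_connections show how close
	// the server ever gets to the limit.
	show("SHOW GLOBAL STATUS WHERE Variable_name IN ('Aborted_clients', 'Threads_connected', 'Max_used_connections')")
}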

Maybe it was just that? As mentioned elsewhere, my Pydio Cells installation has very few users, but it naturally runs all the regular maintenance checks, so I would expect some database connections to be opened regularly.

Well, there is something else that caught my attention: the reference to localhost. A few days ago, for performance reasons, I turned off DNS reverse lookups on MariaDB; after all, it only serves the local host, so it makes no sense to waste resources/memory/threads doing reverse lookups all the time. Disabling them is the recommended setting; I had only turned reverse lookups on a few years ago, when I was experimenting with master/slave replication, and then forgot to turn them back off. As a precaution, because MySQL/MariaDB is so finicky about user permissions (they're tied to host names and/or IP addresses, which are not considered 'the same'), I made sure that the database users for Pydio Cells were allowed to connect both from localhost and from 127.0.0.1, to rule out failed authentication. And, in fact, that doesn't seem to be the problem.
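This is the kind of quick sanity check I mean; just a hypothetical sketch (the user name and credentials are placeholders, and it obviously needs an account that can read the mysql schema):

// Hypothetical sanity check, not Pydio Cells code: confirm that the Cells
// database account exists for both host forms, since 'user'@'localhost' and
// 'user'@'127.0.0.1' are distinct accounts, and disabling reverse lookups
// (skip_name_resolve) changes which one an incoming connection matches.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Needs an account with read access to the mysql schema.
	db, err := sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/mysql")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// "cells_user" is a placeholder for the actual Pydio Cells database user.
	rows, err := db.Query("SELECT user, host FROM mysql.user WHERE user = ?", "cells_user")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var user, host string
		if err := rows.Scan(&user, &host); err != nil {
			log.Fatal(err)
		}
		// Expect one row for 'localhost' and one for '127.0.0.1'.
		fmt.Printf("%s@%s\n", user, host)
	}
}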

My next step was trying to figure out what exactly Pydio Cells is doing when it throws such errors. Since the log timestamps are pretty much random, it's not easy to catch the culprit. Then, by sheer chance, I remembered to check the slow query logs.

Mind you, I'm very generous in what I consider to be a 'slow' query: the commonly recommended threshold is 2 seconds, but I only log queries that take more than 10 seconds, mostly to avoid insanely huge logs coming (mostly) from WordPress plugins… Also, for the kind of things that run on this server, a 10-second budget is more than enough: waiting 2 seconds for a query to finish is already quite reasonable, while taking more than 10 seconds shows that something is wrong somewhere. Almost always it's a query doing lots of JOINs without indexes and/or retrieving datasets with millions of rows because someone forgot to place a LIMIT clause :slight_smile: (or made a typo).

I was not expecting to find anything related to Pydio Cells there but, to my surprise, there was something: after converting the Unix timestamps to sensible, human-readable dates, I found an almost-perfect match with the following entries:

MariaDB slow query logs:
# User@Host: REDACTED[REDACTED] @ localhost []
# Thread_id: 129481  Schema: bt_pydio_cells  QC_hit: No
# Query_time: 0.000297  Lock_time: 0.000029  Rows_sent: 1  Rows_examined: 18
# Rows_affected: 0  Bytes_sent: 0
SET timestamp=1684581549;
SELECT
                        p.id,
                        p.effect,
                        p.conditions,
                        p.description,
                        subject.template AS subject,
                        resource.template AS resource,
                        action.template AS action
                FROM
                        ladon_policy AS p

                        INNER JOIN ladon_policy_subject_rel AS rs ON rs.policy = p.id
                        LEFT JOIN ladon_policy_action_rel AS ra ON ra.policy = p.id
                        LEFT JOIN ladon_policy_resource_rel AS rr ON rr.policy = p.id

                        INNER JOIN ladon_subject AS subject ON rs.subject = subject.id
                        LEFT JOIN ladon_action AS action ON ra.action = action.id
                        LEFT JOIN ladon_resource AS resource ON rr.resource = resource.id
                WHERE
                        (subject.has_regex = 0 AND subject.template = 'user:gwynethllewelyn')
                        OR
                        (subject.has_regex = 1 AND CAST('user:gwynethllewelyn' AS BINARY) REGEXP BINARY subject.compiled);
[...]
# User@Host: REDACTED[REDACTED] @ localhost []
# Thread_id: 129481  Schema: bt_pydio_cells  QC_hit: No
# Query_time: 0.000259  Lock_time: 0.000028  Rows_sent: 1  Rows_examined: 18
# Rows_affected: 0  Bytes_sent: 0
SET timestamp=1684581549;
SELECT
                        p.id,
                        p.effect,
                        p.conditions,
                        p.description,
                        subject.template AS subject,
                        resource.template AS resource,
                        action.template AS action
                FROM
                        ladon_policy AS p

                        INNER JOIN ladon_policy_subject_rel AS rs ON rs.policy = p.id
                        LEFT JOIN ladon_policy_action_rel AS ra ON ra.policy = p.id
                        LEFT JOIN ladon_policy_resource_rel AS rr ON rr.policy = p.id

                        INNER JOIN ladon_subject AS subject ON rs.subject = subject.id
                        LEFT JOIN ladon_action AS action ON ra.action = action.id
                        LEFT JOIN ladon_resource AS resource ON rr.resource = resource.id
                WHERE
                        (subject.has_regex = 0 AND subject.template = 'role:cbb95bbe-abda-44a3-b93e-7e9ce211ad66')
                        OR
                        (subject.has_regex = 1 AND CAST('role:cbb95bbe-abda-44a3-b93e-7e9ce211ad66' AS BINARY) REGEXP BINARY subject.compiled);
# User@Host: REDACTED[REDACTED] @ localhost []
# Thread_id: 129483  Schema: bt_pydio_cells  QC_hit: No
# Query_time: 0.000257  Lock_time: 0.000025  Rows_sent: 1  Rows_examined: 18
# Rows_affected: 0  Bytes_sent: 0
SET timestamp=1684581549;
SELECT
                        p.id,
                        p.effect,
                        p.conditions,
                        p.description,
                        subject.template AS subject,
                        resource.template AS resource,
                        action.template AS action
                FROM
                        ladon_policy AS p

                        INNER JOIN ladon_policy_subject_rel AS rs ON rs.policy = p.id
                        LEFT JOIN ladon_policy_action_rel AS ra ON ra.policy = p.id
                        LEFT JOIN ladon_policy_resource_rel AS rr ON rr.policy = p.id

                        INNER JOIN ladon_subject AS subject ON rs.subject = subject.id
                        LEFT JOIN ladon_action AS action ON ra.action = action.id
                        LEFT JOIN ladon_resource AS resource ON rr.resource = resource.id
                WHERE
                        (subject.has_regex = 0 AND subject.template = 'role:e69a27d7-63b4-4ad0-93ab-0b296ce4b769')
                        OR
                        (subject.has_regex = 1 AND CAST('role:e69a27d7-63b4-4ad0-93ab-0b296ce4b769' AS BINARY) REGEXP BINARY subject.compiled);

(some details redacted for privacy reasons)

Please note that the logs above are just a tiny fraction of the complex queries that were logged as 'slow' within a single second (more precisely: timestamp 1684581549, i.e. Sat May 20 2023 11:19:09 GMT+0000); there are many more in the subsequent seconds/minutes.
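For reference, the timestamp conversion itself is trivial; this little Go sketch is roughly equivalent to what any epoch converter (or date -d @1684581549) does:

// Tiny helper, just for reference: convert the slow-log "SET timestamp=..."
// value to UTC and to the +01:00 offset used in the Cells logs.
package main

import (
	"fmt"
	"time"
)

func main() {
	ts := int64(1684581549)
	fmt.Println(time.Unix(ts, 0).UTC())                              // 2023-05-20 11:19:09 +0000 UTC
	fmt.Println(time.Unix(ts, 0).In(time.FixedZone("+01:00", 3600))) // 2023-05-20 12:19:09 +0100 +01:00
}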

Now, I have no easy way to figure out whether those complex, sextuple JOINs are made regularly (tens of times per second) with most of them finishing in well under 10 seconds and therefore never getting logged. And it's perfectly reasonable to imagine that, at the precise moment these queries were made, the system was very busy with other things; all I can say is that, whatever else the server was doing, the only slow queries logged during that period came from Pydio Cells. That's not saying much. Imagine, for instance, that the system was slow because of a spike in I/O activity (say, a massive spam wave coming in, with each message being processed independently), and MariaDB therefore struggled to get some I/O locks for its own purposes; queries that required at least some I/O (such as temporary tables too large to be held in memory) might then have been delayed for an unacceptable amount of time. In that case it wouldn't be the queries themselves that were slow; it would be the system being slow that made those queries much slower than usual. From the data I've got, and given my lack of experience in analysing these logs, I cannot say.

Nevertheless, I can safely assume that these huge, complex queries have some impact on the overall health of the system; it's just hard to understand exactly how the system is affected. You can also see from the logs that the actual recorded query time is well under a millisecond, so it's a bit baffling why these were recorded as 'slow' at all. One possible explanation (I still have to confirm it in my own configuration) is that log_queries_not_using_indexes is enabled, which makes MariaDB log any query that doesn't use an index regardless of the long_query_time threshold. A bit more digging is necessary to figure out what's actually going on.

Anyway… maybe someone can explain what exactly all these logs mean — and, therefore, help me to figure out what I’m required to change!
