Cells 2.0.9 server keeps crashing

Pydio Cells 2.0.9 on Ubuntu 2018

The server keeps crashing and the browser page just displays “There was a timeout while serving the request…”

The caddy_errors.log file is full of these:

29/Jan/2021:11:37:38 -0500 [ERROR 502 /auth/dex/.well-known/openid-configuration] dial tcp: lookup PENDING: Temporary failure in name resolution

Restarting the server will restore service for a few minutes, then these errors pile up again and it crashes.

Note that upgrading the server isn’t really an option at this time due to authentication changes that were made in 2.1 that aren’t compatible with an integration we have with an internal service. Further, I would be afraid to try applying an update while this issue is ongoing anyways.

Help please!

Hi @scott.bentley, I’m afraid the only option here would be to upgrade. Is your integration that complex to modify? If you describe your changes, we may be able to give you some hints on adapting it to the new code.
The latest release is much more stable, CPU sits around 0.1% on idle processes, and our ongoing work on the services registry may fix these issues.

That said, this error

29/Jan/2021:11:37:38 -0500 [ERROR 502 /auth/dex/.well-known/openid-configuration] dial tcp: lookup PENDING: Temporary failure in name resolution

seems to indicate an internal DNS failure in your machine, or a misconfiguration of your authentication system.
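One way to narrow this down is to pull the hostname out of the error line and ask the local resolver about it directly. The following is a minimal sketch, not part of Cells: the helper names `failed_lookup_host` and `resolves` are hypothetical, and the regular expression only assumes the "lookup <host>: <reason>" shape seen in the log line above.

```python
import re
import socket

# Matches the "lookup <host>" fragment of a Go-style dial error,
# as seen in the caddy_errors.log line quoted above.
LOOKUP_RE = re.compile(r"lookup (?P<host>[^\s:]+)")


def failed_lookup_host(line):
    """Return the hostname a 'dial tcp: lookup ...' error refers to, or None."""
    match = LOOKUP_RE.search(line)
    return match.group("host") if match else None


def resolves(host):
    """Return True if the local resolver yields at least one address for host."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except (socket.gaierror, UnicodeError):
        return False


line = (
    "29/Jan/2021:11:37:38 -0500 "
    "[ERROR 502 /auth/dex/.well-known/openid-configuration] "
    "dial tcp: lookup PENDING: Temporary failure in name resolution"
)
host = failed_lookup_host(line)
print("host named in the error:", host)

if __name__ == "__main__":
    # Sanity check of the resolver itself; compare with resolves(host) on the server.
    print("localhost resolves:", resolves("localhost"))
```

If `resolves(host)` is false on the server while ordinary names resolve fine, the name in the log line itself is the thing to look at, rather than the machine's DNS setup as a whole.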

@charles

Thanks Charles, I appreciate your help. Our integration depends on versions < 2.1 because 2.1 is the version where auth/dex was removed. Our integration has no means of asking the user to log in, so the full OAuth2 workflow, which requires a user login and a callback URL, can’t work. I was told a while back that the roadmap included a re-implementation of a simple dex-style token acquisition, but I have not yet seen any progress on this.

Re-implementation might be possible using the cec command-line tool, but this would require a lot of work, as it is totally different from the web-based REST connector we currently have.

seems to indicate an internal DNS failure in your machine, or a misconfiguration of your authentication system.

I thought the same, but nothing has changed within our internal or external network to cause this. Also, it seems to be triggered by certain user actions, which makes me think a crash is happening because a file access is failing somewhere. I have, on occasion, deleted files directly from the server back end rather than through Cells when I needed to clean up storage space quickly, and I wonder if this might have caused the issue?

There’s also a lot of stuff like this in the log:
01/Feb/2021:11:02:56 -0500 [ERROR 502 /plug/gui.ajax/res/themes/common/images/favicon.png] context canceled
01/Feb/2021:11:02:57 -0500 [ERROR 502 /plug/action.share/res/react-share-form.css] context canceled
01/Feb/2021:11:02:57 -0500 [ERROR 502 /plug/gui.ajax/res/build/pydio.material.min.css] context canceled
01/Feb/2021:11:02:57 -0500 [ERROR 502 /plug/gui.ajax/res/build/PydioHOCs.min.js] context canceled
01/Feb/2021:11:02:57 -0500 [ERROR 502 /plug/gui.ajax/res/themes/common/fonts/roboto-font/roboto.woff2] context canceled
01/Feb/2021:11:02:57 -0500 [ERROR 502 /plug/gui.ajax/res/themes/common/fonts/roboto-font/roboto-medium.woff2] context canceled
01/Feb/2021:11:03:28 -0500 [ERROR 502 /plug/gui.ajax/res/themes/common/images/favicon.png] context canceled
01/Feb/2021:11:08:18 -0500 [ERROR 502 /login] context canceled
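To see which kind of error dominates, and whether the "context canceled" entries pile up around the same time as the name resolution failures, the log can be tallied by message. This is a rough sketch that assumes every line follows the "<timestamp> [ERROR <status> <path>] <message>" shape of the excerpts in this thread; `summarize` is a hypothetical helper name, and lines that do not match are simply skipped.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts in this thread:
#   <date:time> <utc offset> [ERROR <status> <path>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \[ERROR (?P<status>\d{3}) (?P<path>\S+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Count log entries per error message, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


sample = [
    "01/Feb/2021:11:02:56 -0500 [ERROR 502 /plug/gui.ajax/res/themes/common/images/favicon.png] context canceled",
    "01/Feb/2021:11:08:18 -0500 [ERROR 502 /login] context canceled",
    "29/Jan/2021:11:37:38 -0500 [ERROR 502 /auth/dex/.well-known/openid-configuration] "
    "dial tcp: lookup PENDING: Temporary failure in name resolution",
]

counts = summarize(sample)
for message, count in counts.most_common():
    print(count, message)
```

On the real file, the same function can be fed `open("caddy_errors.log")` instead of the sample list.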

AND these are showing up in the tasks.log file:

{"level":"error","ts":"2021-02-01T11:00:45-05:00","logger":"pydio.grpc.data.sync.cellsdata","msg":"Error while deleting file - justin.lau@hhangus.com/2171387/As builts for Contractor/M-04 DETAILED PIPING LAYOUT.dwg - index://cellsdata - ","LogType":"tasks","SpanRootUuid":"9f0d3b77-64a6-11eb-85bf-005056b3f826","SpanParentUuid":"9f0d3b77-64a6-11eb-85bf-005056b3f826","SpanUuid":"9f294430-64a6-11eb-97bd-005056b3f826","OperationUuid":"resync-ds-cellsdata-6216bc6a","error":"{\"id\":\"pydio.grpc.data.index.cellsdata\",\"code\":404,\"detail\":\"Could not compute path /justin.lau@hhangus.com/2171387/As builts for Contractor/M-04 DETAILED PIPING LAYOUT.dwg (Cache:GetNodeByPath [justin.lau@hhangus.com,2171387,As builts for Contractor,M-04 DETAILED PIPING LAYOUT.dwg] : potentialNodes not reduced to 1 value (potential:2, newPotential:0)\",\"status\":\"Not Found\"}"}
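Entries like this are easier to read once the nested "error" field is unpacked. The sketch below assumes that each line of tasks.log on disk is a single valid JSON object and that "error" holds a JSON-encoded string, as the entry above suggests. `describe_task_error` is a hypothetical helper, and the sample entry uses a shortened placeholder path rather than the real one.

```python
import json


def describe_task_error(raw_line):
    """Extract the key fields of a tasks.log entry, unpacking the nested error."""
    entry = json.loads(raw_line)
    nested = entry.get("error", "")
    try:
        detail = json.loads(nested)
    except (TypeError, ValueError):
        detail = None
    if not isinstance(detail, dict):
        # The error field was plain text (or absent), so keep it as-is.
        detail = {"detail": nested}
    return {
        "ts": entry.get("ts"),
        "operation": entry.get("OperationUuid"),
        "service": detail.get("id"),
        "code": detail.get("code"),
        "detail": detail.get("detail"),
    }


# Shortened stand-in for the entry above; the path is a placeholder.
sample = json.dumps({
    "level": "error",
    "ts": "2021-02-01T11:00:45-05:00",
    "logger": "pydio.grpc.data.sync.cellsdata",
    "msg": "Error while deleting file - some/path/file.dwg - index://cellsdata - ",
    "OperationUuid": "resync-ds-cellsdata-6216bc6a",
    "error": json.dumps({
        "id": "pydio.grpc.data.index.cellsdata",
        "code": 404,
        "detail": "Could not compute path /some/path/file.dwg",
        "status": "Not Found",
    }),
})

info = describe_task_error(sample)
print(info["ts"], info["operation"], info["service"], info["code"])
print(info["detail"])
```

Running this over every line with "level":"error" would show whether the failures are all 404s from the index service for the same resync operation.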

Aaaaaannnndd…I just noticed this too:

Hey Scott

I think what you are looking for is the brand new Personal Access Token feature (see Method 2)! Can you check if that suits your needs?

Best

@charles That looks like exactly what I’m looking for!

Any idea about the errors? The server is still crashing frequently. I can try upgrading, but that will require scheduling downtime and some development time to try out the new personal access tokens, which means we’re still stuck with the crashing for the time being. It would be ideal if the underlying problem could be resolved asap.

Edit: just to put this out there, it seems like the crash comes after the file deletion task fails (the log entry for it is in my previous post). It’s the file deletion for “justin.lau@hhangus.com” that keeps failing, and I think the crash happens at that time.
