[solved] File edit/delete/save (HTTP PUT) timeout, RPC 500 error

I am encountering a strange issue once my Cells instance has been running for a while. At some random point, all edit actions (delete, rename, edit, etc.) stop working. The HTTP API returns a timeout (visible in Chrome DevTools) and the logs show an odd jobs-related error.

If I restart, it works fine again for an indeterminate number of hours :confused:

I have no idea how to proceed. This just started happening at random.

Cells v4.1.0. Home
Revision: d7276aacaea3f7c35ece7c893f4098b89bdc5d90

2023-02-14T20:05:52.378Z ERROR pydio.rest.jobs Rest Error 500 {"error": "rpc error: code = Canceled desc = latest balancer error: service is not known by the registry", "SpanUuid": "cf027ed9-35d0-430d-ab1d-eb7869cf3215", "RemoteAddress": "xxx.xxx.xxx.xxx", "UserAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0", "ContentType": "application/json", "HttpProtocol": "HTTP/2.0", "UserName": "flexusma", "UserUuid": "a2bb80d9-9803-4ec9-97e6-d00f451cb087", "GroupPath": "/m4fx/", "Profile": "admin", "Roles": "ROOT_GROUP,d23a93b0-fb4e-43f1-8315-48f328faa8c7,ADMINS,a2bb80d9-9803-4ec9-97e6-d00f451cb087"}

I've just found another similar-looking error log, but from a different logger (pydio.rest.tree):

GroupPath : /m4fx/
HttpProtocol : HTTP/2.0
JsonZaps : {"ContentType":"application/json"}
Level : error
Logger : pydio.rest.tree
Msg : Rest Error 500 - rpc error: code = Canceled desc = latest balancer error: service is not known by the registry
RemoteAddress : xx.xx.xx.xx
SpanUuid : ed221a83-98d5-4e0a-bd56-6e6ed5612db6
Ts : 1676658235
UserAgent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/110.0
UserName : flexusma
UserUuid : a2bb80d9-9803-4ec9-97e6-d00f451cb087

I think I found the cause of this. I had a large workspace (~250 GB) with large video files for which I had previously turned indexing off. It seems an indexing job was still running when the VM was restarted and something broke. I have removed the datasource and its workspaces, but the scheduler still seems to be crashing on an unfinished job or something similar. I don't know if there is a way to flush the jobs queue.

Clearing the job history/cache seems to have solved the weird issue. Jobs are running fine again:
mv /var/cells/services/pydio.grpc.jobs/tasklogs.bleve/ /var/cells/services/pydio.grpc.jobs/tasklogs.bleve.bck
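For anyone hitting the same thing, here is that fix as a small sketch script. It assumes the default working directory /var/cells (the CELLS_DIR override is my addition, not something from Cells itself) and that you stop Cells before touching the index; based on the behaviour above, Cells appears to recreate tasklogs.bleve on the next start.

```shell
#!/bin/sh
# Sketch: clear the scheduler's task-log index so Cells rebuilds it on restart.
# Assumptions: default working dir /var/cells (override via CELLS_DIR), and that
# Cells has been stopped first (e.g. `systemctl stop cells`, or however you run it).
CELLS_DIR="${CELLS_DIR:-/var/cells}"
BLEVE="$CELLS_DIR/services/pydio.grpc.jobs/tasklogs.bleve"

if [ -d "$BLEVE" ]; then
  # Move the index aside with a dated suffix instead of deleting it,
  # so it can be restored if clearing it does not help.
  mv "$BLEVE" "$BLEVE.$(date +%Y%m%d).bck"
fi
# Start Cells again afterwards; the index is recreated automatically.
```

Moving rather than deleting keeps a rollback path: if the scheduler still misbehaves, stop Cells and move the .bck directory back into place.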