Files cannot be deleted

Some of my files cannot be deleted.

2020-03-05T14:25:58.238Z	ERROR	pydio.grpc.data.sync.ovh	Compute Etag Copy	{"error": "We encountered an internal error, please try again."}
2020-03-05T14:25:58.241Z	INFO	pydio.grpc.data.sync.ovh	github.com/pydio/cells/common/sync/endpoints/s3.(*Client).s3forceComputeEtag
2020-03-05T14:25:58.242Z	INFO	pydio.grpc.data.sync.ovh		/opt/teamcity/agent/work/fb9e7e7133d45375/go/src/github.com/pydio/cells/common/sync/endpoints/s3/client-s3.go:350
2020-03-05T14:25:58.242Z	INFO	pydio.grpc.data.sync.ovh	github.com/pydio/cells/common/sync/endpoints/s3.(*Client).ComputeChecksum
2020-03-05T14:25:58.242Z	INFO	pydio.grpc.data.sync.ovh		/opt/teamcity/agent/work/fb9e7e7133d45375/go/src/github.com/pydio/cells/common/sync/endpoints/s3/client-s3.go:222
2020-03-05T14:25:58.250Z	INFO	pydio.grpc.data.sync.ovh	github.com/pydio/cells/common/sync/merger.TreeNodeFromSource.func1.2
2020-03-05T14:25:58.250Z	INFO	pydio.grpc.data.sync.ovh		/opt/teamcity/agent/work/fb9e7e7133d45375/go/src/github.com/pydio/cells/common/sync/merger/tree-node.go:122
2020-03-05T14:25:58.250Z	INFO	pydio.grpc.data.sync.ovh	2020-03-05T14:25:58.238Z	ERROR	Cannot compute checksum for TEST FOO/bar.MP4	{"error": "We encountered an internal error, please try again."}
2020-03-05T14:25:58.251Z	INFO	pydio.grpc.data.sync.ovh	github.com/pydio/cells/common/sync/merger.TreeNodeFromSource.func1.2
2020-03-05T14:25:58.251Z	INFO	pydio.grpc.data.sync.ovh		/opt/teamcity/agent/work/fb9e7e7133d45375/go/src/github.com/pydio/cells/common/sync/merger/tree-node.go:123
2020-03-05T14:25:58.532Z	DEBUG	pydio.grpc.tasks	Force close session now:0c4506ee-91da-4e63-b557-9bbad83eb9f5	{"OperationUuid": "copy-move-599351af-fe5d-4599-99af-89d7458e0d3b-8a8f9da3"}
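
(For context: the Compute Etag Copy wording suggests the checksum step forces the backend to recompute an ETag via a server-side copy of the object onto itself, and it is that copy which fails. Below is a generic illustration of that pattern with the AWS Go SDK; this is my reading of the logs, not Pydio's actual code, and the bucket and key are placeholders.)

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	// Copying an object onto itself with MetadataDirective=REPLACE forces the
	// backend to rewrite it and hand back a fresh ETag; if the storage rejects
	// this CopyObject, the checksum computation fails with it.
	_, err := svc.CopyObject(&s3.CopyObjectInput{
		Bucket:            aws.String("pydio"),
		Key:               aws.String("some/key"),
		CopySource:        aws.String("pydio/some/key"), // must be URL-encoded if the key contains spaces
		MetadataDirective: aws.String("REPLACE"),
	})
	if err != nil {
		log.Fatal(err)
	}
}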

Buckets state:

$ openstack object list pydio
| .pydio
| TEST FOO/.pydio
| TEST FOO/bar.MP4
| recycle_bin/.pydio
$ openstack object list pydio+segments
| TEST FOO/bar.MP4/Y2ExN2FjMDItMjQzZC00ZjY1LTliMDktNWFjNTZmNGI5NjUw/1
| ...
| TEST FOO/bar.MP4/Y2ExN2FjMDItMjQzZC00ZjY1LTliMDktNWFjNTZmNGI5NjUw/45

Possibly related?

Hi @drzraf,

Could you take a look at the other thread (the one that you suggested)? It is indeed related.

By the way, I can see a compute etag error; could you tell me whether the files you are using are empty or contain data (and whether it is the same data)?

Files are not empty (and span multiple segments); some are a couple of gigabytes.

Do you know more about how encryption relates to files not being deletable?
(I’d like to understand, in order to avoid this situation happening again.)
Could it be a race condition of some kind?

I started fresh and reset the storage: I deleted the backend S3 bucket and recreated it, encrypted from the start.

We started uploading again, but the same problem manifests itself: we cannot remove uploaded files/directories (which makes the whole thing rather pointless!).

The removed directory and its files are expected to end up inside the “Recycle Bin” after the first attempt, but they are not: only the (empty) Foo Test directory is created there.

On any subsequent attempt I can see (in tasks.log):

{"level":"error","ts":"2020-03-16T12:20:27Z","logger":"pydio.grpc.tasks","msg":"Error while running action actions.tree.copymove","LogType":"tasks","SpanRootUuid":"5f9602e5-6780-11ea-9ff2-fa163ee72a12","SpanParentUuid":"5f9602e5-6780-11ea-9ff2-fa163ee72a12","SpanUuid":"5fa84913-6780-11ea-bac6-fa163ee72a12","OperationUuid":"copy-move-e47b9517-674f-4e5d-96a9-f9484a1aaada-0e67749b","error":"We encountered an internal error, please try again."}

A couple of other messages appear frequently in this situation (pydio.log):

{"level":"info","ts":"2020-03-16T12:21:56Z","logger":"pydio.grpc.data.sync.ovh","msg":"{\"level\":\"error\",\"ts\":\"2020-03-16T12:21:56Z\",\"msg\":\"Cannot compute checksum for Foo Test/BAR.MP4\",\"error\":\"We encountered an internal error, please try again.\"}"}

{"level":"error","ts":"2020-03-16T12:22:23Z","logger":"pydio.grpc.data.sync.ovh","msg":"Error while deleting file","path":"recycle_bin/Foo Test/.pydio","target":"index://ovh","error":"{\"id\":\"pydio.grpc.data.index.ovh\",\"code\":404,\"detail\":\"Could not compute path /recycle_bin/Foo Test/.pydio (Cache:GetNodeByPath [recycle_bin,Foo Test,.pydio] : potentialNodes not reduced to 1 value (potential:2, newPotential:0)\",\"status\":\"Not Found\"}"}

The storage backend contains:

.pydio
Foo Test/BAR.MP4              # <---- still here
recycle_bin/.pydio
recycle_bin/Foo Test/.pydio

and the corresponding +segments bucket contains the regular segments:

Foo Test/BAR.MP4/xxxx/{0..125}

What’s happening??

Hi, did you upgrade to the latest version?
There were issues with the S3 API not supporting CopyObject for files > 5 GB.
On the other hand, you seem to be using an S3-compatible storage; can you check that it supports the CopyObjectMultipart API?
-c
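
(For reference: server-side copies of objects larger than 5 GB cannot go through a single CopyObject call; they need a multipart copy, i.e. CreateMultipartUpload, then UploadPartCopy over byte ranges, then CompleteMultipartUpload. A rough sketch of the call sequence the storage has to support, written against the AWS Go SDK purely as an illustration; the bucket/key names and the 64 MiB part size are arbitrary.)

package s3copy

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// multipartCopy copies src to dst server-side in 64 MiB parts, which is the
// only way S3-compatible backends allow copying objects larger than 5 GB.
func multipartCopy(svc *s3.S3, bucket, src, dst string, size int64) error {
	const partSize = int64(64 << 20)

	mpu, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(dst),
	})
	if err != nil {
		return err
	}

	var parts []*s3.CompletedPart
	for off, num := int64(0), int64(1); off < size; off, num = off+partSize, num+1 {
		end := off + partSize - 1
		if end >= size {
			end = size - 1
		}
		// Each part is copied entirely on the server from a byte range of the source.
		res, err := svc.UploadPartCopy(&s3.UploadPartCopyInput{
			Bucket:          aws.String(bucket),
			Key:             aws.String(dst),
			UploadId:        mpu.UploadId,
			PartNumber:      aws.Int64(num),
			CopySource:      aws.String(bucket + "/" + src), // URL-encode if the key has special characters
			CopySourceRange: aws.String(fmt.Sprintf("bytes=%d-%d", off, end)),
		})
		if err != nil {
			return err
		}
		parts = append(parts, &s3.CompletedPart{ETag: res.CopyPartResult.ETag, PartNumber: aws.Int64(num)})
	}

	_, err = svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
		Bucket:          aws.String(bucket),
		Key:             aws.String(dst),
		UploadId:        mpu.UploadId,
		MultipartUpload: &s3.CompletedMultipartUpload{Parts: parts},
	})
	return err
}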

Tried 2.0.4, but same problem.

One additional logline I just noticed is this one:

{"level":"info","ts":"2020-03-16T19:26:16Z","logger":"pydio.grpc.data.sync.ovh","msg":"{\"level\":\"info\",\"ts\":\"2020-03-16T19:26:16Z\",\"msg\":\"Got errors on datasource, should resync now branch: /ovh/baz.dmg\"}"}

(I don’t know whether it’s related or important.)

Needless to say, this is a blocking usability issue. How can I help debug it further?
Is there any S3 traffic / advanced logging capability available?

May I hope to receive some kind of help or hints in order to tackle this fairly blocking issue? :slight_smile:

Another (empty) directory that cannot be deleted (using 2.0.5):

pydio.log:

{"level":"info","ts":"2020-04-17T18:11:50Z","logger":"pydio.rest.tree","msg":"Deletion: moving [common-files/Foo Bar/baz] to recycle bin","SpanUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","RemoteAddress":"x.x.x.x","UserAgent":"cells-client/2.0.2","ContentType":"application/json","HttpProtocol":"HTTP/1.1","UserName":"admin","UserUuid":"xxxx","GroupPath":"/","Profile":"admin","Roles":"ROOT_GROUP,ADMINS,ADMINS,894e22cd-5b92-4832-bf08-c3fc60ad38e7","RecycleRoot":{"Uuid":"DATASOURCE:ovh","Path":"ovh/","MTime":"2020-02-25T22:23:36Z","MetaStore":{"name":"\"\"","pydio:meta-data-source-path":"\"\""}}}
{"level":"info","ts":"2020-04-17T18:11:51Z","logger":"pydio.rest.tree","msg":"Recycle bin created before launching move task","SpanUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","RemoteAddress":"x.x.x.x","UserAgent":"cells-client/2.0.2","ContentType":"application/json","HttpProtocol":"HTTP/1.1","UserName":"admin","UserUuid":"xxxx","GroupPath":"/","Profile":"admin","Roles":"ROOT_GROUP,ADMINS,ADMINS,894e22cd-5b92-4832-bf08-c3fc60ad38e7","NodePath":"ovh/recycle_bin"}
{"level":"info","ts":"2020-04-17T18:11:51Z","logger":"pydio.grpc.tasks","msg":"Run Job copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86 on timer event <nil>","SpanRootUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanParentUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanUuid":"e98a8ceb-80d6-11ea-ab49-fa163ee72a12"}
{"level":"info","ts":"2020-04-17T18:11:51Z","logger":"pydio.grpc.tasks","msg":"Setting Lock on Node with session 404aada5-9dc9-4097-b5a5-47d9e7cb1552","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa","SpanRootUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanParentUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanUuid":"e98a8ceb-80d6-11ea-ab49-fa163ee72a12","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa"}
{"level":"error","ts":"2020-04-17T18:11:51Z","logger":"pydio.grpc.acl","msg":"Filtered Error","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa","SpanRootUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanParentUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanUuid":"e996e3b3-80d6-11ea-94e9-fa163ee72a12","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa","UserName":"admin","UserUuid":"xxxx","GroupPath":"/","Profile":"admin","Roles":"ROOT_GROUP,ADMINS,ADMINS,894e22cd-5b92-4832-bf08-c3fc60ad38e7","error":"Error 1062: Duplicate entry '23-lock--1--1' for key 'acls_u1'"}
{"level":"warn","ts":"2020-04-17T18:11:51Z","logger":"pydio.grpc.tasks","msg":"Could not init lockSession","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa","SpanRootUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanParentUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanUuid":"e98a8ceb-80d6-11ea-ab49-fa163ee72a12","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa","error":"{\"id\":\"dao.error\",\"code\":500,\"detail\":\"DAO error received\",\"status\":\"Internal Server Error\"}"}
{"level":"info","ts":"2020-04-17T18:11:52Z","logger":"pydio.grpc.data.sync.ovh","msg":"Filtering TreePatch took","time":0.00000586,"source":"s3://y.y.y.y:9001/pydio","target":"index://ovh"}

tasks.log

{"level":"info","ts":"2020-04-17T18:11:53Z","logger":"pydio.grpc.tasks","msg":"There are 2 children to move","LogType":"tasks","SpanRootUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanParentUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanUuid":"e98a8ceb-80d6-11ea-ab49-fa163ee72a12","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa"}
{"level":"error","ts":"2020-04-17T18:13:38Z","logger":"pydio.grpc.tasks","msg":"Error while running action actions.tree.copymove","LogType":"tasks","SpanRootUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanParentUuid":"e88adb63-80d6-11ea-94e9-fa163ee72a12","SpanUuid":"e98a8ceb-80d6-11ea-ab49-fa163ee72a12","OperationUuid":"copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86-b38fcdaa","error":"We encountered an internal error, please try again."}
{"level":"info","ts":"2020-04-17T18:15:23Z","logger":"pydio.grpc.tasks","msg":"Deleted 0 expired revoked tokens","LogType":"tasks","SpanUuid":"67b9eed2-80d7-11ea-ab49-fa163ee72a12","OperationUuid":"actions.auth.prune.tokens-d39020c1"}
{"level":"info","ts":"2020-04-17T18:15:23Z","logger":"pydio.grpc.tasks","msg":"Auth.PruneJob.ResetToken","LogType":"tasks","SpanUuid":"67b9eed2-80d7-11ea-ab49-fa163ee72a12","OperationUuid":"actions.auth.prune.tokens-d39020c1"}
{"level":"info","ts":"2020-04-17T18:15:23Z","logger":"pydio.grpc.mailer","msg":"Successfully sent 0 messages","LogType":"tasks","SpanUuid":"67ce7319-80d7-11ea-94e9-fa163ee72a12","OperationUuid":"flush-mailer-queue-435be272"}

and cells-client, which says:

2020/04/17 15:13:39 could not monitor job, copy-move-37d2ec47-6cc8-44ff-92d8-924b98631f86
Nodes have been moved to the Recycle Bin

…but the directory is still there. Wtf? Can’t we simply “force-delete” it somehow to get back to a clean state?

> select * from idm_acls WHERE node_id != '-1';
+-----+----------------+--------------------------------------+---------+---------+--------------+---------------------+---------------------+
| id  | action_name    | action_value                         | role_id | node_id | workspace_id | created_at          | expires_at          |
+-----+----------------+--------------------------------------+---------+---------+--------------+---------------------+---------------------+
|   1 | read           | 1                                    |       1 |       1 |            1 | 2020-03-30 17:17:33 | NULL                |
|   2 | write          | 1                                    |       1 |       1 |            1 | 2020-03-30 17:17:33 | NULL                |
|   4 | read           | 1                                    |       4 |       3 |            4 | 2020-03-30 17:17:33 | NULL                |
|   5 | write          | 1                                    |       4 |       3 |            4 | 2020-03-30 17:17:33 | NULL                |
|   6 | deny           | 1                                    |       6 |       1 |            1 | 2020-03-30 17:17:33 | NULL                |
|  24 | recycle_root   | 1                                    |      -1 |      14 |           -1 | 2020-03-30 17:17:33 | NULL                |
|  25 | workspace-path | ovh                                  |      -1 |      15 |           20 | 2020-03-30 17:17:33 | NULL                |
|  26 | recycle_root   | 1                                    |      -1 |      15 |           20 | 2020-03-30 17:17:33 | NULL                |
|  27 | read           | 1                                    |       1 |      15 |           20 | 2020-03-30 17:17:33 | NULL                |
|  28 | write          | 1                                    |       1 |      15 |           20 | 2020-03-30 17:17:33 | NULL                |
|  47 | lock           | 8131a6e8-fd8a-4750-8eb4-23554dca6121 |      -1 |      22 |           -1 | 2020-03-30 17:31:16 | 2020-03-30 17:31:18 |
|  48 | lock           | af721d20-f548-4298-94b1-527564013b0e |      -1 |      23 |           -1 | 2020-03-30 17:31:51 | 2020-03-30 17:31:53 |
|  86 | read           | 1                                    |      71 |      15 |           20 | 2020-04-14 17:10:37 | NULL                |
| 101 | read           | 1                                    |      86 |      15 |           20 | 2020-04-14 17:12:42 | NULL                |
| 102 | write          | 1                                    |      86 |      15 |           20 | 2020-04-14 17:12:42 | NULL                |
+-----+----------------+--------------------------------------+---------+---------+--------------+---------------------+---------------------+

There seems to be at least a first bug here: an expired lock is not being removed, which creates a MySQL duplicate-key error.

That’s interesting.
This lock mechanism is new in 2.0.5; it does look like the undeleted lock creates an issue the next time you try to delete, indeed.
Can you manually clear that lock from the DB and try to delete again?
-c
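
(For anyone else hitting this: “clearing that lock” boils down to removing only the expired lock rows from idm_acls and leaving every other ACL alone. A hedged sketch; the DSN and credentials are placeholders, and take a database backup first.)

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN: point it at the Cells database.
	db, err := sql.Open("mysql", "cells:secret@tcp(127.0.0.1:3306)/cells")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Only remove 'lock' ACL entries whose expiry is already in the past;
	// everything else in idm_acls stays untouched.
	res, err := db.Exec(`DELETE FROM idm_acls WHERE action_name = 'lock' AND expires_at < NOW()`)
	if err != nil {
		log.Fatal(err)
	}
	n, _ := res.RowsAffected()
	log.Printf("removed %d expired lock entries", n)
}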

Good to know that manually deleting an apparently expired lock from the database is a more or less “safe” operation.
Now that I have deleted the lock tuple, I tried deleting this apparently empty directory again. This time there was no error, but I think I am hitting the original problem that led to the stale lock:

First strange thing:
{"level":"info","ts":"2020-04-20T14:08:43Z","logger":"pydio.grpc.tasks","msg":"There are 2 children to move","OperationUuid":"copy-move-8f263fcb-c965-411b-9ba0-85dca39a6a60-1832eb3e","SpanRootUuid":"701b96c5-8310-11ea-94e9-fa163ee72a12","SpanParentUuid":"701b96c5-8310-11ea-94e9-fa163ee72a12","SpanUuid":"70321bd7-8310-11ea-ab49-fa163ee72a12","OperationUuid":"copy-move-8f263fcb-c965-411b-9ba0-85dca39a6a60-1832eb3e"}

Then, a minute or so later, several errors like these are thrown:

{"level":"error","ts":"2020-04-20T14:10:13Z","logger":"pydio.grpc.tasks","msg":"-- Copy ERROR","OperationUuid":"copy-move-8f263fcb-c965-411b-9ba0-85dca39a6a60-1832eb3e","SpanRootUuid":"701b96c5-8310-11ea-94e9-fa163ee72a12","SpanParentUuid":"701b96c5-8310-11ea-94e9-fa163ee72a12","SpanUuid":"70321bd7-8310-11ea-ab49-fa163ee72a12","OperationUuid":"copy-move-8f263fcb-c965-411b-9ba0-85dca39a6a60-1832eb3e","error":"We encountered an internal error, please try again.","from":"ovh/Foo Test/zczx/._test.zip","to":"ovh/recycle_bin/zczx/._test.zip"}
[...]
{"level":"info","ts":"2020-04-20T14:10:18Z","msg":"Got errors on datasource, should resync now branch: /ovh/zczx/.test.zip"}
{"level":"info","ts":"2020-04-20T14:10:18Z","logger":"pydio.grpc.data.sync.ovh","msg":"{\"level\":\"info\",\"ts\":\"2020-04-20T14:10:18Z\",\"msg\":\"Got errors on datasource, should resync now branch: /ovh/zczx/._test.zip\"}"}
[...]
{"level":"error","ts":"2020-04-20T14:10:19Z","msg":"Cannot compute checksum for Foo Test/zczx/._test.zip","error":"not.implemented"}
{"level":"error","ts":"2020-04-20T14:10:19Z","msg":"Cannot compute checksum for ._.DS_Store","error":"not.implemented"}

So the stale lock on the directory came from not being able to delete the directory, and the inability to delete the directory came from an inconsistent state.

What does this inconsistent state look like?
Datasource state (S3):
$ openstack object list --prefix "Foo Test/zczx" --long pydio

+--------------------------+-------+----------------------------------+--------------------------+----------------------------+
| Name                     | Bytes | Hash                             | Content Type             | Last Modified              |
+--------------------------+-------+----------------------------------+--------------------------+----------------------------+
| Foo Test/zczx/._test.zip |  4138 | 355fc13876a897b59baff9d22afff444 | application/octet-stream | 2020-03-17T16:40:34.300170 |
| Foo Test/zczx/.pydio     |    36 | c0ef86d019589b95e9ec697b71048aa7 | text/plain               | 2020-03-30T17:17:43.712180 |
+--------------------------+-------+----------------------------------+--------------------------+----------------------------+

Pydio UI state:

  • Foo Test/zczx seems empty when browsing it or using cec ls.

Pydio DB state:

> select * from data_index_ovh_idx_tree WHERE name = '._test.zip'\G
*************************** 1. row ***************************
  uuid: 05b3014c-cb59-4c29-97dc-a0951066e252
 level: 4
  hash: 1172416304
  name: ._test.zip
  leaf: 1
 mtime: 1584463234
  etag: a03ae32277d489a9581ab79cb0a2f28f-1
  size: 4138
  mode: 511
mpath1: 1.2.1.1
mpath2: 
mpath3: 
mpath4: 
   rat:    

> select * from data_meta where data like '%test.zip%'\G
*************************** 1. row ***************************
  node_id: 05b3014c-cb59-4c29-97dc-a0951066e252
namespace: name
   author: 
timestamp: 1585588859
     data: "._test.zip"
   format: json
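
(Side note: to double-check what the index itself believes lives under that folder, the subtree can be listed by materialized path. A rough sketch against the schema shown above; the parent mpath 1.2.1 is an assumption deduced from the child’s 1.2.1.1, and the mpath2..mpath4 overflow columns are ignored.)

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Placeholder DSN: point it at the Cells database.
	db, err := sql.Open("mysql", "cells:secret@tcp(127.0.0.1:3306)/cells")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// List the assumed folder node (mpath 1.2.1) and everything indexed below it.
	rows, err := db.Query(
		`SELECT uuid, name, leaf, mpath1 FROM data_index_ovh_idx_tree
		 WHERE mpath1 = ? OR mpath1 LIKE ?`, "1.2.1", "1.2.1.%")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var uuid, name, mpath string
		var leaf int
		if err := rows.Scan(&uuid, &name, &leaf, &mpath); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%-12s leaf=%d %s (%s)\n", mpath, leaf, name, uuid)
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
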
  • Does this sound like a consistent state?
  • If yes, why does the file not appear in the UI/CLI listings? Is it only because it’s a dot-file?
  • And if the state is consistent, why does Pydio fail to delete the directory?
  • Would resyncing the datasource do more good than harm?

Hey

  • At first sight it seems consistent indeed, but the move errors show it may not be…
  • We hide dot-files by default, yes.
  • Why the move fails: that’s the question.
  • Resyncing could be a good test.

Give us some time to try to reproduce all this, thanks for such a detailed report.
-c

“We encountered an internal error, please try again”… nothing should be so internal that it does not even appear at debug log level :slight_smile: (I got bitten by that class of errors when trying WebDAV too).

I will wait for your follow-ups. In the meantime, let me know if I can help debug this further.

hello @drzraf

my first feedback: I could reproduce the lock issue, but although errors are displayed, they are ignored on purpose, as we did not want the locking mechanism to be blocking.
=> it has to be fixed, but it does NOT actually prevent the move from being performed.

So your real issue is why the heck the move is not working :slight_smile:

So that would be the Got errors on datasource, should resync now branch: /ovh/zczx/.test.zip message.

But I’m not sure how to dig further, because I have two preliminary blockers:

  • I have no way to actually check whether a file is present or not (as far as Pydio is concerned), because neither the web UI nor the cec client shows dotfiles (which is painful, because I’m getting more and more undeletable .DS_Store files from Mac clients). So, in this specific case, I’m left with:
    1. the SQL database
    2. the [encrypted] files within the object-storage datasource (which are exactly what is under investigation).
  • I really miss a log file dedicated to datasource transactions, where every S3 request’s payload/response would be logged. That would be a huge, but valuable, source of information for cases like this. (Minio provides this feature; a client-side sketch of the idea follows below.)
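
(The client-side workaround mentioned above: any Go S3 client that accepts a custom *http.Client can be given a logging transport that captures every S3 request/response on the wire. A generic sketch, not an existing Pydio feature.)

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
)

// dumpTransport logs every HTTP request/response passing through it, giving
// S3 wire-level visibility when the storage side offers no such log.
type dumpTransport struct{ next http.RoundTripper }

func (t dumpTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if b, err := httputil.DumpRequestOut(req, false); err == nil {
		log.Printf(">>> %s", b)
	}
	resp, err := t.next.RoundTrip(req)
	if resp != nil {
		if b, e := httputil.DumpResponse(resp, false); e == nil {
			log.Printf("<<< %s", b)
		}
	}
	return resp, err
}

func main() {
	// Hand this client to whatever S3 SDK is in use; the name is illustrative.
	debugClient := &http.Client{Transport: dumpTransport{next: http.DefaultTransport}}
	_ = debugClient
}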

The Got errors on datasource message was triggered at 2020-04-20T14:10:18Z.

Given the 5-second delay, here is what happened just before, at 2020-04-20T14:10:13Z (copy-move-8f263fcb-c965-411b-9ba0-85dca39a6a60-1832eb3e, pydio.grpc.tasks):

All of them carry the same error (We encountered an internal error, please try again.) with the following messages:

  • HandlerExec: Error on CopyObject
  • -- Copy ERROR
  • Error during copy/move
  • cannot run action actions.tree.copymove: We encountered an internal error, please try again.

Given the error message’s origin,

the error itself definitely belongs to this block:

For reference: https://github.com/minio/minio-go/blob/master/api-put-object-copy.go

Just a side question:

Clients = Cec or cells-sync?

Sorry, I should have said “Mac users” (using the web UI or WebDAV, I think).