Deletion fails when using Pydio Cells as S3 Storage Provider

Hi there,

I’ve set up Pydio Cells using the Docker container. I can access and delete files in the web GUI without any issues.

I’m trying to connect Pydio with Directus, a headless CMS that supports S3 storage providers. Internally, Directus uses the JavaScript aws-sdk to connect to S3 providers.

This is the configuration I’m using:

  STORAGE_PYDIO_DRIVER: 's3'
  STORAGE_PYDIO_KEY: "THE PERSONAL ACCESS TOKEN OF THE ADMIN GOES HERE"
  STORAGE_PYDIO_SECRET: "gatewaysecret" -> fixed string
  STORAGE_PYDIO_BUCKET: "io" -> fixed bucket name
  STORAGE_PYDIO_ROOT: "/directus" -> name of the workspace
  STORAGE_PYDIO_ENDPOINT: "http://dms.service.lo:8080" -> URL to access Pydio through Traefik
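
To reproduce Directus’s calls outside the CMS, a plain aws-sdk v2 client can be pointed at the same gateway. The sketch below only shows how the values above could map onto the SDK’s constructor options. pydioS3Options is a hypothetical helper, and s3ForcePathStyle: true is an assumption (bucket in the URL path rather than as a subdomain of dms.service.lo), not something Directus is known to set.

```javascript
// Hypothetical helper: turns the STORAGE_PYDIO_* values into options for
// the aws-sdk v2 S3 constructor. Directus does its own mapping internally;
// this is only meant for a standalone test client.
function pydioS3Options(env) {
  return {
    endpoint: env.STORAGE_PYDIO_ENDPOINT,         // Cells gateway URL
    accessKeyId: env.STORAGE_PYDIO_KEY,           // personal access token
    secretAccessKey: env.STORAGE_PYDIO_SECRET,    // fixed "gatewaysecret"
    params: { Bucket: env.STORAGE_PYDIO_BUCKET }, // default bucket ("io")
    // Assumption: path-style URLs (http://host:8080/io/key), so the
    // bucket name is not prepended to the endpoint host as a subdomain.
    s3ForcePathStyle: true,
    signatureVersion: 'v4',
  };
}

// Usage, with the aws-sdk v2 package installed:
//   const AWS = require('aws-sdk');
//   const s3 = new AWS.S3(pydioS3Options(process.env));
//   const res = await s3.listObjectsV2({ Prefix: 'directus/' }).promise();
```

The bound params option lets later calls omit Bucket; the "directus/" key prefix mirrors the key format visible in the Directus error message.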

Uploading files works perfectly; I just can’t delete files. When I delete a file, I get the following exception from the aws-sdk inside Directus:

  E_UNKNOWN: An unknown error happened with the file directus/8d4e93c6-a2b7-452e-86ce-3218df81847e.

  Error code: InternalError
  Original stack:
  InternalError: We encountered an internal error, please try again.
      at Request.extractError (/directus/node_modules/aws-sdk/lib/services/s3.js:714:35)
      at Request.callListeners (/directus/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
      at Request.emit (/directus/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
      at Request.emit (/directus/node_modules/aws-sdk/lib/request.js:688:14)
      at Request.transition (/directus/node_modules/aws-sdk/lib/request.js:22:10)
      at AcceptorStateMachine.runTo (/directus/node_modules/aws-sdk/lib/state_machine.js:14:12)
      at /directus/node_modules/aws-sdk/lib/state_machine.js:26:10
      at Request.<anonymous> (/directus/node_modules/aws-sdk/lib/request.js:38:9)
      at Request.<anonymous> (/directus/node_modules/aws-sdk/lib/request.js:690:12)
      at Request.callListeners (/directus/node_modules/aws-sdk/lib/sequential_executor.js:116:18)

I’m asking here because when I connect Directus to MinIO, it works without a hitch, and the same goes for a different S3 provider.

I also connected to Pydio with Cyberduck. Deletion seems to work there, but it behaves weirdly: a deleted file still shows up until I refresh the list, and only then disappears.

Is it possible that file deletion on Pydio is not 100% S3 compatible?

Is there a good way to debug this issue further on the Pydio side? I don’t see much output in the logs when the deletion fails.

Thank you for your help!

I found some more information. When the error occurs, this is Pydio’s output:

| 2021-08-18T09:32:58.191Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "request_unauthorized", "errorVerbose": "request_unauthorized\ngithub.com/pydio/cells/vendor/github.com/ory/fosite.(*Fosite).IntrospectToken\n\tgithub.com/pydio/cells/vendor/github.com/ory/fosite/introspect.go:69\ngithub.com/pydio/cells/common/auth.(*oryprovider).Verify\n\tgithub.com/pydio/cells/common/auth/jwt_ory.go:382\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).verifyTokenWithRetry\n\tgithub.com/pydio/cells/common/auth/jwt.go:246\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).Verify\n\tgithub.com/pydio/cells/common/auth/jwt.go:294\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.pydioAuthHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/gateway-pydio-auth-handler.go:83\ngithub.com/pydio/cells/common/service/context.HttpSpanHandlerWrapper.func1\n\tgithub.com/pydio/cells/common/service/context/span.go:178\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.criticalErrorHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/generic-handlers.go:775\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http.(*Server).Start.func1\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http/server.go:115\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2843\nnet/http.(*conn).serve\n\tnet/http/server.go:1925\nruntime.goexit\n\truntime/asm_amd64.s:1374"}
dms_1              | 2021-08-18T09:32:58.196Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "{\"id\":\"\",\"code\":0,\"detail\":\"request_unauthorized\",\"status\":\"\"}"}
dms_1              | 2021-08-18T09:32:58.206Z  DEBUG  Users Search Query   {"q": "SELECT `t`.`uuid`, `t`.`level`, `t`.`mpath1`, `t`.`mpath2`, `t`.`mpath3`, `t`.`mpath4`, `t`.`name`, `t`.`leaf`, `t`.`etag`, `t`.`mtime` FROM `idm_user_idx_tree` AS `t` WHERE (`t`.`uuid` = ?) ORDER BY `t`.`name` ASC", "q2 length": 1}
dms_1              | 2021-08-18T09:32:58.225Z  DEBUG  pydio.grpc.tree  No user/claims found - skipping user metas on metaStreamers init!
dms_1              | 2021-08-18T09:32:58.226Z  DEBUG  pydio.grpc.tree  ReadNode  {"uuid": "DATASOURCE:personal"}
dms_1              | 2021-08-18T09:32:58.231Z  DEBUG  pydio.grpc.data.index.personal  ReadNode  {"time": "1.182ms", "req": "Node:<Uuid:\"ROOT\" > ", "resp": "Success:true Node:<Uuid:\"ROOT\" Path:\"/\" Type:COLLECTION Size:474121 MTime:1629268303 MetaStore:<key:\"name\" value:\"\\\"\\\"\" > MetaStore:<key:\"pydio:meta-data-source-name\" value:\"\\\"personal\\\"\" > > "}
dms_1              | 2021-08-18T09:32:58.232Z  DEBUG  pydio.grpc.tree  [Look Up] Found node  {"uuid": "DATASOURCE:personal", "datasource": "personal"}
dms_1              | 2021-08-18T09:32:58.232Z  DEBUG  pydio.grpc.tree  Response after lookUp  {"path": "personal/"}
dms_1              | 2021-08-18T09:32:58.233Z  DEBUG  pydio.grpc.meta  ReadNodeStream  {"path": "personal/"}
dms_1              | 2021-08-18T09:32:58.235Z  DEBUG  pydio.grpc.tree  EnrichMetaProvider - Average time spent  {"pydio.grpc.meta": "1.8817ms"}
dms_1              | 2021-08-18T09:32:58.235Z  DEBUG  pydio.grpc.tree  ReadNode  {"time": "8.8924ms", "req": "Node:<Uuid:\"DATASOURCE:personal\" > ", "resp": "Node:<Uuid:\"DATASOURCE:personal\" Path:\"personal/\" Type:COLLECTION Size:474121 MTime:1629268303 MetaStore:<key:\"name\" value:\"\\\"\\\"\" > MetaStore:<key:\"pydio:meta-data-source-name\" value:\"\\\"personal\\\"\" > MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"\\\"\" > MetaStore:<key:\"pydio:meta-loaded-pydio.grpc.meta\" value:\"true\" > > "}
dms_1              | 2021-08-18T09:32:58.247Z  DEBUG  Users Search Query   {"q": "SELECT `t`.`uuid`, `t`.`level`, `t`.`mpath1`, `t`.`mpath2`, `t`.`mpath3`, `t`.`mpath4`, `t`.`name`, `t`.`leaf`, `t`.`etag`, `t`.`mtime` FROM `idm_user_idx_tree` AS `t` WHERE ((`t`.`name` = ?) AND (`t`.`leaf` = ?)) ORDER BY `t`.`name` ASC", "q2 length": 1}
dms_1              | 2021-08-18T09:32:58.252Z  DEBUG  pydio.grpc.data.index.personal  ReadNode  {"time": "2.0158ms", "req": "Node:<Path:\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.254Z  DEBUG  pydio.grpc.tree  ReadNode  {"time": "4.9506ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.254Z  DEBUG  pydio.grpc.tree  ListNodes  {"time": "15.2313ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > Ancestors:true ", "resp": {}}
dms_1              | 2021-08-18T09:32:58.257Z  DEBUG  Users Search Query   {"q": "SELECT `t`.`uuid`, `t`.`level`, `t`.`mpath1`, `t`.`mpath2`, `t`.`mpath3`, `t`.`mpath4`, `t`.`name`, `t`.`leaf`, `t`.`etag`, `t`.`mtime` FROM `idm_user_idx_tree` AS `t` WHERE ((`t`.`name` = ?) AND (`t`.`leaf` = ?)) ORDER BY `t`.`name` ASC", "q2 length": 1}
dms_1              | 2021-08-18T09:32:58.310Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "request_unauthorized", "errorVerbose": "request_unauthorized\ngithub.com/pydio/cells/vendor/github.com/ory/fosite.(*Fosite).IntrospectToken\n\tgithub.com/pydio/cells/vendor/github.com/ory/fosite/introspect.go:69\ngithub.com/pydio/cells/common/auth.(*oryprovider).Verify\n\tgithub.com/pydio/cells/common/auth/jwt_ory.go:382\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).verifyTokenWithRetry\n\tgithub.com/pydio/cells/common/auth/jwt.go:246\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).Verify\n\tgithub.com/pydio/cells/common/auth/jwt.go:294\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.pydioAuthHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/gateway-pydio-auth-handler.go:83\ngithub.com/pydio/cells/common/service/context.HttpSpanHandlerWrapper.func1\n\tgithub.com/pydio/cells/common/service/context/span.go:178\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.criticalErrorHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/generic-handlers.go:775\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http.(*Server).Start.func1\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http/server.go:115\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2843\nnet/http.(*conn).serve\n\tnet/http/server.go:1925\nruntime.goexit\n\truntime/asm_amd64.s:1374"}
dms_1              | 2021-08-18T09:32:58.315Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "{\"id\":\"\",\"code\":0,\"detail\":\"request_unauthorized\",\"status\":\"\"}"}
dms_1              | 2021-08-18T09:32:58.343Z  DEBUG  pydio.grpc.data.index.personal  ReadNode  {"time": "894.9µs", "req": "Node:<Path:\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.345Z  DEBUG  pydio.grpc.tree  ReadNode  {"time": "4.1047ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.345Z  DEBUG  pydio.grpc.tree  ListNodes  {"time": "13.7849ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > Ancestors:true ", "resp": {}}
dms_1              | 2021-08-18T09:32:58.487Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "request_unauthorized", "errorVerbose": "request_unauthorized\ngithub.com/pydio/cells/vendor/github.com/ory/fosite.(*Fosite).IntrospectToken\n\tgithub.com/pydio/cells/vendor/github.com/ory/fosite/introspect.go:69\ngithub.com/pydio/cells/common/auth.(*oryprovider).Verify\n\tgithub.com/pydio/cells/common/auth/jwt_ory.go:382\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).verifyTokenWithRetry\n\tgithub.com/pydio/cells/common/auth/jwt.go:246\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).Verify\n\tgithub.com/pydio/cells/common/auth/jwt.go:294\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.pydioAuthHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/gateway-pydio-auth-handler.go:83\ngithub.com/pydio/cells/common/service/context.HttpSpanHandlerWrapper.func1\n\tgithub.com/pydio/cells/common/service/context/span.go:178\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.criticalErrorHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/generic-handlers.go:775\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http.(*Server).Start.func1\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http/server.go:115\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2843\nnet/http.(*conn).serve\n\tnet/http/server.go:1925\nruntime.goexit\n\truntime/asm_amd64.s:1374"}
dms_1              | 2021-08-18T09:32:58.491Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "{\"id\":\"\",\"code\":0,\"detail\":\"request_unauthorized\",\"status\":\"\"}"}
dms_1              | 2021-08-18T09:32:58.525Z  DEBUG  pydio.grpc.data.index.personal  ReadNode  {"time": "909µs", "req": "Node:<Path:\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.526Z  DEBUG  pydio.grpc.tree  ReadNode  {"time": "4.9094ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.527Z  DEBUG  pydio.grpc.tree  ListNodes  {"time": "17.7057ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > Ancestors:true ", "resp": {}}
dms_1              | 2021-08-18T09:32:58.720Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "request_unauthorized", "errorVerbose": "request_unauthorized\ngithub.com/pydio/cells/vendor/github.com/ory/fosite.(*Fosite).IntrospectToken\n\tgithub.com/pydio/cells/vendor/github.com/ory/fosite/introspect.go:69\ngithub.com/pydio/cells/common/auth.(*oryprovider).Verify\n\tgithub.com/pydio/cells/common/auth/jwt_ory.go:382\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).verifyTokenWithRetry\n\tgithub.com/pydio/cells/common/auth/jwt.go:246\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).Verify\n\tgithub.com/pydio/cells/common/auth/jwt.go:294\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.pydioAuthHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/gateway-pydio-auth-handler.go:83\ngithub.com/pydio/cells/common/service/context.HttpSpanHandlerWrapper.func1\n\tgithub.com/pydio/cells/common/service/context/span.go:178\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.criticalErrorHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/generic-handlers.go:775\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http.(*Server).Start.func1\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http/server.go:115\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2843\nnet/http.(*conn).serve\n\tnet/http/server.go:1925\nruntime.goexit\n\truntime/asm_amd64.s:1374"}
dms_1              | 2021-08-18T09:32:58.724Z  DEBUG  pydio.gateway.data  jwt rawIdToken verify: failed  {"error": "{\"id\":\"\",\"code\":0,\"detail\":\"request_unauthorized\",\"status\":\"\"}"}
dms_1              | 2021-08-18T09:32:58.757Z  DEBUG  pydio.grpc.data.index.personal  ReadNode  {"time": "846.1µs", "req": "Node:<Path:\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.757Z  DEBUG  pydio.grpc.tree  ReadNode  {"time": "3.3138ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > ", "resp": ""}
dms_1              | 2021-08-18T09:32:58.758Z  DEBUG  pydio.grpc.tree  ListNodes  {"time": "14.1787ms", "req": "Node:<Path:\"personal/ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\" MetaStore:<key:\"pydio:meta-data-source-path\" value:\"\\\"ad36a715-621e-4bb1-b0f4-8bc8a1863cb0\\\"\" > > Ancestors:true ", "resp": {}}

I’ve located the issue: the problem appears when the AWS S3 client calls listObjectsV2 with a Prefix that doesn’t match anything that exists.

  // No key starts with this Prefix: Pydio Cells answers with an HTTP 500
  const response = await this.$driver
    .listObjectsV2({
      Bucket: this.$bucket,
      Prefix: 'directus/c619ee99-1544-4023-8abc-99c82075e8bc',
      ContinuationToken: continuationToken,
      MaxKeys: 1000,
    })
    .promise();

Hello @nicam,

If you think you’ve found a bug, would you mind opening a GitHub issue with exactly the same details?
Thank you.

I did, at the same time as I created this forum post :slight_smile: See “Prefix in listObjects throws 500 error” (Issue #342 in pydio/cells on GitHub).

Hi again, before this thread gets closed… @nicam, did you find a workaround? I understand that the 500 error comes from ‘something that does not exist’. Since I’m no expert in S3 terminology, I’m a bit stumped as to what that means.

Do you mean a bucket that doesn’t exist? I suppose not, since Cells only has one bucket (io, with data as an alias).

Do you mean a directory inside the bucket that doesn’t exist? From both your example here and what you’ve posted on GitHub, it’s not clear if you’re trying to access a directory or a file (sorry, it’s probably obvious for you guys, but not for me).

Do you mean a file that doesn’t exist, i.e. a new file to be created via the S3 protocol? That would be my case, since I’m always uploading whole directories with many new files to Pydio, and that’s possibly what causes the error…

Ironically, I hadn’t tried out downloading an existing file. To my surprise and bewilderment, that worked! That, at least, rules out that I was mistyping something in the configuration (such as the personal access token, for instance).

So, hmm… I’m now going to test the 3.0.0 release candidate, since this issue has allegedly been fixed there…

@GwynethLlewelyn It happens when the prefix (in my code above) doesn’t match any file; Pydio then returns a 500. With other S3 providers, the Prefix is simply used to filter the result set.
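
To make the directory-versus-file question concrete: in S3 a prefix is just the beginning of a key string, so it can be a ‘folder’ (directus/) or part of a file name (directus/c619ee99). The following is a simplified in-memory model of that filtering; simulateList is a hypothetical name, it ignores pagination and StartAfter, and it returns plain strings where real responses return objects.

```javascript
// Simplified model of ListObjectsV2 filtering. S3 has no real directories:
// keys are flat strings and the prefix is matched against their start.
function simulateList(keys, prefix = '', delimiter) {
  const contents = [];
  const commonPrefixes = new Set();
  for (const key of [...keys].sort()) {
    if (!key.startsWith(prefix)) continue;
    const rest = key.slice(prefix.length);
    const cut = delimiter ? rest.indexOf(delimiter) : -1;
    if (cut >= 0) {
      // Keys with the delimiter after the prefix are rolled up into one
      // "folder-like" entry, e.g. "directus/thumbs/".
      commonPrefixes.add(prefix + rest.slice(0, cut + delimiter.length));
    } else {
      contents.push(key);
    }
  }
  // A prefix that matches nothing is not an error: both lists are empty.
  return { Contents: contents, CommonPrefixes: [...commonPrefixes] };
}
```

For example, simulateList(['directus/a.jpg', 'directus/thumbs/a.jpg'], 'directus/', '/') yields Contents ['directus/a.jpg'] and CommonPrefixes ['directus/thumbs/'], while a prefix such as 'directus/c619ee99' yields two empty lists.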

Indeed, that does seem to be the case: Pydio Cells’ S3 API implementation behaves mostly as if it were read-only, although, as you so well pointed out, if a prefix exists, you can overwrite the file (but not delete it!).

That behaviour is anything but obvious, and I have no idea how to change it.

I’m trying to install the 3.0.0 RC in the hope that something has changed in the S3 API implementation. I haven’t run any tests yet, but I remain optimistic :slight_smile: After all, other MinIO-based solutions have no such issues (quite the contrary, they’re very stable and implement the S3 API fully and correctly), and it baffles me why Cells behaves this way…

Yes, I found a workaround, but I had to patch the CMS that was connecting to Pydio.

The prefix was only used to figure out which files already existed, so I just set the prefix to null, retrieved all files, and then searched them in memory. That’s slower, but it worked.
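
The actual Directus patch isn’t shown in the thread; this is a minimal sketch of the same idea, assuming an aws-sdk v2 style client. listKeysFilteredLocally is a hypothetical name, and the Prefix parameter is omitted entirely rather than set to null.

```javascript
// Workaround sketch: list the whole bucket without a Prefix and filter
// client-side. Slower, since every key is downloaded, but it avoids the
// 500 that Cells returned for a Prefix matching no keys.
async function listKeysFilteredLocally(s3, bucket, prefix) {
  const matches = [];
  let continuationToken;
  do {
    const response = await s3
      .listObjectsV2({ Bucket: bucket, ContinuationToken: continuationToken, MaxKeys: 1000 })
      .promise();
    for (const obj of response.Contents || []) {
      if (obj.Key.startsWith(prefix)) matches.push(obj.Key);
    }
    continuationToken = response.IsTruncated ? response.NextContinuationToken : undefined;
  } while (continuationToken);
  return matches;
}
```

The cost grows with the size of the whole bucket rather than with the number of matches, so this is only reasonable as a stopgap until the server handles empty prefixes.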

Now that’s interesting!

I wonder, does Directus only store files in the S3 buckets, or does it also store folders?

The reason I’m asking is that I seem to be able to upload files to existing folders, but when using Cyberduck to create new folders, all I get is a zero-byte ‘placeholder’ file (with the name I’ve selected for the new folder) — which I cannot delete, either.

Hmm, I thought I had posted some logs of the difference between the sequence of commands sent by Cyberduck compared with other software tools, but I guess it was on a different thread, which I cannot find at this moment :slight_smile: Anyway, the point is that the implementation of the S3 API varies widely across tools. Cyberduck, strangely enough, seems to be the one managing to get most of the operations working with Cells (except, as noticed, creating new folders; and deleting regular files is tricky but sometimes works). Another tool I use is the macOS-specific Transmit, which can’t even open a connection to the Cells S3 API. The many other tools I’ve tried so far have varying degrees of success — most can establish a connection, a few can even list a few folders (at least, those at the top-level), very few can upload files (and if they do, they can only overwrite existing files, not create new ones!), etc.

When looking at the logs made by these tools, it’s quite clear that each does things slightly differently. Some send a HEAD before the PUT/DELETE (an approach Pydio Cells dislikes) and do not continue unless the HEAD succeeds, eventually giving up. Some send a HEAD and, when it fails, proceed with the PUT/DELETE anyway: the PUT works, the DELETE only sometimes. Some have short timeouts and move on to the next instruction when a request times out (usually because Pydio Cells has returned a 500); others retry for a long time before moving on, and eventually give up when it’s clear that they will only get 500s…
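
The two HEAD-then-DELETE behaviours described above (strict: stop unless the HEAD succeeds; lenient: DELETE anyway) can be sketched as follows. deleteWithHead and proceedOnHeadError are hypothetical names; headObject and deleteObject are aws-sdk v2 S3 methods, and in that SDK a HEAD on a missing key rejects with statusCode 404.

```javascript
// Sketch of the "HEAD first, then DELETE" pattern some S3 clients use.
async function deleteWithHead(s3, bucket, key, { proceedOnHeadError = false } = {}) {
  try {
    await s3.headObject({ Bucket: bucket, Key: key }).promise();
  } catch (err) {
    // 404 on HEAD: the object is already gone, nothing to delete.
    if (err.statusCode === 404) return 'already-gone';
    // Strict clients give up here; lenient ones try the DELETE anyway.
    if (!proceedOnHeadError) throw err;
  }
  await s3.deleteObject({ Bucket: bucket, Key: key }).promise();
  return 'deleted';
}
```

With a server that answers the HEAD with a 500, the strict variant never reaches the DELETE, which matches the ‘eventually giving up’ behaviour in the logs.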

This is too baffling for me!

Hey, bumping this one.
So what you’re saying is that the expected behavior for ListObjectsV2 on a non-existent prefix is simply to return an empty result set without an error?


Yes, exactly. Like most search endpoints.

Thanks :slight_smile:

OK. Did you notice any other API discrepancies with your S3 client?

Not really, maybe the ones GwynethLlewelyn is describing above.

Just chiming in with my two cents…

I’m far too unskilled to comment on the complexity of the AWS S3 API, but after browsing through the online documentation that I could find (and understand!), as far as I can tell, ListObjects and ListObjectsV2 only mention that an error should be returned when the request is made by a user without access to the resource (i.e. 403 Access Denied is supposed to be returned in that case). In all other cases, again as far as I could understand the API descriptions, 200 OK should allegedly be returned, optionally with an empty result set. I say allegedly because, honestly, I couldn’t find a concrete example of the API returning an empty set: all the examples I found return at least one element, i.e. they all assume that at least one result fulfils the query.

I’d guess (probably wrongly!) that returning a 404 would make more sense when a resource isn’t found. But, clearly, this is not the case: S3 clients get ‘confused’ by those status codes (at least the ones I’ve tried), and while some will attempt to recover and/or retry, many simply fail because they get an unexpected result. Worse still are 5xx errors: even fewer S3 clients are able to handle those, and the few that do assume it’s a ‘temporary’ error and retry over and over for a while, finally failing when the timeout expires.

Now, not having seen the code of the S3 API implementation, neither in Cells nor in any of the open-source tools I use, I cannot say what such clients actually expect. I’d guess it’s safe to assume that most will be baffled by a 5xx error; such errors ought never to be returned (unless, of course, Cells fails catastrophically… and exits with a panic), and I haven’t found any guidelines on what to do when that happens. 4xx errors are a bit more confusing, since some clients will definitely be aware of a ‘not found’ status of some sort and deal with it, while, for others, anything that isn’t a 200 OK or a 403 Access Denied is treated as some sort of ‘total failure’ and is, therefore, an excuse to abort the connection.
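
A hypothetical client-side policy makes the ‘retry on 5xx, give up on 4xx’ behaviour described above concrete. It is not taken from any particular S3 tool. For what it’s worth, the four near-identical request traces in the Cells log earlier in the thread, spread over roughly half a second, would fit aws-sdk v2’s automatic retries of 5xx responses (by default up to 3 retries).

```javascript
// Hypothetical retry policy illustrating common S3 client behaviour.
function classifyStatus(status) {
  if (status >= 200 && status < 300) return 'ok';
  // Throttling and server-side errors are usually treated as transient.
  if (status === 429 || status >= 500) return 'retry';
  // Anything else (e.g. 403, 404): repeating the same request will not help.
  return 'fail';
}

// Exponential backoff without jitter: base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt, baseMs = 100, capMs = 5000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

With these defaults, attempts 0 to 3 wait 100, 200, 400 and 800 ms, so a 500 that never goes away still costs a client about 1.5 s of retrying before it reports a failure.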

I’ve tried to figure out how exactly errors are passed between the Pydio Cells layer and the MinIO layer. Not being familiar with either, my understanding is, naturally enough, very superficial. Also, at the moment, I only have access to the code in the 3.0.0 branch; under 2.x things might have been different, I don’t know.

Taking all the above into account, I understand that Pydio Cells includes a ‘modified’ version of MinIO (this is actually mentioned in one of the READMEs). Specifically, there are some extra files that are not part of the MinIO distribution and are clearly labelled as ‘auxiliary’ functions belonging to the Pydio ecosystem.

In particular, in the file vendor/github.com/pydio/minio-srv/cmd/gateway/pydio/gateway-pydio.go, there is a function pydioToMinioError() which seems to handle the ‘conversion’ of errors at the Pydio Cells layer and inject them into the MinIO/S3 layer. What seems obvious to me at this stage is that only a handful of errors are explicitly covered:

  • 403 Access Denied: passed to the MinIO/S3 layer, and presumably will, indeed, emit a valid 403 error (according to my interpretation of the S3 API guidelines).
  • 404 Not Found: passed to the MinIO/S3 layer as an Object Not Found. Allegedly, it’s up to MinIO to figure out what to do in those cases… and we can only assume it’s handled properly (I didn’t check), by sending back a 200 OK with an empty result set (S3 listings are XML, not JSON).
  • 422 Unprocessable Entity: sends a QuotaExceeded error via MinIO, which I presume relates to sending an object too large to fit in the user’s total available space. Probably this function is also used when sending objects, not only when listing them, hence this odd error. (I first thought that ‘quota exceeded’ was related to a network/bandwidth quota, i.e. too many requests, but in that scenario 429 Too Many Requests would be the correct error; MinIO’s error list, even though seriously under-commented, suggests it relates to the object’s size instead, since the entry before this one refers to objects that are too small.)
  • Any error that contains the word ‘Forbidden’: sends a 403 Forbidden error, a.k.a. Access Denied.
  • Other errors are not caught directly; they’re supposed to be handled by MinIO’s own error system (which also seems to be reasonably straightforward).

That’s as far as I can understand the whole issue, but I would certainly start by looking at that particular file and adding some extra debug logs, to try to figure out which errors get passed from one layer to the other…
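
To reason about the list of mappings, here is a hypothetical JavaScript rendering of pydioToMinioError() as described in this post. The real function is Go code inside Cells, and its exact branches, error types and HTTP statuses may differ. The final fallback encodes a guess: anything unmapped ends up as S3’s generic InternalError, which would match the ‘InternalError: We encountered an internal error’ that Directus received.

```javascript
// Illustrative only: mirrors the mapping as described, not Cells' code.
// `err` is assumed to look like { code: <number>, message: <string> }.
function mapPydioError(err) {
  switch (err.code) {
    case 403: return 'AccessDenied';
    case 404: return 'ObjectNotFound';
    case 422: return 'QuotaExceeded';
  }
  // Message-based check, applied after the status-code checks.
  if (String(err.message || '').includes('Forbidden')) return 'AccessDenied';
  // Guess: unmapped errors surface to S3 clients as InternalError (500).
  return 'InternalError';
}
```

Note that even the 404 branch yields an error; for a listing with no matches, clients expect neither an error nor a 500 but a 200 OK with an empty result.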

Fixed in master :wink:
Feel free to give it a try with https://download.pydio.com/pub/cells/dev/linux-amd64/cells

I’m going to try it out on the freshly released 3.0.0! :smile_cat:

Crossing my fingers (still waiting for the migration to finish)…