Chunked-upload errors through CDNs doing request_buffering

After fixing "Request signature we calculated does not match (bis) [chunked-upload]" and using 3.0.5, I could observe the following problem with chunked uploads (behind Cloudflare) to an unstructured S3 object storage backend.

./cells configure sites gives

+---+--------------------------+--------------+--------------------------+
| # |               BIND(S)    |     TLS      |          EXTERNAL URL    |
+---+--------------------------+--------------+--------------------------+
| 0 | https://mydomain.org:443 | Lets Encrypt | https://mydomain.org     |
+---+--------------------------+--------------+--------------------------+

The first is, I think, a notice that I can see dozens of times during a single upload (of a 100 MB file split into 5 MB chunks):
2022-02-28T19:13:21.777Z DEBUG http: TLS handshake error from 172.69.102.71:54547: EOF
The IP address often changes; here is an extract:

141.101.109.13
162.158.113.26
162.158.117.186
162.158.196.20
162.158.224.82
162.158.248.73
172.68.206.73
172.69.102.71
172.69.193.35
172.69.198.69
172.70.101.71
172.70.103.69
172.70.121.8
172.70.84.49

They all belong to Cloudflare. My interpretation is that, for an unknown reason, there are many requests (initiated?) by Cells to Cloudflare (or to itself, going through CF?) that result in an error.

This is clearly not desirable, and it should be noted that these are logged at DEBUG level (instead of WARNING).
I’m already at debug log level; how can I investigate this further?

The other kind of error (which was, again, mislabeled as DEBUG instead of WARNING/ERROR) is:

2022-02-28T19:09:59.422Z	DEBUG	pydio.gateway.data	jwt rawIdToken verify: failed	{"error": "token_expired", "errorVerbose": "token_expired\ngithub.com/pydio/cells/vendor/github.com/ory/fosite.(*Fosite).IntrospectToken\n\tgithub.com/pydio/cells/vendor/github.com/ory/fosite/introspect.go:69\ngithub.com/pydio/cells/common/auth.(*oryprovider).Verify\n\tgithub.com/pydio/cells/common/auth/jwt_ory.go:381\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).verifyTokenWithRetry\n\tgithub.com/pydio/cells/common/auth/jwt.go:245\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).Verify\n\tgithub.com/pydio/cells/common/auth/jwt.go:293\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.pydioAuthHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/gateway-pydio-auth-handler.go:83\ngithub.com/pydio/cells/common/service/context.HttpSpanHandlerWrapper.func1\n\tgithub.com/pydio/cells/common/service/context/span.go:198\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.criticalErrorHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/generic-handlers.go:775\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http.(*Server).Start.func1\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http/server.go:115\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2843\nnet/http.(*conn).serve\n\tnet/http/server.go:1925\nruntime.goexit\n\truntime/asm_amd64.s:1374"}
2022-02-28T19:09:59.426Z	DEBUG	pydio.gateway.data	jwt rawIdToken verify: failed	{"error": "{\"id\":\"\",\"code\":0,\"detail\":\"token_expired\",\"status\":\"\"}"}
2022-02-28T19:09:59.427Z	DEBUG	pydio.gateway.data	jwt rawIdToken verify: failed	{"error": "{\"id\":\"token.invalid\",\"code\":401,\"detail\":\"Cannot validate token\",\"status\":\"Unauthorized\"}"}
2022-02-28T19:09:59.428Z	DEBUG	pydio.gateway.data	jwt rawIdToken verify: failed	{"error": "token_expired", "errorVerbose": "token_expired\ngithub.com/pydio/cells/vendor/github.com/ory/fosite.(*Fosite).IntrospectToken\n\tgithub.com/pydio/cells/vendor/github.com/ory/fosite/introspect.go:69\ngithub.com/pydio/cells/common/auth.(*oryprovider).Verify\n\tgithub.com/pydio/cells/common/auth/jwt_ory.go:381\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).verifyTokenWithRetry\n\tgithub.com/pydio/cells/common/auth/jwt.go:245\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).verifyTokenWithRetry\n\tgithub.com/pydio/cells/common/auth/jwt.go:254\ngithub.com/pydio/cells/common/auth.(*JWTVerifier).Verify\n\tgithub.com/pydio/cells/common/auth/jwt.go:293\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.pydioAuthHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/gateway-pydio-auth-handler.go:83\ngithub.com/pydio/cells/common/service/context.HttpSpanHandlerWrapper.func1\n\tgithub.com/pydio/cells/common/service/context/span.go:198\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd.criticalErrorHandler.ServeHTTP\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/generic-handlers.go:775\ngithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http.(*Server).Start.func1\n\tgithub.com/pydio/cells/vendor/github.com/pydio/minio-srv/cmd/http/server.go:115\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2843\nnet/http.(*conn).serve\n\tnet/http/server.go:1925\nruntime.goexit\n\truntime/asm_amd64.s:1374"}
2022-02-28T19:09:59.430Z	DEBUG	pydio.gateway.data	jwt rawIdToken verify: failed	{"error": "{\"id\":\"\",\"code\":0,\"detail\":\"token_expired\",\"status\":\"\"}"}
2022-02-28T19:09:59.432Z	DEBUG	pydio.gateway.data	jwt rawIdToken verify: failed	{"error": "{\"id\":\"token.invalid\",\"code\":401,\"detail\":\"Cannot validate token\",\"status\":\"Unauthorized\"}"}
2022-02-28T19:09:59.432Z	DEBUG	pydio.gateway.data	error verifying token	{"token": "A7pZEOJIuknwJxUguVuHbsHI8L1JG80KdxDyCqkxBsM.XxvpHcy06UOsNG5BrrIsjYTlqKuEuIlFmclcJNbAISE", "error": "empty idToken"}
  • About the empty idToken, I read Users logged out immediately after login - #2 by bsinou, but it can’t be a cache problem (the Firefox console was open during the upload, which rules out any caching problem), and the server time is correct (server = UTC+00, client = UTC-03)

  • About the token_expired, I can’t tell.

But it seems clearly related to the random 403 (or Cloudflare 520) responses returned to the browser for these PUT /io/foobar.mp4?partNumber=4&uploadId=xxxx requests.

The uploader is able to overcome these issues (re-uploading chunks until it succeeds), but this is still a quality/instability problem I’d like to see resolved before I roll into production.

thx!

Any hint about this @charles ?

Hello @drzraf

I’m not really sure about your case here. Do you mean that these connections are opened by the server to these Cloudflare IPs? Maybe the micro-services are binding themselves to a public IP and internal calls are going through Cloudflare? Do you have an internal (private) IP on your server? Normally it is preferred for binding the internal services, and only the main proxy binds to the public IP.

It happens only with chunked upload (whatever chunk size I set). I don’t know whether my interface bindings are correct. Any command line you’d like me to run?

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fa:16:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 51.xx.xx.xx/32 brd 51.xx.xx.xx scope global dynamic eth0
       valid_lft 65372sec preferred_lft 65372sec
    inet 10.0.0.1/8 scope global eth0:1
       valid_lft forever preferred_lft forever
    inet6 2001:xxxx/128 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fee7:2a12/64 scope link 
       valid_lft forever preferred_lft forever
$ sudo ss -taple | grep -f <(pgrep -wf pydio) -
LISTEN   0         128                0.0.0.0:37193              0.0.0.0:*       users:(("cells",pid=9000,fd=18)) uid:998 ino:92390272 sk:3 <->                 
LISTEN   0         128                0.0.0.0:9001               0.0.0.0:*       users:(("cells",pid=8996,fd=17)) uid:998 ino:92389820 sk:4 <->                 
LISTEN   0         128                0.0.0.0:9002               0.0.0.0:*       users:(("cells",pid=8991,fd=18)) uid:998 ino:92390529 sk:5 <->                 
LISTEN   0         128                0.0.0.0:34243              0.0.0.0:*       users:(("cells",pid=8957,fd=84)) uid:998 ino:92388187 sk:b <->                 
LISTEN   0         128                      *:32873                    *:*       users:(("cells",pid=9029,fd=17)) uid:998 ino:92392178 sk:c v6only:0 <->        
LISTEN   0         128                      *:44649                    *:*       users:(("cells",pid=9006,fd=17)) uid:998 ino:92391414 sk:d v6only:0 <->        
LISTEN   0         128                      *:41673                    *:*       users:(("cells",pid=9058,fd=17)) uid:998 ino:92391382 sk:e v6only:0 <->        
LISTEN   0         128                      *:37257                    *:*       users:(("cells",pid=8957,fd=269)) uid:998 ino:92390764 sk:f v6only:0 <->       
LISTEN   0         128                      *:43081                    *:*       users:(("cells",pid=8957,fd=262)) uid:998 ino:92390747 sk:10 v6only:0 <->      
LISTEN   0         128                      *:44297                    *:*       users:(("cells",pid=8957,fd=243)) uid:998 ino:92390688 sk:11 v6only:0 <->      
LISTEN   0         128                      *:36201                    *:*       users:(("cells",pid=8987,fd=36)) uid:998 ino:92390460 sk:12 v6only:0 <->       
LISTEN   0         128                      *:42857                    *:*       users:(("cells",pid=8957,fd=178)) uid:998 ino:92389909 sk:13 v6only:0 <->      
LISTEN   0         128                      *:34409                    *:*       users:(("cells",pid=8957,fd=72)) uid:998 ino:92388245 sk:14 v6only:0 <->       
LISTEN   0         128                      *:46187                    *:*       users:(("cells",pid=9129,fd=15)) uid:998 ino:92391145 sk:15 v6only:0 <->       
LISTEN   0         128                      *:42347                    *:*       users:(("cells",pid=8957,fd=247)) uid:998 ino:92390393 sk:16 v6only:0 <->      
LISTEN   0         128                      *:35243                    *:*       users:(("cells",pid=8957,fd=213)) uid:998 ino:92390504 sk:17 v6only:0 <->      
LISTEN   0         128                      *:37931                    *:*       users:(("cells",pid=8957,fd=177)) uid:998 ino:92389905 sk:18 v6only:0 <->      
LISTEN   0         128                      *:41419                    *:*       users:(("cells",pid=8957,fd=144)) uid:998 ino:92389574 sk:19 v6only:0 <->      
LISTEN   0         128                      *:41291                    *:*       users:(("cells",pid=8957,fd=83)) uid:998 ino:92388859 sk:1a v6only:0 <->       
LISTEN   0         128                      *:44301                    *:*       users:(("cells",pid=9115,fd=15)) uid:998 ino:92391120 sk:1c v6only:0 <->       
LISTEN   0         128                      *:34029                    *:*       users:(("cells",pid=8957,fd=274)) uid:998 ino:92390812 sk:1d v6only:0 <->      
LISTEN   0         128                      *:33453                    *:*       users:(("cells",pid=8957,fd=40)) uid:998 ino:92389039 sk:1e v6only:0 <->       
LISTEN   0         128                      *:39247                    *:*       users:(("cells",pid=9047,fd=17)) uid:998 ino:92392655 sk:1f v6only:0 <->       
LISTEN   0         128                      *:43343                    *:*       users:(("cells",pid=9041,fd=17)) uid:998 ino:92391291 sk:20 v6only:0 <->       
LISTEN   0         128                      *:35311                    *:*       users:(("cells",pid=9009,fd=17)) uid:998 ino:92391869 sk:21 v6only:0 <->       
LISTEN   0         128                      *:42991                    *:*       users:(("cells",pid=8957,fd=214)) uid:998 ino:92390508 sk:22 v6only:0 <->      
LISTEN   0         128                      *:33807                    *:*       users:(("cells",pid=8957,fd=172)) uid:998 ino:92389884 sk:23 v6only:0 <->      
LISTEN   0         128                      *:39567                    *:*       users:(("cells",pid=8957,fd=142)) uid:998 ino:92389452 sk:24 v6only:0 <->      
LISTEN   0         128                      *:41999                    *:*       users:(("cells",pid=8957,fd=141)) uid:998 ino:92389058 sk:25 v6only:0 <->      
LISTEN   0         128                      *:http                     *:*       users:(("cells",pid=8957,fd=284)) uid:998 ino:92388835 sk:26 v6only:0 <->      
LISTEN   0         128                      *:46769                    *:*       users:(("cells",pid=9124,fd=14)) uid:998 ino:92391151 sk:27 v6only:0 <->       
LISTEN   0         128                      *:46193                    *:*       users:(("cells",pid=9125,fd=14)) uid:998 ino:92391112 sk:28 v6only:0 <->       
LISTEN   0         128                      *:42289                    *:*       users:(("cells",pid=8974,fd=20)) uid:998 ino:92391461 sk:29 v6only:0 <->       
LISTEN   0         128                      *:39633                    *:*       users:(("cells",pid=8991,fd=13)) uid:998 ino:92390513 sk:2a v6only:0 <->       
LISTEN   0         128                      *:34929                    *:*       users:(("cells",pid=8957,fd=179)) uid:998 ino:92389913 sk:2b v6only:0 <->      
LISTEN   0         128                      *:33201                    *:*       users:(("cells",pid=8957,fd=145)) uid:998 ino:92389578 sk:2c v6only:0 <->      
LISTEN   0         128                      *:34259                    *:*       users:(("cells",pid=9061,fd=17)) uid:998 ino:92391903 sk:2d v6only:0 <->       
LISTEN   0         128                      *:39443                    *:*       users:(("cells",pid=9121,fd=15)) uid:998 ino:92391834 sk:2e v6only:0 <->       
LISTEN   0         128                      *:36179                    *:*       users:(("cells",pid=9113,fd=15)) uid:998 ino:92391130 sk:2f v6only:0 <->       
LISTEN   0         128                      *:45459                    *:*       users:(("cells",pid=8957,fd=231)) uid:998 ino:92390212 sk:30 v6only:0 <->      
LISTEN   0         128                      *:42707                    *:*       users:(("cells",pid=8975,fd=15)) uid:998 ino:92389970 sk:31 v6only:0 <->       
LISTEN   0         128                      *:37683                    *:*       users:(("cells",pid=8957,fd=175)) uid:998 ino:92389897 sk:32 v6only:0 <->      
LISTEN   0         128                      *:35859                    *:*       users:(("cells",pid=8957,fd=173)) uid:998 ino:92389887 sk:33 v6only:0 <->      
LISTEN   0         128                      *:38259                    *:*       users:(("cells",pid=8957,fd=14)) uid:998 ino:92388792 sk:34 v6only:0 <->       
LISTEN   0         128                      *:45461                    *:*       users:(("cells",pid=8997,fd=17)) uid:998 ino:92392142 sk:35 v6only:0 <->       
LISTEN   0         128                      *:40981                    *:*       users:(("cells",pid=8957,fd=271)) uid:998 ino:92391470 sk:36 v6only:0 <->      
LISTEN   0         128                      *:33525                    *:*       users:(("cells",pid=9002,fd=12)) uid:998 ino:92390053 sk:37 v6only:0 <->       
LISTEN   0         128                      *:39541                    *:*       users:(("cells",pid=8957,fd=82)) uid:998 ino:92389054 sk:38 v6only:0 <->       
LISTEN   0         128                      *:38325                    *:*       users:(("cells",pid=8957,fd=71)) uid:998 ino:92388241 sk:39 v6only:0 <->       
LISTEN   0         128                      *:34521                    *:*       users:(("cells",pid=8957,fd=266)) uid:998 ino:92390755 sk:3b v6only:0 <->      
LISTEN   0         128                      *:44729                    *:*       users:(("cells",pid=8957,fd=66)) uid:998 ino:92389043 sk:3c v6only:0 <->       
LISTEN   0         128                      *:38553                    *:*       users:(("cells",pid=8957,fd=23)) uid:998 ino:92388820 sk:3d v6only:0 <->       
LISTEN   0         128                      *:38969                    *:*       users:(("cells",pid=8957,fd=16)) uid:998 ino:92388806 sk:3e v6only:0 <->       
LISTEN   0         128                      *:41499                    *:*       users:(("cells",pid=9107,fd=14)) uid:998 ino:92391013 sk:40 v6only:0 <->       
LISTEN   0         128                      *:34651                    *:*       users:(("cells",pid=8957,fd=176)) uid:998 ino:92389901 sk:41 v6only:0 <->      
LISTEN   0         128                      *:36859                    *:*       users:(("cells",pid=8957,fd=140)) uid:998 ino:92388928 sk:42 v6only:0 <->      
LISTEN   0         128                      *:https                    *:*       users:(("cells",pid=8957,fd=283)) uid:998 ino:92388834 sk:43 v6only:0 <->      
LISTEN   0         128                      *:43035                    *:*       users:(("cells",pid=8957,fd=21)) uid:998 ino:92388816 sk:44 v6only:0 <->       
LISTEN   0         128                      *:44253                    *:*       users:(("cells",pid=8957,fd=259)) uid:998 ino:92390705 sk:45 v6only:0 <->      
LISTEN   0         128                      *:40733                    *:*       users:(("cells",pid=8957,fd=67)) uid:998 ino:92388235 sk:46 v6only:0 <->       
LISTEN   0         128                      *:34079                    *:*       users:(("cells",pid=8957,fd=260)) uid:998 ino:92390733 sk:47 v6only:0 <->      
LISTEN   0         128                      *:34911                    *:*       users:(("cells",pid=8957,fd=180)) uid:998 ino:92389918 sk:48 v6only:0 <->      
LISTEN   0         128                      *:38591                    *:*       users:(("cells",pid=8996,fd=10)) uid:998 ino:92389358 sk:49 v6only:0 <->       
LISTEN   0         128                      *:8000                     *:*       users:(("cells",pid=8957,fd=15)) uid:998 ino:92388798 sk:4a v6only:0 <->       
LISTEN   0         128                      *:37889                    *:*       users:(("cells",pid=8957,fd=263)) uid:998 ino:92390760 sk:4b v6only:0 <->      
LISTEN   0         128                      *:38177                    *:*       users:(("cells",pid=8957,fd=255)) uid:998 ino:92390729 sk:4c v6only:0 <->      
LISTEN   0         128                      *:42017                    *:*       users:(("cells",pid=8957,fd=211)) uid:998 ino:92390563 sk:4d v6only:0 <->      
LISTEN   0         128                      *:33571                    *:*       users:(("cells",pid=9114,fd=15)) uid:998 ino:92391156 sk:4e v6only:0 <->       
LISTEN   0         128                      *:45699                    *:*       users:(("cells",pid=8957,fd=264)) uid:998 ino:92390751 sk:4f v6only:0 <->      
LISTEN   0         128                      *:44931                    *:*       users:(("cells",pid=8957,fd=245)) uid:998 ino:92390384 sk:50 v6only:0 <->      
LISTEN   0         128                      *:8003                     *:*       users:(("cells",pid=8957,fd=10)) uid:998 ino:92388788 sk:51 v6only:0 <->       
LISTEN   0         128                      *:41189                    *:*       users:(("cells",pid=8957,fd=81)) uid:998 ino:92388261 sk:52 v6only:0 <->       
LISTEN   0         128                      *:39301                    *:*       users:(("cells",pid=8957,fd=79)) uid:998 ino:92388257 sk:53 v6only:0 <->       
LISTEN   0         128                      *:43461                    *:*       users:(("cells",pid=8957,fd=143)) uid:998 ino:92388932 sk:54 v6only:0 <->      
LISTEN   0         128                      *:34023                    *:*       users:(("cells",pid=8957,fd=273)) uid:998 ino:92390809 sk:55 v6only:0 <->      
LISTEN   0         128                      *:45063                    *:*       users:(("cells",pid=8957,fd=244)) uid:998 ino:92390691 sk:56 v6only:0 <->      
LISTEN   0         128                      *:43911                    *:*       users:(("cells",pid=9000,fd=13)) uid:998 ino:92390259 sk:57 v6only:0 <->       
LISTEN   0         128                      *:38631                    *:*       users:(("cells",pid=8957,fd=227)) uid:998 ino:92390178 sk:58 v6only:0 <->      
LISTEN   0         128                      *:39175                    *:*       users:(("cells",pid=8957,fd=181)) uid:998 ino:92389921 sk:59 v6only:0 <->      
LISTEN   0         128                      *:44039                    *:*       users:(("cells",pid=8957,fd=174)) uid:998 ino:92389891 sk:5a v6only:0 <->      
LISTEN   0         128                      *:36423                    *:*       users:(("cells",pid=8957,fd=77)) uid:998 ino:92388253 sk:5b v6only:0 <->       
LISTEN   0         128                      *:38087                    *:*       users:(("cells",pid=8957,fd=86)) uid:998 ino:92388867 sk:5c v6only:0 <->       
LISTEN   0         128                      *:44583                    *:*       users:(("cells",pid=8957,fd=18)) uid:998 ino:92388165 sk:5d v6only:0 <->

I was able to confirm that the problem was due exclusively to the Cloudflare reverse proxy.
When the time comes to re-enable it, I’ll try to pinpoint which particular CF setting was the cause and let you know.

I can also confirm the issue with multipart uploads behind Cloudflare.

Specifically, the HTML5 uploader fails to complete the upload of pending chunks right after the POST request to the /a/frontend/session endpoint, and starts over from the very first part, effectively re-uploading everything under a new uploadId.

Cells Client, a.k.a. cec, is also affected in this setup.

Replacing Cloudflare with HAProxy for the purpose of HTTP/2 + SSL works flawlessly.
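For illustration, a minimal HAProxy setup of that kind looks roughly like this (hostnames, certificate path and backend port are placeholders, not my exact config). Unlike the CDNs, HAProxy forwards the request body to the backend as it arrives, unless you explicitly enable option http-buffer-request:

frontend fe_cells
    mode http
    bind :443 ssl crt /etc/haproxy/certs/mydomain.org.pem alpn h2,http/1.1
    default_backend be_cells

backend be_cells
    mode http
    # note: no "option http-buffer-request" here, so chunk PUTs are streamed
    # to Cells instead of being buffered in full before forwarding
    server cells 127.0.0.1:8080 ssl verify none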

@charles FYI

interesting.

@rotaru could you please just confirm which version of the server you use?

Sure, @bsinou. It is Pydio Cells Home Edition 3.0.7

Hi @bsinou and @charles.

I have run some more tests, this time with the G-Core Labs CDN service (which uses nginx), to check whether the issue is Cloudflare-specific.

The results are the same despite the multiple configuration settings I tried: the multipart upload gets 403 errors right after the POST request to /a/frontend/session (which happens every ~10 minutes) and starts re-uploading everything from part 1 under a new uploadId.

If I switch back to my CDN-less setup with Cells sitting behind HAProxy, everything works as expected.

Please see the HAR files for more info: Data package from May 25th. - FileTransfer.io

Hope this helps.

Hello,

We discussed this in a meeting today and had a quick look at the issue. It rather seems to be a configuration problem in the Cloudflare reverse proxy that impacts communication with the underlying MinIO library, which handles the files with an S3-like protocol. There are related/similar issues open in both the Cloudflare and MinIO communities.

Unfortunately, we have no time to dig further right now, which leaves you with 3 options:

  • as you have already noted, you might work around this by using another reverse proxy
  • if you can dig further and find the correct configuration for Cloudflare, we would happily add this to the knowledge base to help others who might bump into the same issue
  • as the issue is already investigated and qualified, you might open an issue in our GitHub repo: we do not guarantee we will have a look in the coming weeks, but if it is in GitHub, we won’t forget it and will be forced to have a look at it sooner or later.

Regards

Hey @bsinou, I’m back with some news.

After hours of trial and error, I think I have nailed the issue with failing multipart uploads behind a CDN. At least I know what is causing it. The short answer is request buffering.

First, I installed a Lua script on HAProxy to intercept request/response headers and see whether they get changed on the way to Caddy. They don’t.

However, I noticed a significant delay between the time the browser initiates a PUT request to /io/ and the time it reaches the server. This happens for both G-Core Labs and Cloudflare. It looks like they buffer the entire request body, then open the connection to the origin server, wait for the response, and then forward it to the client.

That wouldn’t be an issue if the S3 access key weren’t dependent on the JWT, which has a 10-minute lifetime. As far as I can see, almost every network operation run by the HTML uploader triggers a token validity check and refreshes the token when needed.

So, when the browser sends a PUT request for a chunk, it signs it with the current JWT value. Signing another chunk may trigger a token refresh, and if that refresh happens long enough before the first request reaches the server, MinIO does not accept the signature.

As both G-Core Labs and Cloudflare use nginx under the hood, I tried to reproduce the issue on-premise. And I succeeded.

I configured nginx according to this: MinIO | Setup Nginx proxy with MinIO - Cookbook/Recipe. But the issue persisted, even though that recipe tries to disable buffering. The thing is, it addresses only the response part of the connection.

The correct configuration option, which eliminates the delay, is: proxy_request_buffering off;

After adding that line to nginx.conf, multipart uploads started to work flawlessly.
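In case it helps someone else reproducing this on-premise, the relevant nginx location block in my test ended up looking roughly like this (the upstream address is a placeholder; proxy_buffering off is the response-side setting the cookbook covers, proxy_request_buffering off is the addition that fixed the chunked uploads for me):

location / {
    proxy_pass https://127.0.0.1:8080;   # Cells upstream (placeholder)
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    client_max_body_size 0;              # do not cap upload size

    # disables buffering of the response from the upstream only
    proxy_buffering off;

    # the missing piece: stream the request body (the upload chunks)
    # to the upstream as it arrives instead of buffering it in full
    proxy_request_buffering off;
}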

Unfortunately, there is no way to disable request buffering with G-Core Labs or Cloudflare.

The only solution for those who need to use a CDN is to set ‘Number of concurrent uploads’ to ‘1’ in the Cells web console.

Or maybe you could postpone the processing of queued chunks until the expiring token is refreshed, and also let active uploads finish before refreshing the JWT.

Both solutions are far from elegant. Ideally, you could find some other way of generating the S3 access key.

@charles you may find it helpful to mention the request buffering issue in the troubleshooting guide.