Pydio Cells causing "fork: no child processes" on shared host

Hello,

I installed Pydio Cells on a shared host (CentOS 7). Among other things, the hosting company enforces a limit of 300 threads per user.

Pydio Cells itself runs fine, but I can no longer log in via SSH, because bash cannot fork new processes once the limit is reached:

-bash-4.2$ ll
-bash: fork: retry: No child processes
-bash: fork: retry: No child processes
-bash: fork: retry: No child processes
-bash: fork: retry: No child processes
-bash: fork: Resource temporarily unavailable

What can I do about this problem?

Kind regards

Wow, that's interesting. You might be able to reduce the number of processes that Cells forks, but I'm not sure that will change the number of threads.
We have to test that on our side to see how we can handle this case (maybe with the GOMAXPROCS env variable). @zayn, could you try to set up this limitation on a test VM?
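For reference, GOMAXPROCS only caps the number of OS threads running Go code at the same time (threads blocked in syscalls are not counted), so it may not lower the total by much, but it is cheap to try. Just a sketch, with 2 picked arbitrarily:

$ GOMAXPROCS=2 ./cells start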

Hello @zalintyre,

Could you run the following command and tell me how many processes are running on your machine:
ps -aux | wc -l

Also, do you know your machine's hardware? (run lscpu to check the CPU count)

I"ve tried to reproduce your case by limiting the number of threads of my vm with echo 300 > /proc/sys/kernel/threads-max.

I had Cells up and running with the database on the same server.

Does your Cells instance have lots of datasources?

$ ps -aux | wc -l
19

But from htop, I can see that each of the cells processes spawns about 25 threads. That seems like quite a lot :wink:
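The per-process count can also be read without htop via the NLWP column of ps:

$ ps -C cells -o pid,nlwp,cmd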

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                20
On-line CPU(s) list:   0-19
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             20
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel Core Processor (Skylake, IBRS)
Stepping:              3
CPU MHz:               2199.998
BogoMIPS:              4399.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-19
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ibrs ibpb stibp fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt spec_ctrl intel_stibp

Pydio has a single datasource on the local SSD and a MySQL database; that's it.

I start Pydio Cells using supervisord. Might it be the case that Pydio Cells starts daemon processes that supervisord is not able to keep track of?
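One way to check is to look at the process tree of my shell user while everything is running, e.g.:

$ pstree -p orous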

Hello @zalintyre,

Do you mind showing me your supervisord config?

Global supervisord config:

$ cat /etc/supervisord.conf
# supervisord config for each user's supervisord instance.

[unix_http_server]
file=/run/supervisord/%(ENV_USER)s/supervisor.sock
username = dummy
password = dummy

[supervisorctl]
serverurl=unix:///run/supervisord/%(ENV_USER)s/supervisor.sock
username = dummy
password = dummy

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisord]
logfile=%(ENV_HOME)s/logs/supervisord.log
logfile_maxbytes=20MB
logfile_backups=3
loglevel=debug
pidfile=/dev/null
childlogdir=%(ENV_HOME)s/tmp
directory=%(ENV_HOME)s
identifier=supervisor_%(ENV_USER)s
nodaemon=true
strip_ansi=true
environment=PATH="/home/%(ENV_USER)s/bin:/home/%(ENV_USER)s/.local/bin:/opt/uberspace/etc/%(ENV_USER)s/binpaths/ruby:%(ENV_PATH)s"

[include]
files = %(ENV_HOME)s/etc/services.d/*.ini

In my user account:

$ cat ~/etc/services.d/pydio.ini
[program:pydio]
command=/home/orous/bin/tini -vvv -g -s -- /home/orous/bin/cells start
autostart=yes
autorestart=no

I tried with autorestart=yes; it makes no difference. I added tini because I thought it would reap zombie processes if necessary, but that doesn't make a difference either.

Could you take a look at this https://pydio.com/en/docs/kb/devops/running-cells-service-supervisor
and retry.

That doesn’t work either. After supervisorctl stop pydio the processes detach and are re-parented to the init process. Also, every process keeps spawning 25 threads.
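For what it's worth, supervisord also has stopasgroup/killasgroup options that send the stop signal to the whole process group; a sketch of the program section with them enabled (I haven't verified that this prevents the re-parenting):

[program:pydio]
command=/home/orous/bin/tini -vvv -g -s -- /home/orous/bin/cells start
autostart=yes
autorestart=no
stopasgroup=true
killasgroup=true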

Same behavior with Cells 2.0: it keeps spawning lots of threads, and the forked processes get re-parented when the parent process is killed.

Could you please run ps aux | grep cells and paste the result here; make sure to reproduce the case where you are hitting the thread limit.

$ ps -aux | grep cells
orous    20121 56.2  1.3 3088380 384976 ?      Sl   20:42   0:16 /home/orous/bin/cells start
orous    20208 13.5  0.5 1309500 168388 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.search
orous    20211 13.9  0.5 1293300 169732 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.gateway.websocket
orous    20212 14.3  0.5 1161716 168332 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.tasks
orous    20215 12.6  0.5 1284760 170588 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.jobs
orous    20287 14.4  0.5 1612816 167764 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.data.objects.local1
orous    20288 15.1  0.5 1154096 169700 ?      Sl   20:42   0:03 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.data.sync.cellsdata
orous    20289 15.8  0.5 1481744 168344 ?      Sl   20:42   0:03 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.data.sync.personal
orous    20290 14.0  0.5 1022192 170368 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.data.sync.pydiods1
orous    20431 15.3  0.5 1440444 163404 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.data.index.personal
orous    20432 13.8  0.5 1219056 167888 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.data.index.pydiods1
orous    20433 13.5  0.5 1022192 168808 ?      Sl   20:42   0:02 /home/orous/bin/cells start --fork --registry nats --registry_address :4222 --registry_cluster_address :5222 --registry_cluster_routes  --broker nats --broker_address :4222 pydio.grpc.data.index.cellsdata

Hello,
your process count seems normal; counting all the threads, you should have more or less 130.

Can you try this command: ps -efL | grep cells | wc -l (it lists all the cells threads and counts the lines)?
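To see how the threads are split across the services, something like this should work as well (field 2 of ps -efL is the PID):

$ ps -efL | grep '[c]ells' | awk '{print $2}' | sort | uniq -c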

Also, in htop, if you hit Shift+H it should display all threads; could you tell me the total count? (see screenshot)

In my case I have 189 threads.


Hi,

sorry for the wait!

$ ps -efL | grep cells | wc -l
300

Cells already hits 300 threads right from the start.

Tasks: 17, 340 thr; 1 running
Load average: 1.61 1.87 1.83 
Uptime: 29 days, 09:09:22