I am using Magento 2.4.3-p2 running PHP 7.4 (php-fpm) and nginx 1.18 on Ubuntu 20.04 with Elasticsearch 7.17.6 and Varnish 6.2.1. My server has 32GB RAM and an Intel Xeon E-2136 processor.
Everything had been working great, with a very low load average (around 0.2) when the site has low traffic, but yesterday I upgraded to Magento 2.4.3-p3 and things immediately went downhill. The load average went up to 5+ and stayed there until I noticed 24 hours later. The cause is PHP-FPM, which is running 4 processes all at 100% CPU. RAM usage is about where it always is, between 16 and 18GB (this is due to Elasticsearch).
When the load average goes above 5.15, Varnish seems to fail and I get a 503 backend failure error from Varnish.
I have rebooted my server and made sure everything is up to date, but the PHP-FPM usage soon goes back up. Restarting PHP does the same.
I have now downgraded back to 2.4.3-p2 but I'm still having the same issue, even though in theory everything is now the same as it was yesterday, short of restoring the file and database backups.
Is there a way to find out what the PHP-FPM processes are doing so I can troubleshoot this issue? The Varnish and nginx logs don't seem to give me any useful information.
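One avenue that might help here is the PHP-FPM status page, which can report per-worker details such as the request URI and how long the current request has been running. The sketch below is only an illustration under several assumptions: that `pm.status_path` is enabled in the pool configuration, that nginx exposes it at the hypothetical URL `http://127.0.0.1/status`, and that the JSON keys match the FPM status format as I understand it. The `sample` values are invented for demonstration and are not from my server.

```python
import json
from urllib.request import urlopen


def fetch_status(url="http://127.0.0.1/status?full&json"):
    """Fetch the FPM status page as text. The URL is a hypothetical example."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def busy_workers(status_json):
    """Return (seconds, pid, method, uri) for non-idle workers, slowest first."""
    status = json.loads(status_json)
    busy = []
    for proc in status.get("processes", []):
        if proc.get("state") == "Idle":
            continue
        # "request duration" is reported in microseconds
        seconds = proc.get("request duration", 0) / 1_000_000
        busy.append(
            (seconds, proc.get("pid"), proc.get("request method"), proc.get("request uri"))
        )
    busy.sort(key=lambda row: row[0], reverse=True)
    return busy


# Invented sample data, shaped like the assumed status output
sample = json.dumps({
    "pool": "www",
    "processes": [
        {"pid": 101, "state": "Running", "request duration": 42_000_000,
         "request method": "GET", "request uri": "/example-a"},
        {"pid": 102, "state": "Idle", "request duration": 1_000,
         "request method": "GET", "request uri": "/example-b"},
        {"pid": 103, "state": "Running", "request duration": 90_000_000,
         "request method": "POST", "request uri": "/example-c"},
    ],
})

for seconds, pid, method, uri in busy_workers(sample):
    print(f"{pid} {seconds:.1f}s {method} {uri}")
```

On a live server, `busy_workers(fetch_status())` would be the equivalent call. The worker that serves the status request itself will also show up as running. The pool's `request_slowlog_timeout` and `slowlog` settings are another option I could try if the status page is not enough.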
I've changed the timeout in Varnish's default.vcl from 2s to 5s, which gives me a bit longer under high CPU load, but I'd like to figure out what is causing this, as it's not sustainable to run a web server at 50% load when the site has no visitors.
php-fpm: pool www
is the line shown in htop for the 4 php-fpm processes, which sit at or very close to 100% CPU the majority of the time.
I think I have narrowed this down to an Elasticsearch issue, but I'm not sure how to solve it. Restarting Elasticsearch has not helped.
Output of `ps -ax | grep php`:
```
16634 ? Ss 0:00 php-fpm: master process (/etc/php/7.4/fpm/php-fpm.conf)
16644 ? S 11:09 php-fpm: pool www
16645 ? R 22:07 php-fpm: pool www
16646 ? S 25:06 php-fpm: pool www
16647 ? S 27:18 php-fpm: pool www
16648 ? R 22:05 php-fpm: pool www
16649 ? R 43:58 php-fpm: pool www
16691 ? S 22:15 php-fpm: pool www
16765 ? R 51:00 php-fpm: pool www
16819 ? R 60:00 php-fpm: pool www
17451 ? R 30:33 php-fpm: pool www
17628 ? R 35:29 php-fpm: pool www
18082 ? S 9:36 php-fpm: pool www
18928 ? R 1:59 php-fpm: pool www
19234 ? R 3:38 php-fpm: pool www
19714 ? R 26:53 php-fpm: pool www
20430 ? R 10:37 php-fpm: pool www
20919 ? S 1:26 php-fpm: pool www
21121 ? S 0:19 php-fpm: pool www
21156 ? R 12:24 php-fpm: pool www
21164 ? R 5:11 php-fpm: pool www
21223 ? S 1:49 php-fpm: pool www
21230 ? R 15:06 php-fpm: pool www
21270 ? S 3:28 php-fpm: pool www
23831 ? Ss 0:00 /bin/sh -c /usr/bin/php7.4 /var/www/mysite.com/bin/magento cron:run 2>&1 | grep -v "Ran jobs by schedule" >> /var/www/mysite.com/var/log/magento.cron.log
23859 ? S 0:01 /usr/bin/php7.4 /var/www/mysite.com/bin/magento queue:consumers:start inventory.reservations.updateSalabilityStatus --single-thread --max-messages=10000
24493 pts/0 S+ 0:00 grep --color=auto php
```
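To make a listing like the one above easier to read, here is a rough sketch that counts pool workers by state and ranks them by accumulated CPU time. It assumes the four-column layout shown (PID, TTY, STAT, TIME) and a TIME field of the form `MM:SS` or `HH:MM:SS`; the function names are my own, and the `sample` is a short excerpt of the listing.

```python
from collections import Counter


def parse_cpu_time(field):
    """Convert a ps TIME field such as '27:18' or '1:02:03' to seconds."""
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def summarize_workers(ps_output, marker="php-fpm: pool"):
    """Return (state counts, [(cpu_seconds, pid), ...] sorted by CPU time)."""
    states = Counter()
    workers = []
    for line in ps_output.splitlines():
        if marker not in line:
            continue  # skips the master process, cron, and grep lines
        pid, _tty, stat, cpu_time = line.split()[:4]
        states[stat[0]] += 1  # first letter: R = running, S = sleeping
        workers.append((parse_cpu_time(cpu_time), int(pid)))
    workers.sort(reverse=True)
    return states, workers


# Short excerpt of the ps listing
sample = """\
16634 ? Ss 0:00 php-fpm: master process (/etc/php/7.4/fpm/php-fpm.conf)
16647 ? S 27:18 php-fpm: pool www
16819 ? R 60:00 php-fpm: pool www
18928 ? R 1:59 php-fpm: pool www
"""

states, workers = summarize_workers(sample)
print(dict(states))
for cpu_seconds, pid in workers:
    print(pid, cpu_seconds)
```

Piping the full `ps -ax` output into `summarize_workers` should show which worker PIDs have burned the most CPU, which are the ones worth inspecting first.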