cPanel-based server: high load every Sunday for 5 hours
Hello.
I have a dedicated server from OVH with an Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz and 16GB RAM. I installed Virtualizor on that dedicated server and made 2 VPSes: I installed cPanel/WHM and CloudLinux on one VPS, and nginx on the other one. I use vps2 only for storing downloadable files. vps1 (cPanel) has some websites with low traffic, and it has one domain account with a large number of email accounts; they use almost 55GB of space for email alone. Now, my problem is with vps1, the cPanel and CloudLinux based one. vps2 uses almost 300GB and vps1 uses almost 60-70GB.
The problem is that every Sunday from 12pm to 5pm (GMT+6), for exactly 5 hours, the server load goes high. But vps2 is totally normal, and the node is totally normal at that time and at other times too. I checked and monitored; the cPanel L4 technician team and then the L3 technician team also monitored the VPS, and nobody can find the exact reason why the load goes high for exactly 5 hours every Sunday. There was no CPU-heavy process at the time, but the I/O wait rate was high. No cron job, no backup, no update is scheduled in that window. I use "sar" to check the server load history and tech-SysSnapv2 for log details.
Now the cPanel Level 3 and Level 4 teams have given up, and they said the high load might have some cause other than cPanel. But I think it is because of cPanel, because vps2 is on the same node and the node is also normal at that time. Both VPSes run CentOS 7 and are up to date.
Here are the log files from weeks 1 and 2: log 1, log 2. Some log files were replaced because I was late downloading them.
If anyone here has any idea what else can cause this at a specific time, for exactly 5 hours, every Sunday, please tell me. Or if anyone is willing to check and solve the problem, tell me too; I will give some courtesy money.
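To show what the "sar" check looks like, here is a minimal sketch of pulling high I/O-wait windows out of a sar CPU report. The sample lines below are invented for illustration; on a real box you would read the day's file with something like `sar -u -f /var/log/sa/sa07`.

```shell
# Sample `sar -u` output (24-hour format, columns: time CPU %user %nice
# %system %iowait %steal %idle) -- numbers invented for illustration.
sample='12:00:01       all      2.11      0.00      1.05     38.70      0.00     58.14
12:10:01       all      1.98      0.00      0.97     41.22      0.00     55.83'

# Print the timestamps where %iowait (6th field) exceeds 20%.
high=$(echo "$sample" | awk '$6 > 20 {print $1}')
echo "$high"
```

Spikes in %iowait with low %user/%system are the signature of a disk-bound background job rather than a runaway process.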
Comments
Do they use ColoCrossing?
Best guess: you have software RAID, and that is the mdadm RAID resync generating excessive IOPS because it has been left at the defaults.
Who? The node is my own dedicated server from OVH.
Very interesting. The software RAID is on the main dedicated server. I virtualized the dedicated server to get 2 VPSes; one VPS never goes high while the other one does. Is it possible that, because of the software RAID on the dedicated server, one VPS can get a high load spike?
Yes, absolutely. I assume they both run different stacks, so the lack of IOPS creating I/O wait could significantly impact one and not the other.
Given the times, it seems almost certain to be the case.
Try turning off bitmap caching and reducing the max speed to 5000 on your mdadm array and see if it still happens. It may simply be time to upgrade your disks to a faster array.
Can you help me do this, or give me a tutorial or some idea of how to do it? I am a little bit new to this, which is why I left all of it at the defaults.
If you have IPv6 enabled, I would ask you to run
ip route show cache table all |grep -c cache
now while not affected, and then again on Sunday when you are.
Yeah, no problem. Just PM me the output of: cat /proc/mdstat
I will try to advise you from there.
Here is how I reduced the max speed from 200000 to 5000; the command is:
sysctl -w dev.raid.speed_limit_max=5000
Here is the command I used to turn off the bitmap, since my server has two partitions, md2 and md3:
mdadm --grow --bitmap=none /dev/md2
Here is the output of cat /proc/mdstat.
Am I doing it right? Here is the status of the partitions from the OVH panel:
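For reference, here is a sketch of reading the interesting parts out of /proc/mdstat during a resync. The sample content below is invented (the real output was posted as an attachment); the awk one-liners just pull out the progress percentage and current speed.

```shell
# Invented /proc/mdstat sample during a RAID 1 resync, for illustration.
mdstat='md3 : active raid1 sdb3[1] sda3[0]
      1942664128 blocks [2/2] [UU]
      [==>..................]  resync = 12.5% (243458176/1942664128) finish=5665.1min speed=4999K/sec'

# Progress percentage is the 4th field of the "resync" line.
progress=$(echo "$mdstat" | awk '/resync/ {print $4}')
# The speed=... token can be anywhere on the line, so scan the fields.
speed=$(echo "$mdstat" | awk '/resync/ {for (i=1;i<=NF;i++) if ($i ~ /^speed=/) print $i}')
echo "progress=$progress $speed"
```

A speed close to your `dev.raid.speed_limit_max` value confirms the cap is what is throttling the resync.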
WHM/cPanel backups maybe?
Here is the latest update. I changed the server time to my timezone, then changed the cron job at /etc/cron.d/raid-check from Sunday to Friday, because server load on Friday is not a problem for my office since it's our day off. I disabled the bitmap for both partitions with:
"mdadm --grow --bitmap=none /dev/md2"
"mdadm --grow --bitmap=none /dev/md3"
I also reduced the max speed from 200000 to 5000, as advised, with:
sysctl -w dev.raid.speed_limit_max=5000
Now I will move the DNS of the domain that uses the most email storage to Cloudflare, and convert that account's email format on cPanel from maildir to mdbox.
I hope this problem will be solved, in sha Allah.
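Two details worth noting about the steps above, sketched here for reference (the cron schedule shown is illustrative of the CentOS 7 default, and the sysctl.d filename is my own choice):

```shell
# /etc/cron.d/raid-check on CentOS 7: the default weekly check runs early
# Sunday; changing the day-of-week field to 5 moves it to Friday, e.g.:
#   0 1 * * 5 root /usr/sbin/raid-check
#
# Note that `sysctl -w` does NOT survive a reboot. To make the speed cap
# permanent, drop it into /etc/sysctl.d/ and reload:
#   echo "dev.raid.speed_limit_max = 5000" > /etc/sysctl.d/90-raid-speed.conf
#   sysctl --system
```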
The output is 0. What does that mean?
It seems you have done enough to see if it's the RAID sync, or at least rule it out. Keep us updated.
Neighbor Discovery Cache. It should be 0 or a low number, < 50. If you have 500 to 1500 come Sunday, let me know. If it stays low, then this is not your problem. @AnthonySmith is probably correct, but since this is a once-a-week thing, I figured I would give you something else to check.
@FrankZ @AnthonySmith Thanks. Since I changed max_speed and moved the resync to 12am on Friday, the resync is running right now, but I see no high load on any terminal of my server. So now it's time to watch what happens on Sunday. I will update you all. Thanks again.
Hello, here is the latest update. The VPS is now normal, but the main dedicated server is doing something, with a load average of 1, which is also normal. Here is some output I got just now. Please check it and tell me whether I should worry or not, and what's going on. Is it syncing now or not? If the sync is running, how much time will it take, given my drive speed?
It will take four days at the current speed. So I'd recommend either raising the speed or pausing the check at the end of the window you want the check to run in. Search for "echo idle sys block mdadm" to see how to pause the RAID check.
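The four-day figure can be sanity-checked with simple shell arithmetic: remaining blocks (in KB) divided by the speed cap. The partition size below is an assumption for illustration (a ~1.9 TB RAID 1 member, a typical SYS layout); read the real block count from your own /proc/mdstat.

```shell
# Rough resync ETA = blocks_to_sync (KB) / speed (KB/s).
blocks_kb=1942664128   # ~1.9 TB partition in 1 KB blocks -- assumed value
speed_kbs=5000         # the dev.raid.speed_limit_max cap set earlier
seconds=$(( blocks_kb / speed_kbs ))
days=$(( seconds / 86400 ))
echo "~${days} days at ${speed_kbs} KB/s"
```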
If everything is running fine and the sync time is under 7 days (it is), then just leave it; your problem is solved.
There is just no way to quickly sync slow SATA drives with zero impact. You get to choose one or the other, or you upgrade to SSDs.
Can I pause the process now by entering this command?
echo "idle" > /sys/block/md3/md/sync_action
Then how much should I increase the speed from the current max of 5000?
Can I make it 15000? When it was 200000, the average load was around 15 and it took 5 hours, so I divided 200000 by 15 ≈ 14000.
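A more direct way to estimate this than dividing by the load: scale the old duration by the ratio of the speed caps. This is only a rough sketch; it assumes the resync actually ran at the 200000 cap before, and if the disks were the real bottleneck back then, the true time at a lower cap will be shorter than this estimate.

```shell
# Duration scales roughly inversely with the speed cap (approximation).
old_speed=200000   # previous speed_limit_max, finished in ~5 hours
old_hours=5
new_speed=15000    # proposed new cap
new_hours=$(( old_hours * old_speed / new_speed ))
echo "~${new_hours} hours at a cap of ${new_speed}"
```

So a cap of 15000 would stretch the check to roughly 2-3 days; whether that is acceptable depends on how much I/O-wait load you can tolerate during that window.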
@FrankZ @AnthonySmith Thanks. Sunday went by without any trouble, so now I know it was caused by the mdadm array. Thanks again @AnthonySmith for your idea of turning off bitmap caching and reducing the max speed to 5000 on the mdadm array. I just solved an 11-month-old problem.
No worries. If your server has space for another drive, you can store the bitmap cache on a tiny SSD, which will speed up the resync a lot and no longer impact the I/O either.
How would I know whether my server has space for another drive or not? I just bought it from OVH SoYouStart. And do I need the bitmap cache? Is it so important?
That is a question for whomever you lease the server from, but SYS is not flexible, so you can forget about it, I guess.
You don't need a bitmap cache; however, it greatly improves sync speed because, in simple terms, it is used to keep track of blocks that may be out of sync.
For a 2-disk RAID 1, I would not worry too much, as the penalty for having one probably does more harm to performance than good. Probably not worth spending money on, now that I think about it.
You just have a far better chance of recovering from a significant disk failure or system crash if you have a bitmap cache, but it's like a permanent double-check that slows things down.
I am sure that when your server/service grows you will migrate to an SSD-based server anyway, and then none of these problems will be noticeable anymore.
You can also YOLO it and run the cron only once a month. Sure, it checks for inconsistencies between the two drives, but you perform backups anyway, so in the very unlikely case there were inconsistencies that would absolutely kill everything (I've yet to see this across 1000+ servers), you could simply restore your backups.
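A monthly schedule would just mean another edit to the same cron file as before; an illustrative entry (times are my own choice):

```shell
# Illustrative /etc/cron.d/raid-check entry for a monthly run instead of
# weekly: 01:00 on the first day of each month.
#   0 1 1 * * root /usr/sbin/raid-check
```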