High CPU Usage on KVM SolusVM
I got a dedicated server from OVH: 16 cores / 32 threads, 6 TB of HDD (SOFT/JBOD), 128 GB of RAM, and gigabit network speed. I installed SolusVM for the KVM platform, and now I am facing the following issues.
(I contacted SolusVM before contacting you and they said:
'This will be related to CPU, memory and Disk array on the server. You can check the hardware with your DC')
I have 9 VPSes installed and running.
- High CPU Load on Main Node.
top - 00:09:36 up 4 days, 19:31, 2 users, load average: 27.42, 19.99, 18.51
Cpu(s): 4.6%us, 3.9%sy, 0.0%ni, 88.5%id, 3.0%wa, 0.0%hi, 0.0%si, 0.0%st
All VPSes are currently idle, and no client whatsoever is connected or using resources.
14173 qemu 20 0 10.8g 451m 5484 S 36.2 0.4 15:13.68 qemu-kvm
14524 qemu 20 0 8647m 447m 5476 S 19.9 0.3 14:04.47 qemu-kvm
14085 qemu 20 0 10.8g 756m 5484 S 19.6 0.6 11:51.64 qemu-kvm
14238 qemu 20 0 10.5g 454m 5484 S 19.6 0.4 15:52.27 qemu-kvm
14664 qemu 20 0 10.5g 448m 5484 S 19.6 0.3 15:08.18 qemu-kvm
14425 qemu 20 0 10.3g 776m 5484 S 18.9 0.6 12:28.16 qemu-kvm
13601 qemu 20 0 10.8g 1.8g 5484 S 18.6 1.4 15:17.75 qemu-kvm
14361 qemu 20 0 10.6g 449m 5484 S 18.2 0.3 14:40.62 qemu-kvm
14568 qemu 20 0 8647m 481m 5440 S 17.6 0.4 13:41.01 qemu-kvm
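For context, a load average is easier to interpret relative to the machine's thread count. A minimal sketch using the figures quoted above (note that on an otherwise-idle box, a high load with low %us/%sy but nonzero %wa usually means tasks stuck in uninterruptible I/O wait, since Linux counts D-state tasks in the load average):

```python
# Sketch: interpret a load average relative to hardware thread count.
# Figures taken from the top output quoted above; 32 threads per the OP.
threads = 32
loads = {"1m": 27.42, "5m": 19.99, "15m": 18.51}

for window, load in loads.items():
    per_thread = load / threads
    print(f"{window}: load {load} -> {per_thread:.2f} per thread")
```

Anything approaching or exceeding 1.0 per thread means runnable (or I/O-blocked) tasks are queuing up.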
I have tried giving them all 32 cores and also lowering the count; nothing helped.
On an idle VPS, the load is as follows:
load average: 1.03, 0.78, 0.75
Cpu(s): 0.6%us, 0.7%sy, 0.0%ni, 95.6%id, 5.2%wa, 0.0%hi, 0.0%si, 0.0%st
Meanwhile, a friend of mine hosted my custom ISO on his KVM VPS (he has 30 VPSes running on that node) and saw the following load:
load average: 0.09, 0.31, 0.38
Cpu(s): 0.4%us, 0.4%sy, 0.0%ni, 97.1%id, 1.9%wa, 0.1%hi, 0.0%si, 0.0%st
What can be the issue?
Comments
The %wa seems pretty high for unused services. Did you check your drives? You are doing software RAID of what kind? RAID 1?
On the node settings, what do you have set for Disk Cache?
Can you give us the full output of "top"? If you can, just print-screen it; that may be easier to format within LET.
Disk Cache is Node Default, while the disk driver is Virtio.
Probably the RAID 5 that OVH provides.
Left is the node, while right is a box. Partially active now.
You have high I/O %wa; running on what looks to be RAID 5, you're going to have a high load during this time.
I would also suggest setting the Disk Cache to none when using KVM & LVM; change it at the node level and then on any VMs you have built, and restart the VMs.
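For anyone checking this outside the SolusVM panel: on a libvirt-managed KVM node the cache mode appears on the disk's driver element in the domain XML (`virsh dumpxml <vm>`). A sketch of what a cache=none Virtio disk on LVM looks like (the volume path `/dev/vg0/kvm101_img` is illustrative, not from this thread):

```xml
<disk type='block' device='disk'>
  <!-- cache='none' opens the backing device with O_DIRECT, bypassing the
       host page cache and avoiding double caching between host and guest -->
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/vg0/kvm101_img'/>
  <target dev='vda' bus='virtio'/>
</disk>
```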
What does output of "cat /proc/mdstat" show?
Seems to be an HDD issue. Check them to be sure there isn't a defective one or a RAID sync in progress.
output of "cat /proc/mdstat"
RAID looks fine; disable the cache as above and restart all the VMs. Also check for any heavy I/O within the VMs.
If you install iotop (yum install epel-release -y && yum install iotop -y) on the node, you can use it to check how much read/write I/O each VM is creating.
What's "personalities [faulty]"? Never noticed that before.
iotop
iostat
Seems kvm103 is doing what are probably random writes, which will cause I/O wait on RAID 5 and make the load increase.
Do you need the capacity of RAID 5, or could you make do with RAID 1?
Once you have a few VMs doing random reads and writes, you soon start to see I/O wait on a RAID 5 setup that has no RAID controller with a cache.
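The arithmetic behind that: each random write on RAID 5 without a write-back cache costs four back-end disk operations (read data, read parity, write data, write parity), versus two for RAID 1 (one per mirror). A minimal sketch with illustrative figures (~100 IOPS per 7200 rpm HDD; the disk counts are assumptions, not OVH's actual layout):

```python
def random_write_iops(disks: int, iops_per_disk: float, level: str) -> float:
    """Approximate sustained random-write IOPS for a software RAID array.

    RAID 1 pays 2 back-end I/Os per logical write (one per mirror);
    RAID 5 pays 4 (read data, read parity, write data, write parity).
    """
    penalty = {"raid1": 2, "raid5": 4}[level]
    return disks * iops_per_disk / penalty

# Illustrative figures only: ~100 IOPS per 7200 rpm HDD.
print(random_write_iops(2, 100, "raid1"))  # -> 100.0
print(random_write_iops(3, 100, "raid5"))  # -> 75.0
```

So even with an extra spindle, RAID 5 sustains fewer random writes than a RAID 1 pair, and the queued writes show up as %wa and load on the node.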
My friend is actually running his normal VMs on RAID 1, which works fine. RAID 1 would do the job.
Yes, RAID 1 removes the large latency of RAID 5 that is causing the pain you're seeing in the above config.
Glad your problem is resolved; however, I would still recommend changing Disk Cache from the default to none for all VMs.
The problem isn't fully resolved yet, because my associate who handles this work is out of town. How can I move it to RAID 1? All VMs are set to Disk Cache: None and have been rebooted.
You'll need to back up any data/VMs and reinstall via OVH with RAID 1.
You can't move from RAID 5 to RAID 1 live without data loss.
Right. Thank you so much. I will let you know more about this. Thanks a lot.
OK, I reinstalled as RAID 1. Space got almost halved, but the servers still had a somewhat higher load. Disk Cache was default on all boxes; I set it to none since it was double caching, and now the load is a lot less. I have one question now. The following is from the SolusVM slave node:
hdparm -W /dev/sda
/dev/sda:
write-caching = 1 (on)
Should the write cache be on or off on all hard drives?
That is a decision you need to make. You will receive a small performance boost with it enabled; however, if the server hard-loses power, you risk corruption of the files/data held in the cache during the power loss.
Thank you Ashley. You have resolved my problem. Thanks a lot. I am running almost 17 boxes right now and everything seems stable.
Great to hear, and no problems.
damn,
@frequenzy007 you should invite @AshleyUk for a dinner or a beer.
*correction
Yes, sure why not? I am gonna invite him but where are you taking us?
Don't be jealous, I'm a guy btw.
I've corrected it; let's go for a beer.
Is that problem KVM running a swap file, by some chance? I have seen this problem when people set up large swap files on their KVM guests to try to get around memory limits. Just one more thing I don't like about KVM. Customers can't do that on OpenVZ.