Unable to find cause of high disk I/O

emmd19 Member
edited February 2017 in Help

I've been experiencing issues with high disk I/O on my KVM VPS recently, which has caused several excessive I/O alerts from my host (a well-known provider here on LEB/LET). In particular, there is always a constant disk read of about 7MB/s. The VPS is running Ubuntu 16.04 64-bit and has 1.5GB of RAM, of which about 2/3 is in use at any given time. Swap usage hovers between 25MB and 100MB out of 1.2GB.

Here's the output from a single invocation of iostat, which, according to my understanding, represents running averages since boot:
Here's the I/O graph over the last 24 hours from my host's control panel:

The VPS is used for web/HTTP and a PostgreSQL database. I initially suspected the Postgres database was causing issues, but I can't seem to find the source of this phantom read I/O. iotop does not provide any clues, and running iostat at 1s intervals shows that disk I/O is minimal. Furthermore, vmstat shows that swap activity is minimal as well. I even considered the possibility that my host's I/O metering was buggy, but they replied that since I/O usage is read directly from the hypervisor, it is impossible for their readings to be wrong.
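
For reference, the sort of sampling I've been doing (roughly these invocations):

    iostat 1          # per-second device stats - usually near zero
    vmstat 1          # si/so (swap in/out) columns stay close to zero
    sudo iotop -o     # only shows the occasional short Postgres burst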

I'm at my wit's end trying to figure this out. Does anyone have any ideas?


Comments

  • I'd run lsof and start checking processes. There's far too much information you just haven't given us. What sort of host is this running on? What version of KVM? Etc, etc..

    There may very well be a nasty way VirtIO is being handled with your Postgres DBs. Check your systat and everything else.
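
    Something along these lines, for instance (the process name and data directory below are just the Ubuntu defaults, adjust as needed):

    sudo lsof -c postgres                # open files per postgres process
    sudo lsof +D /var/lib/postgresql     # everything held open under the data directory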

  • emmd19 Member
    edited February 2017

    How would lsof help? Since I'm just a client, I have no idea about the particulars of my host's underlying KVM implementation, sorry :(

  • Crawl through your dmesg, look for "virtio" information; what does your ethernet device show as, etc..

    What's your build? What are you running for these services - are they stock distro packages, custom, etc.?

    To check and change things without taking down services, you can play around with ionice.
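
    For example (the PID and priority values are just placeholders):

    dmesg | grep -i virtio        # confirm which virtio devices the guest sees
    ionice -p <pid>               # check a process's current I/O class/priority
    sudo ionice -c2 -n7 -p <pid>  # drop it to the lowest best-effort priority without restarting it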

  • emmd19 Member
    edited February 2017

    lspci:
    00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
    00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
    00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
    00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
    00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
    00:02.0 VGA compatible controller: Cirrus Logic GD 5446
    00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
    00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
    00:05.0 SCSI storage controller: Red Hat, Inc Virtio block device
    00:06.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon

    All packages are stock - nginx, postgres from Ubuntu repos, and some custom Python/Django projects running behind gunicorn. These are not resource-heavy at all, with the possible exception of frequent database accesses (although certainly not to the point of several MB/s sustained). I've just remounted / with noatime - will let you know how that goes...
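
    For the record, the remount itself was just the usual:

    sudo mount -o remount,noatime /
    # plus noatime added to the root entry in /etc/fstab so it survives a reboot, e.g.:
    # /dev/vda1  /  ext4  errors=remount-ro,noatime  0  1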

  • Your iostat screenshot above shows I/O that seems to be consistent with the graph. When you say minimal I/O with iostat at 1s intervals, does it look like the screenshot above?

    Have you tried stopping Postgresql and measuring the I/O?

  • What does iotop show?

  • @xyz Nope, when running iostat at 1s intervals most of them look like this:

  • Your TPS value is high, especially if the provider uses HDDs.

  • emmd19 Member
    edited February 2017

    @Ishaq When running iotop, the disk read is usually anywhere from 0 to a few hundred KB/s. There are rare moments when it spikes/bursts due to a Postgres SELECT running on a large table; however, these never last more than a second or two. Basically there's nothing I can see that would explain sustained disk I/O.

  • @Ishaq said:
    Your TPS value is high especially if the provider uses HDDs.

    I guess there's not much point in hiding my provider lol. It's LunaNode, and IIRC this is one of their SSD-cached plans out of OVH-BHS.

  • Try installing and running atop.
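
    On Ubuntu that's roughly:

    sudo apt-get install atop
    sudo atop -d 1       # -d shows disk-related info per process, 1-second interval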

  • Will try that now. In the meantime, here's about 30 seconds worth of vmstat 1 in case that helps:

  • emmd19 said: @xyz Nope, when running iostat at 1s intervals most of them look like this:

    Could you run iostat 10, leave it for a minute, and screenshot all of the output?
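
    i.e. something like:

    iostat 10 7    # first report is the since-boot average, then six 10-second samples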

  • atop -d -A specifically. I'm so used to systat that I had to look that up. :D

  • @Ishaq @WSS atop -d -A:

  • How strange.

  • Huh. What's throwing me for a loop is that it's showing 20% USER above, but then- nada, so we should be seeing something here.

    I'm just going to assume it's an ancient KVM on CentOS 6.

    I'm assuming you've tried changing priorities and nonsuch.

  • emmd19 Member
    edited February 2017

    @xyz iostat 10 for 1 minute (1st entry is the average since boot):

  • @WSS My CPU load average is around 20-30% - is that what USER% means?

  • xyz Member
    edited February 2017

    CPU usage is user+system (+nice if you have any nice'd processes).

    From that iostat, it looks to me like your I/O is usually low, but you get spikes (like that 2047 tps reading), which cause the average to be what it is. The graph is likely averaging over a long period of time, and the first iostat reading shows an average which seems in line with what you get if you average all your other iostat readings.
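
    As a made-up example of the arithmetic: if the disk sits near 0 MB/s for 9 seconds out of every 10 and then bursts at ~70 MB/s for 1 second, the average over that window is (9×0 + 1×70)/10 = 7 MB/s - roughly the constant read your graph is showing.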

  • @xyz said:
    CPU usage is user+system (+nice if you have any nice'd processes).

    From that iostat, it looks to me like your I/O is usually low, but you get spikes (like that 2047 tps reading), which cause the average to be what it is. The graph is likely averaging over a long period of time, and the first iostat reading shows an average which seems in line with what you get if you average all your other iostat readings.

    The problem with this is the generated graph, because it looks pretty consistent. I guess we'd need better sample data about what the host is running - and again, that would be a lot more useful from the host's perspective than from inside the QEMU guest. The end result is that we're all left wondering. :)

  • @xyz said:
    CPU usage is user+system.

    From that iostat, it looks to me like your I/O is usually low, but you get spikes (like that 2047 tps reading), which cause the average to be what it is. The graph is likely averaging over a long period of time, and the first iostat reading shows an average which seems in line with what you get if you average all your other iostat readings.

    Hmm...that makes sense, I suppose. I guess the short-term fix is to upgrade to beefier hardware...

  • @emmd19 said:
    @WSS My CPU load average is around 20-30% - is that what USER% means?

    This little article should help you understand what the different numbers actually mean: http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages

  • So going off the conclusion that I need to upgrade my infrastructure - what do you guys recommend? Is this level of resource/disk utilization still within the realm of VPS, or is dedicated the way to go?

  • I'd recommend you start benchmarking/setting up accounting first, to see if you can actually find what's going on. If you're having quick little stabs that even out to 7MB/s, you might just have to work on your queries and/or change the design. We're all flying blind here - you might try asking your host to show you your allocated system use, as well as running accounting under your processes.
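
    For example (the 250ms threshold is just a guess at "slow" for your workload):

    sudo iotop -b -o -a -d 10 -n 60 > iotop.log   # accumulated per-process I/O, 10s samples for ~10 minutes
    # and in postgresql.conf, log any statement slower than 250 ms to see which queries drive the reads:
    # log_min_duration_statement = 250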

  • Your iostat shows up to 25 MB/s read. At the same time it shows that the amount of data read per minute is about the amount one would expect per 10s. Ergo you have something that reads a lot in spikes.

    You will need to watch with finer granularity and find out who is reading cyclically from vda.

    Also show your mounts and tell us about your swap.
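
    Something like this from inside the guest (vda taken from your lspci/df output; pidstat comes with the sysstat package):

    iostat -x vda 1       # per-device stats at 1-second granularity
    sudo pidstat -d 1     # which process is doing the reads, second by second
    mount                 # the mounts
    swapon -s; free -m    # swap devices and usage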

  • @bsdguy said:
    Also show your mounts and tell us about your swap.

    Filesystem wouldn't hurt, either. You brazen hussy.

  • Can't you ever think about anything else, you slut? How perfect!

    File system? Don't care yet. 25 MB/s smells strongly like cache. A propos smelling: That whole thing smells.

  • Being that we don't have any host specs, I still wonder if we're getting a combination of random select hits and just an overall shitty driver base since it's all virtio.

    I know you'll do such crazy things for 25MB/s.. even if your sisters' eyebrow entrances me so.

  • emmd19 Member
    edited February 2017

    All right guys, get your mind out of the gutter :P Filesystems are nothing special, just a single 15GB / formatted as ext4 with plenty of free space:
    Filesystem      Size  Used Avail Use% Mounted on
    udev            744M   12K  744M   1% /dev
    tmpfs           150M  1.3M  149M   1% /run
    /dev/vda1        16G  9.1G  6.1G  60% /
    none            4.0K     0  4.0K   0% /sys/fs/cgroup
    none            5.0M     0  5.0M   0% /run/lock
    none            749M     0  749M   0% /run/shm
    none            100M     0  100M   0% /run/user

    Swap consists of a ~256MB swap partition on /dev/vdb1 and an additional 1GB swapfile on the root filesystem (/dev/vda1).

    Swap utilization is modest and currently at 218/1061MB with minimal swap activity.
