Unable to find cause of high disk I/O

Comments

  • WSS said: The problem with this is the generated graph, because it looks pretty consistent

    The graph looks like it takes 5-6 samples per hour, so it's likely averaging every ~10 minutes. If the load is consistent over those periods of time, you're not going to see the spikes that show up with 10-second granularity.

    emmd19 said: I guess the short-term is to upgrade to beefier hardware...

    As others mentioned, see if you can identify what's causing the spikes. Are these perhaps the large SELECTs you're referring to?

    You can also just stop PostgreSQL temporarily and do a similar iostat test to see if the read spikes disappear.

    Disk is usually the bane of most server workloads (unless you're on SSDs) - if you do look at upgrading, perhaps consider SSD storage or more RAM (to cache reads more, assuming that they're cacheable).
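    A minimal sketch of the kind of test I mean (the 10-second interval and the service name are assumptions; adjust for your setup):

        # Extended per-device stats, timestamped, every 10 seconds for ~5 minutes
        iostat -x -t 10 30

        # If you can afford a short outage, repeat the run with PostgreSQL stopped
        # to see whether the read spikes disappear with it
        systemctl stop postgresql
        iostat -x -t 10 30
        systemctl start postgresql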

  • Local files as caches. That'll do. Throw more RAM at it.

  • emmd19 Member
    edited February 2017

    I've done a brief review of the application code that's running (it powers a website that gets about 200K-400K hits a month). Unfortunately there's not much optimization that can be done in terms of DB calls; the app makes frequent calls to an external API for each incoming HTTP request from a user, and the response of each call results in accesses to many individual rows. Hence it's an inherently read-heavy (and to a lesser extent write-heavy) app.

  • Throw RAM at it and let it become someone else's eventual problem. :D

    Thanked by 1netomx
  • @xyz said: You can also just stop PostgreSQL temporarily and do a similar iostat test to see if the read spikes disappear.

    I can't stop PostgreSQL since it is somewhat mission-critical right now, but I can clone everything to a new VM and let it sit for a while (where it won't be dealing with any workload). I'll let you know how that goes.

  • bsdguy Member
    edited February 2017

    I don't think your web server (and hence the DB) is the guilty party; the hit numbers don't explain that.
    Before we go on, turn off your swap, at least on vda. With a page cache of about 1/3 of total memory, you don't need swap anyway.

    And show us an lsof output cleaned of standard shit.
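    If it helps, a rough sketch of those two steps (the swap device path is an assumption; check what swapon reports first):

        # See which swap devices are active
        swapon --show

        # Disable swap on the vda partition only (use the path listed above)
        swapoff /dev/vda2

        # lsof with the usual noise stripped: drop common daemons, keep open regular files
        lsof -n | grep -vE 'systemd|dbus|rsyslog|cron' | awk '$5 == "REG"' | less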

    Thanked by 1WSS
  • emmd19 said: I've done a brief review of the application code that's running (it powers a website that gets about 200K-400K hits a month). Unfortunately there's not much optimization that can be done in terms of DB calls; the app makes frequent calls to an external API for each incoming HTTP request from a user, and the response of each call results in accesses to many individual rows. Hence it's an inherently read-heavy (and to a lesser extent write-heavy) app.

    10k reqs/day is around one every ~9 seconds. Do you get clumps of requests together often? The graphs show fairly steady load throughout the day (with a few peaks here and there), but you're seeing very spiky load during smaller windows. It's hard to guess, but if you have consistent load, you shouldn't be getting those big read spikes - you'd expect it to be spread out more.

    In other words, your load may be coming from another source.
    Then again, I don't know your application...

  • A very simple way to track this would be to run ps/sysstat (atop)/etc. from a script and read through the output to see just what's going on. At least with a basic ext4 filesystem it isn't going to be a silly journal issue or anything like that. Basically, it just needs to be babysat and watched.
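    A rough sketch of that kind of babysitting script (interval, duration and log path are all made up):

        #!/bin/sh
        # Every 10 seconds, append a timestamp, the processes currently doing I/O,
        # and the top page-faulting processes, so spikes can be traced afterwards.
        LOG=/var/log/io-watch.log
        while true; do
            date >> "$LOG"
            iotop -b -n 1 -o >> "$LOG" 2>/dev/null   # -b batch, -n 1 one pass, -o active I/O only
            ps -eo pid,comm,maj_flt,min_flt --sort=-maj_flt | head -15 >> "$LOG"
            sleep 10
        done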

  • eva2000 Veteran
    edited February 2017

    emmd19 said: I've been experiencing issues with high disk I/O on my KVM VPS recently, which has caused several excessive I/O alerts from my host (a well-known provider here on LEB/LET). In particular, there is always a constant disk read of about 7MB/s. The VPS is running Ubuntu 16.04 64-bit and has 1.5GB of RAM, of which about 2/3 is in use at any given time. Swap usage hovers between 25-100MB out of 1.2GB.

    If you have sysstat and sar installed and configured, you'd have historic records for close to the exact date and time (10-minute interval entries by default), so you can go back to that exact date/time and review the sar -d output to at least see which block device the I/O is coming from, matched against lsblk output.

    For roughly the last 24 hours on CentOS (not sure where your sar logs live on Ubuntu):

    sar -d -f /var/log/sa/sa$(date +%d -d yesterday)
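    If the entries only show devM-N style names, -p prints real device names, and lsblk gives you the major:minor mapping (paths below assume the CentOS default; Ubuntu usually keeps them under /var/log/sysstat):

        # Pretty-printed device names instead of dev253-0 style IDs
        sar -d -p -f /var/log/sa/sa$(date +%d -d yesterday)

        # Match any remaining devM-N entries against major:minor numbers
        lsblk -o NAME,MAJ:MIN,SIZE,MOUNTPOINT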

  • My guess (entirely based on the limited context and on previous hunches) is that it's the DB. There are probably tables that could do with an index update and/or an overall statistics update (and related general DB maintenance). When was the last time you explicitly ran some PG maintenance scripts?

    Easy way to test (if possible): shut down the DB for a bit and see if the IO graph drops (consistently). Of course it's a HeisenTest(tm), but it's probably worth it. Needless to say, I assume you're not going to have some logging daemon spew lots of messages and promptly push the IO back to where it was.

    Thanked by 1Falzo
  • rincewind Member
    edited February 2017

    Install the blktrace package, and run

    btrace /dev/vda1
    

    for an online trace.

    You can also use blktrace to record logs and parse them later with blkparse.
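    A sketch of the record-then-parse workflow (the 60-second window and output prefix are arbitrary):

        # Record 60 seconds of block-layer events for vda1 into trace.* files
        blktrace -d /dev/vda1 -o trace -w 60

        # Parse them afterwards; the per-process summary at the end is usually
        # enough to point at the culprit
        blkparse -i trace | less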

  • emmd19 Member
    edited February 2017

    @nullnothere Yep, shutting off Postgres makes disk I/O drop like a rock. What sort of maintenance do you recommend? I just finished reindexing the entire database, and so far I've just been letting autovacuum do its thing.

  • @emmd19 said: Yep, shutting off Postgres makes disk I/O drop like a rock. What sort of maintenance do you recommend?

    I'm no PG expert (hopefully some other PG gurus here can help), but if you're doing all the usual good things (vacuum, analyze, reindex) then I think you're OK. Check that they're happening frequently enough (and that they're happening correctly). You may need to increase autovacuum frequency as well.

    If you're still seeing high IO despite these maintenance jobs, then you'll have to start looking at the queries (figure out the slow ones, etc.). Perhaps somewhere you're doing a table scan (missing an index) - there's a rough sketch of where to start looking at the end of this comment. The nice thing is that it doesn't appear to be a CPU hog (so it's not some sort of stupid loop join that the query planner is picking for whatever reason).

    You may have "grown" (DB-wise) to a point where you now need more memory for optimum performance.

    Hope this gets you going (you're now entering DBA land...)
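    A rough sketch of where I'd start, in case it helps (the database name is made up and the 500 ms threshold is just a guess):

        # When did autovacuum/analyze last touch each table, and which ones get table-scanned?
        psql -d mydb -c "SELECT relname, last_autovacuum, last_autoanalyze, seq_scan, idx_scan
                         FROM pg_stat_user_tables ORDER BY seq_scan DESC;"

        # Log any statement slower than 500 ms so the slow queries show up in the PG log
        psql -d mydb -c "ALTER SYSTEM SET log_min_duration_statement = '500ms';"
        psql -d mydb -c "SELECT pg_reload_conf();"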

  • Turns out that I had some relatively DB-intensive cron jobs that were originally configured to run every minute. This was fine when the DB was small but not so much now that the DB is much larger (~2GB). I've set them to run less frequently and that seems to have done the trick for now!
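    For anyone hitting the same thing, the fix is just the schedule field in the crontab (the job path and the 15-minute interval here are made up):

        # Before: every minute
        # * * * * *   /usr/local/bin/heavy-report-job.sh
        # After: far less often, now that the DB is ~2GB
        */15 * * * *   /usr/local/bin/heavy-report-job.sh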

    Thanked by 1Falzo
  • @nullnothere I did some more digging and found a query that is essentially a table scan (filtering on an unindexed boolean). Added a partial index on that field and so far so good.
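    Something along these lines, in case it's useful to others (table/column names made up):

        # Partial index so the planner can skip the table scan for the common filter
        psql -d mydb -c "CREATE INDEX CONCURRENTLY idx_items_unprocessed
                         ON items (id) WHERE NOT processed;"

        # Confirm the planner actually picks it up
        psql -d mydb -c "EXPLAIN ANALYZE SELECT * FROM items WHERE NOT processed;"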

    Thanked by 1nullnothere
  • While you're at it, you might want to considerably increase index memory for PG rather than wasting it on disk caching.
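    For reference, the knobs involved, settable without editing postgresql.conf by hand (the values are rough guesses for a 1.5GB box, not recommendations):

        psql -c "ALTER SYSTEM SET shared_buffers = '384MB';"        # PG's own cache; needs a restart
        psql -c "ALTER SYSTEM SET effective_cache_size = '768MB';"  # what the planner assumes is cached
        psql -c "ALTER SYSTEM SET work_mem = '8MB';"                # per sort/hash, keep small with many connections
        psql -c "ALTER SYSTEM SET maintenance_work_mem = '128MB';"  # helps VACUUM and index builds
        systemctl restart postgresql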

  • @bsdguy said:
    While you're at it, you might want to considerably increase index memory for PG rather than wasting it on disk caching.

    WSS said: ... Throw more RAM at it.

    WSS said: Throw RAM at it ...

    ;-) ;-) ;-)

    Thanked by 1nullnothere
  • @Ishaq said:
    What does iotop show?

    Thanks for the advice.

  • AnthonySmith Member, Patron Provider

    I used to have a script somewhere for measuring actual disk I/O read+write requests per second on a per-process basis; this thread reminds me I should probably share it.
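    Not that script, but pidstat from the sysstat package gets most of the way there:

        # Per-process disk read/write throughput (kB/s), every 10 seconds, 6 samples
        pidstat -d 10 6

        # Or diff the raw counters yourself (PID is hypothetical)
        cat /proc/1234/io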

    Thanked by 1aboanas93
  • @AnthonySmith please share it if you do find it...
