help me understand if there is any problem with i/o

vanarpvanarp Member
edited June 2012 in Help

Let me say upfront that I am new to working with a VPS, and most of my learning has been through this forum. I am observing an interesting issue with my VPS which I wanted to clarify with you.

Any time I run the dd command on my VPS after a while (say, an hour after the previous run), its output shows something like this:

vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 95.857 s, 11.2 MB/s

Now any immediate runs of the same dd command (tested within five minutes of the above run) show much improved speeds:

vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test

16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 8.38861 s, 128 MB/s
vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 11.3696 s, 94.4 MB/s
vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 7.27529 s, 148 MB/s
vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 5.73212 s, 187 MB/s
vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 5.74627 s, 187 MB/s
vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 9.25248 s, 116 MB/s

Please help me understand whether this is expected behavior or whether I should suspect something wrong with the disk I/O.

If it helps, I am trying to use this as a web server with a LAMP stack. I started paying attention to the dd output when I observed slow response on my site (WordPress) once in a while.
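
In case it is useful to reproduce this, here is roughly how the cold-vs-warm pattern could be captured in a single scripted run (the loop count and sleep interval are just examples):

for i in 1 2 3; do
    date
    # dd prints its timing summary on stderr; keep only the last line
    dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync 2>&1 | tail -1
    rm test
    sleep 3600    # wait long enough for caches to go cold again
done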

Comments

  • yomeroyomero Member

    I recommend trying ioping (google it) instead of stressing the server with lots of dd runs.
    It will give you a better idea of how stable the I/O performance is. But according to these results, it sounds like some customers are running heavy cron jobs or P2P, or a badly optimized DB.
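
    If it helps, roughly how that would look (assuming a Debian/Ubuntu guest; on distros that don't package ioping you would build it from source):

    sudo apt-get install ioping
    ioping -c 10 .    # latency of 10 requests against the current directory
    ioping -R .       # seek-rate test (random reads)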

  • HalfEatenPieHalfEatenPie Veteran
    edited June 2012

    Correct me if I'm wrong, but I think the results are being cached (that's why you're getting improved speeds on runs right after one another).

    Of course, I'm in the same boat as you (learning through this forum and a few other articles), so this is just my speculation.
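
    One rough way to test the caching theory (needs root, and it only clears the guest's own page cache on a Xen/KVM VPS; host-side caches are out of your reach, and it won't work inside an OpenVZ container):

    # flush the guest page cache, then re-run the dd test "cold"
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches
    dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync; rm test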

  • taiprestaipres Member

    It's definitely caching; that's why the drastic speed-up. Really though, 11.2 MB/s cold is HORRIFIC. You really should open a ticket with your provider; they either have really bad hardware, or you have a bad neighbor who's pounding the disk.

  • prometeusprometeus Member, Host Rep

    The subsequent runs are helped by some sort of caching. Is the first slow run consistent after 15-20 minutes?
    As far as you know, are you on a busy node? Some disks have aggressive energy-saving features that some providers don't disable, so disks that are not spinning require a slow spin-up, but after that the speed should improve until the next idle sleep...
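
    For illustration, this is roughly how a provider would check and disable that on the physical disks (host-node commands, not something you can run inside the VPS; /dev/sda is an example device):

    sudo hdparm -B /dev/sda        # show the APM level; 1-127 permits spin-down
    sudo hdparm -B 254 /dev/sda    # favour performance over power saving
    sudo hdparm -S 0 /dev/sda      # disable the standby (spin-down) timeout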

  • FranciscoFrancisco Top Host, Host Rep, Veteran

    Might be caching, but might be hourly crons too.

    I know when hourly crons hit, we see a spike of 800+ IOPS on some of our nodes :S
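
    From inside a guest you can watch for that kind of spike with iostat (a sketch, assuming the sysstat package is installed; you only see your own VM's devices, but await and %util climbing while your own workload is flat hints at node-level contention):

    # extended per-device stats every 5 seconds; r/s + w/s = IOPS
    iostat -x 5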

    Francisco

  • vanarpvanarp Member

    Thank you for such quick responses!

    When you say it could be due to caching, is that good or bad?

    @yomero said: I recommend trying ioping

    Will run ioping in a while and share the results here.

    @prometeus said: Is the first slow run consistent after 15-20 minutes?

    I just ran it again and here it is.

    vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test

    16384+0 records in
    16384+0 records out
    1073741824 bytes (1.1 GB) copied, 8.92858 s, 120 MB/s

    Maybe I need to give it more time?

    @prometeus said: Some disks have aggressive energy-saving features that some providers don't disable, so disks that are not spinning require a slow spin-up, but after that the speed should improve until the next idle sleep...

    This is what I have been suspecting. But how do I resolve this when the host is not ready to acknowledge the issue? I am thinking maybe I need to run dd from cron ;-)
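
    Something like this in the crontab, maybe (the paths and schedule are just examples):

    # log a timed dd run at the top of every hour
    0 * * * * cd /tmp && { date; dd if=/dev/zero of=ddtest bs=64k count=16k conv=fdatasync; rm -f ddtest; } >> "$HOME/ddtest.log" 2>&1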

  • vanarpvanarp Member

    @taipres said: you really should open a ticket with your provider; they either have really bad hardware, or you have a bad neighbor who's pounding the disk.

    I want to be sure that there is actually a serious problem. If it is a bad neighbor, why do you think subsequent runs of dd do not exhibit the issue?

  • prometeusprometeus Member, Host Rep
    edited June 2012

    dd shows only one face of the I/O, one that usually isn't exercised so often in real-life computing :-)

    However, 10 MB/s is for sure a low result for that kind of test. What are the ioping results?

  • vanarpvanarp Member

    Here are the results of ioping commands:

    vanarp@vps:~$ ioping -c 10 .

    4096 bytes from . (ext3 /dev/xvda1): request=1 time=0.3 ms

    4096 bytes from . (ext3 /dev/xvda1): request=2 time=0.6 ms
    4096 bytes from . (ext3 /dev/xvda1): request=3 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=4 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=5 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=6 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=7 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=8 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=9 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=10 time=4.1 ms
    --- . (ext3 /dev/xvda1) ioping statistics ---
    10 requests completed in 9019.3 ms, 1316 iops, 5.1 mb/s
    min/avg/max/mdev = 0.3/0.8/4.1/1.1 ms

    vanarp@vps:~$ ioping -R .

    --- . (ext3 /dev/xvda1) ioping statistics ---

    5341 requests completed in 2990.3 ms, 3420 iops, 13.4 mb/s
    min/avg/max/mdev = 0.1/0.3/259.8/3.7 ms

    vanarp@vps:~$ ioping -RL .

    --- . (ext3 /dev/xvda1) ioping statistics ---

    3094 requests completed in 3000.1 ms, 1586 iops, 396.5 mb/s
    min/avg/max/mdev = 0.4/0.6/30.5/0.7 ms

  • prometeusprometeus Member, Host Rep

    @vanarp said: ioping -c 10 .

    make this a bit longer (-c 30), then rerun it after a few hours.

    Also, let us see the output of
    vmstat 1 20

  • yomeroyomero Member

    Pretty good IMHO.

    Maybe it's the cron jobs, or some bad users :|

  • vanarpvanarp Member

    @prometeus said: make this a bit longer (-c 30), then rerun it after a few hours.
    Also, let us see the output of vmstat 1 20

    Sure, I will run the commands after a few hours and post the results.

    @yomero said: Pretty good IMHO

    I feel the same too. Only the slow speed after a break worries me, and I want to be sure whether it is normal or not.

  • klikliklikli Member

    Actually, is it ok to use /dev/urandom instead?

  • FranciscoFrancisco Top Host, Host Rep, Veteran

    @klikli said: Actually, is it ok to use /dev/urandom instead?

    no!

    /dev/urandom has a very very small pool size.

    Francisco

  • yomeroyomero Member
    edited June 2012

    @Francisco said: /dev/urandom has a very very small pool size.

    I don't think that is the biggest problem. Edit: well, it is related...

    If you do it, your first bottleneck will be the CPU trying to generate more random data to write. You will barely get ~10 MB/s or less.
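
    Easy to see for yourself: time the random source alone, with no disk involved at all (a quick sketch):

    # how fast can /dev/urandom produce 1 GB? (pure CPU, output discarded)
    dd if=/dev/urandom of=/dev/null bs=64k count=16k
    # compare: /dev/zero costs almost nothing to read
    dd if=/dev/zero of=/dev/null bs=64k count=16k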

  • vanarpvanarp Member

    After many hours (of idle time on the VPS), here are the latest test results. All commands were run one after the other.

    vanarp@vps:~$ ioping -c 30 .

    4096 bytes from . (ext3 /dev/xvda1): request=1 time=0.3 ms

    4096 bytes from . (ext3 /dev/xvda1): request=2 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=3 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=4 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=5 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=6 time=0.0 ms
    4096 bytes from . (ext3 /dev/xvda1): request=7 time=0.5 ms
    4096 bytes from . (ext3 /dev/xvda1): request=8 time=0.5 ms
    4096 bytes from . (ext3 /dev/xvda1): request=9 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=10 time=0.5 ms
    4096 bytes from . (ext3 /dev/xvda1): request=11 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=12 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=13 time=7.0 ms
    4096 bytes from . (ext3 /dev/xvda1): request=14 time=13.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=15 time=0.6 ms
    4096 bytes from . (ext3 /dev/xvda1): request=16 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=17 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=18 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=19 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=20 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=21 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=22 time=0.6 ms
    4096 bytes from . (ext3 /dev/xvda1): request=23 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=24 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=25 time=8.9 ms
    4096 bytes from . (ext3 /dev/xvda1): request=26 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=27 time=0.5 ms
    4096 bytes from . (ext3 /dev/xvda1): request=28 time=0.4 ms
    4096 bytes from . (ext3 /dev/xvda1): request=29 time=0.3 ms
    4096 bytes from . (ext3 /dev/xvda1): request=30 time=0.3 ms

    --- . (ext3 /dev/xvda1) ioping statistics ---

    30 requests completed in 29115.1 ms, 762 iops, 3.0 mb/s
    min/avg/max/mdev = 0.0/1.3/13.3/2.9 ms

    vanarp@vps:~$ vmstat 1 20

    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 3504 253180 59208 94644 0 0 7 15 10 16 1 1 98 0
    0 0 3504 253180 59208 94644 0 0 0 0 14 24 0 0 100 0
    0 0 3504 253180 59208 94644 0 0 0 0 13 23 0 0 100 0
    0 0 3504 253180 59208 94644 0 0 0 0 11 20 0 0 100 0
    0 0 3504 253180 59216 94636 0 0 0 16 20 44 0 0 100 0
    0 0 3504 253180 59216 94636 0 0 0 0 10 20 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 11 20 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 11 20 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 10 22 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 11 20 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 10 20 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 6 18 0 0 99 0
    0 0 3504 253180 59216 94644 0 0 0 28 11 24 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 9 19 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 10 21 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 8 17 0 0 99 0
    0 0 3504 253180 59216 94644 0 0 0 0 10 21 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 10 21 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 9 19 0 0 100 0
    0 0 3504 253180 59216 94644 0 0 0 0 9 19 0 0 100 0

    vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test

    16384+0 records in

    16384+0 records out
    1073741824 bytes (1.1 GB) copied, 150.739 s, 7.1 MB/s

    vanarp@vps:~$ dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;rm test

    16384+0 records in

    16384+0 records out
    1073741824 bytes (1.1 GB) copied, 5.49874 s, 195 MB/s

  • prometeusprometeus Member, Host Rep

    Latency is good; vmstat shows an almost idle machine...

  • vanarpvanarp Member
    edited June 2012

    So, can it be concluded that the issue is due to one of the below?

    1. Excessive caching, so that slow disks show better performance on subsequent operations

    2. Aggressive energy saving enabled on the disks, so that they need a wake-up call before they can perform up to speed

    EDIT: 3. Very I/O-intensive stuff being run by the neighbors on the node

    What would you do if you were in this situation?

  • Ignore the dd command; ioping shows the real story, which is that the disks are working mostly fine. I say mostly because the dd command shows that fetching cold data is a little slow. It could be anything, but I suspect it's more likely that the server is RAID-1 rather than RAID-10, or (it can happen sometimes) a SATA disk dropped out of SATA 3G and is running at SATA 1.5G. That can cause all sorts of fun issues, and a RAID array is only as fast as its slowest disk (a couple of quick checks are sketched below).

    That's one reason I'm looking into CacheCade technology from LSI: it's a hybrid SSD solution without all the expense, and it greatly improves I/O for customers.
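
    For what it's worth, roughly how one would check those two things (these run on the host node, so this is for the provider rather than the VPS user; exact output varies by kernel):

    # did any disk negotiate a slower SATA link?
    dmesg | grep -i 'sata link up'   # look for "1.5 Gbps" vs "3.0 Gbps"
    # software RAID level and member health, if md RAID is in use
    cat /proc/mdstat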

  • yomeroyomero Member

    Another possibility... a hard drive dying, maybe...

  • vanarpvanarp Member

    @yomero said: Another possibility... a hard drive dying, maybe...

    Ohh noooo...

    I hope the host recognizes the issue before I act on it :(

  • yomeroyomero Member

    @vanarp said: I hope the host recognizes the issue before I act on it :(

    It's just an idea.
    Maybe it's something else.
    The problem is that you see the issue in production more than in synthetic tests: slow loading and so on.
