Re: Determining if there is an IO bottleneck
- From: Jon Metzger <blagh@xxxxxxxxxxxx>
- Date: Tue, 05 Dec 2006 09:19:08 -0600
c0mput3rb0y@xxxxxxxxxxx wrote:
I'm covering my bases and I thought I'd elicit some help.
I'm attempting to determine if I have a disk IO performance bottleneck
on our critical backend Oracle DBs at peak usage.
The metrics I value most generally look good (await in particular), but
I thought I'd run my situation by others to see if I'm missing
something.
Here's an overview of our configuration:
server HW:
  4-proc/single-core 2.6GHz Opteron CPUs
  32GB memory (+24GB swap partition)
  Gig NIC
  2 QLogic qla2342 2Gb PCI-X fibre HBAs
    dual-path'd to storage (1 connection on each HBA)
    Paths to LUNs managed via VxDMP
storage HW:
  EMC Clariion cx500
  42 36GB 15krpm drives
  LUNs:
    LUN 0: 1x1 HW Raid 1 (redolog "A" members)
    LUN 1: 1x1 HW Raid 1 (redolog "B" members)
    LUN 2: 3x3 HW Raid 10 (oradata, undo, indexes, etc)
    LUN 3: (same as LUN 2)
    LUN 4: (same as LUN 2)
    LUN 5: (same as LUN 2)
  memory:
    Read cache: each SP has 128MB
    Write cache: 1344MB
  Note: LUNs 2-5 are a 4-column SW stripe via VxVM(+VxFS)
server OS & SW:
  OS: RHEL4 update 3 AS SMP x86_64 2.6.9-34
  Veritas Storage Foundation 4.1 MP2 Standard
    VxVM, VxFS and VxDMP are all used
  Oracle 10g release 2
Here's some output from "iostat -dkx 10" around our peak usage:
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     8.11     0.00     1.50     0.00    76.88     0.00    38.44    51.20     0.02    10.73     9.20     1.38
sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdc          0.00    21.22    88.89   771.47  1425.43 15058.96   712.71  7529.48    19.16     2.02     2.35     0.85    72.82
sdd          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdf          0.00     0.50     2.80    36.74    22.42  2387.49    11.21  1193.74    60.95     0.08     1.93     1.43     5.64
sdg          0.00    21.02    85.79   770.17  1393.39 14810.71   696.70  7405.36    18.93     1.81     2.06     0.83    70.66
sdh          0.00    42.94    85.29   804.40  1373.37 15128.03   686.69  7564.01    18.55     1.07     1.20     0.70    62.26
sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdj          0.00    21.12    89.39   777.68  1430.23 14898.80   715.12  7449.40    18.83     1.10     1.26     0.70    60.53
sdk         13.81     0.50    11.01    36.84 15191.19  2388.29  7595.60  1194.14   367.40     0.15     3.19     2.07     9.90
sdl          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdm          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdn          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdo          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00    12.30     0.00     1.90     0.00   113.60     0.00    56.80    59.79     0.03    17.68     5.84     1.11
sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdc          0.00     5.50   205.30    48.00  3296.00  2381.70  1648.00  1190.85    22.41     2.93    11.55     3.36    85.05
sdd          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdf          0.00     0.00    19.60   160.50   156.80  5004.70    78.40  2502.35    28.66     0.19     1.08     0.88    15.87
sdg          0.00     5.30   198.40    42.20  3160.80  2270.40  1580.40  1135.20    22.57     2.74    11.58     3.48    83.63
sdh          0.00     6.10   203.30    58.90  3250.40  2382.30  1625.20  1191.15    21.48     4.17    15.92     3.68    96.43
sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdj          0.00     9.90   203.10    47.00  3252.80  2318.60  1626.40  1159.30    22.28     4.15    16.60     3.83    95.80
sdk          3.60     0.00    21.80   160.60  4048.80  5004.70  2024.40  2502.35    49.64     0.21     1.17     0.94    17.13
sdl          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdm          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdn          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdo          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0.00     9.30     0.30     6.50     5.60   126.40     2.80    63.20    19.41     0.49    72.60     3.94     2.68
sdb          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdc          0.00     0.00   138.80   262.90  2228.00  2366.30  1114.00  1183.15    11.44     1.19     2.98     1.69    67.86
sdd          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sde          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdf          0.00     0.00    46.90   325.00   375.20  9195.10   187.60  4597.55    25.73     0.31     0.82     0.67    24.83
sdg          0.00     0.00   133.30   250.90  2136.80  2308.80  1068.40  1154.40    11.57     1.11     2.88     1.73    66.55
sdh          0.00     0.00   141.10   249.10  2269.60  2265.60  1134.80  1132.80    11.62     1.46     3.72     2.00    77.95
sdi          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdj          0.00     0.00   131.50   251.10  2106.40  2278.20  1053.20  1139.10    11.46     1.35     3.54     1.96    74.93
sdk          0.00     0.00    45.70   325.10   365.60  9202.40   182.80  4601.20    25.80     0.30     0.82     0.67    24.68
sdl          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdm          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdn          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
sdo          0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
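As a sanity check on how the iostat columns relate to one another, the rough sketch below recomputes avgqu-sz and %util for two of the busy rows above. It assumes the usual relationships hold: queue depth is approximately IOPS times await (Little's law), and utilization is approximately IOPS times svctm. The function name is made up for illustration, and small differences from the reported values are expected because iostat prints rounded figures.

```python
# Cross-check iostat -x columns against each other for one device row.
# Assumptions: avgqu-sz ~= (r/s + w/s) * await, and %util ~= (r/s + w/s) * svctm,
# with await and svctm given in milliseconds.

def derived_metrics(r_s, w_s, await_ms, svctm_ms):
    """Return (iops, estimated avgqu-sz, estimated %util) for one row."""
    iops = r_s + w_s
    avg_queue = iops * await_ms / 1000.0   # requests in flight, on average
    util_pct = iops * svctm_ms / 10.0      # iops * (svctm / 1000 s) * 100 %
    return iops, avg_queue, util_pct

# sdc, first sample: reported avgqu-sz 2.02, %util 72.82
iops_c, queue_c, util_c = derived_metrics(88.89, 771.47, 2.35, 0.85)
print(f"sdc: {iops_c:.2f} IOPS, queue {queue_c:.2f}, util {util_c:.1f}%")

# sdh, second sample: reported avgqu-sz 4.17, %util 96.43
iops_h, queue_h, util_h = derived_metrics(203.30, 58.90, 15.92, 3.68)
print(f"sdh: {iops_h:.2f} IOPS, queue {queue_h:.2f}, util {util_h:.1f}%")
```

Both rows land within rounding distance of what iostat reported, which suggests the samples are internally consistent. In the second sample, sdh and sdj sit near 96% util with roughly four requests queued.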
And here's some output from "sar -u":
02:48:01 AM all 5.45 0.00 3.52 7.50 83.53
02:49:01 AM all 8.25 0.00 3.97 7.15 80.62
02:50:01 AM all 6.23 0.00 3.61 7.73 82.43
02:51:01 AM all 9.89 0.00 4.51 8.30 77.29
02:52:01 AM all 6.29 0.00 3.87 6.23 83.61
02:53:01 AM all 6.17 0.00 3.69 6.12 84.02
02:54:01 AM all 9.10 0.00 4.23 7.39 79.28
02:55:01 AM all 7.11 0.00 3.91 6.56 82.43
02:56:01 AM all 6.99 0.00 4.14 7.41 81.46
02:57:01 AM all 9.48 0.00 4.28 7.65 78.59
02:58:01 AM all 6.11 0.00 3.68 8.39 81.82
02:59:01 AM all 6.19 0.00 3.75 8.50 81.56
03:00:01 AM all 9.20 0.00 4.26 10.50 76.04
03:01:01 AM all 7.58 0.00 4.93 19.65 67.84
03:02:01 AM all 10.25 0.00 4.94 21.49 63.32
[snip]
03:45:01 AM all 5.29 0.00 3.40 17.50 73.82
03:46:01 AM all 5.71 0.00 4.21 17.31 72.77
03:47:01 AM all 9.19 0.00 4.71 14.38 71.72
03:48:01 AM all 6.08 0.00 4.27 14.08 75.57
03:49:01 AM all 9.39 0.00 4.56 12.81 73.24
03:50:01 AM all 6.86 0.00 4.27 15.06 73.80
03:51:01 AM all 10.23 0.00 5.16 16.18 68.42
03:52:01 AM all 6.49 0.00 4.44 19.34 69.72
03:53:01 AM all 9.17 0.00 4.69 17.56 68.59
03:54:01 AM all 5.49 0.00 3.87 17.73 72.91
03:55:01 AM all 6.70 0.00 4.14 15.76 73.39
03:56:01 AM all 7.70 0.00 4.89 18.55 68.85
03:57:01 AM all 7.69 0.00 3.58 12.39 76.34
03:58:01 AM all 4.60 0.00 3.31 14.15 77.94
03:59:01 AM all 8.27 0.00 3.66 13.77 74.30
04:00:01 AM all 5.45 0.00 3.52 15.22 75.81
04:01:01 AM all 5.97 0.00 4.19 22.12 67.73
04:02:01 AM all 8.89 0.00 5.32 44.16 41.63
04:03:01 AM all 6.23 0.00 5.06 35.89 52.81
04:04:01 AM all 10.34 0.00 6.04 44.23 39.40
04:05:01 AM all 7.39 0.00 5.74 45.91 40.96
04:06:01 AM all 7.76 0.00 6.33 44.66 41.25
04:07:01 AM all 9.81 0.00 6.03 43.88 40.28
[snip]
04:56:01 AM all 26.98 0.00 10.18 35.82 27.01
04:57:01 AM all 11.35 0.00 9.26 44.18 35.22
04:58:01 AM all 11.66 0.00 9.48 43.54 35.32
04:59:01 AM all 11.74 0.00 9.31 44.51 34.44
05:00:01 AM all 12.16 0.00 8.78 35.32 43.74
05:01:01 AM all 11.92 0.00 8.65 42.03 37.40
05:02:01 AM all 10.30 0.00 8.17 43.66 37.87
05:03:01 AM all 8.35 0.00 7.45 46.75 37.45
05:04:01 AM all 9.74 0.00 7.73 45.83 36.69
05:05:01 AM all 13.17 0.00 8.83 41.52 36.47
05:06:01 AM all 13.25 0.00 9.37 42.03 35.36
05:07:01 AM all 35.41 0.00 11.48 33.61 19.50
05:08:01 AM all 12.57 0.00 9.53 44.04 33.85
05:09:01 AM all 12.57 0.00 10.15 43.37 33.90
05:10:01 AM all 14.01 0.00 10.39 42.13 33.48
05:11:01 AM all 13.81 0.00 10.33 43.85 32.00
05:12:01 AM all 13.04 0.00 8.98 43.36 34.61
05:13:01 AM all 13.57 0.00 10.02 42.66 33.75
05:14:01 AM all 13.23 0.00 9.89 42.18 34.71
05:15:01 AM all 14.08 0.00 10.00 43.23 32.68
05:16:01 AM all 12.70 0.00 9.63 46.63 31.04
[snip]
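To pick out the intervals where IO wait jumps, something along these lines could scan the sar output. It is only a sketch: it assumes the five numeric columns are %user, %nice, %system, %iowait and %idle in that order (the header was not included in the paste above), and the 30% threshold is an arbitrary choice for illustration. The sample lines are copied from the listing.

```python
# Flag sar -u intervals whose %iowait is at or above a threshold.
# Assumption: numeric columns are %user %nice %system %iowait %idle.

SAMPLE = """\
03:00:01 AM all 9.20 0.00 4.26 10.50 76.04
03:01:01 AM all 7.58 0.00 4.93 19.65 67.84
04:01:01 AM all 5.97 0.00 4.19 22.12 67.73
04:02:01 AM all 8.89 0.00 5.32 44.16 41.63
"""

def parse_sar_u(text):
    """Return a list of (timestamp, iowait, idle) for the 'all' CPU rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 8 or parts[2] != "all":
            continue  # skip headers, "[snip]" markers, blank lines
        try:
            user, nice, system, iowait, idle = map(float, parts[3:])
        except ValueError:
            continue
        rows.append((f"{parts[0]} {parts[1]}", iowait, idle))
    return rows

def busy_intervals(rows, iowait_threshold=30.0):
    """Keep only the rows whose iowait meets the (arbitrary) threshold."""
    return [row for row in rows if row[1] >= iowait_threshold]

rows = parse_sar_u(SAMPLE)
flagged = busy_intervals(rows)
for timestamp, iowait, idle in flagged:
    print(f"{timestamp}: iowait {iowait:.2f}%, idle {idle:.2f}%")
```

On these four sample lines only the 04:02:01 AM interval is flagged, which matches the step change visible in the listing at that time.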
And here's a snapshot of the output of "top":
# top
top - 10:14:37 up 51 days, 13:04, 3 users, load average: 14.47, 16.29, 17.59
Tasks: 826 total, 4 running, 820 sleeping, 0 stopped, 2 zombie
Cpu(s): 3.1% us, 7.4% sy, 0.0% ni, 55.0% id, 33.0% wa, 0.2% hi, 1.2% si
Mem: 32858220k total, 32564576k used, 293644k free, 5108k buffers
Swap: 25165812k total, 208k used, 25165604k free, 18913784k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7843 oracle 16 0 9550m 9.2g 9.2g R 5.6 29.3 958:52.02 oracle
7841 oracle 15 0 9534m 9.2g 9.2g D 2.6 29.3 185:03.37 oracle
7827 oracle 15 0 9536m 9.2g 9.2g D 2.3 29.3 192:28.29 oracle
7835 oracle 15 0 9536m 9.2g 9.2g D 2.3 29.3 185:23.92 oracle
7831 oracle 15 0 9534m 9.2g 9.2g D 2.0 29.3 167:24.50 oracle
7839 oracle 15 0 9534m 9.2g 9.2g D 2.0 29.3 167:32.48 oracle
7829 oracle 15 0 9536m 9.2g 9.2g S 1.0 29.3 170:23.43 oracle
7833 oracle 15 0 9534m 9.2g 9.2g S 1.0 29.3 184:57.21 oracle
17367 root 17 0 6820 1584 756 R 1.0 0.0 0:00.22 top
13371 oracle 15 0 9528m 9.2g 9.2g S 0.7 29.3 0:00.24 oracle
14178 oracle 15 0 9528m 9.2g 9.2g S 0.7 29.3 0:00.14 oracle
14783 oracle 15 0 9528m 9.2g 9.2g S 0.7 29.3 0:00.12 oracle
17602 oracle 16 0 9527m 9.2g 9.2g S 0.7 29.3 0:00.02 oracle
17604 oracle 16 0 9527m 9.2g 9.2g S 0.7 29.3 0:00.02 oracle
17606 oracle 16 0 9527m 9.2g 9.2g S 0.7 29.3 0:00.02 oracle
17608 oracle 16 0 9527m 9.2g 9.2g S 0.7 29.3 0:00.02 oracle
17610 oracle 16 0 9527m 9.2g 9.2g S 0.7 29.3 0:00.02 oracle
I generally give IOwait less importance than "await" and "svctm" when
managing/monitoring storage capacity and performance: high IO wait only
means a process/thread is waiting for IO while at least 1 CPU is idle.
And on all of the boxes with this HW/SW configuration, I've never seen
"await" go above 20ms (though we don't graph the output of iostat, so
it's possible I'm missing those occurrences).
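Since the iostat output isn't being graphed, one low-effort way to catch await spikes would be a small filter like the sketch below. It finds the await column from the "Device:" header line rather than hard-coding a position, and reports any device row above a threshold. The function name and the 20 ms default are illustrative choices, and it assumes each device row arrives on a single line (as iostat prints it, not as wrapped by a mail client).

```python
# Report iostat -x device rows whose await exceeds a threshold.
# Assumption: each header starts with "Device:" and includes an "await" column.

def high_await(lines, threshold_ms=20.0):
    """Yield (device, await_ms) for rows whose await is above threshold_ms."""
    await_idx = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            # Re-read the column position from every header line.
            await_idx = fields.index("await") if "await" in fields else None
            continue
        if await_idx is None or len(fields) <= await_idx:
            continue
        try:
            value = float(fields[await_idx])
        except ValueError:
            continue  # not a device row
        if value > threshold_ms:
            yield fields[0], value

# Two rows taken from the third iostat sample above.
sample = [
    "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s "
    "avgrq-sz avgqu-sz await svctm %util",
    "sda 0.00 9.30 0.30 6.50 5.60 126.40 2.80 63.20 19.41 0.49 72.60 3.94 2.68",
    "sdc 0.00 0.00 138.80 262.90 2228.00 2366.30 1114.00 1183.15 "
    "11.44 1.19 2.98 1.69 67.86",
]
hits = list(high_await(sample))
for device, await_ms in hits:
    print(f"{device}: await {await_ms:.2f} ms")
```

To run it against live data, the same function could be fed `sys.stdin` with `iostat -dkx 10` piped in. On the sample rows it reports only sda, whose await in that interval was 72.60 ms.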
But unless I'm missing something, IOwait/%util is the only metric I've
found that looks odd to me, so I've put it on the radar as a possible
symptom of an IO bottleneck.
Are there any thoughts out there as to whether we have a disk IO
bottleneck somewhere? Are there any suggestions on what else I might
look at?
Thanks in advance for any advice.
Do you have Navisphere Analyzer on your CX500? That'd probably be the fastest way to determine if your disks are getting hammered and can't keep up.