Re: Point at which WAN latency affects perceived app response?
- From: "Charles Hooper" <hooperc2000@xxxxxxxxx>
- Date: 14 Sep 2006 17:24:58 -0700
sphealey wrote:
Does anyone have any experience with the effect of WAN latency on
applications accessing an Oracle database over a WAN?
We are using Oracle Server 9ir2 (9.2.0.7.0) as our back end database
for a packaged 2-tier client/server business application. The
development tool and network APIs for the application are fairly
obscure, but they must touch on SQL*Net eventually as the Oracle client
is required.
We have two sites. Each has an Oracle database for the portion of the
app that serves that site's business function. But of course, people
from A need to access the primary application for B, and people from B
need to access the primary application for A.
To this point we have served the cross-site requirements by installing
both sets of software at both sites, and simply accessing the Oracle
data directly across the WAN. Brutal and perhaps not elegent, but
simple and straightforward (particularly compared to an experiment with
Citrix).
We are now in the process of replacing our T1 leased line between the
sites with an MPLS-based network. Each primary site A and B has a 3
Mbps MPLS port. There are a few other minor sites that connect, none
that run the business apps.
Bandwidth on the MPLS connection is good: transferring large files of
random numbers I see close to full port thruput between the sites.
E-mail, web, etc work fine.
But when we attempted to bring up the Oracle-based apps the results
were not so good. Perceived performance got worse by a factor of 4,
and stopwatch timings confirm that the new configuration is slower by
at least a factor of 3.
Looking at ping times, I find the following for the two different
connections:
Packet size Leased Line MPLS
=========== =========== =========
256 6 ms 48 ms
512 11 ms 55 ms
1024 17 ms 69 ms
2048 26 ms 84 ms
Now, clearly less latency is better than more latency. Still, 60 ms
latency between sites is considered quite good on most WANs of any size
and I would think most Oracle users would be in that range or higher.
Would 60 ms be sufficient to cause the application to appear to the end
users as noticably slow?
I have been trying to watch the connection with Ethereal, and one thing
I noticed was that there seem to be a lot of fragemented packets in the
100-300 byte range. Other tools confirm that the non-fragmented MTU
for the link is 1500 bytes (same as the leased line). Does it make
sense that either the server or the router would break larger packets
up that small?
My knowledge in this area is like the Platte River (wide and shallow),
so any input would be appreciated.
sPh
I suspect that this client/server application is designed as a database
platform independent application that supports a handful of different
database engines, rather than just Oracle, with the authors of the
application having never read chapter 1 of either of Tom Kyte's
books. The client/server application is likely shooting for the least
common denominator of the supported database platforms - Ex: DB
platform 1 does not support inline views, DB platform 2 does not
support CONNECT BY PRIOR, DB platform 3 has a problem if the
application does NOT try to commit 25,000 times per second, DB platform
4 requires a slightly different SQL syntax for table joins. Instead of
one SQL statement to perform an operation, which would incur one unit
of latency over the WAN link, the client/server application uses 100
SQL statements, or feeds the bind variables 100 times, which then
incurs 100 units of latency over the WAN link rather than one.
For example, consider the following SQL statement:
SELECT
O.WORKORDER_SUB_ID,
O.SEQUENCE_NO,
O.RESOURCE_ID,
DECODE(O.WORKORDER_TYPE,'M','M','Q','Q',O.STATUS) STATUS,
O.SCHED_FINISH_DATE SCHED_FINISH_DATE,
O.SCHED_START_DATE SCHED_START_DATE,
O.USER_7 MRP_START_DATE,
O.SETUP_HRS,
O.RUN,
O.RUN_TYPE,
O.RUN_HRS,
O.MOVE_HRS,
O.MINIMUM_MOVE_QTY,
O.CALC_END_QTY,
O.COMPLETED_QTY,
NVL(OPR.CONCURRENT,0) CONCURRENT,
OB.BITS BITS
FROM
OPERATION O,
OPERATION_BINARY OB,
(SELECT
OPR.WORKORDER_TYPE,
OPR.WORKORDER_BASE_ID,
OPR.WORKORDER_LOT_ID,
OPR.WORKORDER_SPLIT_ID,
OPR.WORKORDER_SUB_ID,
OPR.SEQUENCE_NO,
COUNT(OPR.RESOURCE_ID) CONCURRENT
FROM
OPERATION_RESOURCE OPR
WHERE
OPR.WORKORDER_TYPE= :1
AND OPR.WORKORDER_BASE_ID= :2
AND OPR.WORKORDER_LOT_ID= :3
AND OPR.WORKORDER_SPLIT_ID= :4
GROUP BY
OPR.WORKORDER_TYPE,
OPR.WORKORDER_BASE_ID,
OPR.WORKORDER_LOT_ID,
OPR.WORKORDER_SPLIT_ID,
OPR.WORKORDER_SUB_ID,
OPR.SEQUENCE_NO) OPR
WHERE
O.WORKORDER_TYPE= :5
AND O.WORKORDER_BASE_ID= :6
AND O.WORKORDER_LOT_ID= :7
AND O.WORKORDER_SPLIT_ID= :8
AND O.WORKORDER_TYPE=OB.WORKORDER_TYPE(+)
AND O.WORKORDER_BASE_ID=OB.WORKORDER_BASE_ID(+)
AND O.WORKORDER_LOT_ID=OB.WORKORDER_LOT_ID(+)
AND O.WORKORDER_SPLIT_ID=OB.WORKORDER_SPLIT_ID(+)
AND O.WORKORDER_SUB_ID=OB.WORKORDER_SUB_ID(+)
AND O.SEQUENCE_NO=OB.SEQUENCE_NO(+)
AND O.WORKORDER_TYPE=OPR.WORKORDER_TYPE(+)
AND O.WORKORDER_BASE_ID=OPR.WORKORDER_BASE_ID(+)
AND O.WORKORDER_LOT_ID=OPR.WORKORDER_LOT_ID(+)
AND O.WORKORDER_SPLIT_ID=OPR.WORKORDER_SPLIT_ID(+)
AND O.WORKORDER_SUB_ID=OPR.WORKORDER_SUB_ID(+)
AND O.SEQUENCE_NO=OPR.SEQUENCE_NO(+)
ORDER BY
O.WORKORDER_SUB_ID,
O.SEQUENCE_NO;
This SQL statement will only work on one database platform: Oracle.
How many individual, database independent SQL statements would the
client send to the database server to perform the same task? The logic
of the above SQL statement may be further wrapped in another procedure
that steps through the parent record of the "OPERATION", so this may
further add to the round trips between the server and client.
Another possible problem area is that the syntax of the SQL statement
may need to be slightly different from one database platform to
another. Oracle requires NVL, DB platform 2 requires @NVL, DB platform
3 requires ISNULL. Depending on where the translation library is
located, and whether or not the client locally caches the translation
library contents after each read, the communication between the client
side of the WAN link and the database side of the WAN link may incur an
additional unit of latency for each SQL statement to be sent to the
database server so that the translation library may be read.
Nagle and delayed ACKs: there is a document floating around the
Internet that appears to be authored by Microsoft, which suggests as a
performance tuning method to increase a value in the Windows registry
from 2 to 13, which controls the number of packets to be sent/received
before expecting to send/receive an ACK for the previous packets. If
both computers are not set the same, delays of roughly 0.2 seconds
(plus the latency) will be incurred while the receiving side waits for
packets 3 through 13 to arrive. When the packets fail to arrive in the
0.2 second timeout period, the client sends an ACK for packets 1 and 2.
How severe is this? It can cause a 134MB file, which normally
transfers in roughly 3 seconds over a gigabit connection, to require 45
minutes for the transfer to complete.
I am sure that the more experienced database experts who frequent this
group will be able to offer additional details.
Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.
.
- Follow-Ups:
- References:
- Point at which WAN latency affects perceived app response?
- From: sphealey
- Point at which WAN latency affects perceived app response?
- Prev by Date: Re: multimaster replication
- Next by Date: Re: Group resultset day by day
- Previous by thread: Re: Point at which WAN latency affects perceived app response?
- Next by thread: Re: Point at which WAN latency affects perceived app response?
- Index(es):
Relevant Pages
|