Re: TCP Window Size
- From: Trendkill <jpmason@xxxxxxxxx>
- Date: Fri, 01 Jun 2007 08:12:00 -0700
On Jun 1, 10:59 am, Cheema <atif_che...@xxxxxxxxx> wrote:
On Jun 1, 6:22 pm, Trendkill <jpma...@xxxxxxxxx> wrote:
On Jun 1, 7:59 am, Cheema <atif_che...@xxxxxxxxx> wrote:
On May 31, 9:57 pm, "Thrill5" <nos...@xxxxxxxxxxxxx> wrote:
Thin clients do not send large amounts of data between the thin client and
terminal server, so window size wouldn't be the problem. My bet is that the
problem is the Terminal Server. 200 clients on a single Terminal Server is
a lot even for non-database type applications. Are you also monitoring the
performance of the TS? You need to monitor memory usage, CPU, disk I/O and
network I/O, active clients, etc. Even if this is a big multi-cpu TS, you
probably have some type of I/O bottleneck on the server.
Scott
"Trendkill" <jpma...@xxxxxxxxx> wrote in message
On May 31, 8:45 am, Cheema <atif_che...@xxxxxxxxx> wrote:
I would like to share my experience. We have a database application
with records of 30 million people.
PROBLEM : Slow application access from time to time
Server : data base application
WAN : 12Mbps of clear pipe end to end WAN links
client : A Thin and Terminal Server serving 200 thin/terminal server
Util% : WAN links are max 60% loaded
RTT : 20-120 msec with average RTT of 58msec
Server TCP Window Size : 24kByte
Client TCP Window Size : 65kByte
My strong understanding is that this is due to a SENDER and
RECEIVER capacity mismatch.
Kindly advise on this situation
What TCP window size should be used ?
Should it be changed on both ends ?
can FAST TCP be applied in this scenario ?
Waiting for your valuable answer
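
To put the window figures above in context, here is a minimal sketch of the usual per-connection ceiling (at most one window of data in flight per round trip). It assumes the 24 kByte figure really does limit what the server keeps in flight and treats it as 24 * 1024 bytes; the function name is made up for illustration.

```python
def window_limited_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


# Figures from the post; 24 kByte taken as 24 * 1024 bytes (an assumption).
SERVER_WINDOW_BYTES = 24 * 1024

for rtt_ms in (20, 58, 120):
    mbps = window_limited_throughput_mbps(SERVER_WINDOW_BYTES, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms -> at most {mbps:.2f} Mbps per connection")
```

At the 58 msec average RTT this works out to roughly 3.4 Mbps for a single connection. With many connections running at once the aggregate can still fill a 12Mbps path, so this ceiling alone does not settle the question.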
Just from my experience, I have a hard time blaming TCP windowing.
How many concurrent users? Is this real time? How do the queries
look? Are they efficient? What kind of bandwidth per user or per
transaction, and how many users/transactions at any given time?
12Mbps is not that fast, but you need to provide context of whether or
not 12Mbps is enough. Could be anything from server being busy with
backups or some kind of schedule, to WAN pipe utilization going over
80% which would start to impact latency, to service provider, to
anything. Have you used MRTG or Netflow to gauge bandwidth
utilization at these times? How about latency? Do you have a
baseline of these usages and performance during 'good performance'
times? Do you have QoS? Could someone be running an FTP and killing
your pipe?
I won't say that packet/frame sizes are NOT the issue, but I just hate
to look at fundamental networking architecture when there are WAY too
many other variables that are more likely. Not to mention, window
sizes fluctuate, and if this is small telnet or shell based
application, they will most likely never get to full size.
Thanks a lot for your enlightening response. I would like to further
elaborate on my query; maybe you can help me more.
Age of Problem : 1.5 years
Working on Problem : whenever problem comes, it comes and goes, many
teams are involved here
Server type : IIOP Database application
Client type : some desktop PCs connect directly, but most of them
connect via thin servers which are also acting as TERMINAL Servers and
also act as clients to the IIOP Database application server
Client means : A terminal server which is also a Thin Server to 100s
of thin clients
I would like to clarify here that we are talking about the BULK TCP
Transfer between THIN SERVERS/TS (also acting as clients to the IIOP
Database application) and the IIOP application server.
The reason for putting this post here is that it is always the NETWORK
which is blamed first for a slow application, and we use all Cisco
networking devices. Multiple parallel WAN links are being load
balanced using IP LOAD-SHARING PER-PACKET with IP CEF.
EXPERIENCE : I have a hard time blaming TCP windowing !!!
Can you shed some more light on it?
Q : How many concurrent users?
A : There are 6 THIN/TS servers; the team concerned has divided the
users so that at most 30 are logged in to each at a time. So 30x6 = 180
users.
Q : Is this real time?
A : Yes, it is real time transaction based use
Q : How do the queries look?
A : A number is entered and a query is run against it. At the front end
a Java-based interface is opened, and compressed Java JAR classes are
downloaded from the IIOP server down to the TS/Thin servers.
Q : Are they efficient?
A : How do I find that out? I believe that a transaction covering 30
days takes about 12MB of data to be transferred, which includes
screen/graphics updates along with the real data, but a transaction
for a day or two should take 1 MB or less. I believe this data is very
WAN-UNFRIENDLY, but the question is how to make it efficient?
Q :What is slow and what is fast?
A : if query output is displayed in 2-4 seconds, it is fast and ok but
if it takes 15-30 seconds then it is mild and if it takes a minute or
more, it is slow.
Q : What kind of bandwidth per user or per transaction, and how many
users/transactions at any given time?
A : The six WAN links remain at 50% loading but at times the link use
goes up to 75%. Per-user transaction and bandwidth needs vary, as some
queries yield less output while others yield more. Total data
transferred between the TS and IIOP server in one business day is 20GB.
At any one time, about two hundred users are on it. If we assume 200
transactions for each user, then 200x200 = 40,000 transactions, and if
each transaction is supposed to take 0.5 to 1 MB on average, then that
makes 40,000 x 0.5 = 20,000 MB.
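
The estimate above can be checked with a few lines. This is only a sketch of the same arithmetic, using the poster's assumed figures and decimal units (1 MB = 8 Mbit); the function names are made up for illustration. It also shows how long a 12Mbps path would have to run at full line rate to carry that daily volume.

```python
def daily_volume_mb(users: int, transactions_per_user: int, mb_per_transaction: float) -> float:
    """Total data per day implied by the per-user assumptions in the post."""
    return users * transactions_per_user * mb_per_transaction


def hours_at_line_rate(volume_mb: float, link_mbps: float) -> float:
    """Hours the link would need at 100% utilisation to move volume_mb."""
    return volume_mb * 8 / link_mbps / 3600


for mb_per_txn in (0.5, 1.0):  # low and high end of the range in the post
    volume = daily_volume_mb(200, 200, mb_per_txn)
    hours = hours_at_line_rate(volume, 12)
    print(f"{mb_per_txn} MB/transaction -> {volume:,.0f} MB/day, "
          f"{hours:.1f} h at a full 12 Mbps")
```

At 0.5 MB per transaction this reproduces the 20,000 MB figure, which is about 3.7 hours of a completely full 12Mbps path; at 1 MB per transaction it doubles.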
Q : How much total data is transferred in a week between TS Servers
and the IIOP Application Server ?
A : 1 TERA BYTE
Q : 12Mbps is not that fast, but you need to provide context of
whether or not 12Mbps is enough. Could be anything from server being
busy with backups or some kind of schedule, to WAN pipe utilization
going over 80% which would start to impact latency, to service
provider, to anything ?
A : There are 6 WAN links of 2 Mbps each (6xE1). Each link varies
from 50% to 70% loading and at times goes to 80%. We have already moved
from 2xE1 to 6xE1, and the application is so bandwidth hungry that even
this BW does not seem enough. The IIOP server is not in our domain so we
cannot check it. Yes, the WAN pipe touches 80%, but we cannot provide
more bandwidth than that and need to find another way. TTL is also fine
but I need to check the latency during 80% loading.
Q : Have you used MRTG or Netflow to gauge bandwidth utilization at
these times? How about latency?
A : Yes we use MRTG and Netflow and I have detailed traffic stats.
Bandwidth sometimes goes up and the usage on a 2Mbps link varies from
1.5 to 1.75Mbps.
Q : Do you have a baseline of these usages and performance during
'good performance' times?
A : It has been up and down; sometimes a complaint comes in, and usually
"no news is good news".
Q : Do you have QoS?
A : No
Q : Could someone be running an FTP and killing your pipe?
A : With Netflow in place I can always catch the culprit, but that
is not the case here.
Q : My bet is that the problem is the Terminal Server. 200 clients on
a single Terminal Server is a lot even for non-database type
applications. Are you also monitoring the performance of the TS? You
need to monitor memory usage, CPU, disk I/O and network I/O, active
clients, etc. Even if this is a big multi-cpu TS, you probably have
some type of I/O bottleneck on the server ?
A : Yes, some time ago that was the case, but later the users were split
across different TS and more resources were added. Can you refer me to a
SERVER SIZING URL where I can find the performance parameters you have
mentioned? How can I find out whether there is an I/O bottleneck, and how
does it get removed automatically? The TS/Thin Server team schedules a
regular weekly reboot of these machines.
I agree with you that there are too many variables
TRAFFIC CAPTURE RESULTS
I have captured the traffic and analyzed it. From three TS to IIOP
application, in 1 min 9 seconds, only about 2 MB of request was sent
and against that 65MB of data was pulled.
Capture Duration : 1 min 9 seconds
Client to Server Data : 2MB
Server to Client Data : 65 MB
Data Type : TCP
Frames captured : 75000
Application : HTML, IIOP
MSS advertised by both : 1460 bytes
TOO MANY TCP RETRANSMISSION, DUPLICATE ACKS, FAST RETRANSMISSIONS etc
TCP window advertised by client : 65k
TCP window advertised by server : 24k
In all 7500 frames, I saw the same TCP WINDOW SIZE, should it change ?
is there anything wrong ? who controls window size ? I believe that
the SEND TCP window size of SENDER (IIOP Application) and RECEIVE TCP
window size at the TS Server which is fetching data from the IIOP
Server should be same.
What do you think from the above data, is it the same or different? And
if not, what should be the optimal TCP WINDOW SIZE ?
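
As a rough cross-check of the capture figures above, the sketch below turns the byte counts and duration into average rates. It assumes decimal megabytes and that the whole 1 min 9 seconds was spent transferring; the function name is made up for illustration.

```python
def average_mbps(megabytes: float, seconds: float) -> float:
    """Average rate in Mbps for a given volume and duration (1 MB = 8 Mbit)."""
    return megabytes * 8 / seconds


CAPTURE_SECONDS = 69  # 1 min 9 seconds, from the capture summary

server_to_client = average_mbps(65, CAPTURE_SECONDS)
client_to_server = average_mbps(2, CAPTURE_SECONDS)

print(f"server -> client: {server_to_client:.2f} Mbps")
print(f"client -> server: {client_to_server:.2f} Mbps")
print(f"share of a 12 Mbps path: {server_to_client / 12:.0%}")
```

Under those assumptions the three terminal servers in the capture pulled about 7.5 Mbps on average, close to two thirds of the 12Mbps path by themselves.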
As a work-around, I am suggesting that a CLUSTER TERMINAL SERVER be
placed at the IIOP Application LAN so that huge amounts of data
transfer only between two machines on the SAME LAN and only screen
refreshes transfer over WAN, what do you say ?
waiting for your valuable response...
Here are your details on TCP Windowing....better than me trying to make
a 1 paragraph summary: http://www.ncsa.uiuc.edu/People/vwelch/net_perf/tcp_windows.html
As for your issue, it sounds like you may just have a simple issue of
amount of clients and bandwidth. By 'efficient', I mean are the
clients asking for all the data at once from the DB rather than a cell
by cell query. If one client makes a request (or a few), and the
server responds back with full packets (usually 1514 or whatever),
until the query has been fulfilled, it is network 'efficient'. If the
client is making hundreds of queries for each additional piece of
information, the application needs to be looked at. Especially over a
WAN with limited bandwidth and latency, this could kill you.
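
The difference between one bulk request and hundreds of small ones can be sketched with a very rough model: serialisation time on the link plus one round trip per request/response exchange. It ignores slow start, windowing and server think time, and the count of 500 small queries is purely hypothetical.

```python
def transfer_seconds(payload_bytes: int, round_trips: int,
                     rtt_seconds: float, link_bps: float) -> float:
    """Rough model: time on the wire plus one RTT per request/response exchange."""
    return payload_bytes * 8 / link_bps + round_trips * rtt_seconds


RTT = 0.058             # average RTT from the original post
LINK_BPS = 12_000_000   # 12 Mbps path, assumed otherwise idle
PAYLOAD = 1_000_000     # about 1 MB of result data

bulk = transfer_seconds(PAYLOAD, 1, RTT, LINK_BPS)
chatty = transfer_seconds(PAYLOAD, 500, RTT, LINK_BPS)  # 500 is hypothetical

print(f"one bulk query:    {bulk:.2f} s")
print(f"500 small queries: {chatty:.2f} s")
```

The same megabyte costs well under a second as one request but close to half a minute as 500 sequential ones, almost all of it waiting on round trips.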
Assuming that is not your issue (nothing to prove it, just saying),
you may just have some overloaded times. When the bandwidth is at 50%
per pipe, is the performance good? When the performance is reported
as bad, does the bandwidth show any clear differences, such as 80%
utilization during these times. If so, it sounds like pure volume is
your bottleneck. If not, and it happens when it's at 50 and 80 alike,
what else is going on in the network or on these boxes? TCP
windowing is negotiated, and while a smaller window will not allow as
much data, I still doubt this is your issue. Non optimized windowing
would also affect ALL your traffic, not just traffic at certain
times. Either it is negotiating properly or not, and it would not
make sense that some transactions are 1-2 seconds, while others are
over a minute. This is not a windowing problem. You need to focus on
volume/usage of the network and these boxes during good and bad times,
and see what correlations you can draw.
Thanks for your valuable advice, which seems to be blended with many
years of related experience. I have skimmed through the details of the
URL you provided and found it very useful. Yes, it seems like a simple
issue of number of clients and amount of bandwidth. I saw today that
all 6 WAN links were completely choked at a particular time.
From the traffic decode, I feel that when client requests, the server
responds with full data as I saw many 1514 byte packets in the decode
I have been asking the IIOP application team for many months to look
at the application, but they turn a deaf ear and always blame the
network.
Exactly, the application is killing the WAN even with a 24kB TCP SEND
window size; I wonder what would happen to the network if the window
size were increased.
Please suggest if my below calculation is right
TCP Window size = BW x RTT = 10Mbps x 70msec = 700,000 bits, about 87.5kB (roughly 100kB)
Note : 10Mbps for 5xE1 links
so the SEND TCP window at the IIOP Server could be 1000kB instead of
However, the reason which diverted my attention to TCP window sizing
was that even when the WAN links were not choked, there were
still sometimes slow access complaints.
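
For reference, the BW x RTT calculation above can be written out as a short sketch. It uses the poster's figures (10Mbps for 5xE1, 70 msec RTT) and takes 24kB as 24 * 1024 bytes, which is an assumption; the function name is made up for illustration.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return bandwidth_bps * rtt_ms / 1000 / 8


bdp = bdp_bytes(10_000_000, 70)   # 10 Mbps x 70 msec, figures from the post
current_window = 24 * 1024        # 24 kB taken as 24 * 1024 bytes (assumption)

print(f"BDP: {bdp / 1000:.1f} kB")
print(f"BDP is {bdp / current_window:.1f}x the current 24 kB window")
```

This gives about 87.5 kB, so "100kB" is a fair round number. It is the window one connection would need to fill the whole path by itself, which is a different question from whether many connections together already fill it.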
"you may just have some overloaded times"
When BW is at 50%, yes, there are mostly (though not always) no
complaints.
When links are above 80%, there is sometimes a delayed complaint and
it becomes difficult to find exactly when the complaint was coming and
what was the nature of the problem because when the complaint arrives
after some hours or next day, I look into the WAN, and it looks fine
and then a whole series of questions and lots of guess work has to be
done to estimate the event and its impact
"It sounds like pure volume is your bottleneck"
I agree, but the issue here is that we cannot allocate any more
bandwidth, so that is a fixed constraint.
I am now focused 100% on volume and thinking of an alternate way to
reduce the application traffic transported on the network. What do
you think of this idea (we are using it with great success for
some other applications also)?
"a CLUSTER TERMINAL SERVER be placed at the IIOP Application LAN so
that huge amounts of data transfer only between two machines on the
SAME LAN and only screen refreshes transfer over WAN"
Are you a Cisco guy? Can I put some more questions to you later?
I am a 'cisco guy', yes. I support thousands of applications for a
major financial institution. Windowing will not help you. The
application will still negotiate, and if you are at max bandwidth,
windowing will not allow you to send any more or less traffic.
Additionally, your wan has limited packet sizes based on frame relay
or ATM or whatever you are using, and therefore, I am 99% sure that
tcp windowing changes will not do anything to help your situation. If
your issues are resulting from 80-90% WAN utilization, and everything
else looks ok, then your only options are to move your database,
increase your WAN, or decrease your load at any given time. I mean,
if this is a pure database query application, which it seems is
efficient based on your 1500 byte packet comments, then your response
issues are a simple combination of size of query, over limited
bandwidth, that is already constrained. Are these t1s bonded together
so you have one 12 meg pipe? While separate pipes may help separate
users or traffic, one 12 meg pipe should in theory allow better
communications as the traffic can return at a higher rate. While it
won't do too much when you have your max users like you do now, it may
end up helping a bit. If you already have these bonded, it's a
function of what I already said above......you may not have a
'problem' per se here, but more of a simple constraint or 'ceiling'.
Maybe you could replicate this database to something local, and allow
your users to hit that? Bottom line is that you will need to get your
business/application resources to help think through options...but it
doesn't sound like something is necessarily 'broken'.
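
The "size of query over limited bandwidth" point can be illustrated with a crude sketch that assumes a response only gets the bandwidth other traffic leaves idle. Real TCP flows share the link less neatly than this, and the function name is made up for illustration; the sizes (1 MB and 12 MB) and utilisation levels come from earlier in the thread.

```python
def seconds_to_deliver(megabytes: float, link_mbps: float, utilisation: float) -> float:
    """Time to move a response using only the capacity left idle by other traffic."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    spare_mbps = link_mbps * (1 - utilisation)
    return megabytes * 8 / spare_mbps


for utilisation in (0.5, 0.8):
    for size_mb in (1, 12):
        t = seconds_to_deliver(size_mb, 12, utilisation)
        print(f"{utilisation:.0%} loaded, {size_mb:>2} MB response: {t:.1f} s")
```

Under this model a 1 MB response takes a few seconds at either load, while a 12 MB response goes from about 16 seconds at 50% load to about 40 seconds at 80%, which lines up with the "fast", "mild" and "slow" ranges reported earlier without anything being broken.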