Re: Degradation of TCP connection
- From: justin.pearson@xxxxxxxxx
- Date: Tue, 22 Jul 2008 15:25:33 -0700 (PDT)
On Jul 22, 3:12 pm, justin.pear...@xxxxxxxxx wrote:
Hi everyone,
I've got a TCP communications problem in VxWorks (6.5) that has me
stumped.
I've got a Windows XP machine running a LabView program that sends
500-byte TCP/IP packets to the VxWorks app (running on a single-board
computer with a Motorola PowerPC 7447 processor, elsewhere on the
network) at 10 Hz. The VxWorks app reads some temperature sensors from
its A/D boards, packages up that data (about 1 kB), and sends it back
to the LabView app, also at 10 Hz. A second task in the VxWorks app
performs some simple calculations on the data, packages up the
results (again, in packets of about 1 kB) and sends them to the
LabView app, also at about 10 Hz. The socket for the TCP connection is
a global variable that is used by both VxWorks tasks and protected by
a mutex.
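For readers following along, the shared-socket arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual code: it uses POSIX threads and sockets so it compiles off-target, and the names `g_sock_lock` and `locked_send_all` are hypothetical. A VxWorks version would presumably use a mutex semaphore (`semMCreate`, `semTake`, `semGive`) in place of the pthread mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical lock guarding the one socket shared by both sender tasks. */
static pthread_mutex_t g_sock_lock = PTHREAD_MUTEX_INITIALIZER;

/* Send exactly len bytes on fd while holding the lock, so a message from
 * one task is never interleaved with a message from the other.
 * Returns 0 on success, -1 on error (errno is left as set by send). */
int locked_send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    assert(buf != NULL || len == 0);

    pthread_mutex_lock(&g_sock_lock);
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data went out */
            rc = -1;
            break;
        }
        p += n;                 /* send may accept only part of the data */
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&g_sock_lock);
    return rc;
}
```

One consequence of this pattern worth keeping in mind: on a blocking socket, `send` waits once the send buffer is full, and in this sketch it waits while holding the lock, so the other task stalls as well. That might be one way for an application to appear locked up once the peer stops acknowledging data, though the post does not confirm that this is what happens here.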
Here is the problem. After about 70 hours of communication, the
connection fails. A packet sniffer (Wireshark) revealed that the
VxWorks side suddenly stops hearing from the LabView side. Both
continue to send packets until (1) the LabView side sees that the
ACK numbers from VxWorks have stopped advancing and so begins
retransmitting old packets, and (2) the VxWorks side, no longer
receiving anything from LabView (including ACKs for its own data),
begins retransmitting old packets as well. Soon both sides fill their
send buffers, the connection times out, and it's game over. The
VxWorks program seems to lock up and the app must be restarted.
This problem happened after 70 hours of flawless communication. After
rebooting VxWorks, we ran 22 hours until the next failure. After
rebooting again, 28 hours until the next.
At first, it seemed like this behavior might be caused by an
intermittently broken receive wire on the VxWorks side, but an
inspection suggests that the hardware (at least outside the
single-board computer) is fine. We are begrudgingly confident that the
problem is not on the Windows side, because the packet capture
shows that the problem starts when the VxWorks app first stops
listening. I am hesitant to blame the application-level software,
because it appears from the packet capture that the VxWorks TCP stack
simply does not receive (or ignores) the LabView packets, and so the
poor application simply sees the stream of packets stop. Lastly, we
attempted a workaround where the LabView app closes and reopens the
connection every two hours. With this workaround in place, the
connection still failed 28 hours later. In an attempt to simplify
communications, we enabled TCP_NODELAY on both sides, to no avail.
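For reference, enabling TCP_NODELAY as mentioned above typically looks something like the sketch below. It is written against the POSIX socket API; the helper names `set_nodelay` and `get_nodelay` are hypothetical, and on VxWorks the option-value argument may need a `(char *)` cast. Reading the option back is a cheap way to confirm it actually took effect on the socket in use.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket.
 * Returns 0 on success, -1 on error. */
int set_nodelay(int fd)
{
    int on = 1;

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
int get_nodelay(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

Note that TCP_NODELAY only affects how small writes are coalesced on the sending side; it does not change how the receiver processes incoming segments.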
I have no more tricks up my sleeve. I am not familiar with the VxWorks
TCP stack; perhaps there are buffers (aside from the TCP send and
receive buffers) that can fill up or degrade over time? Is there any
way I can probe into the TCP stack and determine its health? Do you
think I'm barking up the wrong tree? A friend suggested setting the
TCP window size to something very small in order to minimize the
number of packets "in the air" on the network, but I'm not sure how to
set this in VxWorks. Does anyone know?
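On the window-size question: in BSD-style socket APIs the advertised TCP receive window is bounded by the socket's receive buffer, so the usual knob is SO_RCVBUF. The sketch below is a POSIX illustration under that assumption, not verified against VxWorks 6.5; `listen_with_rcvbuf` and `get_rcvbuf` are hypothetical helper names. Since the VxWorks app is the listening side, the option is set on the listening socket before `listen`, on the assumption that accepted sockets inherit it, as they do on common BSD-derived stacks.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listening socket that requests a receive buffer of
 * rcvbuf_bytes. Returns the listening descriptor, or -1 on error. */
int listen_with_rcvbuf(unsigned short port, int rcvbuf_bytes)
{
    struct sockaddr_in addr;
    int fd;

    assert(rcvbuf_bytes > 0);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Request the buffer size before listen() so that connections
     * accepted later start out with it. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof rcvbuf_bytes) < 0)
        goto fail;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}

/* Report the receive buffer size the stack actually applied, or -1. */
int get_rcvbuf(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return -1;
    return val;
}
```

The stack is free to round or adjust the requested size, so it is worth reading the value back with `get_rcvbuf` rather than assuming the request was honored exactly. Whether a smaller window would help with the failure described here is untested.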
In short: Any suggestions? Has anyone encountered this behavior
before?
Thank you all in advance for your suggestions and time,
Justin
Oh, and I failed to mention that the VxWorks app is the listening
side of the TCP connection. The LabView app connects to it.