Re: network driver + replenishment



On Jul 18, 4:15 am, gvarndell <gvarnd...@xxxxxxxxxxx> wrote:
On Jul 17, 4:45 pm, noiset...@xxxxxxxxx wrote:



On Jul 17, 11:47 am, gvarndell <gvarnd...@xxxxxxxxxxx> wrote:

On Jul 17, 10:32 am, MB <madhu....@xxxxxxxxx> wrote:

Hi

I need some good suggestions about replenishing the buffers in a
network driver with respect to performance.

I have a model in which the network driver receives an interrupt and
the ISR adds a function to the net job queue. Later, this netjob
function pulls the packet from the network hardware and gives it to
the upper layer. Once the upper layer finishes processing, it gives
the buffer back to the driver, which then replenishes it back to the
network hardware.

I've never seen a driver that works exactly that way -- in fact, it's
unlikely a driver *could* work exactly that way.
Typically, the ISR queues the netjob and leaves.
The netjob then takes the buffer away from the hardware, 'loans' it to
the stack, and immediately gives the hardware a new buffer.
I've always coded my drivers so that the hardware gets a new buffer
right in the ISR.
This reduces, about as much as possible, the likelihood of the
hardware not having a buffer when it needs one.
I've seen many network drivers that try to do all the ISR work at
netjob level and fail miserably when the network spikes with a big
flurry of very small packets.
More often than not, such drivers lock up so badly that the target has
to be reset.
You can defer passing the filled buffer up to the stack (do it in a
netjob) if you want, but I usually do that in the ISR too.

HTH
GV

I'm sorry, but that's wrong. Bear in mind that while you're executing
in ISR context in VxWorks, it's impossible for tasks to run. (Other
ISRs might run depending on the circumstances, but tasks won't.) The
longer you spend in ISR context, the longer you prevent tasks from
running again, even high priority tasks. Now consider what happens
with a typical ethernet controller that uses a DMA descriptor ring:
there might only be one frame waiting to be processed, or there might
be several. In fact, if there's a large amount of inbound traffic, you
could have hundreds or thousands of frames to process. You can't
process all of them at ISR context: that would prevent tasks from
running for an unknown amount of time. There are many VxWorks
customers who would not tolerate that.

If you want to see an example of where this can be a problem, set up a
VxWorks target with a driver that does its buffer handling at ISR
level and blast it with a heavy stream of UDP traffic. Then go to the
target shell and try typing a few commands. You may find that the
shell is sluggish and slow to respond while the traffic is present,
and then becomes fast again when you stop it. This is not supposed to
happen: the shell task runs at a higher priority (1) than tNetTask
(50) so it should always have priority: no matter how high the traffic
load, the shell should always remain responsive. This is the nature of
priority based scheduling in VxWorks. (Note that proper design
dictates that tasks that have high priority also be well behaved: the
high priority allows the task to run when it needs to, but it should
not spend a huge amount of time hogging the CPU, or other tasks will
suffer.)

The general rule is in fact to acknowledge/cancel and mask interrupts
in the ISR, and schedule all the other work to be done at task level
(in tNetTask). This is how most currently shipping drivers are
written. As you correctly noted, the driver will "loan" buffers
containing received frames to the stack, and it has to replace those
buffers with fresh ones. My approach is to attempt to allocate a
replacement buffer first: if that succeeds, then you can swap the
fresh buffer with the one to be loaned, and loan out the dirty buffer
to the stack. If that fails, then the driver leaves the dirty buffer
where it is and resets the DMA descriptor so that it can be re-used. I
also increment an error counter to indicate that a frame has been
discarded. This allows the hardware to continue receiving frames. Once
the out-of-buffers condition passes (it's assumed that running out of
RX buffers is a transient condition), the swap-and-loan process can
resume as usual.
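
A minimal sketch of the swap-and-loan scheme described above, written
as plain C so it can be compiled off-target. The descriptor layout, the
names rx_desc_t, rx_ctx_t, rx_buf_alloc() and rx_swap_and_loan(), and
the 1536-byte buffer size are all assumptions made for illustration; a
real driver would allocate from its netBufLib pool and use the
descriptor format of its own hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_SIZE 4

/* Hypothetical descriptor: one buffer pointer plus an ownership flag. */
typedef struct {
    void *buf;       /* buffer the hardware DMAs into */
    int   hw_owned;  /* 1 = hardware may fill it, 0 = driver must service it */
} rx_desc_t;

typedef struct {
    rx_desc_t ring[RX_RING_SIZE];
    unsigned  rx_loaned;   /* frames handed up to the stack */
    unsigned  rx_dropped;  /* frames discarded for lack of a fresh buffer */
} rx_ctx_t;

/* Stand-in allocator (assumption): set alloc_should_fail to simulate an
 * exhausted pool. A real driver would draw from its netBufLib pool. */
int alloc_should_fail = 0;

void *rx_buf_alloc(void)
{
    return alloc_should_fail ? NULL : malloc(1536);
}

/* Service one filled descriptor at task level.
 * Returns the buffer to loan to the stack, or NULL if the frame was dropped. */
void *rx_swap_and_loan(rx_ctx_t *ctx, unsigned idx)
{
    rx_desc_t *d;
    void *fresh;
    void *loan = NULL;

    assert(idx < RX_RING_SIZE);
    d = &ctx->ring[idx];

    /* Try to get the replacement first, before touching the descriptor. */
    fresh = rx_buf_alloc();
    if (fresh != NULL) {
        loan = d->buf;       /* dirty buffer goes up to the stack */
        d->buf = fresh;      /* fresh buffer takes its place in the ring */
        ctx->rx_loaned++;
    } else {
        ctx->rx_dropped++;   /* keep the dirty buffer; the frame is lost */
    }

    d->hw_owned = 1;         /* either way, the descriptor is reusable */
    return loan;
}
```

The point of allocating before swapping is that the ring never ends up
with an empty slot: on failure the only cost is one dropped frame and a
counter increment.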

Note that the ethernet chip itself may drop frames if all of the
descriptors in the RX DMA ring fill up. This is called an RX overrun
error. The driver should recover from such errors and not hang or
crash. This condition can arise if there is a lot of network traffic
to receive and tNetTask doesn't get a chance to run before the NIC
consumes all of the RX descriptors. This can happen if there are tasks
in the system running at a higher priority than tNetTask which are
hogging the CPU, or if interrupts are locked out for too long
(intLock()/intUnlock()), or if another driver's ISR is running too
long and preventing tasks from running.

It's also important that the driver implement a fair use policy so
that it does not monopolize tNetTask, especially in the RX path. The
typical strategy is that once the RX handler netJob has been scheduled
into tNetTask, it will continue processing frames from the hardware
until there are none left (the receiver goes idle). This isn't a good
approach though: if the interface is flooded with RX traffic, it might
never go idle, and you will prevent tNetTask from handling work from
any other interfaces in the system. The correct way to handle this is
to only process at most X number of frames at a time, and then break
out of the processing loop and call netJobAdd() to schedule the RX
handler to run again later. Among other things, this also allows the
TX cleanup path of the driver to run in tNetTask too. Note that you
should only re-enable RX interrupts after you have finished all the RX
work you intend to do. Keeping the interrupts disabled while the RX
handler is running prevents unnecessary context switches: there is no
need for the hardware to trigger an interrupt and run the ISR again
while the task level handler is still running: if new RX traffic
arrives, the handler should pick it up without needing to be prodded
again. Reducing the number of interrupts that fire helps the overall
performance of the system.
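
The fair use policy above can be sketched roughly as follows. This is a
simulation, not driver code: rx_dev_t, job_add(), rx_isr(),
rx_handler() and the RX_BUDGET value of 16 are hypothetical names and
numbers chosen for illustration, and job_add() merely counts requests
where a real driver would call netJobAdd().

```c
#include <assert.h>
#include <stddef.h>

#define RX_BUDGET 16   /* assumed per-pass frame limit (the "X" in the text) */

/* Simulated device state; a real driver would read this from the NIC. */
typedef struct {
    unsigned pending;      /* frames waiting in the RX DMA ring */
    unsigned processed;    /* frames handed to the stack so far */
    unsigned jobs_queued;  /* times the RX handler was scheduled */
    int      rx_intr_on;   /* 1 = RX interrupts unmasked */
} rx_dev_t;

/* Stand-in for netJobAdd(): here it only counts the request. */
void job_add(rx_dev_t *dev)
{
    dev->jobs_queued++;
}

/* ISR: mask RX interrupts, schedule the task-level handler, leave. */
void rx_isr(rx_dev_t *dev)
{
    assert(dev != NULL);
    dev->rx_intr_on = 0;
    job_add(dev);
}

/* Task-level handler: process at most RX_BUDGET frames per pass. */
void rx_handler(rx_dev_t *dev)
{
    unsigned done = 0;

    assert(dev != NULL);
    while (dev->pending > 0 && done < RX_BUDGET) {
        dev->pending--;     /* stands in for swap-and-loan of one frame */
        dev->processed++;
        done++;
    }

    if (dev->pending > 0) {
        /* Budget exhausted: yield to other jobs, run again later.
         * Interrupts stay masked because the work is not finished. */
        job_add(dev);
    } else {
        /* Ring drained: only now unmask RX interrupts. */
        dev->rx_intr_on = 1;
    }
}
```

With 40 frames pending, this takes one interrupt and three handler
passes (16, 16, 8 frames), and other jobs get a chance to run between
passes. A real handler may also want to re-check the ring after
unmasking, since a frame can arrive between the last check and the
unmask.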

Now, I realize that the netBufLib API is in fact designed so that it
can be used in ISRs (and documented as such), and there were drivers
in the past that worked that way. However I think most of them were
older BSD-style drivers that were ported to VxWorks along with the
original BSD-based TCP/IP stack. The old style BSD drivers also used
"top half/bottom half" design where it was considered ok to do
processing in the interrupt handler. But this was because the BSD
kernel didn't really give you an alternative: there was no other
context to defer to. Technically, you could create a separate process
context, however this was not practical because the context switching
overhead was too high. With VxWorks, context switching is supposed to
be lightweight since VxWorks uses tasks, not full blown processes.

Also, there is nothing about the "defer to tNetTask" approach in and
of itself that should lead to instability. Of the dozens of drivers
that I've written that use this model, all of them hold up very well
under load, and I've pounded the snot out of them with various torture
tests (netperf, Smartbits). If a driver that uses this model does hang
under load (or worse, causes the whole target to hang), it's not
because the design is wrong: it's because the driver is buggy as an
anthill and needs to be fixed.

For the record, Windows drivers also use the same model: NDIS drivers
have an ISR routine and a handler routine. The ISR runs at device
interrupt level and _only_ acks/cancels and masks interrupts. The
handler does the rest of the work. (The difference though is that in
Windows, the handler runs at DISPATCH_LEVEL, which is not the same as
task context in VxWorks. But it's still a different context from the
ISR.)

-Bill

Wow, you must be a good typer, it would have taken me hours to tap out
such a lengthy reply. ;-)
Anyway, I wanted to respond to just a few key points.

1) responsiveness of the target shell during network storms has never
been something I concerned myself with.

If you ever notice that the shell becomes sluggish during a network
storm, then you _should_ be concerned. Note that I use the shell task
as an example here: this applies to any task that's running at a
higher priority than tNetTask. By assigning a high priority to a given
task, a system designer is making a conscious decision that the code in
that task is more important than even network traffic, and it must
execute as soon as it becomes runnable. This means that, by default at
least, processing keystrokes on the console is more important than
processing RX or TX completion events on the ethernet port.

Now, if you, as the system designer, decide that you don't care as
much about console responsiveness, you could change the shell task's
priority to make it lower than tNetTask. In that case, a network storm
will slow down the shell, but now it's expected. (In my experience,
this is usually not what people want: if you're designing a packet
forwarding device of some kind, you usually want reliable access to
the console so that if there is a traffic storm, you can issue some
administrative commands to deal with it. If the traffic storm prevents
you from doing that because the console becomes dog slow, then it's a
denial of service attack in more ways than one.)

Basically, the whole point of having the priority based scheduler in
VxWorks is that the scheduler gets to prioritize who gets access to
the CPU. This determinism is supposed to be one of the selling points
of the VxWorks OS. By putting extra work into the ISR, you're
effectively doing an end run around the scheduler. In certain
situations, this may seem clever, but I know customers who would
strongly disagree.

2) if the driver uses netjobs to pre-fetch network buffers so the ISR
can *quickly* access them without calling netBufLib directly, then
the ISR can actually do less work by replacing the buffer at ISR
level. If, for whatever reason, the supply of fresh buffers becomes
exhausted, the ISR then falls back to simply re-using the buffers (as
you mentioned) it already has and losing the packet that was in it.

Doing this introduces some complications. You will be sharing the pre-
fetched buffer cache between task context and interrupt context. The
only way to do this reliably is to use intLock()/intUnlock() to guard
it or else you'll have a race condition. (If you're talking SMP,
you'll need to use a spinlock instead.) You can do that, but I prefer
to avoid using intLock()/intUnlock() unless I absolutely can't make
things work without it. (Bear in mind that netBufLib is already using
intLock()/intUnlock() internally, so now you'll be adding another
instance of it to maintain your buffer cache.)
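
To make the complication concrete, here is a rough sketch of what such
a shared pre-fetch cache might look like. The names buf_cache_t,
cache_put() and cache_get() are hypothetical, and int_lock() and
int_unlock() are local stand-ins that only mimic the key-passing
pattern of intLock()/intUnlock() so the example can be compiled
off-target; they do not actually disable interrupts.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_DEPTH 8   /* assumed cache size */

/* Stand-ins for intLock()/intUnlock() (assumption: simulation only).
 * The key returned by the lock call is handed back to the unlock call. */
int lock_depth = 0;

int int_lock(void)
{
    return lock_depth++;
}

void int_unlock(int key)
{
    lock_depth = key;
}

/* Small LIFO of pre-fetched buffers shared by task and ISR context. */
typedef struct {
    void    *slot[CACHE_DEPTH];
    unsigned count;
} buf_cache_t;

/* Task level: add a freshly allocated buffer. Returns 1 on success,
 * 0 if the cache is already full. */
int cache_put(buf_cache_t *c, void *buf)
{
    int ok = 0;
    int key;

    assert(c != NULL);
    key = int_lock();            /* ISR must not see a half-updated cache */
    if (c->count < CACHE_DEPTH) {
        c->slot[c->count++] = buf;
        ok = 1;
    }
    int_unlock(key);
    return ok;
}

/* ISR level: take a buffer, or NULL if the cache is empty. */
void *cache_get(buf_cache_t *c)
{
    void *buf = NULL;
    int key;

    assert(c != NULL);
    key = int_lock();
    if (c->count > 0) {
        buf = c->slot[--c->count];
    }
    int_unlock(key);
    return buf;
}
```

Every refill from task level now runs with interrupts locked out, which
is exactly the extra cost being objected to.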

3) Code for reliably recovering from receiver shutdowns due to no
buffer is often difficult to write and debug, which increases the cost
and development time. Most VxWorks users will not tolerate that if
they're paying you to do a driver for them. If it's easier to ensure that
the receiver never shuts down, and I say it is, *and* the method by
which you accomplish that makes a higher performance driver, then why
not do it?

Reliability is supposed to be another selling point of VxWorks. What
the customers that I know won't tolerate is having their ethernet
interface lock up under high load. It may take some more time and effort
to implement RX error handling, but in the end you really can't avoid
it.

You can't ensure that the receiver never shuts down (i.e. runs out of
descriptors). It's bound to happen sooner or later (typically later,
when the code is delivered to the customer), and your driver code must
handle the situation gracefully. This is why the hardware is designed
to signal RX overrun events in the first place. Eventually, you _will_
end up in a situation where the ethernet controller outpaces the CPU
and fills up all the RX descriptors before they can be serviced. For
example, have you considered what will happen if you have two ethernet
ports in a given target? If both of them are under heavy load and both
of them use the strategy you suggest, then one of them will end up
starving the other off the CPU for a while. (Both their ISRs can't run
at the same time.) So while one may avoid RX overruns, the other
won't.

Your approach of depending on the ISR to keep the RX DMA ring
populated has a flaw: during prolonged periods of heavy traffic, it
_will_ starve out tasks from running. And not just high priority tasks
either: it will impact all tasks. If you keep the ISR scanning the RX
DMA ring until the receiver goes idle, the ISR will run for an
unacceptably long time before yielding the CPU. (If the traffic storm
persists for an hour, you could stay jammed in the ISR for an hour.)
If you code the ISR so that it only processes a bounded number of
frames before quitting, you'll end up with a huge number of RX
interrupt events and spend the majority of time either in the ISR or
context switching into and out of it.
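
A back-of-the-envelope calculation gives a feel for the numbers
involved. The helper names frames_per_sec() and isr_entries_per_sec()
are made up for this illustration, and the estimate assumes the worst
case of one ISR entry per bounded batch at full line rate.

```c
#include <assert.h>

/* Frames per second at line rate. Each Ethernet frame costs an extra
 * 20 bytes on the wire: 8 bytes of preamble/SFD plus a 12-byte
 * inter-frame gap. */
unsigned long frames_per_sec(unsigned long link_bps, unsigned long frame_bytes)
{
    return link_bps / ((frame_bytes + 20UL) * 8UL);
}

/* ISR entries per second if the ISR quits after frames_per_isr frames. */
unsigned long isr_entries_per_sec(unsigned long fps, unsigned long frames_per_isr)
{
    assert(frames_per_isr > 0);
    return fps / frames_per_isr;
}
```

At 1 Gbps with minimum-size 64-byte frames this works out to about
1.49 million frames per second, so an ISR that quits after 16 frames
(an arbitrary bound picked for the example) would be entered on the
order of 93,000 times per second.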

4) disabling receiver interrupts and relying on a netjob to re-enable
them is a bad idea. What if a user is furiously typing 'll' over and
over again at the target shell? ;-) (that was a joke)

It is not a bad idea. Like I said before, reducing the number of
interrupts is a big win, since it eliminates many needless context
switches. tNetTask will process as many RX frames as are pending in the
RX DMA ring, including new frames that arrive after the initial
interrupt that caused tNetTask to be scheduled. Allowing additional
interrupts to occur while you are doing interrupt processing serves no
purpose: you don't need to be notified that there's work to do when
you're already doing that work. (Also, in older versions of VxWorks,
if you mistakenly kept calling netJobAdd() when you didn't need to,
you would eventually overflow the netJob queue.)

As for a user constantly mashing keys on the console, if a high
priority task happens to be scheduled while tNetTask is running, then
tNetTask will just have to wait. (Hopefully it will not have to wait
long.)

Also note that none of this has addressed TX event processing. You are
precluded from doing any TX event handling in an ISR because the END
driver model provides a semaphore for guarding the TX path. Code in
both the primary TX path and the TX completion cleanup path must use
this semaphore for synchronization. (The send routine can run in any
application task that uses networking, but the TX cleanup work always
runs in tNetTask. A semaphore is necessary to prevent races.) You
can't take a semaphore in an ISR, so you can't do the TX cleanup work
there. (Unless you switch from using the END TX semaphore to intLock()/
intUnlock(), and you really shouldn't do that.)

It's obvious you've been in the trenches with this stuff and you
certainly helped the OP with this post.
I would just suggest that some of what you wrote is Wind River dogma
which has shaky foundation.
I've debugged and written a few network drivers myself.

I've been doing network driver development for about 12 years. Over
the past 7, I've written about 30 of them, just for VxWorks. (I wrote
more for another OS which shall remain nameless.) I've worked on
10Mbps only NICs, 10/100 NICs, 1Gbps NICs, 10GbE NICs, copper NICs,
fiber NICs, and USB ethernet adapters. And I have the death sentence
on 12 systems.

What I wrote is not dogma. There are design requirements that dictate
how VxWorks is supposed to... well, work, and the driver design you
propose violates those requirements. Now, there's nothing stopping
someone from implementing a driver your way: once a customer licenses
VxWorks, they can add whatever application or driver code to it that
they want. The customer is always right, after all. But if they stress
the networking support at all, it will hurt the scheduling behavior of
the system, and some customers are extremely picky about things like
that.

For the record, my reliability checklist for VxWorks ethernet drivers
includes the following:

- Must withstand UDP flood
- Must withstand TX torture test (custom test that calls muxSend() in
a loop)
- Must withstand ping flood
- Must pass large ping test (i.e. large fragmented IP packets)
- Must implement multicast filtering correctly (required for IPv6
support)
- Must survive Smartbits forwarding/torture test
- Must survive netperf RX and TX tests
- Must pass wtxTest (exercises Wind River WDB agent protocol and
polled mode)
- muxDevStop()/muxDevStart() must work correctly
- muxDevLoad()/muxDevUnload() must work correctly
- Receiver must not stall on hardware RX error
- Transmitter must not stall on hardware TX error
- Must support dynamic link management (cable unplug/replug event
handling)

Unfortunately, Wind River doesn't have a driver verification process
like Microsoft does for Windows NDIS drivers, so my tests are somewhat
ad-hoc. But they cover many things which were ignored in the past:
there were a lot of drivers shipped with older versions of VxWorks
that would fail one or more of these tests. I don't like for that to
happen with my drivers. When one of them screws up, it means I screwed
up, and I take it personally.

-Bill

Regards,
GV
