Re: Concurrent Sequential



Yes, although that number is probably in the single digit microseconds
for more current processors. But my point is that if you got a 3X
performance improvement by eliminating the context switches, then their
overhead must have been 3X the amount of useful work done by the
processes between the switches and this is the "screwed-up" design I
referred to above.

Not necessarily. It means that you are running the cpu close to
saturation (e.g. lots of interrupt activity) and the production
engineer people won't allow upgraded cpu's for cost reasons.

It may also be related to upgrading of software on installed base. If
you can improve throughput with only a software change, you will
consider the idea.

Our "trick" (vf) lets you remove a whack of context switch overhead.
In the environments we used this technique, the number of instructions
executed to perform a context switch was comparable to the number of
instructions to perform the interrupt service. Knocking out a chunk
of instructions anywhere in this chain resulted in huge savings.

You brought up some interesting points about the overall environment
that drives design trade-offs...

I talked about that as a possible way of implementing I/O, but when you
indicated that it used more traditional synchronous I/O, I abandoned
that alternative.

Sorry, I must have misunderstood, or read my own meaning into what you
said.

[rearranged]
This is the same (software) cost as handling an interrupt. It has
significantly less cost than performing a full context switch.

That is the part I still don't understand.

(Phew, this is becoming hard to explain in words - here's another
attempt, ask if it's not clear.)

[On re-read - I think we're getting stuck on what it means to "give up
the cpu" and what is and is not on the stack. I've let my rambling
answer stand - I get to the "give up the cpu" part further down.
"Context" does not need to be stored on the stack, but if you don't
store it on the stack, you have to make concessions. I argue that the
concessions look at first unreasonable, but lead to a surprisingly
usable result.]


Process P is running, it is interrupted and the interrupt is handled
by process X:

in the fully-preemptive case:

1) push context for P
2) pop context for X
3) service interrupt
4) push context for X
5) pop context for P (resume P)

in reactive (vf) case:

1) push context for P
2) service interrupt
3) pop context for P (resume P)

So, there is less (save/restore) code traversed (hence, more speed) in
the reactive (one-stack) case for the same interrupt.

[It's "obvious". I suspect that you already know this. That's why
they design irq's for cpu's that way.]

And, in the degenerate case, where the system is idle, the reactive
case boils down to:

1) service interrupt
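
As a rough tally of the sequences above, here is a minimal C sketch that
counts context saves and restores per interrupt. The names (cost,
preemptive_interrupt, reactive_interrupt, ops_avoided_while_running) are
made up for illustration, and treating every save and restore as equal
cost is an assumption, not a measurement.

```c
#include <assert.h>
#include <stdbool.h>

/* Tally of context operations for one interrupt. Treating every save and
   restore as equal cost is a simplifying assumption of this sketch. */
struct cost {
    int saves;
    int restores;
};

/* Fully preemptive case, P running, handler X owns a separate context. */
struct cost preemptive_interrupt(void)
{
    struct cost c = {0, 0};
    c.saves++;      /* 1) push context for P */
    c.restores++;   /* 2) pop context for X */
                    /* 3) service interrupt */
    c.saves++;      /* 4) push context for X */
    c.restores++;   /* 5) pop context for P */
    return c;
}

/* Reactive (one-stack) case. If nothing was running, nothing is saved. */
struct cost reactive_interrupt(bool p_was_running)
{
    struct cost c = {0, 0};
    if (p_was_running)
        c.saves++;      /* push context for P */
                        /* service interrupt */
    if (p_was_running)
        c.restores++;   /* pop context for P */
    return c;
}

/* Context operations avoided per interrupt while P is running. */
int ops_avoided_while_running(void)
{
    struct cost full = preemptive_interrupt();
    struct cost vf = reactive_interrupt(true);
    int avoided = (full.saves + full.restores) - (vf.saves + vf.restores);
    assert(avoided >= 0);
    return avoided;
}
```

With P running, the reactive sequence does two context operations where
the preemptive one does four; with the system idle, it does none.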

Interestingly, when you program a system using the reactive mindset,
the "system is idle" condition happens frequently, so that case is
"not so" degenerate :-).

You might wish to argue that you could use full preemption and "defer"
the interrupt to speed things up, in which case you get:

1) push context for P
2) minimal service of interrupt (put data aside - defer)
3) pop context for P, until it yields
4) push context for P (save and yield)
5) pop context for X
6) finish deferred interrupt service
7) push context for X (save and yield)

In the reactive case, a deferred interrupt can happen when the
servicing part is busy (e.g. the system is very active and the part
has not finished its work before the next interrupt shows up):

1) begin servicing interrupt, mark X "busy"
2) 2nd interrupt arrives, push context for X
3) service 2nd interrupt - create event, enqueue on X's input queue
4) pop context for X, process input queue until X finishes

The fully-preemptive mentality is "save/restore your own context",
whereas the reactive mentality is "save the context of the process
you've interrupted, if any".

So, you do fewer context saves/restores with the reactive mentality -
but - you have to give something up (discussed below).

Does that get any closer to explaining it, or does it still look like
I'm pulling a sleight of hand? :-).

Another way to think about it might be: Yeah, yeah, I can match the
reactive "efficiency" by writing device drivers that way. And then,
when the system is not too busy, I can context switch to a process and
finish the interrupt handling.

But - in vf - we do it ALL that way. It is as if we are writing one
enormous device driver, everything works with the "single stack" model
just like device drivers do.

There are trade-offs to using VF. But, the advantage is that you get
to write device drivers (and the whole app) in a structured manner
instead of in an ad-hoc manner. And, you get causality. And, you get
sensible, type-checkable diagrams.

OK, here we are getting into the nub of the issue. The part is running
along, using its full context, then it starts the I/O, still using its
full context. When it gives up the CPU, to go into the wait state,

If you understood the preceding few paragraphs, I don't need to answer
this question :-). But I will.

I suspect that you are not yet "thinking reactively".

The statement "When it gives up the CPU" is misleading you, I think.
Well, either that, or my choice of words is confusing you :-).

It "gives up the cpu" by FINISHING and leaving no dynamic context.
End of subroutine. Done. No state remains on the stack.

You can ONLY leave STATIC context - e.g. poke some state variable (our
state diagram compiler does this for you automatically).

Let's take the specific example - a state machine containing two
states A & C.

The state machine is represented by a record (a "struct" in the C
language).

The record contains a field for the state of the state machine and
might contain some "local" static variables that belong to the state
machine.

The state machine is invoked and the state variable says to execute
the code snippet belonging to state A. This snippet pokes some
hardware register to begin the I/O, maybe scribbles something into a
static variable and then does a "GOTO STATE C" which changes the state
variable.

Then, A finishes. Not pre-empted. Completely done. Stack popped.

The CPU goes idle.

When the I/O completion event arrives, the kernel invokes the same
state machine again.

This time, the state variable says to execute the code snippet for
state C. C was written with the knowledge that A preceded it. The
code in C knows which static variables to look in, how to complete the
I/O, whatever.
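
A minimal C sketch of the two-state machine just described, to make the
"no dynamic context" point concrete. Everything named here (io_part,
io_part_step, fake_io_register, pending_len) is hypothetical and is not
the vf kernel's API; the return to state A after C finishes is also an
assumption, added only so the machine can be reused.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the vf kernel's API. */
enum io_state { STATE_A, STATE_C };

/* The record for the state machine: all context that survives between
   invocations lives here, not on the stack. */
struct io_part {
    enum io_state state;   /* which code snippet runs on the next event */
    int pending_len;       /* a "local" static variable owned by the machine */
    int result;            /* filled in when the I/O completes */
};

/* Stand-in for a memory-mapped hardware register. */
static volatile int fake_io_register;

/* Runs exactly one snippet to completion, then returns to the caller
   (the kernel). No locals of this function outlive the call. */
void io_part_step(struct io_part *p, int event_data)
{
    assert(p != NULL);
    switch (p->state) {
    case STATE_A:
        fake_io_register = 1;         /* poke hardware: begin the I/O */
        p->pending_len = event_data;  /* scribble into static context */
        p->state = STATE_C;           /* "GOTO STATE C" */
        break;                        /* A is finished; stack unwinds */
    case STATE_C:
        /* C was written knowing A preceded it, so it knows where to look. */
        p->result = p->pending_len + event_data;
        fake_io_register = 0;         /* complete the I/O */
        p->state = STATE_A;           /* assumption: go back to A */
        break;
    }
}
```

The kernel would call io_part_step once for the start request and once
more when the completion event arrives; between the two calls the only
thing that exists is the struct.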


You are probably thinking that I've said nothing that you don't
already know, or that I've just gone around in a circle saying that
there is no dynamic state, but that I cheat by allowing for static
state :-)?

So, let me remind myself of my original message:

Most useful programs are reactive.

Programming reactive systems is tough if you use call-return.

Multitasking is a great way to program reactive systems.

Fully preemptive multitasking is too expensive to be used in that way.

There is a way to program in the reactive paradigm, but it requires
some simple trade-offs which, at first glance, look to be either (a)
stupid, or (b) not powerful enough, or (c) contrary to everything I
was taught about good programming practices (:-).

The trade-offs are:

(a) use run-to-completion mutual multitasking, so that we don't need
multiple stacks (as are needed for full preemption)

(b) throw away most instances of the need for dynamic state and rely
solely on static state,

(c) prohibit reentrancy (using a "busy" flag) so that static state
doesn't get screwed up.
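
The three trade-offs can be sketched together in C. This is only an
illustration under assumptions: the part struct, part_send, the
fixed-size ring buffer, and example_react are all invented here and are
not taken from the vf kernel. An event that arrives while the part is
busy is enqueued and handled after the current reaction finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8   /* arbitrary capacity chosen for this sketch */

struct part {
    bool busy;                /* (c) reentrancy guard */
    int queue[QUEUE_CAP];     /* pending input events (ring buffer) */
    size_t head, count;
    int total;                /* (b) static state kept in the record */
    int handled;
    void (*react)(struct part *self, int event);
};

/* Deliver an event. Returns false if the queue is full and the event
   was dropped. */
bool part_send(struct part *p, int event)
{
    assert(p != NULL && p->react != NULL);
    if (p->count == QUEUE_CAP)
        return false;
    p->queue[(p->head + p->count) % QUEUE_CAP] = event;
    p->count++;

    if (p->busy)              /* part is mid-reaction: leave it queued */
        return true;

    p->busy = true;
    while (p->count > 0) {    /* (a) run each reaction to completion */
        int ev = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        p->react(p, ev);
    }
    p->busy = false;
    return true;
}

/* Example reaction: adds the event to total. Event 1 simulates a second
   interrupt arriving mid-service by sending event 10 to the same part. */
void example_react(struct part *self, int event)
{
    if (event == 1) {
        bool queued = part_send(self, 10);  /* busy is set: enqueue only */
        assert(queued);
        (void)queued;
    }
    self->total += event;
    self->handled++;
}
```

Sending event 1 to a zero-initialized part whose react field points at
example_react handles both events in order on one stack, without the
reaction ever being re-entered.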


ISTM
that the context needs to be saved as the code in state C may very well
depend on that context. I don't think you can blindly assume that the
code in state C (which to the compiler is just another portion of the
part it is compiling) won't depend on, for example, the contents of the
registers just before the code at C got control. Similarly, when C gets
control, assuming B ran, then the A-C part must have its context
restored as B presumably used the registers for its own use.

If it depends on that static context, then you write the code to save
that static context (beyond what is automatically done by the state
machine compiler).

This is one of the trade-offs of using VF.

This is probably why we like using state machines as the innards for
Parts, since it makes the "trade off" really easy to live with.

Again, we are making progress in narrowing down my areas of
misunderstanding. If A-C is running, gives up the CPU after it starts
an I/O and B gets control, then B's information is in the stack, having
been pushed after-in-time A-C's information. Now the completion comes
in, the interrupt uses minimal context, runs to completion and is ready
to give up control. The stack pointer, etc. is pointing to B's
information. How can control be given to part A-C without somehow
unwinding the stack to get to A-C's information. And if you do unwind
the stack, what happens to B's information that was stored there?
[snip]
Why not? Isn't state C just some code compiled within the part? How do
you assure that the code doesn't assume register contents from
previously executed code in the same part?

The stack is not used to store A-C "state", so this scenario doesn't
happen. If A were written in the C language, then it reached the
closing curly brace before "giving up" the cpu and, hence, left
nothing on the stack, including temp register contents managed by the
compiler.

(In practice, because we have to build these things on call-return
based hardware, what really happens is that the code snippet does a
RET back into the kernel - which cleans up the stack and pops frame
pointers/whatever. This also explains why we like to use, when
possible, byte-coded virtual machines where we get total control over
the instruction set.)

[snip]
Yes, but it is a matter of who needs to know it - Also known as
separation of concerns. In a transaction server, the writers of the
server code need to worry about those things, but the programmers who
write the transactions - the parts that do most of the actual work and
vary between applications don't. ISTM that VF forces a requirement of
more knowledge to the people who write the parts/application.

I disagree. The server gurus would write the seriously detailed
parts. The app programmers would simply use the parts. The gurus can
hide parts and subsystems however they want, e.g. using hierarchical
schematics.

We saw this effectively happen on one project. After a year of
development, the business requirements changed in a way that changed
the architecture of the design. The manager who understood the new
business requirements treated the existing design as a "bag of parts
and subsystems" and simply rewired the top levels of the architecture
until he had what he wanted.

[snip]
Perhaps, but that isn't the point I am making. I am talking about how much
the transaction writers need to know. I think they need to understand
much more when using VF than when using a traditional transaction server
model. I.e. more detailed "system wide" knowledge is required at the
level of the people writing the transaction code than is required by the
server model.

I'm probably miscommunicating something. I reach exactly the opposite
conclusion.

Maybe I haven't expanded on what it means to have hierarchical
schematics?

A reasonably-done architecture consists of many parts, but most of
those are parts within parts.

When you look at any "level" of the architecture, you see only a small
handful of parts, say, not more than 10.

We have been able to discuss architectures with "non-technical
management" down to lower levels better than using any other
technique. That, to me, means that each level is understandable to a
wider range of humans and that each level expresses details pertinent
only to a given level. Better "separation of concerns" than I've seen
with other techniques.

Contrast this with OOP. I was involved with Eiffel in the early days
(check the acknowledgements in Eiffel: The Language). I conclude that
OOP makes things harder to fathom, by adding a new dimension
(inheritance) for the blossoming of complexity. I remember the
evolution of the API for a database class - which tried to allow for
transparent access to a variety of databases. After the db class'
obligatory 3rd iteration, it was generalized to such an extent that
you could no longer determine what the class was to be used for. Or,
crack open the Swing reference book if you're trying to build another
gui but you've never used Swing before.

This was one of the experiences that led us to examine why the
hardware paradigm worked "so well" vs. the "software paradigm".

One such answer is: eschew generalization. Generalizing classes
is "bad". Adding parameters and parameterization is "bad". UML is
basically "bad". Do what Engineers do - solve a specific problem and
draw the solution (example - you cannot directly reuse the blueprints
for one bridge to build another, because the two sets of river banks
are different distances apart - yet Engineers design and communicate
about things that are way more reliable than what software people
design).

To truly answer your separation of concerns problem, I need more
info. Can you give me an example of some transactional thing that you
think would be less-well managed using the reactive paradigm?

pt