Re: WWDC -- MacBook Pro?



In article <nospam.News.Bob-16E715.10330610082006@xxxxxxxxxxxxxxxx>, Bob
Harris <nospam.News.Bob@xxxxxxxxxxxxxxxxxxxxxx> wrote:

Yes and no. The C compiler cannot alter the alignment of a JPEG
image, nor can it alter the alignment of data stored on disk.

In this case, it is the data of a binary file, ...

The C program must be able to handle data as it is presented to it;
it cannot arbitrarily change that.

....but once the program has loaded it into its program memory, there is no
way to tell what the binary format is, in classic C, before C99. So the
'char' used to load the binary data must hold a minimum number of bits (at
least 8), but can be padded to something larger.

In all the C compilers I've worked with, bytes are aligned on byte
boundaries, int16 are aligned on 2 byte boundaries, int32 are
aligned on 4 byte boundaries, int64 values are aligned on 8 byte
boundaries, and pages are aligned on page boundaries.

In more recent compilers, this is probably the case. But when this is
discussed in C/C++ forums and such, people are always able to come up with
a compiler that used 9 bit bytes, strange padding of numerical types, or
whatever. One cannot even be sure that an 'int' is stored in two's
complement (or was it one's).
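
Whether a given compiler follows these alignment rules can be checked
rather than argued. A minimal C11 sketch, assuming a compiler that provides
<stdalign.h> and the exact-width types from <stdint.h>; the names REPORT
and report_alignments are made up for illustration, and the output differs
between platforms, so none is shown here:

```c
#include <assert.h>    /* static_assert (C11) */
#include <limits.h>    /* CHAR_BIT */
#include <stdalign.h>  /* alignof (C11) */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* The C standard promises at least 8 bits per byte, not exactly 8. */
static_assert(CHAR_BIT >= 8, "a C byte has at least 8 bits");

/* Hypothetical helper: print the size and required alignment of a type. */
#define REPORT(T) \
    printf("%-8s size %zu  align %zu\n", #T, sizeof(T), alignof(T))

void report_alignments(void)
{
    printf("bits per byte: %d\n", CHAR_BIT);
    REPORT(char);
    REPORT(int16_t);
    REPORT(int32_t);
    REPORT(int64_t);
    REPORT(void *);
}
```

Calling report_alignments() from main prints one line per type. The only
result the language itself guarantees is that char has size 1 and alignment
1; everything else is up to the compiler and the platform ABI.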

Compilers that do not follow that behavior run into portability
problems reading and processing data created on another platform.

People nowadays seem to agree on what a byte is, but a word of several
bytes comes in big and little endian versions. Therefore, safe transport is
best done by encoding into a sequence of bytes, as UTF-8 does. Distributed
programming must also specify the underlying binary model, which is not
possible within classical C and requires extensions.
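
A small C sketch of what "encoding into bytes" can look like for a 32-bit
value. The function names put_u32_be and get_u32_be are hypothetical, and
big endian is an arbitrary choice for the wire order; the point is only
that shifts and masks give the same byte sequence on any host, whatever
its native byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v in buf[0..3], most significant byte first, on any host. */
void put_u32_be(unsigned char buf[4], uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)((v >> 24) & 0xFFu);
    buf[1] = (unsigned char)((v >> 16) & 0xFFu);
    buf[2] = (unsigned char)((v >> 8) & 0xFFu);
    buf[3] = (unsigned char)(v & 0xFFu);
}

/* Rebuild the value from four bytes written by put_u32_be. */
uint32_t get_u32_be(const unsigned char buf[4])
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Because the code never reinterprets the memory of the integer itself, it
does not depend on the host's endianness or alignment rules.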

As for performance.  Even the Alpha (Digital's 64 bit CPU that had
a lot of fame in the early 90's) handled 32 bit aligned data on an
equal footing to 64 bit aligned data.  And later versions gave
byte and 2 byte aligned data an equal status.
....
As long as the data is on its
natural alignment boundary and that data is the right data for the
job, then being compiled for 32 bits or 64 bits should not matter
to the CPU, especially if the code doesn't need 64 bit features.

But the mysterious thing is then that people say they have measured a small
performance loss when programs written for 32 bits are compiled for the new
64-bit computers.

A 64 bit CPU means it can tackle larger problem sets.  It does not
mean it now sucks at handling the smaller ones. 

It depends on whether those designing the chips made the smaller types as
efficient. For example, an FPU might be as efficient on 32-bit floats as
on 64-bit doubles, but if space is tight on the chip, a float is
probably computed by first converting to a double, and then back to a
float, in which case the smaller type is slower!

Now these are only small (tiny) performance differences, but they are
something that can be measured.

One possible explanation that comes to my mind is that the 64-bit
computer isn't optimized around 32-bit. Data that is packed into a single
word, so that the CPU has to split it, is slower than unpacked data. So if
the memory is 64-bit and one fits two 32-bit data types into it, it will
be slower than if these two 32-bit data are put into two 64-bit words. And
so if the 'int', which is 32 bits on a 32-bit computer, remains 32 bits on
the 64-bit computer, and the C compiler decides that two consecutive ints
should be packed into a 64-bit word, that might cause a slowdown relative
to the case where the compiler puts the ints into two separate 64-bit
words.

No, it is because more of the application's memory is taken up by
addresses.

Think of it this way.  You have a glass.  In case A you fill that
glass up with ice cold water.  You are thirsty, so you quickly
drink the water and it is good.

Case B, same glass, same ice cold water.  But now half the volume
is ice.  You are still thirsty, but now you only get half the
water, and you need to ask the waiter to get you more water.  You
wait.  The wait is short, as the waiter has a pitcher of ice water
nearby (a cache), but when he gives it to you, you get half ice
again.  You're still thirsty, the pitcher quickly empties, the
waiter has to go back to the kitchen to get more ice and water for
the pitcher, you wait longer.  But you are still thirsty, another
pitcher gone, and more trips to the kitchen, and each time you are
only getting half a glass of water and half a glass of ice.

The point is that if what you want is water (meaningful
addresses), but you are also carrying along ice (zeros in the
upper 32 bits of every address), then the system will spend more
time moving around ice (zeros) that is never used.  This data
movement is not free.  It results in more memory usage, more cache
invalidates, more TLB soft faults, etc...
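
The "ice" can be made visible with sizeof. A minimal C sketch, assuming a
typical linked-list node; struct node and node_overhead are hypothetical
names. On the usual 32-bit ABIs the node is 12 bytes, on the usual 64-bit
(LP64) ABIs it is 24 bytes, though the exact figures depend on the
platform:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* A small payload carried around with two addresses. */
struct node {
    int32_t      value;
    struct node *next;
    struct node *prev;
};

/* The struct can never be smaller than its payload plus both pointers. */
static_assert(sizeof(struct node) >=
              sizeof(int32_t) + 2 * sizeof(struct node *),
              "node holds the payload and two pointers");

/* Bytes per node that are not payload: the pointers plus any padding. */
size_t node_overhead(void)
{
    return sizeof(struct node) - sizeof(int32_t);
}
```

The payload is the same 4 bytes in both builds; only the overhead grows,
so fewer nodes fit in each cache line and each page.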

But in a proper 64-bit computer, all data pipes are doubled, including
memory buses, caches and so on.

The computers now labeled '64-bit' often seem to be not clean 64-bit
computers, but 64/32-bit hybrids.

So the problem is not running a 32 bit application on a 64 bit
CPU; it is a 32 bit application compiled to 64 bits when it
doesn't need to be.

If all data handling is properly doubled, this should not be a problem.

...This code can be written in C and still get good results if the
compiler has a good optimizer.  And even if it is a bad optimizer,
you can still get good results in C by unrolling loops and doing
prefetches.  But it is very messy code and only pays off if you
are moving a lot of data, and it is very tricky to get the start
and end portions correct because of odd lengths and odd boundary
starting alignments.
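
A rough C sketch of the kind of copy loop described here, with the start,
unrolled middle, and end handled separately. The name copy_words is
hypothetical, the buffers are assumed not to overlap, and this is not how
any particular libc does it. The fixed-size memcpy calls stand in for 64
bit loads and stores; optimizing compilers usually turn them into single
instructions, but that is not guaranteed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == 8, "sketch assumes 8-bit bytes");

void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((d != NULL && s != NULL) || n == 0);

    /* Start: single bytes until the destination is 8 byte aligned. */
    while (n > 0 && ((uintptr_t)d % 8) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Middle: 32 bytes per iteration, unrolled four times. */
    while (n >= 32) {
        uint64_t w0, w1, w2, w3;
        memcpy(&w0, s, 8);
        memcpy(&w1, s + 8, 8);
        memcpy(&w2, s + 16, 8);
        memcpy(&w3, s + 24, 8);
        memcpy(d, &w0, 8);
        memcpy(d + 8, &w1, 8);
        memcpy(d + 16, &w2, 8);
        memcpy(d + 24, &w3, 8);
        s += 32;
        d += 32;
        n -= 32;
    }

    /* End: whatever odd length is left over. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Even in this simplified form, more than half of the code deals with the
odd starting alignment and the odd length, which is the messy part.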

And if this is done in the libc, it can use 64 bit load/store
operations no matter how the application is compiled.  It may need
to have a different version for 32 bit addressing vs 64 bit
addressing, but generally that is handled by having a libc for 32
bits and a libc for 64 bits.  Your mileage may vary depending on
which operating system you are working on.  I have played with too
many, and so far I have not done any serious programming for Mac
OS X, I just love using it :-)

GCC has special features where registers can be manipulated, and those are
used by folks writing compilers like GHC <http://haskell.org/>.

So I'm not trashing 64 bit CPUs.  I'm just saying that they do not
automatically double the memory needs, and that not all 64 bit
applications will be faster.

Let's return to the padding/packing question above. If you compile your
32-bit program for 64-bit, then the ints remain the same size, and
the compiler will probably pack adjacent ints into single 64-bit words.
This gives a small performance loss.

NO!  All caps are intentional.  As long as the data is aligned on
its natural boundary, there is no loss, and there is no reason to think
the alignment will change just because the size of the addresses has
changed.

But the data isn't aligned on the 64-bit words, and must be extracted. So
it requires that the CPU has wiring that can do this without cycle loss.
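
The layout being argued about can be inspected directly. A minimal C
sketch, where struct pair and sum_i32 are hypothetical names: the C
language lays array elements out contiguously, so two 32-bit ints do share
an 8 byte unit, and each one still sits on its own 4 byte boundary. Whether
reading them costs anything extra is a property of the CPU, which the code
cannot show; a profiler can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two adjacent 32-bit members; compilers normally add no padding here. */
struct pair {
    int32_t a;
    int32_t b;
};

/* Sum an array of 32-bit ints, accumulating in 64 bits. */
int64_t sum_i32(const int32_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];   /* each element is one 32 bit load */
    return total;
}
```

Timing sum_i32 over a large array against an equivalent loop over int64_t
elements, on the same machine, is one way to measure the effect instead of
guessing at it.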

The next step is to profile the code written for the 32-bit computer on
the 64-bit computer. So one discovers, aha, these 32-bit ints slow the
program down.

Profiling good.  yes do this.

But as I've said, it is unlikely that a 32 bit app will run slower
than the same app compiled for 64 bits running on the same
computer.

People are speaking about different computers here: The program written
for 32-bit compiled on a 32-bit computer and on a 64-bit computer. Then,
on the 64-bit computer, a performance loss is measured.

So the next step is to change them to 64-bit long ints.
Now, on a 32-bit computer, a 'long' might be 32 bits just like the 'int'. So
you have introduced a 64/32-bit incompatibility. Now, when programming
continues for a while on the 64-bit computer, it seems prudent to make use
of the longer integral types. There could be other types, such as 'double',
which are coerced into these 64-bit types. So after a while, you have code
that can't be easily converted to work on a 32-bit computer.
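
The incompatibility comes from 'long' changing width between builds. A
short C sketch of the usual way around it, assuming C99's <stdint.h> is
available; long_bits and scale are hypothetical names. On the common
32-bit ABIs 'long' is 32 bits, on 64-bit Unix-like systems it is 64 bits,
while int64_t is 64 bits wherever it exists:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>

/* int64_t has the same width in a 32 bit and a 64 bit build. */
static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is 64 bits wide");

/* Storage bits of 'long' on the target this was compiled for. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Widen before multiplying, so the result is the same in either build. */
int64_t scale(int32_t count, int32_t size)
{
    return (int64_t)count * size;
}
```

Code written against int32_t and int64_t compiles for both kinds of
machine; the 64 bit arithmetic is simply slower on the 32-bit one.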

That is half true.  If you are a developer and you set up your
builds to generate ONLY 64 bit versions of your program, then you
will have a tendency to take advantage of int64.

This is the situation I am discussing.

But unless you
are really doing heavy duty 64 bit integer math, or working with
huge memory models, the application will suffer from being
compiled this way, as explained above.

And even those who are now into universal binaries are likely to walk
down this path too, in due time. Then the 32-bit computers will fall out
of use.

So the code could, when programming continues for a while and one is
making special use of the 64-bit features, expand considerably, and would
probably become wholly incompatible with the 32-bit computers.

Yes, and that could be a market limiting approach to making money.
If your app is a niche product and your customers would only be
using it on high end equipment anyway, this is less of a problem. But
if you are going to the low margin mass market, then stay 32 bits
as long as you can.  There is more money in it for you to not
exclude anyone.  For that matter, continue to provide PowerPC
versions too.  Money is money, and you don't want to leave any of
it on the table if you don't have to.

There is a difference between commercial and noncommercial programming: In
commercial programming, one tries to carry customers along, so that will
mean universal binaries and such, which are less efficient. In noncommercial
programming, one does not do that as much, often focusing only on the
latest hardware. So there, the change will take place more quickly.

But memory is getting cheap, and there is also a cost to carrying along
customers on outdated hardware. So also in commercial applications, one
will in due time shift to 64-bit only programming.

As this program rewriting happens, the 32-bit computers will die away from
use, as new programs will not run on them anymore.

That is not tomorrow.  And this may happen, or maybe the Smart
Phone will become such a big market, that you will want to write a
version of your app for the cell phone, and that may not be 64
bits for a while.

Or a media console may turn the wide screen TV into a media center
with your application running on the side, but the media center is
32 bits.

If you are a developer and you want people to love your product,
don't shoot yourself in the foot just because you can. :-)

I doubt universal binaries will run well on such a variety of hardware.
And it will have very different types of consumer usage. So one
will probably have to write the code anew specially for such applications.

Graphics applications generally want more memory, but they
generally want it because of the graphics data.  A 64 bit CPU with
the same graphics program will take the same amount of space as it
did on the 32 bit CPU.  It will most likely run faster because the
64 bit CPU has a faster clock speed, a faster memory bus, and is able
to retire instructions in the CPU faster than the previous model.

Speed is crucial in these kinds of applications. And graphics size
may expand for other reasons: one is using more complex methods
and greater color depth.

If that application is paging (use Terminal running vm_stat 60 to
monitor pageout values to see if paging is the issue). 

Unfortunately, this works only on computers that have the developer package
installed. And try to get a fellow used to Mac OS 9 to install a wholly
"unnecessary", big developer package.

Then more
memory will help.

If the computer admits it to be fit into it.
 
And if your computer system worked well when you purchased it and
your computing habits remain consistent,

In graphics design, they are not: new versions come along all the time.
The fellow I speak about tries to figure out when the latest versions
have become stable enough to be installed, switching when new bugs cannot
screw up the jobs one has to do.

...then it is likely it will
last as long as you need it to (mine is almost 3 years old; and
running on 640MB). 

So this will not work, in that case. As for myself, I have no worries,
just doing some compiling, besides mostly reading mail and WWW.

If you change from reading email and surfing
the web to being a movie maker, then maybe you will need to
change your computer system sooner.

So this seems to be the case: for those people, a 64-bit computer which
can hold a maximum of only 4 GB seems too tight.

Bottom line: 64 bit CPUs are not going to have an instant effect
on memory sizes.

For the next few years, maybe. But for those at the edge of what RAM
allows, it will be crucial to take full advantage of the larger 64-bit
address space. Not only does 4 GB seem too little, but it is strange that
the Mac Pro is limited to 16 GB.

--
Hans Aberg