Re: Double hires mode color artifacts



Michael J. Mahon <mjmahon@xxxxxxx> wrote:
I beg to differ -- I think often enough the game just wanted to
create "sharp" pixels, and had to live with the fringe color effects
because it couldn't avoid it. Bolo is the first example that comes
to mind.

If simple graphics are intended, I agree, but many games actually
had rather artistic screens--suggesting far more than one would
expect of an Apple II, given its resolution and color limitations.

Yes, of course. So one needs both.

It's not so easy to get a really smooth color gradient (and to
replace a constant hue with a more "wavy" version) unless you magnify
quite a bit, and then it gets slow even on more modern systems. I can
run the emulation at 3x magnification, and it's still faster than the
original, but not by much. It might be better if I can integrate the
graphics acceleration somehow, but that's not trivial.

I've suggested a very simple table-driven algorithm that should do
quite well.

How exactly are the tables calculated? Just extract Y, U and V (or I and Q)
with a suitable filter window curve (Gaussian? Something else?)? That won't
change the "blocky" appearance much, because areas with the same pattern
will still look "solid", and only the borders will change a bit (but
not much).
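
One way the table calculation asked about here could look, as a rough
sketch only: treat each 14-bit pattern as samples of the composite
signal at four times the color subcarrier (14.318 MHz = 4 x 3.58 MHz),
weight them with a window, and demodulate Y, I and Q. The Gaussian
window, its sigma, the phase0 offset and all function names are
assumptions for illustration, not values from this thread, and the
hues will not match a real Apple II without calibrating the phase.

```python
import math

N_DOTS = 14  # window width in 14 MHz dots (assumed)

def gaussian_window(n, sigma):
    """Normalized Gaussian weights centered on the window."""
    center = (n - 1) / 2
    w = [math.exp(-0.5 * ((k - center) / sigma) ** 2) for k in range(n)]
    total = sum(w)
    return [x / total for x in w]

def pattern_to_yiq(pattern, sigma=2.0, phase0=0):
    """Estimate (Y, I, Q) for one dot pattern; bit N_DOTS-1 is the oldest dot."""
    w = gaussian_window(N_DOTS, sigma)
    y = i = q = 0.0
    for k in range(N_DOTS):
        bit = (pattern >> (N_DOTS - 1 - k)) & 1
        angle = math.pi / 2 * (k + phase0)  # 90 degrees of subcarrier per dot
        y += w[k] * bit
        i += 2 * w[k] * bit * math.cos(angle)
        q += 2 * w[k] * bit * math.sin(angle)
    return y, i, q

def yiq_to_rgb32(y, i, q):
    """Convert YIQ to packed 0x00RRGGBB with the usual NTSC matrix, clamped."""
    r = y + 0.956 * i + 0.621 * q
    g = y - 0.272 * i - 0.647 * q
    b = y - 1.106 * i + 1.703 * q

    def to_byte(v):
        return max(0, min(255, round(v * 255)))

    return (to_byte(r) << 16) | (to_byte(g) << 8) | to_byte(b)

def build_color_table(sigma=2.0, phase0=0):
    """One 32-bit color per possible 14-bit dot pattern (16384 entries)."""
    return [yiq_to_rgb32(*pattern_to_yiq(p, sigma, phase0))
            for p in range(1 << N_DOTS)]
```

With a window like this, a long run of one repeated pattern still maps
to a single color, so only the transitions between patterns change,
which is the objection raised above.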

Care to do a simple test implementation?

So to avoid the blocky and sterile look, I guess it's more important
to introduce some low frequency noise, and avoid rendering the apple
pixel on host pixel boundaries. A filter of 4 bit width in hires (which
corresponds to 8 bits in double hires) should already be sufficient for
that. Maybe I should try it... though one problem would be that this
only affects information in x, not in y.

That's correct. The only "Y blending" would result from defocusing
of the analog monitor, and can probably be neglected.

Ok, so for hires, all we have to do is to set up the tables accordingly.
Any concrete suggestions? Which noise should be added? One should also
keep in mind that for uniform areas, the noise will also look uniform
(because the same pattern is repeated), and that will be noticeable
if the pattern is too obvious.

Every extra bit doubles the table size. And the cache is limited. And
I don't have a brand-new system here, either. As I said, at 3x
magnification it's still faster, but not by much.

This doesn't alter the size of the frame buffer--only the size of the
table used to map the color of each pixel in the buffer.

No, but it alters the size of a memory area that is needed during
*each* access to graphics. And this should stay in the cache. The host
framebuffer is just written to, it's never read.

Although a 14-bit table would be 16K x 4 = 64K bytes (at
32-bits/pixel), it would still fit easily in second level cache, and
references to it would have a lot of locality in most practical
cases.

For hires, I currently use 4 tables (even/odd; overlapping
bytes/single bytes) with 256 entries. Each table can produce up to 16
pixels (for larger magnifications) at 32bpp, so that's already
4*256*16*4 = 64KB. Add to that other tables for graphics, the tables
needed to do the 6502 simulation, the 6502 memory and some room to
map the non-localized framebuffer access, and I'm probably already
getting more cache misses than I'd like. Didn't test, though, that's
just a rough estimation.

A 14-bit (hires, not dhires) table would need at least 2^6 = 64 times
that, which is much more. An alternative would be to scrap my code
and, instead of producing several pixels at a time, use a single
14-bit table and rotate the bits through it. I have no idea how much
that would slow everything down, but it would probably be quite noticeable.
And to justify the effort (complete rewrite of code), you still
have to convince me that the results are worth it :-)
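
The table sizes being compared can be checked with a few lines of
arithmetic. This is only a worked restatement of the numbers quoted
above; the variable names are made up for the example.

```python
BYTES_PER_PIXEL = 4  # 32bpp host pixels

# Current hires layout: 4 tables x 256 entries x up to 16 pixels each.
current_bytes = 4 * 256 * 16 * BYTES_PER_PIXEL

# Single sliding-window table: one pixel per 14-bit index.
single_14bit_bytes = (1 << 14) * BYTES_PER_PIXEL

# Keeping the multi-pixel layout but widening the index from 8 to 14 bits.
multi_14bit_bytes = 4 * (1 << 14) * 16 * BYTES_PER_PIXEL

print(current_bytes // 1024, "KB")       # 64 KB
print(single_14bit_bytes // 1024, "KB")  # 64 KB
print(multi_14bit_bytes // 1024, "KB")   # 4096 KB
```

So the single 14-bit table is the same size as the current four
tables together, while keeping the multi-pixel layout at 14 bits
costs 64 times as much.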

Just to give some numbers: Speedtest says hires writes at 32bpp
with 3x magnification are about 5x as fast as the original Apple //e.
So there is some headroom, but not much. The test suite doesn't
include dhires performance.

What I'm trying to find out is if the trade-off is worth it. At the
moment I'm not convinced, because I neither have a concrete recipe
to actually employ the wider window, nor do I have a convincing case
where it makes a difference. What a wider filter certainly can do
(and probably would do in the TV) is to make the hue change from
one repeated pattern to another repeated pattern more steep.

The recipe is simple: map host pixels to one or more 14MHz Apple
"dots" and to get the color for each pixel, shift in the next Apple
dot (1 or 0) into a 14-bit shift register and use the register for
an index into a 32-bit color table.
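
A minimal sketch of that recipe, assuming one host pixel per Apple dot.
The function name and the placeholder gray table are illustrative
only; a real table would come from whatever filter is eventually
chosen.

```python
WINDOW_BITS = 14
MASK = (1 << WINDOW_BITS) - 1

def render_scanline(dots, color_table):
    """Map a sequence of 0/1 dots to 32-bit colors via a sliding window."""
    reg = 0  # shift register holding the most recent WINDOW_BITS dots
    out = []
    for dot in dots:
        reg = ((reg << 1) | (dot & 1)) & MASK  # shift in the next dot
        out.append(color_table[reg])           # register is the table index
    return out

def make_placeholder_table():
    """Stand-in table: gray level proportional to the set bits in the window."""
    table = []
    for index in range(1 << WINDOW_BITS):
        gray = bin(index).count("1") * 255 // WINDOW_BITS
        table.append((gray << 16) | (gray << 8) | gray)  # 0x00RRGGBB
    return table

color_table = make_placeholder_table()
pixels = render_scanline([1, 0, 1, 0] * 10, color_table)
```

As written, the window is causal: each pixel depends on its own dot
and the 13 dots before it, so the picture is shifted slightly to the
right unless the output is offset to center the window.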

So how does that tell me whether the trade-off is worth it? :-)
I need concrete examples ("that's how it looks on a real TV, that's
how it looks with 14bit window, that's how it looks with the 4bit
window, see the difference here?") to make that decision, not
a theoretical algorithm :-)

And I still have the suspicion that just using a wider filter window
won't get much closer to the appearance on a TV.

The only thing missing is the DSP-like algorithm to construct the
color table.

DSP-like implementation is simple. What's missing are the concrete
filter coefficient values such a filter would need.

Yes, I know, but that wasn't the point. What I was asking was if it
would make sense to implement this mode (as an alternative to the
"sliding window" mode without quantizing x alignment), because there
are programs which use the display that way.

Sure, some do

Which ones, for example, so I can test?

--but it's a special case of the more general method. If you use
the general sliding window, the "blocked" case just works if it is
used.

No. The difference is that one method will produce color blur between
blocks, while the other doesn't. If the program intended the latter,
that method would be better. You cannot subsume one under the other,
you really need different tables, and user choice to say which table
he'd like for this or that program.

On a gigabyte machine, who cares whether an emulator requires
2MB or 2.25MB?

I'm kind of accustomed to 2MB L2 caches... ;-)

Comments in the code show that the program was originally intended
to give reasonable performance on a 486...

"The wastebasket is our most important design
tool--and it's seriously underused."

Yes :-)

- Dirk