Re: Double hires mode color artifacts



Dirk Thierbach wrote:
Michael J. Mahon <mjmahon@xxxxxxx> wrote:

I beg to differ -- I think often enough the game just wanted to
create "sharp" pixels, and had to live with the fringe color effects
because it couldn't avoid it. Bolo is the first example that comes
to mind.


If simple graphics are intended, I agree, but many games actually
had rather artistic screens--suggesting far more than one would
expect of an Apple II, given its resolution and color limitations.


Yes, of course. So one needs both.


It's not so easy to get a really smooth color gradient (and to
replace a constant hue with a more "wavy" version) unless you magnify
quite a bit, and then it gets slow even on more modern systems. I can
run the emulation at 3x magnification, and it's still faster than the
original, but not by much. It may get better if I can integrate the
graphics acceleration in some way, but that's not trivial.


I've suggested a very simple table-driven algorithm that should do
quite well.


How exactly are the tables calculated? Just extract Y, U and V (or I and Q)
with a suitable filter window curve (Gaussian? Something else?)? That won't
change the "blocky" appearance much, because areas with the same pattern
will still look "solid", and only the borders will change a bit (but
not much).

I'd go with YUV, since that's more typical of the TVs of the era.

The filters should be FIR filters approximating the RLC chroma filters
of the day. For this purpose, it would be advantageous to pick an
actual monitor and simulate its filters.

You are correct, within a region of solid color, there would be
essentially no change after the window had filled with 14 "dots".
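
The steady-state claim can be checked with a minimal Python sketch. It assumes the window is modeled as a 14-bit shift register with the newest dot in bit 0; `window_indices` and the 4-dot test pattern are hypothetical, chosen only for illustration. Once 14 dots of a repeating 4-dot pattern have been shifted in, the window contents merely cycle through the four rotations of that pattern, so no new table entries are ever reached.

```python
def window_indices(dots, bits=14):
    """Return the shift-register value after each dot is shifted in.

    ASSUMPTION: newest dot lands in bit 0, older dots move toward bit 13.
    """
    mask = (1 << bits) - 1
    reg = 0
    out = []
    for d in dots:
        reg = ((reg << 1) | (d & 1)) & mask
        out.append(reg)
    return out


# Hypothetical solid-color area: the same 4-dot pattern repeated.
pattern = [1, 1, 0, 0]
indices = window_indices(pattern * 10)

# From the 14th dot onward the window is completely filled.
steady = indices[13:]

# The window contents repeat with the period of the pattern (4 dots).
print(all(steady[i] == steady[i + 4] for i in range(len(steady) - 4)))
print(len(set(steady)))  # 4 distinct window values, one per rotation
```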

Interestingly, emulator writers seem to have some practical difficulty
choosing even the correct steady-state palette.

Care to do a simple test implementation?

My interest is more conceptual at this point. I'm not writing
an emulator and I've got my plate pretty full with other things.

So to avoid the blocky and sterile look, I guess it's more important
to introduce some low frequency noise, and avoid rendering the Apple
pixel on host pixel boundaries. A filter of 4 bit width in hires (which
corresponds to 8 bits in double hires) should already be sufficient for
that. Maybe I should try it... though one problem would be that this
only affects information in x, not in y.


That's correct. The only "Y blending" would result from defocusing
of the analog monitor, and can probably be neglected.


Ok, so for hires, all we have to do is to set up the tables accordingly.
Any concrete suggestions? Which noise should be added? One should also
keep in mind that for uniform areas, the noise will also look uniform
(because the same pattern is repeated), and that will be noticeable
if the pattern is too obvious.

I wouldn't add *any* noise. All the effects can be pre-computed and
would be completely deterministic. Any variations in luminance will
be the natural result of the dot patterns and the luminance filter,
for example.

Every bit doubles the size. And the cache is limited. And I don't have
a brand new system here, either. As I said, at 3x magnification it's
still faster, but not by much.


This doesn't alter the size of the frame buffer--only the size of the
table used to map the color of each pixel in the buffer.


No, but it alters the size of a memory area that is needed during
*each* access to graphics. And this should stay in the cache. The host
framebuffer is just written to, it's never read.

But I'd expect a memory-mapped frame buffer to show up in the
cache, if only to gain the benefit of "blocking" writes on a line...

Although a 14-bit table would be 16K x 4 = 64K bytes (at
32-bits/pixel), it would still fit easily in second level cache, and
references to it would have a lot of locality in most practical
cases.


For hires, I currently use 4 tables (even/odd; overlapping
bytes/single bytes) with 256 entries. Each table can produce up to 16
pixels (for larger magnifications) at 32bpp, so that's already
4*256*16*4 = 64KB. Add to that other tables for graphics, the tables
needed to do the 6502 simulation, the 6502 memory and some room to
map the non-localized framebuffer access, and I'm probably already
getting more cache misses than I'd like. Didn't test, though, that's
just a rough estimation.

Temporal locality is a wonderful thing. Don't expect your cache
miss rates to skyrocket just because worst case estimates overflow
the cache.

A 14-bit (hires, not dhires) table would need at least 2^6 = 64 times
that, which is much more. An alternative would be to scrap my code,
and instead of producing several pixels at a time to use a single
14-bit table, and rotate the bits through it. I have no idea how much
that would slow down everything, but probably it'd be quite noticeable.
And to justify the effort (complete rewrite of code), you still have to
convince me that the results are worth it :-)

Combinatorics of mapping multiple pixels per table reference will
result in rapid table growth. I've been talking throughout about
working at the "single Apple dot" per iteration level.
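
The table sizes quoted in this exchange can be put side by side with a little arithmetic. This is only a sketch: `table_bytes` is a hypothetical helper, and the "widened" case assumes the existing 4-table, 16-pixels-per-entry layout is kept while the index grows from 8 to 14 bits.

```python
BYTES_PER_PIXEL = 4  # 32 bits per host pixel


def table_bytes(tables, index_bits, pixels_per_entry):
    """Total size in bytes of `tables` lookup tables of 2**index_bits entries."""
    return tables * (1 << index_bits) * pixels_per_entry * BYTES_PER_PIXEL


# One color per Apple dot, indexed by a 14-bit window: 16K x 4 bytes.
per_dot = table_bytes(tables=1, index_bits=14, pixels_per_entry=1)

# The existing hires layout: 4 tables, 256 entries, up to 16 pixels each.
current = table_bytes(tables=4, index_bits=8, pixels_per_entry=16)

# ASSUMPTION: same layout, but indexed by 14 bits instead of 8.
widened = table_bytes(tables=4, index_bits=14, pixels_per_entry=16)

print(per_dot // 1024, "KiB per-dot table")
print(current // 1024, "KiB current multi-pixel tables")
print(widened // (1024 * 1024), "MiB multi-pixel tables with a 14-bit index")
print(widened // current, "times larger")
```

The per-dot table and the current multi-pixel tables come out the same size; the growth only appears when a wide index is combined with several output pixels per entry.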

We have 3GHz processors now, and screens only need to change once
every 17 milliseconds or so, so I don't foresee any practical problem.
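
As a rough back-of-the-envelope check of that budget, the sketch below divides the cycles available in one frame time by the number of dots on the screen. It assumes 560 dots per line and 192 visible lines, and it ignores everything else the emulator has to do (6502 emulation, memory stalls, the host blit), so the result is an upper bound and not a measurement. All names are hypothetical.

```python
CLOCK_HZ = 3_000_000_000  # 3 GHz host processor
FRAME_MS = 17             # one screen refresh, roughly
DOTS_PER_LINE = 560       # ASSUMPTION: 14MHz dots across the visible line
LINES = 192

cycles_per_frame = CLOCK_HZ * FRAME_MS // 1000
dots_per_frame = DOTS_PER_LINE * LINES
cycles_per_dot = cycles_per_frame / dots_per_frame

print(cycles_per_frame)   # 51000000
print(dots_per_frame)     # 107520
print(round(cycles_per_dot))
```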

The issue for now and the future is how to use four or more fast
processors to produce better results, not how to keep three of them
idle. (Or are you encoding video while you emulate an Apple II? ;-)

Just to give some numbers: Speedtest says hires writes at 32bpp
with 3x magnification are about 5x as fast as the original Apple //e.
So there is some headroom, but not much. The test suite doesn't
include dhires performance.


What I'm trying to find out is if the trade-off is worth it. At the
moment I'm not convinced, because I neither have a concrete recipe
to actually employ the wider window, nor do I have a convincing case
where it makes a difference. What a wider filter certainly can do (and
probably would do in the TV) is to make the hue change from
one repeated pattern to another repeated pattern more steep.


The recipe is simple: map host pixels to one or more 14MHz Apple
"dots" and, to get the color for each pixel, shift the next Apple
dot (1 or 0) into a 14-bit shift register and use the register as
an index into a 32-bit color table.
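
The recipe can be sketched in a few lines of Python. This is an illustration under stated assumptions and not a finished renderer: `render_scanline` and `placeholder_table` are hypothetical names, the newest dot is assumed to enter at bit 0, and the table here is only a gray ramp based on how many dots in the window are lit, standing in for a real color table.

```python
WINDOW_BITS = 14
WINDOW_MASK = (1 << WINDOW_BITS) - 1


def render_scanline(dots, table):
    """Return one 32-bit color per Apple dot.

    Each dot (0 or 1) is shifted into a 14-bit register, and the
    register value indexes the color table.
    """
    reg = 0
    out = []
    for dot in dots:
        reg = ((reg << 1) | (dot & 1)) & WINDOW_MASK
        out.append(table[reg])
    return out


# PLACEHOLDER table (assumption): brightness follows the number of lit
# dots in the window. A real table would hold filtered colors instead.
placeholder_table = [
    (bin(i).count("1") * 255 // WINDOW_BITS) * 0x010101
    for i in range(1 << WINDOW_BITS)
]

line = render_scanline([0] * 14 + [1] * 14, placeholder_table)
print(hex(line[13]), hex(line[-1]))  # 0x0 0xffffff
```

Changing the rendering then means swapping the table; the loop stays the same.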


So how does that tell me whether the trade-off is worth it? :-)
I need concrete examples ("that's how it looks on a real TV, that's
how it looks with 14bit window, that's how it looks with the 4bit
window, see the difference here?") to make that decision, not
a theoretical algorithm :-)

And I still have the suspicion that just using a wider filter window
won't get much closer to the appearance on a TV.

Experimentation is required--which means implementation, which
requires that someone's curiosity exceed the threshold...

What I have seen is lots of complaints that no emulator looks like
a composite monitor. I think my proposal addresses that issue, not
just for the Apple II, but for all computers that used NTSC color
monitors, aliased color or non-aliased color.

If no one cares to find out, so be it.

The only thing missing is the DSP-like algorithm to construct the
color table.


DSP-like implementation is simple. What's missing are the concrete
filter coefficient values such a filter would need.

And those are readily available in the schematics of actual monitors.
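
To make the table construction concrete, here is a hedged Python sketch of one way it could be done. Several things in it are assumptions and not part of the discussion above: the FIR weights are placeholders (boxes of 4, 4 and 8 dots convolved together, chosen only because they span 14 dots and null the color subcarrier) and are not taken from any monitor schematic; the same weights are used for luminance and chroma; the hue reference angle is arbitrary; and a `phase` argument (dot position modulo 4) is added because the dot clock runs at four times the color subcarrier, so the same bit pattern decodes to a different hue at a different position. All function names are hypothetical.

```python
import math

WINDOW = 14  # dots in the sliding window


def convolve(a, b):
    """Plain discrete convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def placeholder_weights():
    """14 FIR taps, normalized to sum to 1. PLACEHOLDER, not measured."""
    w = convolve(convolve([1.0] * 4, [1.0] * 4), [1.0] * 8)
    total = sum(w)
    return [x / total for x in w]


def decode_yuv(index, phase, weights):
    """Filter a 14-bit window into (Y, U, V).

    Bit k of `index` is the dot that is k dots old; `phase` is the
    position (mod 4) of the newest dot. One dot is 90 degrees of
    subcarrier, so the demodulation angle steps by pi/2 per dot.
    """
    y = u = v = 0.0
    for k, w in enumerate(weights):
        bit = (index >> k) & 1
        angle = (math.pi / 2) * (phase - k)
        y += w * bit
        u += 2 * w * bit * math.cos(angle)
        v += 2 * w * bit * math.sin(angle)
    return y, u, v


def yuv_to_rgb32(y, u, v):
    """Commonly quoted YUV-to-RGB conversion, clamped to 8 bits each."""
    def clamp8(x):
        return max(0, min(255, round(x * 255)))
    r = y + 1.140 * v
    g = y - 0.395 * u - 0.581 * v
    b = y + 2.032 * u
    return (clamp8(r) << 16) | (clamp8(g) << 8) | clamp8(b)


def build_table(phase, weights):
    """Color table for one phase: 2**14 entries of 0xRRGGBB."""
    return [yuv_to_rgb32(*decode_yuv(i, phase, weights))
            for i in range(1 << WINDOW)]


weights = placeholder_weights()
table0 = build_table(0, weights)
print(hex(table0[0]), hex(table0[(1 << WINDOW) - 1]))  # 0x0 0xffffff
```

With four such tables (one per phase), the per-dot lookup stays a single table reference. Substituting coefficients derived from a real monitor would only change `placeholder_weights`.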

Yes, I know, but that wasn't the point. What I was asking was if it
would make sense to implement this mode (as an alternative to the
"sliding window" mode without quantizing x alignment), because there
are programs which use the display that way.


Sure, some do


Which ones, for example, so I can test?

I don't have an example in mind, but any program that treats the
screen as 140x192 will do the job. For example, any double lo-res
screen.

Finding graphics that provably use non-140-aligned colors will be
a little harder, requiring writing an Apple II program or using
a DHR paint program.

--but it's a special case of the more general method. If you use
the general sliding window, the "blocked" case just works if it is
used.


No. The difference is that one method will produce color blur between
blocks, while the other doesn't. If the program intended the latter,
that method would be better. You cannot subsume one under the other,
you really need different tables, and user choice to say which table
he'd like for this or that program.

I see what you mean. I was referring to supporting both "aligned"
140x192 *usage* and "unaligned" usage. Of course you are correct,
simulating a "non-blending" *emulation* requires different tables.
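
For contrast, a "non-blending" 140-column rendering might be sketched as below. Each aligned group of four dots selects one of 16 colors and paints all four dot positions with it, so nothing bleeds across block boundaries. The function name, the bit order within a group, and the gray placeholder palette are all assumptions made for illustration; a real palette would have to be chosen separately.

```python
def blocked_colors(dots, palette):
    """Render dots as aligned 4-dot blocks, one flat color per block.

    ASSUMPTION: the first dot of each group is the most significant
    bit of the 4-bit palette index.
    """
    if len(dots) % 4 != 0:
        raise ValueError("line length must be a multiple of 4 dots")
    out = []
    for x in range(0, len(dots), 4):
        nibble = 0
        for d in dots[x:x + 4]:
            nibble = (nibble << 1) | (d & 1)
        out.extend([palette[nibble]] * 4)
    return out


# PLACEHOLDER palette (assumption): 16 gray levels, not real colors.
placeholder_palette = [n * 0x111111 for n in range(16)]

line = blocked_colors([1, 1, 1, 1, 0, 0, 0, 0], placeholder_palette)
print([hex(c) for c in line])
```

Here the edge between the two blocks is a hard step, whereas a sliding window would pass through intermediate table entries as the old dots shift out.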

The good news is that the user can select from a menu of graphics
renderings that would require only different tables--same algorithm.

On a gigabyte machine, who cares whether an emulator requires
2MB or 2.25MB?


I'm kind of accustomed to 2MB L2 caches... ;-)


Comments in the code show that the program was originally intended
to give reasonable performance on a 486...

That hardly seems relevant today--or five years ago.

"The wastebasket is our most important design
tool--and it's seriously underused."


Yes :-)

Amen. ;-)

-michael

NadaPong: Network game demo for Apple II computers!
Home page: http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it's seriously underused."