Re: tricks to make large PLAs fast?



Jim Granville <no.spam@xxxxxxxxxxxxxxxxx> writes:
> That's a large array - does it really cover 2^25 combinations,
> or can you compress the inputs, so that the remainder can fit into
> Block Ram(s) ?

Not really. It was a design originally implemented in custom CMOS in
the early 1980s, and I don't really want to redesign it any more than
necessary. There are lots of don't cares scattered throughout the
AND matrix of the PLA, so it won't fit in any reasonable-sized ROM
or RAM. Also, the 25-bit input words don't uniquely map to outputs;
a given input word may (and often does) match multiple product terms.
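
To make the structure concrete, here is a minimal Python sketch of how
such a PLA evaluates. The (care_mask, value, outputs) representation,
the name pla_eval, and the toy data are assumptions for illustration
only, not the real design or the format my tool uses.

```python
# Hypothetical representation: each product term is (care_mask, value, outputs).
# A term matches when the input agrees with `value` on every bit that is set
# in `care_mask`; bits cleared in `care_mask` are don't-cares.

def pla_eval(terms, word):
    """OR together the output bits of every product term that matches `word`."""
    result = 0
    for care_mask, value, outputs in terms:
        if (word & care_mask) == (value & care_mask):
            result |= outputs
    return result

# Toy 4-input, 3-output PLA (made-up data, not the real design).
TOY_TERMS = [
    (0b1100, 0b1000, 0b001),  # matches 10--
    (0b0011, 0b0001, 0b010),  # matches --01
    (0b1001, 0b1001, 0b100),  # matches 1--1
]

# The input 1001 matches all three terms, so all three outputs are ORed in.
matched = pla_eval(TOY_TERMS, 0b1001)

# A flat lookup table would need one entry per possible input word.
ROM_WORDS_FOR_25_INPUTS = 2 ** 25
```

The point of the toy data is that a single input word can hit several
product terms at once, while a direct table over 25 input bits would
need 2^25 (about 33.5 million) entries.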

I've now tried putting a "keep" attribute on the product terms, and
that made the timing worse. I thought it would result in better
(separate) optimization of the product terms and OR terms, rather
than mashing them together and trying to optimize the result.

By default, ISE *is* using the carry chain, and in fact it seems to
be using it for some 95-input gates, which end up being much slower
than an equivalent tree would be. I'm doing a run now with the
"USE_CARRY_CHAIN" attribute set to "no" for all the sum terms.

I might try hacking my Python tool that translates the PLA
so that it directly instantiates trees of four-input gates for all
of the product terms and sum terms, with a "keep" attribute on each
gate output, and see what kind of timing that gets me. I think that
should result in the fewest possible levels of logic, which would
be around eight.
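
Roughly what I have in mind for the tree generation, as a sketch only.
The function name build_tree, the signal naming scheme, and the tuple
format for gates are all made up here; the real tool would emit HDL
instantiations with the "keep" attribute instead.

```python
def build_tree(op, inputs, fanin=4, prefix="n"):
    """Reduce `inputs` (at least one name) with `op` gates of at most `fanin` inputs.

    Returns (root_signal, gates, levels), where `gates` is a list of
    (output_name, op, input_names) tuples in emission order.
    """
    gates = []
    level = list(inputs)
    levels = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), fanin):
            group = level[i:i + fanin]
            if len(group) == 1:
                # A lone leftover signal passes up a level without a gate.
                nxt.append(group[0])
            else:
                out = f"{prefix}_{levels}_{len(nxt)}"
                gates.append((out, op, tuple(group)))
                nxt.append(out)
        level = nxt
        levels += 1
    return level[0], gates, levels


# Fan-ins taken from the discussion: 25 PLA inputs, and a 95-input gate.
and_root, and_gates, and_levels = build_tree(
    "AND", [f"in{k}" for k in range(25)], prefix="p")
or_root, or_gates, or_levels = build_tree(
    "OR", [f"pt{k}" for k in range(95)], prefix="s")
```

With four-input gates, a 25-input term comes out at three levels (9
gates) and a 95-input term at four levels (33 gates). The depth of the
real design depends on the actual fan-in of each product and sum term.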

For the first attempt, I'll just brute-force it, so that none of
the terms have any shared sub-terms. If the results look reasonable,
I'll try to optimize it for use of common sub-terms to reduce the
total number of LUTs, while keeping the levels constant.

Eric