coregen CIC = terrible efficiency?



I'm working with the Xilinx coregen CIC v3.0. I'm finding that to get
decent image rejection (60 dB) I need about 4 stages. My input is
only 10 bits and I still end up with a 66-bit output, 50 bits of which are
thrown away. As a result my design won't fit in my device. That seems
horribly inefficient to me, so I have some questions:

1. My coregen says it doesn't support V4 (Virtex-4) for the CIC, so I've
been compiling for V2 (Virtex-II). Isn't the DSP48, with its large
accumulator, ideal for CICs?
2. Looks like the bit growth scales with the number of stages.
Since no one uses more than 16 bits at the output, why can't the output of
the first integrator be trimmed back to 16 bits before feeding the next, and
so on?
3. If the CIC is just a boxcar filter, wouldn't it be easier to implement as
a single subtractor/accumulator whose inputs are the current sample and the
sample delayed by R? At least for reasonable R (< 8192) it seems like the
delay line should fit in block RAM okay.
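
To make questions 2 and 3 concrete, here is a rough Python sketch of what I
mean. It is not the coregen implementation. The width formula is the usual
full-precision CIC bound, B_out = B_in + ceil(N * log2(R * M)). R = 8192 and
M = 2 are only my assumption, picked because they reproduce the 66-bit figure.
The function names are made up for illustration.

```python
import math
from collections import deque

def cic_register_bits(input_bits, stages, rate, diff_delay=1):
    """Full-precision register width: B_in + ceil(N * log2(R * M))."""
    return input_bits + math.ceil(stages * math.log2(rate * diff_delay))

def to_signed(value, width):
    """Interpret a width-bit register value as two's complement."""
    return value - (1 << width) if value >= 1 << (width - 1) else value

def boxcar_decimate(samples, rate, input_bits):
    """Single-stage boxcar: acc += x[n] - x[n - rate], keep every rate-th sum.

    The accumulator wraps modulo 2**width to mimic a fixed-width register.
    """
    width = cic_register_bits(input_bits, 1, rate)
    mask = (1 << width) - 1
    # Delay line of R samples; stands in for the block RAM (assumption).
    delay = deque([0] * rate, maxlen=rate)
    acc = 0
    out = []
    for n, x in enumerate(samples, start=1):
        acc = (acc + x - delay[0]) & mask   # subtract the sample delayed by R
        delay.append(x)                     # oldest sample drops off the left
        if n % rate == 0:                   # decimate by R
            out.append(to_signed(acc, width))
    return out

# Assumed parameters (N=4, R=8192, M=2) that match the 66-bit output above.
print(cic_register_bits(10, 4, 8192, 2))
# Tiny example with R=4 so the sums are easy to check by hand.
print(boxcar_decimate(list(range(-4, 4)), 4, 10))
```

Note this only models a single stage (N = 1), so its response is a plain
sinc and not the sinc^4 of the 4-stage filter.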

Thanks for any help,
Clark

