PicoBlaze enhancement and assembler



Hi,

I wasn't very satisfied with the available assembler, so a few months
ago I wrote a new assembler for the PicoBlaze during my spare
weekends ...

I just thought I'd share it in case anyone is interested ...

It's available here:
http://www.246tnt.com/files/PBAsm_20080225.tar.bz2

Example usage:
python ./Codegen.py example.S out.mem

WARNING: The last file on the command line is the output ... so if you
forget it, it will overwrite your last source file ...

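A minimal sketch of a guard against that mistake. This is not part of
PBAsm: split_args and the list of source suffixes are hypothetical names
made up for illustration, and the check simply assumes that an output
file never carries a source-like extension.

```python
import os

# Assumed source suffixes (hypothetical, adjust to your own naming).
SOURCE_EXTENSIONS = {".s", ".asm", ".psm"}


def split_args(files):
    """Split a 'sources..., output' file list into (sources, output).

    Raises ValueError when the last name looks like a source file, which
    is the usual symptom of a forgotten output argument.
    """
    if len(files) < 2:
        raise ValueError("need at least one source file and one output file")
    *sources, output = files
    ext = os.path.splitext(output)[1].lower()
    if ext in SOURCE_EXTENSIONS:
        raise ValueError("output %r looks like a source file" % output)
    return sources, output
```

Calling split_args(["example.S", "out.mem"]) returns the sources and the
output separately, while split_args(["a.S", "b.S"]) raises instead of
letting b.S be overwritten.
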
Pros and cons:

pros:
- Cross platform
- It supports local labels, so for labels that don't really deserve a
name you can do things like

load s0, 15
1:
sub s0, 1
jump nz, 1b

The '1b' reference means the nearest label named 1 when searching
'back' (1f would search forward ...).
Local labels are just a single digit.

- It supports initialized data and references to the "data" section.
Like

.data
var1: .byte 0xa5

.text
load s0, var1
fetch s0, (s0)

- It supports evaluating expressions like "load s0, 15 / 5 * 8"
- It supports some advanced hw features not in the original PicoBlaze,
like an offset on fetch / store / input / output:

fetch s0, (s1)8

equivalent to

add s1, 8
fetch s0, (s1)
sub s1, 8

but in 1 cycle and at no hw cost ... (you just need to change a LUT3 to
a LUT4 and change its INIT string)

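For the curious, the local label lookup described above can be sketched
roughly like this. It is only an illustration of the '1b' / '1f' rule,
not the code from PBAsm: resolve_local and the labels dictionary are
hypothetical, and it assumes a label sitting right in front of the
referencing instruction counts as 'back'.

```python
import bisect


def resolve_local(ref, here, labels):
    """Resolve a local label reference such as '1b' or '1f'.

    ref    -- one digit plus a direction: 'b' (backward) or 'f' (forward)
    here   -- address of the instruction that holds the reference
    labels -- dict mapping a digit to the sorted list of addresses
              where that local label is defined
    """
    if len(ref) != 2 or not ref[0].isdigit() or ref[1] not in "bf":
        raise ValueError("malformed local label reference %r" % ref)
    digit, direction = ref[0], ref[1]
    addrs = labels.get(digit, [])
    # Index of the first definition strictly after the current address.
    i = bisect.bisect_right(addrs, here)
    if direction == "b" and i > 0:
        return addrs[i - 1]   # nearest definition at or before 'here'
    if direction == "f" and i < len(addrs):
        return addrs[i]       # nearest definition after 'here'
    raise ValueError("no matching local label for %r" % ref)
```

In the loop example, the label 1 sits at address 1 (the sub) and the
jump at address 2, so resolve_local("1b", 2, {"1": [1]}) gives 1. The
same digit can be reused further down without any clash.
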

cons:

- Incompatible syntax with the official one ...
- I haven't tested interrupt stuff
- The registers are now s0 -> s15 and not s0 -> sF ... (easier to
parse)
- To use all the features you need a hw-modified PicoBlaze
- It has terrible error reporting ... basically just throws an
exception with a not always helpful message.

It could use cleanup, an optimizer, a C front-end, a macro
preprocessor .... but ... it works :)

About the hw mods:
- I removed the ScratchPad distributed RAM and used the second port
on the BRAM as scratch pad. The second port is configured as 8 bits
wide, with its upper address lines tied to 1 so that the SP is
located at the end of the BRAM. (Be careful when combining this with
interrupts: the last 2 bytes of the scratch pad are the interrupt
vector ...) The bonus is a 256 byte scratch pad, at the cost of just
128 instruction words ... (adjust the mapping for other tradeoffs)

This mod is not absolutely needed. But if you want to use the
pre-initialized memory without it, you'll have to change the main
function to output two independent .mem files instead of just 1 merged
one ... (pretty easy, just look at the end of CodeGen.py )

- I changed a LUT to allow an offset in fetch/store/input/output ...
I can post the exact patch if it's of interest to anyone. (I don't
have it handy right here ...) It doesn't cost any slice ... It might
lengthen the critical path a bit, but for me the critical path was
somewhere else anyway, so I didn't mind ...

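To make the scratch pad mapping from the first mod a bit more concrete,
here is the arithmetic as a small sketch. The assumptions are mine: a
1024-word program memory, only the 16 data bits of each instruction
word visible through the 8 bit port (parity bits ignored), and two
consecutive bytes per instruction word. The names are made up for the
example.

```python
BRAM_BYTES = 2048        # assumed: 1024 instruction words x 2 data bytes
SCRATCHPAD_BYTES = 256   # 8 address bits come from the CPU


def scratchpad_to_bram(offset):
    """Return (byte_address, instruction_word) for a scratch pad offset."""
    if not 0 <= offset < SCRATCHPAD_BYTES:
        raise ValueError("scratch pad offset out of range")
    # Upper address lines tied to 1: the pad sits at the top of the BRAM.
    byte_address = (BRAM_BYTES - SCRATCHPAD_BYTES) | offset
    # Two bytes share one instruction word.
    return byte_address, byte_address // 2
```

Under those assumptions, offsets 0 to 255 land on instruction words 896
to 1023, which is where the 128 sacrificed words come from, and offsets
254 and 255 share the last word, the one holding the interrupt vector.
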

Sylvain