Re: slightly off topic -- writing an assembler!



[Note that this is a followup to a thread from 1998. -John]

From: samuel <SAMIGWE@xxxxxxxxxxxxxxxx>
Date: 24 Jun 1998 00:04:30 -0400
Good day all:
I am currently working on writing an assembler (Intel syntax
for the x86 microprocessor) for my operating system project. I haven't
yet had any formal training on the design of one and haven't been able
to find any "assembler design" books.

I have written an assembler myself. Unfortunately, I started writing it
before I found this thread and read its short description of a macro
assembler. Although this thread is quite dated, that does not mean it can
no longer be useful: I found it useful in 2006.

I ended up creating a table for the ModR/M and SIB bytes at compile time,
using macros to generate the rather large table. The table used this format:

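struct tmodrmsib_tbl
{
    dword type;        // operand type, e.g. A_PTR
    sbyte *expression; // addressing expression, e.g. "eax+ebx"
    bool hasSIB;       // true if a SIB byte follows the ModR/M byte
    byte modrm;        // ModR/M byte for this expression
    byte sib;          // SIB byte (only meaningful when hasSIB is true)
};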

I generated every possible addressing mode along with its corresponding
ModR/M and SIB bytes. The expression field holds an ASCII zero-terminated
string such as "eax", "eax+ebx", "eax*4", or "eax+ecx+$1".

I used $1, $2, and $3 to represent a dword, word, and byte displacement,
respectively. My table ended up looking like this:
{A_PTR, "eax", false, 0x00, 0x00},
{A_PTR, "ecx", false, 0x01, 0x00},
{A_PTR, "edx", false, 0x02, 0x00},
{A_PTR, "ebx", false, 0x03, 0x00},
MASIB3(A_PTR, , 0x04), // generate all SIB possibilities, no displacement
{A_PTR, "$3", false, 0x05, 0x00},
{A_PTR, "esi", false, 0x06, 0x00},
{A_PTR, "edi", false, 0x07, 0x00},

That covers the first addressing mode (mod = 00); I used a macro to generate its SIB entries.

I store instructions in this structure:
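struct tISet
{
    dword memonic;  // mnemonic ID, e.g. ME_MOV
    dword prefixs;  // prefix flags, e.g. P_OSO (operand-size override)
    word opcode;    // opcode byte(s)
    dword operand1; // allowed types for the first operand
    dword operand2; // allowed types for the second operand
};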

The instruction table looks like this:
tISet ISet[] = {
{0xFFFFFFFF, 0, 0, 0, 0},
{ME_MOV,0, 0x0088, A_RM8, A_R8 | X86_O_R},
{ME_MOV,P_OSO, 0x0089, A_RM16, A_R16 | X86_O_R},
{ME_MOV,0, 0x0089, A_RM32, A_R32 | X86_O_R},
{ME_MOV,0, 0x008A, A_R8 | X86_O_R, A_RM8},
{ME_MOV,P_OSO, 0x008B, A_R16 | X86_O_R, A_RM16},
{ME_MOV,0, 0x008B, A_R32 | X86_O_R, A_RM32},
};

I define my operand-type flags so that A_RM32 = A_R32 | A_DWORDPTR, and so
on. This way a single operand field can allow several types, and an actual
operand passes the check if its type is one of them when the assembler
chooses the correct instruction form. X86_O_R is ignored by the type
checking; it is handled later by the function that writes out the
instruction's operands.

I pass this structure between my functions to keep track of the
instruction-building process:
struct tipi
{
bool wrotePrefix;
dword prefix;
bool wroteOpcode;
word opcode;
bool wroteMODRM;
byte modrm;
bool wroteSIB;
byte sib;
byte wroteDisplacement;
union{
dword displacement;
sdword sdisplacement;
};
byte wroteIntermediate;
dword intermediate;
};

The final step reads this struct and writes out the bytes for the
instruction. So I do not think I built a macro assembler at all, but rather
something else. So far, this design has worked very well.

I am planning to put just the core of x86 instruction generation into this
layer of the assembler, and the rest into a preprocessor layer, I suppose. =)

http://compilers.iecc.com/comparch/article/98-06-126


