Code generator
The code generator is our last component in the compiler. It takes the parse tree as the input and generate instructions for the machine.
Without the actual machine, it makes little sense to write the code generator. Therefore, while we introduce the code generator first, we will actually write the code generator later, and write our virtual machine first.
Interface
struct instruction
{
instruction();
instruction* prev;
instruction* next;
};
struct instruction_sequence
{
instruction_sequence();
instruction* head;
instruction* tail;
};
struct code_generation_outputs
{
instruction_sequence instructions;
int entry_point;
};
class code_generator
{
public:
code_generator();
code_generation_outputs generate_code(program_node* program);
};
...
Interface explained
The code generation logic simply take the parse tree as an the input and generate the instruction_sequence
. The instruction sequence is simply a doubly linked list of instructions.
Note that at the output, label must be removed, and any instruction that reference a label have an address specified on the label.
Implementation
Post order traversal
The key idea of code generation is simple - post order traversal. In order to generate the code to compute 1 + 2, you generate the code to generate 1, then you generate the code to generate 2, and then you generate the code to compute 1 + 2.
Program writing program
Writing a code generator is very much like writing a program that write programs. As a human programmer, we seldom write the code in a straightforward top down manner, we often write this function, suddenly realize we need to do something if something happen, but otherwise not doing this now and will do that later. This requires us to allow editing the code. That is why we used a doubly linked list, instead of a simple vector to represent the instructions.
Expression target
My implementation try to stay as context free as possible. The code generator walk the tree recursively and generate code as it goes. A key parameter to generate_code()
methods is the context->expressionTarget
parameter. The parameter tells where the code generator should put the output to.
Practical notes
This is a really naive code generator. If you look at my generated code, you will often find it generates code to read register, write to memory, and then read it back to put it back into register. You will also find a branch instruction branching to the next statements, etc. Any reasonable compilers today should do optimization to get rid of them. To make our life simple, I did not do that. You could try it for your exercise.