Parser

As described in the compiler chapter. The parser is the second component in our system. It consumes the stream of token and produce a parse tree.

What is a parse tree?

A picture is worth a thousand words, let me simply show you what a parse tree is!

So here is an example parse tree, it captures the structure of the program. Our goal is to construct a parse tree based on the stream of tokens coming from the scanner.

Interface

class parser
{
public:
    parser(scanner* scanner);
    program_node* parse();
    ...
};

...

Interface explained

This interface is really simple, all it does is that it turns the information in the scanner, into a parse tree. The root is a program_node, there are a few function_node under it, and so on.

Implementation

For a reader who never build a parser before, this interface can be overwhelmingly hard to implement. While there are multiple ways to build a parse tree, here I chose the one that I think is the simplest, top down recursive parsing.

Top down recursive parsing

Unlike the previous chapter, some theory is actually useful for parsing. In particular, I wanted to introduce the concept of context free grammar. It is basically this:

<program> := <function>+
<function> := function ( <argument> ) { <statement> }
...

The idea is that the left hand side (called non-terminals) can be expanded to the right hand side, and the right hand side stuff under an angle bracket can continue to expand, and so on, until there are no angle bracket. There are multiple options so there can be many different trees that we can generates out of this grammar.

With that in mind, we write one function per non-terminal to handle it. The goal of that function is to handle that non-terminal. Whenever we hit another non-terminal on the right hand side. We recursively call the function that handle the non-terminal to deal with it.

To read my code, that is one simple convention I used throughout. Whenever I call a parse function, the scanner already scan the first token. Understanding this will make the reading much easier.

If it is really hard to implement this, I suggest you can read a standard compiler textbook on top-down recursive parsing, they offer much better explanation. You can also read my code as well. But please try to implement it yourself. Writing a top down recursive parser yourself at least once is a rewarding exercise, you will feel like you are now capable on parsing any reasonable grammar!

Practical notes

Again, practical compiler author prefers hand written parser over generated ones for the same reason, ease to modify and debug. However, the top-down recursive parsing technique here has a fatal flaw - it use a lot of stack space, and there is no easy way around it. The LR parsing technique does save memory, but it is really hard to write one of those by hand. So there is no clear winner here. But top down recursive parsing is a clear winner in terms of simplicity.

results matching ""

    No results matching ""