Tiny Basic Interpreter

- Hariharan

I don’t know what it is with interpreters and compilers that I always try to build one without ever completing it. I get very excited to build a new programming language or just to play around with an existing esoteric language which has led to me having a couple of different lexers and partially completed parsers and code generators. This time, I wanted to finally build something that atleast prints “Hello, World” and I am very happy with the results.

This implementation is far from perfect. I am not very confident about the parser, leads to a lot of seg faults, speaking of which, there is no error reporting.

I reused a lexer I wrote for a language which I designed to be a simpler version of C, syntactically. Tiny Basic has fewer keywords and a much simpler grammar. I had to only remove certain things from the old lexer to make it work with Tiny Basic. The lexer generates an array of Tokens

typedef struct Tokens{
  int type;
  char *value;
} Token;

where the value is the actual token and the type is something defined in the tokens.h file.

I found the grammar on Wikipedia.

line ::= number statement CR | statement CR

statement ::= PRINT expr-list
              IF expression relop expression THEN statement
              GOTO expression
              INPUT var-list
              LET var = expression
              GOSUB expression
              RETURN
              CLEAR
              LIST
              RUN
              END

expr-list ::= (string|expression) (, (string|expression) )*

var-list ::= var (, var)*

expression ::= (+|-|ε) term ((+|-) term)*

term ::= factor ((*|/) factor)*

factor ::= var | number | (expression)

var ::= A | B | C ... | Y | Z

number ::= digit digit*

digit ::= 0 | 1 | 2 | 3 | ... | 8 | 9

relop ::= < (>|=|ε) | > (<|=|ε) | =

string ::= " (a|b|c ... |x|y|z|A|B|C ... |X|Y|Z|digit)* "

Even though I initially started implementing the parser based on this grammar, later on I took some liberty in a few things like non-char variable names, omitting RUN, CLEAR and making END optional.

The parser is where all the magic takes place. It is a recursive descent parser (I have to mention that I have never done a compilers course so I didn’t look into the type of grammar before writing the parser) followed by evaluation. Before any parsing or evaluation takes place, there is a pass where all the labels are stored in an array where the index points to the line number (starts from 0). Not fancy, but gets the job done.

The symbol table is basically an array of structs.

typedef struct Symbols{
    char *variable;
    int type;
    int value;
} Symbol;

Variables can store only ints for now. The symbol table is searched using linear search with the variable identifier as the key. Maps would have been really useful but I didn’t want to increase the number of lines any further. My eventual plan is to port it to work with NodeMCU/Arduino and when I get to it I might implement a map and make the code more efficient. For now, it just works.

Any assignment to a variable should be preceded by LET, even if the variable is already declared/initialized. Labels are optional and no syntax for loops yet, one has to make do with GOTO or GOSUB paired up with an IF condition for exiting the loop.

Stars in TB

I remember this being one of my first programs ever on a variant of Basic called RobotBasic.

PRINT "How many stars do you want?"
LET stars = 0
INPUT stars
10 IF stars == 0 THEN END
PRINT "*"
LET stars = stars - 1
GOTO 10

Fibonacci in TB

REM This code prints n fibonacci numbers
LET a = 0
LET b = 1
LET n = 10

REM This is how loops work for now
100 PRINT a
IF n == 0 THEN END
LET t = b
LET b = a + b
LET a = t
LET n = n - 1
GOTO 100

Infamous Hello World

10 PRINT "Hello World!"
GOTO 10

You can find the code for the interpreter here.