Tiny Basic Interpreter
- HariharanI don’t know what it is with interpreters and compilers that I always try to build one without ever completing it. I get very excited to build a new programming language or just to play around with an existing esoteric language which has led to me having a couple of different lexers and partially completed parsers and code generators. This time, I wanted to finally build something that atleast prints “Hello, World” and I am very happy with the results.
This implementation is far from perfect. I am not very confident about the parser, leads to a lot of seg faults, speaking of which, there is no error reporting.
I reused a lexer I wrote for a language which I designed to be a simpler version of C, syntactically. Tiny Basic has fewer keywords and a much simpler grammar. I had to only remove certain things from the old lexer to make it work with Tiny Basic. The lexer generates an array of Tokens
typedef struct Tokens{
int type;
char *value;
} Token;
where the value is the actual token and the type is something defined in the tokens.h file.
I found the grammar on Wikipedia.
line ::= number statement CR | statement CR
statement ::= PRINT expr-list
IF expression relop expression THEN statement
GOTO expression
INPUT var-list
LET var = expression
GOSUB expression
RETURN
CLEAR
LIST
RUN
END
expr-list ::= (string|expression) (, (string|expression) )*
var-list ::= var (, var)*
expression ::= (+|-|ε) term ((+|-) term)*
term ::= factor ((*|/) factor)*
factor ::= var | number | (expression)
var ::= A | B | C ... | Y | Z
number ::= digit digit*
digit ::= 0 | 1 | 2 | 3 | ... | 8 | 9
relop ::= < (>|=|ε) | > (<|=|ε) | =
string ::= " (a|b|c ... |x|y|z|A|B|C ... |X|Y|Z|digit)* "
Even though I initially started implementing the parser based on this grammar, later on I took some liberty in a few things like non-char variable names, omitting RUN, CLEAR and making END optional.
The parser is where all the magic takes place. It is a recursive descent parser (I have to mention that I have never done a compilers course so I didn’t look into the type of grammar before writing the parser) followed by evaluation. Before any parsing or evaluation takes place, there is a pass where all the labels are stored in an array where the index points to the line number (starts from 0). Not fancy, but gets the job done.
The symbol table is basically an array of structs.
typedef struct Symbols{
char *variable;
int type;
int value;
} Symbol;
Variables can store only ints for now. The symbol table is searched using linear search with the variable identifier as the key. Maps would have been really useful but I didn’t want to increase the number of lines any further. My eventual plan is to port it to work with NodeMCU/Arduino and when I get to it I might implement a map and make the code more efficient. For now, it just works.
Any assignment to a variable should be preceded by LET, even if the variable is already declared/initialized. Labels are optional and no syntax for loops yet, one has to make do with GOTO or GOSUB paired up with an IF condition for exiting the loop.
Stars in TB
I remember this being one of my first programs ever on a variant of Basic called RobotBasic.
PRINT "How many stars do you want?"
LET stars = 0
INPUT stars
10 IF stars == 0 THEN END
PRINT "*"
LET stars = stars - 1
GOTO 10
Fibonacci in TB
REM This code prints n fibonacci numbers
LET a = 0
LET b = 1
LET n = 10
REM This is how loops work for now
100 PRINT a
IF n == 0 THEN END
LET t = b
LET b = a + b
LET a = t
LET n = n - 1
GOTO 100
Infamous Hello World
10 PRINT "Hello World!"
GOTO 10
You can find the code for the interpreter here.