This was a Spring 2021 semester project for the University of Athens Compilers course.
Written in Java, it accepts a MiniJava file, performs Semantic checking and generates an equivalent LLVM-IR file that can be compiled to an executable binary by Clang.
- make
- Java (any version >= 8 will do the job)
- JavaCC 5, along with a University of Athens patched version of JTB 1.3.2, is used for parsing and parse tree generation. (The `.jar` files for both are included, but can be found here as well.)
To build:
- In the project root, run `cd MiniJavaLLVMCompiler`.
- Run `make`.
To compile one or more files, run `java Main <file> <rest files>*`.
To clean up all generated files when done, run `make clean`.
- The program checks the input MiniJava file for semantic errors. If an error is found, it is reported and compilation of the current file stops, so no further errors are reported.
- If no semantic errors are detected, the program generates the equivalent LLVM-IR `.ll` file.
No variable initialization checks are made; all created variables and arrays are zero-initialized at runtime. The compiler tries to mirror JVM behaviour for array out-of-bounds accesses: every index access is checked at runtime and, if it is out of bounds, the error is reported and execution stops.
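The runtime bounds check described above could be emitted along the following lines. This is only a sketch under stated assumptions, not the project's actual generator: it assumes an `int[]` is an `i32*` whose first slot holds the array length, and it calls a hypothetical runtime helper `@throw_oob` that reports the error and exits. The class, method, register and label names are all made up for illustration.

```java
// Sketch: emits LLVM-IR text that guards one array access.
// Assumptions (not confirmed by this README): the array length lives in
// the first i32 slot of the array, and @throw_oob is a hypothetical
// helper that reports the error and terminates the program.
class BoundsCheckSketch {
    private final StringBuilder out = new StringBuilder();
    private int counter = 0;

    void emitBoundsCheck(String arrayReg, String indexReg) {
        int id = counter++;
        String len = "%_len" + id;
        String inBounds = "%_inb" + id;
        // Load the length from the (assumed) header slot.
        out.append("\t" + len + " = load i32, i32* " + arrayReg + "\n");
        // An unsigned comparison also rejects negative indices.
        out.append("\t" + inBounds + " = icmp ult i32 " + indexReg + ", " + len + "\n");
        out.append("\tbr i1 " + inBounds + ", label %oob_ok_" + id
                + ", label %oob_err_" + id + "\n");
        out.append("oob_err_" + id + ":\n");
        out.append("\tcall void @throw_oob()\n");
        out.append("\tbr label %oob_ok_" + id + "\n");
        out.append("oob_ok_" + id + ":\n");
    }

    String ir() {
        return out.toString();
    }
}
```

Calling `emitBoundsCheck("%arr", "%i")` before each element load or store would place the access itself in the `oob_ok_N` block.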
This program uses the Visitor pattern. Four visitors run in the following order:
- ClassNameCollector: Collects and stores all class names declared in the file.
- Declaration Collector: Collects and stores all fields and methods declared in classes.
- FunctionBodyAnalyzer: Performs static checking in method bodies, taking into account all the information stored up to that point. If an error is detected, it is reported and the compilation of the current target file is aborted.
- IRGenerator: Creates the output LLVM-IR file.
- A `ClassInfo` object stores the class name, superclass, fields, methods and offsets. The visitors handle a `Map` structure containing one such object for each class in the file.
- Class fields are also stored in a `Map` structure containing `FieldInfo` objects. Each `FieldInfo` has its respective offset and a `VariableInfo` object, which is essentially a pair of `name` and `type` strings.
- Class methods are stored in a `Map` structure as well, which consists of `MethodInfo` objects. Each `MethodInfo` has its respective offset and a `FunctionInfo` object, which contains the method `name`, return `type` and parameter types.
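The relationships between these objects can be sketched as plain Java classes. The class names come from the description above, but every field name, constructor and the choice of `LinkedHashMap` are assumptions made for illustration; the real classes may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// A (name, type) pair of strings.
class VariableInfo {
    final String name;
    final String type;
    VariableInfo(String name, String type) { this.name = name; this.type = type; }
}

// A field: its VariableInfo plus its offset.
class FieldInfo {
    final VariableInfo variable;
    final int offset;
    FieldInfo(VariableInfo variable, int offset) { this.variable = variable; this.offset = offset; }
}

// A method signature: name, return type and parameter types.
class FunctionInfo {
    final String name;
    final String returnType;
    final List<String> parameterTypes;
    FunctionInfo(String name, String returnType, List<String> parameterTypes) {
        this.name = name;
        this.returnType = returnType;
        this.parameterTypes = parameterTypes;
    }
}

// A method: its FunctionInfo plus its offset.
class MethodInfo {
    final FunctionInfo function;
    final int offset;
    MethodInfo(FunctionInfo function, int offset) { this.function = function; this.offset = offset; }
}

// Everything known about one class; superclass is null when there is none.
class ClassInfo {
    final String name;
    final String superclass;
    final Map<String, FieldInfo> fields = new LinkedHashMap<>();
    final Map<String, MethodInfo> methods = new LinkedHashMap<>();
    ClassInfo(String name, String superclass) { this.name = name; this.superclass = superclass; }
}
```

A `Map<String, ClassInfo>` keyed by class name then plays the role of the structure shared between the visitors.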
A `VirtualTable` object maps method names to method objects. Every class in the MiniJava file has an object of this type.
When a class extends a superclass, it provides the Virtual Table to the superclass (recursively),
to store its methods first, and the subclass methods are added afterwards. If a method overrides
a superclass method, it replaces it and obtains its offset.
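A minimal sketch of this fill order follows, assuming slot indices stand in for offsets and that class hierarchies are given as simple maps. `VirtualTableSketch`, `VTableEntry` and the `build` helper are hypothetical names, not the project's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One virtual table slot: which class provides the method, and where it sits.
class VTableEntry {
    final String owner;
    final String method;
    final int slot;
    VTableEntry(String owner, String method, int slot) {
        this.owner = owner;
        this.method = method;
        this.slot = slot;
    }
}

class VirtualTableSketch {
    private final Map<String, VTableEntry> entries = new LinkedHashMap<>();

    // An overriding method replaces the inherited entry and keeps its slot;
    // a new method takes the next free slot.
    void addMethod(String owner, String method) {
        VTableEntry inherited = entries.get(method);
        int slot = (inherited != null) ? inherited.slot : entries.size();
        entries.put(method, new VTableEntry(owner, method, slot));
    }

    VTableEntry lookup(String method) { return entries.get(method); }

    int size() { return entries.size(); }

    // Superclass methods are stored first (recursively), then the class's own.
    static VirtualTableSketch build(String cls, Map<String, String> superOf,
                                    Map<String, List<String>> methodsOf) {
        VirtualTableSketch table = new VirtualTableSketch();
        fill(cls, superOf, methodsOf, table);
        return table;
    }

    private static void fill(String cls, Map<String, String> superOf,
                             Map<String, List<String>> methodsOf, VirtualTableSketch table) {
        String parent = superOf.get(cls);
        if (parent != null) {
            fill(parent, superOf, methodsOf, table);
        }
        List<String> own = methodsOf.get(cls);
        if (own != null) {
            for (String m : own) {
                table.addMethod(cls, m);
            }
        }
    }
}
```

With `A` declaring `foo` and `bar`, and `B extends A` declaring `foo` and `baz`, the table built for `B` keeps `foo` in slot 0 (now owned by `B`), `bar` in slot 1 and places `baz` in slot 2.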
- The Symbol Table only stores local variables and class fields, since methods can be looked up in `ClassInfo` objects, and uses a Stack (`Deque`) of `ScopeSymbols`. When entering a class, all fields (including superclass fields) are pushed onto the Stack: if e.g. `class B extends A`, class `A` fields are pushed first, and class `B` fields are pushed afterwards. When analyzing a class `B` method, all parameters and local variables are pushed, and when done, they are popped. When we are done with the class, the fields are also popped.
- When an identifier name is looked up in the Symbol Table, we start searching the first `ScopeSymbols` object in the Stack and then search the rest until an entry associated with that name is found.
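The scope stack can be sketched as follows. It assumes each scope is a simple name-to-type map, so a plain `Map` stands in for `ScopeSymbols`; `SymbolTableSketch` and its method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SymbolTableSketch {
    // Each map stands in for one ScopeSymbols object (name -> type).
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    void enterScope() { scopes.push(new HashMap<String, String>()); }

    void exitScope() { scopes.pop(); }

    // Declares a name in the most recently pushed scope.
    void declare(String name, String type) { scopes.peek().put(name, type); }

    // ArrayDeque iterates from the most recently pushed scope to the oldest,
    // so locals shadow subclass fields, which shadow superclass fields.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null; // not declared in any enclosing scope
    }
}
```

For `class B extends A`, one scope would be pushed for the fields of `A`, one for the fields of `B`, and one more for each method of `B` while it is being analyzed.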
- Each `visit` method in this visitor (`FunctionBodyAnalyzer`) returns a `String`, which indicates the type of the evaluated expression, or `null` if the statement does not need to be evaluated (loops, declarations etc.).
The Visitor stores:
- A `Map` (`classInfos`), used by the previous Visitors to store Classes along with their fields and methods,
- A `currentClass` slot where the Class of the Method that is currently being visited is stored,
- A Symbol Table, used as described above.
- Each `visit` method in this visitor (`IRGenerator`) returns a `String`, which is the name of the register where the evaluated expression is stored, or `null` if this is not needed.
- The second argument is also a `String`, which serves as a slot for extra information passed to the `visit(Identifier, String)` method:
  - If the argument is `null`, the method returns the name of the identifier.
  - If the argument is `"lvalue"`, the method returns the stack register that refers to the variable (and essentially contains the variable address).
  - If the argument is `"rvalue"`, the variable of the above register is loaded into a new register, which contains the variable value, and this new register is returned.
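The three modes of the identifier visit can be illustrated with a small sketch. It only handles local variables, assumes each local lives in a stack register named `%<name>`, and uses a made-up `%_N` naming scheme for temporaries; `IdentifierSketch`, `localTypes` and `visitIdentifier` are hypothetical names, not the generator's real API.

```java
import java.util.HashMap;
import java.util.Map;

class IdentifierSketch {
    final StringBuilder out = new StringBuilder();
    // Hypothetical lookup: local variable name -> LLVM type of its value.
    final Map<String, String> localTypes = new HashMap<>();
    private int regCounter = 0;

    String visitIdentifier(String name, String mode) {
        if (mode == null) {
            return name; // plain identifier name
        }
        String address = "%" + name; // assumed stack register for the local
        if (mode.equals("lvalue")) {
            return address; // register holding the variable's address
        }
        if (mode.equals("rvalue")) {
            String type = localTypes.get(name);
            String value = "%_" + regCounter++;
            // Load the value behind the address into a fresh register.
            out.append("\t" + value + " = load " + type + ", " + type + "* " + address + "\n");
            return value;
        }
        throw new IllegalArgumentException("unknown mode: " + mode);
    }
}
```

An assignment would request the left-hand side as `"lvalue"` to get a store target, while an expression operand would request `"rvalue"` to get a register holding the value.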
The Visitor stores:
- The `classInfos` `Map` used by the previous visitors,
- A `currentClass` slot where the Class of the Method that is currently being visited is stored. This makes it possible to quickly determine the class type of the object stored in `%this`.
- A Symbol Table, used as described above,
- A `FileWriter` used to write to the output `.ll` file,
- Counters used for generating new register and label names,
- A `Map` (`objectRegisters`) which connects registers (which point to objects) to the Class of their object. This is useful in method calls (`MessageSend`s), to determine the class type of the caller object (and therefore decide which class `VirtualTable` to look up to get the method offset). Since it maps registers that are local to a function, it is cleared as soon as the function body has been generated.
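How `objectRegisters` might be used during a method call can be sketched as below. The per-class slot table is a simplified stand-in for the class `VirtualTable`, and all method names here (`declareSlot`, `recordObject`, `slotFor`, `endFunction`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ObjectRegistersSketch {
    // Register name -> class of the object the register points to.
    private final Map<String, String> objectRegisters = new HashMap<>();
    // Simplified stand-in for the virtual tables: class -> (method -> slot).
    private final Map<String, Map<String, Integer>> slotsByClass = new HashMap<>();

    void declareSlot(String cls, String method, int slot) {
        Map<String, Integer> slots = slotsByClass.get(cls);
        if (slots == null) {
            slots = new HashMap<>();
            slotsByClass.put(cls, slots);
        }
        slots.put(method, slot);
    }

    // Called whenever an expression yields an object in a register.
    void recordObject(String register, String cls) {
        objectRegisters.put(register, cls);
    }

    // For a call like <register>.method(...): find the caller's class,
    // then the method's slot in that class's table.
    int slotFor(String register, String method) {
        String cls = objectRegisters.get(register);
        if (cls == null) {
            throw new IllegalStateException("no class recorded for " + register);
        }
        Map<String, Integer> slots = slotsByClass.get(cls);
        Integer slot = (slots == null) ? null : slots.get(method);
        if (slot == null) {
            throw new IllegalStateException(cls + " has no method " + method);
        }
        return slot;
    }

    boolean knows(String register) { return objectRegisters.containsKey(register); }

    // Registers are local to a function, so the map is cleared afterwards.
    void endFunction() { objectRegisters.clear(); }
}
```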
Developed and tested in WSL Ubuntu 20.04, using Visual Studio Code.
- `javacc5.jar` and `jtb132di.jar` files were used for JavaCC and JTB respectively.
- Java SE-14 was used in development & testing.
- Clang 10.0.0 was used for compiling and executing `.ll` files produced by the Generator.