This repository implements a compiler for the SysY language, compiling SysY language code into RISC-V assembly code. The SysY language is a compact version of the C language.
Reference Document: https://pku-minic.github.io/online-doc/#/
cd sysy-make-template
cmake -DCMAKE_BUILD_TYPE=Debug -B build
cmake --build buildCMake will generate an executable named compiler in the build directory.
The compiler fundamentally provides the following functionalities:
- Parse SysY into an abstract syntax tree (AST).
- Parse the AST to generate Intermediate Representation (IR).
- Parse IR to generate target code.
The compiler consists of 6 main modules:
astis the abstract syntax tree,GlobalCounteris a singleton global counter,IR2RISCVconverts intermediate code to target code,koopaUtilitycontains some auxiliary functions for debugging,SymbolTableis the symbol table,processIRandprocessRISCVare additional processing after generating intermediate and target code.
The most core data structure of this compiler is the design of the abstract syntax tree (see source code for details). Moreover, a data structure SymbolTable was designed to accomplish the recursive query of the symbol table.
// Base class for all AST nodes
class BaseAST {
public:
string storeNum; // Indicates which temporary variable holds the value of this expression
virtual ~BaseAST() = default;
virtual string DumpKoopa() {
...
};
};
class CompUnitAST : public BaseAST {
public:
unique_ptr<BaseAST> func_def;
...
}
// Other AST classes follow a similar structure, using unique_ptr<BaseAST> to uniformly store pointers to child nodes and dynamic_cast for type conversion when necessary.For if...else... statements, due to ambiguity issues, the following syntax was designed:
Stmt
: NonIfStmt {...}
| MatchedStmt {...}
| UnmatchedStmt {...}
NonIfStmt
: RETURN Exp ';' {...}
...
MatchedStmt
: IF '(' Exp ')' MatchedStmt ELSE MatchedStmt {...}
...
UnmatchedStmt
: IF '(' Exp ')' Stmt {...}
...Additionally, for convenience in code writing and to avoid repetition, certain codes were encapsulated into functions.
Initially, the symbol table was designed as a singleton like the global counter but was later revised. The first version was a single, globally unified symbol table singleton. The second version is as follows:
enum class SymbolTableElementType {
INT = 0
};
extern SymbolTable* symbolTableNow; // Current symbol table
class SymbolTable{
public:
SymbolTable* father;
unordered_map<string,string> symbolTable; // Symbol table mapping symbol names to temporary variable names
bool insert(string name, string element){
// Insert name and element into the current symbol table
}
bool change(string name, string element){...}
string find(string name){
// If found in the current table, return. Otherwise, recursively search the father table.
}
}
// Functions for managing symbol tablesThe global symbolTableNow pointer indicates the current symbol table. Entering a new scope creates a new symbol table linked back to the previous one, simulating nested scopes.
The strategy for register allocation is to use the stack. Additionally, a loadRegister function was implemented for convenience:
// Generates assembly code to load a value into a register
string loadRegister(koopa_raw_value_t value, string target){
...
}The most important realization was that compiling is not just translating C++ into assembly. It's about converting one structured thing into another structured thing. The compiler is a bridge between the high-level language and the machine language, and the key is to design a good intermediate representation.