Project Title: ADLexSyn
Introduction: ADLexSyn explores lexical analysis, syntax analysis, and symbol table management in the context of programming language processing. It demonstrates how to implement the front-end components needed to compile or interpret a programming language: a lexer, a parser, and three kinds of symbol tables (unordered, ordered, and hash). Each component plays a part in translating source code written in a high-level programming language.
Components:
- Lexer (Lexical Analyzer):
  - The lexer is responsible for scanning the source code and converting it into a series of tokens. Tokens are the atomic units of meaning, such as keywords, identifiers, operators, and literals.
  - The lexer uses regular expressions to identify and categorize these tokens. For instance, it can recognize keywords like "print", operators like "+", and identifiers like variable names.
  - The primary goal of the lexer is to simplify the syntax analysis phase by providing a structured sequence of tokens.
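The regular-expression approach described above can be sketched roughly as follows. This is a minimal illustration, not ADLexSyn's actual lexer: the token categories, the `TOKEN_SPEC` table, the `KEYWORDS` set, and the `tokenize` function are all hypothetical names chosen for the example.

```python
import re

# Hypothetical token categories; the project's real token set may differ.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+(?:\.\d+)?"),   # integer or decimal literal
    ("STRING",     r'"[^"\n]*"'),       # double-quoted string literal
    ("IDENTIFIER", r"[A-Za-z_]\w*"),    # names (keywords are split out below)
    ("OPERATOR",   r"[+\-*/=]"),        # single-character operators
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("SKIP",       r"\s+"),             # whitespace, discarded
    ("MISMATCH",   r"."),               # any other character is an error
]
KEYWORDS = {"print"}  # assumed keyword list

# Combine all patterns into one regex with a named group per category.
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Return a list of (kind, text) tuples for the given source string."""
    tokens = []
    for match in MASTER_PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"Unexpected character {text!r} at position {match.start()}"
            )
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens

if __name__ == "__main__":
    for token in tokenize("total = 3 + 4"):
        print(token)
```

Listing the patterns in priority order and ending with a catch-all `MISMATCH` pattern means every character of the input is either classified or reported, which is one common way to keep a small lexer from silently skipping bad input.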
- Parser (Syntax Analyzer):
  - The parser takes the tokens generated by the lexer and arranges them into a hierarchical structure called a parse tree or syntax tree. This tree represents the grammatical structure of the source code.
  - The parser checks if the source code follows the predefined grammar rules of the programming language. If the code violates these rules, the parser will generate syntax errors.
  - ADLexSyn's parser handles basic statements such as print statements and assignment statements, ensuring that they conform to the language's syntax.
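To make the parsing step concrete, here is a small recursive-descent sketch for print and assignment statements. The grammar, the `Parser` class, and the nested-tuple tree shape are assumptions made for illustration; the project's own grammar and tree representation may well differ. The input is a list of `(kind, text)` token tuples, which is also an assumed format.

```python
# Assumed grammar (hypothetical, for illustration only):
#   statement  := "print" expression | IDENTIFIER "=" expression
#   expression := term (("+" | "-") term)*
#   term       := NUMBER | IDENTIFIER

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, text) tuples
        self.pos = 0

    def peek(self):
        """Return the current token, or an EOF marker past the end."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, kind, text=None):
        """Consume the current token if it matches, else raise SyntaxError."""
        tok_kind, tok_text = self.peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            wanted = text if text is not None else kind
            raise SyntaxError(
                f"Expected {wanted!r} but found {tok_text or tok_kind!r}"
            )
        self.pos += 1
        return tok_text

    def parse_statement(self):
        kind, text = self.peek()
        if kind == "KEYWORD" and text == "print":
            self.expect("KEYWORD", "print")
            node = ("print", self.parse_expression())
        elif kind == "IDENTIFIER":
            name = self.expect("IDENTIFIER")
            self.expect("OPERATOR", "=")
            node = ("assign", name, self.parse_expression())
        else:
            raise SyntaxError(f"Unexpected token {text or kind!r}")
        self.expect("EOF")  # no trailing tokens allowed
        return node

    def parse_expression(self):
        node = self.parse_term()
        # Left-associative chain of + and - operators.
        while self.peek() in (("OPERATOR", "+"), ("OPERATOR", "-")):
            op = self.expect("OPERATOR")
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        kind, text = self.peek()
        if kind in ("NUMBER", "IDENTIFIER"):
            self.pos += 1
            return (kind.lower(), text)
        raise SyntaxError(f"Expected a number or identifier, found {text or kind!r}")

if __name__ == "__main__":
    tokens = [("IDENTIFIER", "x"), ("OPERATOR", "="), ("NUMBER", "3"),
              ("OPERATOR", "+"), ("NUMBER", "4")]
    print(Parser(tokens).parse_statement())
```

Each grammar rule maps to one method, so a violation of a rule surfaces as a `SyntaxError` at the point where the expected token is missing.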
- Symbol Tables:
  - Symbol tables are data structures used to store information about identifiers (e.g., variable names, function names) encountered in the source code.
  - Unordered Symbol Table: This table stores identifiers in no particular order, which keeps it simple to build, although a lookup may have to scan every entry.
  - Ordered Symbol Table: This table stores identifiers in sorted order, typically alphabetical, which allows lookups to use binary search rather than a full scan.
  - Hash Symbol Table: This table uses a hash function to map identifiers to specific slots in a table, so lookups take roughly constant time on average.
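The three table types can be compared side by side with a short sketch. The class names, the `insert`/`lookup` methods, the character-sum hash function, and the chained buckets are all assumptions for illustration and are not taken from ADLexSyn's source.

```python
import bisect

class UnorderedSymbolTable:
    """Entries kept in insertion order; lookup is a linear scan."""
    def __init__(self):
        self.entries = []  # list of (name, info) pairs

    def insert(self, name, info):
        for i, (existing, _) in enumerate(self.entries):
            if existing == name:
                self.entries[i] = (name, info)  # update existing entry
                return
        self.entries.append((name, info))

    def lookup(self, name):
        for existing, info in self.entries:
            if existing == name:
                return info
        return None

class OrderedSymbolTable:
    """Names kept sorted; lookup uses binary search."""
    def __init__(self):
        self.names = []  # sorted identifier names
        self.infos = []  # info for names[i] is stored at infos[i]

    def insert(self, name, info):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            self.infos[i] = info
        else:
            self.names.insert(i, name)
            self.infos.insert(i, info)

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.infos[i]
        return None

class HashSymbolTable:
    """Fixed number of buckets; collisions are chained within a bucket."""
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _slot(self, name):
        # Hypothetical hash: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in name) % len(self.buckets)

    def insert(self, name, info):
        bucket = self.buckets[self._slot(name)]
        for i, (existing, _) in enumerate(bucket):
            if existing == name:
                bucket[i] = (name, info)
                return
        bucket.append((name, info))

    def lookup(self, name):
        for existing, info in self.buckets[self._slot(name)]:
            if existing == name:
                return info
        return None

if __name__ == "__main__":
    for table in (UnorderedSymbolTable(), OrderedSymbolTable(), HashSymbolTable()):
        for name in ("total", "count", "x"):
            table.insert(name, {"type": "variable"})
        print(type(table).__name__, table.lookup("count"))
```

All three classes expose the same two methods, so the trade-off is purely in how entries are stored: the unordered table is the least code, the ordered table pays for sorted insertion to get faster searches, and the hash table trades a fixed array of buckets for short scans.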
Purpose: The primary purpose of ADLexSyn is to illustrate the core processes involved in the early stages of program compilation or interpretation. By breaking down the source code into tokens, constructing a syntax tree, and managing identifier information through symbol tables, ADLexSyn showcases the crucial steps that underpin any programming language's functionality. This project serves as an educational tool for understanding how high-level code is translated into a form that a computer can execute.
Getting Started:
- Clone the Repository:
  git clone https://github.com/AhmedDiaa0212/ADLexSyn.git
  cd ADLexSyn
- Install Dependencies: Ensure you have Python installed. Then install the necessary Python packages:
  pip install anytree tabulate
- Run the Project: To execute the project and see the output, run:
  python main.py
Usage:
- Lexer: Tokenizes the input source code into a list of tokens.
- Parser: Constructs a syntax tree from the tokens and verifies the code's syntactical correctness.
- Symbol Tables: Builds and prints different types of symbol tables (unordered, ordered, and hash).
Example:
You can test the project with your own source code by modifying the test.txt file. The output will be saved in output.txt.
Course Context: This project is developed as part of a course on compiler design, aiming to provide practical insights and hands-on experience with the key components of a compiler.
Contributing: Contributions to improve ADLexSyn are welcome. Please fork the repository and submit a pull request with your enhancements. Thank you for exploring ADLexSyn! If you have any questions or feedback, feel free to open an issue or get in touch.
Happy coding!