Minishell is a robust POSIX-compatible command interpreter implementation, developed entirely in C as part of 42 School's advanced curriculum.
This project represents a significant milestone for me @LordMikkel and my partner @David-dbd: it is our first large-scale collaborative development (10,000+ lines of personal code), designed to delve deep into Unix system architecture, process management, and compiler theory. This shell has been engineered with a focus on user experience (UX), memory stability and optimization, and a scalable AST-based architecture.
git clone https://github.com/LordMikkel/Minishell.git
cd Minishell
make
./minishell
- System Overview & Dependencies
- Architecture & Parsing Logic
- Core Components: The AST & Execution
- Features & Capabilities
- Memory Management Strategy
- Testing & Quality Control
- Contributing & Roadmap
- Credits
The core has been built to operate with efficiency comparable to lightweight production shells, minimizing external dependencies and maximizing control over system resources.
- Libft (Custom Standard Library): We use our own library (a prerequisite from earlier in the curriculum) that reimplements essential C functions: not just string handling, but also custom implementations of printf, file I/O, and memory management. It forms the foundation on which the shell operates.
- Isocline (Modified Fork): We chose isocline as the command-line engine but performed a strategic fork to adapt it to the needs of a real shell.
  - Leak Correction: We detected and patched an existing memory leak in the original library to ensure impeccable memory usage.
  - Signal Compatibility: We modified the internal behavior so that operating system signal reception interacts correctly with our shell's lifecycle, emulating the native behavior of readline.
The data flow follows a strict modular design inspired by modern compiler theory.
User input is processed and converted into a dynamic array of tokens (reallocated as needed). Unlike a simple split, our lexer types each element for the parser:
Token Types: WORD, COMMAND, OR (||), AND (&&), SEMICOLON (;), REDIR_INPUT (<), REDIR_OUTPUT (>), REDIR_APPEND (>>), REDIR_HEREDOC (<<), ASIGNATION (VAR=val), EXPANSION ($VAR), SUBSHELL.
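As a rough illustration of a typed lexer backed by a growable array, here is a minimal sketch. It is not the project's code: the names `t_tok_type`, `t_token_vec`, `vec_push` and `tokenize` are hypothetical, only a subset of the token types is covered, and quotes, assignments, expansions and subshells are left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds: a subset of the types listed above. */
typedef enum e_tok_type
{
	TOK_WORD, TOK_PIPE, TOK_OR, TOK_AND, TOK_SEMICOLON,
	TOK_REDIR_INPUT, TOK_REDIR_OUTPUT, TOK_REDIR_APPEND, TOK_REDIR_HEREDOC
}	t_tok_type;

typedef struct s_token { t_tok_type type; char *text; }	t_token;
typedef struct s_token_vec { t_token *items; size_t len; size_t cap; }	t_token_vec;

/* Appends a copy of s[0..n) as one token, doubling the capacity when full. */
static int	vec_push(t_token_vec *v, t_tok_type type, const char *s, size_t n)
{
	char	*copy;

	if (v->len == v->cap)
	{
		size_t	cap = v->cap ? v->cap * 2 : 8;
		t_token	*grown = realloc(v->items, cap * sizeof(*grown));

		if (!grown)
			return (-1);
		v->items = grown;
		v->cap = cap;
	}
	copy = malloc(n + 1);
	if (!copy)
		return (-1);
	memcpy(copy, s, n);
	copy[n] = '\0';
	v->items[v->len].type = type;
	v->items[v->len].text = copy;
	v->len++;
	return (0);
}

/* Returns 1 and fills type and n if s starts with a known operator. */
static int	match_operator(const char *s, t_tok_type *type, size_t *n)
{
	/* Two-character operators come first so "||" is not read as "|" twice. */
	static const struct { const char *op; t_tok_type type; }	ops[] = {
		{"||", TOK_OR}, {"&&", TOK_AND}, {">>", TOK_REDIR_APPEND},
		{"<<", TOK_REDIR_HEREDOC}, {"|", TOK_PIPE}, {";", TOK_SEMICOLON},
		{"<", TOK_REDIR_INPUT}, {">", TOK_REDIR_OUTPUT}
	};

	for (size_t i = 0; i < sizeof(ops) / sizeof(ops[0]); i++)
	{
		size_t	len = strlen(ops[i].op);

		if (strncmp(s, ops[i].op, len) == 0)
		{
			*type = ops[i].type;
			*n = len;
			return (1);
		}
	}
	return (0);
}

/* Splits line into typed tokens. Returns 0 on success, -1 on allocation failure. */
int	tokenize(const char *line, t_token_vec *out)
{
	t_tok_type	type;
	size_t		n;

	while (*line)
	{
		if (isspace((unsigned char)*line))
		{
			line++;
			continue ;
		}
		if (!match_operator(line, &type, &n))
		{
			type = TOK_WORD;
			n = 0;
			while (line[n] && !isspace((unsigned char)line[n])
				&& !strchr("|&;<>", line[n]))
				n++;
			if (n == 0)
				n = 1; /* a lone '&' is kept as a one-character word here */
		}
		if (vec_push(out, type, line, n) != 0)
			return (-1);
		line += n;
	}
	return (0);
}

/* Releases every token and resets the vector for the next prompt. */
void	free_tokens(t_token_vec *v)
{
	for (size_t i = 0; i < v->len; i++)
		free(v->items[i].text);
	free(v->items);
	v->items = NULL;
	v->len = 0;
	v->cap = 0;
}
```

Starting from a zero-initialized `t_token_vec`, calling `tokenize` on a line such as `ls -l | wc >> out && echo ok` fills the array with typed entries that a parser can inspect without re-scanning the raw string.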
We implemented a hybrid expansion strategy:
- Safe Expansion: Expansions that don't alter the command structure are resolved early.
- Late Binding: For variables that could mutate or affect execution, we wait until just before execution to expand them, ensuring data integrity.
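To make the late-binding idea concrete, the sketch below expands `$NAME` references against a variable table at the moment it is called, so the same word can yield different results if a variable changed in between. The table type `t_var` and the function `expand_word` are hypothetical stand-ins, not the shell's real environment list or expander, and quoting rules are ignored.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable table, terminated by an entry whose name is NULL. */
typedef struct s_var { const char *name; const char *value; }	t_var;

/* Looks up name[0..n) in vars; unset variables expand to the empty string. */
static const char	*lookup(const t_var *vars, const char *name, size_t n)
{
	for (; vars->name; vars++)
		if (strlen(vars->name) == n && strncmp(vars->name, name, n) == 0)
			return (vars->value);
	return ("");
}

/* Appends n bytes of src to the heap buffer *buf, keeping it NUL-terminated. */
static int	append(char **buf, size_t *len, const char *src, size_t n)
{
	char	*grown = realloc(*buf, *len + n + 1);

	if (!grown)
		return (-1);
	memcpy(grown + *len, src, n);
	*len += n;
	grown[*len] = '\0';
	*buf = grown;
	return (0);
}

/* Returns a new string with every $NAME replaced by its current value.
   A '$' not followed by a letter or '_' is copied literally. Caller frees. */
char	*expand_word(const char *word, const t_var *vars)
{
	char	*out = NULL;
	size_t	len = 0;

	if (append(&out, &len, "", 0) != 0)
		return (NULL);
	while (*word)
	{
		size_t		n = 1;
		const char	*piece = word;
		size_t		piece_len = 1;

		if (*word == '$' && (isalpha((unsigned char)word[1]) || word[1] == '_'))
		{
			while (isalnum((unsigned char)word[n]) || word[n] == '_')
				n++;
			piece = lookup(vars, word + 1, n - 1);
			piece_len = strlen(piece);
		}
		if (append(&out, &len, piece, piece_len) != 0)
		{
			free(out);
			return (NULL);
		}
		word += n;
	}
	return (out);
}
```

Calling `expand_word` right before a command runs, rather than at parse time, means an earlier command in the same line can still change the value that a later command sees.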
We employ a Recursive Descent Parser algorithm supported by Backtracking.
Logic: The parser attempts to build a branch of the linked node tree based on grammatical rules. If it encounters a syntactic inconsistency, the algorithm "backtracks" to try an alternative rule, similar to how programming language compilers operate.
To handle complex command combinations properly, our Recursive Descent Parser follows a strict priority hierarchy (from lowest to highest binding power). This ensures that operators are grouped correctly without ambiguity:
- Sequence Level (;): The lowest priority. Splits independent commands.
- Logical Level (&&, ||): Handles conditional execution based on the previous exit code.
- Pipeline Level (|): Connects the output of one process to the input of the next.
- Subshell Level (( )): Detected parentheses trigger a recursive call to the sequence parser within a forked process.
- Command Level: Grouping of words, variables, and redirections into an actionable command.
This hierarchical approach allows constructs like (ls | wc) && echo "done"; sleep 5 to be parsed naturally into a coherent tree.
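The precedence ladder can be sketched as one function per level, each calling the next-tighter level. The code below is an illustrative simplification, not the project's parser: it works on an already-split array of strings, uses one token of lookahead instead of backtracking, ignores redirections, and all names (`t_parser`, `parse_binary`, `dump`, and so on) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct s_node
{
	const char		*op;   /* ";", "&&", "||", "|", "sub", or NULL for a command */
	const char		**argv; /* command words (points into the token array) */
	int				argc;
	struct s_node	*left;
	struct s_node	*right;
}	t_node;

typedef struct s_parser { const char **tok; int pos; int count; }	t_parser;

static t_node	*parse_sequence(t_parser *p);

static const char	*peek(t_parser *p)
{
	return (p->pos < p->count ? p->tok[p->pos] : NULL);
}

static int	is_op(const char *t)
{
	return (!strcmp(t, ";") || !strcmp(t, "&&") || !strcmp(t, "||")
		|| !strcmp(t, "|") || !strcmp(t, "(") || !strcmp(t, ")"));
}

void	free_tree(t_node *n)
{
	if (!n)
		return ;
	free_tree(n->left);
	free_tree(n->right);
	free(n);
}

/* Allocates a node; on failure frees the children so nothing leaks. */
static t_node	*new_node(const char *op, t_node *left, t_node *right)
{
	t_node	*n = calloc(1, sizeof(*n));

	if (!n)
		return (free_tree(left), free_tree(right), NULL);
	n->op = op;
	n->left = left;
	n->right = right;
	return (n);
}

/* Command level: either "( sequence )" or a run of plain words. */
static t_node	*parse_command(t_parser *p)
{
	t_node	*n;

	if (peek(p) && !strcmp(peek(p), "("))
	{
		p->pos++;
		n = parse_sequence(p);
		if (!n || !peek(p) || strcmp(peek(p), ")"))
			return (free_tree(n), NULL);
		p->pos++;
		return (new_node("sub", n, NULL));
	}
	if (!peek(p) || is_op(peek(p)))
		return (NULL);
	n = new_node(NULL, NULL, NULL);
	if (!n)
		return (NULL);
	n->argv = &p->tok[p->pos];
	while (peek(p) && !is_op(peek(p)))
	{
		p->pos++;
		n->argc++;
	}
	return (n);
}

/* Shared shape of the three binary levels: next (op next)*, left-associative. */
static t_node	*parse_binary(t_parser *p, const char *const *ops,
		t_node *(*next)(t_parser *))
{
	t_node	*left = next(p);

	while (left && peek(p))
	{
		const char	*op = NULL;
		t_node		*right;

		for (int i = 0; ops[i]; i++)
			if (!strcmp(peek(p), ops[i]))
				op = ops[i];
		if (!op)
			break ;
		p->pos++;
		right = next(p);
		if (!right)
			return (free_tree(left), NULL);
		left = new_node(op, left, right);
	}
	return (left);
}

static t_node	*parse_pipeline(t_parser *p)
{
	static const char *const	ops[] = {"|", NULL};

	return (parse_binary(p, ops, parse_command));
}

static t_node	*parse_logical(t_parser *p)
{
	static const char *const	ops[] = {"&&", "||", NULL};

	return (parse_binary(p, ops, parse_pipeline));
}

static t_node	*parse_sequence(t_parser *p)
{
	static const char *const	ops[] = {";", NULL};

	return (parse_binary(p, ops, parse_logical));
}

/* Entry point: returns NULL on a syntax error or leftover tokens. */
t_node	*parse(const char **tok, int count)
{
	t_parser	p = {tok, 0, count};
	t_node		*root = parse_sequence(&p);

	if (root && p.pos != count)
		return (free_tree(root), NULL);
	return (root);
}

/* Appends a prefix-notation dump of the tree to out (caller sizes the buffer). */
void	dump(const t_node *n, char *out)
{
	if (!n->op)
	{
		strcat(out, "[");
		for (int i = 0; i < n->argc; i++)
		{
			if (i)
				strcat(out, " ");
			strcat(out, n->argv[i]);
		}
		strcat(out, "]");
		return ;
	}
	strcat(out, "(");
	strcat(out, n->op);
	strcat(out, " ");
	dump(n->left, out);
	if (n->right)
	{
		strcat(out, " ");
		dump(n->right, out);
	}
	strcat(out, ")");
}
```

Because each level only ever calls the level below it, `;` ends up at the root and `|` ends up closest to the commands, which is the grouping described in the list above.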
Here is a graphical example that may make this easier to understand:
User Input ──▶ "(ls | wc) && echo done; sleep 5"
                    │
                    ▼
Tokenizer  ──▶ [(] [ls] [|] [wc] [)] [&&] [echo] [done] [;] [sleep] [5]
                    │
                    ▼
Parser     ──▶            [ NODE: SEMICOLON (;) ]  ◀── (AST Root)
                              ╱              ╲
                             ╱                ╲
                 [ NODE: AND (&&) ]       [ NODE: COMMAND ]
                     ╱        ╲                   │
                    ╱          ╲              "sleep 5"
      [ NODE: SUBSHELL ]   [ NODE: COMMAND ]
               │                   │
      [ NODE: PIPE (|) ]      "echo done"
           ╱       ╲
          ╱         ╲
    [ COMMAND ]  [ COMMAND ]
         │            │
       "ls"         "wc"
The data structure is not linear; it's a binary tree representing the logical hierarchy of the command.
- Intermediate Nodes: Represent flow control (AND, OR, SEMICOLON, PIPE) and redirections (> file is treated as a node modifying output).
- Subshells: Parentheses generate subshell nodes that encapsulate a complete sub-tree.
- Leaves: The final commands (COMMAND) reside at the tree's leaves.
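One possible shape for such a node is sketched below. The field layout and the helper names (`node_new`, `node_free`, `count_leaves`) are assumptions made for illustration; the project's actual `t_node` may carry different fields.

```c
#include <stdlib.h>

typedef enum e_node_type
{
	NODE_COMMAND, NODE_PIPE, NODE_AND, NODE_OR,
	NODE_SEMICOLON, NODE_SUBSHELL, NODE_REDIR_OUTPUT
}	t_node_type;

typedef struct s_node
{
	t_node_type		type;
	struct s_node	*left;   /* only child for SUBSHELL and redirection nodes */
	struct s_node	*right;  /* NULL for unary nodes and leaves */
	const char		*text;   /* command line for leaves, file name for redirections */
}	t_node;

/* Frees a whole sub-tree (the text pointers are not owned by the nodes here). */
void	node_free(t_node *n)
{
	if (!n)
		return ;
	node_free(n->left);
	node_free(n->right);
	free(n);
}

/* Builds one node; on allocation failure the children are released too. */
t_node	*node_new(t_node_type type, t_node *left, t_node *right, const char *text)
{
	t_node	*n = malloc(sizeof(*n));

	if (!n)
	{
		node_free(left);
		node_free(right);
		return (NULL);
	}
	n->type = type;
	n->left = left;
	n->right = right;
	n->text = text;
	return (n);
}

/* Counts COMMAND leaves, showing that commands only appear at the bottom. */
int	count_leaves(const t_node *n)
{
	if (!n)
		return (0);
	if (n->type == NODE_COMMAND)
		return (1);
	return (count_leaves(n->left) + count_leaves(n->right));
}
```

With this layout, `(ls | wc) && echo done; sleep 5` becomes a SEMICOLON root whose left branch is an AND node and whose four leaves are the individual commands.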
This video, What Is An Abstract Syntax Tree?, can help you understand the key concept.
The execution engine traverses the AST using a Depth-First Search (DFS) strategy:
- Sequences: The semicolon (;) is treated as a separator allowing independent sequences within the same logical line.
- Conditional Logic: Evaluates AND/OR nodes based on the exit status of the previous branch.
- Isolation: Upon encountering a SUBSHELL node, the process forks to protect the parent environment.
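The traversal rules for sequences and conditionals can be illustrated with a small recursive evaluator. This is a sketch under simplifying assumptions: leaves do not fork or exec anything, they just report a preset exit status through a hypothetical `run_leaf` stub and log their name, and pipes and subshells are left out.

```c
#include <stdio.h>
#include <string.h>

typedef enum e_node_type
{
	NODE_COMMAND, NODE_SEMICOLON, NODE_AND, NODE_OR
}	t_node_type;

typedef struct s_node
{
	t_node_type			type;
	const struct s_node	*left;
	const struct s_node	*right;
	const char			*name;   /* leaf only */
	int					status;  /* leaf only: exit status the stub reports */
}	t_node;

/* Records which commands ran, in order, separated by spaces. */
typedef struct s_trace { char ran[128]; }	t_trace;

/* Stand-in for fork/execve: logs the command and returns its preset status. */
static int	run_leaf(const t_node *cmd, t_trace *trace)
{
	size_t	used = strlen(trace->ran);

	snprintf(trace->ran + used, sizeof(trace->ran) - used, "%s ", cmd->name);
	return (cmd->status);
}

/* Depth-first evaluation: the left branch always runs before the right one. */
int	execute(const t_node *node, t_trace *trace)
{
	int	status;

	if (node->type == NODE_COMMAND)
		return (run_leaf(node, trace));
	status = execute(node->left, trace);
	if (node->type == NODE_AND && status != 0)
		return (status); /* && skips the right side after a failure */
	if (node->type == NODE_OR && status == 0)
		return (status); /* || skips the right side after a success */
	if (!node->right)
		return (status); /* e.g. a sequence with nothing after ';' */
	return (execute(node->right, trace));
}
```

For `false && echo; pwd`, the AND node returns the failure of `false` without running `echo`, and the SEMICOLON node then runs `pwd` regardless, so the line's final status is that of `pwd`.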
The entire shell state is managed through a main t_shell data structure passed by reference. This structure contains all allocated information necessary for the shell to operate, both on the stack and the heap.
- Signal Handling: Adhering strictly to POSIX and project norms, we use a single global variable exclusively for the reception and transmission of operating system signals.
     .───────────────────────────────────────────────────────────────────────.
     |                     MINISHELL SYSTEM ARCHITECTURE                     |
     |                "Recursive Abstract Syntax Tree Engine"                |
     `───────────────────────────────────────────────────────────────────────`
                                         │
                                         ▼
     ╔═══════════════════════════════════════════════════════════════════════╗
     ║ 1. SYSTEM BOOTSTRAP (Initialization Phase)                            ║
     ║    [Concept: State Persistence & Environment Loading]                 ║
     ║                                                                       ║
     ║ • Global Setup: Init and allocate main shell structures (heap).       ║
     ║ • Signal Config: Configures SIGINT/SIGQUIT (Parent Mode).             ║
     ║ • Environment Vectorization: Transforms char** envp to Linked List.   ║
     ╚═══════════════════════════════════╦═══════════════════════════════════╝
                                         ║
     .───────────────────────────────────▼───────────────────────────────────.
┌───>|                       THE INFINITE RUNTIME LOOP                       |
│    |                (Input Cycle -> Evaluation -> Feedback)                |
│    `───────────────────────────────────┬───────────────────────────────────´
│                                        │
│    ┌───────────────────────────────────▼───────────────────────────────────┐
│    │ 2. INPUT INTERFACE (Blocking I/O)                                     │
│    │    [Concept: Interactive Readline Wrapper]                            │
│    │                                                                       │
│    │ • Displays prompt (User@Host) and blocks waiting for STDIN.           │
│    │ • Captures raw string buffer (char *input).                           │
│    │ • Intercepts Signals (Ctrl+C) to redraw prompt if needed.             │
│    └───────────────────────────────────┬───────────────────────────────────┘
│                                        │ Raw Input Stream
│                                        ▼
│    ┌───────────────────────────────────────────────────────────────────────┐
│    │ 3. LEXICAL ANALYSIS (Tokenizer)                                       │
│    │    [Concept: Linear Atomization]                                      │
│    │                                                                       │
│    │ • Scans raw string char-by-char.                                      │
│    │ • Generates a linear ARRAY of classified Tokens.                      │
│    │ • Types: WORD, PIPE, REDIR_IN, REDIR_OUT, etc.                        │
│    │ • Syntax check: validates user input against the grammar rules.       │
│    └───────────────────────────────────┬───────────────────────────────────┘
│                                        │ t_token *list
│                                        ▼
│    ┌───────────────────────────────────────────────────────────────────────┐
│    │ 4. SYNTACTIC ANALYSIS (AST Builder)                                   │
│    │    [Concept: Tree Construction & Node Conversion]                     │
│    │                                                                       │
│    │ • Consumes the Token List using Grammar Rules (Precedence).           │
│    │ • Converts Tokens into specific Tree Nodes (t_node).                  │
│    │ • Structures the hierarchy: Pipe -> Command -> Properties -> Args     │
│    └───────────────────────────────────┬───────────────────────────────────┘
│                                        │ t_node *ast_root
│                                        ▼
│    ┌───────────────────────────────────────────────────────────────────────┐
│    │ 5. RECURSIVE EXECUTION KERNEL (The Core)                              │
│    │    [Concept: Depth-First Traversal & Process Orchestration]           │
│    │                                                                       │
│    │ Traverses the AST branches. Logic depends on Node Type:               │
│    │                                                                       │
│    │ [A] SEQUENCE NODE ( ; )                                               │
│    │     ↳ Unconditional Separation. Executes Left, then Right (if any).   │
│    │                                                                       │
│    │ [B] LOGICAL NODES ( &&, || )                                          │
│    │     ↳ Conditional. Checks Left's exit code before executing Right.    │
│    │                                                                       │
│    │ [C] PIPELINE NODE ( | )                                               │
│    │     ↳ Manages IPC. Forks Writer (Left) and Reader (Right).            │
│    │                                                                       │
│    │ [D] ISOLATION NODE ( Subshells ( ) )                                  │
│    │     ↳ Forks a generic child to protect Parent Env from mutations.     │
│    │                                                                       │
│    │ [E] LEAF NODE ( Commands / Redirs )                                   │
│    │     ↳ Applies Redirections (dup2) -> Expands $VAR -> Execs.           │
│    └───────────────────────────────────┬───────────────────────────────────┘
│                                        │ Exit Status
│                                        ▼
│    ┌───────────────────────────────────────────────────────────────────────┐
│    │ 6. CYCLE SANITIZATION (Transient Memory Reset)                        │
│    │    [Concept: Garbage Collection]                                      │
│    │                                                                       │
│    │ • Recursively frees the AST Nodes (commands, args, structure).        │
│    │ • Frees the token array and Raw Input string.                         │
│    │ • Closes transient FDs (pipes/files) but keeps History/Env alive.     │
│    └───────────────────────────────────┬───────────────────────────────────┘
│                                        │
│            (Loop continues)            │
└────────────────────────────────────────┤
                                         │
                                         ▼
                              [ Signal: EOF / Exit ]
                                         │
                                         ▼
     ._______________________________________________________________________.
     |                                                                       |
     |                       SYSTEM SHUTDOWN SEQUENCE                        |
     |    ───────────────────────────────────────────────────────────────    |
     |                                                                       |
     |    1. USER NOTIFICATION → Prints "exit" / goodbye message.            |
     |    2. HEAP TEARDOWN     → Frees global environment & main struct.     |
     |    3. KERNEL RETURN     → Exits process with final status code.       |
     |                                                                       |
     |                           [ SYSTEM HALTED ]                           |
     |_______________________________________________________________________|
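The isolation step marked [D] in the diagram relies on the fact that a forked child works on its own copy of the process memory. The following sketch shows that effect in isolation; `run_in_subshell` and `mutate_state` are hypothetical names, and a real subshell would run the sub-tree's executor as the body instead of a toy callback.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs body(ctx) in a forked child and returns its exit status (0-255),
   128 + signal number if it was killed, or -1 if fork or waitpid failed.
   Whatever the body changes in memory stays in the child. */
int	run_in_subshell(int (*body)(void *), void *ctx)
{
	pid_t	pid = fork();
	int		wstatus;

	if (pid < 0)
		return (-1);
	if (pid == 0)
		_exit(body(ctx) & 0xFF); /* the child never returns to the caller */
	if (waitpid(pid, &wstatus, 0) < 0)
		return (-1);
	if (WIFEXITED(wstatus))
		return (WEXITSTATUS(wstatus));
	if (WIFSIGNALED(wstatus))
		return (128 + WTERMSIG(wstatus));
	return (-1);
}

/* Toy body: mutates "shell state" (an int here) and exits with status 3. */
int	mutate_state(void *ctx)
{
	*(int *)ctx = 42;
	return (3);
}
```

After `run_in_subshell(mutate_state, &value)` returns, the parent's `value` is unchanged while the child's exit status is still available, which is the same reason `(cd /tmp)` does not move the parent shell.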
We went beyond academic requirements to create a robust and pleasant-to-use tool:
- Script Execution: The shell supports non-interactive mode, allowing it to interpret and execute command files directly (e.g., ./minishell script.sh), acting as a functional script interpreter.
- Advanced Redirections: Full support for file descriptor manipulation (2>), giving the user granular control over standard error and output streams.
- Background Processes (&): Implementation of asynchronous execution. The shell correctly forks and detaches processes to the background, allowing the user to continue interacting with the prompt immediately.
- Sequential Logic (;): Support for the semicolon separator allows chaining multiple independent commands in a single line, executing them sequentially regardless of the previous exit status.
- Line Continuation: The shell intelligently detects incomplete commands (unclosed quotes, pipes, parentheses) and provides a secondary > prompt, allowing users to complete multi-line statements naturally.
- Local Assignments: Support for VAR=value command syntax (the variable exists only for that command scope) and temporary assignments in the current shell.
- Persistent History: The shell creates and manages a physical history file, allowing command retrieval across different sessions.
- Command Auto-Correction: Implemented a "Did you mean?" heuristic for builtins. If a user mistypes a command (e.g., exprot instead of export), the shell intelligently suggests the intended correction.
- Case-Insensitive Execution: Enhanced flexibility by allowing commands and builtins to be recognized regardless of case (e.g., ECHO, Ls, or pwd all execute correctly), streamlining the user experience.
- Smart Welcome & Analytics: On startup, the shell detects the user and greets based on the time of day. Upon exit, it provides session analytics (total active time) with a farewell message.
- Custom Prompt: A modern, highly aesthetic prompt that displays:
  - Current directory (PWD).
  - User and Hostname.
  - Harmonized color theme.
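The README does not say which heuristic powers the "Did you mean?" suggestion. One plausible approach, shown here purely as a sketch, is a case-insensitive Levenshtein distance against the builtin names. The builtin list, the distance threshold, and the names `edit_distance` and `suggest_builtin` are all assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 32

static size_t	min3(size_t a, size_t b, size_t c)
{
	size_t	m = a < b ? a : b;

	return (m < c ? m : c);
}

/* Case-insensitive Levenshtein distance using two rolling rows.
   Words longer than MAX_WORD are treated as infinitely far apart. */
size_t	edit_distance(const char *a, const char *b)
{
	size_t	la = strlen(a);
	size_t	lb = strlen(b);
	size_t	prev[MAX_WORD + 1];
	size_t	cur[MAX_WORD + 1];

	if (la > MAX_WORD || lb > MAX_WORD)
		return ((size_t)-1);
	for (size_t j = 0; j <= lb; j++)
		prev[j] = j;
	for (size_t i = 1; i <= la; i++)
	{
		cur[0] = i;
		for (size_t j = 1; j <= lb; j++)
		{
			size_t	cost = tolower((unsigned char)a[i - 1])
				!= tolower((unsigned char)b[j - 1]);

			cur[j] = min3(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
		}
		memcpy(prev, cur, (lb + 1) * sizeof(*cur));
	}
	return (prev[lb]);
}

/* Returns the closest builtin within distance 2, or NULL when nothing is
   close enough. Exact (case-insensitive) matches are not typos: NULL. */
const char	*suggest_builtin(const char *typed)
{
	/* Assumed builtin set; the real shell may expose more. */
	static const char *const	builtins[] = {
		"echo", "cd", "pwd", "export", "unset", "env", "exit", NULL
	};
	const char					*best = NULL;
	size_t						best_d = 3;
	size_t						typed_len = strlen(typed);

	for (int i = 0; builtins[i]; i++)
	{
		size_t	d = edit_distance(typed, builtins[i]);

		if (d == 0)
			return (NULL);
		/* Require d < strlen(typed) so short words like "ls" match nothing. */
		if (d < best_d && d < typed_len)
		{
			best_d = d;
			best = builtins[i];
		}
	}
	return (best);
}
```

A plain Levenshtein distance counts a swapped pair of letters as two edits, which is why the threshold is set to 2: that is just enough for `exprot` to reach `export`.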
Since minishell is a long-running process, memory leaks are unacceptable. We implemented a two-tier cleaning strategy:
The first tier runs after every command line, inside the main loop:
- Restores FDs: Resets STDIN/STDOUT if redirections altered them.
- Prunes AST: Recursively frees the entire Syntax Tree nodes.
- Wipes Token List: Frees the dynamic array of tokens for the next prompt.
The second tier runs only on exit or fatal error:
- Free All: Frees the environment variable linked list, history descriptors, AST nodes, the token array, and internal shell configuration.
- Result: 0 leaks reachable at exit (validated with Valgrind).
To ensure system robustness, we subjected our Minishell to intensive automated testing using the community-standard 42_minishell_tester. Our customized testing regimen covered everything from basic command execution to edge cases like signal handling, complex pipe chains, and memory leak detection via Valgrind.
The implementation successfully passed all validation criteria:
- Mandatory Tests: +2566 mandatory tests.
- Bonus Tests: +201 bonus tests.
We welcome contributions from the community to help push Minishell closer to full POSIX compliance. Whether you are looking to fix bugs or implement new features, your pull requests are welcome. Our current roadmap prioritizes the following enhancements:
- Command Substitution: Implementation of $(...) and backticks logic.
- Alias Management: Support for alias and unalias builtins.
- Scripting Functions: Ability to define and execute custom shell functions.
- Stability: Reporting edge-case bugs and memory leaks.
I'm Mikel Garrido @LordMikkel, a student at 42 Barcelona. I always try to make the simplest but most robust implementation in all my projects.
This project was developed in collaboration with my partner @David-dbd. For additional insights and project details, you can also view his personal README in the forked repository: Minishell.




