
Commit 8490b30

Architecture Docs V1 (#792)
1 parent 3553788 commit 8490b30


10 files changed, with 504 additions and 15 deletions


architecture/2. parsing/B. AST Construction.md

Lines changed: 1 addition & 1 deletion
## Next Step

After the AST is constructed, the system moves on to [Directory Parsing](./C.%20Directory%20Parsing.md) to build a hierarchical representation of the codebase's directory structure.
architecture/2. parsing/C. Directory Parsing.md

Lines changed: 50 additions & 0 deletions

# Directory Parsing

The Directory Parsing system creates and maintains an in-memory, hierarchical representation of the codebase's directory structure. Directories do not hold references to the file objects themselves; instead, they store the file names and look the files up dynamically when needed.

In addition to providing a more cohesive API for listing directory files, the Directory API is also used for [TSConfig](../3.%20imports-exports/C.%20TSConfig.md)-based [Import Resolution](../3.%20imports-exports/A.%20Imports.md).

## Core Components

The directory tree is constructed during the initial `build_graph` step in `codebase_context.py` and is recreated from scratch on every re-sync. More details are below.

## Directory Tree Construction

The directory tree is built through the following process:

1. The `build_directory_tree` method in `CodebaseContext` is called during graph initialization or when the codebase structure changes.
1. The method iterates through all files in the repository, creating directory objects for each directory path encountered.
1. For each file, it adds the file to its parent directory using the `_add_file` method.
1. Directories are created recursively as needed using the `get_directory` method with `create_on_missing=True`.
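
The four construction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SDK's code: `DirectoryTree` is a hypothetical stand-in for `CodebaseContext`, the `Directory` class is a simplified stand-in for the real one, paths are POSIX-style and relative to the repository root (with `.` as the root), and directories keep only file names, matching the name-based lookup described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(eq=False)
class Directory:
    """Hypothetical, simplified stand-in for the SDK's `Directory` class."""

    path: str
    parent: Directory | None = field(default=None, repr=False)
    # Only file *names* are stored; file objects would be looked up on demand.
    file_names: set[str] = field(default_factory=set)
    subdirectories: dict[str, Directory] = field(default_factory=dict, repr=False)

    def _add_file(self, name: str) -> None:
        self.file_names.add(name)


class DirectoryTree:
    """Hypothetical helper playing the role `CodebaseContext` plays in the SDK."""

    def __init__(self) -> None:
        self.directories: dict[str, Directory] = {}

    def get_directory(self, path: str, create_on_missing: bool = False) -> Directory | None:
        key = str(PurePosixPath(path))
        if key in self.directories or not create_on_missing:
            return self.directories.get(key)
        # Create missing ancestors first so every directory is linked to its parent.
        parent = None
        if key != ".":
            parent = self.get_directory(str(PurePosixPath(key).parent), create_on_missing=True)
        directory = Directory(path=key, parent=parent)
        if parent is not None:
            parent.subdirectories[PurePosixPath(key).name] = directory
        self.directories[key] = directory
        return directory

    def build_directory_tree(self, file_paths: list[str]) -> None:
        # Rebuilt from scratch on every call, mirroring the re-sync behavior.
        self.directories = {}
        self.get_directory(".", create_on_missing=True)
        for file_path in file_paths:
            p = PurePosixPath(file_path)
            parent = self.get_directory(str(p.parent), create_on_missing=True)
            parent._add_file(p.name)


tree = DirectoryTree()
tree.build_directory_tree(["src/app.py", "src/utils/helpers.py", "README.md"])
print(sorted(tree.get_directory("src").subdirectories))  # ['utils']
print(tree.get_directory("src/utils").file_names)  # {'helpers.py'}
```

Creating ancestors inside `get_directory` keeps the loop in `build_directory_tree` trivial: each file only needs its immediate parent, and the recursion fills in everything above it.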
## Directory Representation

The `Directory` class provides a rich interface for working with directories:

- **Hierarchy Navigation**: Access parent directories and subdirectories
- **File Access**: Retrieve files by name or extension
- **Symbol Access**: Find symbols (classes, functions, etc.) within files in the directory
- **Directory Operations**: Rename, remove, or update directories

Each `Directory` instance has:

- A reference to its parent directory
- Lists of files and subdirectories
- Methods to recursively traverse the directory tree

## File Representation

Files are represented by the `File` class and its subclasses:

- `File`: Base class for all files, supporting basic operations like reading and writing content
- `SourceFile`: Specialized class for source code files that can be parsed into an AST

Files maintain references to:

- Their parent directory
- Their content (loaded dynamically to preserve the source of truth)
- For source files, the parsed AST and symbols

## Next Step

After the directory structure is parsed, the system can perform [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files.
architecture/3. imports-exports/A. Imports.md

Lines changed: 55 additions & 2 deletions

# Import Resolution

Import resolution follows AST construction and directory parsing in the code analysis pipeline. It identifies dependencies between modules and builds a graph of relationships across the codebase.

> NOTE: This is an actively evolving part of the Codegen SDK, so some details here may be incomplete, outdated, or incorrect.

## Purpose

The import resolution system serves these purposes:

1. **Dependency Tracking**: Maps relationships between files by resolving import statements.
1. **Symbol Resolution**: Connects imported symbols to their definitions.
1. **Module Graph Construction**: Builds a directed graph of module dependencies.
1. **(WIP) Cross-Language Support**: Provides implementations for different programming languages.

## Core Components

### ImportResolution Class

The `ImportResolution` class represents the outcome of resolving an import statement. It contains:

- The source file containing the imported symbol
- The specific symbol being imported (if applicable)
- Whether the import references an entire file/module

### Import Base Class

The `Import` class is the foundation for language-specific import implementations. It:

- Stores metadata about the import (module path, symbol name, alias)
- Provides the abstract `resolve_import()` method
- Adds symbol resolution edges to the codebase graph

### Language-Specific Implementations

#### Python Import Resolution

The `PyImport` class extends the base `Import` class with Python-specific logic:

- Handles relative imports
- Supports module imports, named imports, and wildcard imports
- Resolves imports using configurable resolution paths and `sys.path`
- Handles special cases like `__init__.py` files
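
A minimal sketch of the path-to-file step for Python imports follows, assuming the repository's files are known as a set of POSIX paths relative to the root. The function `resolve_python_module` and its `search_roots` parameter (standing in for configurable resolution paths and `sys.path`) are hypothetical; named and wildcard imports are out of scope here.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def resolve_python_module(
    module: str,
    importing_file: str,
    known_files: set[str],
    search_roots: tuple[str, ...] = (".",),
) -> str | None:
    """Hypothetical helper: map a (possibly relative) module path to a file."""
    # Relative imports: one leading dot is the importing file's package,
    # and each additional dot climbs one level higher.
    level = len(module) - len(module.lstrip("."))
    parts = [p for p in module[level:].split(".") if p]
    if level:
        base = PurePosixPath(importing_file).parent
        for _ in range(level - 1):
            base = base.parent
        bases = [base]
    else:
        # Absolute imports are tried against each root in order, in the spirit of sys.path.
        bases = [PurePosixPath(root) for root in search_roots]

    for base in bases:
        candidate = base.joinpath(*parts)
        # A package directory (`__init__.py`) is checked before a same-named `.py` module.
        options = [candidate / "__init__.py"]
        if parts:
            options.append(candidate.with_suffix(".py"))
        for option in options:
            if str(option) in known_files:
                return str(option)
    return None
```

For example, `from ..core import x` inside `pkg/sub/m.py` arrives here as module `..core` and resolves to `pkg/core.py` when that file exists.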
#### TypeScript Import Resolution

The `TSImport` class implements TypeScript-specific resolution:

- Supports named imports, default imports, and namespace imports
- Handles type imports and dynamic imports
- Resolves imports using TSConfig path mappings
- Supports file extension resolution
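
The sketch below illustrates file extension resolution for relative TypeScript specifiers. The probe order in `TS_EXTENSIONS` and the `index` fallback are assumptions modeled on common TypeScript module resolution behavior, not the SDK's exact rules, and `resolve_ts_module` is a hypothetical name. Non-relative specifiers are left to TSConfig path mapping.

```python
from __future__ import annotations

import posixpath

# Assumed probe order for illustration; the SDK's actual list may differ.
TS_EXTENSIONS = (".ts", ".tsx", ".d.ts", ".js", ".jsx")


def resolve_ts_module(specifier: str, importing_file: str, known_files: set[str]) -> str | None:
    """Hypothetical helper: resolve a relative TS specifier to a known file."""
    if not specifier.startswith("."):
        return None  # bare specifiers go through TSConfig path mapping instead
    base = posixpath.normpath(posixpath.join(posixpath.dirname(importing_file), specifier))
    candidates = [base]  # exact match, e.g. "./a.ts"
    candidates += [base + ext for ext in TS_EXTENSIONS]  # "./a" -> "./a.ts", ...
    candidates += [posixpath.join(base, "index" + ext) for ext in TS_EXTENSIONS]  # "./dir" -> "./dir/index.ts"
    for candidate in candidates:
        if candidate in known_files:
            return candidate
    return None
```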
## Implementation

After files and directories are parsed, the system loops through all import nodes and calls `add_symbol_resolution_edge` on each. This invokes the language-specific `resolve_import` method, which converts the import statement into an `ImportResolution` object (or `None` if the import cannot be resolved). The import and its `ImportResolution` are then used to add a symbol resolution edge to the graph, which later steps use to resolve symbols.
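
The control flow described above can be sketched as follows. Everything here is a simplified, hypothetical stand-in: `ImportResolution` and `Import` only borrow the SDK's class names, `ToyPyImport` is a toy resolver (`a.b` maps to `a/b.py`, nothing else), and the "graph" is just a list of `(import, resolution)` edges.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportResolution:
    """Simplified stand-in: what an import points to once resolved."""

    from_file: str  # file containing the imported symbol
    symbol: str | None = None  # the specific symbol, if any
    imports_file: bool = False  # True when the whole file/module is imported


class Import(ABC):
    """Simplified stand-in for the language-agnostic import base class."""

    def __init__(self, file: str, module: str, symbol_name: str | None = None) -> None:
        self.file = file
        self.module = module
        self.symbol_name = symbol_name

    @abstractmethod
    def resolve_import(self, known_files: set[str]) -> ImportResolution | None:
        """Language-specific; returns None when the import cannot be resolved."""

    def add_symbol_resolution_edge(self, edges: list, known_files: set[str]) -> None:
        resolution = self.resolve_import(known_files)
        if resolution is not None:
            # The edge links this import to whatever it resolved to.
            edges.append((self, resolution))


class ToyPyImport(Import):
    """Toy resolver for illustration only: `a.b` -> `a/b.py`."""

    def resolve_import(self, known_files: set[str]) -> ImportResolution | None:
        path = self.module.replace(".", "/") + ".py"
        if path not in known_files:
            return None
        return ImportResolution(path, self.symbol_name, imports_file=self.symbol_name is None)


def resolve_all_imports(imports: list[Import], known_files: set[str]) -> list:
    edges: list = []
    for imp in imports:
        imp.add_symbol_resolution_edge(edges, known_files)
    return edges
```

Unresolvable imports (for example, a third-party package that is not in the repository) simply produce no edge, which matches the `None` case described above.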
## Next Step

After import resolution, the system performs [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. This is followed by [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md).

architecture/3. imports-exports/B. Exports.md

Lines changed: 69 additions & 1 deletion
# Export Analysis

Some languages attach additional metadata to "exported" symbols, specifying which symbols are made available to other modules. Export analysis follows import resolution in the code analysis pipeline. It identifies and processes the symbols each module exports, enabling the system to track what each module makes available to others.

## Core Components

### Export Base Class

The `Export` class serves as the foundation for language-specific export implementations. It:

- Stores metadata about the export (symbol name, whether it is a default export, etc.)
- Tracks the relationship between the export and its declared symbol
- Adds export edges to the codebase graph

### TypeScript Export Implementation

The `TSExport` class implements TypeScript-specific export handling:

- Supports various export styles (named exports, default exports, re-exports)
- Handles export declarations with and without values
- Processes wildcard exports (`export * from 'module'`)
- Manages export statements with multiple exports

#### Export Types and Symbol Resolution

The TypeScript implementation handles several types of exports:

1. **Declaration Exports**

   - Function declarations (including generators)
   - Class declarations
   - Interface declarations
   - Type alias declarations
   - Enum declarations
   - Namespace declarations
   - Variable/constant declarations

1. **Value Exports**

   - Object literals with property exports
   - Arrow functions and function expressions
   - Classes and class expressions
   - Assignment expressions
   - Primitive values and expressions

1. **Special Export Forms**

   - Wildcard exports (`export * from 'module'`)
   - Named re-exports (`export { name as alias } from 'module'`)
   - Default exports with various value types

#### Symbol Tracking and Dependencies

The export system:

- Maintains relationships between exported symbols and their declarations
- Validates that export names match their declared symbols
- Tracks dependencies through the codebase graph
- Handles complex scenarios like:
  - Shorthand property exports in objects
  - Nested function and class declarations
  - Re-exports from other modules
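
To illustrate how re-exports can be followed back to a declaration, here is a minimal sketch over a hypothetical per-module export table; `Module` and `resolve_export` are illustrative names, not the SDK's `TSExport` API. It handles local exports, named re-exports with aliases, and `export *` (which, per ECMAScript semantics, does not forward a module's default export), and it guards against re-export cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical per-file export table (not the SDK's TSExport API)."""

    # export name -> local declaration name, e.g. `export { foo as bar }` => {"bar": "foo"}
    local_exports: dict[str, str] = field(default_factory=dict)
    # export name -> (source module, name there), e.g. `export { x as y } from './m'`
    re_exports: dict[str, tuple[str, str]] = field(default_factory=dict)
    # modules re-exported wholesale via `export * from './m'`
    wildcard_sources: list[str] = field(default_factory=list)


def resolve_export(
    modules: dict[str, Module],
    module: str,
    name: str,
    _seen: frozenset[tuple[str, str]] = frozenset(),
) -> tuple[str, str] | None:
    """Follow re-exports until reaching (declaring module, local symbol name)."""
    key = (module, name)
    if key in _seen or module not in modules:
        return None  # re-export cycle, or a module outside the table
    seen = _seen | {key}
    mod = modules[module]
    if name in mod.local_exports:
        return (module, mod.local_exports[name])
    if name in mod.re_exports:
        source, source_name = mod.re_exports[name]
        return resolve_export(modules, source, source_name, seen)
    # `export *` never forwards the default export.
    if name != "default":
        for source in mod.wildcard_sources:
            found = resolve_export(modules, source, name, seen)
            if found is not None:
                return found
    return None
```

For example, if `./index` contains `export { Button as PrimaryButton } from './button'` and `export * from './icons'`, resolving `PrimaryButton` from `./index` yields `("./button", "Button")`.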
#### Integration with Type System

Exports are tightly integrated with the type system:

- Exported type declarations are properly tracked
- Symbol resolution considers both value and type exports
- Re-exports preserve type information
- Export edges in the codebase graph maintain type relationships

## Next Step
architecture/3. imports-exports/C. TSConfig.md

Lines changed: 75 additions & 1 deletion
# TSConfig Support

TSConfig support is a critical component of the import resolution system for TypeScript projects. It processes TypeScript configuration files (`tsconfig.json`) to correctly resolve module paths and dependencies.

## Purpose

The TSConfig support system serves these purposes:

1. **Path Mapping**: Resolves custom module path aliases defined in the `tsconfig.json` file.
1. **Base URL Resolution**: Handles non-relative module imports using the `baseUrl` configuration.
1. **Project References**: Manages dependencies between TypeScript projects using the `references` field.
1. **Directory Structure**: Respects `rootDir` and `outDir` settings for maintaining proper directory structures.

## Core Components

### TSConfig Class

The `TSConfig` class represents a parsed TypeScript configuration file. It:

- Parses and stores the configuration settings from `tsconfig.json`
- Handles inheritance through the `extends` field
- Provides methods for translating between import paths and absolute file paths
- Caches computed values for performance

## Configuration Processing

### Configuration Inheritance

TSConfig files can extend other configuration files through the `extends` field:

1. Base configurations are loaded and parsed first
1. Child configurations inherit and can override settings from their parent
1. Path mappings, base URLs, and other settings are merged appropriately
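
As an illustration of inheritance, the sketch below merges an `extends` chain over in-memory config dictionaries keyed by POSIX path. It assumes `extends` values are relative paths that include the `.json` extension, merges `compilerOptions` option by option with the child winning, and rebases `baseUrl` onto the directory of the config that declares it, as TypeScript does. `merge_tsconfig` is a hypothetical helper, not the `TSConfig` class's API.

```python
from __future__ import annotations

import posixpath
from typing import Any


def merge_tsconfig(configs: dict[str, dict[str, Any]], path: str, _chain: tuple[str, ...] = ()) -> dict[str, Any]:
    """Hypothetical helper: resolve the `extends` chain for the config at `path`."""
    if path in _chain:
        raise ValueError("circular extends: " + " -> ".join(_chain + (path,)))
    config = configs[path]
    merged: dict[str, Any] = {"compilerOptions": {}}
    if "extends" in config:
        # `extends` is resolved relative to the directory of the extending file.
        parent = posixpath.normpath(posixpath.join(posixpath.dirname(path), config["extends"]))
        merged = merge_tsconfig(configs, parent, _chain + (path,))

    own_options = dict(config.get("compilerOptions", {}))
    if "baseUrl" in own_options:
        # baseUrl is relative to the config file that declares it.
        own_options["baseUrl"] = posixpath.normpath(
            posixpath.join(posixpath.dirname(path), own_options["baseUrl"])
        )

    # Top-level keys (e.g. "include") from the child replace the parent's;
    # compilerOptions merge option by option, with the child winning.
    result = {**merged, **{k: v for k, v in config.items() if k != "extends"}}
    result["compilerOptions"] = {**merged["compilerOptions"], **own_options}
    return result
```

Because the whole `paths` object is a single option in this merge, a child that sets `paths` replaces the parent's mapping rather than combining with it.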
### Path Mapping Resolution

The system processes the `paths` field in `tsconfig.json` to create a mapping between import aliases and file paths:

1. Path patterns are normalized (removing wildcards and trailing slashes)
1. Relative paths are converted to absolute paths
1. Mappings are stored for efficient lookup during import resolution
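
A minimal sketch of those three steps follows, assuming `paths` has already been read from the config and `base_url` is an absolute directory against which targets are resolved. `normalize_paths` is a hypothetical name, and real `paths` patterns (a `*` in the middle of a pattern, for instance) need more care than this prefix-based simplification.

```python
from __future__ import annotations

import posixpath


def normalize_paths(paths: dict[str, list[str]], base_url: str) -> dict[str, list[str]]:
    """Hypothetical helper: tsconfig `paths` -> alias prefix -> absolute targets."""
    normalized: dict[str, list[str]] = {}
    for pattern, targets in paths.items():
        # "@app/*" -> "@app": drop the trailing wildcard, then any trailing slash.
        alias = pattern.rstrip("*").rstrip("/")
        normalized[alias] = [
            # Relative targets are joined onto base_url and normalized ("./x" -> "x").
            posixpath.normpath(posixpath.join(base_url, target.rstrip("*").rstrip("/")))
            for target in targets
        ]
    return normalized
```

Storing the result as a plain dictionary keyed by alias prefix keeps lookups during import resolution cheap.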
### Project References

The `references` field defines dependencies between TypeScript projects:

1. Referenced projects are identified and loaded
1. Their configurations are analyzed to determine import paths
1. Import resolution can cross project boundaries using these references

## Import Resolution Process

### Path Translation

When resolving an import path in TypeScript:

1. Check if the path matches any path alias in `tsconfig.json`
1. If a match is found, translate the path according to the mapping
1. Apply `baseUrl` resolution for non-relative imports
1. Handle project references for cross-project imports
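
The first three translation steps can be sketched as below, assuming a prefix map shaped like the output of normalizing `paths` and a set of known absolute file paths. `translate_import` is hypothetical: it prefers the longest matching alias, then tries `baseUrl`, and it omits project references and uses a shortened extension list for brevity. It is not meant to reproduce TypeScript's exact fallback rules.

```python
from __future__ import annotations

import posixpath


def translate_import(
    specifier: str,
    path_map: dict[str, list[str]],
    base_url: str | None,
    known_files: set[str],
) -> str | None:
    """Hypothetical helper: resolve a non-relative import via aliases, then baseUrl."""
    if specifier.startswith("."):
        return None  # relative imports are resolved against the importing file
    candidates: list[str] = []
    # Steps 1-2: match the longest alias prefix and substitute its targets.
    for alias in sorted(path_map, key=len, reverse=True):
        if specifier == alias or specifier.startswith(alias + "/"):
            rest = specifier[len(alias):].lstrip("/")
            candidates += [posixpath.join(t, rest) if rest else t for t in path_map[alias]]
            break
    # Step 3: baseUrl resolution for non-relative imports.
    if base_url is not None:
        candidates.append(posixpath.join(base_url, specifier))
    for candidate in candidates:
        for suffix in ("", ".ts", ".tsx", "/index.ts"):  # shortened probe list
            if candidate + suffix in known_files:
                return candidate + suffix
    return None
```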
### Optimization Techniques

The system employs several optimizations:

1. Caching computed values to avoid redundant processing
1. Early path checking for common patterns (e.g., paths starting with `@` or `~`)
1. Hierarchical resolution that respects the configuration inheritance chain

## Integration with Import Resolution

The TSConfig support integrates with the broader import resolution system:

1. Each TypeScript file is associated with its nearest `tsconfig.json`
1. Import statements are processed using the file's associated configuration
1. Path mappings are applied during the module resolution process
1. Project references are considered when resolving imports across project boundaries
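
Associating a file with its nearest `tsconfig.json` amounts to walking up from the file's directory until a config is found. A minimal sketch, assuming config locations are known as a set of POSIX paths; `nearest_tsconfig` is a hypothetical helper name.

```python
from __future__ import annotations

import posixpath


def nearest_tsconfig(file_path: str, config_paths: set[str]) -> str | None:
    """Hypothetical helper: closest tsconfig.json at or above the file's directory."""
    directory = posixpath.dirname(file_path)
    while True:
        candidate = posixpath.join(directory, "tsconfig.json")
        if candidate in config_paths:
            return candidate
        if directory in ("", "/"):
            return None  # reached the top without finding a config
        directory = posixpath.dirname(directory)
```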
## Next Step
architecture/5. performing-edits/A. Edit Operations.md

Lines changed: 0 additions & 7 deletions
This file was deleted.
Lines changed: 54 additions & 0 deletions

# Transactions

Transactions represent atomic changes to files in the codebase. Each transaction defines a specific modification that can be queued, validated, and executed.

## Transaction Types

The transaction system is built around a base `Transaction` class with specialized subclasses:

### Content Transactions

- **RemoveTransaction**: Removes content between specified byte positions
- **InsertTransaction**: Inserts new content at a specified byte position
- **EditTransaction**: Replaces content between specified byte positions

### File Transactions

- **FileAddTransaction**: Creates a new file
- **FileRenameTransaction**: Renames an existing file
- **FileRemoveTransaction**: Deletes a file

## Transaction Priority

Transactions are executed in a specific order defined by the `TransactionPriority` enum:

1. **Remove** (highest priority)
1. **Edit**
1. **Insert**
1. **FileAdd**
1. **FileRename**
1. **FileRemove**

This ordering ensures that content is removed before editing or inserting, and that all content operations happen before file operations.
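
The sketch below shows why ordering matters for byte-level edits. It is a simplification under stated assumptions, not the SDK's implementation: a single hypothetical `ContentTransaction` type stands in for the three content transactions, the enum values are illustrative, and transactions are applied from the end of the file backwards (so earlier byte offsets stay valid) with `TransactionPriority` breaking ties at the same offset. Overlapping ranges are not handled.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class TransactionPriority(IntEnum):
    """Illustrative values: lower runs first, in the order listed above."""

    Remove = 0
    Edit = 1
    Insert = 2
    FileAdd = 3
    FileRename = 4
    FileRemove = 5


@dataclass
class ContentTransaction:
    """Hypothetical stand-in for Remove/Edit/InsertTransaction."""

    priority: TransactionPriority
    start_byte: int
    end_byte: int  # equal to start_byte for an insert
    new_content: bytes = b""  # empty for a remove

    def apply(self, content: bytes) -> bytes:
        # Splice at byte offsets; nothing is decoded, so encoding and line endings are untouched.
        return content[: self.start_byte] + self.new_content + content[self.end_byte :]


def apply_all(content: bytes, transactions: list[ContentTransaction]) -> bytes:
    # Work backwards through the file so earlier offsets stay valid; at the
    # same offset, the lower priority value (e.g. Remove before Insert) goes first.
    for t in sorted(transactions, key=lambda t: (-t.start_byte, t.priority)):
        content = t.apply(content)
    return content


src = b"def foo():\n    return 1\n"
result = apply_all(src, [
    ContentTransaction(TransactionPriority.Edit, 4, 7, b"bar"),
    ContentTransaction(TransactionPriority.Insert, 11, 11, b"    pass\n"),
    ContentTransaction(TransactionPriority.Remove, 11, 24),
])
# result == b"def bar():\n    pass\n"
```

Applying the insert before the remove at offset 11 would instead delete most of the newly inserted text, which is the kind of conflict the priority order avoids.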
## Key Concepts

### Byte-Level Operations

All content transactions operate at the byte level rather than on lines or characters. This provides precise control over modifications and allows transactions to work with any file type, regardless of encoding or line-ending conventions.

### Content Generation

Transactions support both static content (direct strings) and dynamic content (generated at execution time). This flexibility allows for complex transformations where the new content depends on the state of the codebase at execution time.

Most content transactions use static content, but dynamic content is supported for the rare cases where the new content depends on the state of other transactions. One common example is handling whitespace during add and remove transactions.
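
A minimal sketch of static versus dynamic content follows, assuming dynamic content is modeled as a zero-argument callable that is evaluated only when the transaction executes. `SimpleInsertTransaction` and the `removals_pending` list are hypothetical simplifications, not SDK names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union

# Static content is plain bytes; dynamic content is a zero-argument callable.
Content = Union[bytes, Callable[[], bytes]]


@dataclass
class SimpleInsertTransaction:
    """Hypothetical simplification of an insert transaction."""

    insert_byte: int
    new_content: Content

    def get_new_content(self) -> bytes:
        # Dynamic content is computed only now, at execution time.
        if callable(self.new_content):
            return self.new_content()
        return self.new_content

    def apply(self, content: bytes) -> bytes:
        return content[: self.insert_byte] + self.get_new_content() + content[self.insert_byte :]


imports = b"import os\n"
removals_pending: list[tuple[int, int]] = []  # hypothetical state filled by other transactions
# Whitespace decision: add a separating newline only if nothing nearby is being removed.
sep = SimpleInsertTransaction(len(imports), lambda: b"" if removals_pending else b"\n")
removals_pending.append((0, len(imports)))  # recorded after `sep` was created
print(sep.apply(imports))  # b'import os\n' (the callable sees the later removal)
```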
### File Operations

File transactions are used to create, rename, and delete files.

> NOTE: Most file transactions, such as `FileAddTransaction`, are no-ops (they skip the Transaction Manager) and are instead applied immediately when the `create_file` API is called. This makes created files immediately available for editing and use. File operations are still added to the Transaction Manager to help optimize graph re-parsing and diff generation, by keeping track of which files exist and which no longer do.

## Next Step

Once transactions are created, they are managed by the [Transaction Manager](./B.%20Transaction%20Manager.md) to ensure consistency and atomicity.
