diff --git a/.github/workflows/c-cpp.yml b/.github/workflows/c-cpp.yml
index 43bb3ff..2180c4f 100644
--- a/.github/workflows/c-cpp.yml
+++ b/.github/workflows/c-cpp.yml
@@ -17,5 +17,5 @@ jobs:
 ##      run: ./configure
     - name: make clean
       run: make clean
-    - name: make check
-      run: make check
+    - name: make test
+      run: make test
diff --git a/45gs02_optimization_guide.md b/45gs02_optimization_guide.md
new file mode 100644
index 0000000..0cd5478
--- /dev/null
+++ b/45gs02_optimization_guide.md
@@ -0,0 +1,124 @@
+# 45GS02 (MEGA65) Optimization Guide
+
+This guide details the specialized optimization techniques for the 45GS02 CPU found in the MEGA65, as implemented in `opt6502`. The 45GS02 offers several unique instructions and architectural features that allow significant speed and code-size improvements over the standard 6502/65C02.
+
+## Key Differences & Considerations for 45GS02
+
+- **STZ (Store Z Register)**: CRITICAL DIFFERENCE! On the 45GS02, `STZ` *stores the Z register*, not a zero value. This makes it useful for repeated stores of a specific byte. `opt6502` *never* converts `LDA #0` / `STA addr` to `STZ addr` for the 45GS02, because the stored value would then depend on whatever Z happens to hold.
+- **Z Register**: A general-purpose 8-bit register, similar to A, X, and Y.
+- **Q Register**: A 32-bit composite register `[Z:Y:X:A]`, with A as the low byte and Z as the high byte. Operations on Q affect all four underlying 8-bit registers.
+- **New Instructions**: `LDZ`, `STZ`, `NEG`, `ASR`, `LDQ`, `STQ`, `ADCQ`, `SBCQ`, `CMPQ`, `ASRQ`, `RORQ`, `ROLQ`, `INC16`, `DEC16`, `PHW`, `PLW`, `BRA` (branch always).
+- **Extended NOP**: `NOP #cycles` allows for precise multi-cycle delays.
+
+## 1. Z Register for Repeated Stores
+
+Uses the `LDZ` and `STZ` instructions to store the same value to multiple memory locations efficiently.
+
+**Pattern**: A repeated sequence of loading the same immediate value and storing it.
+
+**Before:**
+```asm
+    LDA #$20
+    STA $0400
+    LDA #$20
+    STA $0401
+    LDA #$20
+    STA $0402
+```
+
+**After:**
+```asm
+    LDZ #$20    ; Load Z register once
+    STZ $0400   ; Store from Z
+    STZ $0401   ; Store from Z
+    STZ $0402   ; Store from Z
+```
+**Benefit**: Saves 2 bytes and 2 cycles per store after the first (`LDA #imm` + `STA abs` is 5 bytes/6 cycles, while `STZ abs` is 3 bytes/4 cycles). The rewrite overwrites Z and no longer loads A, so it is only valid when the old value of Z is not needed afterwards and nothing later relies on A holding the stored byte. This makes it a worthwhile optimization for screen fills, memory initialization, and similar code.
+
+## 2. 32-bit Q Register Operations (`LDQ`, `STQ`, `ADCQ`, `SBCQ`, etc.)
+
+The Q register treats A, X, Y, and Z as a single 32-bit entity, which makes it a powerful tool for 32-bit operations. `opt6502` identifies sequences of 8-bit loads that form a 32-bit constant.
+
+**Pattern**: Four consecutive immediate loads into A, X, Y, and Z that form a 32-bit value `[Z:Y:X:A]`.
+
+**Before:**
+```asm
+    LDA #$AA    ; Low byte
+    LDX #$BB    ; Mid-low byte
+    LDY #$CC    ; Mid-high byte
+    LDZ #$DD    ; High byte
+```
+
+**After:**
+```asm
+    LDQ #$DDCCBBAA  ; Load 32-bit value into Q
+; OPT: LDX #$BB removed
+; OPT: LDY #$CC removed
+; OPT: LDZ #$DD removed
+```
+**Benefit**: Replaces four instructions (8 bytes and 8 cycles in total, at 2 bytes and 2 cycles each) with one `LDQ` instruction (5 bytes, 8 cycles). This saves 3 bytes and three instruction fetches when loading a 32-bit constant.
+
+## 3. NEG Instruction
+
+The 45GS02 has a dedicated `NEG` (negate accumulator) instruction, which is much more efficient than the traditional 6502 sequence.
+
+**Pattern**: `EOR #$FF`, `CLC`, `ADC #$01` (the 6502 way to form the two's complement of A: invert all bits, then add one).
+
+**Before:**
+```asm
+    EOR #$FF
+    CLC
+    ADC #$01
+```
+
+**After:**
+```asm
+    NEG A       ; Negate accumulator
+; OPT: CLC removed
+; OPT: ADC #$01 removed
+```
+**Benefit**: Saves 4 bytes and 5 cycles. The rewrite assumes the carry and overflow results produced by `ADC` are not used afterwards.
+
+## 4. ASR (Arithmetic Shift Right)
+
+The 45GS02 includes an `ASR` instruction, which performs an arithmetic shift right that preserves the sign bit. This is faster and smaller than the typical 6502 sequence.
+
+**Pattern**: `CMP #$80`, `ROR A` (a common 6502 sequence for a signed right shift).
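The `CMP #$80` / `ROR A` pair works because `CMP #$80` sets the carry exactly when bit 7 of A is set, and `ROR A` rotates that carry back into bit 7 while shifting right. Because the accumulator has only 256 possible values, the equivalence with a true arithmetic shift can be checked exhaustively. The Python sketch below is illustrative only and is not part of `opt6502`. The helper names `cmp80_ror` and `asr` are hypothetical. Only A and the carry are modeled, since N and Z follow directly from the result.

```python
def cmp80_ror(a: int) -> tuple:
    """Model CMP #$80 followed by ROR A on an 8-bit accumulator; returns (A, C)."""
    carry_in = 1 if a >= 0x80 else 0        # CMP #$80 sets C when A >= $80 (unsigned)
    result = (a >> 1) | (carry_in << 7)     # ROR A: the carry enters bit 7
    return result, a & 1                    # bit 0 shifted out becomes the new carry


def asr(a: int) -> tuple:
    """Reference arithmetic shift right of a two's-complement byte; returns (A, C)."""
    signed = a - 0x100 if a & 0x80 else a   # interpret A as a signed byte
    return (signed >> 1) & 0xFF, a & 1      # Python's >> is arithmetic for negative ints


# Compare both models over every possible accumulator value.
mismatches = [a for a in range(256) if cmp80_ror(a) != asr(a)]
print("mismatches:", mismatches)  # prints "mismatches: []"
```

An empty list confirms that replacing the pair with `ASR A` preserves both the result and the carry for every input.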
+
+**Before:**
+```asm
+    CMP #$80    ; Copy the sign bit into carry
+    ROR A       ; Shift right; carry enters bit 7
+```
+
+**After:**
+```asm
+    ASR A       ; Arithmetic Shift Right Accumulator
+; OPT: CMP #$80 removed
+```
+**Benefit**: Saves 2 bytes and 2 cycles.
+
+## 5. Extended NOP
+
+The 45GS02 `NOP` instruction can take an operand that specifies a delay in cycles, making it ideal for precise timing loops or for replacing runs of `NOP` instructions.
+
+**Pattern**: Multiple consecutive `NOP` instructions.
+
+**Before:**
+```asm
+    NOP
+    NOP
+    NOP
+    NOP
+```
+
+**After:**
+```asm
+    NOP #8      ; Four NOPs (2 cycles each) replaced with one 8-cycle NOP
+; OPT: NOP removed
+; OPT: NOP removed
+; OPT: NOP removed
+```
+**Benefit**: Saves bytes by consolidating multiple `NOP`s into a single instruction with a cycle count, while keeping the total delay the same. Each `NOP` takes 2 cycles, so `NOP #8` replaces four of them.
+
+This guide covers the major 45GS02-specific optimizations within `opt6502`. Using these features effectively can produce fast, compact code on the MEGA65.
diff --git a/6502_optimizations_guide.md b/6502_optimizations_guide.md
new file mode 100644
index 0000000..0ab6dda
--- /dev/null
+++ b/6502_optimizations_guide.md
@@ -0,0 +1,259 @@
+# 6502 Optimization Guide
+
+A practical guide to the 6502 assembly optimization techniques implemented in `opt6502`.
+
+This guide provides examples and explanations for each optimization category. The examples use a generic assembly syntax.
+
+## 1. Peephole Optimizations
+
+Peephole optimization examines a small "window" of instructions and replaces it with a shorter or faster equivalent sequence.
+
+### Redundant Load/Store
+
+**Pattern**: Storing a register to memory and immediately loading the same location back into the same register.
+
+**Before:**
+```asm
+    STA my_var
+    LDA my_var
+```
+
+**After:**
+```asm
+    STA my_var
+; OPT: LDA my_var removed
+```
+**Benefit**: Saves 3-4 cycles and 2-3 bytes (zero-page vs. absolute addressing), because the value is already in the accumulator. The `LDA` also sets the N and Z flags, so removing it is only safe when those flags are not needed, and when `my_var` is ordinary RAM rather than a hardware register.
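The store-then-reload rule above can be sketched as a single pass over a list of instructions. This is a minimal illustration, not `opt6502`'s actual implementation. It assumes a hypothetical representation in which each instruction is a `(mnemonic, operand)` tuple and labels would appear as separate entries. The function and table names are invented for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (mnemonic, operand) pairs.
Instr = Tuple[str, Optional[str]]

# Each store mnemonic paired with the load that reads back into the same register.
STORE_TO_LOAD = {"STA": "LDA", "STX": "LDX", "STY": "LDY"}


def remove_reload_after_store(code: List[Instr]) -> List[Instr]:
    """Drop a load that immediately re-reads the location just stored.

    Assumes the N/Z flags the load would set are not needed and that the
    operand is plain RAM. If labels are separate entries, the rule never
    fires across one, because the load no longer directly follows the store.
    """
    out: List[Instr] = []
    for mnemonic, operand in code:
        if out:
            prev_mnemonic, prev_operand = out[-1]
            if STORE_TO_LOAD.get(prev_mnemonic) == mnemonic and prev_operand == operand:
                continue  # register already holds this value: skip the load
        out.append((mnemonic, operand))
    return out


program = [("STA", "my_var"), ("LDA", "my_var"), ("STX", "count"), ("LDY", "count")]
print(remove_reload_after_store(program))
# [('STA', 'my_var'), ('STX', 'count'), ('LDY', 'count')]
```

The `LDY count` survives because it loads a different register than the one `STX` stored. Checking the register as well as the operand is what keeps the rule correct.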
+
+### Useless Transfers
+
+**Pattern**: Transferring a value from one register to another and straight back.
+
+**Before:**
+```asm
+    TAX         ; A -> X
+    TXA         ; X -> A
+```
+
+**After:**
+```asm
+    TAX         ; A -> X
+; OPT: TXA removed
+```
+**Benefit**: Saves 2 cycles and 1 byte. `TXA` copies back the value A already holds and sets N and Z exactly as `TAX` did, so it can always be dropped. `TAX` itself can also be removed (saving 4 cycles and 2 bytes in total) only when X is not read before it is next written.
+
+### No-Operation Instructions
+
+**Pattern**: Instructions that leave every register unchanged. They still set the N and Z flags from A, so they are removable only where those flags are not used afterwards.
+
+**Before:**
+```asm
+    ORA #$00    ; OR with zero changes nothing
+    AND #$FF    ; AND with all ones changes nothing
+```
+
+**After:**
+```asm
+; OPT: ORA #$00 removed
+; OPT: AND #$FF removed
+```
+**Benefit**: Saves 2 cycles and 2 bytes per removed instruction.
+
+## 2. Dead Code Elimination
+
+This removes code that is unreachable and can never be executed.
+
+**Pattern**: Unlabeled code that appears immediately after an unconditional jump or return instruction. Code that follows a label is kept, because something may branch or jump to it.
+
+**Before:**
+```asm
+    JMP end_of_routine
+    LDA #$01        ; This line can never be reached
+    STA $D020
+
+end_of_routine:
+    RTS
+```
+
+**After:**
+```asm
+    JMP end_of_routine
+; OPT: LDA #$01 removed
+; OPT: STA $D020 removed
+
+end_of_routine:
+    RTS
+```
+**Benefit**: Saves the bytes taken by the unreachable instructions (5 bytes here). Execution speed is unchanged, since the removed code never ran.
+
+## 3. Jump & Branch Optimization
+
+### Jump to Next Instruction
+
+**Pattern**: A `JMP` instruction whose target is the very next instruction.
+
+**Before:**
+```asm
+    JMP continue
+continue:
+    LDA #$00
+```
+**After:**
+```asm
+; OPT: JMP continue removed
+continue:
+    LDA #$00
+```
+**Benefit**: Saves 3 cycles and 3 bytes.
+
+### Tail Call Optimization
+
+**Pattern**: A subroutine call (`JSR`) immediately followed by a return (`RTS`). The pair can be replaced by a single `JMP`, and the called routine's own `RTS` then returns directly to the original caller.
+
+**Before:**
+```asm
+    JSR do_something
+    RTS
+```
+**After:**
+```asm
+    JMP do_something
+; OPT: RTS removed
+```
+**Benefit**: Saves 9 cycles and 1 byte. The `JSR` (6 cycles) becomes a `JMP` (3 cycles), and one of the two `RTS` instructions (6 cycles) is no longer executed. The rewrite also uses 2 fewer bytes of stack. It assumes `do_something` does not inspect its own return address on the stack.
+
+## 4. Load/Store Optimization
+
+### Redundant Loads
+
+**Pattern**: Loading a value into a register that already holds that value.
+
+**Before:**
+```asm
+    LDA #$0A
+    STA some_var
+    LDA #$0A        ; Redundant, A already contains $0A
+    STA other_var
+```
+**After:**
+```asm
+    LDA #$0A
+    STA some_var
+; OPT: LDA #$0A removed
+    STA other_var
+```
+**Benefit**: Saves 2 cycles and 2 bytes.
+
+## 5. Constant Propagation & Folding
+
+### Constant Propagation
+
+The optimizer tracks the immediate values held in registers and removes loads that would not change them, provided the code elided as `...` does not modify the register.
+
+**Before:**
+```asm
+    LDA #10
+    STA value
+    ...
+    LDA #10         ; A is known to be 10 here
+    STA value2
+```
+**After:**
+```asm
+    LDA #10
+    STA value
+    ...
+; OPT: LDA #10 removed
+    STA value2
+```
+**Benefit**: Saves 2 cycles and 2 bytes.
+
+### Constant Folding
+
+The optimizer evaluates constant expressions at optimization time instead of at run time.
+
+**Before:**
+```asm
+    LDA #$10
+    ORA #$20
+```
+**After:**
+```asm
+    LDA #$30
+; OPT: ORA #$20 removed and folded
+```
+**Benefit**: Saves 2 cycles and 2 bytes.
+
+## 6. Subroutine Inlining
+
+If a subroutine is called only once, the optimizer can replace the `JSR` with the body of the subroutine.
+
+**Before:**
+```asm
+init:
+    JSR clear_memory
+    RTS
+
+clear_memory:
+    LDX #$00
+loop:
+    STA $0400,X
+    INX
+    BNE loop
+    RTS
+```
+**After:**
+```asm
+init:
+    ; JSR clear_memory (inlined below)
+    LDX #$00
+loop:
+    STA $0400,X
+    INX
+    BNE loop
+    RTS
+
+; OPT: clear_memory routine removed after inlining
+```
+**Benefit**: Saves 12 cycles of `JSR`/`RTS` overhead. With a single call site, the body is moved rather than copied, so code size also shrinks (the 3-byte `JSR` and one 1-byte `RTS` disappear). Inlining a routine with several call sites duplicates its body and increases code size, which is why inlining is best suited to `speed` optimization mode.
+
+## 7. Strength Reduction
+
+This technique replaces computationally "expensive" operations with cheaper ones.
+
+**Pattern**: Multiplication by 2, written as adding a value to itself.
+
+**Before:**
+```asm
+    LDA my_var
+    CLC
+    ADC my_var      ; A = my_var + my_var = my_var * 2
+```
+**After:**
+```asm
+    LDA my_var
+    ASL A           ; Arithmetic shift left doubles A
+```
+**Benefit**: `ASL A` takes 2 cycles and 1 byte, while `CLC` + `ADC my_var` takes 5-6 cycles and 3-4 bytes. With decimal mode off, the result and carry are identical. Only the V flag differs, since `ASL` does not set it.
+
+## 8. Flag Usage Optimization
+
+### Redundant Flag Instructions
+
+**Pattern**: Setting or clearing a flag that is already in the desired state.
+
+**Before:**
+```asm
+    CLC
+    LDA #$01
+    CLC             ; Redundant, carry is already clear
+    ADC #$02
+```
+**After:**
+```asm
+    CLC
+    LDA #$01
+; OPT: CLC removed
+    ADC #$02
+```
+**Benefit**: Saves 2 cycles and 1 byte. `LDA` does not affect the carry, so the first `CLC` still applies.
+
+This guide covers the core optimizations for the standard 6502 processor. For 65C02- and 45GS02-specific details, see `README.md` and the dedicated guides.
diff --git a/README.md b/README.md
index 142c5ef..0264eb8 100644
--- a/README.md
+++ b/README.md
@@ -622,4 +622,3 @@ Generated with assistance from Claude (Anthropic)
 
 - [6502 Optimization Guide](./6502_optimizations_guide.md)
 - [45GS02 Optimization Guide](./45gs02_optimization_guide.md)
-- [Local Labels Example](./local_labels_example.asm)