CTalkobt · Jan 4, 2026
diff --git a/.github/workflows/c-cpp.yml b/.github/workflows/c-cpp.yml
@@ -17,5 +17,5 @@ jobs:
 ##      run: ./configure
     - name: make clean
       run: make clean
-    - name: make check
-      run: make check
+    - name: make test
+      run: make test
diff --git a/45gs02_optimization_guide.md b/45gs02_optimization_guide.md
@@ -0,0 +1,124 @@
+# 45GS02 (MEGA65) Optimization Guide
+
+This guide details the specialized optimization techniques for the 45GS02 CPU found in the MEGA65, as implemented in `opt6502`. The 45GS02 offers several unique instructions and architectural features that allow for significant performance and code size improvements over the standard 6502/65C02.
+
+## Key Differences & Considerations for 45GS02
+
+- **STZ (Store Z Register)**: Critical difference: on the 45GS02, `STZ` stores the contents of the Z register, not the constant zero as on the 65C02. This makes it useful for repeated stores of the same byte. `opt6502` never converts `LDA #0` / `STA addr` to `STZ addr` when targeting the 45GS02, because the result would be wrong unless Z happened to hold zero.
+- **Z Register**: A general-purpose 8-bit register, similar to A, X, Y.
+- **Q Register**: A 32-bit composite register `[Z:Y:X:A]`. Operations on Q affect all four underlying 8-bit registers.
+- **New Instructions**: `LDZ`, `STZ`, `NEG`, `ASR`, `LDQ`, `STQ`, `ADCQ`, `SBCQ`, `CMPQ`, `ASRQ`, `RORQ`, `ROLQ`, `INC16`, `DEC16`, `PHW`, `PLW`, `BRA` (always).
+- **Extended NOP**: `NOP #cycles` allows for precise multi-cycle delays.
+
+## 1. Z Register for Repeated Stores
+
+Leverages the `LDZ` and `STZ` instructions to efficiently store the same value to multiple memory locations.
+
+**Pattern**: Repeated sequence of loading an immediate value and storing it.
+
+**Before:**
+```asm
+  LDA #$20
+  STA $0400
+  LDA #$20
+  STA $0401
+  LDA #$20
+  STA $0402
+```
+
+**After:**
+```asm
+  LDZ #$20      ; Load Z register once
+  STZ $0400     ; Store from Z
+  STZ $0401     ; Store from Z
+  STZ $0402     ; Store from Z
+```
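+**Benefit**: Saves 2 bytes and 2 cycles per additional store (`LDA #imm` + `STA abs` is 5 bytes/6 cycles; `STZ abs` is 3 bytes/4 cycles, by standard 6502 timing). This is a significant optimization for screen fills, memory initialization, etc. Note that the rewritten code leaves the value in Z rather than in A.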
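A minimal sketch of how a pass like this might be written, in Python for brevity. This is not `opt6502`'s actual code: the function name `collapse_repeated_stores` and the line-based matching are assumptions made for illustration, and the sketch ignores whether A or Z are still needed afterwards.

```python
import re

# Hypothetical helper, not taken from opt6502: collapse a run of
# "LDA #imm / STA addr" pairs that all load the same immediate value.
LOAD = re.compile(r"^\s*LDA\s+(#\S+)\s*$", re.IGNORECASE)
STORE = re.compile(r"^\s*STA\s+(\S+)\s*$", re.IGNORECASE)

def collapse_repeated_stores(lines, min_pairs=2):
    out, i = [], 0
    while i < len(lines):
        targets, value, j = [], None, i
        # Greedily collect consecutive LDA #v / STA addr pairs with one value.
        while j + 1 < len(lines):
            load, store = LOAD.match(lines[j]), STORE.match(lines[j + 1])
            if not (load and store):
                break
            if value is None:
                value = load.group(1)
            elif load.group(1) != value:
                break
            targets.append(store.group(1))
            j += 2
        if len(targets) >= min_pairs:
            out.append(f"  LDZ {value}")
            out.extend(f"  STZ {addr}" for addr in targets)
            i = j
        else:
            out.append(lines[i])
            i += 1
    return out
```

A single `LDA`/`STA` pair is left alone, since one `LDZ` plus one `STZ` would be no smaller than the original.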
+
+## 2. 32-bit Q Register Operations (`LDQ`, `STQ`, `ADCQ`, `SBCQ`, etc.)
+
+The Q register is a powerful feature for 32-bit operations by treating A, X, Y, and Z as a single 32-bit entity. `opt6502` identifies sequences of 8-bit loads that form a 32-bit constant.
+
+**Pattern**: Four consecutive immediate loads into A, X, Y, Z to form a 32-bit value `[Z:Y:X:A]`.
+
+**Before:**
+```asm
+  LDA #$AA    ; Low byte
+  LDX #$BB    ; Mid-low byte
+  LDY #$CC    ; Mid-high byte
+  LDZ #$DD    ; High byte
+```
+
+**After:**
+```asm
+  LDQ #$DDCCBBAA ; Load 32-bit value into Q
+; OPT: LDX #$BB removed
+; OPT: LDY #$CC removed
+; OPT: LDZ #$DD removed
+```
+**Benefit**: Replaces four 2-byte immediate loads (8 bytes, about 8 cycles by standard 6502 timing) with one `LDQ` instruction (5 bytes, 8 cycles), so the gain is mainly in code size: 3 bytes per 32-bit constant.
+
+## 3. NEG Instruction
+
+The 45GS02 has a dedicated `NEG` (negate accumulator) instruction, which is much more efficient than the traditional 6502 sequence.
+
+**Pattern**: `EOR #$FF, CLC, ADC #$01` (6502 way to negate A; the carry must be clear, otherwise the result is off by one).
+
+**Before:**
+```asm
+  EOR #$FF
+  CLC
+  ADC #$01
+```
+
+**After:**
+```asm
+  NEG A      ; Negate accumulator
+; OPT: CLC removed
+; OPT: ADC #$01 removed
+```
+**Benefit**: Saves 4 bytes and 5 cycles.
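The negate identity behind this rewrite can be checked by modelling the 8-bit arithmetic. The sketch below is plain Python with hypothetical helper names (`adc`, `negate_sequence`), not part of `opt6502`. It shows that `EOR #$FF` followed by `ADC #$01` yields the two's complement only when carry is clear going into the `ADC`; with carry set the result is one too large.

```python
def adc(a, operand, carry):
    """8-bit binary-mode add with carry; returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def negate_sequence(a, carry_in):
    """Model EOR #$FF followed by ADC #$01 with the given incoming carry."""
    a ^= 0xFF                           # EOR #$FF: one's complement
    result, _ = adc(a, 0x01, carry_in)  # ADC #$01
    return result

# With carry clear, the sequence matches two's-complement negation for all bytes.
for value in range(256):
    assert negate_sequence(value, carry_in=0) == (-value) & 0xFF
```

For example, `negate_sequence(5, carry_in=0)` gives `0xFB` (that is, -5), while `carry_in=1` gives `0xFC`.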
+
+## 4. ASR (Arithmetic Shift Right)
+
+The 45GS02 includes an `ASR` instruction, which performs an arithmetic shift right, preserving the sign bit. This is faster and smaller than the typical 6502 sequence.
+
+**Pattern**: `CMP #$80, ROR` (a common 6502 sequence for signed right shift).
+
+**Before:**
+```asm
+  CMP #$80   ; Check sign
+  ROR A      ; Shift right
+```
+
+**After:**
+```asm
+  ASR A      ; Arithmetic Shift Right Accumulator
+; OPT: CMP #$80 removed
+```
+**Benefit**: Saves 2 bytes and 2 cycles.
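Why `CMP #$80` / `ROR A` amounts to an arithmetic shift can be seen from a small model: `CMP #$80` sets carry exactly when bit 7 of A is set, and `ROR` then rotates that carry back into bit 7. The Python below is an illustrative check with made-up helper names, not code from `opt6502`.

```python
def cmp_carry(a, operand):
    """CMP sets carry when A >= operand (unsigned compare)."""
    return 1 if a >= operand else 0

def ror(a, carry):
    """Rotate right through carry; returns (result, carry_out)."""
    return ((carry << 7) | (a >> 1)) & 0xFF, a & 1

def signed(byte):
    """Interpret an 8-bit value as two's complement."""
    return byte - 256 if byte & 0x80 else byte

def asr_via_cmp_ror(a):
    carry = cmp_carry(a, 0x80)   # CMP #$80: carry = sign bit of A
    result, _ = ror(a, carry)    # ROR A: sign bit is shifted back in
    return result

# The two-instruction sequence equals a signed shift right for every byte.
for value in range(256):
    assert signed(asr_via_cmp_ror(value)) == signed(value) >> 1
```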
+
+## 5. Extended NOP
+
+The 45GS02 `NOP` instruction can take an operand to specify a delay in cycles, making it ideal for precise timing loops or replacing multiple `NOP` instructions.
+
+**Pattern**: Multiple consecutive `NOP` instructions.
+
+**Before:**
+```asm
+  NOP
+  NOP
+  NOP
+  NOP
+```
+
+**After:**
+```asm
+  NOP #8     ; Four NOPs (2 cycles each) replaced with NOP for 8 cycles
+; OPT: NOP removed
+; OPT: NOP removed
+; OPT: NOP removed
+```
+**Benefit**: Saves bytes by consolidating multiple `NOP`s into a single instruction with a cycle count. Each plain `NOP` takes 2 cycles, so `NOP #8` replaces four of them.
+
+This guide covers the major 45GS02-specific optimizations within `opt6502`. Utilizing these features effectively can lead to highly performant and compact code on the MEGA65.
diff --git a/6502_optimizations_guide.md b/6502_optimizations_guide.md
@@ -0,0 +1,259 @@
+# 6502 Optimization Guide
+
+A practical guide to 6502 assembly optimization techniques implemented in `opt6502`.
+
+This guide provides examples and explanations for each optimization category. The examples use a generic assembly syntax.
+
+## 1. Peephole Optimizations
+
+Peephole optimization involves examining a small "window" of instructions and replacing them with a shorter or faster sequence.
+
+### Redundant Load/Store
+
+**Pattern**: Storing a value to memory and immediately loading it back into the same register.
+
+**Before:**
+```asm
+  STA my_var
+  LDA my_var
+```
+
+**After:**
+```asm
+  STA my_var
+; OPT: LDA my_var removed
+```
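+**Benefit**: Saves 3-4 cycles and 2-3 bytes (zero-page vs. absolute addressing). The value is already in the accumulator. Note that `LDA` sets the N and Z flags while `STA` does not, so the removal is only safe when the following code does not test those flags.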
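A minimal sketch of a two-instruction peephole window for this pattern, written in Python for illustration. The function name and the whitespace-split parsing are assumptions, not `opt6502`'s implementation; the sketch does not check flag liveness or memory-mapped I/O addresses, which a real pass would need to consider.

```python
# Store/load pairs that leave the register unchanged (hypothetical table).
REDUNDANT_PAIRS = {("STA", "LDA"), ("STX", "LDX"), ("STY", "LDY")}

def remove_store_then_load(lines):
    out = []
    for line in lines:
        cur = line.split()
        prev = out[-1].split() if out else []
        # Window of two instructions: a store followed by a load of the
        # same operand into the same register.
        if (len(cur) == 2 and len(prev) == 2
                and (prev[0].upper(), cur[0].upper()) in REDUNDANT_PAIRS
                and prev[1] == cur[1]):
            out.append(f"; OPT: {line.strip()} removed")
            continue
        out.append(line)
    return out
```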
+
+### Useless Transfers
+
+**Pattern**: Transferring a value between registers back and forth.
+
+**Before:**
+```asm
+  TAX   ; A -> X
+  TXA   ; X -> A
+```
+
+**After:**
+```asm
+; OPT: TAX removed
+; OPT: TXA removed
+```
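+**Benefit**: Saves 4 cycles and 2 bytes. Removing `TAX` is only safe when X is not read afterwards; if X is still needed, only the `TXA` is redundant.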
+
+### No-Operation Instructions
+
+**Pattern**: Instructions that have no effect on registers or flags in a specific context.
+
+**Before:**
+```asm
+  ORA #$00   ; OR with zero changes nothing
+  AND #$FF   ; AND with all ones changes nothing
+```
+
+**After:**
+```asm
+; OPT: ORA #$00 removed
+; OPT: AND #$FF removed
+```
+**Benefit**: Saves 2 cycles and 2 bytes per removed instruction.
+
+## 2. Dead Code Elimination
+
+This removes code that is unreachable and can never be executed.
+
+**Pattern**: Code that appears immediately after an unconditional jump or return instruction.
+
+**Before:**
+```asm
+  JMP end_of_routine
+  LDA #$01      ; This line can never be reached
+  STA $D020
+
+end_of_routine:
+  RTS
+```
+
+**After:**
+```asm
+  JMP end_of_routine
+; OPT: LDA #$01 removed
+; OPT: STA $D020 removed
+
+end_of_routine:
+  RTS
+```
+**Benefit**: Saves 5 bytes in this example (2 for `LDA #$01`, 3 for `STA $D020`). No cycles are saved, since the removed code never ran.
+
+## 3. Jump & Branch Optimization
+
+### Jump to Next Instruction
+
+**Pattern**: A `JMP` instruction that jumps to the very next line of code.
+
+**Before:**
+```asm
+  JMP continue
+continue:
+  LDA #$00
+```
+**After:**
+```asm
+; OPT: JMP continue removed
+continue:
+  LDA #$00
+```
+**Benefit**: Saves 3 cycles and 3 bytes.
+
+### Tail Call Optimization
+
+**Pattern**: A subroutine call (`JSR`) immediately followed by a return (`RTS`). The `JSR`/`RTS` can be replaced by a single `JMP`.
+
+**Before:**
+```asm
+  JSR do_something
+  RTS
+```
+**After:**
+```asm
+  JMP do_something
+; OPT: RTS removed
+```
+**Benefit**: Saves 9 cycles (6 for `JSR` plus 6 for `RTS`, replaced by a 3-cycle `JMP`) and 1 byte. Also reduces stack usage by 2 bytes, since no return address is pushed.
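The `JSR`/`RTS` rewrite can be sketched as a simple scan over adjacent lines. The Python below is illustrative only (the name `tail_call` and the text-based matching are assumptions, not `opt6502` internals). Because it requires the `RTS` to be the very next line, a label sitting between the two instructions blocks the rewrite, which is the conservative choice when the `RTS` may be a branch target.

```python
def tail_call(lines):
    out, i = [], 0
    while i < len(lines):
        cur = lines[i].split()
        nxt = lines[i + 1].split() if i + 1 < len(lines) else []
        is_jsr = len(cur) == 2 and cur[0].upper() == "JSR"
        is_rts = [tok.upper() for tok in nxt] == ["RTS"]
        if is_jsr and is_rts:
            # The callee's own RTS now returns directly to our caller.
            out.append(f"  JMP {cur[1]}")
            out.append("; OPT: RTS removed")
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return out
```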
+
+## 4. Load/Store Optimization
+
+### Redundant Loads
+
+**Pattern**: Loading a value into a register when that value is already present.
+
+**Before:**
+```asm
+  LDA #$0A
+  STA some_var
+  LDA #$0A     ; Redundant, A already contains $0A
+  STA other_var
+```
+**After:**
+```asm
+  LDA #$0A
+  STA some_var
+; OPT: LDA #$0A removed
+  STA other_var
+```
+**Benefit**: Saves 2 cycles and 2 bytes.
+
+## 5. Constant Propagation & Folding
+
+### Constant Propagation
+
+The optimizer tracks the immediate values held in registers.
+
+**Before:**
+```asm
+  LDA #10
+  STA value
+  ...
+  LDA #10      ; A is known to be 10 here
+  STA value2
+```
+**After:**
+```asm
+  LDA #10
+  STA value
+  ...
+; OPT: LDA #10 removed
+  STA value2
+```
+**Benefit**: Saves 2 cycles and 2 bytes.
+
+### Constant Folding
+
+The optimizer evaluates constant expressions ahead of time, so the result is computed once during optimization rather than on every run.
+
+**Before:**
+```asm
+  LDA #$10
+  ORA #$20
+```
+**After:**
+```asm
+  LDA #$30
+; OPT: ORA #$20 removed and folded
+```
+**Benefit**: Saves 2 cycles and 2 bytes.
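A minimal sketch of folding an immediate load with a following immediate logical operation, in Python for illustration. The names `parse_imm` and `fold_constants` are hypothetical and this is not `opt6502`'s code. Only `ORA`, `AND` and `EOR` are folded here because their result, and the N and Z flags they set, depend on nothing but the two constants; `ADC` would also depend on the carry flag.

```python
import operator

FOLDABLE = {"ORA": operator.or_, "AND": operator.and_, "EOR": operator.xor}

def parse_imm(token):
    """Return the value of '#$hex' or '#decimal', else None."""
    if not token.startswith("#"):
        return None
    body = token[1:]
    try:
        return int(body[1:], 16) if body.startswith("$") else int(body, 10)
    except ValueError:
        return None

def fold_constants(lines):
    out = []
    for line in lines:
        cur = line.split()
        prev = out[-1].split() if out else []
        if (len(cur) == 2 and len(prev) == 2
                and prev[0].upper() == "LDA" and cur[0].upper() in FOLDABLE):
            a, b = parse_imm(prev[1]), parse_imm(cur[1])
            if a is not None and b is not None:
                value = FOLDABLE[cur[0].upper()](a, b) & 0xFF
                out[-1] = f"  LDA #${value:02X}"   # replace the load in place
                out.append(f"; OPT: {line.strip()} removed and folded")
                continue
        out.append(line)
    return out
```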
+
+## 6. Subroutine Inlining
+
+If a subroutine is only called once, the optimizer can replace the `JSR` with the body of the subroutine.
+
+**Before:**
+```asm
+init:
+  JSR clear_memory
+  RTS
+
+clear_memory:
+  LDX #$00
+loop:
+  STA $0400,X
+  INX
+  BNE loop
+  RTS
+```
+**After:**
+```asm
+init:
+  ; JSR clear_memory (inlined below)
+  LDX #$00
+loop:
+  STA $0400,X
+  INX
+  BNE loop
+  RTS
+
+; OPT: clear_memory routine removed after inlining
+```
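+**Benefit**: Saves 12 cycles of `JSR`/`RTS` overhead. When the routine has a single caller and the original copy is removed, as here, it also saves 4 bytes (the `JSR` and one `RTS`); inlining a routine with several callers would instead increase code size. Best for `speed` optimization mode.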
+
+## 7. Strength Reduction
+
+This technique replaces computationally "expensive" operations with cheaper ones.
+
+**Pattern**: Multiplication by 2.
+
+**Before:**
+```asm
+  CLC
+  ADC my_var   ; Assuming A holds my_var, this is A = A * 2
+```
+**After:**
+```asm
+  ASL A        ; Arithmetic shift left is faster
+```
+**Benefit**: `ASL A` takes 2 cycles and 1 byte; `CLC` + `ADC my_var` takes 5-6 cycles and 3-4 bytes, depending on whether `my_var` is in zero page.
+
+## 8. Flag Usage Optimization
+
+### Redundant Flag Instructions
+
+**Pattern**: Setting or clearing a flag that is already in the desired state.
+
+**Before:**
+```asm
+  CLC
+  LDA #$01
+  CLC          ; Redundant, carry is already clear
+  ADC #$02
+```
+**After:**
+```asm
+  CLC
+  LDA #$01
+; OPT: CLC removed
+  ADC #$02
+```
+**Benefit**: Saves 2 cycles and 1 byte.
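Redundant `CLC`/`SEC` removal can be sketched as tracking the known state of the carry flag through straight-line code. The Python below is an illustration under stated assumptions, not `opt6502`'s implementation: the `CARRY_SAFE` whitelist is a hand-picked set of standard 6502 mnemonics that never modify carry, and anything else (including labels, branches and blank lines) conservatively resets the state to unknown.

```python
# Mnemonics that leave the carry flag untouched on a standard 6502.
CARRY_SAFE = {
    "LDA", "LDX", "LDY", "STA", "STX", "STY",
    "TAX", "TXA", "TAY", "TYA", "TSX", "TXS",
    "INX", "INY", "DEX", "DEY", "INC", "DEC",
    "AND", "ORA", "EOR", "BIT", "NOP", "PHA", "PLA",
}

def remove_redundant_carry_ops(lines):
    out, carry = [], None          # None means "carry state unknown"
    for line in lines:
        tokens = line.split()
        op = tokens[0].upper() if tokens else ""
        if op in ("CLC", "SEC"):
            wanted = 0 if op == "CLC" else 1
            if carry == wanted:
                out.append(f"; OPT: {op} removed")
                continue
            carry = wanted
        elif op not in CARRY_SAFE:
            carry = None           # e.g. ADC, CMP, a label, or a branch
        out.append(line)
    return out
```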
+
+This guide covers the core optimizations for the standard 6502 processor. For specifics on 65C02 or 45GS02, please see the `README.md` and the dedicated guides.
diff --git a/README.md b/README.md
@@ -622,4 +622,3 @@ Generated with assistance from Claude (Anthropic)
 
 - [6502 Optimization Guide](./6502_optimizations_guide.md)
 - [45GS02 Optimization Guide](./45gs02_optimization_guide.md)
-- [Local Labels Example](./local_labels_example.asm)