Merged
4 changes: 2 additions & 2 deletions .github/workflows/c-cpp.yml
@@ -17,5 +17,5 @@ jobs:
## run: ./configure
- name: make clean
run: make clean
- name: make check
run: make check
- name: make test
run: make test
124 changes: 124 additions & 0 deletions 45gs02_optimization_guide.md
@@ -0,0 +1,124 @@
# 45GS02 (MEGA65) Optimization Guide

This guide details the specialized optimization techniques for the 45GS02 CPU found in the MEGA65, as implemented in `opt6502`. The 45GS02 offers several unique instructions and architectural features that allow for significant performance and code size improvements over the standard 6502/65C02.

## Key Differences & Considerations for 45GS02

- **STZ (Store Z Register)**: CRITICAL DIFFERENCE! On the 45GS02, `STZ` *stores the Z register*, not a zero value (on the 65C02 the same mnemonic stores zero). This makes it useful for repeated stores of a specific byte. `opt6502` *never* converts `LDA #0, STA addr` to `STZ addr` for the 45GS02, because the result would be wrong whenever Z is non-zero.
- **Z Register**: A general-purpose 8-bit register, similar to A, X, Y.
- **Q Register**: A 32-bit composite register `[Z:Y:X:A]`. Operations on Q affect all four underlying 8-bit registers.
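- **New Instructions** (relative to the NMOS 6502; some, such as `BRA`, are inherited from the 65C02/65CE02 rather than unique to the 45GS02): `LDZ`, `STZ`, `NEG`, `ASR`, `LDQ`, `STQ`, `ADCQ`, `SBCQ`, `CMPQ`, `ASRQ`, `RORQ`, `ROLQ`, `INC16`, `DEC16`, `PHW`, `PLW`, `BRA` (always).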
- **Extended NOP**: `NOP #cycles` allows for precise multi-cycle delays.

## 1. Z Register for Repeated Stores

Leverages the `LDZ` and `STZ` instructions to efficiently store the same value to multiple memory locations.

**Pattern**: Repeated sequence of loading an immediate value and storing it.

**Before:**
```asm
LDA #$20
STA $0400
LDA #$20
STA $0401
LDA #$20
STA $0402
```

**After:**
```asm
LDZ #$20 ; Load Z register once
STZ $0400 ; Store from Z
STZ $0401 ; Store from Z
STZ $0402 ; Store from Z
```
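**Benefit**: Saves 2 bytes and 2 cycles per additional store (using classic 6502 timings, `LDA #imm` + `STA abs` is 5 bytes/6 cycles, while `STZ abs` is 3 bytes/4 cycles; the one-time `LDZ #imm` costs 2 bytes/2 cycles). The example shrinks from 15 to 11 bytes. Because the rewrite overwrites Z and no longer loads A, it is only valid when the code that follows does not depend on either register. This is a useful optimization for screen fills, memory initialization, etc.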
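
The rewrite can be sketched as a small pass over `(mnemonic, operand)` tuples. This is a hypothetical Python illustration, not the `opt6502` implementation: the function name `fold_repeated_stores` and the tuple representation are assumptions, operands are assumed to be plain zero-page or absolute addresses, and the caller is assumed to have established that A and Z are not needed after the run.

```python
def fold_repeated_stores(instrs):
    """Rewrite runs of `LDA #v / STA addr` (same v) into `LDZ #v` + `STZ addr`s.

    `instrs` is a list of (mnemonic, operand) tuples (hypothetical format).
    Assumes A and Z are dead after the run; a real optimizer must prove that.
    """
    out = []
    i = 0
    while i < len(instrs):
        # Collect consecutive LDA #v / STA addr pairs sharing one immediate.
        run = []
        j = i
        while (j + 1 < len(instrs)
               and instrs[j][0] == "LDA"
               and instrs[j][1].startswith("#")
               and instrs[j + 1][0] == "STA"
               and (not run or instrs[j][1] == instrs[i][1])):
            run.append(instrs[j + 1][1])
            j += 2
        if len(run) >= 2:
            # Only worthwhile from the second store onwards.
            out.append(("LDZ", instrs[i][1]))
            out.extend(("STZ", addr) for addr in run)
            i = j
        else:
            out.append(instrs[i])
            i += 1
    return out
```

A single `LDA`/`STA` pair is left alone, since replacing it would not save anything.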

## 2. 32-bit Q Register Operations (`LDQ`, `STQ`, `ADCQ`, `SBCQ`, etc.)

The Q register is a powerful feature for 32-bit operations by treating A, X, Y, and Z as a single 32-bit entity. `opt6502` identifies sequences of 8-bit loads that form a 32-bit constant.

**Pattern**: Four consecutive immediate loads into A, X, Y, Z to form a 32-bit value `[Z:Y:X:A]`.

**Before:**
```asm
LDA #$AA ; Low byte
LDX #$BB ; Mid-low byte
LDY #$CC ; Mid-high byte
LDZ #$DD ; High byte
```

**After:**
```asm
LDQ #$DDCCBBAA ; Load 32-bit value into Q
; OPT: LDX #$BB removed
; OPT: LDY #$CC removed
; OPT: LDZ #$DD removed
```
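**Benefit**: Replaces four 2-byte immediate loads (8 bytes, about 8 cycles with classic 6502 timings) with one `LDQ` instruction (5 bytes, 8 cycles by this guide's figures), so the gain is 3 bytes of code size rather than speed. Before relying on this rewrite, confirm that your assembler and target accept an immediate operand for `LDQ`.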
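
To make the `[Z:Y:X:A]` byte order concrete, here is a minimal Python sketch of how four 8-bit values map to the 32-bit constant. The helper names `pack_q` and `unpack_q` are hypothetical and exist only for illustration.

```python
def pack_q(a, x, y, z):
    """Combine four 8-bit register values into the 32-bit Q value [Z:Y:X:A]."""
    for value in (a, x, y, z):
        if not 0 <= value <= 0xFF:
            raise ValueError("register values must fit in 8 bits")
    # A is the least significant byte, Z the most significant.
    return (z << 24) | (y << 16) | (x << 8) | a


def unpack_q(q):
    """Split a 32-bit Q value back into the tuple (a, x, y, z)."""
    return (q & 0xFF, (q >> 8) & 0xFF, (q >> 16) & 0xFF, (q >> 24) & 0xFF)
```

With the values from the example, A = `$AA`, X = `$BB`, Y = `$CC`, Z = `$DD` pack to `$DDCCBBAA`.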

## 3. NEG Instruction

The 45GS02 has a dedicated `NEG` (negate accumulator) instruction, which is much more efficient than the traditional 6502 sequence.

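**Pattern**: `EOR #$FF, CLC, ADC #$01` (the 6502 two's-complement idiom for negating A; carry must be clear before the `ADC #$01`, otherwise the result is off by one).

**Before:**
```asm
EOR #$FF
CLC
ADC #$01
```

**After:**
```asm
NEG A ; Negate accumulator
; OPT: CLC removed
; OPT: ADC #$01 removed
```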
**Benefit**: Saves 4 bytes and 5 cycles.
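
Before trusting a negate pattern, it is cheap to check the idiom exhaustively over all 256 accumulator values. The Python sketch below models a binary-mode 8-bit `ADC`. The function names are hypothetical, and it covers only the accumulator result, not the flags. It shows that `EOR #$FF` followed by `ADC #$01` equals two's-complement negation only when the carry going into the `ADC` is clear.

```python
def adc(a, operand, carry):
    """8-bit binary-mode ADC: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def negate_idiom(a, carry_in):
    """Model `EOR #$FF` followed by `ADC #$01` with the given incoming carry."""
    result, _ = adc(a ^ 0xFF, 0x01, carry_in)
    return result


def twos_complement(a):
    """Reference 8-bit negation."""
    return (-a) & 0xFF
```

With `carry_in = 1` the idiom yields the negation plus one, which is why the flag instruction in the pattern matters.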

## 4. ASR (Arithmetic Shift Right)

The 45GS02 includes an `ASR` instruction, which performs an arithmetic shift right, preserving the sign bit. This is faster and smaller than the typical 6502 sequence.

**Pattern**: `CMP #$80, ROR` (a common 6502 sequence for signed right shift).

**Before:**
```asm
CMP #$80 ; Check sign
ROR A ; Shift right
```

**After:**
```asm
ASR A ; Arithmetic Shift Right Accumulator
; OPT: CMP #$80 removed
```
**Benefit**: Saves 2 bytes and 2 cycles.
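
The equivalence between `CMP #$80` + `ROR A` and an arithmetic shift right can be checked exhaustively. This is a small Python model with hypothetical function names; it compares only the accumulator result.

```python
def cmp_ror(a):
    """Model `CMP #$80` followed by `ROR A` on an 8-bit accumulator."""
    carry = 1 if a >= 0x80 else 0      # CMP sets carry when A >= operand
    return (carry << 7) | (a >> 1)     # ROR rotates the carry into bit 7


def asr(a):
    """Arithmetic shift right: bit 7 is preserved, the other bits shift down."""
    return (a & 0x80) | (a >> 1)
```

For example, `$F0` (-16 as a signed byte) becomes `$F8` (-8) under both models.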

## 5. Extended NOP

The 45GS02 `NOP` instruction can take an operand to specify a delay in cycles, making it ideal for precise timing loops or replacing multiple `NOP` instructions.

**Pattern**: Multiple consecutive `NOP` instructions.

**Before:**
```asm
NOP
NOP
NOP
NOP
```

**After:**
```asm
NOP #8 ; Four NOPs (2 cycles each) replaced with NOP for 8 cycles
; OPT: NOP removed
; OPT: NOP removed
; OPT: NOP removed
```
**Benefit**: Saves bytes by consolidating multiple `NOP`s into a single instruction with a cycle count. (Each NOP is 2 cycles; so NOP #8 replaces 4 NOP instructions).

This guide covers the major 45GS02-specific optimizations within `opt6502`. Utilizing these features effectively can lead to highly performant and compact code on the MEGA65.
259 changes: 259 additions & 0 deletions 6502_optimizations_guide.md
@@ -0,0 +1,259 @@
# 6502 Optimization Guide

A practical guide to 6502 assembly optimization techniques implemented in `opt6502`.

This guide provides examples and explanations for each optimization category. The examples use a generic assembly syntax.

## 1. Peephole Optimizations

Peephole optimization involves examining a small "window" of instructions and replacing them with a shorter or faster sequence.

### Redundant Load/Store

**Pattern**: Storing a value to memory and immediately loading it back into the same register.

**Before:**
```asm
STA my_var
LDA my_var
```

**After:**
```asm
STA my_var
; OPT: LDA my_var removed
```
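**Benefit**: Saves 3-4 cycles and 2-3 bytes (zero-page vs. absolute addressing). The value is already in the accumulator. Note that `LDA` updates the N and Z flags while `STA` does not, so the removal is only safe when no following instruction relies on those flags and when `my_var` is ordinary memory rather than a hardware register.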
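
A peephole pass of this kind only needs to look at the previous instruction. The following Python sketch is a hypothetical illustration, not `opt6502` code: the function name and the `(mnemonic, operand)` tuple format are assumptions, and it ignores flag liveness and memory-mapped I/O.

```python
def remove_reload_after_store(instrs):
    """Drop `LDr addr` when it directly follows `STr addr` for the same register."""
    reload_of = {"STA": "LDA", "STX": "LDX", "STY": "LDY"}
    out = []
    for mnemonic, operand in instrs:
        if out:
            prev_mnemonic, prev_operand = out[-1]
            # Two-instruction window: store, then load of the same operand.
            if reload_of.get(prev_mnemonic) == mnemonic and prev_operand == operand:
                continue  # the register already holds the stored value
        out.append((mnemonic, operand))
    return out
```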

### Useless Transfers

**Pattern**: Transferring a value between registers back and forth.

**Before:**
```asm
TAX ; A -> X
TXA ; X -> A
```

**After:**
```asm
; OPT: TAX removed
; OPT: TXA removed
```
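**Benefit**: Saves 4 cycles and 2 bytes. Removing `TAX` is only safe when X is not read afterwards and the N/Z flags set by the transfers are not needed; otherwise only the `TXA` is redundant (2 cycles, 1 byte).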

### No-Operation Instructions

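**Pattern**: Instructions that leave the accumulator unchanged. They still update the N and Z flags from A, so they can be removed only where those flags are not needed or already reflect A.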

**Before:**
```asm
ORA #$00 ; OR with zero changes nothing
AND #$FF ; AND with all ones changes nothing
```

**After:**
```asm
; OPT: ORA #$00 removed
; OPT: AND #$FF removed
```
**Benefit**: Saves 2 cycles and 2 bytes per removed instruction.

## 2. Dead Code Elimination

This removes code that is unreachable and can never be executed.

**Pattern**: Code that appears immediately after an unconditional jump or return instruction.

**Before:**
```asm
JMP end_of_routine
LDA #$01 ; This line can never be reached
STA $D020

end_of_routine:
RTS
```

**After:**
```asm
JMP end_of_routine
; OPT: LDA #$01 removed
; OPT: STA $D020 removed

end_of_routine:
RTS
```
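**Benefit**: Saves bytes (5 in this example). Execution time is unchanged because the removed code never ran. Removal stops at the next label, which may be a jump target.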
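
One conservative way to express this rule is a single linear scan that treats every label as potentially reachable. The Python sketch below is hypothetical (the names and the `("LABEL", name)` convention are assumptions) and does not attempt real control-flow analysis.

```python
UNCONDITIONAL = {"JMP", "RTS", "RTI"}


def remove_unreachable(lines):
    """Drop instructions between an unconditional transfer and the next label.

    `lines` holds ("LABEL", name) entries and (mnemonic, operand) entries.
    """
    out = []
    reachable = True
    for kind, operand in lines:
        if kind == "LABEL":
            reachable = True       # conservatively assume labels are targets
        if reachable:
            out.append((kind, operand))
        if kind in UNCONDITIONAL:
            reachable = False      # nothing falls through past this point
    return out
```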

## 3. Jump & Branch Optimization

### Jump to Next Instruction

**Pattern**: A `JMP` instruction that jumps to the very next line of code.

**Before:**
```asm
JMP continue
continue:
LDA #$00
```
**After:**
```asm
; OPT: JMP continue removed
continue:
LDA #$00
```
**Benefit**: Saves 3 cycles and 3 bytes.

### Tail Call Optimization

**Pattern**: A subroutine call (`JSR`) immediately followed by a return (`RTS`). The `JSR`/`RTS` can be replaced by a single `JMP`.

**Before:**
```asm
JSR do_something
RTS
```
**After:**
```asm
JMP do_something
; OPT: RTS removed
```
**Benefit**: Saves 9 cycles (`JSR` + `RTS` cost 6 + 6 = 12 cycles, the replacement `JMP` costs 3) and 1 byte. Also reduces stack usage by 2 bytes, since no return address is pushed.
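
As a sketch, the rewrite is a two-instruction window over the listing. The Python below is a hypothetical illustration using `(mnemonic, operand)` tuples. It assumes the `RTS` carries no label of its own (a branch target there would have to keep the `RTS`) and that the callee does not inspect its return address.

```python
def tail_call(instrs):
    """Replace `JSR target` directly followed by `RTS` with `JMP target`."""
    out = []
    i = 0
    while i < len(instrs):
        if (instrs[i][0] == "JSR"
                and i + 1 < len(instrs)
                and instrs[i + 1][0] == "RTS"):
            out.append(("JMP", instrs[i][1]))  # callee's own RTS returns for us
            i += 2
        else:
            out.append(instrs[i])
            i += 1
    return out
```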

## 4. Load/Store Optimization

### Redundant Loads

**Pattern**: Loading a value into a register when that value is already present.

**Before:**
```asm
LDA #$0A
STA some_var
LDA #$0A ; Redundant, A already contains $0A
STA other_var
```
**After:**
```asm
LDA #$0A
STA some_var
; OPT: LDA #$0A removed
STA other_var
```
**Benefit**: Saves 2 cycles and 2 bytes.

## 5. Constant Propagation & Folding

### Constant Propagation

The optimizer tracks the immediate values held in registers and drops a later load of the same value, as long as nothing in between can change the register.

**Before:**
```asm
LDA #10
STA value
...
LDA #10 ; A is known to be 10 here
STA value2
```
**After:**
```asm
LDA #10
STA value
...
; OPT: LDA #10 removed
STA value2
```
**Benefit**: Saves 2 cycles and 2 bytes.
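
A minimal way to track a known register value is to remember the last immediate loaded into A and forget it at anything that might change A or merge control flow. The Python sketch below is hypothetical and deliberately conservative: the whitelist `KEEPS_A` is an assumption covering only a few instructions, operands are compared as text, and flag liveness is ignored.

```python
# Instructions known not to modify A (a deliberately small whitelist).
KEEPS_A = {"STA", "STX", "STY", "INX", "INY", "DEX", "DEY", "CLC", "SEC", "NOP"}


def remove_known_loads(lines):
    """Drop `LDA #imm` when A is already known to hold that immediate."""
    out = []
    known_a = None  # immediate operand text A is known to hold, else None
    for kind, operand in lines:
        if kind == "LDA" and operand.startswith("#"):
            if operand == known_a:
                continue             # A already holds this value
            known_a = operand
        elif kind not in KEEPS_A:
            known_a = None           # labels, branches, calls, ALU ops, ...
        out.append((kind, operand))
    return out
```

Labels reset the tracked value because another path may arrive there with a different A.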

### Constant Folding

The optimizer evaluates constant expressions ahead of time, so the result is encoded directly as an immediate operand.

**Before:**
```asm
LDA #$10
ORA #$20
```
**After:**
```asm
LDA #$30
; OPT: ORA #$20 removed and folded
```
**Benefit**: Saves 2 cycles and 2 bytes.
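
Folding is straightforward for the bitwise operations, because their result depends only on the two constants. The Python sketch below is a hypothetical illustration (names and tuple format are assumptions). It handles `ORA`, `AND`, and `EOR` only, since `ADC`/`SBC` also depend on the carry flag, and it skips symbolic operands it cannot parse.

```python
OPS = {
    "ORA": lambda a, b: a | b,
    "AND": lambda a, b: a & b,
    "EOR": lambda a, b: a ^ b,
}


def parse_imm(operand):
    """Parse '#$1F' (hex) or '#31' (decimal); return None for anything else."""
    if not operand.startswith("#"):
        return None
    text = operand[1:]
    try:
        return int(text[1:], 16) if text.startswith("$") else int(text)
    except ValueError:
        return None  # symbolic operand such as '#<label'


def fold_constants(instrs):
    """Fold `LDA #c1` followed by ORA/AND/EOR `#c2` into a single `LDA`."""
    out = []
    for mnemonic, operand in instrs:
        if mnemonic in OPS and out and out[-1][0] == "LDA":
            a, b = parse_imm(out[-1][1]), parse_imm(operand)
            if a is not None and b is not None:
                out[-1] = ("LDA", "#$%02X" % OPS[mnemonic](a, b))
                continue
        out.append((mnemonic, operand))
    return out
```

The folded `LDA` sets N and Z from the same result the logic operation would have produced, so flags are preserved in this case.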

## 6. Subroutine Inlining

If a subroutine is only called once, the optimizer can replace the `JSR` with the body of the subroutine.

**Before:**
```asm
init:
JSR clear_memory
RTS

clear_memory:
LDX #$00
loop:
STA $0400,X
INX
BNE loop
RTS
```
**After:**
```asm
init:
; JSR clear_memory (inlined below)
LDX #$00
loop:
STA $0400,X
INX
BNE loop
RTS

; OPT: clear_memory routine removed after inlining
```
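**Benefit**: Saves 12 cycles of `JSR`/`RTS` overhead. With a single call site, as here, it also saves 4 bytes (the 3-byte `JSR` and the subroutine's 1-byte `RTS`); code size would only grow if a body were copied to several call sites. Best for `speed` optimization mode.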

## 7. Strength Reduction

This technique replaces computationally "expensive" operations with cheaper ones.

**Pattern**: Multiplication by 2.

**Before:**
```asm
CLC
ADC my_var ; Assuming A holds my_var, this is A = A * 2
```
**After:**
```asm
ASL A ; Arithmetic shift left is faster
```
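**Benefit**: `ASL A` is 1 byte and 2 cycles; `CLC` + `ADC my_var` is 3-4 bytes and 5-6 cycles (zero-page vs. absolute operand). A and the carry flag end up the same in binary mode; the overflow flag differs, since `ASL` does not change V.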
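
The claim that doubling via `CLC` + `ADC` matches `ASL A` can be checked over all 256 values. This Python model uses hypothetical function names, assumes binary mode (decimal flag clear), and compares the 8-bit result and carry-out only.

```python
def clc_adc_self(a):
    """Model `CLC` then `ADC my_var` where my_var == A: returns (result, carry)."""
    total = a + a                 # carry-in is 0 after CLC
    return total & 0xFF, total >> 8


def asl(a):
    """Model `ASL A`: returns (result, carry), carry being the old bit 7."""
    return (a << 1) & 0xFF, a >> 7
```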

## 8. Flag Usage Optimization

### Redundant Flag Instructions

**Pattern**: Setting or clearing a flag that is already in the desired state.

**Before:**
```asm
CLC
LDA #$01
CLC ; Redundant, carry is already clear
ADC #$02
```
**After:**
```asm
CLC
LDA #$01
; OPT: CLC removed
ADC #$02
```
**Benefit**: Saves 2 cycles and 1 byte.
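
Tracking a single flag works much like tracking a register value: remember the known state and forget it whenever an instruction might change it. The Python sketch below is hypothetical. The `CARRY_SAFE` whitelist is an assumption listing a few instructions that do not touch the carry flag, and everything else (including labels) resets the state.

```python
# Instructions that never modify the carry flag (small, conservative list).
CARRY_SAFE = {"LDA", "LDX", "LDY", "STA", "STX", "STY",
              "TAX", "TAY", "TXA", "TYA", "INX", "INY", "DEX", "DEY", "NOP"}


def remove_redundant_carry_ops(lines):
    """Drop `CLC`/`SEC` when carry is already known to be in that state."""
    out = []
    carry = None  # 0, 1, or None when unknown
    for kind, operand in lines:
        if kind in ("CLC", "SEC"):
            wanted = 0 if kind == "CLC" else 1
            if carry == wanted:
                continue          # flag already has the requested value
            carry = wanted
        elif kind not in CARRY_SAFE:
            carry = None          # ADC, CMP, shifts, labels, calls, ...
        out.append((kind, operand))
    return out
```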

This guide covers the core optimizations for the standard 6502 processor. For specifics on 65C02 or 45GS02, please see the `README.md` and the dedicated guides.
1 change: 0 additions & 1 deletion README.md
@@ -622,4 +622,3 @@ Generated with assistance from Claude (Anthropic)

- [6502 Optimization Guide](./6502_optimizations_guide.md)
- [45GS02 Optimization Guide](./45gs02_optimization_guide.md)
- [Local Labels Example](./local_labels_example.asm)