
Self-Improvement via Inversion

Training language models without human labels, ground truth, or external reward models.

The Key Insight

If a model can reconstruct what it was asked from what it generated, it probably understood the task.

Forward:  "Scan port 80 on 192.168.1.1" → nmap -p 80 192.168.1.1
Inverse:  nmap -p 80 192.168.1.1 → "Scan port 80 on host 192.168.1.1"
Score:    similarity(original, reconstructed) = 0.95

High reconstruction fidelity = the model understood what it generated = good output.
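
A minimal sketch of how such a score might be computed, assuming an embedding-based cosine similarity (the repository's actual metric may differ); the embedding model and helper function below are illustrative, not this repo's API:

# Sketch: score a generation by how well the original task can be reconstructed
# from it. Assumes an embedding-based cosine similarity; the repository's actual
# metric may differ, and the embedding model choice is illustrative.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def inversion_score(original_task: str, reconstructed_task: str) -> float:
    """Cosine similarity between the original task and its reconstruction."""
    emb = embedder.encode([original_task, reconstructed_task], convert_to_tensor=True)
    return float(util.cos_sim(emb[0], emb[1]))

print(round(inversion_score("Scan port 80 on 192.168.1.1",
                            "Scan port 80 on host 192.168.1.1"), 2))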

Results (10 Cycles)

Metric            Before   After   Improvement
Intrinsic Score   0.531    0.627   +18.1%
Acceptance Rate   60.7%    88.0%   +27.3 pp
Training Loss     5.70     4.73    -17%

Zero human labels. Zero external reward models. Pure self-improvement.

📊 Full results breakdown

How It Works

┌─────────────────────────────────────────────────────────┐
│                  SELF-IMPROVEMENT LOOP                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   1. GENERATE    Task ──────────► Command               │
│                  "Scan port 80"    "nmap -p 80 ..."     │
│                                                         │
│   2. INVERT      Command ─────────► Task'               │
│                  "nmap -p 80 ..."   "Scan port 80"      │
│                                                         │
│   3. SCORE       similarity(Task, Task')                │
│                  High = understood, Low = confused      │
│                                                         │
│   4. SELECT      Keep high-scoring examples             │
│                                                         │
│   5. TRAIN       Fine-tune on self-generated data       │
│                                                         │
│   6. REPEAT      → Better generation → Better scores    │
│                                                         │
└─────────────────────────────────────────────────────────┘
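
In code, one cycle of the loop looks roughly like the sketch below. The helpers generate_tasks, generate_command, invert, similarity, and fine_tune are hypothetical stand-ins for the script's internals, not the repository's actual API:

# Sketch of the self-improvement loop. generate_tasks, generate_command, invert,
# similarity, and fine_tune are hypothetical stand-ins, not this repo's API.
def self_improvement(model, cycles=10, tasks_per_cycle=50, candidates=3, threshold=0.5):
    for cycle in range(cycles):
        accepted = []
        for task in generate_tasks(model, n=tasks_per_cycle):      # 1. GENERATE
            for _ in range(candidates):
                command = generate_command(model, task)
                reconstructed = invert(model, command)             # 2. INVERT
                score = similarity(task, reconstructed)            # 3. SCORE
                if score >= threshold:                             # 4. SELECT
                    accepted.append({"task": task, "command": command, "score": score})
        model = fine_tune(model, accepted)                         # 5. TRAIN
    return model                                                   # 6. REPEAT with the improved model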

Quick Start

# Install dependencies
pip install -r requirements.txt

# Run self-improvement (3 cycles, quick test)
python intrinsic_self_improvement.py --cycles 3 --tasks 30

# Run extended experiment (10 cycles)
python intrinsic_self_improvement.py --cycles 10 --tasks 50

# Run with custom settings
python intrinsic_self_improvement.py \
  --cycles 20 \
  --tasks 50 \
  --candidates 5 \
  --threshold 0.5 \
  --temperature 0.7 \
  --output ./my_experiment

Arguments

Argument        Default                              Description
--cycles        10                                   Number of self-improvement cycles
--tasks         50                                   Tasks generated per cycle
--candidates    3                                    Candidates per task
--threshold     0.5                                  Minimum score to accept example
--temperature   0.7                                  Generation temperature
--model         mistralai/Mistral-7B-Instruct-v0.2   Base model
--output        ./output/extended_run                Output directory

Output

output/
├── {experiment_name}/
│   ├── checkpoint_cycle_1/    # Model after cycle 1
│   ├── checkpoint_cycle_2/    # Model after cycle 2
│   ├── ...
│   ├── progress.json          # Cycle-by-cycle results
│   └── self_improvement_results.json  # Final results
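
For example, progress.json can be inspected to track per-cycle metrics; the field names below (cycle, intrinsic_score, acceptance_rate) are assumptions about the schema, not a documented format:

# Sketch: inspect per-cycle metrics. The field names (cycle, intrinsic_score,
# acceptance_rate) are assumptions about progress.json, not a documented schema.
import json

with open("output/extended_run/progress.json") as f:
    progress = json.load(f)

for entry in progress:  # assuming a list of per-cycle records
    print(entry.get("cycle"), entry.get("intrinsic_score"), entry.get("acceptance_rate"))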

Why This Is Novel

Method                  Human Labels   Ground Truth   External Judge
STaR (2022)             No             Yes            No
SPIN (2024)             Yes            Yes            No
Self-Rewarding (2024)   No             No             Yes
Ours                    No             No             No

We eliminate ALL external supervision by using inversion as an intrinsic signal.

Limitations & Follow-Up Research

While the intrinsic metrics show improvement, follow-up research revealed critical limitations:

The Probing vs Reality Gap

In a security agent extension, we tested whether self-improvement translates to real-world task performance:

Test Type              Result
Probing (controlled)   100% skill differentiation
Real deployment        20% skill differentiation
Trust Score            0.2 (Low)

Key finding: Models can show improvement on narrow metrics while failing in deployment. The model learned superficial patterns (skill label → tool token) without deeper behavioral understanding.

What This Means

  1. Pattern matching ≠ Reasoning — Inversion fidelity measures reconstruction, not understanding
  2. Intrinsic metrics can mislead — High scores on self-generated tests don't guarantee real-world performance
  3. Scale matters — Small training sets (50-100 examples) produce fragile patterns
  4. Diversity is critical — Self-improvement loops can collapse to repetitive behavior

The Trust Diagnostic

This led us to develop a behavioral consistency framework:

trust_score = similarity(probe_result, actual_behavior)
# High = reliable, Low = "talks the talk but doesn't walk the walk"
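
One way to make this concrete is to compare, per episode, what the model claims it would do when probed against what it actually does when deployed. The exact-match formulation and field names below are illustrative assumptions, not the sipit-security-agent implementation:

# Sketch: behavioral consistency over paired probe/deployment records.
# The exact-match formulation and field names are illustrative assumptions.
def trust_score(episodes: list) -> float:
    """Fraction of episodes where the probed intent matches the deployed behavior."""
    if not episodes:
        return 0.0
    matches = sum(1 for e in episodes if e["probed_tool"] == e["deployed_tool"])
    return matches / len(episodes)

episodes = [
    {"probed_tool": "nmap", "deployed_tool": "nmap"},
    {"probed_tool": "nmap", "deployed_tool": "curl"},   # talks the talk, doesn't walk it
]
print(trust_score(episodes))  # 0.5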

See the full analysis: sipit-security-agent

Open Questions

  • Can larger scale (1000s of examples) overcome the pattern matching limitation?
  • Would RL-based reward signals produce more robust improvement?
  • Is there a phase transition from pattern matching to genuine reasoning?

Citation

@misc{inversion-self-improvement-2025,
  title={Self-Improvement via Inversion: Training Language Models Without External Supervision},
  author={Adam Kruger},
  year={2025},
  url={https://github.com/CINOAdam/inversion-self-improvement}
}

License

MIT
