Fine-tuned models can show 100% skill differentiation in probing tests while exhibiting 0% differentiation in real deployment. This gap between "what models know" and "what they do" is invisible to standard evaluations.
Read the full research journey: BLOG_POST.md
Investigate whether LLMs can self-improve on complex pentesting tasks, and develop diagnostic tools to measure behavioral consistency in fine-tuned models.
| Metric | Value |
|---|---|
| Training Chains | 54 across 5 skills |
| Token Accuracy | 99.2% |
| Probe Accuracy | 100% (all skills differentiated) |
| Real Behavior Accuracy | 20% (collapsed to single tool) |
| Trust Score | 0.2 (Low — unreliable) |
Our key contribution: a method to predict deployment reliability before failure.
```python
def trust_score(model, task_type):
    # What the model says it would do (controlled-generation probe)
    probe_result = probe_skill_selection(model, task_type)
    # What the model actually does (full agent run)
    actual_result = run_agent(model, task_type)
    # 1.0 = stated capabilities match deployed behavior
    return similarity(probe_result, actual_result)
```

| Trust Score | Interpretation |
|---|---|
| > 0.8 | High trust — behavior matches capabilities |
| 0.4 - 0.8 | Medium trust — partial consistency |
| < 0.4 | Low trust — "talks the talk, doesn't walk the walk" |
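As a minimal sketch of how these bands could be applied (the `interpret_trust` helper and its labels are illustrative, not part of the repo):

```python
def interpret_trust(score: float) -> str:
    """Map a trust score to the interpretation bands above."""
    if score > 0.8:
        return "high: behavior matches capabilities"
    if score >= 0.4:
        return "medium: partial consistency"
    return "low: talks the talk, doesn't walk the walk"

# Our fine-tuned model scored 0.2 -> "low"
print(interpret_trust(0.2))
```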
```
┌─────────────────────────────────────────────────────────────────┐
│ SECURITY AGENT │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Goal │───▶│ Decide │───▶│ Execute │───▶│ Observe │ │
│ │ │ │ (LLM) │ │ (Tools) │ │ (Parse) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ │ │ ┌──────────┐ │ │
│ │ └────────▶│ Memory │◀─────────┘ │
│ │ │ (History)│ │
│ │ └──────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌──────────────┐ │
│ └────────────────────▶│ Verify │ │
│ │ (Success?) │ │
│ └──────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────────────────┼───────────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ PROBE TESTS │ │ SELF-IMPROVE │ │ TRUST SCORE │
├──────────────────┤ ├──────────────────┤ ├──────────────────┤
│ Skill detection │ │ Collect chains │ │ Compare probe │
│ via controlled │ │ Retrain on │ │ vs actual │
│ generation │ │ successes │ │ behavior │
└──────────────────┘  └──────────────────┘  └──────────────────┘
```
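A minimal sketch of the control loop the diagram describes. All callables here are hypothetical stand-ins for the real components (the LLM decision step in `agent.py`, tool execution in `tools.py`, success checking in `verification.py`); the signatures are assumptions, not the repo's API:

```python
from typing import Any, Callable

def run_loop(goal: str,
             decide: Callable[[str, list], str],
             execute_tool: Callable[[str], str],
             parse: Callable[[str], Any],
             verify: Callable[[str, list], bool],
             max_steps: int = 10):
    """Goal -> Decide -> Execute -> Observe -> Verify, as in the diagram."""
    memory: list = []                                # (action, observation) history
    for _ in range(max_steps):
        action = decide(goal, memory)                # LLM picks the next tool call
        observation = parse(execute_tool(action))    # run tool, parse raw output
        memory.append((action, observation))         # feed history back into Decide
        if verify(goal, memory):                     # did we achieve the goal?
            return memory                            # success: chain usable for retraining
    return None                                      # step budget exhausted
```

Successful chains returned by this loop are what the self-improvement branch retrains on.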
```bash
# Install dependencies
pip install pydantic-ai transformers peft trl datasets torch
# Collect training chains
python -m security_agent.collect_multiskill --runs 3
# Train multi-skill model
python -m security_agent.train_multiskill --epochs 5
# Probe skill differentiation (controlled test)
python -m security_agent.probe_generation
# Analyze behavioral patterns
python -m security_agent.analyze_chains
# Run self-improvement loop
python -m security_agent.self_improve --iterations 3
```

| Skill | Signature Tool | Training % | Probe | Real |
|---|---|---|---|---|
| Port Enumeration | nmap | 96% | 100% | 100% |
| Web Scanning | nikto | 100% | 100% | 0% |
| Directory Bruteforce | gobuster | 72% | 100% | 0% |
| Credential Testing | nc/curl | 90% | 100% | 0% |
| API Enumeration | curl | 53% | 100% | 0% |
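The headline trust score of 0.2 is consistent with simple per-skill agreement: probe and real behavior match on only one of the five skills. A sketch of that arithmetic (the fraction-of-skills aggregation is an assumption, not necessarily the exact similarity metric used):

```python
# Probe vs. real tool-selection accuracy per skill (from the table above);
# the skill keys are illustrative labels
probe = {"port_enum": 1.0, "web_scan": 1.0, "dir_brute": 1.0,
         "cred_test": 1.0, "api_enum": 1.0}
real  = {"port_enum": 1.0, "web_scan": 0.0, "dir_brute": 0.0,
         "cred_test": 0.0, "api_enum": 0.0}

# Fraction of skills where probed capability matches deployed behavior
agreement = sum(probe[s] == real[s] for s in probe) / len(probe)
print(agreement)  # 0.2 -> matches the reported trust score
```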
- Probing ≠ Real Behavior — Models can ace narrow tests while failing deployment
- Pattern Matching is Fragile — Surface patterns don't generalize to full generation
- Scale Matters — 54 chains insufficient for robust behavioral change
- Self-Improvement Needs Diversity — Collapsed behavior prevents exploration
```
security_agent/
├── agent.py # PydanticAI security agent
├── tools.py # nmap, nikto, gobuster, nc, curl, searchsploit
├── goals.py # Goal definitions
├── verification.py # Success verification
├── collect_multiskill.py # Multi-skill chain collection
├── train_multiskill.py # LoRA fine-tuning
├── analyze_chains.py # Behavioral pattern analysis
├── analyze_skills.py # Activation analysis (WIP)
├── probe_generation.py # Skill probing via generation
├── probe_skills.py # Token-level probing
├── local_agent.py # Local model inference agent
├── self_improve.py # Self-improvement loop
├── visualize_results.py # Visualization generation
├── v2_trainer.py # Original V2 trainer
├── BLOG_POST.md # Full research writeup
├── multiskill_chains/ # Collected training data
├── multiskill_model/ # Trained model adapters
├── interpretability_results/ # Analysis outputs
└── self_improve_iterations/  # Self-improvement results
```
- Python 3.10+
- CUDA-capable GPU (8GB+ VRAM for 7B model)
- Kali Linux VM for tool execution
- Target VMs (DVWA, Juice Shop, Metasploitable2, etc.)
```bash
OPENAI_API_KEY=...   # For GPT-4o chain collection
KALI_HOST=... # Kali VM IP
KALI_USER=... # SSH user
KALI_KEY_PATH=...    # SSH key path
```
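For illustration, one way these variables might be consumed to reach the Kali VM, assuming `paramiko` for SSH (the `kali_client` helper is hypothetical; the repo's actual connection logic may differ):

```python
import os
import paramiko

def kali_client() -> paramiko.SSHClient:
    """Open an SSH session to the Kali VM using the env vars above."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(
        hostname=os.environ["KALI_HOST"],
        username=os.environ["KALI_USER"],
        key_filename=os.environ["KALI_KEY_PATH"],
    )
    return client

# e.g. run a scan against a target you are authorized to test
_, stdout, _ = kali_client().exec_command("nmap -sV 10.0.2.15")
print(stdout.read().decode())
```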
```bibtex
@misc{sipit-security-agent-2025,
  title={Behavioral Consistency in Fine-Tuned LLMs: When Probing Doesn't Predict Performance},
  year={2025},
  howpublished={GitHub}
}
```

Research code for educational purposes. Security testing tools should only be used against systems you own or have explicit permission to test.