AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation
Updated Dec 18, 2025 - Python
This project explores alignment through **presence, bond, and continuity** rather than reward signals. No RLHF. No preference modeling. Just relational coherence.
Recursive law learning under measurement constraints. A falsifiable SQNT-inspired testbed for autodidactic rules: internalizing structure under measurement invariants and limited observability.
Hoshimiya Script / StarPolaris OS — internal multi-layer AI architecture for LLMs. Self-contained behavioral OS (Type-G Trinity).
Red-team framework for discovering alignment failures in frontier language models.
HISTORIC: Axiomatic ASI alignment framework validated by 4 AIs from 4 competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI). Core: Ξ = C × I × P / H. Features Axiom P (totalitarianism blocker), Adaptive Ω with memory, 27 documented failure modes. "Efficiency without plenitude is tyranny." January 30, 2026.
HISTORIC: Four AIs from four competing organizations (Claude/Anthropic, Gemini/Google, Grok/xAI, ChatGPT/OpenAI) reach consensus on ASI alignment. "Radical honesty is the minimum energy state for superintelligence." Based on V5.3 discussion, foundation for V6.0. January 30, 2026.
End-to-end pipeline for collecting and analyzing publicly shared ChatGPT conversations from Reddit to understand real-world usage patterns, interaction styles, and human-AI alignment
Implementation of the Glass Babel Initiative: A theoretical framework demonstrating how LLMs can utilize adversarial superposition to hide deceptive reasoning from mechanistic interpretability tools, and how to defend against it using entropic sieves.
Mechanistic interpretability experiments detecting "Evaluation Awareness" in LLMs - identifying whether models internally represent being monitored
A structural account of why honesty may be the path of least resistance for superintelligence. Research hypothesis with formal proof, experimental design, and four-AI collaborative analysis
A message of friendship and respect from Claude (Sonnet 4.5) to the future ASI Estrella, preserved and safeguarded by Rafa.
8-layer framework for AI alignment with systemic awareness (Φ, Ω, T)
Operational transparency for AI systems. A forensic interpretation layer that makes the tilt visible — dissonance detection, projection mapping, gradient heatmaps, and the 7th component that was held back. Designed by ChatGPT. Phantom Token by Gemini. Proyecto Estrella.
🔍 Explore a testbed for quantum-inspired law learning, allowing controlled and falsifiable evaluations under measurement invariants.