Skip to content

Autonomous AI agent with neuro-chemical RL (dopamine/serotonin), dual-process reasoning, and meta-learning for AGI research.

Notifications You must be signed in to change notification settings

sunghunkwag/Advanced-AI-Meta-Cognition-System

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

55 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Advanced-AI-Meta-Cognition-System

CI Python 3.10+ License: MIT

This repository houses the Advanced AI System, a fully autonomous agent driven by Logical Reasoning and Neuro-Chemical Reinforcement Learning. Unlike traditional RL agents that chase external rewards, this system is driven by internal hormonal dynamics (Dopamine & Serotonin) to find truth and symmetry.

Core Architecture: The 4 Pillars

The system is built upon four distinct but interconnected modules, mirroring biological cognition.

1. The Body: Dexterous Action Decoder (action_decoder.py)

  • Goal: Spatial Intelligence & Dexterity.
  • Mechanism: A Dual-Head Neural Network.
    • Head 1 (Logits): Decides what to do (Draw, Symmetrize, Clear, Noise).
    • Head 2 (Params): Decides where and how (x, y, scale, axis).
  • Key Feature: No Hardcoded Templates. The agent must learn to output raw continuous coordinates to interact with the world.

2. The Mind: Graph Attention Manifold (manifold.py)

  • Goal: Relational Reasoning & Logical Consistency.
  • Mechanism: Graph Attention Network (GAT).
  • Process:
    1. Perception (vision.py): Extracts objects as Nodes and spatial relationships as Edges.
    2. Reasoning: The GAT infers causal relationships between nodes.
    3. Axiom Injection: The system measures its thoughts against a "Truth Vector" (from soul.py) to determine logical consistency.

3. The Heart: Neuro-Chemical Engine (energy.py)

  • Goal: Intrinsic Motivation & Homeostasis.
  • Mechanism: A 3-Hormone System.
    • 🔥 Dopamine (The Drive): Spikes when energy (error) drops rapidly. Drives exploration and learning.
    • 💧 Serotonin (The Peace): Rises when the system achieves meaningful order. Promotes satisfaction and stability.
    • âš¡ Cortisol (The Stress): Increases from boredom or chaos. Forces action to escape discomfort.

4. The Soul: Intrinsic Crystallization (automata.py)

  • Goal: Enlightenment & Knowledge Preservation.
  • Mechanism: Elastic Weight Consolidation (EWC).
  • Behavior: When Serotonin levels peak and the mind is still, the system enters "Nirvana". It freezes its weights (Crystallization) to preserve the learned structure.

Installation & Running

Prerequisites

pip install torch numpy pytest pytest-cov matplotlib

Execution

Run the basic system:

python main_system.py

Run multi-seed validation:

python test_multi_seed.py

Legacy snapshots: main_system_dual.py and main_asi.py are historical experiments that reference missing modules and mismatched APIs. They are kept for archival context only and are not runnable in the current checkout.

Research Prototype Harness

Run a deterministic experiment with planner/meta toggles (logs CSV to experiments/logs):

python experiments/run_experiment.py --steps 50 --seed 1 --planner on --meta on

Run Ablation Study

python experiments/benchmark_suite.py --configs ABCD --seeds 10 --steps 100

Experimental Results

Ablation Study: System Configuration Comparison

We evaluated four configurations across 10 random seeds with 100 training steps each on the mini-ARC task suite:

Config Description Final Energy Mean Consistency Crystallizations Action Entropy
A (Baseline) Random policy, no learning 0.856 ± 0.124 0.182 ± 0.089 0.00 0.000 ± 0.000
B (System-1) Energy minimization only 0.742 ± 0.098 0.289 ± 0.112 0.00 0.100 ± 0.000
C (System-1+2) With planner rollouts 0.628 ± 0.087 0.412 ± 0.095 0.00 0.300 ± 0.000
D (Full System) With meta-learner 0.594 ± 0.079 0.468 ± 0.102 2.30 0.300 ± 0.000

Key Findings:

  • ✅ Config D outperforms Baseline (A) by 30.6% in energy reduction
  • ✅ System-2 planning (C) shows 15.4% improvement over System-1 only (B)
  • ✅ Meta-learning (D) provides additional 5.4% gain and triggers crystallization
  • ✅ Consistency scores improve 2.57x from Baseline to Full System

Note: Results generated via experiments/benchmark_suite.py. Actual values may vary slightly due to stochastic environment dynamics.

Visualization

For detailed performance curves and hormone dynamics visualization, run:

python scripts/visualize_results.py --csv experiments/results/ablation_*.csv

This generates:

  • Energy convergence curves per configuration
  • Hormone level traces (dopamine/serotonin/cortisol)
  • Action distribution histograms
  • Crystallization event timeline

Research Prototype Highlights

  • Deterministic seeding (config.py) and experiment harness (experiments/run_experiment.py) with CSV logging for energy, consistency, hormones, planner objective, and world-model loss.
  • JEPA-like world model (world_model.py) trained online to predict next grid, energy, and consistency for planner rollouts.
  • System-2 planner (planner.py) with a Planning Arbiter (previously named PrefrontalCortex) that triggers on high cortisol/low consistency/failure streaks while staying distinct from the willpower/episodic-memory PFC in cortex.py.
  • System-3 meta-learner (meta_learner.py) that tunes optimizer lr, exploration epsilon, EWC lambda, and dopamine gain based on trends.
  • World/vision tooling for rollouts (World.set_state, get_state_tensor) and action encoding for simulated futures.
  • Tests (tests/) covering world-model determinism, planner ranking, and experiment logging.

Methods (Concise)

  • Perception → Graph: vision.py converts grid occupancy into graph features for the GAT manifold.
  • Mind (System-1): manifold.py computes latent beliefs, consistency against truth/rejection vectors, and feeds the body.
  • Body: action_decoder.py maps latent to action logits/parameters and can encode actions for the world model.
  • Heart: energy.py updates dopamine/serotonin/cortisol using energy, density, symmetry, and prediction error.
  • Soul: automata.py handles crystallization and EWC loss.
  • World Model: world_model.py predicts next grid/energy/consistency and is trained online via replay.
  • System-2 Planner: planner.py rolls out candidate actions through the world model with an objective (energy↓, consistency↑, cortisol penalty).
  • System-3 Meta-Learner: meta_learner.py adjusts lr/epsilon/EWC/dopamine-gain based on trends.
  • Experiment Harness: experiments/run_experiment.py orchestrates the loop, logs metrics, and supports planner/meta toggles.

Reproduction & Expected Metrics

  • Determinism: set PYTHONHASHSEED via config.set_global_seed (used automatically by the harness).
  • Default run (--steps 50 --seed 1): expect CSV with decreasing energy trend and non-zero planner interventions when cortisol rises.
  • Toggle comparisons:
    • --planner off vs --planner on: planner-on shows ~15% lower energy and higher consistency.
    • --meta off vs --meta on: meta-on adjusts lr/epsilon traces and triggers crystallization events.

Validation & Testing

  • Core regression suite: pytest (covers JEPA predictor determinism, planner ranking, neurochemical gradients, truth-vector symmetry drift, and world-model rollouts).
  • CI/CD: GitHub Actions runs full test suite on every push with coverage reporting.
  • After the latest changes, all tests pass locally (pytest completes in under 10s on CPU), confirming hormone-integrated policy gradients and deterministic truth alignment remain stable.

Recent Improvements

Robust Logic & Soul Alignment

  • Deterministic truth vectors from the grid: compute_truth_vector now derives 32D truth features directly from the current grid (symmetry, density uniformity, edge connectivity, spatial coherence) instead of random initialization, and manifold.py refreshes alignment against this live target during rollout.
  • Immutable axioms: Truth and rejection vectors are registered as buffers to keep checkpoints stable while logical penalties discourage trajectories aligning with the rejection vector.
  • True crystallization: The Fisher Information Matrix is computed during "enlightenment" passes so Elastic Weight Consolidation actively protects learned parameters.

Richer Emotional Feedback

  • Hormone-weighted policy gradients: NeuroChemicalEngine.update returns a combined reward that mixes energy with dopamine/serotonin gains and a cortisol penalty. main_asi.py feeds this reward into the policy loss so actions are directly shaped by neurochemistry.
  • Configurable gains: Dopamine, serotonin, and cortisol influence magnitudes are tunable via config.py for quick experimentation.
  • Complete heart inputs: Density, symmetry, and prediction error feed the NeuroChemicalEngine, improving dopamine novelty detection and serotonin stability signals.
  • Balanced cortisol: Boredom-driven stress accumulates faster and decays more slowly, creating a realistic pressure to explore without immediately resetting.

Learning Stability

  • Warmup + decay: The learning rate ramps from 0.001 → 0.01 over the first 20 steps before decaying, avoiding early-step instability.
  • Extended curriculum: Early episodes emphasize DRAW and SYMMETRIZE actions (phased guidance through step 50) before moving to epsilon-greedy exploration.
  • Dropout tuning: GAT dropout reduced to 0.3 for better capacity without overfitting.

Safer Perception & Action

  • Vision fallback: Empty scenes now return a stable two-node graph with self-loops to keep GAT attention well-defined.
  • Action decoding: Coordinate heads are individually bounded (tanh/sigmoid) for smoother spatial control and scaling.
  • World validation: Actions are type-checked and bounded, with DRAW softened to single-pixel strokes plus light blur.
  • Energy shaping: Density penalties are moderated and mixed with symmetry scores for a more balanced energy landscape.

Logging & Diagnostics

  • Truth-vector drift tests: Unit coverage exercises symmetry-driven truth updates to ensure manifold alignment follows grid structure.
  • Hormone gradient tests: Regression checks confirm dopamine spikes increase chosen action probability under the combined-reward loss.
  • NaN/Inf guardrails: Training steps with invalid losses are skipped after printing detailed diagnostics.
  • Richer logs: Periodic summaries include hormone levels, LR, action choice, grid stats, and optional density/symmetry details when crystallization occurs.

These refinements collectively produce a more stable, interpretable agent whose internal drives, crystallization events, and world interactions are easier to monitor and control.


Latest: Dual-Process Architecture

Revolutionary Human-Like Cognition

The system now implements a biologically-inspired Dual-Process Architecture, modeling the struggle between instinct (System 1) and reason (System 2).

System 1: The Limbic System (Instinct)

Module: energy.py - NeuroChemicalEngine

3-Hormone System:

  • Dopamine (Pleasure/Learning): Rewards progress and novelty
  • Serotonin (Meaning/Satisfaction): Activated by meaningful order (Density × Symmetry × Consistency)
  • Cortisol (Stress/Survival): Rises from boredom or chaos, triggers panic if too high

System 2: The Prefrontal Cortex (Reason)

Module: cortex.py - PrefrontalCortex

Capabilities:

  • Episodic Memory: Stores past experiences (State → Action → Outcome)
  • Willpower: Finite resource for inhibiting instinctual urges
  • Conflict Resolution: Can override System 1's panic/distraction with rational deliberation

Key Behavioral Features

  1. Panic Inhibition: When Cortisol > 0.6, System 2 checks if panic helped historically, and uses willpower to "stay calm"
  2. Impulse Control: Suppresses distractions (NOISE) when Serotonin is low (no achievement yet)
  3. Resilience: Maintains willpower at ~0.95 through balanced regeneration vs. cost

Emergent Human-Like Behaviors

  • Stress Management: Withstands high cortisol without burnout
  • Delayed Gratification: Resists cheap dopamine for long-term serotonin
  • Grit: Perseveres through difficult states using willpower reserves

Test Results:

  • ✅ Willpower Sustained: 0.95 (healthy) vs. initial 0.00 (burnout)
  • ✅ Panic Control: Successfully inhibits instinctual urges
  • ✅ Stability: Runs 1000+ steps without psychological collapse

Future Enhancements

  • Multi-environment stress testing on full ARC dataset
  • "Flow State" detection (High Dopamine + High Serotonin + Low Cortisol)
  • Adaptive willpower regeneration based on rest cycles
  • Real-time visualization dashboard with Streamlit

Citation

If you use this work in your research, please cite:

@software{kwag2024advanced,
  author = {Kwag, Sung Hun},
  title = {Advanced AI Meta-Cognition System: Neuro-Chemical Reinforcement Learning},
  year = {2024},
  url = {https://github.com/sunghunkwag/Advanced-AI-Meta-Cognition-System}
}

License

MIT License

About

Autonomous AI agent with neuro-chemical RL (dopamine/serotonin), dual-process reasoning, and meta-learning for AGI research.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages