- Toronto
- 13:58 (UTC -05:00)
Pinned
- Gaslight_EVAL (Public)
  AI safety evaluation framework testing LLM epistemic robustness under adversarial self-history manipulation.
  Python 1
- Secret_H_Evals (Public)
  Multi-agent strategic deception evaluation framework for LLMs using Secret Hitler as a testbed. Analyzes AI reasoning, trust dynamics, and deceptive behavior patterns.
  Python 1
- Pinocchio-Vector-Test (Public)
  Investigating whether language models encode anticipated social consequences in their activations. Uses a 2x2 factorial design crossing truth × social valence to show that models are more sensitive…
  Python 1

