Popular repositories
elk Public Forked from EleutherAI/elk
Keeping language models honest by directly eliciting knowledge encoded in their activations. Building on "Discovering latent knowledge in language models without supervision" (Burns et al. 2022)
Repositories
Showing 10 of 17 repositories
- liars-bench Public
- sleeperer-agents-cadenza Public Forked from Cadenza-Labs/sleeperer-agents
  two-hop sleeper agents as model organisms of deception
- sleeperer-agents-all Public Forked from Cadenza-Labs/sleeperer-agents
  two-hop sleeper agents as model organisms of deception
- mask Public Forked from centerforaisafety/mask
  Code for evaluating AI systems on the MASK honesty benchmark.
- cluster-normalization Public
- sleeper-agents Public
People
This organization has no public members. You must be a member to see who’s a part of this organization.