ShantanuT01/DACTYL

DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large language models

arXiv HuggingFace License: MIT


Overview

This repository contains the code and resources for my master's thesis at Cambridge, "DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large language models".

Datasets

Models

Generating DACTYL

We used the dactyl-generation package to generate texts. This can be installed via pip:

pip install dactyl-generation

Code

  • finetuning: Continues pre-training a Llama 3.2 1B Instruct model for a specific domain.
  • baselines: Contains code to get predictions from existing AI-text detectors on the DACTYL dataset.
  • cpt_generations: Runs a randomized sweep over generation parameters to find which settings best evade detection for the Llama 3.2 1B Instruct models.
  • training: Trains an AI-text classifier.
  • evaluation: Evaluates DACTYL-trained classifiers.
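
As a rough illustration of what a randomized generation parameter sweep like the one in cpt_generations can look like, here is a minimal sketch. It is not the code from this repository: the parameter names, the ranges in SEARCH_SPACE, and the functions sample_generation_params, random_sweep, and toy_detector_score are all hypothetical. In a real run, the scoring function would generate texts with the sampled parameters and return a detector's average AI-probability for them.

```python
import random

# Hypothetical search space; these ranges are illustrative assumptions,
# not the values used in the DACTYL experiments.
SEARCH_SPACE = {
    "temperature": (0.5, 1.5),
    "top_p": (0.7, 1.0),
    "top_k": (10, 100),
    "repetition_penalty": (1.0, 1.3),
}


def sample_generation_params(rng):
    """Draw one random configuration from SEARCH_SPACE."""
    return {
        "temperature": round(rng.uniform(*SEARCH_SPACE["temperature"]), 2),
        "top_p": round(rng.uniform(*SEARCH_SPACE["top_p"]), 2),
        "top_k": rng.randint(*SEARCH_SPACE["top_k"]),
        "repetition_penalty": round(
            rng.uniform(*SEARCH_SPACE["repetition_penalty"]), 2
        ),
    }


def random_sweep(score_fn, n_trials=20, seed=0):
    """Score n_trials random configurations.

    Returns (score, params) pairs sorted so the lowest score, meaning the
    configuration that evades the detector best, comes first.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample_generation_params(rng)
        trials.append((score_fn(params), params))
    trials.sort(key=lambda trial: trial[0])
    return trials


def toy_detector_score(params):
    # Stand-in scorer (assumption): a real sweep would generate texts with
    # these parameters and average a detector's AI-probability over them.
    return abs(params["temperature"] - 1.0)


results = random_sweep(toy_detector_score, n_trials=5)
best_score, best_params = results[0]
print(best_score, best_params)
```

Random sampling, rather than a full grid, keeps the number of generation runs fixed regardless of how many parameters are swept, which matters when each trial requires generating and scoring a batch of texts.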

Citation

@misc{thorat2025dactyldiverseadversarialcorpus,
      title={DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models}, 
      author={Shantanu Thorat and Andrew Caines},
      year={2025},
      eprint={2508.00619},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2508.00619}, 
}
