This repository contains the code and resources for my master's thesis at Cambridge, "DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models".
We used the dactyl-generation package to generate texts. This can be installed via pip:
    pip install dactyl-generation

The repository is organized as follows:

- finetuning: Continues pre-training a Llama 3.2 1B Instruct model for a specific domain.
- baselines: Contains code to get predictions from existing AI-text detectors on the DACTYL dataset.
- cpt_generations: Performs a randomized generation parameter sweep to determine which parameters evade detection better for the Llama 3.2 1B Instruct models.
- training: Trains an AI-text classifier.
- evaluation: Evaluates DACTYL-trained classifiers.
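
As a rough illustration of what a randomized generation parameter sweep looks like, here is a minimal sketch in plain Python. It is not the code in cpt_generations: the parameter names (`temperature`, `top_p`, `repetition_penalty`, `top_k`), their ranges, and the `dummy_detection_rate` scoring function are all assumptions made for this example. In a real sweep, the scoring function would generate texts with the sampled configuration and measure how often a detector flags them.

```python
import random

# Hypothetical search space: names and ranges are illustrative assumptions,
# not the values used in the thesis.
SEARCH_SPACE = {
    "temperature": (0.2, 1.5),
    "top_p": (0.5, 1.0),
    "repetition_penalty": (1.0, 1.3),
}
TOP_K_CHOICES = [20, 50, 100, 200]


def sample_config(rng):
    """Draw one generation configuration uniformly at random."""
    config = {
        name: round(rng.uniform(low, high), 3)
        for name, (low, high) in SEARCH_SPACE.items()
    }
    config["top_k"] = rng.choice(TOP_K_CHOICES)
    return config


def random_sweep(n_trials, score_fn, seed=0):
    """Score n_trials random configurations, sorted by score (lowest first)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((score_fn(config), config))
    # Sort on the score only; the config dicts themselves are not comparable.
    trials.sort(key=lambda trial: trial[0])
    return trials


def dummy_detection_rate(config):
    """Stand-in for a real detector: NOT a meaningful measure of detectability."""
    return abs(config["temperature"] - 1.0)


if __name__ == "__main__":
    for score, config in random_sweep(5, dummy_detection_rate, seed=1):
        print(f"{score:.3f}  {config}")
```

With a lower score meaning "detected less often", the first entries of the returned list are the configurations that evade the (here, dummy) detector best. Fixing the seed keeps a sweep reproducible.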

To cite DACTYL:

    @misc{thorat2025dactyldiverseadversarialcorpus,
        title={DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models},
        author={Shantanu Thorat and Andrew Caines},
        year={2025},
        eprint={2508.00619},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2508.00619},
    }