Skip to content

For Security research purposes, I have created a lists of special tokens used by various LLMs

License

Notifications You must be signed in to change notification settings

softwaresecured/Special-token-lists

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🔐 Special Tokens in LLMs — Security Research Archive

This repository contains curated lists of special tokens used by various Large Language Models (LLMs), collected and analyzed for security research purposes.

🧠 Purpose

Special tokens—such as control sequences, formatting markers, and reserved vocabulary—can influence model behavior in subtle or undocumented ways. Understanding these tokens is critical for:

  • 🔍 Auditing model behavior and prompt injection risks
  • 🧪 Fuzzing and adversarial testing
  • 🧰 Building robust token-level filters and sanitizers
  • 📚 Reverse-engineering tokenizer internals

⚠️ Disclaimer This repository is intended for educational and research purposes only.

📬 Contributions Pull requests are welcome! If you've explored special tokens in other models (e.g., GPT, LLaMA, Mistral), feel free to share your findings.

About

For Security research purposes, I have created a lists of special tokens used by various LLMs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published