CTFtime Archive Dataset

A structured dataset of Capture the Flag (CTF) competition events listed on CTFtime.org, covering 2015 through 2025.

The dataset contains 2,478 events collected from CTFtime's past events archive. Each event includes its name, date, format, location, and CTFtime weight rating. An enriched version adds 20 derived variables for temporal analysis, duration categories, COVID-era classification, and more.

Dataset Overview

	Raw	Enriched
File	`ctftime_archive_all.csv`	`ctftime_archive_all_enriched.csv`
Rows	2,478	2,380
Columns	8	28
Description	Parsed directly from CTFtime	Cleaned, standardized, with derived variables

The enriched file has 98 fewer rows because events lasting more than 7 days (training platforms and long-running challenges) were filtered out to focus on traditional, time-boxed CTF competitions.
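A minimal sketch of that duration filter, assuming each event's start and end have already been parsed into `datetime` values. The tuple layout, the function name `keep_traditional`, and the choice to keep events of exactly 7 days are assumptions for illustration; the actual rule lives in enrich_ctf_data.py.

```python
from datetime import datetime, timedelta

# Cutoff from the description above: events longer than 7 days are dropped.
MAX_DURATION = timedelta(days=7)


def keep_traditional(events, max_duration=MAX_DURATION):
    """Keep (name, start, end) tuples whose duration is at most max_duration.

    Hypothetical helper: an event lasting exactly max_duration is kept.
    """
    return [(name, start, end) for name, start, end in events
            if end - start <= max_duration]
```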

Events per year in the raw file (the counts sum to 2,478):

2015	2016	2017	2018	2019	2020	2021	2022	2023	2024	2025
81	109	142	157	201	230	241	274	332	351	360
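The per-year counts can be recomputed from the raw CSV with the standard library alone. This is a small sketch; the helper names `load_rows` and `events_per_year` are illustrative and not part of the repo's scripts.

```python
import csv
from collections import Counter


def load_rows(path: str) -> list[dict]:
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def events_per_year(rows) -> dict[int, int]:
    """Count rows per value of the 'year' column, sorted by year."""
    counts = Counter(int(row["year"]) for row in rows)
    return dict(sorted(counts.items()))
```

For example, `events_per_year(load_rows("data/ctftime_archive_all.csv"))` should return a dict matching the table above, with values summing to 2,478.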

Files

ctftime-archive/
├── data/
│   ├── ctftime_archive_all.csv            # Raw parsed data (8 columns)
│   └── ctftime_archive_all_enriched.csv   # Enriched data (28 columns)
├── data_dictionary.csv                    # Column descriptions and types
├── parse_ctf.py                           # Parser: CTFtime text -> raw CSV
├── enrich_ctf_data.py                     # Enrichment: raw CSV -> enriched CSV
├── describe_data.py                       # Quick dataset summary stats
├── requirements.txt
├── CITATION.cff
├── LICENSE
└── README.md

How the Data Was Collected

1. Visited each year's archive page on CTFtime (e.g. ctftime.org/event/list/?year=2022).
2. Copied the event table into a text file (tab-separated).
3. Ran parse_ctf.py to standardize formats, locations, and weights into a clean CSV.
4. Ran enrich_ctf_data.py to add temporal variables, duration calculations, and categorical flags.
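Step 2 produces one tab-separated line per event. The sketch below shows one way to turn such a paste into rows shaped like the raw CSV. It assumes the pasted columns follow CTFtime's archive table in the order Name, Date, Format, Location, Weight, Notes; the function name `read_pasted_table` is hypothetical, and parse_ctf.py remains the authoritative description of the expected input.

```python
import csv
import io

# Assumed column order of CTFtime's archive table as pasted:
# Name, Date, Format, Location, Weight, Notes
FIELDS = ["name", "date_raw", "format", "location", "weight", "notes"]


def read_pasted_table(text: str, year: int) -> list[dict]:
    """Turn a tab-separated copy of one year's archive table into row dicts."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for cells in reader:
        # Skip blank lines and the header row copied along with the table.
        if not any(c.strip() for c in cells) or cells[0].strip().lower() == "name":
            continue
        cells = [c.strip() for c in cells]
        cells += [""] * (len(FIELDS) - len(cells))  # pad missing trailing cells
        row = dict(zip(FIELDS, cells))
        row["year"] = year
        row["notes"] = row["notes"] or "N/A"  # mirrors the "N/A" placeholder in the raw CSV
        rows.append(row)
    return rows
```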

The raw text files are not included in this repo, but the parser script documents the expected input format.

Column Reference

The raw CSV has 8 columns:

Column	Example
event_id	1
name	32C3 CTF
year	2015
date_raw	27 Dec., 12:00 PST — 29 Dec. 2015, 12:00 PST
format	Jeopardy
location	Online
weight	70.00
notes	N/A
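The date_raw strings give the start without a year and the end with one, separated by an em dash. Below is a sketch of turning them into start and end datetimes. It assumes month names may be abbreviated or spelled out (only the first three letters are used), takes the year from the end of the range, and ignores the timezone label on the assumption that both ends use the same zone. The name `parse_date_raw` is hypothetical; enrich_ctf_data.py may parse these differently.

```python
import re
from datetime import datetime

_MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# One side of the range, e.g. "27 Dec., 12:00 PST" or "29 Dec. 2015, 12:00 PST".
_SIDE = re.compile(
    r"(?P<day>\d{1,2})\s+(?P<mon>[A-Za-z]+)\.?(?:\s+(?P<year>\d{4}))?,"
    r"\s*(?P<hour>\d{1,2}):(?P<minute>\d{2})"
)


def _to_datetime(m: re.Match, year: int) -> datetime:
    month = _MONTHS[m["mon"][:3].lower()]  # "Sept" -> "sep", "March" -> "mar"
    return datetime(year, month, int(m["day"]), int(m["hour"]), int(m["minute"]))


def parse_date_raw(date_raw: str) -> tuple[datetime, datetime]:
    """Split a date_raw range on the em dash and parse both ends (timezone ignored)."""
    start_text, end_text = date_raw.split("\u2014", 1)
    start_m, end_m = _SIDE.search(start_text), _SIDE.search(end_text)
    if start_m is None or end_m is None or end_m["year"] is None:
        raise ValueError(f"unrecognized date range: {date_raw!r}")
    end = _to_datetime(end_m, int(end_m["year"]))
    start = _to_datetime(start_m, int(start_m["year"] or end_m["year"]))
    if start_m["year"] is None and start > end:
        start = start.replace(year=start.year - 1)  # range crosses New Year
    return start, end
```

With the example row above, the 32C3 CTF range parses to a 48-hour event from 27 Dec. 2015 12:00 to 29 Dec. 2015 12:00.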

CTFtime lists actual city and country names for in-person events (e.g. "Moscow, Russia"). In both CSV files, every location has been standardized to either "Online" or "On-site" for consistency.
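A minimal sketch of that standardization, assuming CTFtime marks online events with some spelling of "online" (such as "On-line") and that any other value names a physical venue. The function name `normalize_location` is hypothetical and may not match the logic in parse_ctf.py.

```python
def normalize_location(raw: str) -> str:
    """Collapse a CTFtime location string to "Online" or "On-site".

    Assumption: online events use some spelling of "online" (e.g. "On-line");
    anything else is treated as a physical venue.
    """
    key = raw.strip().lower().replace("-", "").replace(" ", "")
    return "Online" if key == "online" else "On-site"
```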

The enriched CSV adds 20 derived columns. See data_dictionary.csv for the full list with types and descriptions. Additions include:

start_date, end_date, duration_hours, duration_days
start_quarter, season, covid_era
is_weekend, is_multi_day, is_qualifier, is_finals
duration_category (Short/Medium/Long), weight_category (Zero/Low/Medium/High)
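To illustrate how a few of these columns can be derived from a parsed start and end time, here is a sketch. The definitions are assumptions chosen for illustration (for example, is_weekend is taken to mean "starts on a Saturday or Sunday", and qualifiers and finals are detected by substrings in the event name); the authoritative rules and the remaining columns, such as season, covid_era, and the category bins, are defined in enrich_ctf_data.py and data_dictionary.csv.

```python
from datetime import datetime


def derive_flags(name: str, start: datetime, end: datetime) -> dict:
    """Hypothetical re-derivation of a subset of the enriched columns."""
    lowered = name.lower()
    return {
        "duration_hours": (end - start).total_seconds() / 3600,
        "start_quarter": (start.month - 1) // 3 + 1,  # Jan-Mar -> 1, ..., Oct-Dec -> 4
        "is_weekend": start.weekday() >= 5,  # assumption: starts on Sat (5) or Sun (6)
        "is_multi_day": end.date() > start.date(),  # ends on a later calendar day
        "is_qualifier": "qual" in lowered,  # assumption: name-based detection
        "is_finals": "final" in lowered,
    }
```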

Usage

Load the enriched dataset directly (no scripts needed):

import csv

with open('data/ctftime_archive_all_enriched.csv', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    events = list(reader)  # one dict per event; all values are strings
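Because csv.DictReader returns every value as a string, numeric columns need converting before analysis. As a sketch, the function below computes the mean CTFtime weight per year from such rows. It assumes the enriched file keeps the raw year and weight columns (the 8 + 20 = 28 column count suggests it does) and skips weights that are missing or non-numeric; the name `mean_weight_by_year` is illustrative.

```python
from collections import defaultdict


def mean_weight_by_year(rows) -> dict[int, float]:
    """Average the 'weight' column per 'year', skipping unusable weights."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        try:
            weight = float(row["weight"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric weight, e.g. "N/A"
        year = int(row["year"])
        totals[year] += weight
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}
```

Passing the `events` list loaded above gives one average per year.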

Or with pandas:

import pandas as pd

df = pd.read_csv('data/ctftime_archive_all_enriched.csv')

Rebuild from scratch (if you want to re-collect the data or modify the pipeline):

pip install -r requirements.txt

# Step 1: Parse a raw text file
python parse_ctf.py 2022_raw.txt --year 2022

# Step 2: Enrich the parsed CSV
python enrich_ctf_data.py 2022_ctf_data.csv
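To rebuild every year in one go, the two documented commands can be looped from Python. This sketch assumes the raw text files follow the `{year}_raw.txt` naming from the example above and that parse_ctf.py writes `{year}_ctf_data.csv`; both names are inferred from that example, not confirmed. It also does not cover how the per-year outputs are combined into ctftime_archive_all.csv.

```python
import subprocess
import sys

YEARS = range(2015, 2026)  # 2015 through 2025


def build_commands(year: int) -> list[list[str]]:
    """Return the parse and enrich commands for one year (file names assumed)."""
    raw_txt = f"{year}_raw.txt"          # hypothetical input name
    parsed_csv = f"{year}_ctf_data.csv"  # assumed output name of parse_ctf.py
    return [
        [sys.executable, "parse_ctf.py", raw_txt, "--year", str(year)],
        [sys.executable, "enrich_ctf_data.py", parsed_csv],
    ]


def rebuild(years=YEARS) -> None:
    """Run both steps for each year, stopping at the first failing command."""
    for year in years:
        for cmd in build_commands(year):
            subprocess.run(cmd, check=True)
```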

Citation

If you use this dataset, please cite:

@misc{jimenez2026ctftime,
  author    = {Jimenez, Jhaell},
  title     = {CTFtime Archive Dataset: 2015-2025},
  year      = {2026},
  url       = {https://github.com/xjhaell/ctftime-archive}
}

GitHub also shows a "Cite this repository" button, generated from the CITATION.cff file in this repo.

License

MIT. See LICENSE.

The underlying CTF event data is sourced from CTFtime.org. This dataset is a structured compilation intended for research and analysis.
