ccmdi/osintbench

OSINTbench is a benchmark for evaluating how well large language models can perform open-source intelligence (OSINT) tasks. Categories include:

  • Geolocation: Spatial reasoning
  • Identification: Information synthesis, breadth of knowledge
  • Temporal: Temporal reasoning
  • Analysis: General reasoning

Installation

git clone https://github.com/ccmdi/osintbench.git
cd osintbench
pip install -r requirements.txt

Set up your .env based on SAMPLE.env for whichever model providers you wish to test (e.g. ANTHROPIC_API_KEY must be set to test Claude).
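A minimal .env might look like the following. Only ANTHROPIC_API_KEY is named in this README; the other variable names are assumptions, so check SAMPLE.env for the exact keys your providers require.

ANTHROPIC_API_KEY=<your_anthropic_key>
GEMINI_API_KEY=<your_gemini_key>
OPENAI_API_KEY=<your_openai_key>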

You will need to create a dataset manually. Datasets follow this schema:

"cases": [
    {
      "id": <case_number>,
      "images": [
        "images/<image_number>.<ext>"
      ],
      "info": "<context given to the model about the case>",
      "tasks": [
        {
          "id": 1,
          "type": "location",
          "prompt": "Find the exact location of the photo.",
          "answer": {
            "lat": <true_lat>,
            "lng": <true_lng>
          }
        },
        {
            "id": 2,
            "type": "identification",
            "prompt": "Who is this?",
            "answer": "<person_name>"
        }
      ]
    },
    ...
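Before a run, a quick sanity check along these lines can catch schema mistakes early. This is a minimal sketch based on the schema above; check_dataset is a hypothetical helper, not part of the repository.

import json
from pathlib import Path

def check_dataset(folder: str) -> None:
    """Lightweight, illustrative sanity check for a dataset folder."""
    root = Path(folder)
    meta = json.loads((root / "metadata.json").read_text())
    for case in meta["cases"]:
        # Every referenced image should exist relative to the dataset folder.
        for image in case.get("images", []):
            assert (root / image).exists(), f"case {case['id']}: missing {image}"
        for task in case["tasks"]:
            # Location answers need coordinates; other task types use a string answer.
            if task["type"] == "location":
                assert {"lat", "lng"} <= set(task["answer"]), f"case {case['id']}: location task needs lat/lng"

check_dataset("dataset/basic")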

A dataset folder should follow this structure:

dataset/
├─ basic/
│  ├─ metadata.json
│  ├─ images/
│  │  ├─ 2.jpg
│  │  ├─ 1.png
├─ advanced/
│  ├─ metadata.json

Each dataset's definition lives in its metadata.json.
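For instance, the basic dataset above could be scaffolded with:

mkdir -p dataset/basic/images
# then add images and describe them in dataset/basic/metadata.json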

Test a model

Caution

Most outputs are evaluated by a judge model. Double-check responses before finalizing results.

python osintbench.py --dataset <test name> --model <model name>

Models are referenced by their class name in models.py; Gemini 2.5 Flash, for instance, is Gemini2_5Flash.
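For example, assuming --dataset takes the dataset folder name (basic above), running Gemini 2.5 Flash would look like:

python osintbench.py --dataset basic --model Gemini2_5Flash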

Roadmap

  • Tool use
    • Google Search
    • EXIF extraction
    • Reverse image search (Google Lens)
    • Visit website
    • Overpass turbo
    • Google Street View
  • Computer use (as a replacement; long-term)
  • High quality, human-verified datasets
  • Higher prompt quality to improve performance
  • Prompt batching/parallel runs
  • Video support?
  • Recursive prompting/self-evaluation
  • Release

Note

Contributors are welcome! Check the roadmap.
