This tool scans the last N commits of a Git repository for secrets or other sensitive data, combining:
- Regex heuristics for known patterns (API keys, tokens, passwords)
- Entropy analysis for random-looking strings (high-entropy indicators)
- LLM reasoning (DeepSeek) for contextual understanding and classification
The scanner can analyze local or remote repositories, runs efficiently in parallel threads, and produces a single structured JSON report for easy review.
- Scans commit diffs and messages for hard-coded secrets.
- Detects a wide range of tokens: AWS, GitHub, Slack, Stripe, Discord, etc.
- Combines regex + entropy to reduce false negatives.
- Uses an LLM to classify findings and provide reasoning/confidence.
- Parallelized for faster analysis (--threads).
- Outputs a machine-readable JSON report with detailed findings.
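The regex and entropy checks can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the scanner's actual code: the `PATTERNS` table, the `ENTROPY_THRESHOLD` and `MIN_TOKEN_LENGTH` values, and the helper names `shannon_entropy` and `flag_line` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical pattern table; the tool's real patterns are not shown in this README.
PATTERNS = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "GitHub token": re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
}

ENTROPY_THRESHOLD = 4.0  # bits per character; illustrative value, not the tool's
MIN_TOKEN_LENGTH = 20    # shorter tokens are skipped by the entropy check


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def flag_line(line):
    """Return the reasons a line looks suspicious (empty list if none)."""
    reasons = [name for name, pat in PATTERNS.items() if pat.search(line)]
    # Entropy pass: look at long runs of characters typical of keys and tokens.
    for token in re.findall(r"[A-Za-z0-9+/=_\-]{%d,}" % MIN_TOKEN_LENGTH, line):
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            reasons.append("high entropy")
            break
    return reasons
```

A line is flagged if either check fires, so the entropy pass can catch random-looking strings that match no known pattern.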
Install dependencies:
pip install gitpython tqdm openai
Export your own API key:
export DEEPSEEK_API_KEY="your_deepseek_api_key"
Basic usage:
python scanner.py --repo https://github.com/example/project.git --n 10
Flags:
  --repo <path|url>   Path or URL of the Git repository to scan
  --n <int>           Number of commits to scan (default = 10)
  --out <filename>    File to write the JSON report to (default = report.json)
  --only_sus          Include only suspicious diffs (those flagged by regex or entropy)
  --no-full-text      Use only the suspicious lines instead of the full commit diff
  --threads <int>     Number of concurrent LLM calls (default = 4)
Example output (for one diff):
{
  "file path": "README.md",
  "commit hash": "1234hash1234hash1234hash1234hash1234hash",
  "findings": [
    {
      "type": "AWS Key",
      "line": "AWS keys | `ASIA12COOL34WOWZXCT`| 1 |",
      "reasoning": "The line explicitly labels the content as AWS keys and contains a high-entropy string that matches AWS key patterns",
      "confidence": "0.90"
    },
    {
      "type": "Facebook password",
      "line": "my_facebook password: `qwooqwfjieiweiwowqe123",
      "reasoning": "The line contains a facebook password",
      "confidence": "0.85"
    }
  ]
}
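A report in this shape is straightforward to post-process. The sketch below assumes the report's top level is a list of per-diff objects like the one above; that layout is a guess, since only a single entry is shown here. The helper name `high_confidence_findings` and the default threshold are hypothetical. Note that `confidence` is a string in the example, so it is converted with `float` before comparing.

```python
import json


def high_confidence_findings(report, threshold=0.8):
    """Return (file path, type, confidence) for findings at or above threshold.

    Assumes `report` is a list of per-diff entries shaped like the example
    output; the real top-level layout of the report may differ.
    """
    hits = []
    for entry in report:
        for finding in entry.get("findings", []):
            confidence = float(finding["confidence"])  # stored as a string
            if confidence >= threshold:
                hits.append((entry["file path"], finding["type"], confidence))
    return hits


# Small inline sample so the sketch runs without a report file on disk.
sample = json.loads("""
[
  {
    "file path": "README.md",
    "commit hash": "1234hash1234hash1234hash1234hash1234hash",
    "findings": [
      {"type": "AWS Key", "confidence": "0.90"},
      {"type": "Facebook password", "confidence": "0.85"}
    ]
  }
]
""")

for path, kind, confidence in high_confidence_findings(sample, threshold=0.88):
    print(path, kind, confidence)
```

For a real run, the input would come from calling `json.load` on the file given to --out.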
LLM calls can be slow. The tool uses ThreadPoolExecutor to analyze multiple diffs in parallel. Tune concurrency with the --threads option to balance speed and API rate limits.
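The pattern described above might look roughly like this minimal sketch. The function `classify_diff` is a hypothetical stand-in for the real LLM call, and `analyze_all` is an assumed name; only `ThreadPoolExecutor` and the default of 4 threads come from this README.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def classify_diff(diff):
    """Hypothetical stand-in for the LLM call; the real one goes over the network."""
    return {"diff": diff, "findings": []}


def analyze_all(diffs, threads=4):
    """Classify diffs concurrently and return results in input order."""
    results = [None] * len(diffs)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Map each future back to the index of the diff it was submitted for.
        futures = {pool.submit(classify_diff, d): i for i, d in enumerate(diffs)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads suit this workload because each call spends most of its time waiting on the network. Raising `threads` helps only until the API's rate limit becomes the bottleneck.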