A tool that scans the last N commits of a Git repository for secrets or other sensitive data

SimonSl07/Git-Secret-Scanner

Git Secret Scanner (LLM-Powered)

This tool scans the last N commits of a Git repository for secrets or other sensitive data, combining:

  • Regex heuristics for known patterns (API keys, tokens, passwords)
  • Entropy analysis for random-looking strings (high-entropy indicators)
  • LLM reasoning (DeepSeek) for contextual understanding and classification

The scanner works on local or remote repositories, analyzes diffs in parallel threads, and writes a single structured JSON report for easy review.
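To make the entropy step concrete, here is a minimal sketch of how high-entropy strings could be flagged. The function names, the candidate-token regex, the minimum length of 16, and the 3.5 bits-per-character threshold are all illustrative assumptions, not values taken from scanner.py.

```python
import math
import re
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_entropy_tokens(line: str, min_len: int = 16, threshold: float = 3.5) -> list:
    """Return long tokens in a line whose entropy meets the threshold.

    min_len and threshold are hypothetical defaults chosen for illustration.
    """
    # Candidate tokens: runs of characters common in keys and base64-like strings.
    candidates = re.findall(rf"[A-Za-z0-9+/_-]{{{min_len},}}", line)
    return [t for t in candidates if shannon_entropy(t) >= threshold]
```

Short or repetitive strings score low and are skipped, while long random-looking tokens pass the threshold and can be handed to the next stage.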

Features:

  • Scans commit diffs and messages for hard-coded secrets.
  • Detects a wide range of tokens: AWS, GitHub, Slack, Stripe, Discord, etc.
  • Combines regex + entropy to reduce false negatives.
  • Uses an LLM to classify findings and provide reasoning/confidence.
  • Parallelized for faster analysis (--threads).
  • Outputs a machine-readable JSON report with detailed findings.
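As a rough illustration of the regex stage, the sketch below matches a few commonly used token shapes. The pattern table and the `regex_findings` helper are assumptions made for this example; the patterns the scanner actually ships may differ and cover more providers.

```python
import re

# Illustrative patterns only; not the scanner's real pattern list.
PATTERNS = {
    "AWS Access Key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "GitHub Token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "Password Assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def regex_findings(line: str) -> list:
    """Return (pattern name, matched text) pairs for every pattern hit in a line."""
    return [
        (name, match.group(0))
        for name, pattern in PATTERNS.items()
        for match in pattern.finditer(line)
    ]
```

For example, `regex_findings("password: hunter2")` reports a "Password Assignment" hit, while a line with no recognizable token returns an empty list.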

Install dependencies: pip install gitpython tqdm openai

Export your own API key: export DEEPSEEK_API_KEY="your_deepseek_api_key"

Basic usage: python scanner.py --repo https://github.com/example/project.git --n 10

Flags:

--repo <path|url>    Path or URL to a Git repository
--n <int>            Number of commits to scan (default = 10)
--out <filename>     Where to output the JSON report (default = report.json)
--only_sus           Include only suspicious diffs (ones flagged by regex or entropy)
--no-full-text       Use only suspicious lines instead of the full commit diff
--threads <int>      Number of concurrent LLM calls (default = 4)
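For readers who want to see how these flags fit together, here is one way they could be declared with argparse. This is a sketch based only on the flag list above: the `build_parser` name, the `full_text` destination, and treating --repo as required are assumptions, and scanner.py may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        description="Scan the last N commits of a Git repository for secrets."
    )
    # Assumption: --repo is mandatory; the README lists no default for it.
    parser.add_argument("--repo", required=True, help="Path or URL to a Git repository")
    parser.add_argument("--n", type=int, default=10, help="Number of commits to scan")
    parser.add_argument("--out", default="report.json", help="Where to output the JSON report")
    parser.add_argument("--only_sus", action="store_true",
                        help="Include only diffs flagged by regex or entropy")
    # Stored as full_text=True unless --no-full-text is passed.
    parser.add_argument("--no-full-text", dest="full_text", action="store_false",
                        help="Use only suspicious lines instead of the full commit diff")
    parser.add_argument("--threads", type=int, default=4,
                        help="Number of concurrent LLM calls")
    return parser
```

Calling `build_parser().parse_args(["--repo", ".", "--n", "5"])` yields `n=5` with the remaining options at their documented defaults.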

Example output (for one diff):

{
    "file path": "README.md",
    "commit hash": "1234hash1234hash1234hash1234hash1234hash",
    "findings": [
      {
        "type": "AWS Key",
        "line": "AWS keys    | `ASIA12COOL34WOWZXCT`| 1 |",
        "reasoning": "The line explicitly labels the content as AWS keys and contains a high-entropy string that matches AWS key patterns",
        "confidence": "0.90"
      },
      {
        "type": "Facebook password",
        "line": "my_facebook password:  `qwooqwfjieiweiwowqe123",
        "reasoning": "The line contains a facebook password",
        "confidence": "0.85"
      }
    ]
}
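Since the report is machine-readable, it can be filtered after the scan. The sketch below assumes the report is a JSON list of per-diff objects shaped like the example above; that top-level layout and the `high_confidence_findings` helper are assumptions. Confidence values appear as strings in the example, so they are converted to floats before comparison.

```python
import json


def high_confidence_findings(report: list, threshold: float = 0.88) -> list:
    """Return (commit hash, file path, type) for findings at or above threshold.

    Assumes `report` is a list of per-diff entries like the example output.
    """
    results = []
    for entry in report:
        for finding in entry.get("findings", []):
            # Confidence is a string such as "0.90" in the example output.
            if float(finding["confidence"]) >= threshold:
                results.append((entry["commit hash"], entry["file path"], finding["type"]))
    return results


def load_report(path: str = "report.json") -> list:
    """Load a report file written via --out."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With the example entry above and the default threshold, only the "AWS Key" finding (confidence 0.90) would be returned.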

LLM calls can be slow. The tool uses ThreadPoolExecutor to analyze multiple diffs in parallel. Tune concurrency with the --threads option to balance speed and API rate limits.
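The parallel pattern described here can be sketched as follows. `analyze_diff` is a stand-in for the real LLM call and `analyze_all` is a hypothetical helper; only the use of ThreadPoolExecutor comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze_diff(diff: dict) -> dict:
    """Placeholder for the LLM call; returns an empty finding list."""
    return {"file path": diff["file path"], "findings": []}


def analyze_all(diffs: list, threads: int = 4, analyze=analyze_diff) -> list:
    """Analyze diffs concurrently and return results in input order."""
    results = [None] * len(diffs)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Map each future back to the index of the diff it was created from.
        futures = {pool.submit(analyze, d): i for i, d in enumerate(diffs)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Because the work is dominated by waiting on network responses, threads are a reasonable fit; a lower `threads` value trades speed for fewer simultaneous API requests.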
