Pickle Inspector

pickle_inspector is a static analysis tool that detects insecure deserialization vulnerabilities in Python projects, especially those involving pickle.load(), yaml.load(), and other unsafe sinks. It traces flows from user-controlled inputs to deserialization sinks and covers cases such as:

pickle.load(open(...)) with user-influenced file paths
pickle.load(request.files['file']) directly from uploads
Dangerous patterns across multiple files and function calls
Web application contexts (Flask/Django routes)
File operation and task execution contexts

Insecure Deserialization Sinks Supported

The tool statically detects usage of the following deserialization functions, which can lead to arbitrary code execution or data tampering when handling untrusted input:

Sink	Description
`pickle.load()`	Code execution risk if attacker controls input
`pickle.loads()`	Same as `pickle.load` but for in-memory data
`pickle.Unpickler.load()`	Can be used in custom deserialization flows, still uses pickle internally
`joblib.load()`	Common in ML pipelines, relies on pickle under the hood
`sklearn.externals.joblib.load()`	Legacy scikit-learn import path for joblib, also uses pickle
`cloudpickle.load()`	Like pickle, used in distributed computing (e.g., Dask, Ray)
`cloudpickle.loads()`	In-memory variant of `cloudpickle.load()`
`dill.load()`	Extends pickle, supports serializing more object types
`dill.loads()`	Like `dill.load`, but operates on in-memory byte strings
`marshal.load()`	Can load arbitrary Python bytecode (rarely used outside stdlib)
`marshal.loads()`	Same as above but from byte strings
`shelve.open()`	Implicitly uses pickle to load persistent storage
`yaml.load()`	Unsafe: can instantiate arbitrary objects when used with `yaml.Loader` or `yaml.UnsafeLoader` (the default loader in older PyYAML releases)
`torch.load()`	Used in PyTorch — internally uses pickle, exploitable via model files
`torch.jit.load()`	Loads TorchScript models, also uses pickle internally
`numpy.load()`	Unsafe if `allow_pickle=True` (loads .npy/.npz files using pickle)
`pandas.read_pickle()`	Loads DataFrames using pickle under the hood
`keras.models.load_model()`	Can deserialize pickled objects inside HDF5 model files

⚠️ If these functions are used with attacker-controlled input — directly or indirectly — they are flagged with appropriate risk levels (HIGH, MEDIUM, LOW) based on flow analysis.

Additional sinks can be added by editing sources_and_sinks.py.
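
To make the risk behind the sinks listed above concrete, here is a minimal sketch of why unpickling untrusted bytes amounts to running attacker-chosen code. The Payload class is hypothetical and is not part of pickle_inspector; the expression it evaluates is deliberately harmless, standing in for whatever a real attacker would run.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A harmless arithmetic expression stands in for real attack code.
        return (eval, ("6 * 7",))


# What an attacker would upload or write to disk.
malicious_bytes = pickle.dumps(Payload())

# The victim only "loads data", yet eval() runs during deserialization.
result = pickle.loads(malicious_bytes)
print(result)  # the value returned by eval, not a Payload instance
```

The same mechanism applies to every sink that relies on pickle internally, which is why a single untrusted file path or upload is enough to reach code execution.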

Features

Detection: Flags insecure use of deserialization sinks (e.g., pickle, marshal, yaml, shelve)
Flow Tracking: Tracks both file-based and stream-based flows with full path visibility
Context Awareness: Identifies web application contexts (Flask/Django routes), file operations, and task execution
Selective Scanning: Exclude test files, virtual environments, and other patterns with --exclude
Reports: Generate HTML reports with --html flag
Multiple Formats: Console output, JSON export, and HTML reports
Legacy Support: Python 2 syntax conversion (optional)

Design Overview

AST-based analysis (no runtime execution)
All source code is parsed from temporary copies — your files are never modified
Python 2 files are converted in-memory if --py2-support is enabled
Smart context detection for web applications, file operations, and background tasks
Taint tracking models common user-input flows, including Flask/Django
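
As a rough illustration of the AST-based approach described above, the sketch below finds calls to a few known sinks without executing the scanned code. It is not the tool's actual analyzer.py: the SINKS set, dotted_name, and find_sinks are hypothetical names, and import aliases (for example `from pickle import load`) are deliberately not resolved here.

```python
import ast

# Illustrative subset of sinks; the real list lives in sources_and_sinks.py.
SINKS = {"pickle.load", "pickle.loads", "yaml.load", "marshal.loads"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_sinks(source):
    """Parse source text and return sorted (line, sink_name) pairs."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SINKS:
                hits.append((node.lineno, name))
    return sorted(hits)


SAMPLE = '''
import pickle

def load_config(path):
    with open(path, "rb") as fd:
        return pickle.load(fd)
'''

for line, sink in find_sinks(SAMPLE):
    print(f"line {line}: {sink}")
```

Because the source is only parsed, never imported, scanning a malicious project cannot trigger its payloads. A full analysis would additionally follow the sink's argument back to its origin, which is what the taint tracking step is for.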

Directory Structure

pickle_inspector/
├── analyzer.py           # Core sink detection & taint tracking with context awareness
├── ast_parser.py         # AST parsing with error handling
├── cli.py                # CLI interface with exclude and HTML export
├── indexer.py            # Project indexing, function/import mapping
├── resolver.py           # Call resolution
├── utils.py              # AST utilities
├── report.py             # Console, JSON & HTML reporting
├── sources_and_sinks.py  # Configurable list of dangerous functions
└── reports/              # Generated HTML and JSON reports (created automatically)
    ├── project1_20250811_134848.html
    ├── project1_20250811_134848.json
    └── project2_20250811_135230.html

Installation

Option 1: Install from GitHub (Recommended)

pip install git+https://github.com/anotherik/pickle_inspector.git

Option 2: Manual Setup (Development)

git clone https://github.com/anotherik/pickle_inspector.git
cd pickle_inspector
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Requirements

tqdm>=4.64.0           # Progress bar during scans
rich>=13.3.0           # Table output for findings and HTML generation
autopep8>=2.0.0        # Optional formatting and linting (used internally for py2 cleanup)

System dependencies (not included in requirements.txt)

Python 3.7+
2to3 (only needed for --py2-support): the standard tool for converting Python 2 code to Python 3; install it via your OS package manager

Usage

After installation, you can use the pickle-inspector command directly:

Basic Scanning

# Scan a directory
pickle-inspector ./my_project/

# Scan a single file
pickle-inspector ./vulnerable_app.py

# Continue scanning even when encountering parsing errors
pickle-inspector --skip-errors ./my_project/

Advanced Features

# Exclude test files and virtual environments
pickle-inspector --exclude test --exclude venv --exclude __pycache__ ./project/

# Generate HTML report
pickle-inspector --html ./project/

# Generate JSON report
pickle-inspector --json ./project/

# Generate both HTML and JSON reports
pickle-inspector --html --json ./project/

# Combine multiple features
pickle-inspector --exclude test --exclude venv --html --json --verbose --skip-errors ./project/

# Python 2 support for legacy code
pickle-inspector --py2-support ./legacy_project/

# Verbose output with full trace details
pickle-inspector --verbose ./project/

# Control warning and error output (suppress SyntaxWarnings and parsing errors)
pickle-inspector --scan-verbosity quiet ./project/

Development Usage

python3 cli.py --skip-errors ./my_project/
python3 cli.py --exclude test --html --skip-errors ./project/

Command Line Options

Flag	Description
`--exclude`	Pattern to exclude from scanning (can be used multiple times)
`--html`	Generate an HTML report in the `reports/` folder
`--json`	Generate a structured JSON report in the `reports/` folder
`--skip-errors`	Continue scanning when encountering syntax/indentation errors (default: stop on first error)
`--py2-support`	Attempt to convert Python 2 files via `2to3` before analysis
`--verbose`	Print detailed trace information per finding
`--scan-verbosity`	Control warning and error output: `quiet` (suppress all), `normal` (default), `verbose` (show all)

Example Output

Console Output (Rich Tables)

                                            Insecure Deserialization Findings

  Risk       File                  Line   Context                Source              Flow               Sink
 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  HIGH       /home/user/project/        7   File Op: load_config   pickle file:        File Operation     pickle.load
             app.py                                                 '/config/session.   (load_config) →   
                                                                     pkl'                fd (assigned at   
                                                                                         line 6) →         
                                                                                         open('/config/s  
                                                                                         ession.pkl' (pi  
                                                                                         ckle file))      
  MEDIUM     /home/user/project/       14   POST /upload           request.form        HTTP POST /uploa  yaml.load
             app.py                                                 ['yaml_data']       d → request.form  
                                                                     (HTTP POST form    ['yaml_data'] →   
                                                                     data)              yaml.load(..., L  
                                                                                         oader=yaml.Loa  
                                                                                         der)

Verbose Output

[!] Insecure deserialization detected
  Risk    : HIGH
  File    : /home/user/project/app.py:7
  Context : File Operation (load_config)
  Source  : pickle file: '/config/session.pkl'
  Flow    : File Operation (load_config) → fd (assigned at line 6) → open('/config/session.pkl' (pickle file))
  Sink    : pickle.load

JSON Report

The --json flag writes a structured JSON report to the reports/ folder. An abridged example, showing only the first of two findings:

{
  "scan_info": {
    "total_findings": 2,
    "risk_summary": {
      "HIGH": 1,
      "MEDIUM": 1
    },
    "generated_at": "2025-08-11T16:15:17.055411"
  },
  "findings": [
    {
      "file": "/path/to/app.py",
      "line": 7,
      "sink": "pickle.load",
      "initial_source": "pickle file: '/config/session.pkl'",
      "flow": "File Operation (load_config) → fd → open('/config/session.pkl')",
      "risk": "HIGH",
      "context": {
        "type": "file_operation",
        "function_name": "load_config"
      }
    }
  ]
}
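
A report with this structure is easy to post-process, for example to fail a CI job when serious findings are present. The sketch below assumes only the fields shown in the example above; the findings_at_or_above helper and the inline REPORT_JSON string are illustrative, and in practice the JSON would be read from a file in reports/.

```python
import json

# Inline copy of the example report; normally loaded from a file in reports/.
REPORT_JSON = """
{
  "scan_info": {
    "total_findings": 2,
    "risk_summary": {"HIGH": 1, "MEDIUM": 1},
    "generated_at": "2025-08-11T16:15:17.055411"
  },
  "findings": [
    {
      "file": "/path/to/app.py",
      "line": 7,
      "sink": "pickle.load",
      "initial_source": "pickle file: '/config/session.pkl'",
      "flow": "File Operation (load_config) → fd → open('/config/session.pkl')",
      "risk": "HIGH",
      "context": {"type": "file_operation", "function_name": "load_config"}
    }
  ]
}
"""

RISK_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def findings_at_or_above(report, minimum="HIGH"):
    """Return the findings whose risk is at least `minimum`."""
    threshold = RISK_ORDER[minimum]
    return [
        finding
        for finding in report["findings"]
        if RISK_ORDER.get(finding["risk"], 0) >= threshold
    ]


report = json.loads(REPORT_JSON)
serious = findings_at_or_above(report, "MEDIUM")
for finding in serious:
    print(f'{finding["risk"]:<6} {finding["file"]}:{finding["line"]}  {finding["sink"]}')
```

A CI wrapper could exit with a non-zero status whenever the returned list is not empty.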

Context Detection

The tool automatically detects and provides context for different types of applications:

Web Applications

Flask/Django Routes: Detects @app.route() decorators and HTTP methods
Form Data: Identifies request.form['field'] and request.files['file'] patterns
API Endpoints: Shows HTTP method and endpoint path

File Operations

File Functions: Detects functions with names like load, save, read, write
Documentation: Analyzes docstrings for file-related keywords
Operations: Shows "File Operation: function_name" context

Background Tasks

Task Functions: Identifies functions with names like task, job, worker, execute
Job Systems: Common in Luigi, Celery, and other task frameworks
Context: Shows "Task Execution: function_name" context
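
The heuristics above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the tool's implementation: the keyword lists come from the descriptions above, while the substring matching, the precedence of routes over name-based checks, and the function names route_context, describe, and detect_contexts are assumptions. Docstring analysis is omitted.

```python
import ast

FILE_KEYWORDS = ("load", "save", "read", "write")
TASK_KEYWORDS = ("task", "job", "worker", "execute")


def route_context(func):
    """Return 'METHOD /path' for a Flask-style @x.route(...) decorator, else None."""
    for dec in func.decorator_list:
        if (isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and dec.func.attr == "route"
                and dec.args
                and isinstance(dec.args[0], ast.Constant)):  # Python 3.8+ AST
            methods = ["GET"]  # Flask's default when methods= is omitted
            for kw in dec.keywords:
                if kw.arg == "methods" and isinstance(kw.value, (ast.List, ast.Tuple)):
                    methods = [e.value for e in kw.value.elts
                               if isinstance(e, ast.Constant)]
            return f"{'/'.join(methods)} {dec.args[0].value}"
    return None


def describe(func):
    """Pick a context label: routes first, then name-based heuristics."""
    route = route_context(func)
    if route:
        return route
    lowered = func.name.lower()
    if any(word in lowered for word in FILE_KEYWORDS):
        return f"File Operation: {func.name}"
    if any(word in lowered for word in TASK_KEYWORDS):
        return f"Task Execution: {func.name}"
    return None


def detect_contexts(source):
    """Map every function name in the source to its context label."""
    return {
        node.name: describe(node)
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SAMPLE = '''
@app.route("/upload", methods=["POST"])
def upload():
    pass

def load_config(path):
    pass

def run_worker(queue):
    pass

def helper():
    pass
'''

contexts = detect_contexts(SAMPLE)
print(contexts)
```

Checking routes before names matters in this sketch: a handler called upload contains the substring "load" and would otherwise be labelled as a file operation.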

Risk Assessment

Findings are categorized by risk level based on flow analysis:

Risk Level	Description	Example
HIGH	Direct user input to sink	`pickle.load(request.files['file'])`
MEDIUM	Indirect user influence	`pickle.load(open(user_provided_path))`
LOW	Limited or no user control	`pickle.load(open('/etc/config.pkl'))`
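
To show how the three rows differ structurally, the following sketch classifies a single sink call by inspecting only its first argument. It is a deliberately simplified, hypothetical stand-in for the tool's flow analysis: the classify function and its rules are assumptions, and real findings depend on flows across variables, functions, and files.

```python
import ast


def classify(call_source):
    """Classify one sink call expression using the table's three cases."""
    call = ast.parse(call_source, mode="eval").body
    if not isinstance(call, ast.Call) or not call.args:
        raise ValueError("expected a call with at least one argument")
    arg = call.args[0]

    # HIGH: the argument refers to a web request object directly.
    names = {n.id for n in ast.walk(arg) if isinstance(n, ast.Name)}
    if "request" in names:
        return "HIGH"

    # LOW: the argument is a call whose own arguments are all literals,
    # e.g. open('/etc/config.pkl'), so the caller cannot steer it.
    if isinstance(arg, ast.Call) and all(
        isinstance(a, ast.Constant) for a in arg.args
    ):
        return "LOW"

    # MEDIUM: anything else, such as a path held in a variable.
    return "MEDIUM"


EXAMPLES = [
    "pickle.load(request.files['file'])",
    "pickle.load(open(user_provided_path))",
    "pickle.load(open('/etc/config.pkl'))",
]

for example in EXAMPLES:
    print(f"{classify(example):<6} {example}")
```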

🏆 Vulnerabilities Found with Pickle Inspector

The following security issues were discovered with the help of pickle_inspector:

Date	Repository / Project	Description	CVE / Reference
22-10-2025	secdev/scapy	Scapy Session Loading Vulnerable to Arbitrary Code Execution via Untrusted Pickle Deserialization	GHSA-cq46-m9x9-j8w2

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
