A Python utility for processing and structuring Cooperative Patent Classification (CPC) data. This tool transforms raw CPC classification XML files into accessible Python data structures and JSON output, making it easy to work with patent classification hierarchies programmatically.
The Cooperative Patent Classification (CPC) system is a hierarchical classification scheme used by patent offices worldwide. cpc-tree simplifies working with this data by:
- Converting distributed XML classification files into structured Python objects
- Providing both dictionary-based and object-oriented interfaces for navigation
- Exporting classification hierarchies to JSON for use in other applications
- Enabling programmatic access to CPC data for patent analysis workflows
- Python ≥ 3.11
- uv package manager
- argparse
- pre-commit
- pyright
- pytest
- pytest-cov
- ruff
First clone this repository:
git clone https://github.com/ashkonf/cpc-tree.git
cd cpc-tree
Then install dependencies:
uv sync
Download and decompress CPCSchemeXML202508.zip in the repo directory.
uv run python -m src.cpc_tree /path/to/xml/directory
This will generate a cpc_tree.json file containing the complete CPC hierarchy.
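Once cpc_tree.json exists, it can be read back with the standard library alone. The following is a minimal sketch, assuming the file mirrors the dictionary shape shown in this README (each code mapping to an object with title and children keys). The helper name load_tree_json is illustrative and is not part of cpc-tree.

```python
import json
from pathlib import Path


def load_tree_json(path):
    """Read a CPC hierarchy from a JSON file (hypothetical helper)."""
    # Assumption: the file holds a single JSON object keyed by section code.
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)
```

Calling load_tree_json("cpc_tree.json") from the repo directory should then return a plain nested dictionary that can be indexed by code.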
from src.cpc_tree import build_cpc_tree
# Parse XML files and build dictionary tree
cpc_tree_data = build_cpc_tree("CPCSchemeXML202508")
# Access classification data
print(cpc_tree_data["A"]["title"]) # "HUMAN NECESSITIES"
print(cpc_tree_data["A"]["children"]["A01"]["title"]) # "AGRICULTURE"
This will also generate the same cpc_tree.json file.
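Chained indexing works when the full path to a code is known. To look up a code at an unknown depth, a recursive search over the nested dictionary is one option. The sketch below assumes every node stores its descendants under a children key, as in the example above. find_node is a hypothetical helper, and the sample dictionary is hand-written rather than real cpc-tree output.

```python
def find_node(tree, code):
    """Depth-first search for `code` in a nested CPC dictionary."""
    for key, node in tree.items():
        if key == code:
            return node
        # Assumption: descendants live under "children"; treat missing as empty.
        found = find_node(node.get("children") or {}, code)
        if found is not None:
            return found
    return None


# Tiny hand-written sample in the documented shape, not real CPC output.
sample = {
    "A": {
        "title": "HUMAN NECESSITIES",
        "children": {"A01": {"title": "AGRICULTURE", "children": {}}},
    }
}

print(find_node(sample, "A01")["title"])  # AGRICULTURE
```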
from cpc_tree import build_cpc_tree, load_cpc_tree
# Build and convert to CPCTreeNode objects
cpc_tree_data = build_cpc_tree("CPCSchemeXML202508")
cpc_tree = load_cpc_tree(cpc_tree_data)
# Navigate using object interface
root_node = cpc_tree["A"]
print(f"Code: {root_node.code}") # "A"
print(f"Title: {root_node.title}") # "HUMAN NECESSITIES"
# Access children
agriculture_node = root_node.children["A01"]
print(f"Agriculture: {agriculture_node.title}") # "AGRICULTURE"
For an interactive overview of the steps above in code, see the sample_usage.ipynb notebook.
The CPC tree follows a hierarchical structure:
Section (e.g., "A" - Human Necessities)
├── Class (e.g., "A01" - Agriculture)
│ ├── Subclass (e.g., "A01B" - Soil Working)
│ │ ├── Group (e.g., "A01B1/00" - Hand Tools)
│ │ │ └── Subgroup (e.g., "A01B1/02" - Spades, Shovels)
Each node contains:
- code: Classification symbol (e.g., "A01B1/02")
- title: Human-readable description
- children: Dictionary of child nodes
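The level of a node can usually be inferred from the shape of its code alone. The sketch below is an illustration, not part of cpc-tree: it assumes symbols are written without spaces, as in the examples above, and that main groups end in /00. The name cpc_level is hypothetical, and real scheme files may contain entries that fit none of these patterns.

```python
import re

# Patterns are tried in order, so "/00" main groups are matched before subgroups.
_LEVELS = [
    ("section", re.compile(r"[A-HY]")),
    ("class", re.compile(r"[A-HY]\d{2}")),
    ("subclass", re.compile(r"[A-HY]\d{2}[A-Z]")),
    ("group", re.compile(r"[A-HY]\d{2}[A-Z]\d{1,4}/00")),
    ("subgroup", re.compile(r"[A-HY]\d{2}[A-Z]\d{1,4}/\d{2,6}")),
]


def cpc_level(code):
    """Return the hierarchy level name for a CPC symbol, or None if unrecognized."""
    for name, pattern in _LEVELS:
        if pattern.fullmatch(code):
            return name
    return None


print(cpc_level("A01B1/02"))  # subgroup
```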
build_cpc_tree(directory) builds the complete CPC tree from XML files in the specified directory.
Parameters:
directory: Path to the directory containing the XML files extracted from CPCSchemeXML202508.zip.
Returns: Dictionary representation of the CPC hierarchy
load_cpc_tree(data) converts the dictionary representation to CPCTreeNode objects for easier navigation.
Parameters:
data: Dictionary tree from build_cpc_tree() or loaded JSON
Returns: Dictionary mapping codes to CPCTreeNode objects
CPCTreeNode represents a single node in the CPC classification tree.
Attributes:
- code: str - Classification symbol
- title: Optional[str] - Human-readable title
- children: Dict[str, CPCTreeNode] - Child nodes
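To make the relationship between the dictionary form and the object form concrete, here is a rough sketch of how such a conversion could be written. It is not the actual implementation in cpc-tree: SketchNode and to_nodes are hypothetical stand-ins that only mirror the three documented attributes, and the sample data is hand-written.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SketchNode:
    """Illustrative stand-in with the documented attributes."""

    code: str
    title: Optional[str] = None
    children: Dict[str, "SketchNode"] = field(default_factory=dict)


def to_nodes(data):
    """Recursively convert a nested dictionary into SketchNode objects."""
    return {
        code: SketchNode(
            code=code,
            title=entry.get("title"),
            # Assumption: a missing or empty "children" entry means a leaf.
            children=to_nodes(entry.get("children") or {}),
        )
        for code, entry in data.items()
    }


# Hand-written sample in the documented shape, not real CPC output.
sample = {
    "A": {
        "title": "HUMAN NECESSITIES",
        "children": {"A01": {"title": "AGRICULTURE", "children": {}}},
    }
}

nodes = to_nodes(sample)
print(nodes["A"].children["A01"].title)  # AGRICULTURE
```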
# Clone and install with development dependencies
git clone https://github.com/ashkonf/cpc-tree.git
cd cpc-tree
uv sync
# Install pre-commit hooks
uv run pre-commit install
First download and decompress CPCSchemeXML202508.zip in the repo directory.
Then run:
uv run python -m pytest
This project uses several tools for code quality:
# Linting and formatting
uv run ruff check --fix
uv run ruff format
# Type checking
uv run pyright
# Run all checks (via pre-commit)
uv run pre-commit run --all-files
cpc-tree/
├── cpc_tree/ # Package containing logic and CLI
│ ├── __init__.py # Core processing logic
│ └── __main__.py # Command line entry point
├── tests/
│ └── test_cpc_tree.py # Test suite
├── cpc_tree.json # Generated CPC hierarchy (large file)
├── pyproject.toml # Project configuration
├── uv.lock # Locked dependencies
├── .pre-commit-config.yaml # Code quality hooks
└── README.md # This file
- Fork the repository
- Create a feature branch: git checkout -b username/amazing-feature
- Make your changes
- Run tests and quality checks: uv run pre-commit run --all-files
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin username/amazing-feature
- Open a Pull Request against main
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.