This repository contains code to build a text-attributed graph from patent data. The graph represents patents as nodes and connects them based on semantic similarity of their textual content.
- Load and preprocess patent data from CSV files
- Generate semantic embeddings using SentenceTransformers
- Build a similarity graph based on customizable thresholds
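The thresholding step in the list above can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes cosine similarity as the measure, and the function name `build_similarity_edges`, the default threshold of 0.8, and the toy embeddings are all hypothetical.

```python
import numpy as np

def build_similarity_edges(embeddings, threshold=0.8):
    """Return (i, j, similarity) for each pair i < j whose cosine similarity >= threshold."""
    emb = np.asarray(embeddings, dtype=np.float64)
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Look only at the upper triangle to skip self-loops and duplicate pairs.
    rows, cols = np.triu_indices(len(emb), k=1)
    keep = sim[rows, cols] >= threshold
    return [(int(i), int(j), float(sim[i, j])) for i, j in zip(rows[keep], cols[keep])]

# Toy 2-D "embeddings": the first two vectors point in nearly the same direction.
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])
edges = build_similarity_edges(toy_embeddings, threshold=0.8)
print(edges)  # only the pair (0, 1) passes the threshold
```

Raising the threshold yields a sparser graph; lowering it connects more loosely related patents. The dense similarity matrix is quadratic in the number of patents, so a large corpus would likely need a batched or approximate nearest-neighbor approach instead.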
- is located in the CS594PatentProject/output folder
- Go to GNN/embedding.py
- Set the model you want to use (e.g., all-MiniLM-L6-v2)
- Embeddings will be saved to: output/embeddings/MiniLML6.npy
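As a rough sketch of what the embedding step amounts to, the snippet below encodes a list of texts and writes the result to a .npy file. The helper names `save_embeddings` and `sentence_transformer_encoder` are hypothetical and are not taken from GNN/embedding.py; the encoder is passed in as a callable so the saving logic can be tried without downloading a model.

```python
import os
import numpy as np

def save_embeddings(texts, encode, out_path):
    """Encode texts with the given callable and save the result as a .npy file."""
    embeddings = np.asarray(encode(texts), dtype=np.float32)
    out_dir = os.path.dirname(out_path)
    if out_dir:
        # Create output/embeddings/ (or any other target folder) if it is missing.
        os.makedirs(out_dir, exist_ok=True)
    np.save(out_path, embeddings)
    return embeddings

def sentence_transformer_encoder(model_name="all-MiniLM-L6-v2"):
    """Return the encode method of a SentenceTransformers model (imported lazily)."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name).encode
```

A call might then look like `save_embeddings(abstracts, sentence_transformer_encoder(), os.path.join("output", "embeddings", "MiniLML6.npy"))`, where `abstracts` stands in for whichever text column is being embedded.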
- Go to GNN/data_loader.py
- Set the appropriate embedding file:
embedding_path = os.path.join(root, "output", "embeddings", "MiniLML6.npy")
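Because the loader depends on a file produced by the previous step, it may help to fail early with a clear message when the file is missing or has the wrong number of rows. The helper below is a hypothetical sketch (the name `load_embeddings` and the `expected_rows` check are assumptions, not part of GNN/data_loader.py); it builds the path the same way as the line above.

```python
import os
import numpy as np

def load_embeddings(root, filename="MiniLML6.npy", expected_rows=None):
    """Load saved embeddings from <root>/output/embeddings/<filename>."""
    embedding_path = os.path.join(root, "output", "embeddings", filename)
    if not os.path.isfile(embedding_path):
        raise FileNotFoundError(
            f"No embedding file at {embedding_path}; run the embedding step first."
        )
    embeddings = np.load(embedding_path)
    # Optionally check that there is one embedding row per patent node.
    if expected_rows is not None and embeddings.shape[0] != expected_rows:
        raise ValueError(
            f"Expected {expected_rows} rows but found {embeddings.shape[0]}."
        )
    return embeddings
```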
To train a GNN on the patent graph
- Open GNN/train.py
- Choose the GNN model by modifying the line:
model = GCN(patent_data.num_node_features, 64, len(torch.unique(patent_data.y)))
Replace GCN with the specific GNN model you want to use, such as GCN_deep, GraphSAGE, GAT, GIN, etc.
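Instead of editing the constructor call by hand each time, the model could be selected by name. The sketch below is one possible way to do that; `build_model` and the registry dictionary are hypothetical, and it assumes every model class accepts the same three constructor arguments as the GCN line above (input features, hidden size, number of classes).

```python
def build_model(name, registry, num_features, hidden_dim, num_classes):
    """Instantiate the model class registered under `name`."""
    try:
        model_cls = registry[name]
    except KeyError:
        raise ValueError(
            f"Unknown model {name!r}; choose from {sorted(registry)}"
        ) from None
    return model_cls(num_features, hidden_dim, num_classes)

# In GNN/train.py the registry would map names to the repository's classes, e.g.:
# registry = {"GCN": GCN, "GCN_deep": GCN_deep, "GraphSAGE": GraphSAGE, "GAT": GAT, "GIN": GIN}
# model = build_model("GAT", registry, patent_data.num_node_features, 64,
#                     len(torch.unique(patent_data.y)))
```

With this pattern the model name could come from a command-line argument or a config file, which makes it easier to compare architectures across runs.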
- Open TAPE_modified and run explanation.py for explanation generation
- Put your Hugging Face access token in the line:
hf_token = "hf_token"
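To avoid committing a real token to version control, the value could be read from an environment variable rather than written into the file. This is a suggested alternative, not what explanation.py currently does; the helper name `get_hf_token` and the variable name `HF_TOKEN` are assumptions.

```python
import os

def get_hf_token(env_var="HF_TOKEN"):
    """Read the Hugging Face access token from an environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token."
        )
    return token
```

The assignment in the script would then become `hf_token = get_hf_token()`, with the token exported in the shell before running.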
Homaira Huda Shomee (hshome2@uic.edu), Ataher Sams (asams3@uic.edu)