IHC-LLMiner

Description

IHC-LLMiner is a Python module for automatically extracting immunohistochemistry (IHC) marker-tumour profiles from PubMed abstracts. It leverages LLMs and BERT-based models for:

Downloading abstracts for specific IHC markers
Classifying abstract relevance
Extracting structured IHC marker data
Normalising entity mentions using UMLS

Installation

Python 3.10 is used in the setup below:

git clone https://github.com/knowlab/IHC-LLMiner.git
cd IHC-LLMiner
conda create -n ihcllminer python=3.10
conda activate ihcllminer
pip install .

Download Abstracts

python download.py --markers BOB1 TTF1 --max_per_marker 9999 --output_file pmid_list_w_abstract.tsv
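Before moving on to classification, it can help to confirm that the download step actually produced rows. The sketch below assumes only that the output is a tab-separated file whose first row is a header. The column names are not documented here, so the code reports whatever it finds rather than expecting particular fields. `summarise_tsv` is a hypothetical helper and is not part of the repository.

```python
import csv
from pathlib import Path


def summarise_tsv(path):
    """Return (header, row_count) for a tab-separated file.

    Assumption: the first row is a header. If download.py writes no
    header, the first record will be reported as the header instead.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])          # [] for an empty file
        row_count = sum(1 for _ in reader)  # remaining rows are data
    return header, row_count


if __name__ == "__main__":
    target = Path("pmid_list_w_abstract.tsv")
    if target.exists():
        header, rows = summarise_tsv(target)
        print(f"columns: {header}")
        print(f"data rows: {rows}")
    else:
        print(f"{target} not found; run download.py first")
```

The same helper can be pointed at any of the other .tsv files produced later in the pipeline.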

Classify Abstracts

python classify.py \
  --input_file pmid_list_w_abstract.tsv \
  --output_file predictions.json

Extract IHC Profiles

python extract.py \
  --input_file predictions.json \
  --output_file extraction_result.tsv

Preparation of the UMLS file

Normalisation requires a local copy of the UMLS Metathesaurus, which can only be downloaded after logging in with your own UMLS credentials. Download the Full Subset from the NLM UMLS download page, then run generate_UMLS_data.ipynb.

Normalise the Extracted Results

python normalize.py \
  --input_file extraction_result.tsv \
  --output_file inference_umls_mapped_data.tsv
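The four commands above form a chain in which each stage reads the file written by the previous one. As a minimal sketch, they could be driven from a single Python script. It uses only the script names, flags, and file names shown in this README, and it assumes that it is run from the repository root and that the UMLS file has already been prepared. `build_pipeline` and `run_pipeline` are hypothetical helper names, not part of the repository.

```python
import subprocess
import sys


def build_pipeline(markers, max_per_marker=9999):
    """Return the four pipeline commands as argument lists, in run order."""
    py = sys.executable  # use the interpreter of the active environment
    return [
        [py, "download.py", "--markers", *markers,
         "--max_per_marker", str(max_per_marker),
         "--output_file", "pmid_list_w_abstract.tsv"],
        [py, "classify.py",
         "--input_file", "pmid_list_w_abstract.tsv",
         "--output_file", "predictions.json"],
        [py, "extract.py",
         "--input_file", "predictions.json",
         "--output_file", "extraction_result.tsv"],
        [py, "normalize.py",
         "--input_file", "extraction_result.tsv",
         "--output_file", "inference_umls_mapped_data.tsv"],
    ]


def run_pipeline(markers, max_per_marker=9999):
    """Run each stage in turn, stopping at the first failure."""
    for command in build_pipeline(markers, max_per_marker):
        print("running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never run on missing or partial input.
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: print the commands without executing them.
    for command in build_pipeline(["BOB1", "TTF1"]):
        print(" ".join(command))
```

Calling `run_pipeline(["BOB1", "TTF1"])` executes the stages for real. Keeping command construction separate from execution makes it easy to inspect the commands first.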

Example for downstream analysis of the normalised results

See data_analysis.ipynb for an example.

Hardware

The code was tested on an A5000 GPU with 24 GB of memory.
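To compare a machine against the tested configuration, the visible GPUs and their memory can be listed before running the pipeline. The sketch below is one possible check, assuming an NVIDIA driver that provides the `nvidia-smi` command. It returns an empty list when the tool is absent. `list_gpus` is a hypothetical helper, and this README does not state whether cards with less than 24 GB are sufficient.

```python
import shutil
import subprocess


def list_gpus():
    """Return [(name, total_memory_mib), ...], or [] if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.strip().splitlines():
        # Split on the last comma so a comma in the GPU name is preserved.
        name, _, memory = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(memory.strip())))
        except ValueError:
            continue  # skip lines whose memory field is not a number
    return gpus


if __name__ == "__main__":
    found = list_gpus()
    if not found:
        print("no NVIDIA GPU detected")
    for name, memory_mib in found:
        print(f"{name}: {memory_mib} MiB")
```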

Reference

@misc{kim2025ihcllminer,
      title={IHC-LLMiner: Automated extraction of tumour immunohistochemical profiles from PubMed abstracts using large language models}, 
      author={Yunsoo Kim and Michal W. S. Ong and Daniel W. Rogalsky and Manuel Rodriguez-Justo and Honghan Wu and Adam P. Levine},
      year={2025},
      eprint={2504.00748},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.00748}, 
}
