Graph-KIR

Graph-KIR is a tool for KIR (Killer Immunoglobulin-like Receptor) typing from short-read FASTQ files.

This repo contains two main programs:

  1. graphkir - Main Typing Tool

    graphkir reads FASTQ files, either listed in a CSV file or passed directly as command-line arguments. By default, it writes copy number estimates to a TSV file called cohort.cn.tsv and allele typing results to cohort.allele.tsv. More details about its algorithm and concept can be found in the paper.

  2. kirpipe - KIR Typing Pipeline

    kirpipe is an aggregation tool that automates the KIR typing pipeline. It includes five published tools: graphkir, PING, Sakaue's KIR, T1K, and KIR*KPI.

    (Note: Currently, kirpipe requires podman or docker to execute)

Version

  1. version 1.0
  2. version 2.0 (latest)
    • github tag: v2.0

Docker Version

We have prepared a Docker version of Graph-KIR for easy setup and reproducibility. You can build and use the Docker image as follows:

docker build -t linnil1/graphkir .
docker run -it --rm -v "$PWD":/data linnil1/graphkir graphkir --help

This will run Graph-KIR inside a container, mounting your current directory to /data in the container. Adjust the command and volume as needed for your workflow.

Requirements (Local Installation)

To run Graph-KIR locally, which is the default (--engine local), you need:

  • Python >= 3.10
  • MUSCLE >= 5.1 (required only for index building stage)
  • HISAT2 >= 2.2.1
  • samtools >= 1.15.1
  • BWA-MEM >= 0.7.17 (needed only for the WGS extraction stage)
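
Before running the local engine, it can help to confirm that the external tools in the list above are on your PATH. The following is a minimal sketch, not part of Graph-KIR: the executable names (muscle, hisat2, samtools, bwa) are assumptions based on how these tools are commonly packaged, and the check does not verify version numbers.

```python
import shutil

# Executable names are assumptions based on common packaging; adjust if yours differ.
REQUIRED_TOOLS = ["muscle", "hisat2", "samtools", "bwa"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.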

Example: Create a Conda Environment for Local Engine

You can use conda to set up the required environment and install the necessary tools:

conda create -n graphkir_env python=3.14
conda activate graphkir_env
conda install -c bioconda muscle=5.3 hisat2=2.2.1 samtools=1.22.1 bwa=0.7.19

Then install Graph-KIR:

pip install .

Using Container Tools

You can also use Graph-KIR with containerization tools for easier setup and reproducibility. Supported engines:

  • podman
  • docker
  • singularity

Specify the engine with the --engine argument, e.g. --engine podman.

Note: When you pass a container engine (podman, docker, or singularity) to --engine, you still need to install Graph-KIR with pip install . on your local machine. The container is used only to run the external tools; the main program runs locally.

Usage (Main)

Download the pre-built Graph-KIR index:

wget https://graphkir.c4lab.tw/download/example_index.tar.gz
tar xvf example_index.tar.gz
# If kirpipe is used, rename it
# ln -s example_index graphkir_alpha

Install Graph-KIR:

git clone https://github.com/linnil1/KIR_graph
cd KIR_graph
pip install .
graphkir --help

Run Graph-KIR (if the index does not exist, it is built automatically):

graphkir \
    --thread 2 \
    --r1 example/test00.read1.fq.gz \
    --r2 example/test00.read2.fq.gz \
    --r1 example/test01.read1.fq.gz \
    --r2 example/test01.read2.fq.gz \
    --index-folder example_index \
    --output-folder example_data \
    --output-cohort-name example_data/cohort
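
When there are many samples, the repeated --r1/--r2 pairs in the command above can be assembled programmatically. This is a hedged sketch: build_graphkir_command is a hypothetical helper, not part of Graph-KIR, and it uses only the flags shown in the command above.

```python
import shlex


def build_graphkir_command(pairs, index_folder, output_folder, cohort_name, threads=2):
    """Build the graphkir argument list for paired-end samples.

    `pairs` is a list of (read1_path, read2_path) tuples.
    """
    command = ["graphkir", "--thread", str(threads)]
    for read1, read2 in pairs:
        # One --r1/--r2 pair per sample, in the same order as the example.
        command += ["--r1", read1, "--r2", read2]
    command += [
        "--index-folder", index_folder,
        "--output-folder", output_folder,
        "--output-cohort-name", cohort_name,
    ]
    return command


pairs = [
    ("example/test00.read1.fq.gz", "example/test00.read2.fq.gz"),
    ("example/test01.read1.fq.gz", "example/test01.read2.fq.gz"),
]
command = build_graphkir_command(pairs, "example_index", "example_data", "example_data/cohort")
print(shlex.join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.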

Or, if you have an input CSV file (e.g., cohort.csv) containing the list of samples:

graphkir \
    --thread 2 \
    --input-csv example/cohort.csv \
    --index-folder example_index \
    --allele-strategy exonfirst \
    --output-cohort-name example_data/cohort \
    --log-level DEBUG

The CSV should have four columns:

  • name: The output prefix of the sample.
  • r1 and r2: Paths to the read 1 and read 2 FASTQ files.
  • cnfile: Optional path to a copy number file assigned to the sample. Leave it empty to let Graph-KIR assign the copy number automatically.

name,r1,r2,cnfile
example_data/linnil1.00,example/test00.read1.fq.gz,example/test00.read2.fq.gz,example/test00.assigned.cn.tsv
example_data/linnil1.01,example/test01.read1.fq.gz,example/test01.read2.fq.gz,
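
A CSV in this layout can be generated with Python's standard csv module. The sketch below reproduces the example rows above; write_cohort_csv is a hypothetical helper, and the output file name cohort.csv is only an illustration.

```python
import csv

FIELDS = ["name", "r1", "r2", "cnfile"]


def write_cohort_csv(path, samples):
    """Write sample dicts to `path`; a missing cnfile becomes an empty field."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        for sample in samples:
            writer.writerow({field: sample.get(field, "") for field in FIELDS})


if __name__ == "__main__":
    samples = [
        {
            "name": "example_data/linnil1.00",
            "r1": "example/test00.read1.fq.gz",
            "r2": "example/test00.read2.fq.gz",
            "cnfile": "example/test00.assigned.cn.tsv",
        },
        {
            # No cnfile here, so the last field is left empty.
            "name": "example_data/linnil1.01",
            "r1": "example/test01.read1.fq.gz",
            "r2": "example/test01.read2.fq.gz",
        },
    ]
    write_cohort_csv("cohort.csv", samples)
```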

The results for all samples are aggregated into files whose prefix is the value of --output-cohort-name. In the example above, example_data/cohort.cn.tsv and example_data/cohort.allele.tsv are generated.
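
The two cohort files can be inspected with the standard csv module. This sketch assumes that each file is tab-separated and that its first line is a header row; the column names are not documented here, so the code reports whatever header it finds rather than relying on specific names.

```python
import csv


def read_tsv(path):
    """Return (header, rows) from a tab-separated file, with rows as dicts."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return list(reader.fieldnames or []), rows


# Example usage (paths taken from the run above):
# header, rows = read_tsv("example_data/cohort.allele.tsv")
# print(header)
# print(len(rows), "rows")
```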

Some useful arguments include:

  • --ref-genome: Reference genome for WGS extraction: hg19 (hs37d5) or hg38 (GRCh38_no_alt). (default: hg19)
  • --step-skip-extraction: Skip whole genome mapping and KIR read extraction. Use this if your input reads are already filtered for KIR regions.
  • --allele-strategy exonfirst: Denoted as 'exon_only' in the manuscript for 3-digit or 5-digit typing. This mode prioritizes exon-level information and is designed to enhance exon-level typing accuracy.
  • --cn-3dl3-not-diploid: Estimate CN without assuming KIR3DL3 CN is 2. By default, Graph-KIR assumes KIR3DL3 is diploid and adjusts CN estimation accordingly.
  • --cn-diploid-gene: Use a diploid gene (VDR/RYR1/EGFR) to normalize CN estimation. Leave empty for no normalization. Requires --cn-3dl3-not-diploid.
  • --cn-cohort: Estimate CN while considering the entire cohort. In cohort mode, diploid gene information is not considered.
  • --plot: Generate CN result plots.
  • --cn-dist-dev: Adjust CN distribution model deviation (e.g., 0.06).
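
Some of the copy number options above interact: --cn-diploid-gene requires --cn-3dl3-not-diploid, and diploid gene information is not considered in cohort mode. The following hypothetical pre-flight check encodes just those two statements. It is not part of graphkir, and how graphkir itself reacts to such combinations (error, warning, or silent ignore) is not stated here.

```python
def check_cn_options(cn_3dl3_not_diploid=False, cn_diploid_gene=None, cn_cohort=False):
    """Return a list of human-readable problems with a combination of CN options."""
    problems = []
    if cn_diploid_gene and not cn_3dl3_not_diploid:
        problems.append("--cn-diploid-gene requires --cn-3dl3-not-diploid")
    if cn_diploid_gene and cn_cohort:
        problems.append("--cn-cohort does not consider diploid gene information")
    return problems


# A diploid gene without --cn-3dl3-not-diploid is flagged.
for problem in check_cn_options(cn_diploid_gene="VDR"):
    print("warning:", problem)
```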

Usage (kirpipe pipeline for other KIR tools)

ln -s ../example/test00.read1.fq.gz example_data/test.00.read.1.fq.gz
ln -s ../example/test00.read2.fq.gz example_data/test.00.read.2.fq.gz
ln -s ../example/test01.read1.fq.gz example_data/test.01.read.1.fq.gz
ln -s ../example/test01.read2.fq.gz example_data/test.01.read.2.fq.gz
kirpipe example_data/test.{} --tools t1k
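
The ln -s commands above rename the reads to a prefix.ID.read.N.fq.gz pattern before calling kirpipe. The sketch below creates the same kind of relative symlinks for any number of samples. The naming pattern is inferred only from the example above, and kirpipe_read_name and link_reads are hypothetical helpers, not part of kirpipe.

```python
import os
from pathlib import Path


def kirpipe_read_name(prefix, sample_id, mate):
    """Return a file name following the pattern in the example, e.g. test.00.read.1.fq.gz."""
    return f"{prefix}.{sample_id}.read.{mate}.fq.gz"


def link_reads(source, output_dir, prefix, sample_id, mate):
    """Create a relative symlink in `output_dir` that points at `source`."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    link = output_dir / kirpipe_read_name(prefix, sample_id, mate)
    # Skip creation if something (including a broken link) is already there.
    if not (link.is_symlink() or link.exists()):
        link.symlink_to(os.path.relpath(source, start=output_dir))
    return link


# Print the names that the example commands create.
for sample_id in ("00", "01"):
    for mate in (1, 2):
        print(kirpipe_read_name("test", sample_id, mate))

# To create the links for sample 00, read 1:
# link_reads("example/test00.read1.fq.gz", "example_data", "test", "00", 1)
```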

Usage (for paper)

If you want to develop or rerun the code related to the Graph-KIR research, check out the research/ directory.

Most of these scripts are not automated and require manual configuration or linking to your cohort (e.g., HPRC). You may also need to adjust arguments to run Graph-KIR with different configurations.

Requirements:

  • pip install .[paper]
  • podman (other container tools are not tested)

To build and preview the documentation, use: mkdocs serve

  • research/kg_main.py My work for simulated data (100 samples)
  • research/kg_real.py My work for real data (HPRC)
  • research/other_kir.py Run other KIR tools for HPRC or 100 samples
  • research/kg_dev_* Scripts for development purposes (not used in the paper)
  • research/kg_eval_* Compare the results

Evaluation code and data for v2:

  • research/kg_eval_hprc_alldigit.py
  • research/kg_eval_hprc_remove_novel.py
  • research/groundtruth/hprc_annotation_skirt.tsv

Related tools

  • star_allele_comp: https://github.com/linnil1/star_alleles_comparator

    The star allele comparator allows KIR/HLA alleles as input. This module is inspired by research/kg_eval.py.

  • pyhlamsa: https://github.com/linnil1/pyHLAMSA

    A tool for easily manipulating MSA data. It reads from IPD-KIR or IPD-HLA database formats, merges exons, calculates consensus, writes data in specific formats, and more.

  • filenameflow: https://github.com/linnil1/FileNameFlow

    A lightweight pipeline tool that executes pipelines. It uses filenames as auto-versioning keys, which is convenient when tuning arguments or switching parts frequently. Note that this research uses version 0.0.7, so clone the repository and run the following command: git checkout v0.0.7 && pip install .

Changelog

  • github tag: v1.0: Initial release on bioRxiv and open-sourced the Graph-KIR code.
  • latest: Current Version
    • Improved how the assumption that KIR3DL3 is diploid is applied. KIR3DL3's depth is now treated probabilistically (as likely, rather than exactly, 2x depth), which improves the clustering results for copy number estimation. Special thanks to Ting-Jian Wang, one of the authors of the original paper.

LICENSE

LGPL

::: graphkir
