HICAI-ZJU/SciKGs

Awesome Scientific Knowledge Graphs

Bridging Data and Discovery: A Survey on Knowledge Graphs in AI for Science

🧬 Research Scopes

Overview of the survey's scope, covering four fundamental scientific tasks in biology, chemistry, and materials science: (a) drug development and optimization, (b) omics interpretation and analysis, (c) chemical reaction and synthesis, and (d) materials design and discovery.

📚 Structure of Survey

Structure of the survey. Our review is structured around the lifecycle of SciKGs: from their conceptual foundation and construction methodologies, to their applications and synergistic integration with LLMs for discovery, culminating in challenges, opportunities, and future directions that envision SciKGs as engines for autonomous scientific discovery.

🔗 Evolution of SciKGs

The co-evolution of knowledge graph technologies and their scientific practices. The technological evolution of KGs (top) has continually enabled new paradigms in SciKG applications (bottom). This progression has moved from static cataloguing and manual integration to machine learning-driven inference, culminating in the current era of bidirectional synergy between LLMs and KGs. This synergy, leveraging tools such as RAG and AI agents, transforms SciKGs from static repositories into dynamic engines for generative scientific discovery. Abbreviations: SQL, Structured Query Language; RDF, Resource Description Framework; OWL, Web Ontology Language; SPARQL, SPARQL Protocol and RDF Query Language; GNN, graph neural network; KGE, knowledge graph embedding; RAG, retrieval-augmented generation.

🏗️ Construction and Maintenance of SciKGs

Construction and maintenance of SciKGs. (a) The foundation of SciKG construction involves integrating diverse data sources, including structured databases, unstructured text, and multimodal data. (b) Two main approaches for extracting entities and relations from the acquired data are illustrated: rule/dictionary-based extraction, which relies on predefined lexicons and rules, and LLM-based extraction, involving fine-tuning on scientific datasets and prompt engineering. (c) Ontology alignment integrates diverse representations of the same entity (e.g., aspirin), followed by graph embedding into a continuous vector space. (d) Dynamic updating through incremental learning and LLM-driven error correction ensures SciKGs remain accurate and up to date. (e-h) Sub-figures illustrate representative examples of specialized knowledge graphs for drugs, omics, chemicals, and materials, respectively.
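Steps (b) and (c) can be illustrated with a minimal sketch of rule/dictionary-based extraction followed by alias alignment. The lexicon, the placeholder identifiers (`DRUG:0001`, `PROT:0001`), and the single "inhibits" rule are assumptions made for illustration; they do not come from any of the listed SciKGs.

```python
import re

# Hypothetical lexicon: surface form -> (canonical ID, entity type).
# The IDs are illustrative placeholders, not real database identifiers.
LEXICON = {
    "aspirin": ("DRUG:0001", "Drug"),
    "acetylsalicylic acid": ("DRUG:0001", "Drug"),
    "cox-1": ("PROT:0001", "Protein"),
}

# Hypothetical rule: "<entity> inhibits <entity>" yields an `inhibits` relation.
# Longer surface forms come first so they win over shorter overlapping ones.
_ALTERNATION = "|".join(
    re.escape(term) for term in sorted(LEXICON, key=len, reverse=True)
)
INHIBITS = re.compile(
    rf"({_ALTERNATION})\s+inhibits\s+({_ALTERNATION})", re.IGNORECASE
)


def extract_triples(text):
    """Return the set of (head_id, relation, tail_id) triples matched by the rule."""
    triples = set()
    for head, tail in INHIBITS.findall(text):
        head_id = LEXICON[head.lower()][0]  # alias -> canonical ID (alignment)
        tail_id = LEXICON[tail.lower()][0]
        triples.add((head_id, "inhibits", tail_id))
    return triples


text = "Aspirin inhibits COX-1. Acetylsalicylic acid inhibits COX-1."
print(extract_triples(text))
```

Because both surface forms of aspirin map to the same canonical ID, the two sentences collapse into a single triple, which is the effect ontology alignment aims for at much larger scale.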

🌐 Core Functions of SciKGs

Summary of core functions of SciKGs in diverse scientific tasks. SciKGs serve as a foundational infrastructure that: (1) organizes heterogeneous scientific data into structured knowledge; (2) enhances representation learning via graph embedding; (3) enables causal and relational inference for hypothesis generation; and (4) improves AI model interpretability by grounding predictions in traceable, evidence-based knowledge paths.
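Function (4), traceable knowledge paths, can be sketched as a shortest-path search over triples. This is a toy illustration under stated assumptions: the entities, relations, and the `evidence_path` helper are hypothetical, and real systems typically rank many candidate paths rather than returning one.

```python
from collections import deque

# Toy triples; entities and relations are illustrative only.
TRIPLES = [
    ("drug_A", "targets", "protein_X"),
    ("protein_X", "participates_in", "pathway_P"),
    ("pathway_P", "associated_with", "disease_D"),
    ("drug_A", "has_side_effect", "effect_E"),
]


def evidence_path(triples, source, target):
    """Breadth-first search returning the shortest chain of triples, or None."""
    outgoing = {}
    for head, relation, tail in triples:
        outgoing.setdefault(head, []).append((relation, tail))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in outgoing.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


for step in evidence_path(TRIPLES, "drug_A", "disease_D"):
    print(step)
```

Each returned triple is a step a reader can inspect, which is what makes a prediction such as "drug_A may be relevant to disease_D" auditable.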

🤝 SciKG–LLM Integration for Scientific Discovery

Synergistic integration of SciKGs and LLMs for knowledge-driven scientific discovery. (a) SciKGs serve as the foundational knowledge infrastructure by ensuring factual grounding and verification, defining reasonable scientific boundaries, and enabling unified representation of heterogeneous data. (b) LLMs act as dynamic semantic engines through five core functions: semantic interface for knowledge access, analytical reasoner for inference, generative engine for hypothesis design, constructor for knowledge curation, and orchestrator for workflow automation. (c) The SciKG-LLM integration empowers four key scientific discovery tasks: multi-source data interpretation, complex system mechanism analysis, system performance optimization, and innovative solution design.
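The grounding role in (a) can be sketched as two small steps: retrieving KG triples to place in front of a question, and checking model-claimed triples against the graph. No LLM is called here; the toy graph and the helper names (`retrieve_context`, `build_prompt`, `check_claims`) are hypothetical, and entity matching by substring is a deliberate simplification.

```python
# Toy knowledge graph; entity and relation names are illustrative only.
KG = {
    ("drug_A", "interacts_with", "drug_B"),
    ("drug_A", "targets", "protein_X"),
    ("drug_B", "metabolized_by", "enzyme_Y"),
}


def retrieve_context(question, kg):
    """Return triples whose head or tail entity is mentioned in the question."""
    return sorted(t for t in kg if t[0] in question or t[2] in question)


def build_prompt(question, kg):
    """Place retrieved triples ahead of the question as grounding context."""
    facts = "\n".join(f"{h} {r} {t}" for h, r, t in retrieve_context(question, kg))
    return f"Known facts:\n{facts}\n\nQuestion: {question}"


def check_claims(claimed_triples, kg):
    """Split model-claimed triples into KG-supported and unsupported lists."""
    supported = [t for t in claimed_triples if t in kg]
    unsupported = [t for t in claimed_triples if t not in kg]
    return supported, unsupported


print(build_prompt("Does drug_A interact with drug_B?", KG))
```

Unsupported claims are not necessarily false, since a KG is incomplete; in this sketch they are simply flagged for review rather than rejected.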

🧠 Discovery Flywheel

The autonomous scientific discovery flywheel driven by LLM agents and SciKGs.

⚖️ Challenges and Opportunities in SciKGs

Challenges and Opportunities in SciKGs. This figure illustrates the major challenges (C1-C4) facing SciKGs, including data quality and completeness, interoperability and integration, dynamic and temporal knowledge, and trustworthy and explainable reasoning. Each challenge is paired with corresponding opportunities (O1-O4) for advancement, such as building standards and benchmarks, integrating multimodal foundation models, autonomous updating via agents, and developing community-driven platforms. The green sections depict workflows (W1-W4) that enable these opportunities, highlighting a path towards more auditable, unified, dynamic, and community-governed SciKGs.

Collection of SciKGs and Their Applications

Drug Development and Optimization

Year Title KG Name KG Type Domain Construction Method Venue Paper Code
2025 TarIKGC: A Target Identification Tool Using Semantics-Enhanced Knowledge Graph Completion with Application to CDK2 Inhibitor Discovery biological activity KG public KG DTI prediction Semi-automated Journal of Medicinal Chemistry Link Link
2025 A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research iKraph Multi-source KG Drug repurposing and Hypothesis Generation Semi-automated Nature Machine Intelligence Link Link
2025 VITAGRAPH: Building a Knowledge Graph for Biologically Relevant Learning Tasks VITAGRAPH public KG Drug repurposing Semi-automated arXiv Link Link
2024 A Foundation Model for Clinician-Centered Drug Repurposing / public KG Drug repurposing Semi-automated Nature Medicine Link Link
2024 Accurate and Interpretable Drug-Drug Interaction Prediction Enabled by Knowledge Subgraph Learning / public KG DDI prediction Automated Communications Medicine Link Link
2024 Knowledge Enhanced Representation Learning for Drug discovery MKG Multi-source KG DTI prediction and Virtual screening and drug discovery Semi-automated AAAI Link Link
2024 An experimentally validated approach to automated biological evidence generation in drug discovery using knowledge graphs Healx KG public KG Drug repurposing Semi-automated Nature Communications Link Link
2024 DDI-GPT: Explainable Prediction of Drug-Drug Interactions using Large Language Models enhanced with Knowledge Graphs iBKH public KG DDI prediction Semi-automated bioRxiv Link Link
2024 MKG-FENN: A Multimodal Knowledge Graph Fused End-to-End Neural Network for Accurate Drug–Drug Interaction Prediction MKG Multi-source KG DDI prediction Automated AAAI Link Link
2024 TransFOL: A Logical Query Model for Complex Relational Reasoning in Drug-Drug Interaction / public KG DDI prediction Semi-automated Journal of Biomedical and Health Informatics Link Link
2024 KGRLFF: Detecting Drug-Drug Interactions Based on Knowledge Graph Representation Learning and Feature Fusion / public KG DDI prediction Semi-automated TCBB Link Link
2024 An effective framework for predicting drug–drug interactions based on molecular substructures and knowledge graph neural network DKG (Drug knowledge graph) public KG DDI prediction Semi-automated Computers in Biology and Medicine Link Link
2024 Medical knowledge graph question answering for drug‐drug interaction prediction based on multi‐hop machine reading comprehension / public KG DDI prediction Automated CAAI Transactions on Intelligence Technology Link
2024 Integrated Knowledge Graph and Drug Molecular Graph Fusion via Adversarial Networks for Drug–Drug Interaction Prediction DrugBank public KG DDI prediction Semi-automated JCIM Link Link
2024 KGE-UNIT: toward the unification of molecular interactions prediction based on knowledge graph and multi-task learning on drug discovery / Multi-source KG DDI prediction, DTI prediction and Hypothesis Generation Automated Briefings in Bioinformatics Link Link
2023 Biomedical Knowledge Graph Learning for Drug Repurposing by Extending Guilt-by-Association to Multiple Layers / public KG Drug repurposing Semi-automated Nature Communications Link Link
2023 Evolution-strengthened knowledge graph enables predicting the targetability and druggability of genes ESKG (Evolution-strengthened KG) public KG DTI prediction Semi-automated PNAS Nexus Link Link
2023 Drugomics: Knowledge Graph & AI to Construct Physicians' Brain Digital Twin to Prevent Drug Side-Effects and Patient Harm Drugomics KG Multi-source KG Drug toxicity and adverse reactions Semi-automated Big Data Analytics Link
2023 Molecular-evaluated and explainable drug repurposing for COVID-19 using ensemble knowledge graph embedding / Multi-source KG Drug repurposing Semi-automated Scientific Reports Link Link
2023 Toxicology knowledge graph for structural birth defects ReproTox-KG Multi-source KG Drug toxicity and adverse reactions and Hypothesis Generation Semi-automated Communications Medicine Link Link
2023 NAFLDkb: A Knowledge Base and Platform for Drug Development against Nonalcoholic Fatty Liver Disease NAFLDkb Multi-source KG Drug repurposing Semi-automated JCIM Link Link
2023 Molecule generation toward target protein (SARS-CoV-2) using reinforcement learning-based graph neural network via knowledge graph / domain-specific KG Virtual screening and drug discovery, DTI prediction, and Hypothesis Generation Semi-automated Network Modeling and Analysis in Health Informatics and Bioinformatics Link
2022 e-TSN: an Interactive Visual Exploration Platform for Target-Disease Knowledge Mapping from Literature e-TSN KG literature-based KG DTI prediction Automated Briefings in Bioinformatics Link
2022 Attention-based knowledge graph representation learning for predicting drug-drug interactions / public KG DDI prediction Semi-automated Briefings in Bioinformatics Link Link
2022 Automating Predictive Toxicology Using ComptoxAI ComptoxAI KG public KG Drug toxicity and adverse reactions Semi-automated Chemical Research in Toxicology Link Link
2022 KG-MTL: Knowledge Graph Enhanced Multi-Task Learning for Molecular Interaction DRKG public KG DTI prediction Automated IEEE Link Link
2021 A Unified Drug-Target Interaction Prediction Framework Based on Knowledge Graph and Recommendation System / public KG DTI prediction Semi-automated Nature Communications Link Link
2021 Biological Insights Knowledge Graph: an integrated knowledge graph to support drug development BIKG Multi-source KG DTI prediction and Drug repurposing Semi-automated bioRxiv Link
2021 SumGNN: multi-typed drug interaction prediction via efficient knowledge graph summarization / public KG DDI prediction Semi-automated Bioinformatics Link Link
2021 Predicting Potential Drug Targets Using Tensor Factorisation and Knowledge Graph Embeddings Hetionet public KG DTI prediction Semi-automated BIOKDD Link
2021 Adverse Drug Reaction Discovery Using a Tumor-Biomarker Knowledge Graph TBKG (Tumor-biomarker KG) literature-based KG Drug toxicity and adverse reactions Semi-automated Frontiers in Genetics Link
2021 Investigating ADR mechanisms with Explainable AI: a feasibility study with knowledge graph mining PGxLOD Multi-source KG Drug toxicity and adverse reactions Semi-automated BMC Medical Informatics and Decision Making Link
2020 Discovering Protein Drug Targets Using Knowledge Graph Embeddings a knowledge graph of biological entities related to both drugs and targets public KG DTI prediction Automated Bioinformatics Link
2020 KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction / public KG DDI prediction Semi-automated IJCAI Link Link
2020 Network-based prediction of drug–target interactions using an arbitrary-order proximity embedded deep forest / public KG DTI prediction Semi-automated Bioinformatics Link Link
2019 Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction in realistic settings / public KG DDI prediction Semi-automated BMC Bioinformatics Link Link
2019 Drug-Drug Interaction Prediction Based on Knowledge Graph Embeddings and Convolutional-LSTM Network / public KG DDI prediction Semi-automated arXiv Link Link
2019 GAMENet: Graph Augmented MEmory Networks for Recommending Medication Combination EHR&DDI Graph domain-specific KG Virtual screening and drug discovery Automated AAAI Link Link
2019 Facilitating prediction of adverse drug reactions by using knowledge graphs and multi‐label learning models Bio2RDF KG public KG Drug toxicity and adverse reactions Automated Briefings in Bioinformatics Link
2018 Modeling polypharmacy side effects with graph convolutional networks / public KG Drug toxicity and adverse reactions Semi-automated Bioinformatics Link
2018 Neural networks for link prediction in realistic biomedical graphs: a multi-dimensional evaluation of graph embedding-based approaches / public KG DTI prediction Semi-automated BMC Bioinformatics Link
2017 A Network Integration Approach for Drug-Target Interaction Prediction and Computational Drug Repositioning from Heterogeneous Information / public KG DTI prediction and Drug repurposing Semi-automated Nature Communications Link Link
2017 Knowledge graph prediction of unknown adverse drug reactions and validation in electronic health records / public KG Drug toxicity and adverse reactions Semi-automated Scientific Reports Link Link
2017 Large-scale structural and textual similarity-based mining of knowledge graph to predict drug–drug interactions / public KG DDI prediction Semi-automated Journal of Web Semantics Link
2017 Deep mining heterogeneous networks of biomedical linked data to predict novel drug–target associations LTN (Linked Tripartite Network) public KG DTI prediction Semi-automated Bioinformatics Link Link

Omics Interpretation and Analysis

Year Title KG Name KG Type Domain Construction Method Venue Paper Code
2025 A novel approach for target deconvolution from phenotype-based screening using knowledge graph P53_HUMAN PPIKG public KG Proteomics research Semi-automated Scientific Reports Link Link
2025 Unified Knowledge-Guided Molecular Graph Encoder with multimodal fusion and multi-task learning Elemental KG and Biological KG Multi-source KG Proteomics research Semi-automated Neural Networks Link
2025 PhenoKG: Knowledge Graph-Driven Gene Discovery and Patient Insights from Phenotypes Alone PhenoKG public KG Genomics research Semi-automated arXiv Link
2024 Petagraph: A large-scale unifying knowledge graph framework for integrating biomolecular and biomedical data Petagraph Multi-source KG Genomics research Semi-automated Scientific Data Link Link
2024 An ontology-based knowledge graph for representing interactions involving RNA molecules RNA-KG Multi-source KG Transcriptomics research Semi-automated Scientific Data Link Link
2024 Knowledge graph construction based on granulosa cells transcriptome from polycystic ovary syndrome with normoandrogen and hyperandrogen causal KG Multi-source KG Transcriptomics research Semi-automated Journal of Ovarian Research Link
2024 Multi-Modal Protein Knowledge Graph Construction and Applications (Student Abstract) ProteinKG65 Multi-source KG Proteomics research Semi-automated AAAI Link Link
2024 Bridging chemical structure and conceptual knowledge enables accurate prediction of compound-protein interaction DRKG public KG Proteomics research Manual BMC Biology Link Link
2024 Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation crm, crm2gene, crm2tfac, crm2phen, tad, human genes (after ampliation) graph public KG Genomics research Semi-automated Nucleic Acids Research Link Link
2024 Identifying compound-protein interactions with knowledge graph embedding of perturbation transcriptomics / public KG Proteomics research Semi-automated Cell Genomics Link Link
2023 MMiKG: a knowledge graph-based platform for path mining of microbiota–mental diseases interactions MMiKG literature-based KG Microbiome research Manual Briefings in Bioinformatics Link
2023 Transporter proteins knowledge graph construction and its application in drug development Transporter Proteins Knowledge Graph public KG Proteomics research Semi-automated Computational and Structural Biotechnology Journal Link
2023 A Knowledge Graph Approach to Elucidate the Role of Organellar Pathways in Disease via Biomedical Reports / Multi-source KG Proteomics research Semi-automated JoVE Journal of Biochemistry Link
2022 Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with SpaTalk LRT-KG public KG Transcriptomics research Semi-automated Nature Communications Link Link
2022 A knowledge graph to interpret clinical proteomics data CKG (Clinical Knowledge graph) Multi-source KG Proteomics research Semi-automated Nature Biotechnology Link Link
2022 Knowledge integration and decision support for accelerated discovery of antibiotic resistance genes E. coli knowledge graph public KG Genomics research Semi-automated Nature Communications Link Link
2022 Machine learning prediction and tau-based screening identifies potential Alzheimer's disease genes relevant to immunity PKG (protein knowledge graph) public KG Proteomics research Semi-automated Communications Biology Link Link
2022 OntoProtein: Protein Pretraining With Gene Ontology Embedding ProteinKG25 public KG Proteomics research Automated ICLR Link Link
2022 GenomicKB: a knowledge graph for the human genome GenomicKB (Genomic Knowledgebase) Multi-source KG Genomics research Semi-automated Nucleic Acids Research Link
2022 Creating and Exploiting the Intrinsically Disordered Protein Knowledge Graph (IDP-KG) IDP-KG public KG Proteomics research Automated CEUR Workshop Proceedings Link Link
2022 BioTAGME: A Comprehensive Platform for Biological Knowledge Network Analysis BioTAGME KG Multi-source KG Multi-Omics research Semi-automated Frontiers in Genetics Link
2022 Biomedical knowledge graph embeddings for personalized medicine: Predicting disease-gene associations a biomedical KG for predicting disease-gene association public KG Genomics research Semi-automated Expert Systems Link Link
2022 Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph Protein KG literature-based KG Genomics research Semi-automated PLOS ONE Link Link
2022 KG-MTL: Knowledge Graph Enhanced Multi-Task Learning for Molecular Interaction DRKG public KG Proteomics research Automated IEEE Link Link
2021 FORUM: building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases FORUM Multi-source KG Metabolomics research Semi-automated Bioinformatics Link Link
2020 Exploring the Microbiota-Gut-Brain Axis for Mental Disorders with Knowledge Graphs MiKG (Microbiota knowledge graph) Multi-source KG Microbiome research Semi-automated Journal of Artificial Intelligence for Medical Sciences Link Link
2020 Metastatic Site Prediction in Breast Cancer using Omics Knowledge Graph and Pattern Mining with Kirchhoff's Law Traversal Kirchhoff's KG public KG Multi-Omics research Semi-automated bioRxiv Link
2020 Accurate prediction of kinase-substrate networks using knowledge graphs a phosphorylation knowledge graph public KG Proteomics research Automated PLoS Computational Biology Link
2020 An integrative knowledge graph for rare diseases, derived from the Genetic and Rare Diseases Information Center (GARD) an integrative KG for rare diseases Multi-source KG Genomics research Semi-automated Journal of Biomedical Semantics Link
2019 Predicting gene-disease associations from the heterogeneous network using graph embedding / public KG Genomics research Semi-automated IEEE Link
2019 GenomicsKG: A Knowledge Graph to Visualize Poly-Omics Data GenomicsKG public KG Genomics research Semi-automated Journal of Advances in Health Link
2018 Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes / public KG Genomics research Semi-automated Bioinformatics Link Link
2018 Heterogeneous network embedding for identifying symptom candidate genes SDGNet&SDGPNet (two heterogeneous symptom-related networks) public KG Genomics research Semi-automated JAMIA Link
2018 Network-based integration of multi-omics data for prioritizing cancer genes / public KG Genomics research Semi-automated Bioinformatics Link Link
2016 A knowledge-based approach for predicting gene–disease associations / public KG Genomics research Semi-automated Bioinformatics Link

Chemical Reaction and Synthesis

Year Title KG Name KG Type Domain Construction Method Venue Paper Code
2025 Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations MolKG public KG Molecular property prediction Semi-automated AAAI Link Link
2025 Automated Retrosynthesis Planning of Macromolecules Using Large Language Models and Knowledge Graphs / literature-based KG Chemical synthesis pathway optimization Automated Macromolecular Rapid Communications Link Link
2025 An Automated Approach for Domain-Specific Knowledge Graph Generation─Graph Measures and Characterization / literature-based KG Chemical reaction prediction and Chemical Synthesis Pathway Optimization Automated JCIM Link
2024 Large-Scale Knowledge Integration for Enhanced Molecular Property Prediction ElementKG-CHEBI Multi-source KG Molecular property prediction Semi-automated NeSy Link Link
2024 Self-Supervised Contrastive Molecular Representation Learning with a Chemical Synthesis Knowledge Graph Chemical synthesis KG public KG Chemical reaction prediction Semi-automated JCIM Link Link
2023 Knowledge graph-enhanced molecular contrastive learning with functional prompt ElementKG domain-specific KG Molecular property prediction Semi-automated Nature Machine Intelligence Link Link
2023 Marie and BERT─A Knowledge Graph Embedding Based Question Answering System for Chemistry TWA KG (the World Avatar KG) and Wikidata chemistry KG Dynamic KG Chemical reaction prediction and Molecular property prediction Semi-automated ACS Omega Link Link
2022 MKGE: Knowledge graph embedding with molecular structure information KCCR and DeepDDI public KG Molecular property prediction Semi-automated Computational Biology and Chemistry Link
2022 Prediction of Compound Synthesis Accessibility Based on Reaction Knowledge Graph Reaction knowledge graph public KG Chemical reaction prediction, Molecular property prediction, and Chemical synthesis pathway optimization Semi-automated Molecules Link Link
2022 Molecular Contrastive Learning with Chemical Element Knowledge Graph Chemical element KG domain-specific KG Molecular property prediction Semi-automated AAAI Link Link
2022 FAIR and Interactive Data Graphics from a Scientific Knowledge Graph / literature-based KG Chemical synthesis pathway optimization, Chemical reaction prediction, and Molecular property prediction Semi-automated Scientific Data Link
2022 From Platform to Knowledge Graph: Evolution of Laboratory Automation The World Avatar KG Dynamic KG Chemical Synthesis Pathway Optimization Semi-automated JACS Au Link
2021 Intelligent generation of optimal synthetic pathways based on knowledge graph inference and retrosynthetic predictions using reaction big data Reaction knowledge graph public KG Chemical synthesis pathway optimization Semi-automated Journal of the Taiwan Institute of Chemical Engineers Link
2021 Automated Calibration of a Poly(oxymethylene) Dimethyl Ether Oxidation Mechanism Using the Knowledge Graph Technology JPS KG Dynamic KG Chemical reaction prediction and Chemical Synthesis Pathway Optimization Semi-automated JCIM Link
2021 A graph-based network for predicting chemical reaction pathways in solid-state materials synthesis / domain-specific KG Chemical reaction prediction Automated Nature Communications Link Link
2020 Knowledge Graph Approach to Combustion Chemistry and Interoperability JPS KG literature-based KG Chemical reaction prediction Automated ACS Omega Link
2020 Multiscale Cross-Domain Thermochemical Knowledge-Graph JPS KG Dynamic KG Chemical reaction prediction and Molecular property prediction Automated JCIM Link
2016 Modelling Chemical Reasoning to Predict and Invent Reactions / public KG Chemical reaction prediction Semi-automated Chemistry – A European Journal Link

Materials Design and Discovery

Year Title KG Name KG Type Domain Construction Method Venue Paper Code
2025 Construction of a knowledge graph for framework material enabled by large language models and its application KG-FM literature-based KG Material screening and optimization Semi-automated npj Computational Materials Link Link
2025 High throughput screening of new piezoelectric materials using graph machine learning and knowledge graph approach a simple KG encoding structural similarity between materials public KG Material screening and optimization Semi-automated Computational Materials Science Link Link
2024 MatKG: An autonomously generated knowledge graph in Material Science MatKG literature-based KG New material design Semi-automated Scientific Data Link Link
2024 A materials terminology knowledge graph automatically constructed from text corpus MGED-KG literature-based KG Material screening and optimization Semi-automated Scientific Data Link Link
2024 Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model MKG literature-based KG and dynamic KG New material design Semi-automated NeurIPS Link Link
2024 Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design Ontological KG literature-based KG New material design and Material performance prediction Semi-automated ACS Engineering Au Link Link
2024 SciAgents: Automating Scientific Discovery Through Bioinspired Multi-Agent Intelligent Graph Reasoning an ontological knowledge graph for biologically inspired materials literature-based KG New material design and Material performance prediction Semi-automated Advanced Materials Link
2024 An ontology-based text mining dataset for extraction of process-structure-property entities Materials mechanics ontology literature-based KG Material performance prediction Semi-automated Scientific Data Link Link
2024 Material Property Prediction with Element Attribute Knowledge Graphs and Multimodal Representation Learning Element KG domain-specific KG Material performance prediction Semi-automated arXiv Link
2024 Knowledge graph-guided data-driven design of ultra-high-performance concrete (UHPC) with interpretability and physicochemical reaction discovery capability UHPC KG literature-based KG Material screening and optimization Manual Construction and Building Materials Link
2023 The materials experiment knowledge graph MekG domain-specific KG Material performance prediction Semi-automated Digital Discovery Link Link
2023 Revisiting Electrocatalyst Design by a Knowledge Graph of Cu-Based Catalysts for CO2 Reduction Cu-Based Catalysts Knowledge Graph for CO2 Reduction literature-based KG New material design and Material performance prediction Semi-automated ACS Catalysis Link Link
2023 Reinforcement learning-based knowledge graph reasoning for aluminum alloy applications Aluminum alloy domain KG public KG Material performance prediction Semi-automated Computational Materials Science Link
2023 Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction Cross-modal KG Multi-source KG Material performance prediction Semi-automated arXiv Link Link
2023 Digital Twin-Based Fault Diagnosis Platform for Final Rolling Temperature in Hot Strip Production / domain-specific KG Material screening and optimization Semi-automated Materials Link
2022 Grain Knowledge Graph Representation Learning: A New Paradigm for Microstructure-Property Prediction Grain KG domain-specific KG Material performance prediction Semi-automated Crystals Link Link
2022 Automating Materials Exploration with a Semantic Knowledge Graph for Li-Ion Battery Cathodes a semantic knowledge graph dedicated to LIB cathodes literature-based KG New material design Semi-automated AFM Link Link
2022 High-Throughput Computing Assisted by Knowledge Graph to Study the Correlation between Microstructure and Mechanical Properties of 6XXX Aluminum Alloy 6XXX Aluminum Alloy KG domain-specific KG Material screening and optimization and Material performance prediction Semi-automated Materials Link
2022 FAIR and Interactive Data Graphics from a Scientific Knowledge Graph / literature-based KG Material screening and optimization Semi-automated Scientific Data Link
2022 Compound Knowledge Graph-Enabled AI Assistant for Accelerated Materials Discovery CKG (Compound KG) Multi-source KG New material design Semi-automated Integrating Materials and Manufacturing Innovation Link
2021 EBSD Grain Knowledge Graph Representation Learning for Material Structure-Property Prediction EBSD Grain KG domain-specific KG Material performance prediction Semi-automated CCKS Link
2020 propnet: A Knowledge Graph for Materials Science propnet KG domain-specific KG Material performance prediction Semi-automated Matter Link Link
2020 NanoMine: A Knowledge Graph for Nanocomposite Materials Science NanoMine KG domain-specific KG New material design Semi-automated ISWC Link
2018 Relation extraction with weakly supervised learning based on process-structure-property-performance reciprocity PSPP KG (Process-Structure-Property-Performance) literature-based KG New material design Semi-automated Science and Technology of Advanced Materials Link Link

Summary of SciKG-LLM Integration

Name Year Domains Roles of LLMs Roles of SciKG Tasks Application
KnowNET 2024 Drug Semantic Interface (Query Generation) Grounding (Factual Verification) M Guide health information seeking
FactFinder 2024 Drug Semantic Interface (Query Generation) Grounding (Factual Retrieval) M Life-science question answering
DDI-GPT 2024 Drug Reasoner (Prediction & Explanation) Representation (Semantic Enhancement) C Explainable prediction of drug-drug interactions
Soman et al. 2024 Drug, Omics Constructor, Interface (KG Construction, Text Generation) Grounding (Knowledge Base & Traceability) M, C Drug repurposing and medical QA
BioLORD 2024 Drug, Omics Reasoner (Semantic Representation Optimization) Grounding (Knowledge Base & Semantic Support) M Enhance biomedical semantic similarity
HeCiX 2024 Drug, Omics Semantic Interface (Format Conversion) Grounding (Knowledge Base) M Enhance clinical trial research
KRAGEN 2024 Drug, Omics Orchestrator (Plan Generation & Execution) Grounding (Knowledge Base & Visualization) M Visualized biomedical QA system
MechGPT 2024 Material Constructor, Reasoner, Orchestrator (KG Construction, Explanation, Multi-agent) Grounding, Reasoning Constraints (Knowledge & Explainability) C, S, I Materials analysis and design
SciAgents 2024 Material Constructor, Reasoner, Generator (KG Construction, Analytical Reasoning, Hypothesis Generation) Grounding (Knowledge Base) M, I Automated discovery in biomaterials science
MKG 2024 Material Constructor (KG Construction & Maintenance) Grounding (Knowledge Base) I Multidisciplinary materials science discovery
OpenTCM 2025 Drug Interface, Reasoner, Constructor (Retrieval, Diagnosis, KG Construction) Reasoning Constraints (Knowledge Retrieval Enhancement) M Traditional Chinese Medicine diagnosis
iKraph 2025 Drug Constructor (KG Construction) Grounding (Knowledge Base) S Biomedical Research
KGT 2025 Drug, Omics Interface, Reasoner (Query Generation & Reasoning Output) Grounding, Reasoning Constraints (Fact Checking & Path Constraint) S, M Drug repositioning, Framework for pan-cancer QA
ESCARGOT 2025 Drug, Omics Generator, Orchestrator (Strategy & Code Generation) Grounding (Knowledge Base) S, I Biomedical AI agent
Cat-KG 2025 Chemistry Constructor, Reasoner, Interface (KG Construction, Path Reasoning & Explanation) Grounding, Reasoning Constraints (Explainability & Path Constraint) C, M Relay catalysis pathway recommendation
Ma et al. 2025 Chemistry Constructor, Generator (KG Construction & Path Recommendation) Grounding (Structured Knowledge Management) S Automated retrosynthesis planning of macromolecules
KG-FM 2025 Material Constructor, Reasoner (Multi-modal Extraction, QA & Reasoning) Grounding (Knowledge Base & Visualization) M Improve LLM QA in framework materials
SciToolAgent 2025 Comprehensive Orchestrator (Multi-agent Collaboration) Grounding (Tool Knowledge Base) S, M, I Scientific agent for multi-tool integration

Task abbreviations: M: Multi-source Data Interpretation; C: Complex System Mechanism Analysis; S: System Performance Optimization; I: Innovative Solution Design.

Databases for Constructing Scientific Knowledge Graph

| Domain | Database | Short Description | Statistics | Update Frequency |
| --- | --- | --- | --- | --- |
| Drug Databases | BindingDB | Publicly accessible collection of measured drug-target binding affinities | 3.1M binding data for 1.3M compounds & 9.6K targets | Weekly |
| | DrugBank | Richly annotated resource combining drug data with target, pathway & pharmacogenomic info | 18K approved & investigational drugs, 23K drug-target links, 3.6K drug-transporter links, 6K drug-enzyme links | Monthly |
| | CTD | Comparative Toxicogenomics Database linking chemicals, genes, phenotypes and diseases | 101M toxicogenomic interactions, 19K chemicals, 57K genes, 7K diseases | -- |
| | DisGeNET | Comprehensive platform integrating genes, variants, and human diseases, combining curated data and text-mined evidence | 2.0M gene–disease associations, 4.4M variant–disease associations, and 28M disease–disease associations | -- |
| | DrugCentral | Authoritative, open-access compendium of active pharmaceutical ingredients approved worldwide | 5K drugs, 152K pharmaceutical products | -- |
| | PharmGKB | Provides pharmacogenomic (PGx) data, from literature annotations to genotype-based treatment guidelines | 209 clinical guideline annotations, 1.2K drug label annotations, 483 FDA drug label annotations | -- |
| | SIDER | Database of marketed drugs and their recorded adverse drug reactions (ADRs) | 1.4K drugs, 6K side effects, 140K drug–side effect pairs | Static |
| Omics Databases | UniProt | Comprehensive, high-quality protein sequence & functional annotation database | 573K reviewed entries, 253K unreviewed entries | Every 4 weeks |
| | Ensembl | Genome browser & annotation resource for vertebrates and selected eukaryotes | 300+ species, 40K coding genes (human), 1M variants | Every 3 months |
| | KEGG | Database integrating pathways, genes, compounds, drugs and diseases for system analysis | 75K pathways, 54M genes, 12K drugs, 11K diseases | Daily |
| | Reactome | Curated, peer-reviewed pathway database emphasizing human biology | 2.8K human pathways covering 11.6K proteins, 16K reactions | Monthly |
| | InterPro | Comprehensive resource integrating multiple protein signature databases | 13 member databases covering millions of protein sequences | Quarterly |
| | RNAcentral | Comprehensive ncRNA sequence collection representing all ncRNA types across diverse organisms | 44.5M non-coding RNA sequences, covering 1.1K species from 54 databases | Twice a year |
| | STRING | Database of known and predicted protein–protein interactions across multiple organisms | 59.3M proteins, 20B PPIs, 12.5K organisms | -- |
| | MONDO† | Ontology harmonizing disease concepts with standardized identifiers, mappings, and classifications for clinical use | 17 disease resources integrated into 22K unified disease concepts | Monthly |
| | UMLS | Comprehensive biomedical ontology integrating multiple vocabularies to unify concepts, names, and relationships | 17M names, 3.4M concepts, 8.7M codes, 190 vocabularies, 29 languages | Twice a year |
| Chemical Databases | ChEBI† | Chemical Entities of Biological Interest, a dictionary and ontology of small molecular entities | 62K compounds | Monthly |
| | ChEMBL | A curated database of drug-like bioactive molecules that integrates chemical, bioactivity and genomic data to support drug discovery | 2.5M compounds, 1.7M assays, 15.5K drugs, 48.8K drug indications | -- |
| | Reaxys | Elsevier-curated chemical reactions, substances, properties & literature | 283M chemical substances, 73M reactions, 500M physicochemical data points | -- |
| | PubChem | NIH repository of chemical substances, bioactivities & patents | 122M compounds, 338M substances, 297M bioactivities | Daily |
| | ZINC | Free database of commercially available compounds for virtual screening | 980M purchasable compounds | -- |
| Materials Databases | OQMD | Open-access database of DFT-calculated properties for inorganic and hybrid materials | 1.2M materials | -- |
| | Materials Project | High-throughput DFT database of materials properties & crystal structures | 144K inorganic compounds, 76K band structures, 64K molecules, 530K nanoporous materials, and diverse tensors and electrodes | -- |
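
Records from structured databases such as those listed above generally have to be turned into (head, relation, tail) triples before they can populate a knowledge graph. The following is a minimal sketch of that step, not the survey's method: the row layout, the `binds_to` relation name, and every identifier (`DRUG:0001`, `db_a`, and so on) are hypothetical placeholders rather than real database entries or export formats.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical rows shaped like a simplified drug-target export.
# All identifiers and source names are placeholders, not real entries.
ROWS: List[Dict[str, str]] = [
    {"drug_id": "DRUG:0001", "target_id": "PROT:0001", "source": "db_a"},
    {"drug_id": "DRUG:0001", "target_id": "PROT:0002", "source": "db_a"},
    {"drug_id": "DRUG:0001", "target_id": "PROT:0001", "source": "db_b"},
]


def row_to_triple(row: Dict[str, str]) -> Triple:
    """Map one tabular row to a (head, relation, tail) triple."""
    return (row["drug_id"], "binds_to", row["target_id"])


def rows_to_triples(rows: List[Dict[str, str]]) -> List[Triple]:
    """Convert rows to triples, dropping duplicates and keeping first-seen order."""
    seen: Set[Triple] = set()
    triples: List[Triple] = []
    for row in rows:
        triple = row_to_triple(row)
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


def sources_per_triple(rows: List[Dict[str, str]]) -> Dict[Triple, List[str]]:
    """Record which sources assert each triple, so provenance stays traceable."""
    collected: Dict[Triple, Set[str]] = {}
    for row in rows:
        collected.setdefault(row_to_triple(row), set()).add(row["source"])
    return {triple: sorted(names) for triple, names in collected.items()}


if __name__ == "__main__":
    for triple in rows_to_triples(ROWS):
        print(triple, sources_per_triple(ROWS)[triple])
```

Keeping the source names next to each triple is one simple way to preserve traceability when the same fact appears in more than one database. Real pipelines would also need identifier mapping across databases, which this sketch leaves out.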

Software Tools for Knowledge Graph

| Category | Software Name | URL | Description | Supported Tasks | License |
| --- | --- | --- | --- | --- | --- |
| Automated KG Construction | DeepKE | Link | A knowledge extraction toolkit for knowledge graph construction supporting cnSchema, low-resource, document-level and multimodal scenarios for entity, relation and attribute extraction | Named Entity Recognition, Relation Extraction, Attribute Extraction | MIT License |
| | OneKE | Link | A flexible dockerized system for schema-guided knowledge extraction, capable of extracting information from the web and raw PDF books across multiple domains like science and news | Named Entity Recognition, Web News Extraction, Book Knowledge Extraction | MIT License |
| | AutoKG | Link | An LLM-powered multi-agent framework for automated KG construction and reasoning, integrating external knowledge sources for large-scale extraction | Entity/Relation Extraction, KG Construction, KG Reasoning | MIT License |
| Graph Databases and Storage | Neo4j | Link | A widely used native graph database with ACID transactions and the Cypher query language, suitable for highly connected data analysis | Graph Storage, Graph Querying, Graph Algorithms | GPLv3 |
| | JanusGraph | Link | A highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster | Graph Storage, Gremlin Query | CC-BY-4.0 |
| | ArangoDB | Link | A scalable multi-model database offering native graphs, an integrated search engine, and JSON support through a single query language | Multi-Model Storage, Graph Traversal, Path Querying | BSL 1.1 |
| | Virtuoso | Link | A hybrid relational-RDF database supporting both SPARQL and SQL, widely used for Linked Data publishing | RDF Storage, SPARQL Query, Ontology Reasoning | GPL v2 |
| | TigerGraph | Link | A commercial distributed parallel graph database optimized for real-time graph analytics, offering GSQL for querying at trillion-edge scale | Graph Storage, Parallel Graph Computation, Real-time Querying | Proprietary |
| Representation Learning & Reasoning | OpenKE | Link | A sub-project of OpenSKL, providing an open-source knowledge embedding toolkit for knowledge representation learning (KRL) | KG Embedding, Link Prediction, Triple Classification | MIT License |
| | DGL-KE | Link | A high-performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings | KG Embedding, Large-scale Link Prediction | Apache 2.0 |
| | PyKEEN | Link | A Python library for KG embeddings with modular design, automated hyperparameter tuning, and reproducibility guarantees | KG Embedding, Model Training and Evaluation, Hyperparameter Optimization | MIT License |
| | AmpliGraph | Link | A suite of neural machine learning models for relational learning, a branch of machine learning that deals with supervised learning on knowledge graphs | KG Embedding Generation, Link Prediction, Anomaly Detection | Apache 2.0 |
| | LibKGE | Link | A PyTorch-based library for efficient training, evaluation, and hyperparameter optimization of knowledge graph embeddings (KGE) | Link Prediction, Training, Evaluation of KGE Models | MIT License |
| | Pykg2vec | Link | A library for learning representations of entities and relations in knowledge graphs | KGE Model Implementations, Hyperparameter Discovery, Learned Embedding Inspection | MIT License |
| Auxiliary Tools | Doccano | Link | An open-source text annotation tool with a web interface for human annotators | Annotation for Text Classification, Sequence Labeling, Sequence-to-Sequence Tasks | MIT License |
| | Label Studio | Link | An open-source data labeling tool supporting multimodal data such as text, images, audio, video, and time series | Multi-modal Data Annotation, Quality Assurance | Apache 2.0 |
| | Gephi | Link | An award-winning open-source platform for visualizing and manipulating large graphs | Graph Visualization, Network Analysis, Community Detection | CDDL 1.0 |
| | Cytoscape | Link | A network visualization platform originally designed for bioinformatics, now supporting general-purpose network analysis with rich plugins | Graph Visualization, Attribute Integration, Topology Analysis | LGPL |
| | GraphGPT | Link | An experimental tool using GPT models to extract entities and relations from text and generate interactive KG visualizations | Triple Extraction, KG Construction, Visualization | MIT License |
| | LlamaIndex | Link | A component for building KG indices from unstructured text, integrating subject–predicate–object triples into LLM-based retrieval pipelines | Triple Extraction, KG Indexing, KG-based QA | MIT License |
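
To make the "KG Embedding" and "Link Prediction" tasks in the table more concrete, here is a minimal sketch of TransE-style scoring, in which a triple (h, r, t) is considered plausible when the vector h + r lies close to t. It does not use the API of any library listed above. The entity names, the `targets` relation, and the hand-set two-dimensional vectors are assumptions chosen for illustration; in practice the embeddings are learned from the graph.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hand-set toy embeddings (assumption: real embeddings are learned, not written by hand).
ENTITY: Dict[str, Vector] = {
    "drug_x": [0.0, 0.0],
    "protein_a": [1.0, 0.0],
    "protein_b": [0.0, 3.0],
}
RELATION: Dict[str, Vector] = {"targets": [1.0, 0.0]}


def transe_score(head: str, relation: str, tail: str) -> float:
    """Return the negative L2 distance ||h + r - t||; higher means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head: str, relation: str) -> List[Tuple[str, float]]:
    """Link prediction: rank every other entity as a candidate tail, best first."""
    candidates = [name for name in ENTITY if name != head]
    scored = [(name, transe_score(head, relation, name)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_tails("drug_x", "targets"):
        print(f"{name}: {score:.3f}")
```

With these toy vectors, `protein_a` ranks first because `drug_x + targets` lands exactly on it. The embedding toolkits in the table typically add the training loop, negative sampling, and evaluation metrics around a scoring function of this kind.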

Citation

If you find this repository useful, please cite our paper:


@article{Ding_2025,
  title={Bridging Data and Discovery: A Survey on Knowledge Graphs in AI for Science},
  url={http://dx.doi.org/10.36227/techrxiv.176369442.22009541/v1},
  DOI={10.36227/techrxiv.176369442.22009541/v1},
  publisher={Institute of Electrical and Electronics Engineers (IEEE)},
  author={Ding, Keyan and Zhu, Zhihui and Tang, Yuqi and Feng, Kehua and Zhuang, Xiang and Wang, Hongwei and Yang, Yi and Du, Huifang and Ni, Zhangkai and Wang, Shiqi and Fan, Xiaohui and Xing, Huabin and Bai, Lei and Liu, Qi and Wang, Haofen and Zhang, Qiang and Chen, Huajun},
  year={2025},
  month=nov
}

If you notice any mistakes or have suggestions, please feel free to contact us at: zhihui.zhu01@outlook.com
