scChat: A Large Language Model-Powered Co-Pilot for Contextualized Single-Cell RNA Sequencing Analysis

Welcome to the scChat page. scChat is a pioneering AI assistant designed to enhance single-cell RNA sequencing (scRNA-seq) analysis by incorporating research context into the workflow. Powered by a large language model (LLM), scChat goes beyond standard tasks like cell annotation by offering advanced capabilities such as research context-based experimental analysis, hypothesis validation, and suggestions for future experiments.

Video Demo

Watch the demo of scChat in action below:

If you find this work useful, please cite the preprint:

@misc{lu2024scchat,
    title={scChat: A Large Language Model-Powered Co-Pilot for Contextualized Single-Cell RNA Sequencing Analysis},
    author={Yen-Chun Lu and Ashley Varghese and Rahul Nahar and Hao Chen and Kunming Shao and Xiaoping Bao and Can Li},
    year={2024},
    eprint={2024.10.01.616063},
    archivePrefix={bioRxiv},
    doi={10.1101/2024.10.01.616063}
}

Overview

1. Motivation

Data-driven methods such as unsupervised and supervised learning are essential tools in single-cell RNA sequencing (scRNA-seq) analysis. However, these methods often lack the ability to incorporate research context, which can lead to overlooked insights. scChat addresses this by integrating contextualized conversation with data analysis to provide a deeper understanding of experimental results. It supports the exploration of research hypotheses and generates actionable insights for future experiments.

Please read our scChat paper for more on the motivation and for details about how scChat works.

2. Scope

Model: scChat currently supports analysis using AnnData-formatted single-cell RNA sequencing datasets.

Capabilities: scChat integrates an LLM multi-agent system with specialized tools to enable tasks such as cell type annotation, enrichment analysis, and result visualization, all through conversational interactions.

3. Methodology

scChat is a multi-agent scRNA-seq research co-scientist that can autonomously generate executable plans for multi-step analyses, ranging from data preprocessing and follow-up analysis to results visualization. scChat includes five main agents:

🧠 Planner: Searches the function execution and conversation histories, parses the query, and decomposes it into a plan of function calls arranged as sequential steps.
⚡ Executor: Performs the function specified in the plan iteratively.
✅ Evaluator: Validates the outcome of each function from the executor, handling errors and interrupting the plan to pass error messages to the response generator if needed. Additionally, it checks the availability of remaining steps and determines the next step in the workflow.
🔍 Critic: Identifies potentially missing functions by creating a separate plan based on the function results, ensuring targeted analyses of specific cell types with all necessary downstream steps.
📝 Response Generator: Compiles all relevant function results to generate the final response to the user's query. After generating the response, it stores the final response and the function execution results in conversation and function histories, respectively.
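
The interplay of the five agents can be illustrated with a minimal sketch. This is not the scChat implementation: the `Step` class, the `TOOLS` registry, the two toy tools, and the rule the critic applies are all hypothetical stand-ins, and the order in which the critic runs is an assumption. A real planner and critic would call an LLM rather than return fixed steps.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    function: str                      # name of the tool to call
    args: dict = field(default_factory=dict)


# Hypothetical tools standing in for scChat's analysis functions.
TOOLS = {
    "annotate": lambda cell_type: f"annotated {cell_type}",
    "enrich": lambda cell_type: f"enrichment for {cell_type}",
}


def planner(query):
    # Stand-in: a real planner would prompt an LLM with the query and histories.
    return [Step("annotate", {"cell_type": "T cell"})]


def execute_and_evaluate(steps):
    """Run steps in order; stop at the first failure and report it."""
    results = []
    for step in steps:                          # Executor: one step at a time
        try:
            output = TOOLS[step.function](**step.args)
        except Exception as exc:                # Evaluator: interrupt the plan
            return results, f"{step.function} failed: {exc!r}"
        results.append((step, output))
    return results, None


def critic(results):
    # Assumed rule: every annotated cell type should also get an enrichment step.
    enriched = {s.args["cell_type"] for s, _ in results if s.function == "enrich"}
    return [
        Step("enrich", dict(s.args))
        for s, _ in results
        if s.function == "annotate" and s.args["cell_type"] not in enriched
    ]


def respond(query, history):
    """Response generator: compile results and record both histories."""
    results, error = execute_and_evaluate(planner(query))
    if error is None:
        extra, error = execute_and_evaluate(critic(results))
        results += extra
    outputs = [out for _, out in results]
    response = f"Error: {error}" if error else "; ".join(outputs)
    history["functions"].extend(outputs)
    history["conversation"].append((query, response))
    return response


history = {"functions": [], "conversation": []}
print(respond("Annotate the T cells", history))
```

In this sketch the critic contributes the enrichment step that the initial plan omitted, and an exception in any tool short-circuits the remaining steps so that only the error message reaches the response.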

scChat relies heavily on retrieval-augmented generation (RAG) to perform its functions. The cell type RAG and the pathway RAG, used for cell type annotation and enrichment analysis respectively, are explained below.

The cell type knowledge graph is a hierarchy that links “System,” “Organ,” and “CellType” nodes through functional edges, enabling marker retrieval and annotation. Cell lineage is traced with “develops to” edges, and the cell type RAG is derived from CellMarker. The retrieved information is then used for matching during cell type annotation.

CellMarker: a manually curated resource of cell markers in human and mouse
Published in Nucleic Acids Research, 2018
DOI: 10.1093/nar/gky900
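
To make the retrieve-then-match idea concrete, here is a small in-memory sketch. It does not query Neo4j and does not reproduce scChat's matching logic: the `CELL_GRAPH` dictionary, both function names, and the simple marker-overlap score are assumptions chosen for illustration, and the marker lists are only a few commonly cited examples rather than the contents of the scChat database.

```python
# Toy stand-in for the graph: (system, organ) -> cell type -> marker genes.
CELL_GRAPH = {
    ("Lymphatic System", "Peripheral blood"): {
        "T cell": {"CD3D", "CD3E", "CD2"},
        "B cell": {"CD79A", "MS4A1", "CD19"},
        "Monocyte": {"CD14", "LYZ", "FCGR3A"},
    }
}


def retrieve_markers(system, organ):
    """Return the cell types and markers stored under one system/organ pair."""
    return CELL_GRAPH.get((system, organ), {})


def annotate_cluster(top_genes, candidates):
    """Pick the candidate whose markers overlap most with a cluster's top genes."""
    genes = set(top_genes)
    scores = {name: len(markers & genes) for name, markers in candidates.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return "Unknown"
    return best


candidates = retrieve_markers("Lymphatic System", "Peripheral blood")
print(annotate_cluster(["CD3D", "CD3E", "IL7R"], candidates))  # prints "T cell"
```

Restricting the candidates to the system and organ given in the specification keeps the match from considering cell types that could not plausibly appear in the sample.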

The pathway knowledge graph has “Database,” “GeneSetLibrary,” and “Pathway” nodes. Databases include GO, KEGG, Reactome, and GSEA, with the first three directly connected to pathways through “found in” edges. It helps scChat determine which method and gene set library to use for enrichment analysis.
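
One way to picture how such a graph could guide the choice of database is the sketch below. It is an assumption-laden illustration, not scChat's selection logic: the `FOUND_IN` table, the `choose_database` function, the preference order, and the fallback to GSEA when no direct “found in” edge exists are all hypothetical.

```python
# Toy "found in" edges: pathway name -> databases that contain it directly.
FOUND_IN = {
    "Apoptosis": ["KEGG", "Reactome"],
    "T cell receptor signaling pathway": ["KEGG"],
    "immune response": ["GO"],
}


def choose_database(pathway, preferred=("GO", "KEGG", "Reactome")):
    """Return the first preferred database with a direct edge to the pathway."""
    lookup = {name.lower(): dbs for name, dbs in FOUND_IN.items()}
    matches = lookup.get(pathway.lower(), [])
    for database in preferred:
        if database in matches:
            return database
    # Assumed fallback: no direct edge, so defer to GSEA gene set libraries.
    return "GSEA"


print(choose_database("apoptosis"))          # prints "KEGG"
print(choose_database("Unlisted pathway"))   # prints "GSEA"
```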

Tutorial

To set up the project environment and run the server, follow these steps:

Step 1: Install the required dependencies:

pip3 install -r requirements.txt

Step 2: Set the OPENAI Key Environment Variable

Enter the following command in your terminal:

export OPENAI_API_KEY='your_openai_api_key'
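
Before starting the server, it can help to confirm that the variable is visible to Python. The helper below is a hypothetical convenience, not part of scChat; the function name and the check against the placeholder value are assumptions.

```python
import os


def get_openai_key(env=None):
    """Return the OpenAI key from the environment, or raise if it is unusable."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your_openai_api_key":
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in this shell before starting the server."
        )
    return key


# Example with an explicit mapping, so the demo does not depend on your shell:
print(len(get_openai_key({"OPENAI_API_KEY": "sk-example"})))  # prints 10
```

Calling `get_openai_key()` with no argument reads the real environment. Note that `export` only affects the current shell session, so the server must be started from the same terminal.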

Step 3: Download Neo4j Desktop 2

Download Neo4j Desktop 2 (https://neo4j.com/download/)
Download required dump files (https://drive.google.com/drive/folders/17UCKv95G3tFqeyce1oQAo3ss2vS7uZQE)
Create a new instance in Neo4j Desktop (this step asks you to set a password)
Import the dump files as new databases in the created instance.
Start the database

Step 4: Upload and update files

Upload the scRNA-seq AnnData file (.h5ad)
Upload the pathway vector-based model (.pkl and .faiss), available at: https://drive.google.com/drive/u/4/folders/1OklM2u5T5FsjiUvvYRYyWxrssQIb84Ky
Update specification_graph.json with your Neo4j username and password and with the system and organ relevant to the database you are using, following the format shown in Step 5
Update sample_mapping.json with the "Sample name" column of your AnnData file, which can be found in adata.obs, and write a description for each condition.

Step 5: Build the specification_graph.json and sample_mapping.json for RAG specifications

Build specification_graph.json with your Neo4j username, password, database (human or mouse), and the system and organ relevant to the file you are going to test, using the following format:

{
    "url": "put your url here", 
    "username": "put your username here",
    "password": "put your password here",
    "database": "make sure the database name is correct",
    "pathway_rag": "make sure the pathway rag name is correct",
    "sources": [
        {
            "system": "Lymphatic System",
            "organ": "Peripheral blood"
        }
    ]
}

You can also pass multiple system and organ pairs to the RAG. For example:

{
    "url": "put your url here", 
    "username": "put your username here",
    "password": "put your password here",
    "database": "make sure the database name is correct",
    "pathway_rag": "make sure the pathway rag name is correct",
    "sources": [
        {
            "system": "Lymphatic System",
            "organ": "Peripheral blood"
        },
        {
            "system": "Nervous System",
            "organ": "Brain"
        }
    ]
}
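
A quick structural check can catch a missing field before the server tries to connect. The sketch below only uses the keys that appear in the examples above; `check_specification` and `REQUIRED_KEYS` are hypothetical names, and scChat itself may validate the file differently or not at all.

```python
import json

# Keys taken from the example specification_graph.json shown above.
REQUIRED_KEYS = ("url", "username", "password", "database", "pathway_rag", "sources")


def check_specification(text):
    """Return a list of problems found in a specification_graph.json string."""
    spec = json.loads(text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in spec]
    for index, source in enumerate(spec.get("sources", [])):
        for key in ("system", "organ"):
            if not source.get(key):
                problems.append(f"sources[{index}] lacks {key}")
    return problems


EXAMPLE = """
{
    "url": "put your url here",
    "username": "put your username here",
    "password": "put your password here",
    "database": "make sure the database name is correct",
    "pathway_rag": "make sure the pathway rag name is correct",
    "sources": [{"system": "Lymphatic System", "organ": "Peripheral blood"}]
}
"""
print(check_specification(EXAMPLE))  # prints []
```

To check a real file, read it with `open("specification_graph.json").read()` and pass the text to `check_specification`.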

Build sample_mapping.json with the "Sample name" column of your AnnData file, which can be found in adata.obs, and write a description for each condition. For example:

{
  "Sample name": "Sample",
  "Sample categories": {
    "0": "p1_pre",
    "1": "p1_post",
    "2": "p6_pre",
    "3": "p6_post",
    "4": "p7_pre",
    "5": "p7_post"
  },
  "Sample description": {
    "p1_pre": "Pre-treatment sample from patient 1",
    "p1_post": "Post-treatment sample from patient 1",
    "p6_pre": "Pre-treatment sample from patient 6",
    "p6_post": "Post-treatment sample from patient 6",
    "p7_pre": "Pre-treatment sample from patient 7",
    "p7_post": "Post-treatment sample from patient 7"
  }
}
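
Because the categories and the descriptions are entered separately, it is easy for them to drift apart. The following sketch checks that they agree; `check_sample_mapping` is a hypothetical helper, and the optional `obs_columns` argument assumes you pass in the column names of adata.obs yourself.

```python
def check_sample_mapping(mapping, obs_columns=None):
    """Return a list of inconsistencies in a parsed sample_mapping.json."""
    problems = []
    categories = set(mapping.get("Sample categories", {}).values())
    described = set(mapping.get("Sample description", {}))
    for name in sorted(categories - described):
        problems.append(f"no description for {name}")
    for name in sorted(described - categories):
        problems.append(f"description for unknown sample {name}")
    column = mapping.get("Sample name")
    if obs_columns is not None and column not in obs_columns:
        problems.append(f"column {column!r} not found in adata.obs")
    return problems


mapping = {
    "Sample name": "Sample",
    "Sample categories": {"0": "p1_pre", "1": "p1_post"},
    "Sample description": {"p1_pre": "Pre-treatment sample from patient 1"},
}
print(check_sample_mapping(mapping))  # prints ['no description for p1_post']
```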

Notably, the available systems, organs and tissues are listed in available_cell_RAG.json.

Step 6: Initialize the Application

The first time you run scChat after installing it, apply the database migrations:

python3 manage.py migrate

Then start the server:

python3 manage.py runserver

Step 7: Access the Application

Open your web browser and navigate to: http://127.0.0.1:8000/schatbot
Recommended: Use Google Chrome for the best experience.

Additional Tip: Clear Cache to Avoid Previous Chat Data

Periodically clearing the cache is recommended to ensure a smooth experience:
1. Right-click on the page and select Inspect.
2. Go to the Application tab.
3. Under Cookies, remove sessionid.
4. You may have to run python manage.py migrate in some cases before Step 4.
This will prevent previous chat sessions from being reprocessed.

Datasets

The datasets used for testing and examples for sample_mapping.json and specification_graph.json can be found at https://docs.google.com/spreadsheets/d/1NwN5GydHn0B3-W0DLcAfvnNtZVJEMUgBW9YyzXnS83A/edit?usp=sharing
