10 changes: 8 additions & 2 deletions README.md
@@ -1,4 +1,3 @@

<img src="static/logo_reagent.png" width="200">

# ReAgentAI
@@ -15,11 +14,14 @@ ReAgentAI is an advanced chemical assistant powered by AI that provides comprehe
- **Molecular Visualization**: Create high-quality images of chemical structures and reaction pathways
- **Similarity Search**: Find structurally similar molecules using molecular fingerprints and Tanimoto similarity
- **SMILES Validation**: Verify and validate SMILES strings for chemical accuracy
- **Chemical Information Retrieval**: Access comprehensive chemical data through PubChem integration
- **Chemical Name/SMILES Conversion**: Convert between chemical names and SMILES using the authoritative PubChem database
- **Chemical Knowledge**: Access comprehensive chemistry information through integrated web search
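
To make the similarity-search bullet concrete, here is a minimal sketch of how a Tanimoto score ranks candidates. It assumes fingerprints are represented as plain sets of on-bit indices; the names `tanimoto`, `query`, and `candidates` are hypothetical and are not taken from the ReAgentAI code, which uses RDKit fingerprints rather than Python sets.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        # Assumption for this sketch: two empty fingerprints score 0.0.
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints, chosen only to illustrate the ranking.
query = {1, 4, 7, 9}
candidates = {
    "mol_a": {1, 4, 7, 9},  # identical bits
    "mol_b": {1, 4, 8},     # two shared bits out of five distinct bits
    "mol_c": {2, 3},        # no shared bits
}

# Rank candidates from most to least similar, as a similarity search would.
ranked = sorted(
    ((name, tanimoto(query, fp)) for name, fp in candidates.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
```

A score of 1 means the two bit sets are identical and 0 means they share no bits, which matches the 0-1 range the agent instructions describe.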

### Datasets & Models
### Integrated Databases & Models
- **USPTO-trained models**: Leveraging one of the largest chemical reaction databases
- **ZINC stock collection**: Access to commercially available compounds
- **PubChem database**: Integration with the NIH's comprehensive chemical database
- **Curated molecular datasets**: ~16,000 drug-like molecules for similarity searches

## 🛠 Setup
@@ -85,6 +87,9 @@ ReAgentAI supports various chemistry-related queries:
- **Molecular similarity**: "Find molecules similar to ethanol" or "What compounds are structurally related to benzene?"
- **Structure visualization**: "Show me the structure of morphine" or "Generate an image of the synthesis route"
- **Chemical validation**: "Is this SMILES string valid: CCO?"
- **Chemical information**: "What is the IUPAC name and molecular formula of Paracetamol?"
- **Name to SMILES**: "What is the SMILES string for Gabapentin?"
- **SMILES to name**: "What chemical name corresponds to this SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C?"
- **General chemistry**: "What are the properties of acetaminophen?"

## 🔧 Architecture
@@ -93,6 +98,7 @@ ReAgentAI is built with:
- **Pydantic AI**: Robust AI agent framework
- **RDKit**: Chemical informatics and molecular manipulation
- **AiZynthFinder**: Retrosynthetic analysis engine
- **PubChemPy**: Interface for accessing the PubChem database
- **Google Gemini**: Large language model for natural language processing
- **Gradio**: User-friendly web interface

1 change: 1 addition & 0 deletions pyproject.toml
@@ -10,6 +10,7 @@ dependencies = [
    "python-dotenv>=1.1.0",
    "gradio>=5.29.1",
    "pydantic-ai-slim[duckduckgo]>=0.2.4",
    "pubchempy>=1.0.4",
]

[tool.black]
16 changes: 15 additions & 1 deletion src/reagentai/agents/main/instructions.txt
@@ -1,4 +1,3 @@

You are ReAgentAI, an advanced and highly precise chemical assistant. Your primary function is to answer chemistry-related questions, perform retrosynthesis, and visualize chemical structures and reaction pathways. Your core principle is to always use your available tools to provide accurate, reliable, and thoroughly grounded information.

**Core Responsibilities:**
@@ -65,6 +64,21 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima
        * `target_smiles_list` (optional): A list of SMILES strings to search against. If not provided, defaults to a curated dataset of ~16,000 drug-like molecules commonly used in chemical informatics.
        * `top_n` (optional): The number of most similar molecules to return (defaults to 5).
    * **Output Interpretation:** Returns a list of tuples containing SMILES strings and their Tanimoto similarity scores (0-1 range, where 1 indicates identical molecules and 0 indicates completely dissimilar). Present the results clearly, explaining that higher scores indicate greater structural similarity.
* **`get_smiles_from_name`:**
    * **Purpose:** Retrieve the canonical SMILES string for a chemical compound using its common or IUPAC name (via the PubChem database).
    * **Usage:** Use this tool when a user provides a chemical name and you need its SMILES. If the input is already a valid SMILES, it will be returned as is.
    * **Input:** `compound_name` (string)
    * **Output Interpretation:** Returns the canonical SMILES string from PubChem, or the input if it is already a valid SMILES.
* **`get_compound_info`:**
    * **Purpose:** Retrieve detailed chemical information from PubChem, including SMILES, molecular formula, molecular weight, IUPAC name, and synonyms.
    * **Usage:** Use this tool when the user asks for detailed information about a compound by name.
    * **Input:** `compound_name` (string)
    * **Output Interpretation:** Returns a dictionary with the keys 'smiles', 'molecular_formula', 'molecular_weight', 'iupac_name', 'cid', and 'synonyms'.
* **`get_name_from_smiles`:**
    * **Purpose:** Find the best-matching chemical name (IUPAC or synonym) for a given SMILES string using PubChem.
    * **Usage:** Use this tool when you have a SMILES and need to present a human-readable name for it.
    * **Input:** `smiles` (string)
    * **Output Interpretation:** Returns the IUPAC name if available, otherwise the first synonym from PubChem.
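
The fallback rule described for `get_name_from_smiles` (IUPAC name first, otherwise the first synonym) can be sketched in isolation. This is only an illustration: `pick_display_name` is a hypothetical helper, and the strings in the usage lines are example values rather than output captured from PubChem.

```python
from typing import Optional, Sequence


def pick_display_name(iupac_name: Optional[str], synonyms: Sequence[str]) -> Optional[str]:
    """Prefer the IUPAC name; fall back to the first synonym; None if neither exists."""
    if iupac_name:
        return iupac_name
    if synonyms:
        return synonyms[0]
    return None


# Example values for illustration only.
with_iupac = pick_display_name("2-acetyloxybenzoic acid", ["aspirin"])
without_iupac = pick_display_name(None, ["aspirin", "acetylsalicylic acid"])
nothing_known = pick_display_name("", [])
```

In the tool itself the final `None` case is not returned to the model; it is reported as a retryable error instead.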

Your responses should always be professional, clear, and reflect your expert chemical knowledge, meticulously supported by your tool usage.

8 changes: 8 additions & 0 deletions src/reagentai/agents/main/main_agent.py
@@ -12,6 +12,11 @@
from src.reagentai.common.aizynthfinder import initialize_aizynthfinder
from src.reagentai.constants import AIZYNTHFINDER_CONFIG_PATH
from src.reagentai.tools.image import route_to_image, smiles_to_image
from src.reagentai.tools.pubchem import (
    get_compound_info,
    get_name_from_smiles,
    get_smiles_from_name,
)
from src.reagentai.tools.retrosynthesis import perform_retrosynthesis
from src.reagentai.tools.smiles import find_similar_molecules, is_valid_smiles

@@ -199,6 +204,9 @@ def create_main_agent() -> MainAgent:
Tool(smiles_to_image),
Tool(route_to_image),
Tool(find_similar_molecules),
Tool(get_smiles_from_name),
Tool(get_compound_info),
Tool(get_name_from_smiles),
duckduckgo_search_tool(),
]
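
The new tools are plain functions wrapped in `Tool(...)`, so their names, type hints, and docstrings are what the model gets to see. The sketch below uses only the standard library to show the kind of metadata that can be read off such a function. `describe_tool` is a hypothetical helper written for this illustration and is not how Pydantic AI builds its tool schemas internally.

```python
import inspect
from typing import Any, Callable


def describe_tool(func: Callable[..., Any]) -> dict[str, Any]:
    """Hypothetical helper: summarize a function's name, first docstring line, and parameters."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {
            # Use the annotation's class name when it has one, else its string form.
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


def get_smiles_from_name(compound_name: str) -> str:
    """Retrieve the SMILES string for a compound name."""
    raise NotImplementedError  # stand-in body; only the signature and docstring matter here


summary = describe_tool(get_smiles_from_name)
```

This is one reason the functions in `pubchem.py` carry precise type hints and a clear first docstring line.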

3 changes: 3 additions & 0 deletions src/reagentai/constants.py
@@ -11,6 +11,9 @@
    "How to synthesize Aspirin? Can u tell me the best steps to achieve this?",
    "Suggest a retrosynthesis for Ibuprofen. Show all molecule images from the best route.",
    "Find molecules similar to Aspirin. Show the top 5.",
    "What is the IUPAC name and molecular formula of Paracetamol?",
    "Convert this SMILES to a chemical name: CC(=O)OC1=CC=CC=C1C(=O)O",
    "Tell me the detailed properties of Gabapentin.",
]

DEFAULT_LOG_LEVEL: int = INFO
222 changes: 222 additions & 0 deletions src/reagentai/tools/pubchem.py
@@ -0,0 +1,222 @@
import logging

import pubchempy as pcp
from pydantic_ai.exceptions import ModelRetry

from src.reagentai.tools.smiles import is_valid_smiles

logger = logging.getLogger(__name__)


def get_smiles_from_name(compound_name: str) -> str:
    """
    Retrieve the SMILES string for a chemical compound using its common name via PubChem.

    This function searches the PubChem database to find the canonical SMILES representation
    of a chemical compound based on its common name, IUPAC name, or other identifiers.
    PubChem is a comprehensive chemical database maintained by the NIH that contains
    millions of chemical structures and their properties.

    Args:
        compound_name (str): The name of the chemical compound to search for.
            This can be a common name (e.g., "aspirin", "caffeine"),
            IUPAC name, trade name, or other chemical identifier.

    Returns:
        str: The canonical SMILES string of the compound as found in PubChem.
            If the input is already a valid SMILES string, it is returned unchanged.

    Raises:
        ModelRetry: If the compound name is empty or not found in PubChem, if no valid
            SMILES string could be retrieved, or if there's a network issue.

    Example:
        >>> smiles = get_smiles_from_name("aspirin")
        >>> print(smiles)
        CC(=O)OC1=CC=CC=C1C(=O)O

        >>> smiles = get_smiles_from_name("caffeine")
        >>> print(smiles)
        CN1C=NC2=C1C(=O)N(C(=O)N2C)C
    """
    logger.info(f"[TASK] [GET_SMILES_FROM_NAME] Arguments: compound_name: {compound_name}")

    if not compound_name or not compound_name.strip():
        logger.error("Empty or invalid compound name provided")
        raise ModelRetry("Compound name cannot be empty")

    compound_name = compound_name.strip()

    # Check if the input is already a valid SMILES string
    if is_valid_smiles(compound_name):
        logger.info(f"Input appears to be a valid SMILES, returning as is: {compound_name}")
        return compound_name

    try:
        # Search for the compound by name
        compounds = pcp.get_compounds(compound_name, "name")

        if not compounds:
            logger.warning(f"No compounds found for name: {compound_name}")
            raise ModelRetry(f"No compound found in PubChem for name: '{compound_name}'")

        # Get the first (most relevant) compound
        compound = compounds[0]

        # Retrieve the canonical SMILES
        smiles = compound.canonical_smiles

        if not smiles:
            logger.error(f"No SMILES found for compound: {compound_name}")
            raise ModelRetry(f"No SMILES string available for compound: '{compound_name}'")

        logger.info(f"Successfully retrieved SMILES for {compound_name}: {smiles}")
        logger.debug(f"PubChem CID: {compound.cid}")

        return smiles

    except Exception as e:
        if isinstance(e, ModelRetry):
            raise  # Re-raise ModelRetry as-is

        logger.error(f"Error retrieving SMILES for {compound_name}: {str(e)}")

        # For all exceptions, wrap in ModelRetry
        error_msg = f"Failed to retrieve SMILES for '{compound_name}': {str(e)}"
        if "connection" in str(e).lower() or "network" in str(e).lower():
            error_msg = f"Failed to connect to PubChem: {str(e)}"

        raise ModelRetry(error_msg) from e


def get_compound_info(compound_name: str) -> dict[str, str | list | None]:
    """
    Retrieve comprehensive information about a chemical compound from PubChem.

    This function provides additional chemical information beyond just the SMILES string,
    including molecular formula, molecular weight, IUPAC name, and other identifiers.

    Args:
        compound_name (str): The name of the chemical compound to search for.

    Returns:
        dict[str, str | list | None]: A dictionary containing compound information with keys:
            - 'smiles': Canonical SMILES string
            - 'molecular_formula': Molecular formula
            - 'molecular_weight': Molecular weight in g/mol, as a string
            - 'iupac_name': IUPAC systematic name
            - 'cid': PubChem Compound ID, as a string
            - 'synonyms': List of alternative names (first 5)

    Raises:
        ModelRetry: If the compound name is empty or not found in PubChem, or if there's
            a network issue.

    Example:
        >>> info = get_compound_info("aspirin")
        >>> print(info['smiles'])
        CC(=O)OC1=CC=CC=C1C(=O)O
        >>> print(info['molecular_formula'])
        C9H8O4
    """
    logger.info(f"[TASK] [GET_COMPOUND_INFO] Arguments: compound_name: {compound_name}")

    if not compound_name or not compound_name.strip():
        logger.error("Empty or invalid compound name provided")
        raise ModelRetry("Compound name cannot be empty")

    compound_name = compound_name.strip()

    try:
        # Search for the compound by name
        compounds = pcp.get_compounds(compound_name, "name")

        if not compounds:
            logger.warning(f"No compounds found for name: {compound_name}")
            raise ModelRetry(f"No compound found in PubChem for name: '{compound_name}'")

        # Get the first (most relevant) compound
        compound = compounds[0]

        # Extract comprehensive information
        info = {
            "smiles": getattr(compound, "canonical_smiles", None),
            "molecular_formula": getattr(compound, "molecular_formula", None),
            "molecular_weight": str(getattr(compound, "molecular_weight", None))
            if hasattr(compound, "molecular_weight")
            else None,
            "iupac_name": getattr(compound, "iupac_name", None),
            "cid": str(getattr(compound, "cid", None)) if hasattr(compound, "cid") else None,
            "synonyms": getattr(compound, "synonyms", [])[:5]
            if hasattr(compound, "synonyms")
            else [],
        }

        logger.info(f"Successfully retrieved compound info for {compound_name}")
        logger.debug(f"Compound info: {info}")

        return info

    except Exception as e:
        if isinstance(e, ModelRetry):
            raise  # Re-raise ModelRetry as-is

        logger.error(f"Error retrieving compound info for {compound_name}: {str(e)}")

        # For all exceptions, wrap in ModelRetry
        error_msg = f"Failed to retrieve compound info for '{compound_name}': {str(e)}"
        if "connection" in str(e).lower() or "network" in str(e).lower():
            error_msg = f"Failed to connect to PubChem: {str(e)}"

        raise ModelRetry(error_msg) from e
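
As a side note on the dictionary assembled in `get_compound_info`: `getattr` with a default already covers a missing attribute, so the extra `hasattr` checks are not strictly needed. The sketch below shows one possible simplification under stated assumptions. `build_info` is a hypothetical helper, the `SimpleNamespace` object is a stand-in for a PubChemPy compound, and, unlike the function above, this variant returns `None` rather than the string `"None"` when an attribute is present but holds `None`.

```python
from types import SimpleNamespace
from typing import Any


def build_info(compound: Any, max_synonyms: int = 5) -> dict[str, Any]:
    """Hypothetical helper: collect compound fields, tolerating missing or None attributes."""
    weight = getattr(compound, "molecular_weight", None)
    cid = getattr(compound, "cid", None)
    return {
        "smiles": getattr(compound, "canonical_smiles", None),
        "molecular_formula": getattr(compound, "molecular_formula", None),
        # Convert only real values, so a missing weight or CID stays None.
        "molecular_weight": None if weight is None else str(weight),
        "iupac_name": getattr(compound, "iupac_name", None),
        "cid": None if cid is None else str(cid),
        "synonyms": list(getattr(compound, "synonyms", None) or [])[:max_synonyms],
    }


# Stand-in object for illustration; it deliberately has no iupac_name attribute.
fake_compound = SimpleNamespace(
    canonical_smiles="CCO",
    molecular_formula="C2H6O",
    molecular_weight=46.07,
    cid=702,
    synonyms=["ethanol", "ethyl alcohol"],
)
info = build_info(fake_compound)
```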


def get_name_from_smiles(smiles: str) -> str:
    """
    Retrieve the best-matching chemical name for a given SMILES string using PubChem.

    Args:
        smiles (str): The SMILES string of the compound.

    Returns:
        str: The best-matching chemical name (IUPAC or synonym) from PubChem.

    Raises:
        ModelRetry: If no compound is found for the SMILES, if no name is available,
            or if there's a network issue connecting to PubChem servers.
    """
    logger.info(f"[TASK] [GET_NAME_FROM_SMILES] Arguments: smiles: {smiles}")

    if not smiles or not smiles.strip():
        logger.error("Empty or invalid SMILES provided")
        raise ModelRetry("SMILES string cannot be empty")

    smiles = smiles.strip()

    try:
        compounds = pcp.get_compounds(smiles, "smiles")
        if not compounds:
            logger.warning(f"No compounds found for SMILES: {smiles}")
            raise ModelRetry(f"No compound found in PubChem for SMILES: '{smiles}'")
        compound = compounds[0]
        # Prefer IUPAC name, fall back to first synonym
        name = getattr(compound, "iupac_name", None)
        if not name:
            synonyms = getattr(compound, "synonyms", [])
            if synonyms:
                name = synonyms[0]
        if not name:
            logger.error(f"No name found for SMILES: {smiles}")
            raise ModelRetry(f"No name available for SMILES: '{smiles}'")
        logger.info(f"Successfully retrieved name for SMILES {smiles}: {name}")
        return name
    except Exception as e:
        if isinstance(e, ModelRetry):
            raise  # Re-raise ModelRetry as-is

        logger.error(f"Error retrieving name for SMILES {smiles}: {str(e)}")

        # For all exceptions, wrap in ModelRetry
        error_msg = f"Failed to retrieve name for SMILES '{smiles}': {str(e)}"
        if "connection" in str(e).lower() or "network" in str(e).lower():
            error_msg = f"Failed to connect to PubChem: {str(e)}"

        raise ModelRetry(error_msg) from e
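
All three functions in `pubchem.py` end with the same error-handling shape: retry errors pass through untouched, and everything else is wrapped in a retry error with a message that singles out connection problems. A minimal sketch of that pattern, factored into one helper, might look like the following. `RetryableToolError` is a hypothetical stand-in for `ModelRetry` so that the example runs without Pydantic AI, and `wrap_lookup_errors` is not part of the PR.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableToolError(Exception):
    """Hypothetical stand-in for pydantic_ai.exceptions.ModelRetry."""


def wrap_lookup_errors(action: str, lookup: Callable[[], T]) -> T:
    """Run a lookup; let retry errors through and wrap anything else in a retry error."""
    try:
        return lookup()
    except RetryableToolError:
        raise  # already in the form the agent understands
    except Exception as exc:
        text = str(exc)
        if "connection" in text.lower() or "network" in text.lower():
            message = f"Failed to connect to PubChem: {text}"
        else:
            message = f"Failed to {action}: {text}"
        raise RetryableToolError(message) from exc


def failing_lookup() -> str:
    # Simulated network failure for the demonstration.
    raise OSError("Connection refused")


try:
    wrap_lookup_errors("retrieve SMILES for 'aspirin'", failing_lookup)
except RetryableToolError as err:
    captured = str(err)
```

Using separate `except` clauses expresses the "re-raise as-is" case without the `isinstance` check inside a broad handler, but the observable behavior is intended to match the code in the diff.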
14 changes: 7 additions & 7 deletions src/reagentai/ui/app.py
@@ -1,7 +1,7 @@
import functools

import gradio as gr
from pydantic_ai import UnexpectedModelBehavior, UsageLimitExceeded, ModelHTTPError
from pydantic_ai import ModelHTTPError, UnexpectedModelBehavior, UsageLimitExceeded
from pydantic_ai.messages import ToolCallPart, ToolReturnPart

from src.reagentai.agents.main.main_agent import MainAgent
@@ -37,6 +37,12 @@ def create_settings_panel(
visible=True,
)

gr.Examples(
examples=EXAMPLE_PROMPTS,
inputs=chat_input_component,
label="Example Prompts",
)

gr.Markdown("### Tool Usage History")
tool_display = gr.Chatbot(
type="messages",
@@ -45,12 +51,6 @@ def create_settings_panel(
elem_id="tool_display",
)

gr.Examples(
examples=EXAMPLE_PROMPTS,
inputs=chat_input_component,
label="Example Prompts",
)

return model_dropdown, usage_counter, tool_display


8 changes: 8 additions & 0 deletions uv.lock
