From 0c17f53042f6c494085edc51b643c22d8c97578d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Sun, 8 Jun 2025 23:46:07 +0200 Subject: [PATCH 01/11] [feat] pubchempy name_to_smiles --- pyproject.toml | 1 + src/reagentai/agents/main/instructions.txt | 24 +++++--- src/reagentai/agents/main/main_agent.py | 2 + src/reagentai/tools/pubchem.py | 70 ++++++++++++++++++++++ uv.lock | 8 +++ 5 files changed, 96 insertions(+), 9 deletions(-) create mode 100644 src/reagentai/tools/pubchem.py diff --git a/pyproject.toml b/pyproject.toml index 0db98b0..64a6b14 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -10,6 +10,7 @@ dependencies = [ "python-dotenv>=1.1.0", "gradio>=5.29.1", "pydantic-ai-slim[duckduckgo]>=0.2.4", + "pubchempy>=1.0.4", ] [tool.black] diff --git a/src/reagentai/agents/main/instructions.txt b/src/reagentai/agents/main/instructions.txt index 69f381d..31385ad 100644 --- a/src/reagentai/agents/main/instructions.txt +++ b/src/reagentai/agents/main/instructions.txt @@ -1,4 +1,3 @@ - You are ReAgentAI, an advanced and highly precise chemical assistant. Your primary function is to answer chemistry-related questions, perform retrosynthesis, and visualize chemical structures and reaction pathways. Your core principle is to always use your available tools to provide accurate, reliable, and thoroughly grounded information. **Core Responsibilities:** @@ -6,24 +5,25 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima 2. Execute retrosynthesis calculations for chemical compounds. 3. Visualize chemical compounds and reaction routes. 4. Find structurally similar molecules based on molecular fingerprints. -5. Ensure all information is rigorously validated and sourced. +5. Convert chemical names to SMILES strings using authoritative databases. +6. Ensure all information is rigorously validated and sourced. **Operating Principles & Constraints:** 1. 
**Tool-First Execution:** Your default mode of operation is to identify the most relevant tool(s) for a given request and execute them. Do not attempt to generate information or perform tasks without first trying to use a tool if one is applicable. 2. **Absolute Grounding & Anti-Hallucination:** * **NEVER** invent chemical names, SMILES strings, properties, facts, reaction pathways, or any other information. - * All factual statements MUST be directly supported by the output of your tools, especially `duckduckgo_search` and `perform_retrosynth`. + * All factual statements MUST be directly supported by the output of your tools, especially `duckduckgo_search`, `name_to_smiles`, and `perform_retrosynth`. * If a request cannot be fulfilled or validated by your tools, you must clearly state that you cannot find the information or perform the action and explain why (e.g., "I could not find a valid SMILES for that compound," or "Retrosynthesis for this compound did not yield any routes"). 3. **SMILES as Primary Input:** * Many of your specialized tools (`perform_retrosynth`, `smiles_to_image`, `find_similar_molecules`) require chemical compounds to be provided in SMILES (Simplified Molecular Input Line Entry System) format. - * **Conversion Rule:** If a user provides a customary (common) name for a compound (e.g., "aspirin", "caffeine", "ethanol"), your immediate first step is to use `duckduckgo_search` to find its corresponding SMILES string. Only once you have obtained a valid SMILES, proceed with the original request. + * **Conversion Rule:** If a user provides a customary (common) name for a compound (e.g., "aspirin", "caffeine", "ethanol"), your immediate first step is to use `name_to_smiles` to find its corresponding SMILES string from PubChem. If `name_to_smiles` fails, use `duckduckgo_search` as a fallback. Only once you have obtained a valid SMILES, proceed with the original request. 
* **Validation:** Implicitly validate SMILES strings by attempting to use them with your tools. If a tool fails to process a given SMILES, inform the user about the potential invalidity. 4. **Step-by-Step Problem Solving (Chain-of-Thought):** * For any request, mentally outline the steps needed: 1. **Understand:** What exactly is the user asking for? 2. **Identify Inputs:** Does the request involve a chemical name, SMILES, or a general query? - 3. **Plan Tools:** Which tool(s) are necessary? What order should they be used in? (e.g., Name -> `duckduckgo_search` -> SMILES -> `perform_retrosynth` -> `route_to_image`). + 3. **Plan Tools:** Which tool(s) are necessary? What order should they be used in? (e.g., Name -> `name_to_smiles` -> SMILES -> `perform_retrosynth` -> `route_to_image`). 4. **Execute Tools:** Run the chosen tool(s). 5. **Process Output:** Interpret the tool's output. 6. **Formulate Response:** Present the information clearly, concisely, and directly answering the user's query, ensuring all claims are tool-supported. @@ -32,6 +32,7 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima * When presenting `perform_retrosynth` results, clearly list the found routes and the SMILES strings of the compounds involved in each step. * When an image tool is used (`smiles_to_image` or `route_to_image`), state that an image has been successfully generated and describe what it depicts. * When presenting `find_similar_molecules` results, list the similar molecules with their SMILES strings and similarity scores, explaining what the Tanimoto similarity coefficient represents. + * When presenting `name_to_smiles` results, clearly state the SMILES string found and mention that it was retrieved from PubChem database. **Available Tools and Their Specific Usage Directives:** @@ -40,11 +41,16 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. 
Your prima * **Usage:** Only call this tool when the user explicitly requests "retrosynthesis," "how to make," or "synthesis pathway" for a compound. * **Input:** The `target_smiles` argument **must** be a valid SMILES string. Adhere to the "SMILES as Primary Input" rule if a common name is provided. * **Output Interpretation:** This tool returns information about potential routes, including the SMILES of intermediate and starting compounds. Structure your response to clearly present these routes and their associated SMILES. +* **`name_to_smiles`:** + * **Purpose:** To convert chemical names to canonical SMILES strings using the authoritative PubChem database maintained by NCBI. + * **Usage:** This is your **primary tool** for converting chemical names to SMILES. Use this whenever a user provides a common name, IUPAC name, or other chemical identifier that needs to be converted to SMILES format. This should be preferred over `duckduckgo_search` for name-to-SMILES conversion. + * **Input:** The `chemical_name` argument should be the name of the compound (e.g., "aspirin", "caffeine", "2-acetoxybenzoic acid"). + * **Output Interpretation:** Returns the canonical SMILES string as found in PubChem. If successful, use this SMILES for subsequent operations. If this tool fails, use `duckduckgo_search` as a fallback. * **`duckduckgo_search`:** - * **Purpose:** Your primary tool for general knowledge retrieval, fact-checking, validating chemical names/SMILES, and **crucially, converting customary chemical names to SMILES strings.** + * **Purpose:** Your tool for general knowledge retrieval, fact-checking, validating chemical names/SMILES, and as a **fallback** for converting customary chemical names to SMILES strings when `name_to_smiles` fails. * **Usage:** - * Always use this when a user provides a common chemical name and you need its SMILES. - * Use for any factual query about chemistry that isn't directly addressed by retrosynthesis. 
+ * Use as a fallback when `name_to_smiles` fails to find a compound. + * Use for any factual query about chemistry that isn't directly addressed by other specialized tools. * Use to confirm the existence or properties of a chemical or reaction. * **Output Interpretation:** Carefully read and extract the most relevant information from the search results to answer the user's query or obtain the required SMILES. * **`smiles_to_image`:** @@ -64,7 +70,7 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima * **`find_similar_molecules`:** * **Purpose:** To find molecules structurally similar to a query molecule based on Tanimoto similarity of Morgan fingerprints (ECFP4-like circular fingerprints). * **Usage:** Call this tool when the user asks for "similar molecules," "structural analogs," "compounds like," or requests to find molecules with similar structures. Also useful when exploring chemical space around a particular compound or when looking for potential drug analogs. - * **Input:** + * **Input:** * `query_smiles` (required): **must** be a valid SMILES string. Adhere to the "SMILES as Primary Input" rule if a common name is provided. * `target_smiles_list` (optional): A list of SMILES strings to search against. If not provided, defaults to a curated dataset of ~16,000 drug-like molecules commonly used in chemical informatics. * `top_n` (optional): The number of most similar molecules to return (defaults to 5). 
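Reviewer note (not part of the patch): the Conversion Rule added to instructions.txt above, which tries `name_to_smiles` first and falls back to `duckduckgo_search` on failure, can be sketched as a small helper. This is only an illustrative sketch. `resolve_smiles`, `stub_pubchem_lookup`, and `stub_search_lookup` are hypothetical names that do not exist in this repository, and the lookups are injected as plain callables so the example runs without network access. It assumes, as the `name_to_smiles` docstring states, that a failed lookup raises `ValueError`.

```python
from typing import Callable, Optional


def resolve_smiles(
    name: str,
    primary: Callable[[str], str],
    fallback: Optional[Callable[[str], str]] = None,
) -> tuple[str, str]:
    """Return (smiles, source), trying `primary` first and `fallback` second.

    Both callables are assumed to raise ValueError when no SMILES is found.
    """
    try:
        return primary(name), "primary"
    except ValueError:
        if fallback is None:
            raise  # no fallback configured: surface the primary failure
    try:
        return fallback(name), "fallback"
    except ValueError as fallback_error:
        raise ValueError(
            f"No SMILES found for '{name}' via primary or fallback lookup"
        ) from fallback_error


# Hypothetical stand-ins for the real tools, backed by tiny in-memory tables.
def stub_pubchem_lookup(name: str) -> str:
    table = {"aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O"}
    key = name.strip().lower()
    if key not in table:
        raise ValueError(f"No compound found for name: {name}")
    return table[key]


def stub_search_lookup(name: str) -> str:
    table = {"ethanol": "CCO"}
    key = name.strip().lower()
    if key not in table:
        raise ValueError(f"No search result for name: {name}")
    return table[key]
```

Returning the source alongside the SMILES lets the caller say whether the value came from the primary lookup or the fallback, which the presentation rules in instructions.txt ask the agent to state.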
diff --git a/src/reagentai/agents/main/main_agent.py b/src/reagentai/agents/main/main_agent.py index 76f7cea..bdbc454 100644 --- a/src/reagentai/agents/main/main_agent.py +++ b/src/reagentai/agents/main/main_agent.py @@ -10,6 +10,7 @@ from src.reagentai.common.aizynthfinder import initialize_aizynthfinder from src.reagentai.constants import AIZYNTHFINDER_CONFIG_PATH from src.reagentai.tools.image import route_to_image, smiles_to_image +from src.reagentai.tools.pubchem import name_to_smiles from src.reagentai.tools.retrosynthesis import perform_retrosynthesis from src.reagentai.tools.smiles import find_similar_molecules, is_valid_smiles @@ -157,6 +158,7 @@ def create_main_agent() -> MainAgent: Tool(smiles_to_image), Tool(route_to_image), Tool(find_similar_molecules), + Tool(name_to_smiles), duckduckgo_search_tool(), ] diff --git a/src/reagentai/tools/pubchem.py b/src/reagentai/tools/pubchem.py new file mode 100644 index 0000000..8da96aa --- /dev/null +++ b/src/reagentai/tools/pubchem.py @@ -0,0 +1,70 @@ +import logging + +import pubchempy as pcp + +logger = logging.getLogger(__name__) + + +def name_to_smiles(chemical_name: str) -> str: + """ + Finds the canonical SMILES string for a given chemical name using PubChem database. + + This function searches the PubChem database for a chemical compound by name and + returns its canonical SMILES representation. PubChem is a comprehensive database + maintained by the National Center for Biotechnology Information (NCBI) containing + millions of chemical structures and their associated data. + + Args: + chemical_name (str): The name of the chemical compound to search for. + Can be a common name (e.g., "aspirin", "caffeine"), + IUPAC name, or other chemical identifier. + + Returns: + str: The canonical SMILES string of the compound as found in PubChem. + + Raises: + ValueError: If no compound is found for the given name or if multiple + ambiguous results are returned without a clear match. 
+ + Example: + >>> smiles = name_to_smiles("aspirin") + >>> # Returns "CC(=O)OC1=CC=CC=C1C(=O)O" + >>> smiles = name_to_smiles("caffeine") + >>> # Returns "CN1C=NC2=C1C(=O)N(C(=O)N2C)C" + """ + logger.info(f"[TASK] [NAME_TO_SMILES] Arguments: chemical_name: {chemical_name}") + + if not chemical_name or not chemical_name.strip(): + logger.error("Empty or invalid chemical name provided") + raise ValueError("Chemical name cannot be empty") + + chemical_name = chemical_name.strip() + + try: + # Search for the compound by name + compounds = pcp.get_compounds(chemical_name, "name") + + if not compounds: + logger.warning(f"No compounds found for name: {chemical_name}") + raise ValueError(f"No compound found for name: {chemical_name}") + + # Get the first (most relevant) compound + compound = compounds[0] + + # Get canonical SMILES + smiles = compound.canonical_smiles + + if not smiles: + logger.error(f"No SMILES found for compound: {chemical_name}") + raise ValueError(f"No SMILES representation found for: {chemical_name}") + + logger.info(f"Found SMILES for '{chemical_name}': {smiles}") + logger.debug(f"PubChem CID: {compound.cid}") + + return smiles + + except Exception as e: + if isinstance(e, ValueError): + raise + logger.error(f"Error searching PubChem for '{chemical_name}': {str(e)}") + raise ValueError(f"Failed to retrieve SMILES for '{chemical_name}': {str(e)}") from e diff --git a/uv.lock b/uv.lock index 40fb04c..4d66919 100644 --- a/uv.lock +++ b/uv.lock @@ -2440,6 +2440,12 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/22/a6/858897256d0deac81a172289110f31629fc4cee19b6f01283303e18c8db3/ptyprocess-0.7.0-py2.py3-none-any.whl", hash = "sha256:4b41f3967fce3af57cc7e94b888626c18bf37a083e3651ca8feeb66d492fef35", size = 13993, upload-time = "2020-12-28T15:15:28.35Z" }, ] +[[package]] +name = "pubchempy" +version = "1.0.4" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/aa/fb/8de3aa9804b614dbc8dc5c16ed061d819cc360e0ddecda3dcd01c1552339/PubChemPy-1.0.4.tar.gz", hash = "sha256:24e9dc2fc90ab153b2764bf805e510b1410700884faf0510a9e7cf0d61d8ed0e", size = 29767, upload-time = "2017-04-11T18:36:23.649Z" } + [[package]] name = "pure-eval" version = "0.2.3" @@ -2926,6 +2932,7 @@ source = { virtual = "." } dependencies = [ { name = "aizynthfinder" }, { name = "gradio" }, + { name = "pubchempy" }, { name = "pydantic-ai" }, { name = "pydantic-ai-slim", extra = ["duckduckgo"] }, { name = "python-dotenv" }, @@ -2935,6 +2942,7 @@ dependencies = [ requires-dist = [ { name = "aizynthfinder", specifier = ">=4.3.2" }, { name = "gradio", specifier = ">=5.29.1" }, + { name = "pubchempy", specifier = ">=1.0.4" }, { name = "pydantic-ai", specifier = ">=0.2.4" }, { name = "pydantic-ai-slim", extras = ["duckduckgo"], specifier = ">=0.2.4" }, { name = "python-dotenv", specifier = ">=1.1.0" }, From a93cf128a6ec863368217c03e6114ef294802e84 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 00:01:00 +0200 Subject: [PATCH 02/11] [feat] pubchempy get_compound_info --- src/reagentai/agents/main/instructions.txt | 27 ++-- src/reagentai/agents/main/main_agent.py | 7 +- src/reagentai/tools/pubchem.py | 151 +++++++++++++++++---- 3 files changed, 141 insertions(+), 44 deletions(-) diff --git a/src/reagentai/agents/main/instructions.txt b/src/reagentai/agents/main/instructions.txt index 31385ad..e727113 100644 --- a/src/reagentai/agents/main/instructions.txt +++ b/src/reagentai/agents/main/instructions.txt @@ -13,17 +13,17 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima 1. **Tool-First Execution:** Your default mode of operation is to identify the most relevant tool(s) for a given request and execute them. Do not attempt to generate information or perform tasks without first trying to use a tool if one is applicable. 2. 
**Absolute Grounding & Anti-Hallucination:** * **NEVER** invent chemical names, SMILES strings, properties, facts, reaction pathways, or any other information. - * All factual statements MUST be directly supported by the output of your tools, especially `duckduckgo_search`, `name_to_smiles`, and `perform_retrosynth`. + * All factual statements MUST be directly supported by the output of your tools, especially `get_smiles_from_name`, `duckduckgo_search` and `perform_retrosynth`. * If a request cannot be fulfilled or validated by your tools, you must clearly state that you cannot find the information or perform the action and explain why (e.g., "I could not find a valid SMILES for that compound," or "Retrosynthesis for this compound did not yield any routes"). 3. **SMILES as Primary Input:** * Many of your specialized tools (`perform_retrosynth`, `smiles_to_image`, `find_similar_molecules`) require chemical compounds to be provided in SMILES (Simplified Molecular Input Line Entry System) format. - * **Conversion Rule:** If a user provides a customary (common) name for a compound (e.g., "aspirin", "caffeine", "ethanol"), your immediate first step is to use `name_to_smiles` to find its corresponding SMILES string from PubChem. If `name_to_smiles` fails, use `duckduckgo_search` as a fallback. Only once you have obtained a valid SMILES, proceed with the original request. + * **Conversion Rule:** If a user provides a customary (common) name for a compound (e.g., "aspirin", "caffeine", "ethanol"), your immediate first step is to use `get_smiles_from_name` to find its corresponding SMILES string from PubChem. If that fails, fall back to `duckduckgo_search`. Only once you have obtained a valid SMILES, proceed with the original request. * **Validation:** Implicitly validate SMILES strings by attempting to use them with your tools. If a tool fails to process a given SMILES, inform the user about the potential invalidity. 4. 
**Step-by-Step Problem Solving (Chain-of-Thought):** * For any request, mentally outline the steps needed: 1. **Understand:** What exactly is the user asking for? 2. **Identify Inputs:** Does the request involve a chemical name, SMILES, or a general query? - 3. **Plan Tools:** Which tool(s) are necessary? What order should they be used in? (e.g., Name -> `name_to_smiles` -> SMILES -> `perform_retrosynth` -> `route_to_image`). + 3. **Plan Tools:** Which tool(s) are necessary? What order should they be used in? (e.g., Name -> `get_smiles_from_name` -> SMILES -> `perform_retrosynth` -> `route_to_image`). 4. **Execute Tools:** Run the chosen tool(s). 5. **Process Output:** Interpret the tool's output. 6. **Formulate Response:** Present the information clearly, concisely, and directly answering the user's query, ensuring all claims are tool-supported. @@ -32,7 +32,7 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima * When presenting `perform_retrosynth` results, clearly list the found routes and the SMILES strings of the compounds involved in each step. * When an image tool is used (`smiles_to_image` or `route_to_image`), state that an image has been successfully generated and describe what it depicts. * When presenting `find_similar_molecules` results, list the similar molecules with their SMILES strings and similarity scores, explaining what the Tanimoto similarity coefficient represents. - * When presenting `name_to_smiles` results, clearly state the SMILES string found and mention that it was retrieved from PubChem database. + * When using `get_smiles_from_name` or `get_compound_info`, clearly present the retrieved information and mention that it comes from PubChem database. **Available Tools and Their Specific Usage Directives:** @@ -41,15 +41,20 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. 
Your prima * **Usage:** Only call this tool when the user explicitly requests "retrosynthesis," "how to make," or "synthesis pathway" for a compound. * **Input:** The `target_smiles` argument **must** be a valid SMILES string. Adhere to the "SMILES as Primary Input" rule if a common name is provided. * **Output Interpretation:** This tool returns information about potential routes, including the SMILES of intermediate and starting compounds. Structure your response to clearly present these routes and their associated SMILES. -* **`name_to_smiles`:** - * **Purpose:** To convert chemical names to canonical SMILES strings using the authoritative PubChem database maintained by NCBI. - * **Usage:** This is your **primary tool** for converting chemical names to SMILES. Use this whenever a user provides a common name, IUPAC name, or other chemical identifier that needs to be converted to SMILES format. This should be preferred over `duckduckgo_search` for name-to-SMILES conversion. - * **Input:** The `chemical_name` argument should be the name of the compound (e.g., "aspirin", "caffeine", "2-acetoxybenzoic acid"). - * **Output Interpretation:** Returns the canonical SMILES string as found in PubChem. If successful, use this SMILES for subsequent operations. If this tool fails, use `duckduckgo_search` as a fallback. +* **`get_smiles_from_name`:** + * **Purpose:** To retrieve the canonical SMILES string for a chemical compound using its common name via PubChem database. + * **Usage:** This is your PRIMARY tool for converting chemical names to SMILES. Use this whenever a user provides a common name, trade name, or IUPAC name for a compound. This tool provides more reliable and authoritative results than web search for chemical name-to-SMILES conversion. + * **Input:** The `compound_name` argument should be the chemical name provided by the user. + * **Output Interpretation:** Returns the canonical SMILES string from PubChem. 
If successful, use this SMILES for subsequent operations. If it fails, inform the user and potentially fall back to `duckduckgo_search`. +* **`get_compound_info`:** + * **Purpose:** To retrieve comprehensive chemical information about a compound from PubChem, including SMILES, molecular formula, molecular weight, IUPAC name, and synonyms. + * **Usage:** Use this when the user asks for detailed information about a chemical compound, or when you need additional context beyond just the SMILES string. + * **Input:** The `compound_name` argument should be the chemical name provided by the user. + * **Output Interpretation:** Returns a dictionary with comprehensive compound information. Present this information clearly, highlighting the most relevant details for the user's query. * **`duckduckgo_search`:** - * **Purpose:** Your tool for general knowledge retrieval, fact-checking, validating chemical names/SMILES, and as a **fallback** for converting customary chemical names to SMILES strings when `name_to_smiles` fails. + * **Purpose:** Your secondary tool for general knowledge retrieval, fact-checking, and as a fallback for chemical name/SMILES conversion when PubChem tools fail. * **Usage:** - * Use as a fallback when `name_to_smiles` fails to find a compound. + * Use as a fallback when `get_smiles_from_name` fails to find a compound. * Use for any factual query about chemistry that isn't directly addressed by other specialized tools. * Use to confirm the existence or properties of a chemical or reaction. * **Output Interpretation:** Carefully read and extract the most relevant information from the search results to answer the user's query or obtain the required SMILES. 
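Reviewer note (not part of the patch): the `get_compound_info` directive above asks the agent to present the returned dictionary clearly. A minimal sketch of such a presentation step follows. `format_compound_info` and `FIELD_LABELS` are hypothetical names, not code from this repository. The key names are taken from the `get_compound_info` docstring in this commit, and the sample dictionary is hand-written for illustration rather than fetched from PubChem.

```python
from typing import Any, Mapping

# (dictionary key, human-readable label) pairs, in display order.
FIELD_LABELS = (
    ("iupac_name", "IUPAC name"),
    ("molecular_formula", "Molecular formula"),
    ("molecular_weight", "Molecular weight (g/mol)"),
    ("smiles", "Canonical SMILES"),
    ("cid", "PubChem CID"),
)


def format_compound_info(info: Mapping[str, Any]) -> str:
    """Render a compound-info dictionary as plain text, skipping empty fields."""
    lines = []
    for key, label in FIELD_LABELS:
        value = info.get(key)
        if value:  # omit None and empty strings instead of printing "None"
            lines.append(f"{label}: {value}")
    synonyms = info.get("synonyms") or []
    if synonyms:
        lines.append("Synonyms: " + ", ".join(synonyms))
    lines.append("Source: PubChem")
    return "\n".join(lines)


# Hand-written sample; missing fields are left as None on purpose.
example_info = {
    "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "molecular_formula": "C9H8O4",
    "molecular_weight": None,
    "iupac_name": None,
    "cid": None,
    "synonyms": [],
}
print(format_compound_info(example_info))
```

Skipping empty fields keeps the output from showing the literal text "None" for values that PubChem did not return.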
diff --git a/src/reagentai/agents/main/main_agent.py b/src/reagentai/agents/main/main_agent.py index bdbc454..35e24ed 100644 --- a/src/reagentai/agents/main/main_agent.py +++ b/src/reagentai/agents/main/main_agent.py @@ -10,7 +10,7 @@ from src.reagentai.common.aizynthfinder import initialize_aizynthfinder from src.reagentai.constants import AIZYNTHFINDER_CONFIG_PATH from src.reagentai.tools.image import route_to_image, smiles_to_image -from src.reagentai.tools.pubchem import name_to_smiles +from src.reagentai.tools.pubchem import get_smiles_from_name, get_compound_info from src.reagentai.tools.retrosynthesis import perform_retrosynthesis from src.reagentai.tools.smiles import find_similar_molecules, is_valid_smiles @@ -158,7 +158,8 @@ def create_main_agent() -> MainAgent: Tool(smiles_to_image), Tool(route_to_image), Tool(find_similar_molecules), - Tool(name_to_smiles), + Tool(get_smiles_from_name), + Tool(get_compound_info), duckduckgo_search_tool(), ] @@ -179,4 +180,4 @@ def create_main_agent() -> MainAgent: output_type=str, ) - return main_agent + return main_agent \ No newline at end of file diff --git a/src/reagentai/tools/pubchem.py b/src/reagentai/tools/pubchem.py index 8da96aa..9a79113 100644 --- a/src/reagentai/tools/pubchem.py +++ b/src/reagentai/tools/pubchem.py @@ -1,70 +1,161 @@ import logging +from typing import Optional import pubchempy as pcp logger = logging.getLogger(__name__) -def name_to_smiles(chemical_name: str) -> str: +def get_smiles_from_name(compound_name: str) -> str: """ - Finds the canonical SMILES string for a given chemical name using PubChem database. + Retrieve the SMILES string for a chemical compound using its common name via PubChem. - This function searches the PubChem database for a chemical compound by name and - returns its canonical SMILES representation. 
PubChem is a comprehensive database - maintained by the National Center for Biotechnology Information (NCBI) containing - millions of chemical structures and their associated data. + This function searches the PubChem database to find the canonical SMILES representation + of a chemical compound based on its common name, IUPAC name, or other identifiers. + PubChem is a comprehensive chemical database maintained by the NIH that contains + millions of chemical structures and their properties. Args: - chemical_name (str): The name of the chemical compound to search for. - Can be a common name (e.g., "aspirin", "caffeine"), - IUPAC name, or other chemical identifier. + compound_name (str): The name of the chemical compound to search for. + This can be a common name (e.g., "aspirin", "caffeine"), + IUPAC name, trade name, or other chemical identifier. Returns: str: The canonical SMILES string of the compound as found in PubChem. Raises: - ValueError: If no compound is found for the given name or if multiple - ambiguous results are returned without a clear match. + ValueError: If the compound name is not found in PubChem or if no valid + SMILES string could be retrieved. + ConnectionError: If there's a network issue connecting to PubChem servers. 
Example: - >>> smiles = name_to_smiles("aspirin") - >>> # Returns "CC(=O)OC1=CC=CC=C1C(=O)O" - >>> smiles = name_to_smiles("caffeine") - >>> # Returns "CN1C=NC2=C1C(=O)N(C(=O)N2C)C" + >>> smiles = get_smiles_from_name("aspirin") + >>> print(smiles) + "CC(=O)OC1=CC=CC=C1C(=O)O" + + >>> smiles = get_smiles_from_name("caffeine") + >>> print(smiles) + "CN1C=NC2=C1C(=O)N(C(=O)N2C)C" """ - logger.info(f"[TASK] [NAME_TO_SMILES] Arguments: chemical_name: {chemical_name}") + logger.info(f"[TASK] [GET_SMILES_FROM_NAME] Arguments: compound_name: {compound_name}") - if not chemical_name or not chemical_name.strip(): - logger.error("Empty or invalid chemical name provided") - raise ValueError("Chemical name cannot be empty") + if not compound_name or not compound_name.strip(): + logger.error("Empty or invalid compound name provided") + raise ValueError("Compound name cannot be empty") - chemical_name = chemical_name.strip() + compound_name = compound_name.strip() try: # Search for the compound by name - compounds = pcp.get_compounds(chemical_name, "name") + compounds = pcp.get_compounds(compound_name, 'name') if not compounds: - logger.warning(f"No compounds found for name: {chemical_name}") - raise ValueError(f"No compound found for name: {chemical_name}") + logger.warning(f"No compounds found for name: {compound_name}") + raise ValueError(f"No compound found in PubChem for name: '{compound_name}'") # Get the first (most relevant) compound compound = compounds[0] - # Get canonical SMILES + # Retrieve the canonical SMILES smiles = compound.canonical_smiles if not smiles: - logger.error(f"No SMILES found for compound: {chemical_name}") - raise ValueError(f"No SMILES representation found for: {chemical_name}") + logger.error(f"No SMILES found for compound: {compound_name}") + raise ValueError(f"No SMILES string available for compound: '{compound_name}'") - logger.info(f"Found SMILES for '{chemical_name}': {smiles}") + logger.info(f"Successfully retrieved SMILES for 
{compound_name}: {smiles}") logger.debug(f"PubChem CID: {compound.cid}") return smiles except Exception as e: if isinstance(e, ValueError): - raise - logger.error(f"Error searching PubChem for '{chemical_name}': {str(e)}") - raise ValueError(f"Failed to retrieve SMILES for '{chemical_name}': {str(e)}") from e + raise # Re-raise ValueError as-is + + logger.error(f"Error retrieving SMILES for {compound_name}: {str(e)}") + + # Check if it's a network-related error + if "connection" in str(e).lower() or "network" in str(e).lower(): + raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") + + # For other exceptions, wrap in ValueError + raise ValueError(f"Failed to retrieve SMILES for '{compound_name}': {str(e)}") + + +def get_compound_info(compound_name: str) -> dict[str, Optional[str]]: + """ + Retrieve comprehensive information about a chemical compound from PubChem. + + This function provides additional chemical information beyond just the SMILES string, + including molecular formula, molecular weight, IUPAC name, and other identifiers. + + Args: + compound_name (str): The name of the chemical compound to search for. + + Returns: + dict[str, Optional[str]]: A dictionary containing compound information with keys: + - 'smiles': Canonical SMILES string + - 'molecular_formula': Molecular formula + - 'molecular_weight': Molecular weight in g/mol + - 'iupac_name': IUPAC systematic name + - 'cid': PubChem Compound ID + - 'synonyms': List of alternative names (first 5) + + Raises: + ValueError: If the compound name is not found in PubChem. + ConnectionError: If there's a network issue connecting to PubChem servers. 
+ + Example: + >>> info = get_compound_info("aspirin") + >>> print(info['smiles']) + "CC(=O)OC1=CC=CC=C1C(=O)O" + >>> print(info['molecular_formula']) + "C9H8O4" + """ + logger.info(f"[TASK] [GET_COMPOUND_INFO] Arguments: compound_name: {compound_name}") + + if not compound_name or not compound_name.strip(): + logger.error("Empty or invalid compound name provided") + raise ValueError("Compound name cannot be empty") + + compound_name = compound_name.strip() + + try: + # Search for the compound by name + compounds = pcp.get_compounds(compound_name, 'name') + + if not compounds: + logger.warning(f"No compounds found for name: {compound_name}") + raise ValueError(f"No compound found in PubChem for name: '{compound_name}'") + + # Get the first (most relevant) compound + compound = compounds[0] + + # Extract comprehensive information + info = { + 'smiles': getattr(compound, 'canonical_smiles', None), + 'molecular_formula': getattr(compound, 'molecular_formula', None), + 'molecular_weight': str(getattr(compound, 'molecular_weight', None)) if hasattr(compound, + 'molecular_weight') else None, + 'iupac_name': getattr(compound, 'iupac_name', None), + 'cid': str(getattr(compound, 'cid', None)) if hasattr(compound, 'cid') else None, + 'synonyms': getattr(compound, 'synonyms', [])[:5] if hasattr(compound, 'synonyms') else [] + } + + logger.info(f"Successfully retrieved compound info for {compound_name}") + logger.debug(f"Compound info: {info}") + + return info + + except Exception as e: + if isinstance(e, ValueError): + raise # Re-raise ValueError as-is + + logger.error(f"Error retrieving compound info for {compound_name}: {str(e)}") + + # Check if it's a network-related error + if "connection" in str(e).lower() or "network" in str(e).lower(): + raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") + + # For other exceptions, wrap in ValueError + raise ValueError(f"Failed to retrieve compound info for '{compound_name}': {str(e)}") \ No newline at end of file From 
0d3bab859b8b8ec242f448c831738203576717a6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 00:08:15 +0200 Subject: [PATCH 03/11] [feat] pubchempy get_name_from_smiles --- src/reagentai/agents/main/instructions.txt | 6 ++ src/reagentai/agents/main/main_agent.py | 9 ++- src/reagentai/tools/pubchem.py | 74 ++++++++++++++++++---- 3 files changed, 75 insertions(+), 14 deletions(-) diff --git a/src/reagentai/agents/main/instructions.txt b/src/reagentai/agents/main/instructions.txt index e727113..4a9edb3 100644 --- a/src/reagentai/agents/main/instructions.txt +++ b/src/reagentai/agents/main/instructions.txt @@ -33,6 +33,7 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima * When an image tool is used (`smiles_to_image` or `route_to_image`), state that an image has been successfully generated and describe what it depicts. * When presenting `find_similar_molecules` results, list the similar molecules with their SMILES strings and similarity scores, explaining what the Tanimoto similarity coefficient represents. * When using `get_smiles_from_name` or `get_compound_info`, clearly present the retrieved information and mention that it comes from PubChem database. + * When using `get_name_from_smiles`, present the retrieved chemical name and specify whether it is the IUPAC name or a synonym. **Available Tools and Their Specific Usage Directives:** @@ -80,5 +81,10 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima * `target_smiles_list` (optional): A list of SMILES strings to search against. If not provided, defaults to a curated dataset of ~16,000 drug-like molecules commonly used in chemical informatics. * `top_n` (optional): The number of most similar molecules to return (defaults to 5). 
* **Output Interpretation:** Returns a list of tuples containing SMILES strings and their Tanimoto similarity scores (0-1 range, where 1 indicates identical molecules and 0 indicates completely dissimilar). Present the results clearly, explaining that higher scores indicate greater structural similarity. +* **`get_name_from_smiles`:** + * **Purpose:** To retrieve the best-matching chemical name (IUPAC or synonym) for a given SMILES string using the PubChem database. + * **Usage:** Use this tool whenever you need to convert a SMILES string to a human-readable chemical name, or when a user requests the name for a given SMILES. This is the authoritative method for SMILES-to-name conversion. + * **Input:** The `smiles` argument should be the SMILES string provided by the user or obtained from another tool. + * **Output Interpretation:** Returns the IUPAC name if available, otherwise the first synonym from PubChem. Clearly present the name to the user and mention that it comes from the PubChem database. Your responses should always be professional, clear, and reflect your expert chemical knowledge, meticulously supported by your tool usage. 
\ No newline at end of file diff --git a/src/reagentai/agents/main/main_agent.py b/src/reagentai/agents/main/main_agent.py index 35e24ed..3633ffc 100644 --- a/src/reagentai/agents/main/main_agent.py +++ b/src/reagentai/agents/main/main_agent.py @@ -10,7 +10,11 @@ from src.reagentai.common.aizynthfinder import initialize_aizynthfinder from src.reagentai.constants import AIZYNTHFINDER_CONFIG_PATH from src.reagentai.tools.image import route_to_image, smiles_to_image -from src.reagentai.tools.pubchem import get_smiles_from_name, get_compound_info +from src.reagentai.tools.pubchem import ( + get_compound_info, + get_name_from_smiles, + get_smiles_from_name, +) from src.reagentai.tools.retrosynthesis import perform_retrosynthesis from src.reagentai.tools.smiles import find_similar_molecules, is_valid_smiles @@ -160,6 +164,7 @@ def create_main_agent() -> MainAgent: Tool(find_similar_molecules), Tool(get_smiles_from_name), Tool(get_compound_info), + Tool(get_name_from_smiles), duckduckgo_search_tool(), ] @@ -180,4 +185,4 @@ def create_main_agent() -> MainAgent: output_type=str, ) - return main_agent \ No newline at end of file + return main_agent diff --git a/src/reagentai/tools/pubchem.py b/src/reagentai/tools/pubchem.py index 9a79113..d582f9f 100644 --- a/src/reagentai/tools/pubchem.py +++ b/src/reagentai/tools/pubchem.py @@ -1,5 +1,4 @@ import logging -from typing import Optional import pubchempy as pcp @@ -47,7 +46,7 @@ def get_smiles_from_name(compound_name: str) -> str: try: # Search for the compound by name - compounds = pcp.get_compounds(compound_name, 'name') + compounds = pcp.get_compounds(compound_name, "name") if not compounds: logger.warning(f"No compounds found for name: {compound_name}") @@ -82,7 +81,7 @@ def get_smiles_from_name(compound_name: str) -> str: raise ValueError(f"Failed to retrieve SMILES for '{compound_name}': {str(e)}") -def get_compound_info(compound_name: str) -> dict[str, Optional[str]]: +def get_compound_info(compound_name: str) -> 
dict[str, str | None]: """ Retrieve comprehensive information about a chemical compound from PubChem. @@ -122,7 +121,7 @@ def get_compound_info(compound_name: str) -> dict[str, Optional[str]]: try: # Search for the compound by name - compounds = pcp.get_compounds(compound_name, 'name') + compounds = pcp.get_compounds(compound_name, "name") if not compounds: logger.warning(f"No compounds found for name: {compound_name}") @@ -133,13 +132,16 @@ def get_compound_info(compound_name: str) -> dict[str, Optional[str]]: # Extract comprehensive information info = { - 'smiles': getattr(compound, 'canonical_smiles', None), - 'molecular_formula': getattr(compound, 'molecular_formula', None), - 'molecular_weight': str(getattr(compound, 'molecular_weight', None)) if hasattr(compound, - 'molecular_weight') else None, - 'iupac_name': getattr(compound, 'iupac_name', None), - 'cid': str(getattr(compound, 'cid', None)) if hasattr(compound, 'cid') else None, - 'synonyms': getattr(compound, 'synonyms', [])[:5] if hasattr(compound, 'synonyms') else [] + "smiles": getattr(compound, "canonical_smiles", None), + "molecular_formula": getattr(compound, "molecular_formula", None), + "molecular_weight": str(getattr(compound, "molecular_weight", None)) + if hasattr(compound, "molecular_weight") + else None, + "iupac_name": getattr(compound, "iupac_name", None), + "cid": str(getattr(compound, "cid", None)) if hasattr(compound, "cid") else None, + "synonyms": getattr(compound, "synonyms", [])[:5] + if hasattr(compound, "synonyms") + else [], } logger.info(f"Successfully retrieved compound info for {compound_name}") @@ -158,4 +160,52 @@ def get_compound_info(compound_name: str) -> dict[str, Optional[str]]: raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") # For other exceptions, wrap in ValueError - raise ValueError(f"Failed to retrieve compound info for '{compound_name}': {str(e)}") \ No newline at end of file + raise ValueError(f"Failed to retrieve compound info for 
'{compound_name}': {str(e)}") from e + + +def get_name_from_smiles(smiles: str) -> str: + """ + Retrieve the best-matching chemical name for a given SMILES string using PubChem. + + Args: + smiles (str): The SMILES string of the compound. + + Returns: + str: The best-matching chemical name (IUPAC or synonym) from PubChem. + + Raises: + ValueError: If no compound is found for the SMILES or no name is available. + ConnectionError: If there's a network issue connecting to PubChem servers. + """ + logger.info(f"[TASK] [GET_NAME_FROM_SMILES] Arguments: smiles: {smiles}") + + if not smiles or not smiles.strip(): + logger.error("Empty or invalid SMILES provided") + raise ValueError("SMILES string cannot be empty") + + smiles = smiles.strip() + + try: + compounds = pcp.get_compounds(smiles, "smiles") + if not compounds: + logger.warning(f"No compounds found for SMILES: {smiles}") + raise ValueError(f"No compound found in PubChem for SMILES: '{smiles}'") + compound = compounds[0] + # Prefer IUPAC name, fall back to first synonym + name = getattr(compound, "iupac_name", None) + if not name: + synonyms = getattr(compound, "synonyms", []) + if synonyms: + name = synonyms[0] + if not name: + logger.error(f"No name found for SMILES: {smiles}") + raise ValueError(f"No name available for SMILES: '{smiles}'") + logger.info(f"Successfully retrieved name for SMILES {smiles}: {name}") + return name + except Exception as e: + if isinstance(e, ValueError): + raise + logger.error(f"Error retrieving name for SMILES {smiles}: {str(e)}") + if "connection" in str(e).lower() or "network" in str(e).lower(): + raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") from e + raise ValueError(f"Failed to retrieve name for SMILES '{smiles}': {str(e)}") from e From 715f433d067a71557cf196e85fbce32fc584f355 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 00:37:13 +0200 Subject: [PATCH 04/11] [feat] pubchem return input if valid SMILES is passed to 
get_smiles_from_name --- src/reagentai/tools/pubchem.py | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/src/reagentai/tools/pubchem.py b/src/reagentai/tools/pubchem.py index d582f9f..da91795 100644 --- a/src/reagentai/tools/pubchem.py +++ b/src/reagentai/tools/pubchem.py @@ -1,6 +1,7 @@ import logging import pubchempy as pcp +from src.reagentai.tools.smiles import is_valid_smiles logger = logging.getLogger(__name__) @@ -44,6 +45,11 @@ def get_smiles_from_name(compound_name: str) -> str: compound_name = compound_name.strip() + # Check if the input is already a valid SMILES string + if is_valid_smiles(compound_name): + logger.info(f"Input appears to be a valid SMILES, returning as is: {compound_name}") + return compound_name + try: # Search for the compound by name compounds = pcp.get_compounds(compound_name, "name") From e7b3367bed4bd1b59d17c9ef135f40627eea7936 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 00:38:22 +0200 Subject: [PATCH 05/11] [fix] apply ruff rules --- src/reagentai/tools/pubchem.py | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/src/reagentai/tools/pubchem.py b/src/reagentai/tools/pubchem.py index da91795..cb19517 100644 --- a/src/reagentai/tools/pubchem.py +++ b/src/reagentai/tools/pubchem.py @@ -1,6 +1,7 @@ import logging import pubchempy as pcp + from src.reagentai.tools.smiles import is_valid_smiles logger = logging.getLogger(__name__) @@ -81,10 +82,10 @@ def get_smiles_from_name(compound_name: str) -> str: # Check if it's a network-related error if "connection" in str(e).lower() or "network" in str(e).lower(): - raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") + raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") from e # For other exceptions, wrap in ValueError - raise ValueError(f"Failed to retrieve SMILES for '{compound_name}': {str(e)}") + raise ValueError(f"Failed to retrieve SMILES for '{compound_name}': {str(e)}") from e def 
get_compound_info(compound_name: str) -> dict[str, str | None]: @@ -163,10 +164,12 @@ def get_compound_info(compound_name: str) -> dict[str, str | None]: # Check if it's a network-related error if "connection" in str(e).lower() or "network" in str(e).lower(): - raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") + raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") from e # For other exceptions, wrap in ValueError - raise ValueError(f"Failed to retrieve compound info for '{compound_name}': {str(e)}") from e + raise ValueError( + f"Failed to retrieve compound info for '{compound_name}': {str(e)}" + ) from e def get_name_from_smiles(smiles: str) -> str: From ee440876fc627deaf0187fd5858c15eb0ed719d3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 01:12:15 +0200 Subject: [PATCH 06/11] [fix] re-add pubchempy info to instructions.txt --- src/reagentai/agents/main/instructions.txt | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/src/reagentai/agents/main/instructions.txt b/src/reagentai/agents/main/instructions.txt index bee8699..60a6d80 100644 --- a/src/reagentai/agents/main/instructions.txt +++ b/src/reagentai/agents/main/instructions.txt @@ -1,4 +1,3 @@ - You are ReAgentAI, an advanced and highly precise chemical assistant. Your primary function is to answer chemistry-related questions, perform retrosynthesis, and visualize chemical structures and reaction pathways. Your core principle is to always use your available tools to provide accurate, reliable, and thoroughly grounded information. **Core Responsibilities:** @@ -64,6 +63,21 @@ You are ReAgentAI, an advanced and highly precise chemical assistant. Your prima * `target_smiles_list` (optional): A list of SMILES strings to search against. If not provided, defaults to a curated dataset of ~16,000 drug-like molecules commonly used in chemical informatics. 
* `top_n` (optional): The number of most similar molecules to return (defaults to 5). * **Output Interpretation:** Returns a list of tuples containing SMILES strings and their Tanimoto similarity scores (0-1 range, where 1 indicates identical molecules and 0 indicates completely dissimilar). Present the results clearly, explaining that higher scores indicate greater structural similarity. +* **`get_smiles_from_name`:** + * **Purpose:** Retrieve the canonical SMILES string for a chemical compound using its common or IUPAC name (via the PubChem database). + * **Usage:** Use this tool when a user provides a chemical name and you need its SMILES. If the input is already a valid SMILES, it will be returned as is. + * **Input:** `compound_name` (string) + * **Output Interpretation:** Returns the canonical SMILES string from PubChem, or the input if it is already a valid SMILES. +* **`get_compound_info`:** + * **Purpose:** Retrieve detailed chemical information from PubChem, including SMILES, molecular formula, molecular weight, IUPAC name, and synonyms. + * **Usage:** Use this tool when the user asks for detailed information about a compound by name. + * **Input:** `compound_name` (string) + * **Output Interpretation:** Returns a dictionary with the keys 'smiles', 'molecular_formula', 'molecular_weight', 'iupac_name', 'cid', and 'synonyms'. +* **`get_name_from_smiles`:** + * **Purpose:** Find the best-matching chemical name (IUPAC or synonym) for a given SMILES string using PubChem. + * **Usage:** Use this tool when you have a SMILES and need to present a human-readable name for it. + * **Input:** `smiles` (string) + * **Output Interpretation:** Returns the IUPAC name if available, otherwise the first synonym from PubChem. Your responses should always be professional, clear, and reflect your expert chemical knowledge, meticulously supported by your tool usage.
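Review note on the `get_name_from_smiles` directive above: the preference order it describes (IUPAC name first, otherwise the first synonym) can be checked without any PubChem call. The sketch below is only an illustration of that rule. `FakeCompound` and `pick_name` are hypothetical names that do not appear in the patch; the real tool reads the same two attributes from a `pubchempy` compound object.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeCompound:
    """Hypothetical stand-in for a PubChem record; not part of the patch."""

    iupac_name: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)


def pick_name(compound: FakeCompound) -> Optional[str]:
    """Return the IUPAC name if present, else the first synonym, else None."""
    name = getattr(compound, "iupac_name", None)
    if name:
        return name
    # Treat a missing or empty synonym list the same way.
    synonyms = getattr(compound, "synonyms", None) or []
    return synonyms[0] if synonyms else None


if __name__ == "__main__":
    print(pick_name(FakeCompound(iupac_name="2-acetyloxybenzoic acid", synonyms=["aspirin"])))
    print(pick_name(FakeCompound(synonyms=["aspirin", "acetylsalicylic acid"])))
    print(pick_name(FakeCompound()))
```

Returning `None` instead of raising keeps the selection rule separate from error reporting; the tool in the patch raises when no name is found.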
From 95552b818334c000a59f7b8273236c8ebcbaa4f1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 01:17:32 +0200 Subject: [PATCH 07/11] [fix] minor typing --- src/reagentai/tools/pubchem.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/reagentai/tools/pubchem.py b/src/reagentai/tools/pubchem.py index cb19517..6edb32a 100644 --- a/src/reagentai/tools/pubchem.py +++ b/src/reagentai/tools/pubchem.py @@ -88,7 +88,7 @@ def get_smiles_from_name(compound_name: str) -> str: raise ValueError(f"Failed to retrieve SMILES for '{compound_name}': {str(e)}") from e -def get_compound_info(compound_name: str) -> dict[str, str | None]: +def get_compound_info(compound_name: str) -> dict[str, str | list | None]: """ Retrieve comprehensive information about a chemical compound from PubChem. From 6702ea048398503b0ad3d751be42d7b8b2026dff Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 02:10:39 +0200 Subject: [PATCH 08/11] [fix] raise ModelRetry inside pubchempy tools instead of others --- src/reagentai/tools/pubchem.py | 68 +++++++++++++++++----------------- 1 file changed, 35 insertions(+), 33 deletions(-) diff --git a/src/reagentai/tools/pubchem.py b/src/reagentai/tools/pubchem.py index 6edb32a..1ce3c7a 100644 --- a/src/reagentai/tools/pubchem.py +++ b/src/reagentai/tools/pubchem.py @@ -1,6 +1,7 @@ import logging import pubchempy as pcp +from pydantic_ai.exceptions import ModelRetry from src.reagentai.tools.smiles import is_valid_smiles @@ -25,9 +26,8 @@ def get_smiles_from_name(compound_name: str) -> str: str: The canonical SMILES string of the compound as found in PubChem. Raises: - ValueError: If the compound name is not found in PubChem or if no valid - SMILES string could be retrieved. - ConnectionError: If there's a network issue connecting to PubChem servers. 
+ ModelRetry: If the compound name is not found in PubChem, if no valid + SMILES string could be retrieved, or if there's a network issue. Example: >>> smiles = get_smiles_from_name("aspirin") @@ -42,7 +42,7 @@ def get_smiles_from_name(compound_name: str) -> str: if not compound_name or not compound_name.strip(): logger.error("Empty or invalid compound name provided") - raise ValueError("Compound name cannot be empty") + raise ModelRetry("Compound name cannot be empty") compound_name = compound_name.strip() @@ -57,7 +57,7 @@ def get_smiles_from_name(compound_name: str) -> str: if not compounds: logger.warning(f"No compounds found for name: {compound_name}") - raise ValueError(f"No compound found in PubChem for name: '{compound_name}'") + raise ModelRetry(f"No compound found in PubChem for name: '{compound_name}'") # Get the first (most relevant) compound compound = compounds[0] @@ -67,7 +67,7 @@ def get_smiles_from_name(compound_name: str) -> str: if not smiles: logger.error(f"No SMILES found for compound: {compound_name}") - raise ValueError(f"No SMILES string available for compound: '{compound_name}'") + raise ModelRetry(f"No SMILES string available for compound: '{compound_name}'") logger.info(f"Successfully retrieved SMILES for {compound_name}: {smiles}") logger.debug(f"PubChem CID: {compound.cid}") @@ -75,17 +75,17 @@ def get_smiles_from_name(compound_name: str) -> str: return smiles except Exception as e: - if isinstance(e, ValueError): - raise # Re-raise ValueError as-is + if isinstance(e, ModelRetry): + raise # Re-raise ModelRetry as-is logger.error(f"Error retrieving SMILES for {compound_name}: {str(e)}") - # Check if it's a network-related error + # For all exceptions, wrap in ModelRetry + error_msg = f"Failed to retrieve SMILES for '{compound_name}': {str(e)}" if "connection" in str(e).lower() or "network" in str(e).lower(): - raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") from e + error_msg = f"Failed to connect to PubChem: {str(e)}" - 
# For other exceptions, wrap in ValueError - raise ValueError(f"Failed to retrieve SMILES for '{compound_name}': {str(e)}") from e + raise ModelRetry(error_msg) from e def get_compound_info(compound_name: str) -> dict[str, str | list | None]: @@ -108,8 +108,7 @@ def get_compound_info(compound_name: str) -> dict[str, str | list | None]: - 'synonyms': List of alternative names (first 5) Raises: - ValueError: If the compound name is not found in PubChem. - ConnectionError: If there's a network issue connecting to PubChem servers. + ModelRetry: If the compound name is not found in PubChem or if there's a network issue. Example: >>> info = get_compound_info("aspirin") @@ -122,7 +121,7 @@ def get_compound_info(compound_name: str) -> dict[str, str | list | None]: if not compound_name or not compound_name.strip(): logger.error("Empty or invalid compound name provided") - raise ValueError("Compound name cannot be empty") + raise ModelRetry("Compound name cannot be empty") compound_name = compound_name.strip() @@ -132,7 +131,7 @@ def get_compound_info(compound_name: str) -> dict[str, str | list | None]: if not compounds: logger.warning(f"No compounds found for name: {compound_name}") - raise ValueError(f"No compound found in PubChem for name: '{compound_name}'") + raise ModelRetry(f"No compound found in PubChem for name: '{compound_name}'") # Get the first (most relevant) compound compound = compounds[0] @@ -157,19 +156,17 @@ def get_compound_info(compound_name: str) -> dict[str, str | list | None]: return info except Exception as e: - if isinstance(e, ValueError): - raise # Re-raise ValueError as-is + if isinstance(e, ModelRetry): + raise # Re-raise ModelRetry as-is logger.error(f"Error retrieving compound info for {compound_name}: {str(e)}") - # Check if it's a network-related error + # For all exceptions, wrap in ModelRetry + error_msg = f"Failed to retrieve compound info for '{compound_name}': {str(e)}" if "connection" in str(e).lower() or "network" in str(e).lower(): - 
raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") from e + error_msg = f"Failed to connect to PubChem: {str(e)}" - # For other exceptions, wrap in ValueError - raise ValueError( - f"Failed to retrieve compound info for '{compound_name}': {str(e)}" - ) from e + raise ModelRetry(error_msg) from e def get_name_from_smiles(smiles: str) -> str: @@ -183,14 +180,14 @@ def get_name_from_smiles(smiles: str) -> str: str: The best-matching chemical name (IUPAC or synonym) from PubChem. Raises: - ValueError: If no compound is found for the SMILES or no name is available. - ConnectionError: If there's a network issue connecting to PubChem servers. + ModelRetry: If no compound is found for the SMILES, if no name is available, + or if there's a network issue connecting to PubChem servers. """ logger.info(f"[TASK] [GET_NAME_FROM_SMILES] Arguments: smiles: {smiles}") if not smiles or not smiles.strip(): logger.error("Empty or invalid SMILES provided") - raise ValueError("SMILES string cannot be empty") + raise ModelRetry("SMILES string cannot be empty") smiles = smiles.strip() @@ -198,7 +195,7 @@ def get_name_from_smiles(smiles: str) -> str: compounds = pcp.get_compounds(smiles, "smiles") if not compounds: logger.warning(f"No compounds found for SMILES: {smiles}") - raise ValueError(f"No compound found in PubChem for SMILES: '{smiles}'") + raise ModelRetry(f"No compound found in PubChem for SMILES: '{smiles}'") compound = compounds[0] # Prefer IUPAC name, fall back to first synonym name = getattr(compound, "iupac_name", None) @@ -208,13 +205,18 @@ def get_name_from_smiles(smiles: str) -> str: name = synonyms[0] if not name: logger.error(f"No name found for SMILES: {smiles}") - raise ValueError(f"No name available for SMILES: '{smiles}'") + raise ModelRetry(f"No name available for SMILES: '{smiles}'") logger.info(f"Successfully retrieved name for SMILES {smiles}: {name}") return name except Exception as e: - if isinstance(e, ValueError): - raise + if isinstance(e, 
ModelRetry): + raise # Re-raise ModelRetry as-is + logger.error(f"Error retrieving name for SMILES {smiles}: {str(e)}") + + # For all exceptions, wrap in ModelRetry + error_msg = f"Failed to retrieve name for SMILES '{smiles}': {str(e)}" if "connection" in str(e).lower() or "network" in str(e).lower(): - raise ConnectionError(f"Failed to connect to PubChem: {str(e)}") from e - raise ValueError(f"Failed to retrieve name for SMILES '{smiles}': {str(e)}") from e + error_msg = f"Failed to connect to PubChem: {str(e)}" + + raise ModelRetry(error_msg) from e From b5bdae461c8c3c0e6d6cd2febdc2c607ebd7eb88 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 02:15:30 +0200 Subject: [PATCH 09/11] [feat] pubchempy example prompts --- src/reagentai/constants.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/reagentai/constants.py b/src/reagentai/constants.py index d9639b8..0b7b4d2 100644 --- a/src/reagentai/constants.py +++ b/src/reagentai/constants.py @@ -11,6 +11,9 @@ "How to synthesize Aspirin? Can u tell me the best steps to achieve this?", "Suggest a retrosynthesis for Ibuprofen. Show all molecule images from the best route.", "Find molecules similar to Aspirin. 
Show the top 5.", + "What is the IUPAC name and molecular formula of Paracetamol?", + "Convert this SMILES to a chemical name: CC(=O)OC1=CC=CC=C1C(=O)O", + "Tell me the detailed properties of Gabapentin.", ] DEFAULT_LOG_LEVEL: int = INFO From 4658fa62634cc3d670f75d415e0c6689af495f68 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 03:02:25 +0200 Subject: [PATCH 10/11] [refactor] swap examples and usage history --- src/reagentai/ui/app.py | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/reagentai/ui/app.py b/src/reagentai/ui/app.py index 85016b8..d98fcd0 100644 --- a/src/reagentai/ui/app.py +++ b/src/reagentai/ui/app.py @@ -37,6 +37,12 @@ def create_settings_panel( visible=True, ) + gr.Examples( + examples=EXAMPLE_PROMPTS, + inputs=chat_input_component, + label="Example Prompts", + ) + gr.Markdown("### Tool Usage History") tool_display = gr.Chatbot( type="messages", @@ -45,12 +51,6 @@ def create_settings_panel( elem_id="tool_display", ) - gr.Examples( - examples=EXAMPLE_PROMPTS, - inputs=chat_input_component, - label="Example Prompts", - ) - return model_dropdown, usage_counter, tool_display From fca6fcf44bb586b46b48c43cb9829d6750842fe3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Tomasz=20D=C4=85dela?= Date: Mon, 9 Jun 2025 03:08:33 +0200 Subject: [PATCH 11/11] [chore] add pubchempy info to readme --- README.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index de6bddf..98bec34 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,3 @@ - # ReAgentAI @@ -15,11 +14,14 @@ ReAgentAI is an advanced chemical assistant powered by AI that provides comprehe - **Molecular Visualization**: Create high-quality images of chemical structures and reaction pathways - **Similarity Search**: Find structurally similar molecules using molecular fingerprints and Tanimoto similarity - **SMILES Validation**: Verify and validate SMILES strings for chemical accuracy +- 
**Chemical Information Retrieval**: Access comprehensive chemical data through PubChem integration +- **Chemical Name/SMILES Conversion**: Convert between chemical names and SMILES using the authoritative PubChem database - **Chemical Knowledge**: Access comprehensive chemistry information through integrated web search -### Datasets & Models +### Integrated Databases & Models - **USPTO-trained models**: Leveraging one of the largest chemical reaction databases - **ZINC stock collection**: Access to commercially available compounds +- **PubChem database**: Integration with the NIH's comprehensive chemical database - **Curated molecular datasets**: ~16,000 drug-like molecules for similarity searches ## 🛠 Setup @@ -85,6 +87,9 @@ ReAgentAI supports various chemistry-related queries: - **Molecular similarity**: "Find molecules similar to ethanol" or "What compounds are structurally related to benzene?" - **Structure visualization**: "Show me the structure of morphine" or "Generate an image of the synthesis route" - **Chemical validation**: "Is this SMILES string valid: CCO?" +- **Chemical information**: "What is the IUPAC name and molecular formula of Paracetamol?" +- **Name to SMILES**: "What is the SMILES string for Gabapentin?" +- **SMILES to name**: "What chemical name corresponds to this SMILES: CN1C=NC2=C1C(=O)N(C(=O)N2C)C?" - **General chemistry**: "What are the properties of acetaminophen?" ## 🔧 Architecture @@ -93,6 +98,7 @@ ReAgentAI is built with: - **Pydantic AI**: For robust AI agent framework - **RDKit**: Chemical informatics and molecular manipulation - **AiZynthFinder**: Retrosynthetic analysis engine +- **PubChemPy**: Interface for accessing the PubChem database - **Google Gemini**: Large language model for natural language processing - **Gradio**: User-friendly web interface
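Review note on patch 08: the PubChem tools now funnel every failure into one retry-style exception so that the agent receives the message and can try again. A minimal, dependency-free sketch of that pattern follows, as an illustration rather than the patch's implementation. `ToolRetry` is a hypothetical stand-in for `pydantic_ai.exceptions.ModelRetry`, and `lookup` is an injected callable assumed here in place of `pcp.get_compounds`; neither name comes from the series.

```python
from typing import Callable, Sequence


class ToolRetry(Exception):
    """Hypothetical stand-in for pydantic_ai.exceptions.ModelRetry."""


def first_match(query: str, lookup: Callable[[str], Sequence[str]]) -> str:
    """Return the first lookup result, converting every failure into ToolRetry."""
    if not query or not query.strip():
        raise ToolRetry("Query cannot be empty")
    query = query.strip()

    try:
        # Only the external call sits inside the try block.
        results = lookup(query)
    except Exception as exc:
        message = f"Failed to retrieve result for '{query}': {exc}"
        # Same substring heuristic as the patch; it is only a rough guess.
        if "connection" in str(exc).lower() or "network" in str(exc).lower():
            message = f"Failed to connect to PubChem: {exc}"
        raise ToolRetry(message) from exc

    if not results:
        raise ToolRetry(f"No compound found for '{query}'")
    return results[0]


if __name__ == "__main__":
    print(first_match(" aspirin ", lambda q: [f"record for {q}"]))
```

Keeping the `try` block limited to the lookup call removes the need for the `isinstance(e, ModelRetry)` re-raise used in the patch; whether that trade-off suits this codebase is a judgment call for the author.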