50 changes: 50 additions & 0 deletions docs/docs/user-guide/tools/function-call.md
@@ -296,6 +296,56 @@ This tool automatically discovers available functions from connected MCP servers

By integrating `AgentCallingTool` into your event-driven workflow, you can build sophisticated multi-agent systems where each agent can be invoked seamlessly via structured function calls. This approach maintains a clear separation between the LLM's orchestration and the agents' invoke details.

## Synthetic Tool

`SyntheticTool` extends `FunctionCallTool` to let LLMs generate synthetic or modeled data by leveraging another LLM as a data generator. Unlike traditional function call tools that execute predefined logic, `SyntheticTool` uses an LLM to produce plausible, schema-compliant outputs based on input specifications, which makes it well suited for testing, prototyping, or generating realistic mock data.

**Fields**:

| Field | Description |
|------------------------|--------------------------------------------------------------------------------------------------------------------|
| `name` | Descriptive identifier, defaults to `"SyntheticTool"`. |
| `type` | Tool type indicator, defaults to `"SyntheticTool"`. |
| `tool_name` | Name used for function registration and LLM tool calls. |
| `description` | Explanation of what synthetic data this tool generates. |
| `input_model` | Pydantic `BaseModel` class or JSON schema dict defining expected input structure. |
| `output_model` | Pydantic `BaseModel` class or JSON schema dict defining generated output structure. |
| `model` | OpenAI model to use for data generation (e.g., `"gpt-4o-mini"`). |
| `openai_api_key` | API key for OpenAI authentication. |
| `oi_span_type` | OpenInference semantic attribute (`TOOL`), enabling observability and traceability. |

**Methods**:

| Method | Description |
|------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `get_function_specs` | Returns the function specification (name, description, input schema) for the synthetic tool. |
| `invoke` | Processes tool calls, generates synthetic data via LLM, returns schema-compliant JSON responses. |
| `ensure_strict_schema` | Static method that recursively adds `additionalProperties: false` to JSON schemas for OpenAI strict mode compatibility. |
| `to_dict` | Serializes all relevant fields, including the tool name, schemas, and model, for debugging or persistence. |

**Workflow Example**:

1. **Schema Definition**: Define input and output schemas using either Pydantic models (type-safe Python) or JSON Schema dicts (flexible); a sketch follows this list.
2. **Function Registration**: The tool automatically generates the `FunctionSpec`, enabling LLMs to discover and call the tool.
3. **Tool Invocation**: When an LLM invokes the tool with arguments:
- Arguments are validated against `input_model` schema
- A prompt is constructed with input/output schema specifications
- LLM generates synthetic data conforming to `output_model`
4. **Structured Output**:
- **Pydantic Mode**: Uses OpenAI's `beta.chat.completions.parse()` with type safety
- **JSON Schema Mode**: Uses `chat.completions.create()` with `strict: True` for schema validation
5. **Response Handling**: Generated data is returned as a `Message` object linked to the original `tool_call_id`.
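
As a concrete sketch of this workflow using Pydantic models (the `WeatherQuery` and `WeatherReport` schemas below are illustrative, not part of the library):

```python
from pydantic import BaseModel

from grafi.tools.function_calls.impl.synthetic_tool import SyntheticTool


# Illustrative schemas; any Pydantic models (or JSON schema dicts) work here.
class WeatherQuery(BaseModel):
    city: str
    unit: str


class WeatherReport(BaseModel):
    city: str
    temperature: float
    conditions: str


# Building the tool registers its FunctionSpec automatically, so a connected
# LLM can discover and call "get_weather" like any other function tool.
weather_tool = (
    SyntheticTool.builder()
    .tool_name("get_weather")
    .description("Generates a plausible weather report for a city.")
    .input_model(WeatherQuery)
    .output_model(WeatherReport)
    .model("gpt-4o-mini")
    .openai_api_key("sk-...")  # placeholder; load from a secure source
    .build()
)
```

When the orchestrating LLM calls `get_weather`, the tool prompts the configured model with both schemas and returns a `WeatherReport`-shaped JSON payload tied to the originating `tool_call_id`.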

**Usage and Customization**:

- **Flexible Schema Definition**: By supporting both Pydantic models and JSON schemas, you can choose type-safe Python development or dynamic schema-based configuration without changing the rest of your workflow (see the sketch after this list).
- **Runtime Model Selection**: Easily swap between OpenAI models (e.g., `gpt-5-mini` for cost, `gpt-5` for quality) to balance generation quality and API costs without modifying tool logic.
- **Schema-Driven Generation**: Input and output schemas guide the LLM's data generation, ensuring consistent, validated outputs that conform to your exact specifications.
- **Composable Data Pipelines**: Chain multiple `SyntheticTool` instances where one tool's output becomes another's input, creating sophisticated data generation workflows.
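
For the JSON-schema path, a minimal sketch (the schemas and names below are illustrative): the builder accepts plain dicts, and `ensure_strict_schema` adds `additionalProperties: false` before the strict-mode request.

```python
from grafi.tools.function_calls.impl.synthetic_tool import SyntheticTool

# Illustrative JSON schemas defining the tool's input and output shapes.
user_query_schema = {
    "type": "object",
    "properties": {"count": {"type": "integer"}},
    "required": ["count"],
}
user_record_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["name", "email"],
}

mock_users = (
    SyntheticTool.builder()
    .tool_name("mock_users")
    .description("Generates a synthetic user record.")
    .input_model(user_query_schema)
    .output_model(user_record_schema)
    .model("gpt-4o-mini")
    .openai_api_key("sk-...")  # placeholder; load from a secure source
    .build()
)

# ensure_strict_schema is applied internally before the strict-mode call:
strict = SyntheticTool.ensure_strict_schema(user_record_schema)
assert strict["additionalProperties"] is False
```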

With `SyntheticTool`, you can rapidly prototype data-driven workflows without building actual data sources, while maintaining full schema compliance and type safety through Pydantic or JSON Schema validation.

## Best Practices

### Function Design
Expand Down
340 changes: 340 additions & 0 deletions grafi/tools/function_calls/impl/synthetic_tool.py
@@ -0,0 +1,340 @@
import inspect
import json
from typing import Any
from typing import AsyncGenerator
from typing import Dict
from typing import List

from openai import OpenAIError
from openinference.semconv.trace import OpenInferenceSpanKindValues
from pydantic import BaseModel
from pydantic import field_validator

from grafi.common.decorators.record_decorators import record_tool_invoke
from grafi.common.models.function_spec import FunctionSpec
from grafi.common.models.function_spec import ParametersSchema
from grafi.common.models.invoke_context import InvokeContext
from grafi.common.models.message import Message
from grafi.common.models.message import Messages
from grafi.tools.function_calls.function_call_tool import FunctionCallTool
from grafi.tools.function_calls.function_call_tool import FunctionCallToolBuilder


try:
from openai import AsyncOpenAI
except ImportError:
raise ImportError(
"`openai` not installed. Please install using `pip install openai`"
)


class SyntheticTool(FunctionCallTool):
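    """
    Function call tool that generates synthetic, schema-compliant data with an
    OpenAI model instead of executing predefined logic.

    input_model and output_model accept either a Pydantic BaseModel class or
    a JSON schema dict; see validate_pydantic_model_or_schema for details.
    """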
name: str = "SyntheticTool"
type: str = "SyntheticTool"
tool_name: str = ""
description: str = ""
input_model: Any = ""
output_model: Any = ""
model: str = ""
openai_api_key: str = ""
oi_span_type: OpenInferenceSpanKindValues = OpenInferenceSpanKindValues.TOOL

@field_validator("input_model", "output_model")
@classmethod
def validate_pydantic_model_or_schema(cls, v: Any, info) -> Any:
"""
Validate that input_model and output_model are either:
- A Pydantic BaseModel class (not instance) - for type-safe Python usage
- A JSON schema dict - for flexible schema definition
- An empty string (for optional models)

Both Pydantic models and JSON schemas are fully supported for LLM invocation
with strict validation enabled.

Args:
v: The value to validate
info: Pydantic validation info containing field name

Returns:
The validated value

Raises:
ValueError: If the value is not a valid type (e.g., int, str, instances)
"""
if v == "":
return v

if isinstance(v, dict):
return v

if inspect.isclass(v) and issubclass(v, BaseModel):
return v

field_name = info.field_name
raise ValueError(
f"{field_name} must be a Pydantic BaseModel class, "
f"a dict schema, or an empty string. "
f"Got: {type(v).__name__}"
)

def model_post_init(self, _context: Any) -> None:
if self.input_model:
# Handle both dict schemas and Pydantic models
if isinstance(self.input_model, dict):
input_schema = self.input_model
else:
input_schema = self.input_model.model_json_schema()

self.function_specs.append(
FunctionSpec(
name=self.tool_name,
description=self.description,
parameters=ParametersSchema(**input_schema),
)
)

@property
def input_schema(self) -> Dict[str, Any]:
"""Get input schema from Pydantic model."""
if self.input_model:
if isinstance(self.input_model, dict):
return self.input_model
return self.input_model.model_json_schema()
return {}

@property
def output_schema(self) -> Dict[str, Any]:
"""Get output schema from Pydantic model."""
if self.output_model:
if isinstance(self.output_model, dict):
return self.output_model
return self.output_model.model_json_schema()
return {}

@classmethod
def builder(cls) -> "SyntheticToolBuilder":
"""
Return a builder for SyntheticTool.
        This method allows for the construction of a SyntheticTool instance with specified parameters.
"""
return SyntheticToolBuilder(cls)

@record_tool_invoke
async def invoke(
self,
invoke_context: InvokeContext,
input_data: Messages,
) -> AsyncGenerator[Messages, None]:
"""
Invokes the synthetic tool by processing incoming tool calls and generating
LLM-based responses for each matching invocation.

Args:
invoke_context (InvokeContext): The context for this invocation.
input_data (Messages): A list of incoming messages that may contain tool calls.

Yields:
AsyncGenerator[Messages, None]: A stream of messages representing the
responses from the LLM for each valid tool call.

Raises:
ValueError: If no tool_calls are found in the input data.
"""
input_msg = input_data[0]
if input_msg.tool_calls is None:
raise ValueError("No tool_calls found for SyntheticTool invocation.")

messages: List[Message] = []

for tool_call in input_msg.tool_calls:
if tool_call.function.name != self.tool_name:
continue

args = json.loads(tool_call.function.arguments)
prompt = self._make_prompt(args)
response = await self._call_llm(prompt)
messages.extend(
self.to_messages(response=response, tool_call_id=tool_call.id)
)

yield messages

def _make_prompt(self, user_input: Dict[str, Any]) -> str:
"""Builds the synthetic execution prompt."""
return f"""
You are a synthetic tool named "{self.tool_name}".
Description: {self.description}

INPUT SCHEMA:
{json.dumps(self.input_schema, indent=2)}

OUTPUT SCHEMA:
{json.dumps(self.output_schema, indent=2)}

USER INPUT:
{json.dumps(user_input, indent=2)}

Return ONLY a JSON object that strictly conforms to the OUTPUT schema.
"""

@staticmethod
def ensure_strict_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
"""
Recursively ensure schema is compatible with OpenAI strict mode.

Adds 'additionalProperties': false to all objects, which is required
for OpenAI's structured outputs strict mode.

Args:
schema: JSON schema dict

Returns:
Modified schema with strict mode requirements
"""
schema = schema.copy()

if schema.get("type") == "object":
schema["additionalProperties"] = False
if "properties" in schema:
schema["properties"] = {
k: SyntheticTool.ensure_strict_schema(v)
for k, v in schema["properties"].items()
}
elif schema.get("type") == "array":
if "items" in schema:
schema["items"] = SyntheticTool.ensure_strict_schema(schema["items"])

return schema

async def _call_llm(self, prompt: str) -> str:
"""
Calls OpenAI with structured output.
Supports both Pydantic models and JSON schemas.
"""
try:
if not self.output_model:
raise ValueError("output_model must be set to call LLM")

client = AsyncOpenAI(api_key=self.openai_api_key)

            # If the output model is a JSON schema (dict)
if isinstance(self.output_model, dict):
# Ensure schema is compatible with strict mode
strict_schema = self.ensure_strict_schema(self.output_model)

response_format = {
"type": "json_schema",
"json_schema": {
"name": f"{self.tool_name}_output",
"schema": strict_schema,
"strict": True,
},
}

# Use standard chat completion (not parse)
completion = await client.chat.completions.create(
model=self.model,
messages=[{"role": "user", "content": prompt}],
response_format=response_format,
)

content = completion.choices[0].message.content
if not content:
return json.dumps({"error": "Empty response"})

return content

# If output model is pydantic model
else:
# Use Pydantic mode with parse
completion = await client.beta.chat.completions.parse(
model=self.model,
messages=[{"role": "user", "content": prompt}],
response_format=self.output_model,
)

parsed_response = completion.choices[0].message.parsed

if not parsed_response:
return json.dumps({"error": "Empty response"})

# Return as JSON string
return parsed_response.model_dump_json()

except OpenAIError as exc:
return json.dumps({"error": f"OpenAI API error: {str(exc)}"})

except Exception as e:
return json.dumps({"error": f"LLM call failed: {str(e)}"})

def to_dict(self) -> Dict[str, Any]:
"""
Convert the tool instance to a dictionary representation.

Returns:
Dict[str, Any]: A dictionary representation of the tool.
"""
return {
**super().to_dict(),
"tool_name": self.tool_name,
"description": self.description,
"input_schema": self.input_schema,
"output_schema": self.output_schema,
"model": self.model,
}

@classmethod
async def from_dict(cls, data: Dict[str, Any]) -> "SyntheticTool":
"""
Create a SyntheticTool instance from a dictionary representation.

Args:
data (dict[str, Any]): A dictionary representation of the SyntheticTool.

Returns:
SyntheticTool: A SyntheticTool instance created from the dictionary.

Note:
The client needs to be recreated with an API key from environment
or other secure source as API keys are masked in serialization.
"""
return (
cls.builder()
.tool_name(data.get("tool_name", "synthetic_tool"))
.description(data.get("description", ""))
.input_model(data.get("input_schema", {}))
.output_model(data.get("output_schema", {}))
.model(data.get("model", "gpt-5-mini"))
.openai_api_key(data.get("openai_api_key", ""))
.oi_span_type(OpenInferenceSpanKindValues(data.get("oi_span_type", "TOOL")))
.build()
)


class SyntheticToolBuilder(FunctionCallToolBuilder[SyntheticTool]):
"""Builder for SyntheticTool instances."""

def tool_name(self, name: str) -> "SyntheticToolBuilder":
self.kwargs["tool_name"] = name
self.kwargs["name"] = name
return self

def description(self, desc: str) -> "SyntheticToolBuilder":
self.kwargs["description"] = desc
return self

    def input_model(self, model: Any) -> "SyntheticToolBuilder":
        """Accept a Pydantic BaseModel class or a JSON schema dict."""
        self.kwargs["input_model"] = model
        return self

    def output_model(self, model: Any) -> "SyntheticToolBuilder":
        """Accept a Pydantic BaseModel class or a JSON schema dict."""
        self.kwargs["output_model"] = model
        return self

def model(self, model: str) -> "SyntheticToolBuilder":
self.kwargs["model"] = model
return self

def openai_api_key(self, openai_api_key: str) -> "SyntheticToolBuilder":
self.kwargs["openai_api_key"] = openai_api_key
return self