OpenChatGPT is a production-ready, fully open-source, on-device AI system designed as a professional-grade ChatGPT-style assistant. It combines agentic reasoning, real-time web search, session-based memory, and a clean iMessage-inspired interface to deliver accurate, contextual, and polished conversational experiences.
The project demonstrates that high-quality, agentic AI experiences can be achieved entirely on local infrastructure. By combining stateful reasoning, real-time information retrieval, and a polished conversational interface, OpenChatGPT provides a practical, transparent alternative to proprietary AI platforms.
OpenChatGPT is built from the ground up with openness and user control as first-class principles:
- 100% open-source stack spanning UI, backend, orchestration, and model runtime
- Runs entirely on-device, keeping prompts, context, and reasoning local
- No SaaS model APIs, subscriptions, or hidden inference costs
- Full inspectability of agent logic, memory flow, and tool usage
This makes OpenChatGPT suitable for developers, researchers, and organizations that require transparency, data ownership, and long-term maintainability.
Despite being fully local and open, OpenChatGPT is engineered to deliver outcomes on par with commercial AI services:
- Large-scale local model inference via gpt-oss:20b
- Agentic reasoning with LangGraph for structured task completion
- Real-time web search and synthesis using Tavily
- Multi-turn session memory with consistent contextual awareness
- Professional, tool-agnostic response formatting
The result is a system capable of research, analysis, and conversational assistance that closely mirrors the experience of paid, cloud-hosted AI platforms.
OpenChatGPT does not rely on external model providers or remote inference:
- All LLM reasoning occurs locally through Ollama
- Memory and session state are stored in-process
- External calls are limited to optional search APIs, with no prompt leakage
- API keys are managed locally without environment exposure
This architecture ensures maximum privacy while preserving modern AI capabilities.
**Agentic core**
- Implemented using LangGraph
- Stateful, single-agent loop for structured task execution
- Designed for reliable multi-step reasoning and completion
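A minimal sketch of this loop, assuming LangGraph's prebuilt `create_react_agent` and the `langchain-ollama` bindings (the repository's actual graph wiring may differ):

```python
from langchain_ollama import ChatOllama
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

# Local model via Ollama; no remote inference.
llm = ChatOllama(model="gpt-oss:20b")

# A single stateful agent node; the checkpointer persists each turn,
# and tools (e.g., web search) are registered here in the full system.
agent = create_react_agent(llm, tools=[], checkpointer=MemorySaver())

result = agent.invoke(
    {"messages": [("user", "Outline the steps to answer a research question.")]},
    config={"configurable": {"thread_id": "demo-thread"}},
)
print(result["messages"][-1].content)
```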
**Local model runtime**
- Powered by gpt-oss:20b via Ollama
- Local inference with high-quality, large-context responses
- No dependency on hosted proprietary model APIs
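Inference itself is a single call against the local Ollama server; a sketch, again assuming the `langchain-ollama` bindings:

```python
from langchain_ollama import ChatOllama

# Talks to the local Ollama runtime (default http://localhost:11434);
# prompts and completions never leave the machine.
llm = ChatOllama(model="gpt-oss:20b", temperature=0)

reply = llm.invoke("Explain agentic reasoning in two sentences.")
print(reply.content)
```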
**Real-time web search**
- Integrated Tavily API for live information retrieval
- Tool usage is fully abstracted from the user
- Responses are synthesized in a natural, professional tone
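A sketch of the underlying retrieval call, assuming the `tavily-python` client; in the running system the agent invokes this as a tool and rewrites the raw hits into a natural answer:

```python
from pathlib import Path

from tavily import TavilyClient

# API key is read from a local file, not from the environment.
client = TavilyClient(api_key=Path("TavilyKey.txt").read_text().strip())

# Raw search hits; the agent synthesizes these, so the user never sees them.
results = client.search("current price of SPY", max_results=3)
for hit in results["results"]:
    print(hit["title"], "-", hit["url"])
```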
**Session memory**
- Thread-based memory using an in-memory saver
- Maintains conversational context across multiple turns
- Supports multi-session usage without cross-contamination
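A sketch of how `thread_id` scopes the context, reusing the agent construction from above:

```python
from langchain_ollama import ChatOllama
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

agent = create_react_agent(
    ChatOllama(model="gpt-oss:20b"), tools=[], checkpointer=MemorySaver()
)

# Same thread_id: the saver replays earlier turns into the context.
cfg_a = {"configurable": {"thread_id": "session-a"}}
agent.invoke({"messages": [("user", "My name is Dana.")]}, config=cfg_a)
out = agent.invoke({"messages": [("user", "What is my name?")]}, config=cfg_a)
print(out["messages"][-1].content)  # recalls "Dana"

# Different thread_id: a fresh session, no cross-contamination.
cfg_b = {"configurable": {"thread_id": "session-b"}}
out = agent.invoke({"messages": [("user", "What is my name?")]}, config=cfg_b)
print(out["messages"][-1].content)  # has no prior context
```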
**Security**
- Secure, local API key loading from `TavilyKey.txt` and `LangSmithKey.txt`
- No hard-coded secrets or environment variable leakage
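A sketch of the file-based key loading implied above (the helper name is illustrative):

```python
from pathlib import Path

def load_key(filename: str) -> str:
    """Read an API key from a local text file in the project root."""
    # Keys never touch os.environ, so they cannot leak via the environment.
    return Path(filename).read_text(encoding="utf-8").strip()

tavily_key = load_key("TavilyKey.txt")
langsmith_key = load_key("LangSmithKey.txt")
```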
**User interface**
- Clean, text-only chat interface built with Streamlit
- Right-aligned green user message bubbles
- Left-aligned blue AI message bubbles
- No icons or visual clutter
- Custom auto-scrolling for smooth conversational flow
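A sketch of the bubble styling, assuming HTML/CSS injected through `st.markdown` (class names and exact colors are illustrative):

```python
import streamlit as st

# iMessage-style bubbles: green/right for the user, blue/left for the AI.
st.markdown("""
<style>
.user-bubble { background:#34c759; color:#fff; padding:8px 12px;
               border-radius:16px; float:right; clear:both; max-width:70%; }
.ai-bubble   { background:#0a84ff; color:#fff; padding:8px 12px;
               border-radius:16px; float:left;  clear:both; max-width:70%; }
</style>
""", unsafe_allow_html=True)

if "history" not in st.session_state:
    st.session_state.history = []

for role, text in st.session_state.history:
    css = "user-bubble" if role == "user" else "ai-bubble"
    st.markdown(f'<div class="{css}">{text}</div>', unsafe_allow_html=True)

if prompt := st.chat_input("Message"):
    st.session_state.history.append(("user", prompt))
    st.rerun()
```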
**Backend**
- FastAPI-based backend service
- Hot-reloading enabled for rapid iteration
- Clean separation between UI, API, and agent logic
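A sketch of the API layer; the `/chat` route and payload shape are illustrative, not necessarily the repository's actual contract:

```python
from fastapi import FastAPI
from langchain_ollama import ChatOllama
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from pydantic import BaseModel

app = FastAPI()
agent = create_react_agent(
    ChatOllama(model="gpt-oss:20b"), tools=[], checkpointer=MemorySaver()
)

class ChatRequest(BaseModel):
    thread_id: str
    message: str

@app.post("/chat")
def chat(req: ChatRequest) -> dict:
    # The API layer stays thin: it only forwards to the agent.
    result = agent.invoke(
        {"messages": [("user", req.message)]},
        config={"configurable": {"thread_id": req.thread_id}},
    )
    return {"reply": result["messages"][-1].content}
```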
End-to-end request flow from user input to synthesized response.
```mermaid
flowchart TD
%% Nodes
U[User]
UI[Streamlit Frontend<br/>iMessage-style UI]
API[FastAPI Backend<br/>API Layer]
AGENT[LangGraph Agent<br/>Stateful Loop]
LLM[Ollama<br/>gpt-oss:20b]
SEARCH[Tavily API<br/>Real-Time Search]
MEM[In-Memory Saver<br/>Session Context]
%% Flow
U --> UI --> API --> AGENT
AGENT --> LLM
AGENT --> SEARCH
LLM --> MEM
SEARCH --> MEM
MEM --> AGENT
%% Styling
classDef user fill:#f5f5f5,stroke:#333,stroke-width:1px;
classDef ui fill:#e3f2fd,stroke:#1565c0,stroke-width:1.5px;
classDef backend fill:#ede7f6,stroke:#5e35b1,stroke-width:1.5px;
classDef agent fill:#e8f5e9,stroke:#2e7d32,stroke-width:1.5px;
classDef model fill:#fff3e0,stroke:#ef6c00,stroke-width:1.5px;
classDef tool fill:#fce4ec,stroke:#ad1457,stroke-width:1.5px;
classDef memory fill:#e0f2f1,stroke:#00695c,stroke-width:1.5px;
class U user;
class UI ui;
class API backend;
class AGENT agent;
class LLM model;
class SEARCH tool;
class MEM memory;
```
Single-agent, stateful reasoning and tool orchestration.
```mermaid
flowchart TD
INPUT[User Input]
AGENT[LangGraph Agent<br/>Stateful Node]
THINK[Reasoning Step<br/>Plan / Decide]
LLM[LLM Inference<br/>Ollama]
TOOL[Web Search<br/>Tavily]
MEMORY[Memory Update<br/>thread_id]
INPUT --> AGENT --> THINK
THINK --> LLM
THINK --> TOOL
LLM --> MEMORY
TOOL --> MEMORY
MEMORY --> AGENT
%% Styling
classDef input fill:#f5f5f5,stroke:#333,stroke-width:1px;
classDef agent fill:#e8f5e9,stroke:#2e7d32,stroke-width:1.5px;
classDef process fill:#ede7f6,stroke:#5e35b1,stroke-width:1.5px;
classDef model fill:#fff3e0,stroke:#ef6c00,stroke-width:1.5px;
classDef tool fill:#fce4ec,stroke:#ad1457,stroke-width:1.5px;
classDef memory fill:#e0f2f1,stroke:#00695c,stroke-width:1.5px;
class INPUT input;
class AGENT agent;
class THINK process;
class LLM model;
class TOOL tool;
class MEMORY memory;
```
Local-first execution with clear separation of concerns.
```mermaid
flowchart TD
DEV[Developer Machine]
FE[Streamlit Frontend<br/>app.py]
BE[FastAPI Backend<br/>main.py]
OLLAMA[Ollama Runtime<br/>gpt-oss:20b]
SECRETS[Local Secrets]
TAVILY[TavilyKey.txt]
LANGSMITH[LangSmithKey.txt]
DEV --> FE
DEV --> BE
DEV --> OLLAMA
DEV --> SECRETS
SECRETS --> TAVILY
SECRETS --> LANGSMITH
%% Styling
classDef host fill:#f5f5f5,stroke:#333,stroke-width:1px;
classDef ui fill:#e3f2fd,stroke:#1565c0,stroke-width:1.5px;
classDef backend fill:#ede7f6,stroke:#5e35b1,stroke-width:1.5px;
classDef model fill:#fff3e0,stroke:#ef6c00,stroke-width:1.5px;
classDef secrets fill:#fce4ec,stroke:#ad1457,stroke-width:1.5px;
class DEV host;
class FE ui;
class BE backend;
class OLLAMA model;
class SECRETS,TAVILY,LANGSMITH secrets;
```
Ensure the following are installed and configured:
- Python 3.10 or later
- Ollama, with the required model pulled: `ollama pull gpt-oss:20b`
- API keys stored locally in `TavilyKey.txt` and `LangSmithKey.txt`
- Start the backend with `python main.py`; it will run at `http://localhost:8000`
- Start the frontend with `streamlit run app.py`; it will be available at `http://localhost:8501`
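A quick smoke test against the running backend, assuming the illustrative `/chat` endpoint sketched earlier:

```python
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"thread_id": "smoke-test", "message": "Hello, are you up?"},
    timeout=120,  # local 20B inference can take a while on first load
)
resp.raise_for_status()
print(resp.json()["reply"])
```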
- Verified that the agent can perform live searches (e.g., “What’s the current price of SPY?”)
- Responses are synthesized cleanly without exposing internal tool usage
- Output maintains a professional, authoritative tone
- Confirmed persistent conversational context across multiple turns
- Thread-based `thread_id` system reliably maintains state
- No memory leakage between sessions
- Developers seeking a drop-in open-source alternative to paid AI assistants
- Researchers exploring agentic AI systems without cloud constraints
- Organizations requiring on-device inference for security or compliance
- Builders looking for a reference implementation of local-first AI design