An intelligent conversational interface for querying and visualizing ARGO oceanographic float data using RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol).
Data Flow: Ingest Argo NetCDF → normalize & store (Postgres + Parquet) → index metadata & embeddings (FAISS/Chroma) → RAG + MCP translator (LLM) → Backend APIs → Interactive dashboard + Chat UI (Streamlit) + visualizations (Plotly/Leaflet)
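As a concrete illustration of the first two stages, here is a minimal sketch of normalizing one single-profile Argo NetCDF file into a Parquet analytics copy with xarray and pandas. The variable names (TEMP, PSAL, PRES, LATITUDE, LONGITUDE, JULD) follow the Argo core-profile convention, the file paths are illustrative, and the real logic lives in ingestion/pipeline.py.

```python
# Sketch: normalize one single-profile Argo NetCDF file into Parquet.
# Variable names follow the Argo core-profile convention; file paths are
# illustrative. The real pipeline lives in ingestion/pipeline.py.
import pandas as pd
import xarray as xr

def netcdf_to_parquet(nc_path: str, out_path: str) -> None:
    ds = xr.open_dataset(nc_path)
    df = pd.DataFrame({
        "pressure": ds["PRES"].values.ravel(),      # dbar
        "temperature": ds["TEMP"].values.ravel(),   # degrees C
        "salinity": ds["PSAL"].values.ravel(),      # PSU
    })
    # Broadcast the per-profile coordinates across every measurement level
    df["latitude"] = float(ds["LATITUDE"].values[0])
    df["longitude"] = float(ds["LONGITUDE"].values[0])
    df["time"] = pd.Timestamp(ds["JULD"].values[0])
    df.dropna().to_parquet(out_path, index=False)   # analytics copy via PyArrow

netcdf_to_parquet("data/raw/R2902123_001.nc",
                  "data/processed/2902123_001.parquet")
```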

- ARGO Data Ingestion: Parse NetCDF files and normalize to structured formats
- Dual Storage: PostgreSQL with PostGIS for spatial queries + Parquet for analytics
- Vector Search: FAISS/Chroma for semantic retrieval of profiles and metadata (see the sketch after this list)
- RAG + MCP: LLM-powered natural language to SQL translation with structured outputs
- FastAPI Backend: RESTful APIs for chat, queries, and data access
- Interactive Frontend: Streamlit dashboard with chat, maps, and visualizations
- Geospatial Viz: Leaflet maps for float trajectories, Plotly for profiles
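A minimal sketch of the vector-search step with ChromaDB, assuming profile summaries are indexed as documents; the collection name and metadata fields here are illustrative, not the project's actual schema.

```python
# Sketch: index profile summaries in ChromaDB and retrieve candidates for a
# user question before SQL generation. Collection name and metadata fields
# are illustrative, not the project's actual schema.
import chromadb

client = chromadb.PersistentClient(path="./data/embeddings/chroma")
profiles = client.get_or_create_collection("argo_profiles")

# Each profile is stored as a short natural-language summary plus metadata
profiles.add(
    ids=["2902123_001"],
    documents=["Float 2902123, Indian Ocean, 2023-03-15: T/S profile to 2000 dbar"],
    metadatas=[{"float_id": "2902123", "ocean": "indian", "date": "2023-03-15"}],
)

# Semantic retrieval of the most relevant profiles for a question
hits = profiles.query(
    query_texts=["salinity near the equator in March 2023"],
    n_results=5,
)
print(hits["ids"], hits["distances"])
```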
Project structure:

```
bluepy/
├── backend/
│   ├── api/              # FastAPI endpoints
│   ├── core/             # Core business logic
│   ├── db/               # Database models and connections
│   ├── rag/              # RAG + MCP implementation
│   └── main.py           # FastAPI app entry point
├── frontend/
│   ├── app.py            # Streamlit main app
│   ├── components/       # UI components
│   └── utils/            # Frontend utilities
├── ingestion/
│   ├── parsers/          # NetCDF parsers
│   ├── normalizers/      # Data normalization
│   └── pipeline.py       # Ingestion pipeline
├── data/
│   ├── raw/              # Raw NetCDF files
│   ├── processed/        # Parquet files
│   └── embeddings/       # Vector DB storage
├── tests/                # Unit and integration tests
├── docker/               # Docker configurations
├── scripts/              # Utility scripts
├── requirements.txt      # Python dependencies
├── .env.example          # Environment variables template
└── README.md
```
Prerequisites:

- Python 3.10+
- PostgreSQL 14+ with PostGIS extension
- Docker (optional, for containerized deployment)
Setup:

- Clone and navigate to the project:

```bash
cd bluepy
```

- Create virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up environment variables:

```bash
cp .env.example .env
# Edit .env with your configuration
```

- Initialize database:

```bash
python scripts/init_db.py
```

- Run data ingestion (example):

```bash
python ingestion/pipeline.py --input data/raw --output data/processed
```

Backend API:

```bash
uvicorn backend.main:app --reload --port 8000
```

Frontend Dashboard:

```bash
streamlit run frontend/app.py --server.port 8501
```

Access the application at http://localhost:8501.
- "Show me salinity profiles near the equator in March 2023"
- "What's the average temperature at 500m depth in the Indian Ocean?"
- "Find floats with anomalous oxygen levels in the last 6 months"
- "Plot temperature vs depth for float 2902123"
API Endpoints:

- `POST /chat` - Conversational interface
- `POST /sql/execute` - Execute validated SQL queries
- `GET /profile/{id}` - Get specific profile details
- `GET /floats` - List all floats with filters
- `GET /map/geojson` - Get trajectory data for mapping
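A quick way to exercise the chat endpoint from Python; the request and response fields shown are assumptions about the payload, not a documented contract.

```python
# Sketch: exercise the chat endpoint from Python. The request and response
# fields are assumptions, not a documented contract.
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    json={"message": "Show me salinity profiles near the equator in March 2023"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # e.g. an answer plus any table/plot payloads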
Key environment variables in .env:
```env
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/argo_db
POSTGRES_USER=argo_user
POSTGRES_PASSWORD=secure_password
POSTGRES_DB=argo_db

# LLM Configuration
OPENAI_API_KEY=your_api_key_here
LLM_MODEL=gpt-4
EMBEDDING_MODEL=text-embedding-3-small

# Vector DB
VECTOR_DB_TYPE=chroma  # or faiss
CHROMA_PERSIST_DIR=./data/embeddings/chroma

# API
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:8501

# Frontend
STREAMLIT_SERVER_PORT=8501
MAP_PROVIDER=leaflet
```
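At startup the services read these variables from the environment; a minimal sketch, assuming python-dotenv (any loader works) and the variable names from .env.example:

```python
# Sketch: read the configuration at startup. python-dotenv is an assumption
# (any loader works); the variable names match .env.example.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

DATABASE_URL = os.environ["DATABASE_URL"]               # required
VECTOR_DB_TYPE = os.getenv("VECTOR_DB_TYPE", "chroma")  # optional, defaulted
API_PORT = int(os.getenv("API_PORT", "8000"))
```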
Testing:

```bash
pytest tests/ -v --cov=backend --cov=ingestion
```

Code quality:

```bash
# Linting
flake8 backend/ ingestion/ frontend/

# Type checking
mypy backend/ ingestion/

# Formatting
black backend/ ingestion/ frontend/
```

Deployment:

```bash
# Docker Compose
docker-compose up -d

# Kubernetes
kubectl apply -f k8s/
```

Database schema:

- `argo_profile` - Main profile data table with spatial indexing
- `argo_profile_meta` - Profile metadata and summaries
- `argo_float` - Float information and trajectories
See backend/db/schema.sql for complete schema definitions.
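For orientation only, a hedged SQLAlchemy sketch of what the profile table could look like; the column names and types are guesses, GeoAlchemy2 (not listed in the stack) is assumed for the PostGIS geometry column, and backend/db/schema.sql remains authoritative.

```python
# Illustrative SQLAlchemy model of the profile table; column names and types
# are guesses for orientation only, and backend/db/schema.sql is authoritative.
# GeoAlchemy2 (not listed in the stack) is assumed for the PostGIS column.
from geoalchemy2 import Geometry
from sqlalchemy import BigInteger, Column, DateTime, Float, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ArgoProfile(Base):
    __tablename__ = "argo_profile"

    profile_id = Column(BigInteger, primary_key=True)
    float_id = Column(Text, index=True)
    juld = Column(DateTime(timezone=True), index=True)  # profile timestamp
    geom = Column(Geometry("POINT", srid=4326))         # GiST index by default
    pres = Column(Float)  # dbar
    temp = Column(Float)  # degrees C
    psal = Column(Float)  # PSU
```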
Tech stack:

- Backend: FastAPI, SQLAlchemy, psycopg2
- Database: PostgreSQL + PostGIS, Parquet (PyArrow)
- Vector DB: ChromaDB / FAISS
- LLM: OpenAI GPT-4 / Anthropic Claude
- Frontend: Streamlit, Plotly, Folium/Leaflet
- Data Processing: xarray, netCDF4, pandas, numpy
- Deployment: Docker, Docker Compose, Kubernetes
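To show how these pieces fit together on the frontend, a minimal Folium sketch that draws float trajectories from the /map/geojson endpoint; that the endpoint returns a GeoJSON FeatureCollection of LineStrings is my assumption, not the API's documented shape.

```python
# Sketch: draw float trajectories on a Leaflet map via Folium. Assumes
# /map/geojson returns a GeoJSON FeatureCollection of LineStrings (an
# assumption about the payload, not its documented shape).
import folium
import requests

geojson = requests.get("http://localhost:8000/map/geojson", timeout=30).json()

m = folium.Map(location=[0.0, 80.0], zoom_start=3, tiles="CartoDB positron")
folium.GeoJson(geojson, name="float trajectories").add_to(m)
m.save("trajectories.html")  # or embed in Streamlit, e.g. via streamlit-folium
```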
License: MIT - see the LICENSE file for details.
Acknowledgments:

- ARGO Program for oceanographic data
- OpenAI for LLM capabilities
- Streamlit community for an excellent framework