py-demo-api

A production-ready Python FastAPI server for LLM chat interactions using LangChain. It supports multiple providers (OpenAI, Anthropic, Google), streaming and batched responses, and intelligent in-memory caching.

🚀 Features

  • Multi-Provider Support: OpenAI, Anthropic (Claude), Google (Gemini)
  • OpenAI-Compatible API: Drop-in replacement for OpenAI API endpoints
  • Streaming & Batched Responses: Real-time streaming or complete responses
  • Smart Caching: In-memory cache with TTL for identical prompts/history
  • LangChain Integration: Leverages LangChain for robust LLM interactions
  • Type-Safe: Full Pydantic validation for requests and responses
  • Docker Ready: Containerized for easy cloud deployment
  • Production-Ready: Health checks, CORS, proper error handling

📋 What's Included

py-demo-api/
├── app/
│   ├── __init__.py           # Package initialization
│   ├── main.py               # FastAPI application entry point
│   ├── config.py             # Configuration settings
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes.py         # API endpoints (chat, health, cache)
│   ├── models/
│   │   ├── __init__.py
│   │   └── chat.py           # Request/response models
│   ├── services/
│   │   ├── __init__.py
│   │   └── llm_service.py    # LLM provider integration
│   └── utils/
│       ├── __init__.py
│       └── cache.py          # In-memory caching implementation
├── requirements.txt          # Python dependencies
├── Dockerfile                # Docker image definition
├── docker-compose.yml        # Docker Compose configuration
├── .gitignore                # Git ignore rules
└── README.md                 # This file

🛠️ Quick Start

Prerequisites

  • Python 3.11+
  • pip or poetry
  • API keys for desired providers (OpenAI, Anthropic, or Google)

Local Setup

  1. Clone and navigate to the directory:

    cd py-demo-api
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure environment variables: Create a .env file in the root directory:

    # Required: At least one provider API key
    OPENAI_API_KEY=sk-your-openai-key-here
    ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
    GOOGLE_API_KEY=your-google-key-here
    
    # Optional: Configuration
    DEFAULT_PROVIDER=openai
    DEFAULT_MODEL=gpt-4o-mini
    CACHE_MAX_SIZE=1000
    CACHE_TTL_SECONDS=3600
  5. Run the server:

    uvicorn app.main:app --reload

    Or using the Python script:

    python -m app.main
  6. Access the API:

     • API base URL: http://localhost:8000
     • Interactive docs: http://localhost:8000/docs
     • Health check: http://localhost:8000/api/health

🐳 Docker Deployment

Using Docker Compose (Recommended)

  1. Create .env file with your API keys (see above)

  2. Start the service:

    docker-compose up -d
  3. View logs:

    docker-compose logs -f
  4. Stop the service:

    docker-compose down

Using Docker directly

  1. Build the image:

    docker build -t py-demo-api .
  2. Run the container:

    docker run -d \
      -p 8000:8000 \
      -e OPENAI_API_KEY=your-key \
      -e ANTHROPIC_API_KEY=your-key \
      --name py-demo-api \
      py-demo-api

📑 API Usage

OpenAI-Compatible Endpoint

Batched Response:

curl -X POST "http://localhost:8000/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "temperature": 0.7
  }'

Streaming Response:

curl -X POST "http://localhost:8000/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true,
    "provider": "openai"
  }'

Python Client Example

import requests

# Batched response
response = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"}
        ],
        "provider": "openai",
        "temperature": 0.7
    }
)

data = response.json()
print(data["message"]["content"])
print(f"Cached: {data['cached']}")

# Streaming response
response = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Count to 10"}],
        "stream": True
    },
    stream=True
)

for line in response.iter_lines():
    if line:
        print(line.decode())
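
The request and response bodies are validated with Pydantic (app/models/chat.py). The actual models may differ; the following is a minimal sketch consistent with the fields used in the examples above (any field or default not shown in those examples is an assumption):

from typing import List, Optional

from pydantic import BaseModel


class Message(BaseModel):
    role: str                         # "system", "user", or "assistant"
    content: str


class ChatRequest(BaseModel):
    messages: List[Message]
    provider: Optional[str] = None    # falls back to DEFAULT_PROVIDER
    model: Optional[str] = None       # falls back to DEFAULT_MODEL
    temperature: float = 0.7
    max_tokens: Optional[int] = None
    stream: bool = False


class ChatResponse(BaseModel):
    message: Message                  # read as data["message"]["content"] above
    cached: bool = False              # True when served from the in-memory cache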

🎯 Supported Providers & Models

OpenAI (Latest as of Nov 2025)

  • gpt-4o - Latest GPT-4 Optimized model ⭐
  • gpt-4o-mini - Faster, cheaper GPT-4 (recommended for most use cases) ⭐
  • gpt-4-turbo - Previous generation GPT-4 Turbo
  • gpt-4 - Standard GPT-4
  • gpt-3.5-turbo - Legacy, cheaper option

Anthropic Claude (Latest as of Nov 2025)

  • claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet ⭐
  • claude-3-5-sonnet - Alias for latest 3.5 Sonnet
  • claude-3-opus-20240229 - Most capable Claude 3 model
  • claude-3-opus - Alias for Claude 3 Opus
  • claude-3-sonnet-20240229 - Balanced Claude 3 model
  • claude-3-sonnet - Alias for Claude 3 Sonnet
  • claude-3-haiku-20240307 - Fastest, most affordable Claude 3
  • claude-3-haiku - Alias for Claude 3 Haiku

Google Gemini (Latest as of Nov 2025)

  • gemini-1.5-pro - Most capable Gemini model ⭐
  • gemini-1.5-flash - Faster, more affordable Gemini ⭐
  • gemini-pro - Previous generation Gemini
  • gemini-pro-vision - Previous generation with vision support

⭐ = Recommended models

💾 Caching

The API implements intelligent in-memory caching:

  • Cache Key: Hash of messages + model + provider + temperature + max_tokens
  • TTL: Configurable (default: 1 hour)
  • Max Size: Configurable (default: 1000 entries)
  • Behavior: Identical requests return cached responses instantly

Clear cache manually:

curl -X POST "http://localhost:8000/api/cache/clear"

πŸ—οΈ Architecture

Request Flow

  1. Client sends chat request
  2. Cache lookup by request hash
  3. If cached → return immediately
  4. If not → LangChain → LLM Provider
  5. Cache response for future use
  6. Return to client
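
Step 4 is handled by LangChain's chat-model classes, which read the corresponding *_API_KEY from the environment. A minimal sketch of the provider dispatch and of batched vs. streaming calls (illustrative only; the real logic lives in app/services/llm_service.py and may differ):

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI


def build_chat_model(provider: str, model: str, temperature: float = 0.7):
    # Map the request's "provider" field to a LangChain chat model
    if provider == "openai":
        return ChatOpenAI(model=model, temperature=temperature)
    if provider == "anthropic":
        return ChatAnthropic(model=model, temperature=temperature)
    if provider == "google":
        return ChatGoogleGenerativeAI(model=model, temperature=temperature)
    raise ValueError(f"Unsupported provider: {provider}")


llm = build_chat_model("openai", "gpt-4o-mini")

# Batched: one complete response (the cache is consulted before this call
# and populated afterwards)
reply = llm.invoke([("user", "What is the capital of France?")])
print(reply.content)

# Streaming: chunks are forwarded to the client as they arrive
for chunk in llm.stream([("user", "Tell me a story")]):
    print(chunk.content, end="", flush=True)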

Key Components

  • FastAPI: Modern async web framework
  • LangChain: LLM orchestration and provider abstraction
  • Pydantic: Data validation and settings management
  • cachetools: TTL-based in-memory caching

🔧 Configuration

All configuration is done via environment variables (see .env):

Variable             Description             Default
-------------------  ----------------------  -----------
OPENAI_API_KEY       OpenAI API key          None
ANTHROPIC_API_KEY    Anthropic API key       None
GOOGLE_API_KEY       Google API key          None
DEFAULT_PROVIDER     Default LLM provider    openai
DEFAULT_MODEL        Default model name      gpt-4o-mini
CACHE_MAX_SIZE       Max cached responses    1000
CACHE_TTL_SECONDS    Cache entry lifetime    3600
HOST                 Server host             0.0.0.0
PORT                 Server port             8000
ENVIRONMENT          Environment mode        development
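
app/config.py loads these at startup. A minimal sketch of how such a mapping is commonly expressed, assuming pydantic-settings (the actual class may differ):

from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Environment variable names match the table above (case-insensitive),
    # and values can also come from a .env file
    model_config = SettingsConfigDict(env_file=".env")

    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None
    google_api_key: Optional[str] = None

    default_provider: str = "openai"
    default_model: str = "gpt-4o-mini"
    cache_max_size: int = 1000
    cache_ttl_seconds: int = 3600
    host: str = "0.0.0.0"
    port: int = 8000
    environment: str = "development"


settings = Settings()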

🧪 Testing

Check health endpoint:

curl http://localhost:8000/api/health

Expected response:

{
  "status": "healthy",
  "cache_stats": {
    "size": 0,
    "max_size": 1000,
    "ttl_seconds": 3600
  }
}

🚀 Cloud Deployment

This API is ready for cloud deployment on:

  • AWS: ECS, EKS, or EC2 with Docker
  • Google Cloud: Cloud Run, GKE, or Compute Engine
  • Azure: Container Instances, AKS, or App Service
  • DigitalOcean: App Platform or Droplets
  • Railway, Render, Fly.io: Direct Docker deployment

Deployment Checklist

  • Set production environment variables
  • Configure CORS for your domain (see the sketch after this list)
  • Set up monitoring and logging
  • Enable HTTPS/TLS
  • Implement rate limiting (if needed)
  • Set appropriate cache sizes
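
For the CORS item, FastAPI's built-in middleware is usually sufficient; a minimal sketch (the allowed origin below is a placeholder for your own frontend domain):

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Restrict CORS to the domains that are allowed to call this API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.example.com"],  # placeholder domain
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)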

πŸ“ License

MIT License - Feel free to use in your projects!

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.


Need help? Check the API docs at /docs or open an issue on GitHub.
