A hands-on, structured guide to Ollama covering local LLM usage, LangChain integration, REST APIs, tool calling, Ollama Cloud, and advanced experiments with models like Gemma 3 (4B), including vision and multimodal workflows.
Ollama is a platform that allows you to run, manage, and experiment with large language models (LLMs) locally and in the cloud. It provides a simple CLI, REST API, and library support to work with modern open-source models like LLaMA, Gemma, Qwen, Mistral, and more.
Key idea: Local-first AI with optional cloud scalability.
- 🚀 Run LLMs locally (privacy + speed)
- 🔒 No data leaves your machine (local mode)
- 🧩 Easy integration with Python, LangChain, REST APIs
- 🛠️ Built-in support for tool calling & function execution
- ☁️ Ollama Cloud for scalable inference
- 🧪 Perfect for learning, experimentation, and R&D
```
.
├── Ollama.ipynb                  # Core Ollama usage & CLI experiments
├── Ollama using Rest API.ipynb   # REST API calls (generate, chat, models)
├── Ollama Using LangChain.ipynb  # LangChain + Ollama integration
├── Tool Calling.ipynb            # Function / tool calling with Ollama
├── Ollama Cloud.ipynb            # Cloud-based inference concepts
├── Modelfile.txt                 # Custom model configuration
├── Ollama Short notes.docx       # Quick theory notes
└── README.md                     # This documentation
```
```bash
# Check Ollama version
ollama --version

# List available models
ollama list

# Pull a model
ollama pull gemma3:4b

# Run a model interactively
ollama run gemma3:4b

# Run with a prompt
ollama run gemma3:4b "Explain LLMs in simple words"

# Remove a model
ollama rm gemma3:4b
```

```bash
# Generate text
curl http://localhost:11434/api/generate \
-d '{"model": "gemma3:4b", "prompt": "What is GenAI?"}'
# Chat API
curl http://localhost:11434/api/chat \
-d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Hello"}]}'
# List models via API
curl http://localhost:11434/api/tags
```

```python
from ollama import chat

response = chat(
    model="gemma3:4b",
    messages=[{"role": "user", "content": "Explain tool calling"}],
)
print(response["message"]["content"])
```

```python
from langchain_community.llms import Ollama

llm = Ollama(model="gemma3:4b")
print(llm.invoke("What is RAG?"))
```

- Define tools (functions)
- Pass the tool schema to the model (see the sketch after this list)
- Model decides when to call tools
Used for:
- Calculations
- API calls
- Database queries
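A tool schema is just a JSON description of a function the model may call. Below is a minimal sketch of the shape Ollama's chat API accepts; `get_weather` is a hypothetical example tool, not a built-in:

```python
# OpenAI-style tool schema, passed via the "tools" field of a chat request.
# `get_weather` is a hypothetical example tool for illustration only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}
```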
Prompt → Cloud Endpoint → GPU Inference → Response
Used when:
- Large models (30B+)
- High traffic apps
- Production workloads
- Pull models (`ollama pull gemma3:4b`)
- Run models locally (`ollama run gemma3:4b`)
- Manage models (list, delete, update)
- Python library for programmatic access
Benefit: Simple developer experience with production-ready models.
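As a rough sketch, the same operations are available from Python through the official `ollama` package (assuming a recent version of the library):

```python
import ollama

# Download a model (same as `ollama pull gemma3:4b`)
ollama.pull("gemma3:4b")

# List locally available models (same as `ollama list`)
for m in ollama.list().models:
    print(m.model)

# Delete a model (same as `ollama rm gemma3:4b`)
# ollama.delete("gemma3:4b")
```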
Ollama exposes a local REST server:
- `/api/generate`
- `/api/chat`
- `/api/tags` (list models)
Use cases:
- Backend integration
- Web apps
- Microservices
Benefits:
- Language agnostic
- Easy to scale
- Works with Docker & cloud infra
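A minimal backend-style sketch using Python's `requests`; any HTTP client in any language works the same way (`stream: false` is set so the server returns a single JSON object instead of a token stream):

```python
import requests

# One-shot (non-streaming) completion against the local Ollama server
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma3:4b",
        "prompt": "What is GenAI?",
        "stream": False,  # single JSON response instead of a token stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```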
LangChain enables:
- Prompt templates
- Chains & agents
- Memory
- Tool usage
Workflow:
User → LangChain → Ollama Model → Response
Benefits:
- Build RAG pipelines
- AI agents
- Conversational memory
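For instance, a prompt template piped into the Ollama LLM (a minimal LCEL sketch; the template text and topic are illustrative):

```python
from langchain_core.prompts import PromptTemplate
from langchain_community.llms import Ollama

# Template with a single input variable
prompt = PromptTemplate.from_template(
    "Explain {topic} to a beginner in two sentences."
)
llm = Ollama(model="gemma3:4b")

# LangChain Expression Language: template | model
chain = prompt | llm
print(chain.invoke({"topic": "retrieval-augmented generation"}))
```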
Tool calling allows models to:
- Call Python functions
- Execute APIs
- Perform structured reasoning
Examples:
- Calculator tools
- Database queries
- File operations
- External API calls
Why it matters:
Turns LLMs into AI agents, not just chatbots.
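A minimal end-to-end sketch with the `ollama` Python package; `add_numbers` is a made-up example tool, and the model name is an assumption — use any model whose Ollama page lists tool support:

```python
from ollama import chat

def add_numbers(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# The library derives the tool schema from the function signature/docstring
response = chat(
    model="llama3.1",  # assumption: swap in any tool-capable model
    messages=[{"role": "user", "content": "What is 17 + 25? Use the tool."}],
    tools=[add_numbers],
)

# Execute whichever tool(s) the model decided to call
for call in response.message.tool_calls or []:
    if call.function.name == "add_numbers":
        print("tool result:", add_numbers(**call.function.arguments))
```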
- User sends prompt
- Request hits cloud inference server
- Model runs on GPU/TPU
- Response streamed back
- ⚡ High-performance GPUs
- 📈 Auto scaling
- 🧠 Large models (70B+)
- 🌐 No local hardware dependency
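A hedged sketch of pointing the same client at a hosted endpoint; the host, model name, and auth header below are illustrative — check Ollama's cloud documentation for the exact values:

```python
from ollama import Client

# Same chat API as the local server, different host plus an API key.
# All values below are placeholders, not confirmed endpoints.
client = Client(
    host="https://ollama.com",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)

response = client.chat(
    model="gpt-oss:120b",  # example of a large cloud-hosted model
    messages=[{"role": "user", "content": "Hello from the cloud"}],
)
print(response["message"]["content"])
```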
| Feature | Local Ollama | Ollama Cloud |
|---|---|---|
| Privacy | ✅ High | ❌ Data leaves your machine |
| Cost | ✅ Low | 💰 Usage-based |
| Speed | ⚡ Fast (small models) | 🚀 Fast (large models) |
| Offline | ✅ Yes | ❌ No |
- Lightweight
- Fast inference
- High-quality reasoning
- Ideal for laptops
- Text generation
- Instruction following
- Tool calling compatibility
- Vision & multimodal prompts (image → text)
Supported workflows:
- Image captioning
- OCR-like text extraction
- Visual reasoning
Use cases:
- Document processing
- Image understanding
- AI assistants with vision
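A minimal multimodal sketch with the `ollama` package; `photo.jpg` is a placeholder path to any local image:

```python
from ollama import chat

# Attach a local image to the user message; Gemma 3 accepts image inputs
response = chat(
    model="gemma3:4b",
    messages=[{
        "role": "user",
        "content": "Describe this image in one sentence.",
        "images": ["photo.jpg"],  # placeholder: a file path (bytes also work)
    }],
)
print(response["message"]["content"])
```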
| Feature | Gemma 3 | LLaMA | Qwen | Mistral |
|---|---|---|---|---|
| Tool Calling | ✅ | ✅ | ✅ | ✅ |
| Vision | ✅ | ✅ | ✅ | ❌ |
| Cloud Support | ✅ | ✅ | ✅ | ✅ |
| RAG Friendly | ✅ | ✅ | ✅ | ✅ |
- Ollama CLI basics
- Pull & run models
- Simple text generation
- REST API usage
- LangChain integration
- Prompt engineering
- Tool calling
- Vision models
- RAG pipelines
- Cloud deployment
- Learn LLMs practically
- Build production-ready AI apps
- No vendor lock-in
- Works locally & in cloud
- Ideal for students & professionals
- RAG with vector databases
- Multi-agent systems
- Fine-tuning custom models
- Production deployment (Docker, Kubernetes)
Happy Building 🚀