
GeoSpatial-RAG: An AI Framework For Analysis Of Remote Sensing Images

Python 3.8+ | License: MIT | Streamlit App

A novel AI framework designed specifically for the analysis of remote sensing images, integrating large language models (LLMs) with specialized vision-language models to overcome challenges in Earth observation data analysis.

[Demo GIF: GeoSpatial-RAG in action]

🌍 Overview

GeoSpatial-RAG employs a retrieval-augmented generation (RAG) approach that creates a multi-modal knowledge vector database from remote sensing imagery and textual descriptions. The framework addresses the significant domain gap between natural images and remote sensing imagery by developing a specialized pipeline using CLIP (Contrastive Language-Image Pre-training).
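
To make the shared embedding space concrete, the sketch below encodes a caption and an image with the CLIP checkpoint from the default configuration (openai/clip-vit-base-patch32). This is an illustrative example, not necessarily the exact code inside the package:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

caption = "an aerial view of storage tanks near a port"
image = Image.open("satellite_image.jpg")  # any RGB remote sensing image

with torch.no_grad():
    text_inputs = processor(text=[caption], return_tensors="pt", padding=True)
    text_emb = model.get_text_features(**text_inputs)
    image_inputs = processor(images=image, return_tensors="pt")
    image_emb = model.get_image_features(**image_inputs)

# Unit-normalize so a dot product equals cosine similarity
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
print(f"Cross-modal similarity: {(text_emb @ image_emb.T).item():.4f}")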

🎯 Key Innovation

  • Domain-Specific RAG: First RAG system specifically designed for remote sensing imagery
  • Multi-Modal Intelligence: Seamlessly combines text and image understanding
  • High Accuracy: Achieves 88%+ similarity matching for relevant queries
  • Production Ready: Complete web interface with ChatGPT-like experience

✨ Key Features

  • 🧠 Multi-modal Knowledge Vector Database: Unified encoding of remote sensing images and text descriptions
  • πŸ” Cross-modal Retrieval: Semantic search using natural language queries or image inputs
  • 🎯 CLIP-based Embeddings: Leverages CLIP for both visual and textual information encoding
  • πŸ€– LangChain Integration: Advanced text generation with vision-language model support
  • πŸ—ƒοΈ SQLite Database: Efficient storage and retrieval of embeddings
  • 🌐 Web Interface: Modern ChatGPT-like interface for easy interaction
  • ⚑ Real-time Processing: GPU-accelerated processing for fast responses

πŸ—οΈ Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Input Query   │     │     Images      │     │  Text Captions  │
│   (Text/Image)  │     │                 │     │                 │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────┐
│                          CLIP Encoder                           │
│                        (Text + Vision)                          │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                     SQLite Vector Database                      │
│                    (Text & Image Embeddings)                    │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                  Similarity Search & Retrieval                  │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│            LangChain RAG Pipeline + VLM Generation              │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Generated Response                        │
└─────────────────────────────────────────────────────────────────┘
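
In code terms, the "Similarity Search & Retrieval" stage reduces to cosine similarity over the stored unit-normalized vectors. The sketch below is a simplified stand-in for the actual retriever (the function name is hypothetical); the weighted text/image fusion mirrors the TEXT_WEIGHT and IMAGE_WEIGHT settings shown under Configuration:

import numpy as np

def retrieve(text_emb, image_emb, db_embs, top_k=5,
             text_weight=0.7, image_weight=0.3):
    """Rank stored embeddings against a fused query vector.

    text_emb / image_emb: unit-normalized 1-D query vectors (image_emb may be None)
    db_embs:              (N, D) matrix of unit-normalized stored embeddings
    """
    if image_emb is None:
        query = text_emb
    else:
        query = text_weight * text_emb + image_weight * image_emb
        query = query / np.linalg.norm(query)  # renormalize after fusion
    sims = db_embs @ query                     # cosine similarity for unit vectors
    top = np.argsort(sims)[::-1][:top_k]
    return [(int(i), float(sims[i])) for i in top]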

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • CUDA-compatible GPU (recommended)
  • 8GB+ RAM

Installation

  1. Clone the repository
git clone https://github.com/debanjan06/geospatial-rag.git
cd geospatial-rag
  2. Create a virtual environment
python -m venv geospatial_env

# Windows
geospatial_env\Scripts\activate

# Linux/macOS
source geospatial_env/bin/activate
  3. Install dependencies
pip install -r requirements.txt
pip install -e .
  4. Set up environment variables
cp .env.example .env
# Edit .env with your configuration

πŸ—ƒοΈ Database Setup

Option 1: Use Pre-built Database (Recommended for Testing)

# Download our pre-built database (10,975 documents)
# Place in: database/rsicd_embeddings.db
# Contact: bl.sc.p2dsc24032@bl.students.amrita.edu for access

Option 2: Create Your Own Database

# 1. Download RSICD dataset
wget [RSICD_DATASET_URL]

# 2. Generate embeddings
python scripts/generate_embeddings.py --dataset_path /path/to/RSICD --output_dir ./database

# 3. Create SQLite database
python scripts/create_database.py --embeddings_dir ./database --db_path ./database/rsicd_embeddings.db
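
For orientation, the storage step boils down to serializing float32 vectors into SQLite BLOBs. The schema below is a hypothetical simplification; the actual scripts may use different table and column names:

import sqlite3
import numpy as np

conn = sqlite3.connect("./database/rsicd_embeddings.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS text_embeddings (
        id INTEGER PRIMARY KEY,
        description TEXT,
        embedding BLOB
    )
""")

def insert_embedding(description, vector):
    # Store the float32 vector as raw bytes; np.frombuffer reverses this on read
    blob = np.asarray(vector, dtype=np.float32).tobytes()
    conn.execute(
        "INSERT INTO text_embeddings (description, embedding) VALUES (?, ?)",
        (description, blob),
    )
    conn.commit()

def load_embeddings():
    rows = conn.execute("SELECT description, embedding FROM text_embeddings")
    return [(d, np.frombuffer(b, dtype=np.float32)) for d, b in rows]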

Option 3: Demo Database (Quick Testing)

# Create a small demo database for testing
python scripts/create_demo_database.py

🧪 Test Your Setup

# Test database and imports
python test_database.py

Expected output:

🚀 GeoSpatial-RAG System Test
========================================
✅ Database connected successfully!
   📊 Descriptions: 10,975
   📝 Text embeddings: 10,975
   🖼️ Image embeddings: 10,975
✅ All tests passed!
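
To poke at the database directly, a quick sanity check might look like the following; the table names here are assumptions inferred from the output above, not a documented schema:

import sqlite3

conn = sqlite3.connect("./database/rsicd_embeddings.db")
for table in ("descriptions", "text_embeddings", "image_embeddings"):
    n = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {n:,}")
conn.close()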

🔧 Usage

Command Line Interface

# Interactive demo
python demo/interactive_demo.py --db_path ./database/rsicd_embeddings.db

Web Interface (Recommended)

# Start the web interface
streamlit run streamlit_app.py

Then open: http://localhost:8501

Python API

from geospatial_rag import GeoSpatialRAG
from PIL import Image

# Initialize the RAG system
rag = GeoSpatialRAG(db_path="./database/rsicd_embeddings.db")

# Text-only query
results = rag.query("Show me aerial views of storage tanks")

# Text + Image query
image = Image.open("satellite_image.jpg")
results = rag.query("What does this image show?", image=image)

# Display results
for doc in results['documents']:
    print(f"Description: {doc.page_content}")
    print(f"Similarity: {doc.metadata['similarity']:.4f}")
    print("---")

print(f"AI Response: {results['response']}")

# Close when done
rag.close()

📊 Performance Results

Our system has been tested and validated, with the following results:

  • 📊 Database Size: 10,975 remote sensing images with embeddings
  • 🎯 Accuracy: 88%+ top similarity scores for relevant queries
  • ⚡ Speed: <2 seconds average query time on GPU
  • 🔍 Precision: High relevance in top-5 results for domain-specific queries

Example Query Results

Query                                  Top Similarity  Retrieved Documents  Response Quality
"industrial complex with buildings"    0.8818          5/5 relevant         Excellent
"aerial view of storage tanks"         0.7631          5/5 relevant         Excellent
"satellite image of urban area"        0.8203          4/5 relevant         Very Good
"remote sensing of forest"             0.7892          5/5 relevant         Excellent

πŸ“ Project Structure

geospatial-rag/
├── src/
│   ├── __init__.py
│   └── geospatial_rag/           # Main package
│       ├── __init__.py
│       ├── embeddings.py         # CLIP embedding generation
│       ├── database.py           # SQLite database operations
│       ├── retriever.py          # Custom retriever class
│       ├── pipeline.py           # Main RAG pipeline
│       └── utils.py              # Utility functions
├── demo/
│   └── interactive_demo.py       # Command-line interface
├── tests/
│   └── test_*.py                 # Test modules
├── streamlit_app.py              # Web interface
├── setup_web_interface.py        # Web interface setup
├── quick_start.py                # Quick start script
├── test_database.py              # Database testing
├── requirements.txt              # Dependencies
├── setup.py                      # Package setup
├── LICENSE                       # MIT License
├── .gitignore                    # Git ignore rules
└── README.md                     # This file

🌐 Web Interface Features

The Streamlit web interface provides:

  • 💬 ChatGPT-like Interface: Natural conversation flow
  • 🖼️ Image Upload: Drag-and-drop satellite/aerial image analysis
  • ⚙️ Advanced Settings: Configurable similarity thresholds and result counts
  • 📊 Real-time Stats: Database statistics and system status
  • 🔍 Live Search: Instant results with similarity scores
  • 📱 Responsive Design: Works on desktop and mobile

πŸ› οΈ Configuration

Environment Variables (.env)

# Model Configuration
CLIP_MODEL_NAME=openai/clip-vit-base-patch32
VLM_MODEL_NAME=Salesforce/blip-image-captioning-large
DEVICE=auto

# Database Configuration
DB_PATH=./database/rsicd_embeddings.db

# Processing Configuration
BATCH_SIZE=16
TEXT_WEIGHT=0.7
IMAGE_WEIGHT=0.3
TOP_K=5

# API Keys (optional)
HUGGINGFACE_API_KEY=your_hf_api_key_here
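
One way to consume these settings in your own scripts, assuming python-dotenv is available (it is not necessarily a declared dependency of this project):

import os
import torch
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

clip_model = os.getenv("CLIP_MODEL_NAME", "openai/clip-vit-base-patch32")
db_path = os.getenv("DB_PATH", "./database/rsicd_embeddings.db")
text_weight = float(os.getenv("TEXT_WEIGHT", "0.7"))
image_weight = float(os.getenv("IMAGE_WEIGHT", "0.3"))
top_k = int(os.getenv("TOP_K", "5"))

# DEVICE=auto -> prefer a GPU when one is available
device = os.getenv("DEVICE", "auto")
if device == "auto":
    device = "cuda" if torch.cuda.is_available() else "cpu"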

Advanced Configuration

The system supports extensive configuration through:

  • Environment variables
  • Configuration files (JSON/YAML)
  • Command-line arguments
  • Python API parameters

📈 Dataset Information

RSICD Dataset

  • Size: 10,921 remote sensing images
  • Resolution: 224Γ—224 pixels
  • Sources: Google Earth, Baidu Map, MapABC, Tianditu
  • Descriptions: 5 sentences per image
  • Splits: Train (8,734) / Valid (1,094) / Test (1,093)
  • Features: High intra-class diversity and low inter-class dissimilarity
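
RSICD annotations typically ship as a single Karpathy-style JSON file. A sketch of flattening it into image-caption pairs, assuming that layout (the file and key names may differ in your copy):

import json

with open("RSICD/dataset_rsicd.json") as f:
    data = json.load(f)

pairs = []
for img in data["images"]:
    for sent in img["sentences"]:           # five captions per image
        pairs.append((img["filename"], sent["raw"]))

print(f"{len(pairs):,} image-caption pairs")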

Supported Image Types

  • Satellite imagery
  • Aerial photography
  • Remote sensing data
  • Multispectral images
  • Urban planning imagery
  • Agricultural monitoring
  • Environmental surveillance

🧪 Testing

# Run all tests
pytest tests/ -v

# Run specific test modules
pytest tests/test_embeddings.py
pytest tests/test_database.py
pytest tests/test_pipeline.py

# Run with coverage
pytest tests/ --cov=geospatial_rag --cov-report=html

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines.

Development Setup

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Install development dependencies: pip install -e ".[dev]"
  4. Make your changes and add tests
  5. Run tests: pytest tests/
  6. Submit a pull request

Areas for Contribution

  • 🔬 New Models: Integration of additional vision-language models
  • 📊 Datasets: Support for new remote sensing datasets
  • 🌐 Interfaces: Mobile apps, desktop applications
  • 🚀 Performance: Optimization and scaling improvements
  • 📚 Documentation: Tutorials, examples, and guides

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 References

This work builds upon and is inspired by the following research:

[1] L. Fang et al., "Open-world recognition in remote sensing: Concepts, challenges, and opportunities," IEEE Geosci. Remote Sens. Mag., vol. 12, no. 2, pp. 8–31, 2024.

[2] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, 1973.

[3] K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan, "GeoChat: Grounded large vision-language model for remote sensing," arXiv preprint arXiv:2311.15826, 2023.

[4] R. Xu, C. Wang, J. Zhang, S. Xu, W. Meng, and X. Zhang, "RSSFormer: Foreground saliency enhancement for remote sensing land-cover segmentation," IEEE Trans. Image Process., vol. 32, pp. 1052–1064, 2023.

[5] J. Lin, Z. Yang, Q. Liu, Y. Yan, P. Ghamisi, W. Xie, and L. Fang, "HSLabeling: Toward efficient labeling for large-scale remote sensing image segmentation with hybrid sparse labeling," IEEE Trans. Image Process., vol. 34, pp. 1864–1878, 2025.

[6] W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao, "EarthGPT: A universal multimodal large language model for multisensor image comprehension in remote sensing domain," IEEE Trans. Geosci. Remote Sens., vol. 62, Art. no. 5917820, 2024.

[7] Y. Hu, J. Yuan, C. Wen, X. Lu, and X. Li, "RSGPT: A remote sensing vision language model and benchmark," arXiv preprint arXiv:2307.15266, 2023.

[8] L. Zhu, F. Wei, and Y. Lu, "Beyond text: Frozen large language models in visual signal comprehension," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 27047–27057, 2024.

[9] Z. Yuan, Z. Xiong, L. Mou, and X. X. Zhu, "ChatEarthNet: A global-scale image–text dataset empowering vision–language geo-foundation models," Earth Syst. Sci. Data, vol. 17, pp. 1245–1263, 2025.

[10] X. Lu, B. Wang, X. Zheng, and X. Li, "Exploring models and data for remote sensing image caption generation," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 4, pp. 2183–2195, 2018.

πŸ† Acknowledgments

  • Amrita Vishwa Vidyapeetham for research support and computational resources
  • OpenAI for the CLIP model and vision-language research
  • Salesforce for the BLIP model
  • RSICD dataset creators for providing the remote sensing image captioning dataset
  • LangChain community for the RAG framework
  • Streamlit team for the excellent web app framework

📞 Contact & Support

Getting Help

  • πŸ› Bug Reports: GitHub Issues
  • πŸ’‘ Feature Requests: GitHub Discussions
  • πŸ“§ Direct Contact: For database access or collaboration inquiries
  • πŸ“š Documentation: Check our docs/ directory for detailed guides


⭐ Star this repository if you find it helpful!

🚀 Ready to revolutionize remote sensing analysis with AI?

Get Started • Documentation • Examples • Web Demo
