
🚀 VisionaryQA

VisionaryQA is a Visual Question Answering (VQA) application: upload an image, ask a question about it in natural language, and receive a context-aware answer generated by a vision-language model such as BLIP-2.


🛠️ Features

  • Upload & Ask: Drag & drop or browse an image, then type any question.
  • Model Backend: FastAPI service running Hugging Face transformers with BLIP-2 (or an alternative vision-language model).
  • Interactive UI: Modern, responsive interface built with Streamlit (or React + Tailwind).
  • Aesthetic Touches: Embedded images, styling, and live feedback animations.
  • Containerized: Dockerfile for seamless deployment.

📁 Repository Structure

visual-vqa/
├── app/
│   ├── main.py            # FastAPI backend
│   ├── vqa.py             # Inference logic (BLIP-2 pipeline)
│   ├── utils.py           # Image preprocessing
│   └── models/
│       └── blip2_model.py # Model loading wrapper
├── frontend/
│   ├── streamlit_app.py   # Streamlit UI
│   └── assets/
│       ├── logo.png       # Application logo
│       └── screenshot.png # UI mockup


🚀 Quick Start

1. Clone & Install

git clone https://github.com/your-username/visual-vqa.git
cd visual-vqa
pip install -r requirements.txt
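
The contents of requirements.txt are not listed in this README; a plausible minimal set for the stack described here (pin versions as appropriate for your environment) would be:

fastapi
uvicorn[standard]
python-multipart   # needed by FastAPI to parse file/form uploads
transformers
torch
pillow
streamlit
requests
pytest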

2. Run Backend

uvicorn app.main:app --reload

3. Run Frontend

In a new terminal:

streamlit run frontend/streamlit_app.py

🖼️ User Interface

  1. Image Upload
  2. Question Input
  3. Answer Display with typing animation
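
The typing animation in step 3 is not spelled out in this README; one simple way to approximate it in Streamlit is to stream the answer into a placeholder character by character (the helper name type_out is illustrative, not part of the repository):

import time
import streamlit as st

def type_out(text: str, delay: float = 0.02):
    # Reveal the answer one character at a time for a "typing" effect
    placeholder = st.empty()
    shown = ""
    for ch in text:
        shown += ch
        placeholder.markdown(shown)
        time.sleep(delay)

# e.g. type_out(res.json().get("answer", "")) instead of st.success(...)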

🔧 Implementation Details

Backend (app/main.py)

from fastapi import FastAPI, File, Form, HTTPException, UploadFile
from app.vqa import answer_question

app = FastAPI()

@app.post("/vqa/")
async def vqa_api(image: UploadFile = File(...), question: str = Form("")):
    # The question arrives as a multipart form field alongside the image file
    if not question.strip():
        raise HTTPException(status_code=400, detail="A question is required.")
    content = await image.read()
    answer = answer_question(content, question)
    return {"answer": answer}
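
With the backend running, the /vqa/ endpoint can be exercised directly from the command line, for example:

curl -X POST http://localhost:8000/vqa/ \
  -F "image=@path/to/image.png" \
  -F "question=What is in this picture?"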

Inference (app/vqa.py)

from transformers import Blip2Processor, Blip2ForConditionalGeneration
from PIL import Image
import io

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

def answer_question(image_bytes: bytes, question: str) -> str:
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    inputs = processor(images=image, text=question, return_tensors="pt")
    outputs = model.generate(**inputs)
    return processor.decode(outputs[0], skip_special_tokens=True)
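
The snippet above loads BLIP-2 in full precision on the CPU, which is slow for a 2.7B-parameter model. If a CUDA GPU is available, the standard transformers options can load it in half precision on the device (an optional tweak, not part of the code above):

import torch
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

# Inputs then need to be moved to the same device and dtype:
# inputs = processor(images=image, text=question, return_tensors="pt").to(device, dtype)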

Frontend (frontend/streamlit_app.py)

import streamlit as st
import requests

st.title("🖼️ Visual Q&A")
image = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
question = st.text_input("Ask a question about the image...")

if st.button("Get Answer") and image and question:
    # Forward the uploaded file and the question to the FastAPI backend
    files = {"image": (image.name, image.getvalue(), image.type)}
    data = {"question": question}
    res = requests.post("http://localhost:8000/vqa/", files=files, data=data)
    st.success(res.json().get("answer"))

🐳 Docker Support

FROM python:3.10-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8501 8000
CMD ["/bin/sh", "-c", "uvicorn app.main:app --host 0.0.0.0 & streamlit run frontend/streamlit_app.py --server.port 8501"]
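
To build the image and run both services (the tag visionaryqa is just an example name):

docker build -t visionaryqa .
docker run -p 8000:8000 -p 8501:8501 visionaryqa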

📸 Assets

  • logo.png: Application logo
  • screenshot.png: UI mockup

⚙️ CI/CD Pipeline

This project uses GitHub Actions to automate testing, linting, and deployment.

... (existing CI/CD content) ...
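
The workflow files themselves are not reproduced in this README. As a rough sketch, a test-and-lint job under .github/workflows/ci.yml might look like the following (file and step names are illustrative; note that the inference tests download BLIP-2, so in practice they may need to be mocked or marked as slow):

name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: pip install -r requirements.txt flake8
      - name: Lint
        run: flake8 app frontend tests
      - name: Run tests
        run: pytest tests/ -q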

🧪 Tests

We use pytest to validate the core functionality:

visual-vqa/
├── tests/
│   ├── test_vqa_api.py
│   └── test_answer_question.py

1. tests/test_vqa_api.py

import io
import pytest
from PIL import Image
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def make_png_bytes() -> bytes:
    # Render a small valid PNG in memory so the backend can actually decode it
    buf = io.BytesIO()
    Image.new("RGB", (64, 64), color=(155, 0, 0)).save(buf, format="PNG")
    return buf.getvalue()

def test_vqa_endpoint_no_question():
    # Uploading an image without a question should return 400
    response = client.post(
        "/vqa/",
        files={"image": ("test.png", make_png_bytes(), "image/png")},
        data={"question": ""}
    )
    assert response.status_code == 400

@pytest.mark.parametrize("question,expected_status", [
    ("What is this?", 200),
])
def test_vqa_endpoint_valid(question, expected_status):
    # Upload a valid image and a question; the endpoint should return an answer
    response = client.post(
        "/vqa/",
        files={"image": ("test.png", make_png_bytes(), "image/png")},
        data={"question": question}
    )
    assert response.status_code == expected_status
    json_data = response.json()
    assert "answer" in json_data

2. tests/test_answer_question.py

from app.vqa import answer_question
from PIL import Image

def create_test_image(path):
    # Create a simple solid-red RGB image on disk
    img = Image.new("RGB", (64, 64), color=(155, 0, 0))
    img.save(path, format="PNG")

def test_answer_question_runs_without_error(tmp_path):
    img_path = tmp_path / "test.png"
    create_test_image(img_path)
    # answer_question expects raw image bytes and should return a string
    answer = answer_question(img_path.read_bytes(), "Is it red?")
    assert isinstance(answer, str)

These tests run in CI, catching regressions in the API and the inference logic.
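
To run the suite locally from the repository root:

pytest tests/ -q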

🔍 Benefits of CI/CD

Implementing CI/CD brings multiple advantages to the Visual VQA project:

  • Automated Quality Assurance: Every code change triggers automated tests and linting, catching bugs and style issues early.
  • Faster Feedback Loop: Developers receive immediate feedback on code quality and functionality before merging.
  • Consistent Builds: Ensures that the application builds and runs correctly across different environments.
  • Easy Rollbacks & Deployments: Automated deployment pipelines can quickly roll out new features or revert problematic releases.
  • Improved Collaboration: Contributors can focus on feature development, trusting that CI/CD enforces standards.

🧪 What CI/CD Does:

  1. Continuous Integration (CI): Automatically merges and tests every change pushed to the repository. It includes:

    • Testing: Runs unit and integration tests (via pytest) to confirm that new changes don't break existing functionality.
    • Linting: Uses tools like flake8 to enforce code style and catch syntax errors or anti-patterns.
  2. Continuous Deployment (CD): Automates packaging and releasing the application. It includes:

    • Building: Creates artifacts such as Docker images in a reproducible manner.
    • Publishing: Pushes the Docker image to a registry (e.g., Docker Hub) or deploys to hosting platforms.

By integrating CI/CD, the Visual VQA system remains stable, maintainable, and ready for rapid iteration.


🤝 Contributing

Feel free to open issues or pull requests.


📬 Contact

[Your Name] • [your@email.com]
