VisionaryQA is a Visual Question Answering application that lets users upload an image, pose natural-language questions about it, and receive context-aware answers driven by vision-language models like BLIP-2.
- Upload & Ask: Drag & drop or browse an image, then type any question.
- Model Backend: FastAPI paired with Hugging Face `transformers`, using BLIP-2 (or an alternative vision-language model).
- Interactive UI: Modern, responsive interface built with Streamlit (or React + Tailwind).
- Aesthetic Touches: Embedded images, styling, and live feedback animations.
- Containerized: Dockerfile for seamless deployment.
```
visual-vqa/
├── app/
│   ├── main.py              # FastAPI backend
│   ├── vqa.py               # Inference logic (BLIP-2 pipeline)
│   ├── utils.py             # Image preprocessing
│   └── models/
│       └── blip2_model.py   # Model loading wrapper
└── frontend/
    ├── streamlit_app.py     # Streamlit UI
    └── assets/
        └── logo.png         # Application logo
```
```bash
git clone https://github.com/your-username/visual-vqa.git
cd visual-vqa
pip install -r requirements.txt
```

Start the FastAPI backend:

```bash
uvicorn app.main:app --reload
```

In a new terminal, launch the Streamlit frontend:

```bash
streamlit run frontend/streamlit_app.py
```

The UI provides:

- Image Upload
- Question Input
- Answer Display with typing animation
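You can also query the API directly, without the UI. Here is a minimal client sketch using `requests` (the `sample.jpg` filename and the question are placeholders):

```python
import requests

# Placeholder image path; any local JPEG/PNG works.
with open("sample.jpg", "rb") as f:
    res = requests.post(
        "http://localhost:8000/vqa/",
        # Send filename, file object, and MIME type as a multipart upload.
        files={"image": ("sample.jpg", f, "image/jpeg")},
        data={"question": "What is in this picture?"},
    )
print(res.json()["answer"])
```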
app/main.py:

```python
from fastapi import FastAPI, File, Form, HTTPException, UploadFile

from app.vqa import answer_question

app = FastAPI()


@app.post("/vqa/")
async def vqa_api(image: UploadFile = File(...), question: str = Form("")):
    # `question` is declared as a Form field so it is read from the multipart
    # form data; an empty question is rejected with 400 (as the tests expect).
    if not question:
        raise HTTPException(status_code=400, detail="A question is required")
    content = await image.read()
    answer = answer_question(content, question)
    return {"answer": answer}
```

app/vqa.py:

```python
from transformers import Blip2Processor, Blip2ForConditionalGeneration
from PIL import Image
import io
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
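# Tip: the 2.7B checkpoint needs several GB of memory. On a GPU you could
# instead pass torch_dtype=torch.float16 and device_map="auto" to
# from_pretrained (this requires the `accelerate` package).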

def answer_question(image_bytes: bytes, question: str) -> str:
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    inputs = processor(images=image, text=question, return_tensors="pt")
    outputs = model.generate(**inputs)
    return processor.decode(outputs[0], skip_special_tokens=True)
```

frontend/streamlit_app.py:

```python
import streamlit as st
import requests
st.title("🖼️ Visual Q&A")
image = st.file_uploader("Upload an image", type=["png","jpg","jpeg"])
question = st.text_input("Ask a question about the image...")
if st.button("Get Answer") and image and question:
    # Send filename, bytes, and MIME type so FastAPI parses the upload correctly.
    files = {"image": (image.name, image.getvalue(), image.type)}
    data = {"question": question}
    res = requests.post("http://localhost:8000/vqa/", files=files, data=data)
    st.success(res.json().get("answer"))
```

Dockerfile:

```dockerfile
FROM python:3.10-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8501 8000
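# Run both services from one container: uvicorn (FastAPI) on port 8000 and
# Streamlit on 8501; one container per service would also work.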
CMD ["/bin/sh", "-c", "uvicorn app.main:app --host 0.0.0.0 & streamlit run frontend/streamlit_app.py --server.port 8501"]
```

Assets:

- logo.png: Application logo
- screenshot.png: UI mockup
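To try it out, you might build the image with `docker build -t visual-vqa .` (the `visual-vqa` tag is just an example) and start it with `docker run -p 8000:8000 -p 8501:8501 visual-vqa`.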
This project uses GitHub Actions to automate testing, linting, and deployment.
... (existing CI/CD content) ...
We use pytest to validate the core functionality:
```
visual-vqa/
└── tests/
    ├── test_vqa_api.py
    └── test_answer_question.py
```
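They can be run locally with `pytest tests/`; CI runs the same suite on every push.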
tests/test_vqa_api.py:

```python
import io

import pytest
from fastapi.testclient import TestClient
from PIL import Image

from app.main import app

client = TestClient(app)

def test_vqa_endpoint_no_question():
    # Uploading an image without a question should return 400.
    response = client.post(
        "/vqa/",
        files={"image": ("test.png", b"fakebytes", "image/png")},
        data={"question": ""},
    )
    assert response.status_code == 400

@pytest.mark.parametrize("question,expected_status", [
    ("What is this?", 200),
])
def test_vqa_endpoint_valid(question, expected_status):
    # Use a real in-memory PNG; arbitrary fake bytes would fail PIL decoding.
    buf = io.BytesIO()
    Image.new("RGB", (64, 64), color=(155, 0, 0)).save(buf, format="PNG")
    response = client.post(
        "/vqa/",
        files={"image": ("test.png", buf.getvalue(), "image/png")},
        data={"question": question},
    )
    assert response.status_code == expected_status
    json_data = response.json()
    assert "answer" in json_data
```

tests/test_answer_question.py:

```python
from app.vqa import answer_question
from PIL import Image

def create_test_image(path):
    # Create a simple solid-red RGB image.
    img = Image.new("RGB", (64, 64), color=(155, 0, 0))
    img.save(path, format="PNG")


def test_answer_question_runs_without_error(tmp_path):
    img_path = tmp_path / "test.png"
    create_test_image(img_path)
    # answer_question expects raw bytes, so read the saved file back in.
    answer = answer_question(img_path.read_bytes(), "Is it red?")
    # Ensure it returns a string.
    assert isinstance(answer, str)
```

These tests run in CI, catching regressions in the API and inference logic.
Implementing CI/CD brings multiple advantages to the Visual VQA project:
- Automated Quality Assurance: Every code change triggers automated tests and linting, catching bugs and style issues early.
- Faster Feedback Loop: Developers receive immediate feedback on code quality and functionality before merging.
- Consistent Builds: Ensures that the application builds and runs correctly across different environments.
- Easy Rollbacks & Deployments: Automated deployment pipelines can quickly roll out new features or revert problematic releases.
- Improved Collaboration: Contributors can focus on feature development, trusting that CI/CD enforces standards.
- Continuous Integration (CI): Automatically builds and tests every change pushed to the repository. It includes:
  - Testing: Runs unit and integration tests (via `pytest`) to confirm that new changes don't break existing functionality.
  - Linting: Uses tools like `flake8` to enforce code style and catch syntax errors or anti-patterns.
- Continuous Deployment (CD): Automates packaging and releasing the application. It includes:
  - Building: Creates artifacts such as Docker images in a reproducible manner.
  - Publishing: Pushes the Docker image to a registry (e.g., Docker Hub) or deploys to hosting platforms.
By integrating CI/CD, the Visual VQA system remains stable, maintainable, and ready for rapid iteration.
Feel free to open issues or pull requests.
[Your Name] • [your@email.com]