
๐Ÿ–ผ๏ธ Workshop: Build a multimodal AI agent with Haystack & GPT-4o โ€” featuring image understanding, document retrieval, conversational memory, and human-in-the-loop safety controls

bilgeyucel/multimodal-agent-workshop


๐Ÿ–ผ๏ธ Giving Eyes to Your AI: Engineering a Multimodal Agent

A hands-on workshop exploring multimodal AI agents with Haystack.

What You'll Build

  • 📄 Multimodal indexing pipeline (PDFs + images) using CLIP embeddings
  • 🤖 Vision-enabled agent powered by GPT-4o
  • 🔍 RAG tool for searching company policies
  • 💬 Conversational memory for context-aware interactions
  • 🔐 Human-in-the-loop controls for sensitive actions

Get Started

👉 See multimodal_agent_notebook.ipynb for the full interactive experience.

Deploy with Hayhooks

Want to deploy the agent as an API? Check out multimodal-agent/pipeline_wrapper.py, a Python script version of the notebook with Hayhooks integration pre-configured for serving the conversational agent.

Files

The files/ directory contains the sample data used in the workshop:

  • receipt.jpeg – A sample receipt image for the expense reimbursement demo
  • social_budget_policy.md – Company policy document for retrieval
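To see what "retrieval" over social_budget_policy.md means conceptually, here is a toy keyword-overlap retriever. This is deliberately simplistic and is not the workshop's RAG tool (the workshop uses a Haystack retrieval pipeline with embeddings); the policy text below is also made up for the sketch:

```python
def split_chunks(text, size=12):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks):
    """Return the chunk sharing the most lowercase tokens with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

# Stand-in policy text (the real content lives in files/social_budget_policy.md).
policy = (
    "Social budget policy. Each team member receives 50 EUR per quarter "
    "for team events. Receipts must be submitted within 30 days."
)
chunks = split_chunks(policy)
print(retrieve("How much social budget per quarter?", chunks))
```

A real pipeline replaces the token-overlap score with vector similarity over CLIP or text embeddings, but the query-in, best-chunk-out shape of the tool is the same.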

Requirements

  • Python 3.10+
  • OpenAI API key (or your preferred LLM provider)
  • See the notebook for full package installation instructions
