
๐Ÿ–ผ๏ธ Workshop: Build a multimodal AI agent with Haystack & GPT-4o โ€” featuring image understanding, document retrieval, conversational memory, and human-in-the-loop safety controls

bilgeyucel/multimodal-agent-workshop


๐Ÿ–ผ๏ธ Giving Eyes to Your AI: Engineering a Multimodal Agent

A hands-on workshop exploring multimodal AI agents with Haystack.

What You'll Build

  • 📄 Multimodal indexing pipeline (PDFs + images) using CLIP embeddings
  • 🤖 Vision-enabled agent powered by GPT-4o
  • 🔍 RAG tool for searching company policies
  • 💬 Conversational memory for context-aware interactions
  • 🔐 Human-in-the-loop controls for sensitive actions

Get Started

👉 See multimodal_agent_notebook.ipynb for the full interactive experience.

Deploy with Hayhooks

Want to deploy the agent as an API? Check out multimodal-agent/pipeline_wrapper.py, a Python script version of the notebook with Hayhooks integration pre-configured for serving the conversational agent.

Files

The files/ directory contains the sample data used in the workshop:

  • receipt.jpeg – A sample receipt image for the expense reimbursement demo
  • social_budget_policy.md – Company policy document for retrieval
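To see what "retrieval" over social_budget_policy.md means conceptually, here is a toy keyword-overlap retriever. This is deliberately simplistic and is not the workshop's RAG tool (the workshop uses a Haystack retrieval pipeline with embeddings); the policy text below is also made up for the sketch:

```python
def split_chunks(text, size=12):
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks):
    """Return the chunk sharing the most lowercase tokens with the query."""
    q = set(query.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

# Stand-in policy text (the real content lives in files/social_budget_policy.md).
policy = (
    "Social budget policy. Each team member receives 50 EUR per quarter "
    "for team events. Receipts must be submitted within 30 days."
)
chunks = split_chunks(policy)
print(retrieve("How much social budget per quarter?", chunks))
```

A real pipeline replaces the token-overlap score with vector similarity over CLIP or text embeddings, but the query-in, best-chunk-out shape of the tool is the same.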

Requirements

  • Python 3.10+
  • OpenAI API key (or your preferred LLM provider)
  • See the notebook for full package installation instructions
