A hands-on workshop exploring multimodal AI agents with Haystack.
- Multimodal indexing pipeline (PDFs + images) using CLIP embeddings
- Vision-enabled agent powered by GPT-4o
- RAG tool for searching company policies
- Conversational memory for context-aware interactions
- Human-in-the-loop controls for sensitive actions
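To make the human-in-the-loop idea concrete, here is a minimal, framework-independent sketch in plain Python. It is not the notebook's implementation and does not use Haystack's API. The names `ToolCall`, `SENSITIVE_TOOLS`, `run_with_approval`, and `submit_reimbursement` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical structure)."""
    name: str
    arguments: Dict[str, Any]


# Assumption: tools that change state or spend money need a human sign-off.
SENSITIVE_TOOLS = {"submit_reimbursement"}


def run_with_approval(
    call: ToolCall,
    tools: Dict[str, Callable[..., Any]],
    approve: Callable[[ToolCall], bool],
) -> Any:
    """Run the tool, asking the reviewer first when the tool is sensitive."""
    if call.name in SENSITIVE_TOOLS and not approve(call):
        # Return a message instead of raising so the agent can relay it.
        return f"Action '{call.name}' was declined by the reviewer."
    return tools[call.name](**call.arguments)


def console_approver(call: ToolCall) -> bool:
    """Ask for confirmation on the console; anything but 'y' declines."""
    answer = input(f"Allow {call.name} with {call.arguments}? [y/N] ")
    return answer.strip().lower() == "y"
```

Passing the approver in as a callable keeps the gate testable: an interactive session can use `console_approver`, while automated checks can supply a function that always approves or always declines.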
See multimodal_agent_notebook.ipynb for the full interactive experience.
Want to deploy the agent as an API? Check out multimodal-agent/pipeline_wrapper.py, a Python script version of the notebook with Hayhooks integration pre-configured for serving the conversational agent.
The files/ directory contains the sample data used in the workshop:
- receipt.jpeg: A sample receipt image for the expense reimbursement demo
- social_budget_policy.md: Company policy document for retrieval
- Python 3.10+
- OpenAI API key (or a key for your preferred LLM provider)
- See the notebook for full package installation instructions
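
Before opening the notebook, the first two prerequisites can be checked with a short script. This is a sketch that assumes the OpenAI key is supplied through the `OPENAI_API_KEY` environment variable. The helper name `check_prerequisites` is hypothetical and not part of the workshop code.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of unmet prerequisites (empty when everything is fine)."""
    # Default to the real environment and interpreter; parameters exist for testing.
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)

    problems = []
    if version < (3, 10):
        problems.append(
            f"Python 3.10+ is required, found {version[0]}.{version[1]}"
        )
    # Assumption: the key is read from OPENAI_API_KEY; other providers differ.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and environment. If you use a different LLM provider, replace the variable name with the one that provider expects.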