
GrepEvent - Retrieval-Augmented Generation

A chatbot app where the user asks an AI about tech events taking place in Poland. Created using Python.


Introduction

This project was assigned to us by the company datarabbit.ai as a recruitment task. We had to complete it in order to do an internship there.

This task was also an opportunity for us to learn new technologies that we were unfamiliar with at the start.


Setup

In order to run this app, Docker is required.

If you don't have Docker installed on your computer yet, you can download it from the official Docker website.

Once you have Docker installed, follow these steps:

  1. Clone the repository to your local machine by running this command in a terminal:

      git clone https://github.com/Rumeleq/ragapp.git
      
  2. Create a .env file in the project's root folder

    It should contain variables like these:

    OPENAI_API_KEY=your_api_key
    CHROMADB_HOST=chromadb
    CHROMADB_PORT=8000
    CHROMADB_DIR=./chroma
    SCRAPING_OUTPUT_DIR=./data
    SCRAPING_URLS=https://www.eventbrite.com/d/poland/other--events/?page=1, https://www.eventbrite.com/d/poland/all-events/?subcategories=4004&page=1, https://www.eventbrite.com/d/poland/science-and-tech--events/?page=1, https://crossweb.pl/wydarzenia/, https://unikonferencje.pl/konferencje/technologie_informacyjne, https://unikonferencje.pl/konferencje/elektrotechnika, https://unikonferencje.pl/konferencje/automatyka_robotyka, https://unikonferencje.pl/konferencje/informatyka_teoretyczna
    
  3. Make sure you are in the project's root folder and run the command:

    docker compose up
    

    There are two versions of this command: docker-compose up and docker compose up. On Windows both work fine, but on Linux it is recommended to use the second version (without the dash). The command docker compose up makes Docker use Docker Compose V2, which is more stable and reliable. By running the above command, Docker should:

    1. pull the chromadb image (unless you already have it)
    2. start the etl container once Chroma's healthcheck passes
    3. in the etl container, the scraper.py script should scrape event data from these websites:
      1. crossweb
      2. unikonferencje
      3. eventbrite
    4. after scraper.py finishes successfully, the frontend container should start and expose port 8501

    The whole process can take a few minutes, especially on the first run.
  4. Once the Docker logs show that the frontend container is running, you can open the web app in your browser (typically at http://localhost:8501)
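The .env file above is plain key=value text. As an illustration of how the app's services might read it, here is a minimal stdlib-only sketch; the variable names follow the example above, but the parser itself is a hypothetical helper (a real project would more likely use python-dotenv or Docker Compose's built-in env_file support):

```python
def load_env(path: str = ".env") -> dict:
    """Parse a simple KEY=VALUE .env file (comments and blank lines skipped)."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            # partition on the FIRST '=', so URLs containing '=' stay intact
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env


def scraping_urls(env: dict) -> list:
    """SCRAPING_URLS is a comma-separated list; strip surrounding whitespace."""
    return [u.strip() for u in env.get("SCRAPING_URLS", "").split(",") if u.strip()]
```

Note that splitting on the first = is important here, because the Eventbrite URLs in SCRAPING_URLS contain = characters of their own.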

Screenshots

A correctly set up and working app looks like this: (screenshot: app in use)

Status

The project is: done

Acknowledgements

Special thanks to the datarabbit team for giving us this interesting challenge


Our team

People and their roles:

Rumeleq - repository owner, responsible for the ETL scraper

wiktorKycia - repository maintainer, responsible for the frontend (displaying data on the website) and for dockerization

JanTopolewski - responsible for data flow: connecting to the Chroma database and the AI, and for the prompt templates
