A client has a system that collects news artifacts from web pages, tweets, Facebook posts, etc. The client wants to score a given news artifact against a topic. Experts have been hired to score a sample of these news items on a scale from 0 to 10, where 0 means the item is totally NOT relevant and 10 means it is very relevant; scores in between indicate the degree of relevance of the news item to the topic.
The client wants to explore how useful existing LLMs such as GPT-3 are for this task. You are hired as a consultant to assess the effectiveness of GPT-3-like LLMs for it. If your recommendation is positive, you must demonstrate that your prompt-design strategies are reproducible and produce consistent results.
You should also set up an MLOps pipeline that helps automate the task of using different LLMs and different topics. Your pipeline should also allow future improvements in the prompt design to be integrated without breaking the system. A centralized log system should be incorporated into your pipeline to help monitor outputs, cost, performance, and other relevant artifacts.
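To make the task concrete, the scoring step can be sketched as a prompt template plus a parser for the model's reply. The wording of the template and the parsing logic below are illustrative assumptions, not the project's actual prompt design:

```python
# A minimal sketch of a zero-shot relevance-scoring prompt. The exact
# wording and the few-shot examples (omitted here) are assumptions.
def build_score_prompt(topic: str, news_item: str) -> str:
    """Build a prompt asking the LLM for a 0-10 relevance score."""
    return (
        f"Rate how relevant the following news item is to the topic "
        f"'{topic}' on a scale from 0 (totally not relevant) to 10 "
        f"(very relevant). Answer with a single integer.\n\n"
        f"News item: {news_item}\n"
        f"Score:"
    )

def parse_score(completion: str) -> int:
    """Extract the integer score from the model's completion, clamped to 0-10."""
    digits = "".join(ch for ch in completion if ch.isdigit())
    score = int(digits[:2]) if digits else 0
    return min(max(score, 0), 10)
```

Keeping the template in one place like this is what lets the MLOps pipeline swap in improved prompt designs without touching the rest of the system.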
Our data is versioned using DVC.

news — the following versions of the news data are tracked so far:
- news-v0 : original version of the data
- news-v1 : first-stage cleaned news data
- test-news-v1 : enhanced test data
- test-news-v2 : second enhanced test data
- test_news-v0 : tracked test news data
- train-news-v1 : enhanced train data
- train-news-v2 : second enhanced train data
- train_news-v0 : tracked train news data
The directory layout of this project is self-explanatory: the API setup (for making predictions) is in the api folder, the versioned data is in the data folder, the notebook directory contains the notebooks for this project, and helper classes live in the scripts directory.
This project uses the co:here API for making predictions, so you need your own API key. Create a config.py file in the root directory and place your API key in it as follows:

```python
api_key = "**************"
```
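A minimal sketch of how a script might load the key, assuming config.py sits in the project root (on sys.path). The environment-variable fallback (`COHERE_API_KEY`) is a suggestion, not part of the repository:

```python
import os

# Prefer the key from config.py; fall back to an environment variable
# so the scripts can also run in CI without a checked-in config file.
try:
    from config import api_key
except ImportError:
    api_key = os.environ.get("COHERE_API_KEY", "")
```

The loaded key can then be passed to the co:here client (e.g. `cohere.Client(api_key)`) wherever predictions are made.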
If you want to fine-tune your model, you can find the tuner.txt file in the ./data/ directory. Use this file for fine-tuning co:here Generate.
```shell
git clone https://github.com/Nathnael12/Prompt-engineering.git
cd Prompt-engineering
pip install -r requirements.txt
```
You will find it in the api directory. Three endpoints are included:
- {host:port}/check : used for checking whether or not our API is up
- {host:port}/bnewscore : used for predicting news scores
- {host:port}/jdentities : used for extracting job entities
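A small client-side sketch for the scoring endpoint. The JSON field names (`"topic"`, `"news"`) and the response shape are assumptions — check the FastAPI app in the api directory for the actual request schema:

```python
# Build the (url, payload) pair for a /bnewscore request. Field names
# are assumed; adjust them to match the endpoint's request model.
def bnewscore_request(topic: str, news: str,
                      host: str = "http://127.0.0.1:8000"):
    """Return the URL and JSON payload for a news-scoring request."""
    return f"{host}/bnewscore", {"topic": topic, "news": news}

# With the API running, a caller could then post the payload, e.g.:
#   import requests
#   url, payload = bnewscore_request("climate change", "Heatwave hits Europe")
#   score = requests.post(url, json=payload).json()
```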
For this project you will use host:port = http://127.0.0.1:8000/
You can start the API with the following commands:

```shell
cd api
uvicorn app:app --reload
```

The above command starts the API at http://127.0.0.1:8000/
