mjthewalker/QdrantFinance

⚡ Mohnish Hemanth Kumar

🤖 NVIDIA 10-K Analysis (2020-2024) using 🦙 Llama 3.1


About the task

In this task we analyze NVIDIA's SEC 10-K reports for the last five years (2020-2024), derive insights from them, and conclude whether the company grew over that period. We use the RAG (Retrieval-Augmented Generation) approach for this task.

Getting the Data

We pull the filings from the SEC's official website through the API service provided by SEC-API. Only selected sections of each filing are extracted, in HTML format.
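The extraction step can be sketched as building one request per filing and section. This is a minimal sketch, not the notebook's code: the endpoint and its `url`, `item`, `type`, and `token` query parameters are assumptions based on SEC-API's Extractor API and should be checked against its current documentation. The item codes and filing URLs are placeholders, since the sections actually extracted are not listed here.

```python
from urllib.parse import urlencode

# Assumed Extractor API endpoint; verify against the SEC-API documentation.
EXTRACTOR_ENDPOINT = "https://api.sec-api.io/extractor"


def build_extractor_url(filing_url, item, api_key, return_type="html"):
    """Build the request URL for one section (item) of one filing."""
    params = {
        "url": filing_url,    # link to the 10-K filing on sec.gov
        "item": item,         # section code, e.g. "1A" (placeholder choice)
        "type": return_type,  # "html" matches the format used in this task
        "token": api_key,
    }
    return f"{EXTRACTOR_ENDPOINT}?{urlencode(params)}"


def section_requests(filing_urls, items, api_key):
    """Return (filing_url, item, request_url) for every filing/section pair."""
    return [
        (filing_url, item, build_extractor_url(filing_url, item, api_key))
        for filing_url in filing_urls
        for item in items
    ]
```

Each request URL can then be fetched with any HTTP client and the response saved as one HTML file per filing and section.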

🧹 Data Cleaning

We use 🦙 LlamaParse to parse the data. Since LlamaParse only accepts PDF files as input, we first convert the HTML files to PDF. After parsing, we merge all the parsed output into a single .MD file.
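The final merge step can be sketched with the standard library alone. This is a minimal sketch that assumes the parsed output has already been saved as one `.md` file per document in a single folder. The function name, the folder layout, and the `## <file name>` heading placed before each part are illustrative choices, not taken from the notebook.

```python
from pathlib import Path


def merge_markdown(parsed_dir, output_path):
    """Concatenate every .md file in parsed_dir into one file, sorted by name.

    Returns the number of files merged.
    """
    out = Path(output_path)
    parts = []
    for md_file in sorted(Path(parsed_dir).glob("*.md")):
        if md_file.resolve() == out.resolve():
            continue  # never merge the output file into itself
        body = md_file.read_text(encoding="utf-8").strip()
        # Prefix each part with its file name so the source stays traceable.
        parts.append(f"## {md_file.stem}\n\n{body}\n")
    out.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

Naming the parsed files so that they sort chronologically (for example by fiscal year) keeps the merged file in filing order.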

Analysis

We use the RAG approach. We first split the data into small chunks using RecursiveCharacterTextSplitter(), then embed the chunks with the 'BAAI/bge-base-en-v1.5' model. The embeddings are stored in a Qdrant vector database, which also provides the vector search engine used for retrieval. We use FlashrankRerank to rerank the retrieved chunks. Finally, we use the Llama 3.1 LLM through the Groq API to derive insights.
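To make the chunk, embed, and retrieve flow concrete, here is a dependency-free toy sketch. It is not the notebook's code, and every piece is a stand-in: `split_text` is a simplified take on recursive character splitting rather than LangChain's implementation, the word-count `embed` replaces the 'BAAI/bge-base-en-v1.5' model, and the in-memory `retrieve` replaces Qdrant's vector search. Reranking and the LLM call are left out.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, separators=("\n\n", "\n", " ")):
    """Split on the coarsest separator first; recurse into finer ones as needed."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to fixed-width cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

In the real pipeline the top results from the vector search are reranked and then passed to the LLM as context along with the question.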

📓 Notebook

The source code for this task is available here

Results

Financial Analysis 1

Financial Analysis 2

Visualizations

Creative Insights
