# ResearchReach: Intelligent Research Paper Matcher & Cold-Email Assistant

An AI-powered system that analyzes resumes and recommends the most relevant research papers.

## Table of Contents

- Overview
- Introduction
- How It Works
- Research Paper Matching System
  - 1️⃣ Resume Parsing and Skill Extraction
  - 2️⃣ Research Paper Retrieval
  - 3️⃣ Convert to Sentence Embeddings
  - 4️⃣ Compute Cosine Similarity
  - 5️⃣ Final Output
  - 6️⃣ Email Generation
- Contributors
## Overview

ResearchReach is an AI-driven web platform that intelligently matches research papers with a candidate's resume. Using advanced natural language processing techniques such as Sentence-BERT (SBERT) embeddings and cosine similarity, the system evaluates a user's:

- Skills
- Projects
- Technical experience
- Research interests

It then identifies the most relevant research papers from the web and automatically drafts a professional cold email tailored to the selected paper. This makes research discovery and outreach faster, more accurate, and significantly more efficient.
## Introduction

Finding research papers that align precisely with your skills and academic profile can be tedious. ResearchReach automates this process in four steps:

- ✅ Extracts skills and project details from the resume
- ✅ Converts resume and paper text into embeddings using SBERT
- ✅ Computes semantic similarity using cosine similarity
- ✅ Recommends the most relevant research paper with a high matching score

Designed for students, researchers, and applicants seeking internships or collaboration opportunities, ResearchReach offers a streamlined, intelligent, and user-friendly experience.
## How It Works

The matching process follows a multi-step pipeline. First, the system extracts key details from the candidate's resume, including:

- ✅ Skills (e.g., Machine Learning, NLP)
- ✅ Projects (e.g., Fake News Detection using BERT)

✅ Example:

Skills = ["Machine Learning", "Natural Language Processing", "Deep Learning", "Python"]
Projects = ["Fake News Detection using BERT", "Text Summarization with LSTM"]

This information is concatenated into a single text input:

"Machine Learning Natural Language Processing Deep Learning Python Fake News Detection using BERT Text Summarization with LSTM"

### Tech Stack

| Component | Tool |
|---|---|
| Frontend | React.js |
| Backend | Flask |
| Embedding Model | Sentence-BERT (all-MiniLM-L6-v2) |
| Paper Retrieval | Semantic Scholar API |
| Similarity Calculation | Cosine Similarity (scikit-learn) |
| Email Generation | Gemini API |
| Paper Download | Unpaywall API |
### Key Features

- ✅ Fast and Efficient: Handles large datasets quickly using SBERT.
- ✅ Accurate Matching: High similarity scoring using cosine similarity.
- ✅ Automated Paper Retrieval: Uses Semantic Scholar to find relevant papers.
- ✅ Secure Data Handling: Ensures data privacy and integrity.
- ✅ Email Automation: Automatically generates internship request emails based on the matching paper.
## Research Paper Matching System

1. Resume Parsing and Skill Extraction
2. Research Paper Retrieval
3. Convert to Sentence Embeddings
4. Compute Cosine Similarity
5. Generate and Send Email
### 1️⃣ Resume Parsing and Skill Extraction

The system extracts skills and projects from the resume using pdfplumber, spaCy, and KeyBERT.

Example extracted skills and projects:

Machine Learning, Natural Language Processing, Deep Learning, Python, Fake News Detection using BERT, Text Summarization with LSTM
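The heavy lifting above is done by pdfplumber, spaCy, and KeyBERT; the final step of joining the extracted items into one matching string can be sketched in plain Python (the function name is an illustrative assumption, not the project's actual code):

```python
def build_profile_text(skills, projects):
    """Concatenate extracted skills and project titles into a single
    text input for embedding, as described above."""
    return " ".join(skills + projects)

skills = ["Machine Learning", "Natural Language Processing", "Deep Learning", "Python"]
projects = ["Fake News Detection using BERT", "Text Summarization with LSTM"]

resume_text = build_profile_text(skills, projects)
print(resume_text)
```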
### 2️⃣ Research Paper Retrieval

The system retrieves research papers through web scraping, using beautifulsoup4 and spaCy.

Example papers:
📄 Paper 1:

Title: "A Deep Learning Approach to Fake News Detection"
Abstract: "We propose a model based on BERT for detecting fake news articles. Our approach achieves state-of-the-art performance in text classification tasks."

📄 Paper 2:

Title: "Efficient Image Classification with CNNs"
Abstract: "We present an optimized CNN model for image classification. The model reduces computational cost while maintaining accuracy."
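The tech stack also lists the Semantic Scholar API for paper retrieval. A minimal stdlib sketch of querying its public Graph API search endpoint might look like this (the helper names are illustrative; the endpoint and field names come from the public API):

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_search_url(query, limit=5):
    """Build a Semantic Scholar Graph API search URL requesting
    titles and abstracts for the top `limit` matching papers."""
    params = {"query": query, "fields": "title,abstract", "limit": limit}
    return API_BASE + "?" + urllib.parse.urlencode(params)

def search_papers(query, limit=5):
    """Fetch matching papers (network call; requires internet access)."""
    with urllib.request.urlopen(build_search_url(query, limit)) as resp:
        return json.load(resp).get("data", [])

# No network needed just to inspect the request being built:
print(build_search_url("fake news detection BERT", limit=2))
```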
### 3️⃣ Convert to Sentence Embeddings

The system converts text into high-dimensional vector embeddings using Sentence-BERT (all-MiniLM-L6-v2):

from sentence_transformers import SentenceTransformer

# Load the pretrained SBERT model
embed_model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode the resume and each paper into dense vectors
resume_embedding = embed_model.encode(resume_text)
paper_1_embedding = embed_model.encode(paper_1_text)
paper_2_embedding = embed_model.encode(paper_2_text)

Resume Embedding → [0.12, -0.08, ..., 0.32]
Paper 1 Embedding → [0.11, -0.07, ..., 0.30]
Paper 2 Embedding → [0.02, 0.45, ..., -0.12]

### 4️⃣ Compute Cosine Similarity

Cosine similarity measures how similar two vectors are:
$$\text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \cdot \|B\|}$$
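The formula can be checked directly on toy vectors with NumPy (the values here are illustrative, not real embeddings):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity: dot product divided by the product of norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Vectors pointing the same direction score ~1.0; orthogonal vectors score 0.0
print(cosine_sim([1.0, 2.0], [2.0, 4.0]))  # ≈ 1.0
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```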
✅ Example calculation:

from sklearn.metrics.pairwise import cosine_similarity

# Compare the resume embedding against each paper embedding
similarity_1 = cosine_similarity([resume_embedding], [paper_1_embedding])
similarity_2 = cosine_similarity([resume_embedding], [paper_2_embedding])

| Pair | Similarity Score | Result |
|---|---|---|
| Resume & Paper 1 | 0.92 | ✅ High Similarity |
| Resume & Paper 2 | 0.34 | ❌ Low Similarity |
The paper with the highest similarity score is selected as the most relevant match.
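Selecting the best match is then a simple argmax over the scores. A sketch using the illustrative values from the comparison above (the helper name is an assumption):

```python
# Illustrative similarity scores from the comparison step
scores = {
    "A Deep Learning Approach to Fake News Detection": 0.92,
    "Efficient Image Classification with CNNs": 0.34,
}

def best_match(scores):
    """Return the (title, score) pair with the highest similarity."""
    return max(scores.items(), key=lambda kv: kv[1])

title, score = best_match(scores)
print(title, score)  # → A Deep Learning Approach to Fake News Detection 0.92
```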
### 5️⃣ Final Output

✅ Most Relevant Paper Found!

Title: "A Deep Learning Approach to Fake News Detection"
Abstract: "We propose a model based on BERT for detecting fake news articles. Our approach achieves state-of-the-art performance in text classification tasks."
Similarity Score: 0.92
### 6️⃣ Email Generation

Once a matching paper is found, the system generates an internship request email using the Gemini API. The email can be written in one of three styles:

- ✅ Formal & Professional
- ✅ Technical & Research-Oriented
- ✅ Enthusiastic & Passionate
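A minimal sketch of how such a prompt could be assembled and sent to Gemini, assuming the google-generativeai Python client; the prompt wording and function names are illustrative, not the project's actual code:

```python
def build_email_prompt(paper_title, paper_abstract, tone="Formal & Professional"):
    """Assemble a cold-email prompt around the matched paper,
    in one of the three supported tones."""
    return (
        f"Write an internship request email in a {tone} tone.\n"
        f"The email should reference this research paper:\n"
        f"Title: {paper_title}\n"
        f"Abstract: {paper_abstract}\n"
    )

def generate_email(prompt, api_key):
    """Send the prompt to Gemini (requires network and a valid API key)."""
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt).text

prompt = build_email_prompt(
    "A Deep Learning Approach to Fake News Detection",
    "We propose a model based on BERT for detecting fake news articles.",
    tone="Technical & Research-Oriented",
)
print(prompt)
```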
## 🤝 Contributors

We would like to extend our heartfelt gratitude to everyone who contributed to this project. Your hard work and dedication made this possible!

| Contributor | Role |
|---|---|
| Srujan Rana | Project Lead, Backend Developer |
| Rudra Prasad Jena | Frontend Developer & API Integration |
| Abhishek Kumar | Frontend Developer |

Want to contribute?

We welcome contributions from the community! If you'd like to improve the project or report issues, feel free to fork the repo and submit a pull request.



