A full-stack AI system that performs automated evaluation of student answer sheets based upon model answer key provided.
CopyChecker-AI is an intelligent academic evaluation system that:
- 📄 Accepts Model Answer Key and Student Answer PDFs
- 🧠 Extracts text and images using advanced OCR pipelines
- 🖼 Performs layout-aware segmentation using YOLOv8
- 🔍 Detects structural and semantic similarities
- 🤖 Uses LLM reasoning (Groq API) for evaluation scoring
- ⚡ Generates structured evaluation results
The system goes beyond simple text matching by combining:
- Computer Vision
- OCR
- Document Layout Analysis
- Large Language Model evaluation
- Parallel processing pipelines
This makes it significantly more advanced than traditional plagiarism detection tools.
Traditional answer checkers:
- ❌ Rely only on text matching
- ❌ Fail on scanned handwritten PDFs
- ❌ Ignore layout structure
- ❌ Cannot evaluate answer quality
CopyChecker-AI solves this by:
- Converting PDFs into high-resolution images
- Performing layout-aware segmentation
- Extracting text via OCR
- Using LLM-based semantic comparison
- Producing contextual evaluation output
- Python
- Flask (Backend interface)
- HTML & CSS (Frontend interface)
- OpenCV
- YOLOv8 (Ultralytics)
- Image preprocessing with CLAHE
- Layout-aware segmentation
- Tesseract OCR
- pdf2image
- Poppler
- PyMuPDF (fitz)
- Groq API (LLM reasoning)
- LangChain
- Vector Indexing
- Full-text Search Index
- Converts PDFs to 300 DPI images
- Multi-threaded page conversion
- Handles scanned and handwritten PDFs
- Extracts both text and embedded images
Instead of naive OCR:
- Detects text blocks using YOLO segmentation model
- Extracts structured regions
- Preserves logical content flow
This improves:
- OCR accuracy
- Content alignment
- Context grouping
- LAB color space conversion
- CLAHE contrast enhancement
- RGB normalization
- Tesseract OCR
- Page-wise concurrent processing
Improves:
- Low contrast scan handling
- Handwritten text clarity
- Reduced OCR noise
Uses:
concurrent.futures- Multi-threading for performance
Benefit:
- Faster evaluation for multi-page PDFs
- Scalable evaluation architecture
Instead of simple similarity scores:
- Sends extracted content to Groq LLM endpoint
- Uses contextual comparison prompts
- Produces structured evaluation output
This enables:
- Context-aware grading
- Semantic similarity detection
- Answer quality assessment