A Go library for detecting plagiarism using stopword n-gram analysis.
This library implements a plagiarism detection algorithm that uses stopwords and n-grams to identify similar content between texts. The algorithm analyzes the position and frequency of function words (stopwords) to calculate a similarity score.
The algorithm creates relationships between similar documents, allowing you to visualize networks of similar content and identify potential plagiarism chains.
The similarity network shows how documents are connected based on their similarity scores. Each node represents a document, and edges show the similarity percentage between documents.
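As a rough illustration of the network idea, the sketch below builds edges between every pair of documents whose similarity reaches a threshold. Everything here is an assumption made for illustration: the `Edge` type, `buildNetwork`, the 0.5 threshold, and especially `wordOverlap`, which is a simple stand-in scoring function (Jaccard index over words) and not this library's stopword n-gram score.

```go
package main

import (
	"fmt"
	"strings"
)

// Edge links two documents whose similarity meets the threshold.
// Hypothetical type, not part of the library.
type Edge struct {
	From, To string
	Score    float64
}

// wordOverlap is a stand-in similarity function: the Jaccard index over
// lowercase words. It is NOT the library's scoring method.
func wordOverlap(a, b string) float64 {
	setA := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(a)) {
		setA[w] = true
	}
	setB := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(b)) {
		setB[w] = true
	}
	shared := 0
	for w := range setA {
		if setB[w] {
			shared++
		}
	}
	union := len(setA) + len(setB) - shared
	if union == 0 {
		return 0
	}
	return float64(shared) / float64(union)
}

// buildNetwork compares every pair of documents once and keeps an edge
// when the score reaches the threshold.
func buildNetwork(names []string, docs map[string]string, threshold float64,
	sim func(a, b string) float64) []Edge {
	var edges []Edge
	for i := 0; i < len(names); i++ {
		for j := i + 1; j < len(names); j++ {
			s := sim(docs[names[i]], docs[names[j]])
			if s >= threshold {
				edges = append(edges, Edge{From: names[i], To: names[j], Score: s})
			}
		}
	}
	return edges
}

func main() {
	docs := map[string]string{
		"original": "the cat sat on the mat",
		"copy":     "the cat sat on the mat today",
		"other":    "stock prices fell sharply",
	}
	names := []string{"original", "copy", "other"}
	for _, e := range buildNetwork(names, docs, 0.5, wordOverlap) {
		fmt.Printf("%s -- %s (%.0f%%)\n", e.From, e.To, e.Score*100)
	}
	// prints: original -- copy (83%)
}
```

Passing the similarity function as a parameter keeps the graph construction independent of the scoring method, so a detector-backed score could be swapped in for `wordOverlap`.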
The algorithm can track how content spreads and evolves, which helps identify chains of misinformation. Original content may be transformed through reproduction, paraphrasing, new article creation, falsification, and patchwork plagiarism, ultimately ending up as misinformation.
Use cases:
- Academic Integrity: Detect plagiarism in student papers and research articles
- Content Monitoring: Track content copying and unauthorized republishing
- Misinformation Tracking: Identify how false information spreads and mutates
- Copyright Protection: Find unauthorized use of copyrighted content
- SEO Analysis: Detect duplicate content across websites
Features:
- Fast plagiarism detection using stopword n-gram analysis
- Configurable n-gram size (default: 8)
- English stopwords support
- Simple and intuitive API
- Detects paraphrasing and patchwork plagiarism
Install:
go get github.com/KinshukSS2/plag-checker

Basic usage:
package main
import (
	"fmt"

	plagiarism "github.com/KinshukSS2/plag-checker"
)

func main() {
	source := "This is a test document for plagiarism detection"
	target := "This is a test document for detection"

	// NewDetector with no options uses the defaults (n=8, language="en").
	detector, err := plagiarism.NewDetector()
	if err != nil {
		panic(err)
	}

	// Compare the two texts; the results are stored on the detector.
	if err := detector.DetectWithStrings(source, target); err != nil {
		panic(err)
	}

	fmt.Printf("Similarity Score: %.2f\n", detector.Score)
	fmt.Printf("Similar n-grams: %d\n", detector.Similar)
	fmt.Printf("Total n-grams: %d\n", detector.Total)
}

// Create with default settings (n=8, language="en")
detector, err := plagiarism.NewDetector()
// Create with custom n-gram size
detector, err := plagiarism.NewDetector(plagiarism.SetN(12))
// Create with custom stopwords
customStopwords := []string{"the", "a", "is", "are"}
detector, err := plagiarism.NewDetector(plagiarism.SetStopWords(customStopwords))

// Detect using text strings
err := detector.DetectWithStrings(sourceText, targetText)
// Detect using pre-filtered stopwords
err := detector.DetectWithStopWords(sourceStopwords, targetStopwords)

After detection, access results through the detector:
- detector.Score: Similarity score (0.0 to 1.0)
- detector.Similar: Number of matching n-grams
- detector.Total: Total number of n-grams compared
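The constructor calls shown earlier (`NewDetector(SetN(12))` and similar) look like Go's functional options pattern, which would also explain why `NewDetector` returns an error. The sketch below shows one way a detector with this shape could be wired. The names `NewDetector`, `SetN`, `SetStopWords`, `Score`, `Similar`, and `Total` come from this README; the struct layout, the `Option` type, and the validation rules are assumptions, not the library's actual source.

```go
package main

import "fmt"

// Detector mirrors the result fields described above.
// The field layout is hypothetical.
type Detector struct {
	N         int
	StopWords []string
	Score     float64
	Similar   int
	Total     int
}

// Option configures a Detector during construction (assumed type).
type Option func(*Detector) error

// SetN sets the n-gram size. Rejecting non-positive values is an
// assumption made for this sketch.
func SetN(n int) Option {
	return func(d *Detector) error {
		if n < 1 {
			return fmt.Errorf("n must be positive, got %d", n)
		}
		d.N = n
		return nil
	}
}

// SetStopWords replaces the stopword list with a copy of words.
func SetStopWords(words []string) Option {
	return func(d *Detector) error {
		if len(words) == 0 {
			return fmt.Errorf("stopword list must not be empty")
		}
		d.StopWords = append([]string(nil), words...)
		return nil
	}
}

// NewDetector starts from the documented default (n=8) and applies each
// option in order, stopping at the first error.
func NewDetector(opts ...Option) (*Detector, error) {
	d := &Detector{N: 8}
	for _, opt := range opts {
		if err := opt(d); err != nil {
			return nil, err
		}
	}
	return d, nil
}

func main() {
	d, err := NewDetector(SetN(12))
	if err != nil {
		panic(err)
	}
	fmt.Println("n-gram size:", d.N)

	if _, err := NewDetector(SetN(0)); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

With this pattern, new settings can be added later without changing the `NewDetector` signature, and invalid settings surface as an error at construction time rather than during detection.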
Detection proceeds in five stages:
- Tokenization: Text is split into individual words (tokens)
- Stopword Filtering: Only stopwords (function words like "the", "is", "and") are kept
- N-gram Generation: Stopwords are converted into n-grams (sequences of N words)
- Similarity Calculation: N-grams are compared for positional matches
- Score: Ratio of matching n-grams to total n-grams
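The stages above can be sketched in a few lines of Go. This is a simplified illustration under stated assumptions, not the library's implementation: the stopword set is a tiny hand-picked sample, n-grams are matched by membership in the target's n-gram set rather than by position, and the total is taken to be the number of source n-grams. The function names `stopwordSequence`, `ngrams`, and `score` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// A tiny illustrative stopword set; a real English list is much larger.
var stopwords = map[string]bool{
	"this": true, "is": true, "a": true, "for": true, "the": true,
	"and": true, "of": true, "to": true, "in": true,
}

// stopwordSequence tokenizes text on whitespace, strips surrounding
// punctuation, and keeps only the stopwords, in their original order.
func stopwordSequence(text string) []string {
	var seq []string
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		tok = strings.Trim(tok, ".,;:!?\"'()")
		if stopwords[tok] {
			seq = append(seq, tok)
		}
	}
	return seq
}

// ngrams joins every run of n consecutive words into one string.
func ngrams(words []string, n int) []string {
	var out []string
	for i := 0; i+n <= len(words); i++ {
		out = append(out, strings.Join(words[i:i+n], " "))
	}
	return out
}

// score counts how many source n-grams also occur in the target and
// returns that count, the number of source n-grams, and their ratio.
func score(source, target string, n int) (similar, total int, ratio float64) {
	seen := map[string]bool{}
	for _, g := range ngrams(stopwordSequence(target), n) {
		seen[g] = true
	}
	src := ngrams(stopwordSequence(source), n)
	for _, g := range src {
		if seen[g] {
			similar++
		}
	}
	total = len(src)
	if total > 0 {
		ratio = float64(similar) / float64(total)
	}
	return similar, total, ratio
}

func main() {
	source := "This is a test of the system and the rest of it"
	target := "This was a test of the system and the rest of it"
	similar, total, ratio := score(source, target, 3)
	fmt.Printf("similar=%d total=%d score=%.2f\n", similar, total, ratio)
	// prints: similar=4 total=6 score=0.67
}
```

Changing "is" to "was" removes one stopword from the target, which breaks the two source trigrams that contain "is" and leaves the other four intact. Note that a text must contain at least n stopwords to produce any n-grams, so short inputs yield a total of 0 in this sketch.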
Run tests:
go test ./...

Run tests with coverage:
go test -cover ./...

See the examples/ directory for complete examples.
MIT License. See the LICENSE file for details.
Algorithm inspired by research on plagiarism detection using stopword n-grams.
By: Kinshuk

