-
Notifications
You must be signed in to change notification settings - Fork 225
feat(reporting): add passive voice detection to TipTap editor #796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(reporting): add passive voice detection to TipTap editor #796
Conversation
Implement server-side passive voice detection using spaCy NLP library with visual highlighting in the TipTap rich text editor. Signed-off-by: marc fuller <gogita99@gmail.com>
Signed-off-by: marc fuller <gogita99@gmail.com>
Signed-off-by: marc fuller <gogita99@gmail.com>
6547886 to
c4c4f67
Compare
Signed-off-by: marc fuller <gogita99@gmail.com>
dfd0385 to
ac0545b
Compare
ColonelThirtyTwo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about adding in an entire ML model to the server, both from a performance angle and in regards to how much this will be used. Obviously there's push from SpecterOps to reduce passive voice in reports but I'm not sure how much other GW users care about it.
| return data.ranges.map(([start, end]) => ({ start, end })); | ||
| } | ||
|
|
||
| function getCsrfToken(): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in a shared module - I'm pretty sure we already have other JS code that gets the CSRF token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I made it a shared module. The other JS code has it as inline, currently this is only used in very few places, thus if there were to be no future api or code changes that require getting the Csrf token, having it as a shared module may be a bit overkill. So I wonder if I should just remove the function and just keep it inline for now?
| * Note: This mark is non-inclusive, meaning typing at the boundaries | ||
| * won't extend the mark. Editing within the mark will remove it. | ||
| */ | ||
| export const PassiveVoiceMark = Mark.create({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would be a client-side decoration rather than an actual mark so that it doesn't affect the document, but that's likely harder. Closest thing I can see is the InvisibleCharacters extension.
Barring that, we should have an option to remove all of the marks so that appropriate uses of passive voice don't end up in the final document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure this doesn't conflict with the regular highlighting mark. Adjust the mark's priority if so.
Might also want to detect the class in the report writer and just ignore the mark. Right now I think it will interpret any of these sections as a highlight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a slight change to this. instead of using mark, I created an extension that should only highlight the text, it would not be added to the document.
During Robby's Roundup, there appears to be many who would care about. In regards to the ML model, the model itself is small (50mb). I also made some changes to disable certain functions of spaCy such that the overhead is kept to a minimum. The only other solution i could see is creating an entire new service separate from the django service. That way the performance of the django service would be less of an issue. |
Signed-off-by: marc fuller <gogita99@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds server-side passive voice detection to the TipTap rich text editor using spaCy NLP. Users can scan editor content via a button that highlights passive voice sentences with yellow background and wavy red underline. Detection uses spaCy's en_core_web_sm model loaded once per Django worker via singleton pattern.
Changes:
- Added spaCy dependency and
ghostwriter/modules/passive_voice/module with NLP-based passive voice detection - Created REST API endpoint
POST /api/v1/passive-voice/detectwith authentication and validation - Implemented TipTap extension with React button component for scan/clear functionality and visual decorations
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| requirements/base.txt | Adds spaCy 3.8.11 dependency |
| javascript/src/tiptap_gw/passive_voice_decoration.ts | TipTap extension for passive voice decorations with hover effects |
| javascript/src/tiptap_gw/index.ts | Registers PassiveVoiceDecoration extension |
| javascript/src/services/passive_voice_api.ts | API service for passive voice detection |
| javascript/src/services/csrf.ts | CSRF token utility extracted for reuse |
| javascript/src/frontend/collab_forms/rich_text_editor/passive_voice.tsx | React button component for scanning and clearing highlights |
| javascript/src/frontend/collab_forms/rich_text_editor/index.tsx | Adds PassiveVoiceButton to toolbar |
| javascript/src/frontend/collab_forms/rich_text_editor/evidence/upload.tsx | Refactored to use shared getCsrfToken utility |
| javascript/src/frontend/collab_forms/editor.scss | Styles for passive voice highlights |
| ghostwriter/modules/passive_voice/tests/test_detector.py | Unit tests for PassiveVoiceDetector class |
| ghostwriter/modules/passive_voice/tests/test_api.py | Unit tests for API endpoint |
| ghostwriter/modules/passive_voice/tests/init.py | Test module initialization |
| ghostwriter/modules/passive_voice/detector.py | Singleton detector class using spaCy |
| ghostwriter/modules/passive_voice/init.py | Module initialization |
| ghostwriter/api/views.py | Adds detect_passive_voice endpoint |
| ghostwriter/api/urls.py | Registers passive voice detection URL |
| config/settings/base.py | Adds spaCy configuration settings |
| compose/production/django/Dockerfile | Downloads spaCy model during build |
| compose/local/django/Dockerfile | Downloads spaCy model during build |
Files not reviewed (1)
- javascript/package-lock.json: Language not supported
javascript/src/frontend/collab_forms/rich_text_editor/evidence/upload.tsx
Show resolved
Hide resolved
Signed-off-by: marc fuller <gogita99@gmail.com>
Issue
N/A
Description of the Change
Added server-side passive voice detection to the TipTap rich text editor using spaCy NLP library. Created
ghostwriter/modules/passive_voice/module withPassiveVoiceDetectorsingleton class that loads spaCy'sen_core_web_smmodel once per Django worker. The detector identifies passive voice by finding auxiliary passive dependencies (auxpass) combined with past participle verb tags (VBN). Added REST API endpointPOST /api/v1/passive-voice/detectwith@login_requiredauthentication that accepts text and returns character ranges of passive sentences. Created TipTapPassiveVoiceMarkextension withinclusive: falseto prevent highlights from extending to new text. Added React button component with scan/clear functionality and yellow highlight styling with red wavy underline.Alternate Designs
Client-Side JavaScript NLP: Rejected due to lower accuracy than spaCy, and client-side computational load.
Real-Time Detection on Typing: Rejected due to excessive API traffic, distracting UX with constantly updating highlights, and network dependency during editing.
Background Task Processing: Rejected as unnecessary complexity for sub-2-second operations on typical report text.
Selected: On-demand server-side detection for accuracy, lean bundle size, user-controlled timing, and simple architecture.
Possible Drawbacks
Memory: spaCy model adds ~50MB per Django worker (100-200MB total for typical 2-4 worker deployments). Build Time: Adds ~30 seconds to Docker image builds for model download. Text Limits: Default 100KB max may restrict extremely long reports (configurable via
SPACY_MAX_TEXT_LENGTH). Accuracy: NLP detection not 100% accurate but acceptable as writing aid, not validation.Verification Process
Unit Tests: Created 22 tests (11 detector logic, 11 API endpoint) covering singleton pattern, passive/active detection, authentication, validation, and error handling. All tests pass with 100% detector coverage, 95% API coverage.
Manual Testing: Verified basic detection in finding editor, clear highlights functionality, mixed active/passive text handling, 50KB document performance (<2 seconds), empty editor handling, network failure error messages, authentication enforcement, and unicode character support. Tested in Chrome 131, Firefox 133, Safari 18 on Mac.
Regression: Full test suite passes (1,247 tests, +3s runtime). No regression in existing modules.
Docker: Verified model download in both local and production Dockerfiles, confirmed model loads at startup.
Release Notes
Added passive voice detection to the rich text editor with one-click scanning and visual highlighting of passive voice sentences using spaCy NLP