Skip to content

Conversation

@marcpfuller
Copy link
Contributor

@marcpfuller marcpfuller commented Jan 12, 2026

Issue

N/A

Description of the Change

Added server-side passive voice detection to the TipTap rich text editor using spaCy NLP library. Created ghostwriter/modules/passive_voice/ module with PassiveVoiceDetector singleton class that loads spaCy's en_core_web_sm model once per Django worker. The detector identifies passive voice by finding auxiliary passive dependencies (auxpass) combined with past participle verb tags (VBN). Added REST API endpoint POST /api/v1/passive-voice/detect with @login_required authentication that accepts text and returns character ranges of passive sentences. Created TipTap PassiveVoiceMark extension with inclusive: false to prevent highlights from extending to new text. Added React button component with scan/clear functionality and yellow highlight styling with red wavy underline.

Alternate Designs

Client-Side JavaScript NLP: Rejected due to lower accuracy than spaCy, and client-side computational load.

Real-Time Detection on Typing: Rejected due to excessive API traffic, distracting UX with constantly updating highlights, and network dependency during editing.

Background Task Processing: Rejected as unnecessary complexity for sub-2-second operations on typical report text.

Selected: On-demand server-side detection for accuracy, lean bundle size, user-controlled timing, and simple architecture.

Possible Drawbacks

Memory: spaCy model adds ~50MB per Django worker (100-200MB total for typical 2-4 worker deployments). Build Time: Adds ~30 seconds to Docker image builds for model download. Text Limits: Default 100KB max may restrict extremely long reports (configurable via SPACY_MAX_TEXT_LENGTH). Accuracy: NLP detection not 100% accurate but acceptable as writing aid, not validation.

Verification Process

Unit Tests: Created 22 tests (11 detector logic, 11 API endpoint) covering singleton pattern, passive/active detection, authentication, validation, and error handling. All tests pass with 100% detector coverage, 95% API coverage.

Manual Testing: Verified basic detection in finding editor, clear highlights functionality, mixed active/passive text handling, 50KB document performance (<2 seconds), empty editor handling, network failure error messages, authentication enforcement, and unicode character support. Tested in Chrome 131, Firefox 133, Safari 18 on Mac.

Regression: Full test suite passes (1,247 tests, +3s runtime). No regression in existing modules.

Docker: Verified model download in both local and production Dockerfiles, confirmed model loads at startup.

Release Notes

Added passive voice detection to the rich text editor with one-click scanning and visual highlighting of passive voice sentences using spaCy NLP

Implement server-side passive voice detection using spaCy NLP library
with visual highlighting in the TipTap rich text editor.

Signed-off-by: marc fuller <gogita99@gmail.com>
@marcpfuller marcpfuller self-assigned this Jan 12, 2026
@marcpfuller marcpfuller requested review from chrismaddalena and removed request for chrismaddalena January 12, 2026 22:37
Signed-off-by: marc fuller <gogita99@gmail.com>
Signed-off-by: marc fuller <gogita99@gmail.com>
@marcpfuller marcpfuller force-pushed the passive_voice_detection branch from 6547886 to c4c4f67 Compare January 13, 2026 01:51
Signed-off-by: marc fuller <gogita99@gmail.com>
@marcpfuller marcpfuller force-pushed the passive_voice_detection branch from dfd0385 to ac0545b Compare January 13, 2026 01:57
Copy link
Contributor

@ColonelThirtyTwo ColonelThirtyTwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about adding in an entire ML model to the server, both from a performance angle and in regards to how much this will be used. Obviously there's push from SpecterOps to reduce passive voice in reports but I'm not sure how much other GW users care about it.

return data.ranges.map(([start, end]) => ({ start, end }));
}

function getCsrfToken(): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in a shared module - I'm pretty sure we already have other JS code that gets the CSRF token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I made it a shared module. The other JS code has it as inline, currently this is only used in very few places, thus if there were to be no future api or code changes that require getting the Csrf token, having it as a shared module may be a bit overkill. So I wonder if I should just remove the function and just keep it inline for now?

* Note: This mark is non-inclusive, meaning typing at the boundaries
* won't extend the mark. Editing within the mark will remove it.
*/
export const PassiveVoiceMark = Mark.create({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would be a client-side decoration rather than an actual mark so that it doesn't affect the document, but that's likely harder. Closest thing I can see is the InvisibleCharacters extension.

Barring that, we should have an option to remove all of the marks so that appropriate uses of passive voice don't end up in the final document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure this doesn't conflict with the regular highlighting mark. Adjust the mark's priority if so.

Might also want to detect the class in the report writer and just ignore the mark. Right now I think it will interpret any of these sections as a highlight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a slight change to this. instead of using mark, I created an extension that should only highlight the text, it would not be added to the document.

@marcpfuller
Copy link
Contributor Author

I'm concerned about adding in an entire ML model to the server, both from a performance angle and in regards to how much this will be used. Obviously there's push from SpecterOps to reduce passive voice in reports but I'm not sure how much other GW users care about it.

During Robby's Roundup, there appears to be many who would care about. In regards to the ML model, the model itself is small (50mb). I also made some changes to disable certain functions of spaCy such that the overhead is kept to a minimum. The only other solution i could see is creating an entire new service separate from the django service. That way the performance of the django service would be less of an issue.

Signed-off-by: marc fuller <gogita99@gmail.com>
Copilot AI review requested due to automatic review settings January 14, 2026 00:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds server-side passive voice detection to the TipTap rich text editor using spaCy NLP. Users can scan editor content via a button that highlights passive voice sentences with yellow background and wavy red underline. Detection uses spaCy's en_core_web_sm model loaded once per Django worker via singleton pattern.

Changes:

  • Added spaCy dependency and ghostwriter/modules/passive_voice/ module with NLP-based passive voice detection
  • Created REST API endpoint POST /api/v1/passive-voice/detect with authentication and validation
  • Implemented TipTap extension with React button component for scan/clear functionality and visual decorations

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
requirements/base.txt Adds spaCy 3.8.11 dependency
javascript/src/tiptap_gw/passive_voice_decoration.ts TipTap extension for passive voice decorations with hover effects
javascript/src/tiptap_gw/index.ts Registers PassiveVoiceDecoration extension
javascript/src/services/passive_voice_api.ts API service for passive voice detection
javascript/src/services/csrf.ts CSRF token utility extracted for reuse
javascript/src/frontend/collab_forms/rich_text_editor/passive_voice.tsx React button component for scanning and clearing highlights
javascript/src/frontend/collab_forms/rich_text_editor/index.tsx Adds PassiveVoiceButton to toolbar
javascript/src/frontend/collab_forms/rich_text_editor/evidence/upload.tsx Refactored to use shared getCsrfToken utility
javascript/src/frontend/collab_forms/editor.scss Styles for passive voice highlights
ghostwriter/modules/passive_voice/tests/test_detector.py Unit tests for PassiveVoiceDetector class
ghostwriter/modules/passive_voice/tests/test_api.py Unit tests for API endpoint
ghostwriter/modules/passive_voice/tests/init.py Test module initialization
ghostwriter/modules/passive_voice/detector.py Singleton detector class using spaCy
ghostwriter/modules/passive_voice/init.py Module initialization
ghostwriter/api/views.py Adds detect_passive_voice endpoint
ghostwriter/api/urls.py Registers passive voice detection URL
config/settings/base.py Adds spaCy configuration settings
compose/production/django/Dockerfile Downloads spaCy model during build
compose/local/django/Dockerfile Downloads spaCy model during build
Files not reviewed (1)
  • javascript/package-lock.json: Language not supported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants