@jalajthanaki jalajthanaki commented Oct 23, 2025

Fix polyglot HTTP 403 error in wordsteam.py - Replace with NLTK stemmers

Summary

This PR fixes the critical issue where 3_1_wordsteam.py fails with HTTP Error 403: Forbidden when attempting to use the polyglot library. The polyglot data server (polyglot.cs.stonybrook.edu) is permanently down, making the library unusable.

Changes made:

  • Removed broken polyglot dependency and imports
  • Replaced polyglot_stem() function with alternative_stemmers() using NLTK's SnowballStemmer and LancasterStemmer
  • Updated all print statements from Python 2 to Python 3 syntax
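  • Fixed list bug: Added missing comma between "canonical" and "historical" in the words list (line 19). Without the comma, Python does not raise an error; it silently concatenates the two adjacent string literals into a single entry.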
  • Added comprehensive documentation (POLYGLOT_FIX_README.md and MIGRATION_GUIDE.md)
  • Added comparison table showing Porter, Snowball, and Lancaster stemmers side-by-side

Result: The script now runs successfully without errors and provides enhanced educational value by comparing multiple stemming algorithms.
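
The missing comma noted in the change list does not stop Python from parsing the file, which is why it is easy to overlook. A minimal illustration follows; only "canonical" and "historical" come from this PR, and the other list entries are placeholder words chosen for the example.

```python
# Adjacent string literals are joined at compile time, so a missing comma
# merges two list items into one instead of raising an error.
# Only "canonical" and "historical" are from the PR; the rest are placeholders.
words_buggy = ["unhappiness", "canonical" "historical", "cats"]
words_fixed = ["unhappiness", "canonical", "historical", "cats"]

print(len(words_buggy), words_buggy[1])  # 3 canonicalhistorical
print(len(words_fixed), words_fixed[1])  # 4 canonical
```

With the buggy list, a stemmer would receive the single token "canonicalhistorical" and neither intended word would ever be stemmed.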

Review & Testing Checklist for Human

  • Verify educational equivalence - Confirm that NLTK stemming provides equivalent learning value to polyglot's morpheme analysis for Chapter 3's objectives. Note: Stemming is simpler than morphological decomposition, so check if this aligns with what the book chapter intends to teach.
  • Check for other polyglot usage - Search the repository for other uses of polyglot (especially in other chapters) that might need similar fixes
  • Run the script - Execute python3 ch3/3_1_wordsteam.py to verify it works correctly and produces meaningful output
  • Review Python version alignment - Confirm that Python 3 syntax is appropriate for the book's target audience (book was published ~2016-2017, may have targeted Python 2.7)
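
For the "Check for other polyglot usage" item, a small standard-library scan is one way to list remaining imports. This is a sketch only: the function name `find_polyglot_imports` and the regular expression are assumptions made for illustration, and the scan catches only top-of-line `import polyglot` / `from polyglot...` statements, not dynamic imports or notebook cells.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Matches "import polyglot", "import polyglot.x", and "from polyglot... import ...".
POLYGLOT_IMPORT = re.compile(r"^\s*(?:import|from)\s+polyglot\b")


def find_polyglot_imports(root: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line number, stripped line) for each polyglot import under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if POLYGLOT_IMPORT.match(line):
                yield path, lineno, line.strip()
```

Running `for path, lineno, line in find_polyglot_imports("."): print(f"{path}:{lineno}: {line}")` from the repository root prints one line per match; an empty result suggests no other chapter imports polyglot directly.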

Test Plan:

# 1. Install dependencies
pip install nltk

# 2. Run the fixed script
cd ch3
python3 3_1_wordsteam.py

# Expected: Script completes successfully showing:
#   - Porter stemmer results for derivational/inflectional morphemes
#   - Snowball stemmer results
#   - Lancaster stemmer results  
#   - Comparison table of all three stemmers
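
The side-by-side comparison listed in the expected output could be produced along the following lines. This is a hedged sketch, not the code in 3_1_wordsteam.py: the helper names `nltk_stemmers` and `comparison_table` are hypothetical, and only the NLTK classes `PorterStemmer`, `SnowballStemmer`, and `LancasterStemmer` are taken from the PR description.

```python
from typing import Callable, Dict, Iterable


def nltk_stemmers() -> Dict[str, Callable[[str], str]]:
    """Build the three NLTK stemmers named in this PR (needs `pip install nltk`)."""
    # Imported lazily so the table helper below works without NLTK installed.
    from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

    return {
        "Porter": PorterStemmer().stem,
        "Snowball": SnowballStemmer("english").stem,
        "Lancaster": LancasterStemmer().stem,
    }


def comparison_table(words: Iterable[str],
                     stemmers: Dict[str, Callable[[str], str]]) -> str:
    """Return a plain-text table with one row per word and one column per stemmer."""
    names = list(stemmers)
    rows = [["Word"] + names]
    for word in words:
        rows.append([word] + [stemmers[name](word) for name in names])

    # Pad every column to the width of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

    def fmt(row):
        return "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    lines = [fmt(rows[0]), "  ".join("-" * w for w in widths)]
    lines.extend(fmt(row) for row in rows[1:])
    return "\n".join(lines)
```

Calling `print(comparison_table(["canonical", "historical"], nltk_stemmers()))` prints one row per word with the three stems next to each other. Passing the stemmers in as a mapping keeps the table code independent of NLTK, so another stemmer can be added as one more dictionary entry.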

Notes

Important semantic difference: This PR replaces polyglot's morpheme analysis with NLTK stemming. While related, stemming is a simpler operation than morphological decomposition. However, given that polyglot is permanently broken (server down since ~2018), this is the best available alternative that maintains the educational objectives of understanding word structure.

The documentation files provide migration guidance for users who have existing polyglot code and explain why this change was necessary.


Link to Devin run: https://app.devin.ai/sessions/5b232592eb644fcba8d24c926d47500a
Requested by: jalajthanaki@gmail.com (@jalajthanaki)

- Remove broken polyglot dependency (server permanently down)
- Replace with NLTK stemmers (SnowballStemmer, LancasterStemmer)
- Update Python 2 print statements to Python 3 syntax
- Add comprehensive documentation and migration guide
- Add comparison table showing all three stemming algorithms
- Tested and verified working solution

Fixes the HTTP 403 Forbidden error that occurs when polyglot
attempts to download morfessor models from the defunct server
at polyglot.cs.stonybrook.edu

Co-Authored-By: jalajthanaki@gmail.com <jalajthanaki@gmail.com>