@jalajthanaki

Fix polyglot morpheme error by replacing with NLTK SnowballStemmer

Summary

This PR addresses the HTTP 403 error that occurs when running 3_1_wordsteam.py due to the polyglot library's data server (http://polyglot.cs.stonybrook.edu/~polyglot) being unavailable. The fix replaces polyglot's morphological analysis with NLTK's SnowballStemmer and updates the code to Python 3 syntax.

Changes made:

  • Replaced polyglot Word objects and .morphemes with NLTK's SnowballStemmer
  • Updated all print statements from Python 2 to Python 3 syntax
  • Updated installation documentation to note that polyglot is no longer used by this script
  • Tested that the script runs successfully and produces output

⚠️ Important: This changes the functionality from morphological segmentation to stemming. The output semantics are different - the original showed morpheme breakdowns, the new version shows stems only.
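
For reviewers unfamiliar with the replacement, here is a minimal sketch of the kind of call this PR switches to. It is not the exact code in 3_1_wordsteam.py: the word list and the `stem_words` helper are hypothetical and chosen only for illustration. `SnowballStemmer("english")` needs no external data download, which is why it avoids the 403 error.

```python
from nltk.stem.snowball import SnowballStemmer

# Hypothetical sample words; the actual word list in 3_1_wordsteam.py may differ.
words = ["running", "quickly", "unexpected", "disagreement"]

# The English Snowball stemmer is rule-based and ships with NLTK itself.
stemmer = SnowballStemmer("english")


def stem_words(tokens):
    """Return (word, stem) pairs for each token (illustrative helper)."""
    return [(token, stemmer.stem(token)) for token in tokens]


for word, stem in stem_words(words):
    # Each word yields a single stem, not a list of morphemes.
    print(f"{word} -> {stem}")
```

Each word maps to exactly one stem string, whereas polyglot's `.morphemes` returned a list of segments per word. That difference is what the pedagogical review below needs to weigh.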

Review & Testing Checklist for Human

  • Critical: Verify output is pedagogically acceptable - The original code showed morpheme segmentation (e.g., breaking words into meaningful units), but the replacement only shows stemming (root form). Review if this maintains the educational value for Chapter 3.
  • Test the script end-to-end - Run python ch3/3_1_wordsteam.py and verify the output matches the book's expectations
  • Check for other polyglot usage - Search the repository for other files using polyglot that may have the same issue
  • Verify Python version consistency - This changes the file to Python 3. Check if other book examples are Python 2 or 3
  • Review if book text needs updates - The chapter text/examples may reference morpheme output that will now look different
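
One possible way to carry out the "other polyglot usage" check is a short script run from the repository root. This is a sketch, not part of the PR. The helper name `find_polyglot_usage` is hypothetical, and it only catches direct `import polyglot...` / `from polyglot... import ...` statements in `.py` files, so notebooks and dynamic imports would need a separate look.

```python
import re
from pathlib import Path

# Matches "import polyglot", "import polyglot.text", "from polyglot.text import Word", etc.
POLYGLOT_IMPORT = re.compile(r"^\s*(?:import|from)\s+polyglot\b")


def find_polyglot_usage(repo_root):
    """Return (path, line_number, line) for each polyglot import under repo_root."""
    hits = []
    for path in sorted(Path(repo_root).rglob("*.py")):
        # Older book examples may not be valid UTF-8; replace undecodable bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if POLYGLOT_IMPORT.match(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    for path, number, line in find_polyglot_usage("."):
        print(f"{path}:{number}: {line}")
```

Any file the script reports is likely to hit the same HTTP 403 when it tries to download polyglot data.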

Test Plan

cd ch3
python 3_1_wordsteam.py
# Verify output shows appropriate stems and no errors

Notes

  • The polyglot library itself still works, but its data server is permanently unavailable, making morpheme analysis impossible
  • SnowballStemmer provides similar but not identical functionality - it's stemming, not morphological segmentation
  • This is a functional change, not just a bug fix, so careful review is needed

Link to Devin run: https://app.devin.ai/sessions/83802e4ce3e64ba38e388398c9c423f5
Requested by: jalajthanaki@gmail.com (@jalajthanaki)

- Replaced polyglot library with NLTK's SnowballStemmer due to unavailable data server
- Updated all print statements to Python 3 syntax
- Updated installation documentation to reflect the change
- Fixes HTTP Error 403: Forbidden when accessing polyglot.cs.stonybrook.edu

The polyglot library's data server is no longer accessible, causing the script to fail.
This fix uses NLTK's SnowballStemmer which provides similar stemming functionality
without requiring external data downloads.

Co-Authored-By: jalajthanaki@gmail.com <jalajthanaki@gmail.com>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.
