@jalajthanaki
Fix wordsteam.py: Python 3 compatibility and polyglot HTTP 403 errors

Summary

This PR addresses two critical issues preventing Chapter 3's wordsteam.py script from running:

  1. Python 2 to 3 syntax incompatibility: Converted all Python 2 print statements to Python 3 print() function calls
  2. Polyglot library HTTP 403 errors: The polyglot library fails when attempting to download morpheme models from deprecated servers
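The print change in item 1 follows the usual Python 2 to 3 pattern. A minimal sketch of that pattern is shown here; the helper name and sample words are illustrative and are not taken from 3_1_wordsteam.py:

```python
# Python 2 statement form (a SyntaxError under Python 3):
#     print "running ->", stem
# Python 3 function form, as used below.


def format_stem_line(word, stem):
    """Return one output line. Hypothetical helper, for illustration only."""
    return "{} -> {}".format(word, stem)


# print is a function in Python 3, so its arguments go in parentheses.
print(format_stem_line("running", "run"))
```

Keeping the formatting in a small function makes the output easy to check without capturing stdout.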

Three solutions provided:

  • Fixed original script (3_1_wordsteam.py): Python 3 compatible, but polyglot may still fail
  • Polyglot patcher (polyglot_downloader_patch.py): Adds User-Agent headers to help bypass server restrictions (may not fully resolve the issue)
  • NLTK-only alternative (3_1_wordsteam_nltk_only.py): Recommended - Works reliably without polyglot dependencies, demonstrates Porter, Lancaster, and Snowball stemmers
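To illustrate the idea behind the User-Agent workaround: some servers reject requests carrying Python's default urllib User-Agent, and sending a browser-like header can sometimes avoid a 403. The sketch below uses only the standard library and does not reproduce what polyglot_downloader_patch.py actually modifies; the URL, the header value, and the function names are assumptions made for illustration.

```python
import urllib.request

# Hypothetical values for illustration; not the real polyglot model location.
MODEL_URL = "https://example.com/models/morph2.en.tar.bz2"
USER_AGENT = "Mozilla/5.0 (compatible; polyglot-downloader)"


def build_request(url, user_agent=USER_AGENT):
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Download the resource; may still raise HTTPError if the server refuses."""
    with urllib.request.urlopen(build_request(url), timeout=30) as response:
        return response.read()


# Building the request needs no network access, so it is safe to inspect here.
# urllib stores header names capitalized as "User-agent".
request = build_request(MODEL_URL)
print(request.get_header("User-agent"))
```

As the bullet above notes, a header change cannot help if the server no longer hosts the files at all, so fetch() should be expected to fail in that case.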

Review & Testing Checklist for Human

  • Verify Python 3 syntax changes in 3_1_wordsteam.py are complete (all print statements updated correctly)
  • Test the NLTK-only script (3_1_wordsteam_nltk_only.py) - This is the recommended solution and should work out of the box with just pip install nltk
  • Review documentation clarity in FIXES_README.md - Is it clear which solution users should choose? Does it adequately explain the polyglot issues?
  • Verify the patch script (polyglot_downloader_patch.py) - It modifies an external library file. Consider whether this approach is appropriate or if we should just recommend the NLTK alternative
  • Decide on polyglot support: Given that polyglot is deprecated (last updated 2016) and its servers are unreliable, should we even try to maintain support or just recommend the NLTK alternative?

Test Plan

# Test NLTK-only solution (should work reliably)
pip install nltk
python3 ch3/3_1_wordsteam_nltk_only.py

# Optional: Try the patched polyglot approach
# Note: May still fail due to server issues
sudo apt-get install libicu-dev pkg-config
pip install polyglot PyICU pycld2 morfessor numpy six
python3 ch3/polyglot_downloader_patch.py
python3 ch3/3_1_wordsteam.py

Notes

Known limitations:

  • Even with the patch, polyglot morpheme functionality may still fail because the project's infrastructure is deprecated and unreliable
  • The patch script modifies the installed polyglot library files - users will lose the patch if they reinstall polyglot
  • The polyglot library hasn't been updated since 2016 and has various Python 3 compatibility issues

Why the NLTK-only solution is recommended:

  • No external server dependencies
  • Well-maintained library
  • Demonstrates multiple stemming algorithms (Porter, Lancaster, Snowball)
  • Arguably provides more educational value than the original, since it compares three stemmers side by side
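For readers who want a feel for the NLTK-only approach, here is a minimal sketch that runs the three stemmers on the same words. It is not the contents of 3_1_wordsteam_nltk_only.py; the sample words and the stem_with_all helper are assumptions for illustration. It needs only pip install nltk, since these stemmers do not require downloading NLTK corpora.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# One instance of each stemmer named in this PR.
STEMMERS = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    "snowball": SnowballStemmer("english"),
}


def stem_with_all(word):
    """Return a dict mapping stemmer name to its stem for the given word."""
    return {name: stemmer.stem(word) for name, stemmer in STEMMERS.items()}


# Sample words chosen for illustration only.
for word in ["running", "generously", "maximum"]:
    print(word, stem_with_all(word))
```

Printing the results side by side makes it easy to see where the algorithms disagree, with Lancaster typically the most aggressive of the three.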

Session Info:

- Fix Python 2 to 3 syntax: Update all print statements to use parentheses
- Add polyglot_downloader_patch.py: Patch script that attempts to work around HTTP 403 errors
- Add 3_1_wordsteam_nltk_only.py: NLTK-only alternative that works reliably
- Add comprehensive FIXES_README.md: Detailed documentation of issues and solutions
- Add Chapter_3_Installation_Commands_UPDATED.txt: Updated installation guide

The original script had two main issues:
1. Python 2 print syntax incompatible with Python 3
2. Polyglot library HTTP 403 errors when downloading morpheme models

Solutions provided:
- Fixed original script with Python 3 syntax
- Created patch for polyglot downloader (adds User-Agent headers)
- Created NLTK-only alternative (recommended, no external server dependencies)
- Comprehensive documentation for troubleshooting

The NLTK-only version is recommended as polyglot is deprecated (last updated 2016)
and has unreliable server infrastructure.

Co-Authored-By: jalajthanaki@gmail.com <jalajthanaki@gmail.com>