lpi-tn (Collaborator) commented Dec 17, 2025

This pull request improves the handling of Wikipedia page titles that contain special characters. For example, the page https://fr.wikipedia.org/wiki/Bossons%2FBl%C3%A9cherette has a '/' in its title, and with the default behavior the API request for that page crashes.
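To see why the '/' matters, the sketch below decodes the title from the example URL and then builds a request URL by plain string concatenation. API_BASE is a hypothetical endpoint chosen for illustration; the PR does not say which API route is_redirection calls.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical endpoint, for illustration only (not taken from the PR).
API_BASE = "https://fr.wikipedia.org/api/rest_v1/page/summary/"

page_url = "https://fr.wikipedia.org/wiki/Bossons%2FBl%C3%A9cherette"

# Decode the part after /wiki/ to recover the human-readable title.
title = unquote(page_url.rsplit("/wiki/", 1)[1])
print(title)  # Bossons/Blécherette

# Naive concatenation: the '/' in the title turns into a path separator,
# so the server sees two path segments instead of one title.
naive_url = API_BASE + title
segments = urlsplit(naive_url).path.split("/")
print(segments[-2:])  # ['Bossons', 'Blécherette']
```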

Bug Fixes & Improvements:

  • The is_redirection function now uses quote_plus to properly encode Wikipedia page titles, ensuring correct URL formation for titles with special characters. [1] [2]
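A minimal sketch of the encoding difference, assuming the previous code relied on urllib.parse.quote, whose default safe='/' leaves slashes untouched. build_redirection_url and API_BASE are hypothetical names for illustration, not the actual is_redirection code.

```python
from urllib.parse import quote, quote_plus

# Hypothetical endpoint, for illustration only.
API_BASE = "https://fr.wikipedia.org/api/rest_v1/page/summary/"

def build_redirection_url(title: str) -> str:
    """Hypothetical helper: encode the whole title as a single path segment."""
    # quote_plus defaults to safe='', so '/' becomes %2F;
    # it also turns spaces into '+'.
    return API_BASE + quote_plus(title)

title = "Bossons/Blécherette"
print(quote(title))       # Bossons/Bl%C3%A9cherette  ('/' kept as-is)
print(quote_plus(title))  # Bossons%2FBl%C3%A9cherette
print(build_redirection_url(title))
# https://fr.wikipedia.org/api/rest_v1/page/summary/Bossons%2FBl%C3%A9cherette
```

One design note: because quote_plus encodes spaces as '+', quote(title, safe="") is the path-oriented alternative that also encodes '/' but produces %20 for spaces; which one fits depends on how the target endpoint decodes its path.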

Testing Enhancements:

  • Added a new unit test (test_is_redirection_true_with_weird_text) to verify that redirections (HTTP 307) are correctly detected, especially for page titles containing special characters.
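A test of this kind might look roughly like the sketch below. The is_redirection signature here (a title plus an injectable HTTP getter returning an object with status_code) is an assumption made so the example runs without network access; the real function and the test in tests/wikipedia_updater/test_wikipedia_updater.py may be structured differently.

```python
import unittest
from unittest.mock import MagicMock
from urllib.parse import quote_plus

# Hypothetical endpoint and signature, for illustration only.
API_BASE = "https://fr.wikipedia.org/api/rest_v1/page/summary/"

def is_redirection(title, http_get):
    """Hypothetical stand-in: True when the API answers with HTTP 307."""
    # allow_redirects mirrors the keyword accepted by requests.get.
    response = http_get(API_BASE + quote_plus(title), allow_redirects=False)
    return response.status_code == 307

class TestIsRedirection(unittest.TestCase):
    def test_is_redirection_true_with_weird_text(self):
        # Fake HTTP client that always answers 307 Temporary Redirect.
        http_get = MagicMock(return_value=MagicMock(status_code=307))
        self.assertTrue(is_redirection("Bossons/Blécherette", http_get))
        # The '/' must arrive percent-encoded, inside a single path segment.
        http_get.assert_called_once_with(
            API_BASE + "Bossons%2FBl%C3%A9cherette", allow_redirects=False
        )
```

Run it with python -m unittest or pytest; mocking the HTTP call keeps the test deterministic and lets it assert on the exact URL that was built.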

Error Message Update:

  • Updated the error message for redirection detection to clarify that the Wikipedia updater, not the update process, determines if a document is a redirection.


Copilot AI left a comment


Pull request overview

This PR fixes a bug in Wikipedia page redirection handling where titles containing special characters (such as slashes) produced malformed request URLs. The fix encodes titles with quote_plus, which by default percent-encodes '/' (as %2F) along with other reserved and non-ASCII characters.

  • Replaced the URL encoding method to properly handle special characters in Wikipedia page titles
  • Added a unit test covering redirection detection for titles with special characters
  • Corrected error message terminology for clarity

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File: Description
  • welearn_datastack/modules/wikipedia_updater.py: Added the quote_plus import and applied it to encode page titles before URL construction
  • welearn_datastack/nodes_workflow/WikipediaUpdater/wikipedia_updater.py: Fixed the error message to use "updater" instead of "update" for grammatical correctness
  • tests/wikipedia_updater/test_wikipedia_updater.py: Added a unit test verifying correct URL encoding for titles with special characters


lpi-tn requested a review from jmsevin on December 17, 2025 10:55
lpi-tn merged commit 6d8db3c into main on Dec 18, 2025
7 checks passed
lpi-tn deleted the Fix/wikipedia-updater branch on December 18, 2025 12:57
lpi-tn restored the Fix/wikipedia-updater branch on January 13, 2026 16:10