
Commit e40969d

mircealungu and claude committed
Fix: Add User-Agent to redirect checker
Some sites (e.g. atlasmag.dk) block requests that carry no User-Agent header, causing "Could not get url after redirects" errors in the crawler. Also added a timeout to prevent hanging.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 1910d39 commit e40969d
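
The failure mode described in the commit message is easy to illustrate. The snippet below is a hypothetical sketch, not code from the repository: the URL is a placeholder, and the assumption (taken from the commit message) is that a site like atlasmag.dk only serves the request when a browser-like User-Agent is present.

import requests

# Hypothetical illustration of the failure mode; the URL is a placeholder.
URL = "https://example.com/article"

# Without a User-Agent header, some servers (e.g. atlasmag.dk, per the
# commit message) block the request or return an error page.
bare = requests.get(URL, timeout=10)

# With a browser-like User-Agent, the same request is typically served.
headers = {"User-Agent": "Mozilla/5.0 (compatible; ZeeguuBot/1.0)"}
polite = requests.get(URL, headers=headers, timeout=10)

print(bare.status_code, polite.status_code)
print(polite.url)  # the final URL after any redirects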

File tree

1 file changed, +3 -1 lines changed


zeeguu/core/content_retriever/article_downloader.py

Lines changed: 3 additions & 1 deletion
@@ -174,7 +174,9 @@ def is_duplicate_by_simhash(content, feed, session):
 
 def _url_after_redirects(url):
     # solve redirects and save the clean url
-    response = requests.get(url)
+    # Some sites block requests without User-Agent (e.g. atlasmag.dk)
+    headers = {"User-Agent": "Mozilla/5.0 (compatible; ZeeguuBot/1.0)"}
+    response = requests.get(url, headers=headers, timeout=10)
     return response.url
 
 
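For reference, here is the patched helper as a self-contained sketch. It assumes only that `requests` is imported at the top of article_downloader.py; the function body matches the diff above.

import requests

def _url_after_redirects(url):
    # solve redirects and save the clean url
    # Some sites block requests without User-Agent (e.g. atlasmag.dk)
    headers = {"User-Agent": "Mozilla/5.0 (compatible; ZeeguuBot/1.0)"}
    # requests follows redirects by default for GET; timeout=10 makes a
    # slow or unresponsive server raise requests.exceptions.Timeout
    # instead of hanging the crawler indefinitely.
    response = requests.get(url, headers=headers, timeout=10)
    return response.url

One design consequence worth noting: the new timeout turns a hang into a raised requests.exceptions.Timeout, so callers in the crawler presumably catch and log it, consistent with the existing "Could not get url after redirects" error message mentioned in the commit.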
