Hello team,
Has anything changed recently to block specific HTTP User-Agents?
I have a Python script that calls the search API. It worked for a long time, but recently the API started responding with HTTP 403. I am not sure exactly when the problem started because I am not actively monitoring it.
I went to the documentation to double-check whether anything had changed, and to my surprise the curl request with the same semantics worked.
So I added the line request.add_header("User-Agent", "curl/7.68.0") to my Python script and it started working.
Minimal example to reproduce it:
import urllib.request
import urllib.parse

API_KEY = "sk_xxx"
DOMAIN = "google.com"

url = f"https://api.logo.dev/search?q={urllib.parse.quote(DOMAIN)}"

# Fails with 403
request = urllib.request.Request(url)
request.add_header("Authorization", f"Bearer {API_KEY}")
try:
    with urllib.request.urlopen(request) as response:
        print("Without User-Agent: OK")
except Exception as e:
    print(f"Without User-Agent: {e}")

# Works
request = urllib.request.Request(url)
request.add_header("Authorization", f"Bearer {API_KEY}")
request.add_header("User-Agent", "curl/7.68.0")
try:
    with urllib.request.urlopen(request) as response:
        print("With User-Agent: OK")
except Exception as e:
    print(f"With User-Agent: {e}")
Result of the script:
Without User-Agent: HTTP Error 403: Forbidden
With User-Agent: OK
Maybe I am doing something very wrong here and have a poor understanding of the ecosystem, but in that case I would like to know how to fix the Python script properly - pretending to be curl feels like a hack.
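In the meantime I switched to sending a descriptive User-Agent that identifies my script, instead of impersonating curl. This is only a guess that the block targets the default Python-urllib UA rather than allowlisting curl specifically (the product name and contact address below are made up):

```python
import urllib.request
import urllib.parse

API_KEY = "sk_xxx"  # placeholder key
DOMAIN = "google.com"

url = f"https://api.logo.dev/search?q={urllib.parse.quote(DOMAIN)}"

# Pass both headers at construction time; "my-logo-fetcher" is a
# made-up product token identifying this script, not a real product.
request = urllib.request.Request(
    url,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "User-Agent": "my-logo-fetcher/1.0 (ops@example.com)",
    },
)

# urllib stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```

Would that be an acceptable long-term approach, or is there an official recommendation for what clients should send?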