fix(conf): verify ahrefs bot by ip instead of rdns #10
- Add parser_ahrefs.go to parse Ahrefs IP list JSON format
- Update ahrefsbot.yaml to use official IP API (no RDNS)
- Ahrefs provides public crawler IPs at api.ahrefs.com
Reviewer's Guide
Switches Ahrefs bot verification from reverse DNS/domain matching to an IP allowlist fetched via a new JSON parser that reads crawler IPs from Ahrefs’ public API.
Sequence diagram for Ahrefs bot IP verification via parser
sequenceDiagram
participant BotRequestHandler
participant BotVerifier
participant HTTPClient
participant AhrefsParser
BotRequestHandler->>BotVerifier: handleRequest(userAgent AhrefsBot, remoteIP)
BotVerifier->>HTTPClient: GET https://api.ahrefs.com/v3/public/crawler-ips
HTTPClient-->>BotVerifier: JSON body with ips.ip_address
BotVerifier->>AhrefsParser: Parse(responseBodyReader)
AhrefsParser-->>BotVerifier: []netip.Prefix
BotVerifier->>BotVerifier: check remoteIP in allowed prefixes
alt IP allowed
BotVerifier-->>BotRequestHandler: allow request
else IP not allowed
BotVerifier-->>BotRequestHandler: block request
end
Class diagram for AhrefsParser and parser registration
classDiagram
class Parser {
<<interface>>
Name() string
Parse(r io.Reader) ([]netip.Prefix, error)
}
class AhrefsParser {
+Name() string
+Parse(r io.Reader) ([]netip.Prefix, error)
}
class ParserRegistry {
-parsers map[string]Parser
+RegisterParser(name string, parser Parser)
+GetParser(name string) Parser
}
Parser <|.. AhrefsParser
ParserRegistry o--> Parser
class AhrefsBotConfig {
kind string
name string
ua string
parser string
urls []string
}
AhrefsBotConfig --> ParserRegistry : uses parser name ahrefs
%% Specific implementation details for AhrefsParser
class AhrefsParserInternal {
+Parse(r io.Reader) ([]netip.Prefix, error)
-decodeJSON(r io.Reader) data
-convertToPrefixes(data) []netip.Prefix
}
AhrefsParser ..> AhrefsParserInternal : logic flow
class AhrefsAPIResponse {
ips []AhrefsIP
}
class AhrefsIP {
ip_address string
}
AhrefsAPIResponse "1" --> "many" AhrefsIP
class netipPrefix {
addr string
bits int
}
AhrefsParser --> netipPrefix : returns
class Init {
+init()
}
Init ..> ParserRegistry : RegisterParser(ahrefs, &AhrefsParser)
Hey - I've found 1 issue, and left some high level feedback:
- In Parse, invalid IPs are silently skipped; consider at least surfacing an error when all entries fail to parse so upstream callers can distinguish between an empty response and a parsing issue.
- The logic currently assumes all entries are individual IP addresses and converts them to /32 or /128; if the Ahrefs API ever returns CIDR blocks, this would misinterpret them, so you may want to explicitly validate the format or handle CIDRs defensively.
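One way to handle CIDRs defensively, as the second point suggests, is to accept either a bare address or a CIDR block per entry. The helper below is a sketch under that assumption; `parseEntry` is a hypothetical name and is not part of the PR.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseEntry accepts either a single IP address or a CIDR block.
// Hypothetical helper: the reviewed code only calls netip.ParseAddr.
func parseEntry(s string) (netip.Prefix, error) {
	s = strings.TrimSpace(s)
	if strings.Contains(s, "/") {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			return netip.Prefix{}, fmt.Errorf("invalid CIDR %q: %w", s, err)
		}
		// Masked zeroes the host bits so the prefix is in canonical form.
		return p.Masked(), nil
	}
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return netip.Prefix{}, fmt.Errorf("invalid IP %q: %w", s, err)
	}
	// A bare address becomes a single-host prefix (/32 or /128).
	return netip.PrefixFrom(addr, addr.BitLen()), nil
}

func main() {
	// Placeholder inputs from the documentation ranges.
	for _, s := range []string{"192.0.2.10", "198.51.100.7/24", "2001:db8::1", "crawler.example.com"} {
		p, err := parseEntry(s)
		fmt.Println(p, err)
	}
}
```

Returning an error per entry leaves the decision about skipping or failing to the caller.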
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `Parse`, invalid IPs are silently skipped; consider at least surfacing an error when all entries fail to parse so upstream callers can distinguish between an empty response and a parsing issue.
- The logic currently assumes all entries are individual IP addresses and converts them to /32 or /128; if the Ahrefs API ever returns CIDR blocks, this would misinterpret them, so you may want to explicitly validate the format or handle CIDRs defensively.
## Individual Comments
### Comment 1
<location> `parser/parser_ahrefs.go:27-29` </location>
<code_context>
+
+ var prefixes []netip.Prefix
+ for _, ip := range data.IPs {
+ addr, err := netip.ParseAddr(ip.IPAddress)
+ if err != nil {
+ continue
+ }
+ bits := 32
</code_context>
<issue_to_address>
**issue (bug_risk):** Silent skipping of invalid IPs can hide data issues or upstream changes.
Currently, any IP that fails `netip.ParseAddr` is dropped with no signal to the caller. If the upstream API changes format (e.g., starts returning CIDRs or hostnames), this could silently degrade to fewer or zero prefixes. Please consider surfacing parse failures (e.g., returning an error with the offending entry) or at least treating the case where *all* entries fail to parse as an error so callers can distinguish “no IPs” from “parse failure.”
</issue_to_address>
issue (bug_risk): Silent skipping of invalid IPs can hide data issues or upstream changes.
Currently, any IP that fails netip.ParseAddr is dropped with no signal to the caller. If the upstream API changes format (e.g., starts returning CIDRs or hostnames), this could silently degrade to fewer or zero prefixes. Please consider surfacing parse failures (e.g., returning an error with the offending entry) or at least treating the case where all entries fail to parse as an error so callers can distinguish “no IPs” from “parse failure.”
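A sketch of the "all entries failed" policy described in this comment follows. It assumes the entries have already been extracted from the JSON body as strings; `toPrefixes` is a hypothetical name, and keeping only the first failure is one of several reasonable choices.

```go
package main

import (
	"fmt"
	"net/netip"
)

// toPrefixes converts address strings to single-host prefixes.
// Hypothetical helper: it still skips individual bad entries, but returns an
// error when a non-empty input yields no valid prefix at all.
func toPrefixes(entries []string) ([]netip.Prefix, error) {
	var prefixes []netip.Prefix
	var firstErr error
	for _, e := range entries {
		addr, err := netip.ParseAddr(e)
		if err != nil {
			if firstErr == nil {
				// Remember the first offending entry for the error message.
				firstErr = fmt.Errorf("invalid entry %q: %w", e, err)
			}
			continue
		}
		prefixes = append(prefixes, netip.PrefixFrom(addr, addr.BitLen()))
	}
	if len(entries) > 0 && len(prefixes) == 0 {
		return nil, fmt.Errorf("all %d entries failed to parse: %w", len(entries), firstErr)
	}
	return prefixes, nil
}

func main() {
	fmt.Println(toPrefixes(nil))                                // empty input: no prefixes, no error
	fmt.Println(toPrefixes([]string{"192.0.2.10", "bad"}))      // partial failure: valid entries kept
	fmt.Println(toPrefixes([]string{"198.51.100.0/24", "bad"})) // total failure: error returned
}
```

With this shape, an empty response returns no prefixes and no error, while a format change upstream (for example, entries arriving as CIDRs) surfaces as an error instead of an empty allowlist.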
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
- Coverage   74.84%   72.76%   -2.08%
==========================================
  Files          14       15       +1
  Lines         640      661      +21
==========================================
+ Hits          479      481       +2
- Misses        117      136      +19
  Partials       44       44
Summary by Sourcery
Update Ahrefs bot verification to use a custom IP parser fed from Ahrefs’ published crawler IP API instead of reverse DNS checks.