@cnlangzi
Owner

@cnlangzi cnlangzi commented Jan 11, 2026

Summary by Sourcery

Update Ahrefs bot verification to use a custom IP parser fed from Ahrefs’ published crawler IP API instead of reverse DNS checks.

New Features:

  • Introduce an Ahrefs-specific IP parser to convert Ahrefs crawler IP API responses into IP prefixes for bot verification.

Enhancements:

  • Reconfigure the Ahrefs bot definition to reference the new parser and consume crawler IPs from the Ahrefs public API URL instead of RDNS and static domains.

- Add parser_ahrefs.go to parse Ahrefs IP list JSON format
- Update ahrefsbot.yaml to use official IP API (no RDNS)
- Ahrefs provides public crawler IPs at api.ahrefs.com
@sourcery-ai
Contributor

sourcery-ai bot commented Jan 11, 2026

Reviewer's Guide

Switches Ahrefs bot verification from reverse DNS/domain matching to an IP-allowlist fetched via a new JSON parser that reads crawler IPs from Ahrefs’ public API.

Sequence diagram for Ahrefs bot IP verification via parser

sequenceDiagram
    participant BotRequestHandler
    participant BotVerifier
    participant HTTPClient
    participant AhrefsParser

    BotRequestHandler->>BotVerifier: handleRequest(userAgent AhrefsBot, remoteIP)
    BotVerifier->>HTTPClient: GET https://api.ahrefs.com/v3/public/crawler-ips
    HTTPClient-->>BotVerifier: JSON body with ips.ip_address
    BotVerifier->>AhrefsParser: Parse(responseBodyReader)
    AhrefsParser-->>BotVerifier: []netip.Prefix
    BotVerifier->>BotVerifier: check remoteIP in allowed prefixes
    alt IP allowed
        BotVerifier-->>BotRequestHandler: allow request
    else IP not allowed
        BotVerifier-->>BotRequestHandler: block request
    end

Class diagram for AhrefsParser and parser registration

classDiagram
    class Parser {
        <<interface>>
        Name() string
        Parse(r io.Reader) ([]netip.Prefix, error)
    }

    class AhrefsParser {
        +Name() string
        +Parse(r io.Reader) ([]netip.Prefix, error)
    }

    class ParserRegistry {
        -parsers map[string]Parser
        +RegisterParser(name string, parser Parser)
        +GetParser(name string) Parser
    }

    Parser <|.. AhrefsParser
    ParserRegistry o--> Parser

    class AhrefsBotConfig {
        kind string
        name string
        ua string
        parser string
        urls []string
    }

    AhrefsBotConfig --> ParserRegistry : uses parser name ahrefs

    %% Specific implementation details for AhrefsParser
    class AhrefsParserInternal {
        +Parse(r io.Reader) ([]netip.Prefix, error)
        -decodeJSON(r io.Reader) data
        -convertToPrefixes(data) []netip.Prefix
    }

    AhrefsParser ..> AhrefsParserInternal : logic flow

    class AhrefsAPIResponse {
        ips []AhrefsIP
    }

    class AhrefsIP {
        ip_address string
    }

    AhrefsAPIResponse "1" --> "many" AhrefsIP

    class netipPrefix {
        addr string
        bits int
    }

    AhrefsParser --> netipPrefix : returns

    class Init {
        +init()
    }

    Init ..> ParserRegistry : RegisterParser(ahrefs, &AhrefsParser)
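
A minimal sketch of the registration pattern shown in the class diagram, assuming a package-level map guarded by a mutex. `stubParser` is a hypothetical stand-in for `AhrefsParser`, and `GetParser` returns an extra `bool` here, which differs from the single return value drawn in the diagram.

```go
package main

import (
	"fmt"
	"io"
	"net/netip"
	"sync"
)

// Parser mirrors the interface from the diagram.
type Parser interface {
	Name() string
	Parse(r io.Reader) ([]netip.Prefix, error)
}

var (
	mu      sync.RWMutex
	parsers = map[string]Parser{}
)

// RegisterParser stores a parser under the name used in bot YAML configs.
func RegisterParser(name string, p Parser) {
	mu.Lock()
	defer mu.Unlock()
	parsers[name] = p
}

// GetParser looks a parser up by name; ok is false if none is registered.
func GetParser(name string) (p Parser, ok bool) {
	mu.RLock()
	defer mu.RUnlock()
	p, ok = parsers[name]
	return p, ok
}

// stubParser is a placeholder so the example is self-contained.
type stubParser struct{}

func (stubParser) Name() string                              { return "ahrefs" }
func (stubParser) Parse(io.Reader) ([]netip.Prefix, error) { return nil, nil }

// init registers the parser before main runs, as in the diagram.
func init() { RegisterParser("ahrefs", stubParser{}) }

func main() {
	if p, ok := GetParser("ahrefs"); ok {
		fmt.Println("found parser:", p.Name())
	}
}
```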

File-Level Changes

Change: Change Ahrefs bot configuration to use a custom parser that pulls allowed IPs from Ahrefs’ crawler IP API instead of RDNS/domain checks.
Details:
  • Replace RDNS-based validation fields with a parser-based configuration reference
  • Configure the Ahrefs bot to pull IP data from Ahrefs’ public crawler IPs endpoint
Files: bots/conf.d/ahrefsbot.yaml

Change: Introduce an Ahrefs-specific parser that converts Ahrefs’ crawler IP JSON response into a list of IP prefixes for allowlisting.
Details:
  • Define AhrefsParser implementing the expected parser interface with Name and Parse methods
  • Decode the Ahrefs crawler IP JSON payload into a typed structure
  • Parse returned IP strings into netip.Addr values, infer prefix length (/32 for IPv4, /128 for IPv6), and accumulate as netip.Prefix list
  • Register the Ahrefs parser under the name "ahrefs" at init time so it can be referenced from configuration
Files: parser/parser_ahrefs.go


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • In Parse, invalid IPs are silently skipped; consider at least surfacing an error when all entries fail to parse so upstream callers can distinguish between an empty response and a parsing issue.
  • The logic currently assumes all entries are individual IP addresses and converts them to /32 or /128; if the Ahrefs API ever returns CIDR blocks, this would misinterpret them, so you may want to explicitly validate the format or handle CIDRs defensively.
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location> `parser/parser_ahrefs.go:27-29` </location>
<code_context>
+
+	var prefixes []netip.Prefix
+	for _, ip := range data.IPs {
+		addr, err := netip.ParseAddr(ip.IPAddress)
+		if err != nil {
+			continue
+		}
+		bits := 32
</code_context>

<issue_to_address>
**issue (bug_risk):** Silent skipping of invalid IPs can hide data issues or upstream changes.

Currently, any IP that fails `netip.ParseAddr` is dropped with no signal to the caller. If the upstream API changes format (e.g., starts returning CIDRs or hostnames), this could silently degrade to fewer or zero prefixes. Please consider surfacing parse failures (e.g., returning an error with the offending entry) or at least treating the case where *all* entries fail to parse as an error so callers can distinguish “no IPs” from “parse failure.”
</issue_to_address>



@codecov

codecov bot commented Jan 11, 2026

Codecov Report

❌ Patch coverage is 9.52381% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.76%. Comparing base (02d1225) to head (60ce193).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| parser/parser_ahrefs.go | 9.52% | 19 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
- Coverage   74.84%   72.76%   -2.08%     
==========================================
  Files          14       15       +1     
  Lines         640      661      +21     
==========================================
+ Hits          479      481       +2     
- Misses        117      136      +19     
  Partials       44       44              


@github-actions

Benchmark Results

BenchmarkFindBotByUA_Hit_First             	  912418	      1567 ns/op	       9 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_First-4           	 2191832	       586.2 ns/op	      15 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_First-8           	 3804255	       619.7 ns/op	      13 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Middle            	 1333860	      1011 ns/op	       9 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Middle-4          	 2952556	       380.1 ns/op	       3 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Middle-8          	 4761373	       284.9 ns/op	       4 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Last              	 1376060	       760.3 ns/op	       6 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Last-4            	 2529663	       460.9 ns/op	       9 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Last-8            	 4688588	       476.3 ns/op	      11 B/op	       0 allocs/op
BenchmarkFindBotByUA_Miss                  	  482388	      2466 ns/op	      48 B/op	       0 allocs/op
BenchmarkFindBotByUA_Miss-4                	 1362046	       876.9 ns/op	      22 B/op	       0 allocs/op
BenchmarkFindBotByUA_Miss-8                	 1419037	       845.8 ns/op	      17 B/op	       0 allocs/op
BenchmarkFindBotByUA_CaseSensitive         	 1699888	      1047 ns/op	      16 B/op	       0 allocs/op
BenchmarkFindBotByUA_CaseSensitive-4       	 4022510	       297.6 ns/op	       4 B/op	       0 allocs/op
BenchmarkFindBotByUA_CaseSensitive-8       	--- FAIL: BenchmarkFindBotByUA_CaseSensitive-8
BenchmarkValidate_KnownBot_IPHit           	 1509622	       964.1 ns/op	      20 B/op	       0 allocs/op
BenchmarkValidate_KnownBot_IPHit-4         	 2995089	       365.9 ns/op	       8 B/op	       0 allocs/op
BenchmarkValidate_KnownBot_IPHit-8         	 2862322	       350.1 ns/op	       2 B/op	       0 allocs/op
BenchmarkValidate_Browser                  	  237259	      5356 ns/op	      96 B/op	       0 allocs/op
BenchmarkValidate_Browser-4                	  608026	      1995 ns/op	      53 B/op	       0 allocs/op
BenchmarkValidate_Browser-8                	  622018	      2010 ns/op	      63 B/op	       0 allocs/op
BenchmarkContainsWord                      	73996806	        16.61 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsWord-4                    	73859707	        16.34 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsWord-8                    	74000661	        16.17 ns/op	       0 B/op	       0 allocs/op
BenchmarkValidate_WithBotUA                	 1000000	      1184 ns/op	      12 B/op	       0 allocs/op
BenchmarkValidate_WithBotUA-4              	 2861529	       445.7 ns/op	      11 B/op	       0 allocs/op
BenchmarkValidate_WithBotUA-8              	 2643841	       435.5 ns/op	       7 B/op	       0 allocs/op
BenchmarkValidate_WithBotUA_IPMismatch     	  792289	      1562 ns/op	      28 B/op	       0 allocs/op
BenchmarkValidate_WithBotUA_IPMismatch-4   	 2163819	       572.7 ns/op	      15 B/op	       0 allocs/op
BenchmarkValidate_WithBotUA_IPMismatch-8   	 2054359	       560.7 ns/op	       8 B/op	       0 allocs/op
BenchmarkValidate_BrowserUA                	  293779	      4308 ns/op	      91 B/op	       0 allocs/op
BenchmarkValidate_BrowserUA-4              	  829911	      1475 ns/op	      38 B/op	       0 allocs/op
BenchmarkValidate_BrowserUA-8              	  823964	      1484 ns/op	      31 B/op	       0 allocs/op
BenchmarkValidate_UnknownBotUA             	 8207937	       163.2 ns/op	       3 B/op	       0 allocs/op
BenchmarkValidate_UnknownBotUA-4           	22592143	        57.86 ns/op	       1 B/op	       0 allocs/op
BenchmarkValidate_UnknownBotUA-8           	23103595	        55.37 ns/op	       1 B/op	       0 allocs/op
BenchmarkContainsIP                        	56553549	        88.38 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsIP-4                      	100000000	        11.23 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsIP-8                      	76045347	       334.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkFindBotByUA                       	  800192	      1543 ns/op	      15 B/op	       0 allocs/op
BenchmarkFindBotByUA-4                     	 2050100	       573.1 ns/op	      12 B/op	       0 allocs/op
BenchmarkFindBotByUA-8                     	 2120923	       579.4 ns/op	       8 B/op	       0 allocs/op
BenchmarkClassifyUA                        	 2141329	       550.1 ns/op	       5 B/op	       0 allocs/op
BenchmarkClassifyUA-4                      	 4889764	       247.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkClassifyUA-8                      	 4862206	       247.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_MixedTraffic                     	  471880	      2559 ns/op	      19 B/op	       0 allocs/op
Benchmark_MixedTraffic-4                   	 1342503	       941.7 ns/op	      22 B/op	       0 allocs/op
Benchmark_MixedTraffic-8                   	 1256643	       964.4 ns/op	       9 B/op	       0 allocs/op
BenchmarkReload                            	     835	   1440394 ns/op	  680394 B/op	    6629 allocs/op
BenchmarkReload-4                          	     978	   1238784 ns/op	  670374 B/op	    6487 allocs/op
BenchmarkReload-8                          	     981	   1262788 ns/op	  678778 B/op	    6589 allocs/op
PASS
ok  	github.com/cnlangzi/knownbots	114.722s

@cnlangzi cnlangzi merged commit 67dff45 into main Jan 11, 2026
3 of 5 checks passed
@cnlangzi cnlangzi deleted the fix/ahrefs branch January 11, 2026 09:03