Conversation


@cnlangzi cnlangzi commented Jan 12, 2026

Summary by Sourcery

Add ASN-based bot verification with per-bot ASN configuration and caching, integrate it into the validator’s IP verification pipeline, and refactor bot IP/RDNS handling for better encapsulation and performance.

New Features:

  • Support ASN-based verification for bots via new ASN configuration in bot YAML and an ASN cache integrated into IP verification.
  • Introduce an IPTree structure and ASN cache to accelerate IP and ASN prefix lookups for bot validation.
  • Add ASN data fetching and caching infrastructure with multiple upstream fetchers (RIPE, RouteViews, BGPHE) and per-bot ASN JSON caches.

Bug Fixes:

  • Ensure validator and ASN-based verification safely handle invalid or malformed IPs without panics.

Enhancements:

  • Refactor bot IP range, ASN, and RDNS lifecycle management into Bot methods to centralize initialization, refresh, and persistence logic.
  • Replace the generic RDNS cache with a dedicated RDNS type and clarify bot kind constants and structures for clearer responsibilities.
  • Update performance documentation and architecture docs to describe ASN verification, new verification order, and rationale for bot method encapsulation.

Build:

  • Introduce a Makefile to standardize test, integration test, and benchmark execution.
  • Add new module dependencies for Patricia trie-based prefix storage and supporting libraries.

CI:

  • Adjust GitHub Actions workflows to run tests and benchmarks via Makefile targets and add a separate integration test job.

Documentation:

  • Extend README, AGENTS, and API docs to document ASN-based verification, configuration options, performance characteristics, and updated operational workflow.

Tests:

  • Add unit and benchmark tests for ASN cache behavior, IPTree-backed IP matching, and bot ASN refresh semantics.
  • Add integration tests for ASN fetchers (RIPE, RouteViews, BGPHE) and new validator behavior with invalid IPs and ASN verification.

- Move IPTree and ASN cache operations to Bot methods (loadCachedIPs,
  refreshIPs, initializeASN, refreshASN) to eliminate duplicate logic
  between initBot and runScheduler
- Centralize state management with ensureIPTree/storePrefixes helpers
- Remove redundant standalone helpers from validator.go

sourcery-ai bot commented Jan 12, 2026

Reviewer's Guide

Implements ASN-based bot IP verification and a shared radix-based IPTree, refactors bot IP/RDNS/ASN lifecycle into Bot methods, replaces the generic RDNS cache type, removes Validator.Reload, wires ASN checks into the validation pipeline, and adds Makefile, ASN fetcher/store infrastructure, CI, docs, and tests to support the new behavior.

Sequence diagram for background scheduler refreshing IP, ASN, and RDNS

sequenceDiagram
    participant Validator
    participant Scheduler as SchedulerGoroutine
    participant Bot
    participant HTTPClient
    participant AsnStore
    participant RDNSCache as RDNS
    participant ASNCache as ASN

    Validator->>Scheduler: startScheduler(ctx)
    loop Every refresh interval
        Scheduler->>Validator: runScheduler(HTTPClient)
        Validator->>Validator: getBots()
        loop for each bot
            Validator->>Bot: refreshIPs(HTTPClient, root)
            alt bot has URLs and new prefixes
                Bot->>HTTPClient: downloadIPs(bot)
                HTTPClient-->>Bot: []netip.Prefix
                Bot->>Bot: storePrefixes(prefixes)
                Bot->>FileSystem: writeIPs(botDir/ips.txt, prefixes)
                FileSystem-->>Bot: ok
            else no new prefixes
                Bot-->>Validator: no-op
            end

            Validator->>Bot: refreshASN(AsnStore)
            alt bot has ASN list
                loop for each ASN
                    Bot->>AsnStore: Refresh(bot.Name, asn)
                    AsnStore->>AsnStore: fetch(asn) via Fetchers
                    AsnStore-->>Bot: []netip.Prefix
                    Bot->>ASNCache: Add(asn, prefixes)
                end
                Bot->>Bot: asns.Store(newASN)
            else no ASN configured
                Bot-->>Validator: no-op
            end

            Validator->>Bot: refreshRDNS()
            alt bot.RDNS and bot.rdns not nil
                Bot->>RDNSCache: Prune(bot.Domains)
                RDNSCache-->>Bot: pruned
                Bot->>RDNSCache: Persist()
                RDNSCache-->>Bot: ok or error
            else RDNS disabled or nil
                Bot-->>Validator: no-op
            end
        end
    end

Class diagram for updated bot validation and ASN infrastructure

classDiagram
    class Bot {
        +string Name
        +BotKind Kind
        +string Parser
        +string UA
        +[]string URLs
        +[]int ASN
        +[]string Domains
        +bool RDNS
        -atomic.Pointer~IPTree~ ips
        -atomic.Pointer~ASN~ asns
        -RDNS rdns
        -LRU fail
        +void storePrefixes(prefixes []netip.Prefix)
        +void initIPs(path string)
        +void refreshIPs(httpClient *http.Client, root string)
        +void initRDNS(path string)
        +void refreshRDNS()
        +void initASN(store *AsnStore)
        +void refreshASN(store *AsnStore)
        +bool ContainsIP(ipStr string)
        +ResultStatus VerifyRDNS(ipStr string)
    }

    class BotKind {
        <<enum>>
        +KindSearchEngine
        +KindSocialMedia
        +KindAITraining
        +KindAIAssist
        +KindAIMixed
        +KindSEO
        +KindMonitor
        +KindSecurity
        +KindScraper
        +KindUnknown
    }

    class IPTree {
        -uint_tree.TreeV4 v4
        -uint_tree.TreeV6 v6
        +void Add(prefix netip.Prefix)
        +bool Contains(ip netip.Addr)
        +int Count()
    }

    class ASN {
        -[]int asns
        -IPTree tree
        +[]int ASNs()
        +void Add(asn int, prefixes []netip.Prefix)
        +bool Contains(ip netip.Addr)
        +int Count()
    }

    class RDNS {
        -atomic.Pointer~map[string]string~ valid
        -string file
        +string Get(key string)
        +void Set(key string, value string)
        +error Persist()
        +void Prune(domains []string)
        +int Size()
        +error Close()
    }

    class LRU {
        +bool Contains(key string)
        +void Add(key string)
    }

    class Validator {
        -string root
        -atomic.Pointer~[]*Bot~ bots
        -atomic.Pointer~map[byte][]*Bot~ uaIndex
        -AsnStore asnStore
        -context.CancelFunc cancel
        -int failLimit
        -bool classifyUA
        +Result Validate(ua string, ip string)
        +Result verifyIP(bot *Bot, ipStr string)
        +void startScheduler(ctx context.Context)
        +void runScheduler(httpClient *http.Client)
        +error Close()
    }

    class AsnStore {
        -string cacheDir
        -[]Fetcher fetchers
        +[]netip.Prefix Refresh(botName string, asn int)
        +[]netip.Prefix Load(botName string, asn int)
        -error save(botName string, asn int, prefixes []netip.Prefix)
        -[]netip.Prefix fetch(asn int) error
    }

    class Fetcher {
        <<interface>>
        +[]netip.Prefix Fetch(asn int) error
    }

    class RIPEFetcher {
        -http.Client* Client
        +[]netip.Prefix Fetch(asn int) error
    }

    class RouteViewsFetcher {
        -http.Client* Client
        +[]netip.Prefix Fetch(asn int) error
    }

    class BGPHEFetcher {
        -http.Client* Client
        +[]netip.Prefix Fetch(asn int) error
    }

    class UtilFunctions {
        +[]netip.Prefix Deduplicate(prefixes []netip.Prefix)
        +[]netip.Prefix FilterInvalidPrefixes(prefixes []netip.Prefix)
        +bool ValidatePrefix(prefix netip.Prefix)
    }

    %% Relationships
    Bot --> BotKind
    Bot --> IPTree
    Bot --> ASN
    Bot --> RDNS
    Bot --> LRU

    ASN --> IPTree

    Validator --> Bot
    Validator --> AsnStore

    AsnStore --> Fetcher
    RIPEFetcher ..|> Fetcher
    RouteViewsFetcher ..|> Fetcher
    BGPHEFetcher ..|> Fetcher

    UtilFunctions ..> ASN
    UtilFunctions ..> AsnStore

    RDNS --> LRU

File-Level Changes

Change Details Files
Introduce shared IPTree and ASN cache structures and integrate them into Bot and Validator verification flow.
  • Add IPTree type backed by kentik/patricia v4/v6 trees with Add, Contains, Count helpers
  • Extend Bot struct to hold atomic IPTree and ASN pointers, ASN config, and an RDNS cache field
  • Implement Bot methods to initialize and refresh IP ranges, ASN data, and RDNS caches from disk/HTTP/ASN store
  • Update buildBot/loadBot and embedded config loading to construct IPTree-based IP state instead of raw prefix slices
  • Change Bot.ContainsIP and Validator.verifyIP to use IPTree and ASN caches, with verification order IP ranges → ASN → RDNS
bot.go
embed.go
validator.go
asn.go
Replace generic RDNS cache with a dedicated RDNS type and adjust RDNS verification and tests.
  • Rename cache.go to rdns.go and rename Cache to RDNS with same atomic map semantics and file-backed persistence
  • Wire RDNS caches onto Bot.rdns instead of Validator-level Cache field, and adjust VerifyRDNS to use Bot.rdns directly
  • Update scheduler to prune/persist RDNS via Bot.refreshRDNS and to always initialize per-bot RDNS cache directories
  • Rename and update LRU/RDNS tests from Cache_* to RDNS_* and adjust Bot tests to use IPTree/RDNS abstraction
rdns.go
bot.go
rdns_lru_test.go
knownbots_test.go
Add ASN fetcher/store infrastructure and integrate ASN-based verification into bots.
  • Introduce ASN struct using IPTree to hold per-bot ASN prefixes plus Deduplicate and FilterInvalidPrefixes helpers
  • Add asn package with Store that loads/saves per-bot ASN JSON caches and fetches prefixes via RIPE, RouteViews, and BGPHE fetchers
  • Implement RIPE, RouteViews, and BGPHE fetchers plus regex-based HTML parsing for BGPHE and validation utilities
  • Initialize a shared asn.Store in Validator.New and call Bot.initASN/refreshASN to populate/refresh ASN caches per bot
  • Add unit, benchmark, and integration tests covering ASN cache operations, fetchers, prefix validation, and invalid IP handling
asn.go
asn/store.go
asn/fetcher.go
asn/fetcher_ripe.go
asn/fetcher_routeviews.go
asn/fetcher_bgphe.go
asn/util.go
asn_test.go
asn_benchmark_test.go
asn/fetcher_ripe_test.go
asn/fetcher_routeviews_test.go
asn/fetcher_bgphe_regex_test.go
asn/fetcher_bgphe_test.go
validator_invalid_ip_test.go
bot_asn_refresh_test.go
Remove Validator.Reload hot-reload path and update docs to favor validator recreation for config changes.
  • Delete Validator.Reload implementation and its benchmark and tests, and remove references from README API docs
  • Remove reload-focused tests in validator and bot tests that depended on Reload behavior
  • Document new recommended pattern for applying config changes by recreating Validator with WithRoot instead of hot reload
validator.go
validator_bench_test.go
bot.go
knownbots_test.go
README.md
Update documentation, CI, and tooling to reflect ASN verification and new workflows.
  • Extend README and AGENTS docs with ASN configuration options, validation flow diagrams including ASN, performance table updates, and Bot method rationale
  • Add asn/ directory description to AGENTS layout section and update references to rdns.go
  • Introduce Makefile with fmt, vet, test, test-short, test-integration, bench, and bench-output targets
  • Update GitHub Actions workflow to use make test-short for unit tests, add an integration test job calling make test-integration, and run benchmarks via make bench-output
  • Add kentik/patricia and golang.org/x/sync dependencies to go.mod/go.sum and ignore new artifacts as needed
README.md
AGENTS.md
.github/workflows/test.yml
Makefile
go.mod
go.sum
.gitignore



@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 9 issues and left some high-level feedback:

  • In verifyIP, using netip.MustParseAddr(ipStr) will panic on invalid IP input, whereas other paths (like ContainsIP) parse safely; consider switching to netip.ParseAddr and handling the error to avoid crashing on malformed client input.
  • The Bot.VerifyRDNS method assumes b.rdns is non-nil, but initRDNS can early-return on NewRDNS error while leaving RDNS set to true; this can lead to a nil-pointer dereference at runtime and should be guarded with a nil check or by disabling RDNS on init failure.
  • The asn.ASN type is mutated via Add while being read via Contains without synchronization, yet refreshASN is called from the scheduler while validation can run concurrently; consider making ASN immutable with atomic pointer swaps (as done for IPTree) or adding proper locking to avoid data races.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `verifyIP`, using `netip.MustParseAddr(ipStr)` will panic on invalid IP input, whereas other paths (like `ContainsIP`) parse safely; consider switching to `netip.ParseAddr` and handling the error to avoid crashing on malformed client input.
- The `Bot.VerifyRDNS` method assumes `b.rdns` is non-nil, but `initRDNS` can early-return on `NewRDNS` error while leaving `RDNS` set to true; this can lead to a nil-pointer dereference at runtime and should be guarded with a nil check or by disabling RDNS on init failure.
- The `asn.ASN` type is mutated via `Add` while being read via `Contains` without synchronization, yet `refreshASN` is called from the scheduler while validation can run concurrently; consider making `ASN` immutable with atomic pointer swaps (as done for `IPTree`) or adding proper locking to avoid data races.

## Individual Comments

### Comment 1
<location> `validator.go:188-189` </location>
<code_context>
 	}

-	// RDNS verification
+	// ASN verification (fast after cache load, ~100ns)
+	if bot.asns != nil && bot.asns.Contains(netip.MustParseAddr(ipStr)) {
+		return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
+	}
</code_context>

<issue_to_address>
**issue (bug_risk):** Avoid netip.MustParseAddr here to prevent potential panics on malformed IPs.

`verifyIP` accepts arbitrary `ipStr`, so relying on prior validation makes it fragile if the function is reused or called with untrusted input. Prefer either:

- Parsing once earlier and passing a `netip.Addr` into this function, or
- Using `netip.ParseAddr` here and returning early on error.

This keeps the verifier safe from panics in more usage contexts.
</issue_to_address>
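The safe-parsing fix can be sketched as follows; `Result` and the status strings here are stand-ins for the repository's own types, not its actual definitions:

```go
package main

import (
	"fmt"
	"net/netip"
)

// Result is a placeholder for the repository's own Result type.
type Result struct {
	Status string
	IsBot  bool
}

// verifyIP parses once with netip.ParseAddr and returns early on error,
// instead of letting netip.MustParseAddr panic on malformed input.
func verifyIP(ipStr string) Result {
	addr, err := netip.ParseAddr(ipStr)
	if err != nil {
		// Malformed input: fail closed rather than crash.
		return Result{Status: "unverified"}
	}
	// From here on, addr is a valid netip.Addr and can be handed to
	// IPTree.Contains / ASN.Contains without any further parsing.
	_ = addr
	return Result{Status: "verified", IsBot: true}
}

func main() {
	fmt.Println(verifyIP("not-an-ip").Status) // malformed input does not panic
	fmt.Println(verifyIP("192.0.2.1").Status)
}
```

Parsing once at the top of the function also lets the inner checks share a single `netip.Addr` value.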

### Comment 2
<location> `asn/asn.go:33-34` </location>
<code_context>
+}
+
+// Add adds all prefixes from an ASN to the cache.
+func (c *ASN) Add(asn int, prefixes []netip.Prefix) error {
+	c.asns = append(c.asns, asn)
+	for _, prefix := range prefixes {
+		addr := prefix.Addr()
</code_context>

<issue_to_address>
**issue (bug_risk):** ASN.Add mutates internal state without synchronization, which can race with concurrent Contains calls.

`ASN` is accessed via `Bot.asns`: `refreshASN` calls `Add` from the scheduler while `Contains` is called from `Validate` in other goroutines. `Add` mutates both the `asns` slice and the radix trees without any locking, so concurrent `Add`/`Contains` can race unless `TreeV4/TreeV6` are documented as safe for concurrent writes (which is uncommon).

Prefer making `ASN` immutable for readers: construct a new `*ASN` with updated trees and then atomically swap `bot.asns` (e.g. using `atomic.Pointer[*asn.ASN]`). This ensures `Contains` always operates on a stable, read‑only snapshot.
</issue_to_address>

### Comment 3
<location> `asn/asn.go:79` </location>
<code_context>
+
+// Count returns the total number of prefixes in the cache.
+func (c *ASN) Count() int {
+	return c.v4.CountTags() + c.v6.CountTags()
+}
+
</code_context>

<issue_to_address>
**suggestion (bug_risk):** ASN.Add appends duplicate ASN IDs to the asns slice on repeated calls.

Because `Bot.refreshASN` can call `Add` repeatedly for the same ASN, `c.asns` can accumulate duplicates, and `ASNs()` may return an inflated list. To avoid this, consider deduplicating by tracking seen ASNs in a `map[int]struct{}` or by checking recent entries before appending, if call patterns allow that optimization.

Suggested implementation:

```golang
	}
}

// Count returns the total number of prefixes in the cache.
func (c *ASN) Count() int {
	return c.v4.CountTags() + c.v6.CountTags()
}

import (
	"net/netip"

	"github.com/kentik/patricia"
	"github.com/kentik/patricia/uint_tree"
)

// ASN provides fast O(1) IP lookup using Radix Trees for ASN prefix matching.
type ASN struct {
	asns []int
	// seen tracks which ASN IDs have already been added to avoid duplicates
	seen map[int]struct{}
	v4   *uint_tree.TreeV4
	v6   *uint_tree.TreeV6
}
```

```golang
	asnsCopy := make([]int, len(c.asns))
	copy(asnsCopy, c.asns)

	var seen map[int]struct{}
	if len(asnsCopy) > 0 {
		seen = make(map[int]struct{}, len(asnsCopy))
		for _, id := range asnsCopy {
			seen[id] = struct{}{}
		}
	}

	return &ASN{
		asns: asnsCopy,
		seen: seen,
		v4:   c.v4.Clone(),
		v6:   c.v6.Clone(),
	}

```

```golang
func (c *ASN) Add(prefix netip.Prefix, asn int) {
	// existing logic (e.g. inserting into v4/v6 trees) ...

	// Lazily initialize and/or backfill the seen map to avoid duplicates.
	if c.seen == nil {
		if len(c.asns) > 0 {
			c.seen = make(map[int]struct{}, len(c.asns)+1)
			for _, existing := range c.asns {
				c.seen[existing] = struct{}{}
			}
		} else {
			c.seen = make(map[int]struct{}, 1)
		}
	}

	// Only append ASN IDs we have not seen before.
	if _, ok := c.seen[asn]; !ok {
		c.seen[asn] = struct{}{}
		c.asns = append(c.asns, asn)
	}
}

```

I’ve assumed the presence and signature of `ASN.Add` as `func (c *ASN) Add(prefix netip.Prefix, asn int)`. If the actual signature or local variable name for the ASN ID differs:

1. Update the `Add` method replacement block to use the correct parameter name and to merge with the existing body rather than the placeholder comment.
2. Ensure any constructors or factory functions (e.g. `NewASN(...)`) initialize `seen: make(map[int]struct{})` on new instances so deduplication works even before the first `Add` call.
3. If there are other code paths that mutate `c.asns` directly (outside of `Add`/`Clone`), they should either:
   - be refactored to go through `Add`, or
   - update `c.seen` consistently when they modify `c.asns`.
</issue_to_address>

### Comment 4
<location> `asn/fetcher_bgphe.go:18-19` </location>
<code_context>
+	Client *http.Client
+}
+
+// Match prefix links like: <a href="/net/31.13.24.0/21">31.13.24.0/21</a>
+var bgpHePrefixRe = regexp.MustCompile(`<a href="/net/([0-9./a-fA-F]+)"[^>]*>`)
+
+func NewBGPHE() *BGPHE {
</code_context>

<issue_to_address>
**issue (bug_risk):** BGPHE regex won't match IPv6 prefixes because it excludes ':' from the character class.

Because the pattern only permits digits, `.`, `/`, and hex letters, it will fail to match IPv6 prefixes like `2001:db8::/32` (which contain `:`), so IPv6 entries from bgp.he.net will be dropped.

Consider extending the character class to include `:`, e.g. `([0-9a-fA-F:.]+)`, or a stricter variant that still covers all valid IPv6 prefixes in the HTML.
</issue_to_address>
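A quick standalone check of the widened character class (this mirrors the pattern quoted above; the repository's exact regex may differ):

```go
package main

import (
	"fmt"
	"regexp"
)

// Adding ':' to the character class lets the bgp.he.net prefix regex match
// IPv6 links as well as IPv4 ones.
var bgpHePrefixRe = regexp.MustCompile(`<a href="/net/([0-9a-fA-F:./]+)"[^>]*>`)

func main() {
	v4 := `<a href="/net/31.13.24.0/21">31.13.24.0/21</a>`
	v6 := `<a href="/net/2001:db8::/32">2001:db8::/32</a>`
	fmt.Println(bgpHePrefixRe.FindStringSubmatch(v4)[1]) // 31.13.24.0/21
	fmt.Println(bgpHePrefixRe.FindStringSubmatch(v6)[1]) // 2001:db8::/32
}
```

A stricter follow-up would be to run the captured text through `netip.ParsePrefix` and drop anything that fails, rather than tightening the regex further.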

### Comment 5
<location> `asn/store.go:18-27` </location>
<code_context>
+}
+
+// NewStore creates a new ASN store.
+func NewStore(cacheDir string) (*Store, error) {
+	s := &Store{
+		cacheDir: cacheDir,
+		fetchers: []Fetcher{
+			NewRIPE(),
+			NewRouteViews(),
+			NewBGPHE(),
+		},
+	}
+
+	return s, nil
+}
+
</code_context>

<issue_to_address>
**nitpick:** NewStore returns an error but currently cannot fail; consider simplifying the API or adding validation.

Since the function always returns `nil`, either drop the error from the signature for a cleaner API, or introduce concrete validation of `cacheDir` (e.g., non-empty, writable) so the error path is meaningful.
</issue_to_address>
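One possible shape for meaningful validation in `NewStore`; the write-probe approach and error messages are illustrative, not the repository's code:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Store is a trimmed stand-in for the asn.Store type under review.
type Store struct {
	cacheDir string
}

// NewStore keeps the error return but makes it meaningful: the cache
// directory must be non-empty, creatable, and writable at construction time.
func NewStore(cacheDir string) (*Store, error) {
	if cacheDir == "" {
		return nil, fmt.Errorf("asn: cacheDir must not be empty")
	}
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return nil, fmt.Errorf("asn: create cache dir: %w", err)
	}
	// Probe writability so failures surface here, not on the first save.
	probe := filepath.Join(cacheDir, ".write-probe")
	if err := os.WriteFile(probe, nil, 0o644); err != nil {
		return nil, fmt.Errorf("asn: cache dir not writable: %w", err)
	}
	_ = os.Remove(probe)
	return &Store{cacheDir: cacheDir}, nil
}

func main() {
	if _, err := NewStore(""); err != nil {
		fmt.Println("empty dir rejected")
	}
	dir, _ := os.MkdirTemp("", "asncache")
	defer os.RemoveAll(dir)
	if _, err := NewStore(dir); err == nil {
		fmt.Println("store ready")
	}
}
```

The alternative, dropping the `error` from the signature entirely, is also fine; the main point is that the current always-nil error is dead API surface.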

### Comment 6
<location> `asn/store.go:92-101` </location>
<code_context>
+}
+
+func (s *Store) fetch(asn int) ([]netip.Prefix, error) {
+	var lastErr error
+	for _, fetcher := range s.fetchers {
+		prefixes, err := fetcher.Fetch(asn)
+		if err != nil {
+			lastErr = err
+			continue
+		}
+		if len(prefixes) > 0 {
+			return prefixes, nil
+		}
+	}
+
+	return nil, fmt.Errorf("all fetchers failed for ASN %d: %w", asn, lastErr)
+}
</code_context>

<issue_to_address>
**suggestion:** fetch returns an error wrapping a possibly nil lastErr when all fetchers return no prefixes but no error.

When all fetchers return `(nil, nil)` or `([]Prefix{}, nil)`, `lastErr` remains `nil`, so the function returns `fmt.Errorf("all fetchers failed...: %w", nil)`. That produces a non-nil error whose wrapped cause is `<nil>`, which can be misleading.

Consider tracking whether any error actually occurred (e.g. a `hadErr` flag) and either returning a clearer message like "no prefixes from any fetcher" or wrapping the last non-nil error only when one exists.

Suggested implementation:

```golang
func (s *Store) fetch(asn int) ([]netip.Prefix, error) {
	var (
		lastErr error
		hadErr  bool
	)

	for _, fetcher := range s.fetchers {
		prefixes, err := fetcher.Fetch(asn)
		if err != nil {
			hadErr = true
			lastErr = err
			continue
		}

		if len(prefixes) > 0 {
			return prefixes, nil
		}
	}

	if hadErr {
		return nil, fmt.Errorf("all fetchers failed for ASN %d: %w", asn, lastErr)
	}

	return nil, fmt.Errorf("no prefixes found for ASN %d from any fetcher", asn)
}

// NewStore creates a new ASN store.
func NewStore(cacheDir string) (*Store, error) {

```

1. Ensure the file imports the `fmt` package if it is not already imported:
   - If you have an import block like `import (...)`, add `fmt` inside it.
   - If there is a single-line import, convert to a block or add another import line for `fmt`.
2. If you have a custom error type or error-wrapping helper used elsewhere in this package, you may want to use that instead of `fmt.Errorf` for consistency.
</issue_to_address>

### Comment 7
<location> `knownbots_test.go:181` </location>
<code_context>
 	for _, tt := range tests {
+		tree := NewIPTree()
+		tree.Add(netip.MustParsePrefix(tt.cidr))
 		bot := &Bot{
-			custom: &atomic.Pointer[[]IPPrefix]{},
+			ips: &atomic.Pointer[IPTree]{},
</code_context>

<issue_to_address>
**issue (testing):** Add tests that exercise ASN-based verification in the Validator/verifyIP flow

Current tests only cover the new `IPTree` usage in `ContainsIP`; they don’t exercise the ASN-based verification path in the bot/validator flow. Please add tests that:
- Build a `Bot` with an `asn.ASN` containing a known prefix, run it through `Validator`/`verifyIP`, and assert that an IP in that ASN is `StatusVerified` with `IsBot=true` even if it’s not in explicit IP ranges and before RDNS.
- Use an IP not in any configured ASN and assert that verification still fails.

This will confirm the new ASN logic is actually used end-to-end in bot verification.
</issue_to_address>

### Comment 8
<location> `asn/asn_test.go:8` </location>
<code_context>
+	"testing"
+)
+
+func TestNewASN(t *testing.T) {
+	cache := NewASN()
+	if cache == nil {
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for ASN Store (Load/Refresh/save) using a temporary directory and stub fetcher

The core ASN logic is covered, but the `Store` layer that manages JSON persistence and fetchers isn’t. In `store_test.go`, you could: (1) use a temporary directory for `cacheDir`, (2) pass in a fake `Fetcher` that returns a fixed set of prefixes (and optional errors), and (3) assert that `Refresh` writes the expected JSON and `Load` reconstructs the correct `[]netip.Prefix`. It’s also worth adding a case with invalid prefixes in the cache file to ensure they’re skipped cleanly.

Suggested implementation:

```golang
package asn

import (
	"context"
	"encoding/json"
	"errors"
	"net/netip"
	"os"
	"path/filepath"
	"testing"
	"time"
)

// fakeFetcher is a stub implementation of the ASN Fetcher used by Store.
// Adjust the method signatures to match the real Fetcher interface.
type fakeFetcher struct {
	prefixes []netip.Prefix
	err      error
}

func (f *fakeFetcher) Fetch(ctx context.Context) ([]netip.Prefix, error) {
	// Simulate a bit of work and support cancellation
	select {
	case <-ctx.Done():
		return nil, ctx.Err()
	case <-time.After(1 * time.Millisecond):
	}

	if f.err != nil {
		return nil, f.err
	}
	return append([]netip.Prefix(nil), f.prefixes...), nil
}

func TestNewASN(t *testing.T) {
	cache := NewASN()
	if cache == nil {
		t.Fatal("expected non-nil cache")
	}
	if cache.IPv4Tree() == nil {
		t.Error("expected non-nil IPv4Tree")
	}
	if cache.IPv6Tree() == nil {
		t.Error("expected non-nil IPv6Tree")
	}
}

// TestStoreRefreshAndLoad verifies that Refresh persists prefixes to disk as JSON
// and Load reconstructs the correct []netip.Prefix from the cache file.
func TestStoreRefreshAndLoad(t *testing.T) {
	t.Parallel()

	// Temporary directory for this test's cache files
	cacheDir := t.TempDir()

	// Prepare a deterministic set of prefixes returned by the fake fetcher
	wantPrefixes := []netip.Prefix{
		netip.MustParsePrefix("1.2.3.0/24"),
		netip.MustParsePrefix("2001:db8::/32"),
	}

	ff := &fakeFetcher{
		prefixes: wantPrefixes,
	}

	// NOTE: adjust constructor and arguments to match the real Store API.
	store := NewStore(cacheDir, ff)
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// First call should fetch from the fake fetcher and persist to disk.
	if err := store.Refresh(ctx); err != nil {
		t.Fatalf("Refresh() error = %v, want nil", err)
	}

	// Ensure the cache file exists and contains what we expect in JSON form.
	// If the real implementation uses a different filename, update cacheFileName.
	cacheFileName := filepath.Join(cacheDir, "asn_cache.json")

	data, err := os.ReadFile(cacheFileName)
	if err != nil {
		t.Fatalf("failed to read cache file %q: %v", cacheFileName, err)
	}

	var gotJSON []string
	if err := json.Unmarshal(data, &gotJSON); err != nil {
		t.Fatalf("failed to unmarshal cache JSON: %v", err)
	}

	if len(gotJSON) != len(wantPrefixes) {
		t.Fatalf("unexpected number of prefixes in JSON: got %d, want %d", len(gotJSON), len(wantPrefixes))
	}

	for i, p := range wantPrefixes {
		if gotJSON[i] != p.String() {
			t.Errorf("JSON[%d] = %q, want %q", i, gotJSON[i], p.String())
		}
	}

	// Now construct a new Store pointing at the same cacheDir and verify Load
	// reconstructs the original prefixes from the JSON cache file.
	store2 := NewStore(cacheDir, &fakeFetcher{
		// Configure with an error to ensure Load does not call Fetch.
		err: errors.New("fetch should not be called during Load"),
	})

	loaded, err := store2.Load()
	if err != nil {
		t.Fatalf("Load() error = %v, want nil", err)
	}

	if len(loaded) != len(wantPrefixes) {
		t.Fatalf("Load() returned %d prefixes, want %d", len(loaded), len(wantPrefixes))
	}

	for i, want := range wantPrefixes {
		if loaded[i] != want {
			t.Errorf("Load()[%d] = %v, want %v", i, loaded[i], want)
		}
	}
}

// TestStoreLoadSkipsInvalidPrefixes ensures that invalid prefixes in the cache
// file are skipped cleanly and do not cause Load to fail.
func TestStoreLoadSkipsInvalidPrefixes(t *testing.T) {
	t.Parallel()

	cacheDir := t.TempDir()

	// Manually write a cache file that mixes valid and invalid prefixes.
	cacheFileName := filepath.Join(cacheDir, "asn_cache.json")

	raw := []string{
		"203.0.113.0/24", // valid
		"not-a-prefix",   // invalid
		"2001:db8::/32",  // valid
		"203.0.113.256/32", // invalid IP
	}
	data, err := json.Marshal(raw)
	if err != nil {
		t.Fatalf("failed to marshal test cache JSON: %v", err)
	}

	if err := os.WriteFile(cacheFileName, data, 0o644); err != nil {
		t.Fatalf("failed to write test cache file: %v", err)
	}

	// A Store whose fetcher should not be used during Load.
	store := NewStore(cacheDir, &fakeFetcher{
		err: errors.New("fetch should not be called during Load"),
	})

	got, err := store.Load()
	if err != nil {
		t.Fatalf("Load() error = %v, want nil", err)
	}

	// Only the valid prefixes should be returned.
	want := []netip.Prefix{
		netip.MustParsePrefix("203.0.113.0/24"),
		netip.MustParsePrefix("2001:db8::/32"),
	}

	if len(got) != len(want) {
		t.Fatalf("Load() returned %d prefixes, want %d", len(got), len(want))
	}

	for i, w := range want {
		if got[i] != w {
			t.Errorf("Load()[%d] = %v, want %v", i, got[i], w)
		}
	}
}

```

These tests assume the existence of a `Store` abstraction and related APIs. To integrate them with your actual codebase, you may need to:

1. **Adjust the Store API usage:**
   - Replace `NewStore(cacheDir, ff)` with the real constructor (e.g. `NewStore(cacheDir, WithFetcher(ff))` or similar).
   - Update `store.Refresh(ctx)` and `store.Load()` to match your method names and signatures (for example, `store.Refresh(ctx, force bool)` or `store.Load(ctx)`).

2. **Align with the real Fetcher interface:**
   - The stub `fakeFetcher` currently implements `Fetch(ctx context.Context) ([]netip.Prefix, error)`.
   - If your real `Fetcher` interface has a different name, method, or return type, update `fakeFetcher` accordingly and ensure it satisfies the interface.

3. **Match the on-disk cache format and filename:**
   - Tests currently assume the cache is JSON-encoded as `[]string` of prefix CIDRs and stored in `asn_cache.json` inside `cacheDir`.
   - If your implementation uses a different filename, directory structure, or JSON schema (e.g. objects with ASN metadata), update:
     - `cacheFileName` paths in tests.
     - The `gotJSON` decoding and assertions.
     - The `raw` data written in `TestStoreLoadSkipsInvalidPrefixes`.

4. **Confirm when Fetch is called vs Load:**
   - The tests assume `Load()` reads only from disk and does not trigger a fetch; if your design differs, either:
     - Adjust the expectations (e.g., allow Fetch on Load), or
     - Update the implementation to keep Load as a pure disk read, and `Refresh` as the network/Fetcher path.

5. **Parallel test safety:**
   - These tests rely on `t.TempDir()` to avoid shared state; ensure that your `Store` doesn’t introduce global state (e.g., fixed cache path) that could conflict across parallel tests.
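   If the real `Fetcher` interface is not yet pinned down, a minimal stub along these lines keeps the tests deterministic and offline. The interface name, `Fetch(asn int)` signature, and return type here are assumptions to be adjusted to the actual codebase:

   ```go
   package main

   import (
   	"fmt"
   	"net/netip"
   )

   // Fetcher is the assumed interface; rename/reshape to match the real one.
   type Fetcher interface {
   	Fetch(asn int) ([]netip.Prefix, error)
   }

   // fakeFetcher returns canned prefixes and records call counts so tests
   // can assert on both results and fetch behavior without any network I/O.
   type fakeFetcher struct {
   	prefixes []netip.Prefix
   	err      error
   	calls    int
   }

   func (f *fakeFetcher) Fetch(asn int) ([]netip.Prefix, error) {
   	f.calls++
   	return f.prefixes, f.err
   }

   func main() {
   	ff := &fakeFetcher{prefixes: []netip.Prefix{netip.MustParsePrefix("203.0.113.0/24")}}
   	var _ Fetcher = ff // compile-time check that the stub satisfies the interface
   	got, err := ff.Fetch(15169)
   	fmt.Println(len(got), err == nil, ff.calls)
   }
   ```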
</issue_to_address>

### Comment 9
<location> `README.md:260` </location>
<code_context>
+  └──────────┘      └──────────────┘ │ RDNS     │
</code_context>

<issue_to_address>
**issue (typo):** Fix probable typo `rdns=t` to `rdns=true` in the diagram.

In the RDNS cache box, the label `(rdns=t)` is inconsistent with the earlier `(rdns=true)` usage in the README and original diagram. Please update it to `(rdns=true)` for consistency.
</issue_to_address>



@github-actions
github-actions bot commented Jan 12, 2026

Benchmark Results

BenchmarkASNCache_Add                      	 1000000	      1545 ns/op	     443 B/op	      10 allocs/op
BenchmarkASNCache_Add-4                    	 1000000	      1420 ns/op	     443 B/op	      10 allocs/op
BenchmarkASNCache_Add-8                    	 1000000	      1443 ns/op	     443 B/op	      10 allocs/op
BenchmarkASNCache_Contains                 	17584021	        67.44 ns/op	      20 B/op	       2 allocs/op
BenchmarkASNCache_Contains-4               	33316281	        34.07 ns/op	      20 B/op	       2 allocs/op
BenchmarkASNCache_Contains-8               	33632808	        36.49 ns/op	      20 B/op	       2 allocs/op
BenchmarkASNCache_Count                    	240973417	         5.251 ns/op	       0 B/op	       0 allocs/op
BenchmarkASNCache_Count-4                  	239896488	         4.979 ns/op	       0 B/op	       0 allocs/op
BenchmarkASNCache_Count-8                  	240934194	         4.977 ns/op	       0 B/op	       0 allocs/op
BenchmarkManager_VerifyIP                  	14905923	        79.55 ns/op	      20 B/op	       2 allocs/op
BenchmarkManager_VerifyIP-4                	31246377	        38.07 ns/op	      20 B/op	       2 allocs/op
BenchmarkManager_VerifyIP-8                	29870930	        42.63 ns/op	      20 B/op	       2 allocs/op
BenchmarkDistinct                          	 1498246	       811.7 ns/op	     384 B/op	      19 allocs/op
BenchmarkDistinct-4                        	 1573089	       761.8 ns/op	     384 B/op	      19 allocs/op
BenchmarkDistinct-8                        	 1573929	       761.6 ns/op	     384 B/op	      19 allocs/op
BenchmarkValidatePrefix                    	39283392	        30.55 ns/op	       0 B/op	       0 allocs/op
BenchmarkValidatePrefix-4                  	36866299	        30.49 ns/op	       0 B/op	       0 allocs/op
BenchmarkValidatePrefix-8                  	39342876	        30.51 ns/op	       0 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_First             	 1301018	      1166 ns/op	      20 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_First-4           	 2227196	       569.6 ns/op	      21 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_First-8           	 2490214	       608.9 ns/op	      24 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Middle            	  991258	      1233 ns/op	      25 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Middle-4          	 2392368	       464.3 ns/op	      21 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Middle-8          	 4996014	       520.6 ns/op	      24 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Last              	 1000000	      1152 ns/op	      43 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Last-4            	 2743015	       368.4 ns/op	      20 B/op	       0 allocs/op
BenchmarkFindBotByUA_Hit_Last-8            	 2321821	       545.8 ns/op	      27 B/op	       0 allocs/op
BenchmarkFindBotByUA_Miss                  	  493398	      2506 ns/op	      80 B/op	       0 allocs/op
BenchmarkFindBotByUA_Miss-4                	 1275422	       940.1 ns/op	      35 B/op	       0 allocs/op
BenchmarkFindBotByUA_Miss-8                	--- FAIL: BenchmarkFindBotByUA_Miss-8
BenchmarkFindBotByUA_CaseSensitive         	 1000000	      1115 ns/op	      37 B/op	       0 allocs/op
BenchmarkFindBotByUA_CaseSensitive-4       	 3498962	       452.9 ns/op	      22 B/op	       0 allocs/op
BenchmarkFindBotByUA_CaseSensitive-8       	 2439232	       523.6 ns/op	      22 B/op	       0 allocs/op
BenchmarkValidate_KnownBot_IPHit           	 1218859	       925.3 ns/op	      61 B/op	       2 allocs/op
BenchmarkValidate_KnownBot_IPHit-4         	 4012075	       460.7 ns/op	      36 B/op	       2 allocs/op
BenchmarkValidate_KnownBot_IPHit-8         	 3109028	       433.2 ns/op	      40 B/op	       2 allocs/op
BenchmarkValidate_Browser                  	  219626	      5844 ns/op	     252 B/op	       1 allocs/op
BenchmarkValidate_Browser-4                	  489142	      2131 ns/op	      75 B/op	       0 allocs/op
BenchmarkValidate_Browser-8                	  602510	      2056 ns/op	     102 B/op	       0 allocs/op
BenchmarkContainsWord                      	74211529	        17.89 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsWord-4                    	73329073	        16.26 ns/op	       0 B/op	       0 allocs/op
BenchmarkContainsWord-8                    	66591972	        16.57 ns/op	       0 B/op	       0 allocs/op
BenchmarkValidate_WithBotUA                	  990604	      1302 ns/op	      48 B/op	       1 allocs/op
BenchmarkValidate_WithBotUA-4              	 2683639	       477.3 ns/op	      36 B/op	       1 allocs/op
BenchmarkValidate_WithBotUA-8              	 2495970	       495.2 ns/op	      38 B/op	       1 allocs/op
BenchmarkValidate_WithBotUA_IPMismatch     	  740101	      1721 ns/op	      83 B/op	       1 allocs/op
BenchmarkValidate_WithBotUA_IPMismatch-4   	 2076832	       632.0 ns/op	      40 B/op	       1 allocs/op
BenchmarkValidate_WithBotUA_IPMismatch-8   	 1833045	       592.2 ns/op	      27 B/op	       1 allocs/op
BenchmarkValidate_BrowserUA                	  275984	      4789 ns/op	     204 B/op	       1 allocs/op
BenchmarkValidate_BrowserUA-4              	  842253	      1485 ns/op	      34 B/op	       0 allocs/op
BenchmarkValidate_BrowserUA-8              	  826398	      1556 ns/op	      77 B/op	       0 allocs/op
BenchmarkValidate_UnknownBotUA             	 7679907	       177.9 ns/op	       7 B/op	       0 allocs/op
BenchmarkValidate_UnknownBotUA-4           	21024901	        61.35 ns/op	       3 B/op	       0 allocs/op
BenchmarkValidate_UnknownBotUA-8           	19379098	        61.31 ns/op	       3 B/op	       0 allocs/op
BenchmarkContainsIP                        	47371538	        28.47 ns/op	       1 B/op	       0 allocs/op
BenchmarkContainsIP-4                      	79941525	        33.68 ns/op	      14 B/op	       1 allocs/op
BenchmarkContainsIP-8                      	95714067	        41.27 ns/op	      20 B/op	       1 allocs/op
BenchmarkFindBotByUA                       	  744453	      1567 ns/op	      23 B/op	       0 allocs/op
BenchmarkFindBotByUA-4                     	 2058116	       614.6 ns/op	      20 B/op	       0 allocs/op
BenchmarkFindBotByUA-8                     	 1857130	       597.3 ns/op	      26 B/op	       0 allocs/op
BenchmarkClassifyUA                        	 2151088	       570.4 ns/op	       8 B/op	       0 allocs/op
BenchmarkClassifyUA-4                      	 4180044	       251.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkClassifyUA-8                      	 4800512	       248.0 ns/op	       0 B/op	       0 allocs/op
Benchmark_MixedTraffic                     	  475156	      2655 ns/op	      52 B/op	       0 allocs/op
Benchmark_MixedTraffic-4                   	 1262340	       975.1 ns/op	      41 B/op	       0 allocs/op
Benchmark_MixedTraffic-8                   	 1274650	       963.9 ns/op	      37 B/op	       0 allocs/op
PASS
ok  	github.com/cnlangzi/knownbots	111.091s

- Add Bot method encapsulation explanation to Architecture Decisions
- Update AGENTS.md directory structure (cache.go -> rdns.go, add asn/)
- Update file organization guidelines
@codecov
codecov bot commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 42.81437% with 191 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.61%. Comparing base (60ce193) to head (3fd0b6b).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
asn/store.go 0.00% 56 Missing ⚠️
bot.go 62.39% 40 Missing and 4 partials ⚠️
asn/fetcher_bgphe.go 0.00% 32 Missing ⚠️
asn/fetcher_ripe.go 0.00% 24 Missing ⚠️
asn/fetcher_routeviews.go 0.00% 24 Missing ⚠️
asn.go 85.71% 4 Missing and 1 partial ⚠️
validator.go 77.77% 2 Missing and 2 partials ⚠️
asn/util.go 88.88% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #14       +/-   ##
===========================================
- Coverage   72.76%   62.61%   -10.16%     
===========================================
  Files          15       23        +8     
  Lines         661      987      +326     
===========================================
+ Hits          481      618      +137     
- Misses        136      309      +173     
- Partials       44       60       +16     


- Delete Reload method as it's no longer needed
- Remove associated tests (TestValidator_Reload, BenchmarkReload)
- Bot lifecycle now handled entirely through initBot during New()
  and periodic refresh via background scheduler
- Delete Reload from API Reference methods list
- Replace 'Step 4: Reload Configuration' with 'Step 4: Restart Validator'
  noting that bot config changes require restarting the validator
- Create Makefile with common commands (test, test-short, test-integration, bench, fmt, vet)
- Update test.yml to use make commands for consistency and maintainability
- ASN.Add and ASN.Contains now use atomic.Pointer for v4/v6 trees
- Remove redundant error return from asn.NewStore (always returns nil)
- Add ASN deduplication in Add to prevent duplicate ASN numbers
- Update tests to use new Add signature (void return)
- Replace MustParseAddr with ParseAddr in ASN verification path to prevent panics on malformed client input
- Fix BGPHE regex to include ':' character for IPv6 prefix matching (e.g., 2001:db8::/32)
- Add nil check for bot.rdns to avoid potential nil pointer dereference
- Add comprehensive tests for invalid IP handling across all verification paths
- Add regex unit tests covering IPv4, IPv6 (uppercase/lowercase), and mixed scenarios
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 4 issues, and left some high level feedback:

  • The ASN cache uses shared TreeV4/TreeV6 instances with Add and Contains called from the scheduler and validation concurrently; consider cloning-and-swap or other synchronization instead of mutating the same Patricia trees to avoid data races.
  • In Bot.initIPs you return when len(b.URLs) > 0, which prevents URL-based bots from loading their previously persisted ips.txt cache on startup; if the intent is to warm the in-memory IPTree from the last snapshot, this condition should be revisited.
  • The bench-output Makefile target uses go test ... -o benchmark_output.txt, which writes a test binary rather than benchmark text; if you intend to capture human-readable benchmark results, drop -o and redirect stdout instead, or adjust the workflow naming to reflect that this file is a binary.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The ASN cache uses shared `TreeV4/TreeV6` instances with `Add` and `Contains` called from the scheduler and validation concurrently; consider cloning-and-swap or other synchronization instead of mutating the same Patricia trees to avoid data races.
- In `Bot.initIPs` you `return` when `len(b.URLs) > 0`, which prevents URL-based bots from loading their previously persisted `ips.txt` cache on startup; if the intent is to warm the in-memory IPTree from the last snapshot, this condition should be revisited.
- The `bench-output` Makefile target uses `go test ... -o benchmark_output.txt`, which writes a test binary rather than benchmark text; if you intend to capture human-readable benchmark results, drop `-o` and redirect stdout instead, or adjust the workflow naming to reflect that this file is a binary.

## Individual Comments

### Comment 1
<location> `asn/asn.go:14-23` </location>
<code_context>
+type ASN struct {
</code_context>

<issue_to_address>
**issue (bug_risk):** Concurrent mutation of patricia trees in ASN is not synchronized and can race with concurrent Contains/Clone calls

`atomic.Pointer` only protects the root pointer; the underlying `*uint_tree.TreeV4`/`TreeV6` are mutated in `Add` while being read in `Contains` and cloned in `Clone` without synchronization. Since the patricia trees are not documented as safe for concurrent mutation and read, this can cause data races under concurrent `VerifyIP`/scheduler refresh. Consider either building new trees and atomically swapping the root (copy-on-write) or guarding all tree access with a `sync.RWMutex` instead of relying solely on `atomic.Pointer`.
</issue_to_address>

### Comment 2
<location> `asn/asn.go:37-45` </location>
<code_context>
+
+// Add adds all prefixes from an ASN to the cache (thread-safe).
+func (c *ASN) Add(asn int, prefixes []netip.Prefix) {
+	// Deduplicate ASN numbers
+	seen := false
+	for _, existing := range c.asns {
+		if existing == asn {
+			seen = true
+			break
+		}
+	}
+	if !seen {
+		c.asns = append(c.asns, asn)
+	}
</code_context>

<issue_to_address>
**issue (bug_risk):** ASN list stored in ASN.asns is mutated without synchronization and exposed directly via ASNs()

This can cause data races under concurrent refreshes and reads, and callers of `ASNs()` can also unintentionally mutate the shared backing array. Please either synchronize access (e.g., guard `asns` with a mutex) or treat it as immutable by atomically swapping a pointer to a new slice (as with the trees) and returning a copy from `ASNs()`.
</issue_to_address>

### Comment 3
<location> `validator_invalid_ip_test.go:112-17` </location>
<code_context>
+		t.Fatal("testbot not found")
+	}
+
+	asnCache := asn.NewASN()
+	testBot.asns = asnCache
+
+	testCases := []struct {
+		name       string
+		ip         string
+		shouldFail bool
+	}{
+		{"valid IPv4", "8.8.8.8", false},
+		{"valid IPv6", "2001:db8::1", false},
+		{"invalid IP", "invalid-ip", true},
+		{"malformed IPv4", "999.999.999.999", true},
+		{"malformed IPv6", "2001:db8::gggg", true},
+		{"empty string", "", true},
+	}
+
+	for _, tc := range testCases {
</code_context>

<issue_to_address>
**issue (testing):** TestVerifyIPWithASN never asserts a successful ASN-based verification, so the positive path isn’t exercised

In `TestVerifyIPWithASN`, you create `asnCache := asn.NewASN()` and assign it to `testBot.asns`, but never add any prefixes. That means `v.verifyIP(testBot, tc.ip)` can’t return `StatusVerified`, so the test only exercises failure/non-panic cases. Please extend this test (or add another) to populate `asnCache` with prefixes covering at least one "valid" IP and assert that `verifyIP` returns `StatusVerified`, so the ASN-based success path is actually verified.
</issue_to_address>

### Comment 4
<location> `asn/fetcher_ripe_test.go:7-16` </location>
<code_context>
+	"testing"
+)
+
+func TestIntegration_RIPE(t *testing.T) {
+	if testing.Short() {
+		t.Skip("Skipping integration test in short mode")
+	}
+
+	fetcher := NewRIPE()
+
+	// Test with Google ASN (AS15169)
+	prefixes, err := fetcher.Fetch(15169)
+	if err != nil {
+		t.Fatalf("failed to fetch prefixes from RIPE: %v", err)
+	}
+
+	if len(prefixes) == 0 {
+		t.Error("expected some prefixes for AS15169, got none")
+	}
+
+	// Verify some expected Google prefixes are present
+	foundGooglePrefix := false
+	for _, p := range prefixes {
+		if p.String() == "8.8.8.0/24" {
+			foundGooglePrefix = true
+			break
+		}
+	}
+	if !foundGooglePrefix {
+		t.Logf("did not find 8.8.8.0/24, but got %d prefixes total", len(prefixes))
+	}
+
+	// Verify all prefixes are valid
+	for _, p := range prefixes {
+		if !ValidatePrefix(p) {
+			t.Errorf("invalid prefix: %s", p.String())
+		}
+	}
+
+	t.Logf("RIPE fetched %d prefixes for AS15169 (Google)", len(prefixes))
+}
</code_context>

<issue_to_address>
**suggestion (testing):** Integration tests for external ASN providers rely on live network services and could benefit from complementary mocked tests

These tests are well-flagged as integration tests and gated by `testing.Short()`, but they still depend on live external services, response formats, and specific ASNs/prefixes, which can become flaky. Please add complementary unit tests that exercise the ASN fetching logic via a stubbed `http.Client` or an interface around the fetchers so you can deterministically cover cases like successful JSON with known prefixes, HTTP errors, malformed JSON, and empty results. That way parsing and error handling are reliably validated, with these integration tests as an additional safety net rather than the only coverage for these paths.
</issue_to_address>


- Move asn.go/asn_test.go from asn/ to root directory
- Simplify ASN to use existing IPTree instead of separate v4/v6 trees
- Remove duplicate patricia tree logic
- ASN struct now immutable with single tree field
- Performance improved to ~16.59ns for Contains operation (from ~19ns)
- Changed Bot.asns from *asn.ASN to *atomic.Pointer[ASN] for thread-safety
- Fixed refreshASN() to rebuild entire ASN cache (Copy-on-Write pattern)
- Move ValidatePrefix to asn/util.go to avoid import cycle
- All tests passing
- Changed from '-o benchmark_output.txt' (writes test binary) to stdout redirect
- Added benchmark_output.txt to .gitignore
- Now properly captures human-readable benchmark results
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 8 issues, and left some high level feedback:

  • Consider applying FilterInvalidPrefixes and Deduplicate to the prefix lists returned from ASN fetchers (e.g., in asn.Store.fetch or Bot.initASN/refreshASN) before adding them to the trees to avoid storing /0, loopback/multicast ranges and duplicate entries in the ASN cache.
  • ASN.ASNs() currently returns the internal slice directly; if you expect external callers to use this, it would be safer to return a copy to prevent accidental mutation of the cache’s internal state.
  • In Validator.New, you eagerly create per-bot directories for all bots even if they do not use IP files, ASN, or RDNS; you could short-circuit directory creation for bots without any on-disk state to reduce filesystem churn during initialization.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider applying `FilterInvalidPrefixes` and `Deduplicate` to the prefix lists returned from ASN fetchers (e.g., in `asn.Store.fetch` or `Bot.initASN/refreshASN`) before adding them to the trees to avoid storing `/0`, loopback/multicast ranges and duplicate entries in the ASN cache.
- `ASN.ASNs()` currently returns the internal slice directly; if you expect external callers to use this, it would be safer to return a copy to prevent accidental mutation of the cache’s internal state.
- In `Validator.New`, you eagerly create per-bot directories for all bots even if they do not use IP files, ASN, or RDNS; you could short-circuit directory creation for bots without any on-disk state to reduce filesystem churn during initialization.

## Individual Comments

### Comment 1
<location> `asn.go:30-39` </location>
<code_context>
+}
+
+// Add adds all prefixes from an ASN to the cache.
+func (c *ASN) Add(asn int, prefixes []netip.Prefix) {
+	seen := false
+	for _, existing := range c.asns {
+		if existing == asn {
+			seen = true
+			break
+		}
+	}
+	if !seen {
+		c.asns = append(c.asns, asn)
+	}
+
+	for _, prefix := range prefixes {
+		c.tree.Add(prefix)
+	}
</code_context>

<issue_to_address>
**suggestion (performance):** Consider deduplicating and validating prefixes before adding them to the IPTree.

`Add` currently inserts every prefix into `IPTree` as-is, and `Store.fetch` / `Store.Refresh` pass through upstream results unchanged. Since you already have `Deduplicate` and `FilterInvalidPrefixes`, consider applying them either inside `Add` or at the call sites (`initASN`, `refreshASN`, `Store.fetch`) to keep the tree free of duplicates and invalid prefixes (loopback, multicast, `/0`), which should reduce memory and speed up lookups.

Suggested implementation:

```golang
 // Add adds all prefixes from an ASN to the cache.
func (c *ASN) Add(asn int, prefixes []netip.Prefix) {
	// Track which ASNs have been added.
	seen := false
	for _, existing := range c.asns {
		if existing == asn {
			seen = true
			break
		}
	}
	if !seen {
		c.asns = append(c.asns, asn)
	}

	// Sanitize prefixes before inserting into the IPTree to avoid
	// duplicates and invalid prefixes (loopback, multicast, /0, etc.).
	prefixes = FilterInvalidPrefixes(prefixes)
	prefixes = Deduplicate(prefixes)

	for _, prefix := range prefixes {
		c.tree.Add(prefix)
	}

```

These changes assume the existing helpers have (or can be adjusted to have) the following signatures in the same package as `asn.go`:

- `func FilterInvalidPrefixes(prefixes []netip.Prefix) []netip.Prefix`
- `func Deduplicate(prefixes []netip.Prefix) []netip.Prefix`

If their signatures differ (e.g. methods on `ASN`, returning maps, taking additional parameters), adapt the two calls inside `Add` accordingly. Also ensure `asn.go` already imports `net/netip`; if not, add `net/netip` to the imports.
</issue_to_address>

### Comment 2
<location> `asn/store.go:41-49` </location>
<code_context>
+	return prefixes
+}
+
+func (s *Store) Load(botName string, asn int) []netip.Prefix {
+	path := filepath.Join(s.cacheDir, botName, fmt.Sprintf("AS%d.json", asn))
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil
+	}
+
+	var prefixStrings []string
+	if err := json.Unmarshal(data, &prefixStrings); err != nil {
+		return nil
+	}
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Loading cached ASN JSON fails silently; consider surfacing or logging the error.

In `Load`, any error from `os.ReadFile` or `json.Unmarshal` leads to returning `nil`, making a corrupted cache indistinguishable from a missing one. This can hide bad cache files and result in empty prefixes without visibility. When `EnableLog` is on, consider logging these failures so operators can notice and fix corrupted cache files.

Suggested implementation:

```golang
func (s *Store) Load(botName string, asn int) []netip.Prefix {
	path := filepath.Join(s.cacheDir, botName, fmt.Sprintf("AS%d.json", asn))
	data, err := os.ReadFile(path)
	if err != nil {
		if s.EnableLog {
			log.Printf("asn store: failed to read cache file %q for bot %q ASN %d: %v", path, botName, asn, err)
		}
		return nil
	}

	var prefixStrings []string
	if err := json.Unmarshal(data, &prefixStrings); err != nil {
		if s.EnableLog {
			log.Printf("asn store: failed to unmarshal cache file %q for bot %q ASN %d: %v", path, botName, asn, err)
		}
		return nil
	}

```

To fully wire this up you’ll likely need to:

1. Ensure the `Store` type has a boolean field named `EnableLog` (or adjust the code above to match the existing flag name you already use).
2. Make sure `asn/store.go` imports the standard `log` package in its import block:

   - Add `"log"` to the existing `import (...)` list, without removing any existing imports.

If you already have a structured logger or helper (e.g., `s.logf` or similar) used elsewhere in this file, you may want to replace the `log.Printf(...)` calls with that helper to stay consistent with your logging conventions.
</issue_to_address>

### Comment 3
<location> `validator.go:185` </location>
<code_context>

-	// RDNS verification
-	if bot.RDNS {
+	// ASN verification (fast after cache load, ~100ns)
+	if bot.asns != nil {
+		cache := bot.asns.Load()
</code_context>

<issue_to_address>
**suggestion (performance):** IP address is parsed twice in verifyIP; you can parse once and reuse for ASN checks.

`ContainsIP` already parses `ipStr` with `netip.ParseAddr`, and the ASN branch parses it again. If this is a hot path, parse once at the start of `verifyIP`, then pass the `netip.Addr` to both the prefix tree and ASN logic. This also lets you skip ASN verification when parsing fails.

Suggested implementation:

```golang
 // Verification order: IP ranges → ASN → RDNS (fastest to slowest)
func (v *Validator) verifyIP(bot *Bot, ipStr string) Result {
	// Parse IP once and reuse for range + ASN checks
	ip, ipErr := netip.ParseAddr(ipStr)

	// Check IP ranges first (fastest, ~200ns)
	if ipErr == nil {
		// Uses parsed netip.Addr to avoid re-parsing in hot path
		if bot.ContainsIPAddr(ip) {
			return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
		}
	} else {
		// Fallback to existing string-based check if parsing fails
		if bot.ContainsIP(ipStr) {
			return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
		}
	}

	// ASN verification (fast after cache load, ~100ns)
	if bot.asns != nil && ipErr == nil {
		cache := bot.asns.Load()
		if cache != nil && cache.Contains(ip) {
			return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
		}
	}

	// RDNS verification (50-200ms cold, ~450ns cached)
	if bot.RDNS && bot.rdns != nil {

```

To make this compile and fully avoid double parsing in the hot path, you will also need to:

1. **Add a parsed-addr helper to `Bot`** (or wherever `ContainsIP` is implemented), for example:

   ```go
   // ContainsIPAddr should perform the same prefix/range logic as ContainsIP,
   // but assumes the IP has already been parsed.
   func (b *Bot) ContainsIPAddr(ip netip.Addr) bool {
       // Reuse the existing prefix tree / range logic here, but DO NOT parse again.
       // Example (adapt to your actual implementation):
       //   return b.prefixTree.Contains(ip)
       panic("implement ContainsIPAddr using existing range logic without re-parsing")
   }
   ```

   Ensure `validator.go` (and the file defining `Bot`) import `net/netip` if they do not already:

   ```go
   import (
       // ...
       "net/netip"
   )
   ```

2. **Remove or refactor any remaining internal `netip.ParseAddr` calls** inside the range-checking logic that are no longer necessary when using `ContainsIPAddr(netip.Addr)`, so that valid IPs are parsed exactly once in `verifyIP`.

3. Optionally, once all callers are migrated to `ContainsIPAddr`, you can simplify `ContainsIP(string)` to just parse and delegate to `ContainsIPAddr` to centralize the logic:

   ```go
   func (b *Bot) ContainsIP(ipStr string) bool {
       ip, err := netip.ParseAddr(ipStr)
       if err != nil {
           return false
       }
       return b.ContainsIPAddr(ip)
   }
   ```
</issue_to_address>

### Comment 4
<location> `bot.go:122-131` </location>
<code_context>
+	b.ips.Store(tree)
+}
+
+func (b *Bot) initIPs(path string) {
+	if len(b.URLs) > 0 {
+		return
+	}
+
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return
+	}
+
+	var prefixes []netip.Prefix
+	for _, line := range strings.Split(string(data), "\n") {
+		line = strings.TrimSpace(line)
+		if line == "" {
+			continue
+		}
+		prefix, err := netip.ParsePrefix(line)
+		if err != nil {
+			continue
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Ignored file and parse errors in initIPs may hide configuration issues.

In `initIPs`, both `os.ReadFile` failures and `ParsePrefix` errors are ignored, so a bad or unreadable `ips.txt` leaves the bot with no prefixes and no indication of why. Since `EnableLog` is already available and used elsewhere, consider logging at least the file read error (and optionally the number of malformed lines) so misconfigured or corrupted cache files are detectable.

Suggested implementation:

```golang
func (b *Bot) initIPs(path string) {
	if len(b.URLs) > 0 {
		return
	}

	data, err := os.ReadFile(path)
	if err != nil {
		if b.EnableLog {
			log.Printf("initIPs: failed to read IP prefixes from %q: %v", path, err)
		}
		return
	}

	var (
		prefixes  []netip.Prefix
		malformed int
	)

	for _, line := range strings.Split(string(data), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		prefix, err := netip.ParsePrefix(line)
		if err != nil {
			malformed++
			continue
		}
		prefixes = append(prefixes, prefix)
	}

	if malformed > 0 && b.EnableLog {
		log.Printf("initIPs: ignored %d malformed IP prefix line(s) in %q", malformed, path)
	}

	b.storePrefixes(prefixes)
}

```

1. Ensure the file imports the standard `log` package if it is not already present:
   - Add `log` to the import list: `import "log"`.
2. This edit assumes `b.EnableLog` is a boolean flag used elsewhere in the file to guard logging. If the logging convention is different (e.g., a method or logger field), replace the `if b.EnableLog { log.Printf(...) }` blocks with the appropriate logging calls consistent with the rest of the codebase.
</issue_to_address>

### Comment 5
<location> `validator_invalid_ip_test.go:69` </location>
<code_context>
+	}
+}
+
+func TestVerifyIPWithASN(t *testing.T) {
+	tmpDir := t.TempDir()
+
</code_context>

<issue_to_address>
**issue (testing):** TestVerifyIPWithASN never exercises a successful ASN-based verification path

`TestVerifyIPWithASN` builds an ASN cache but never populates it, so `v.verifyIP(testBot, tc.ip)` never exercises a successful ASN match. As written, it only checks that invalid inputs don’t panic, not that ASN-backed verification works. Please add a case that seeds `asnCache` with a known prefix (e.g. `8.8.8.0/24`) and asserts that `verifyIP` returns `StatusVerified`, `IsBot=true`, and correct bot attributes for matching IPs.
</issue_to_address>

### Comment 6
<location> `knownbots_test.go:181` </location>
<code_context>
+func TestContainsIP(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** IPTree itself is not directly tested, only indirectly via Bot.ContainsIP

Since `IPTree` is now the core structure backing both custom ranges and ASN prefixes, it should have its own focused tests (e.g. IPv4 vs IPv6, overlapping prefixes, non‑added IPs, `Count()` behaviour, and invalid prefixes from `patricia.ParseFromNetIPPrefix`) to localize failures and strengthen correctness guarantees beyond what we get indirectly via `Bot.ContainsIP`.
</issue_to_address>

### Comment 7
<location> `bot_asn_refresh_test.go:9-18` </location>
<code_context>
+func TestRefreshASNRebuildsCache(t *testing.T) {
</code_context>

<issue_to_address>
**suggestion (testing):** TestRefreshASNRebuildsCache bypasses Bot.initASN/refreshASN and does not cover their logic

This test only mutates `bot.asns` and never calls `Bot.initASN` or `Bot.refreshASN`, so it doesn’t verify the real refresh path using `asn.Store`. Please add tests that use a fake/temporary `asn.Store` (controlling `Load`/`Refresh` output), invoke `initASN`/`refreshASN`, and assert that the cache is rebuilt correctly (including handling of empty results so stale prefixes aren’t retained).

Suggested implementation:

```golang
package knownbots

import (
	"context"
	"net/netip"
	"sync/atomic"
	"testing"
)

// fakeASNStore is a test double for asn.Store that allows tests to control
// the values returned by Load and Refresh.
type fakeASNStore struct {
	loadCalled    atomic.Bool
	refreshCalled atomic.Bool

	loadResult    map[int][]netip.Prefix
	refreshResult map[int][]netip.Prefix

	loadErr    error
	refreshErr error
}

func (f *fakeASNStore) Load(ctx context.Context) (map[int][]netip.Prefix, error) {
	f.loadCalled.Store(true)
	return f.loadResult, f.loadErr
}

func (f *fakeASNStore) Refresh(ctx context.Context) (map[int][]netip.Prefix, error) {
	f.refreshCalled.Store(true)
	return f.refreshResult, f.refreshErr
}

// TestInitASNUsesStoreAndPopulatesCache verifies that initASN calls the
// underlying asn.Store and populates the bot's ASN prefix cache.
func TestInitASNUsesStoreAndPopulatesCache(t *testing.T) {
	as15169Prefixes := []netip.Prefix{
		netip.MustParsePrefix("8.8.8.0/24"),
		netip.MustParsePrefix("8.8.4.0/24"),
	}
	as13335Prefixes := []netip.Prefix{
		netip.MustParsePrefix("1.1.1.0/24"),
	}

	store := &fakeASNStore{
		loadResult: map[int][]netip.Prefix{
			15169: as15169Prefixes,
			13335: as13335Prefixes,
		},
	}

	bot := &Bot{
		Name:    "testbot",
		ASN:     []int{15169, 13335},
		asnStore: store,
	}

	if err := bot.initASN(context.Background()); err != nil {
		t.Fatalf("initASN() error = %v", err)
	}

	if !store.loadCalled.Load() {
		t.Fatalf("initASN() did not call Store.Load")
	}

	for asn, want := range store.loadResult {
		got := bot.asns.Load().(map[int][]netip.Prefix)[asn]
		if len(got) != len(want) {
			t.Fatalf("ASN %d: got %d prefixes, want %d", asn, len(got), len(want))
		}
		for i, p := range want {
			if got[i] != p {
				t.Fatalf("ASN %d prefix[%d]: got %v, want %v", asn, i, got[i], p)
			}
		}
	}
}

// TestRefreshASNRebuildsCache verifies that refreshASN uses the asn.Store
// to replace the existing cache with the newly refreshed prefixes.
func TestRefreshASNRebuildsCache(t *testing.T) {
	initialPrefixesAS15169 := []netip.Prefix{
		netip.MustParsePrefix("8.8.8.0/24"),
		netip.MustParsePrefix("8.8.4.0/24"),
		netip.MustParsePrefix("2001:db8::/32"),
	}
	initialPrefixesAS13335 := []netip.Prefix{
		netip.MustParsePrefix("1.1.1.0/24"),
	}

	refreshedPrefixesAS15169 := []netip.Prefix{
		netip.MustParsePrefix("8.8.8.0/24"),  // kept
		netip.MustParsePrefix("8.8.5.0/24"),  // new
	}
	refreshedPrefixesAS13335 := []netip.Prefix{
		netip.MustParsePrefix("1.1.1.0/24"),  // kept
		netip.MustParsePrefix("1.0.0.0/24"),  // new
	}

	store := &fakeASNStore{
		loadResult: map[int][]netip.Prefix{
			15169: initialPrefixesAS15169,
			13335: initialPrefixesAS13335,
		},
		refreshResult: map[int][]netip.Prefix{
			15169: refreshedPrefixesAS15169,
			13335: refreshedPrefixesAS13335,
		},
	}

	bot := &Bot{
		Name:    "testbot",
		ASN:     []int{15169, 13335},
		asnStore: store,
	}

	// Initialize from Load.
	if err := bot.initASN(context.Background()); err != nil {
		t.Fatalf("initASN() error = %v", err)
	}

	// Now refresh and ensure the cache is replaced with refreshResult.
	if err := bot.refreshASN(context.Background()); err != nil {
		t.Fatalf("refreshASN() error = %v", err)
	}

	if !store.refreshCalled.Load() {
		t.Fatalf("refreshASN() did not call Store.Refresh")
	}

	gotCache, ok := bot.asns.Load().(map[int][]netip.Prefix)
	if !ok {
		t.Fatalf("bot.asns.Load() did not return map[int][]netip.Prefix")
	}

	for asn, want := range store.refreshResult {
		got, ok := gotCache[asn]
		if !ok {
			t.Fatalf("ASN %d missing from cache after refresh", asn)
		}
		if len(got) != len(want) {
			t.Fatalf("ASN %d: got %d prefixes after refresh, want %d", asn, len(got), len(want))
		}
		for i, p := range want {
			if got[i] != p {
				t.Fatalf("ASN %d prefix[%d]: got %v, want %v", asn, i, got[i], p)
			}
		}
	}
}

// TestRefreshASNRemovesStalePrefixes verifies that when Refresh returns an
// empty result for a given ASN, stale prefixes from previous loads do not
// remain in the cache.
func TestRefreshASNRemovesStalePrefixes(t *testing.T) {
	initialPrefixesAS15169 := []netip.Prefix{
		netip.MustParsePrefix("8.8.8.0/24"),
		netip.MustParsePrefix("8.8.4.0/24"),
	}

	store := &fakeASNStore{
		loadResult: map[int][]netip.Prefix{
			15169: initialPrefixesAS15169,
		},
		// Simulate the backend returning no prefixes on refresh
		// so that the cache should no longer contain any entries
		// for 15169.
		refreshResult: map[int][]netip.Prefix{},
	}

	bot := &Bot{
		Name:    "testbot",
		ASN:     []int{15169},
		asnStore: store,
	}

	if err := bot.initASN(context.Background()); err != nil {
		t.Fatalf("initASN() error = %v", err)
	}

	if err := bot.refreshASN(context.Background()); err != nil {
		t.Fatalf("refreshASN() error = %v", err)
	}

	gotCache, ok := bot.asns.Load().(map[int][]netip.Prefix)
	if !ok {
		t.Fatalf("bot.asns.Load() did not return map[int][]netip.Prefix")
	}

	if _, ok := gotCache[15169]; ok {
		t.Fatalf("expected ASN 15169 prefixes to be removed after refresh with empty result")
	}
}

```

These test changes assume the following, which you may need to align with your existing code:

1. `Bot` has a field `asnStore` of an interface type compatible with `fakeASNStore`:
   ```go
   type Store interface {
       Load(ctx context.Context) (map[int][]netip.Prefix, error)
       Refresh(ctx context.Context) (map[int][]netip.Prefix, error)
   }

   type Bot struct {
       Name     string
       ASN      []int
       asnStore asn.Store
       asns     atomic.Value // storing map[int][]netip.Prefix
   }
   ```
   If your actual `asn.Store` interface or field names differ, adjust `fakeASNStore` and the `Bot` initialization accordingly.

2. `Bot.initASN(ctx)` should:
   - Call `asnStore.Load(ctx)` and
   - Replace the internal `asns` cache with the map returned by `Load`.

3. `Bot.refreshASN(ctx)` should:
   - Call `asnStore.Refresh(ctx)` and
   - Replace (not merge) `asns` with the returned map, so that when `Refresh` returns an empty map, previously cached prefixes are removed.

If your implementation currently merges results or keeps stale entries, you should update `refreshASN` to rebuild the cache using only the `Refresh` result for these tests to pass and to match the behavior described in your review comment.
</issue_to_address>

### Comment 8
<location> `asn/fetcher_ripe_test.go:7-16` </location>
<code_context>
+	"testing"
+)
+
+func TestIntegration_RIPE(t *testing.T) {
+	if testing.Short() {
+		t.Skip("Skipping integration test in short mode")
+	}
+
+	fetcher := NewRIPE()
+
+	// Test with Google ASN (AS15169)
+	prefixes, err := fetcher.Fetch(15169)
+	if err != nil {
+		t.Fatalf("failed to fetch prefixes from RIPE: %v", err)
+	}
+
+	if len(prefixes) == 0 {
+		t.Error("expected some prefixes for AS15169, got none")
+	}
+
+	// Verify some expected Google prefixes are present
+	foundGooglePrefix := false
+	for _, p := range prefixes {
+		if p.String() == "8.8.8.0/24" {
+			foundGooglePrefix = true
+			break
+		}
+	}
+	if !foundGooglePrefix {
+		t.Logf("did not find 8.8.8.0/24, but got %d prefixes total", len(prefixes))
+	}
+
+	// Verify all prefixes are valid
+	for _, p := range prefixes {
+		if !ValidatePrefix(p) {
+			t.Errorf("invalid prefix: %s", p.String())
+		}
+	}
+
+	t.Logf("RIPE fetched %d prefixes for AS15169 (Google)", len(prefixes))
+}
</code_context>

<issue_to_address>
**suggestion (testing):** Integration tests depend on external network services and could be flaky without additional safeguards

These tests are correctly marked as integration and skipped in `testing.Short()`, but they still rely on live services and network conditions. To keep CI stable while retaining real-provider coverage:

- Gate them behind an explicit env var (e.g. `if os.Getenv("KNOWNBOTS_ASN_INTEGRATION") == "" { t.Skip(...) }`).
- Add unit tests for each fetcher using `httptest.Server` to simulate success, malformed responses, and non-200 status codes so they can run reliably everywhere.
</issue_to_address>


Comment on lines 122 to 131
```golang
func (b *Bot) initIPs(path string) {
	if len(b.URLs) > 0 {
		return
	}

	data, err := os.ReadFile(path)
	if err != nil {
		return
	}
```

Contributor
suggestion (bug_risk): Ignored file and parse errors in initIPs may hide configuration issues.

In initIPs, both os.ReadFile failures and ParsePrefix errors are ignored, so a bad or unreadable ips.txt leaves the bot with no prefixes and no indication of why. Since EnableLog is already available and used elsewhere, consider logging at least the file read error (and optionally the number of malformed lines) so misconfigured or corrupted cache files are detectable.

Suggested implementation:

```golang
func (b *Bot) initIPs(path string) {
	if len(b.URLs) > 0 {
		return
	}

	data, err := os.ReadFile(path)
	if err != nil {
		if b.EnableLog {
			log.Printf("initIPs: failed to read IP prefixes from %q: %v", path, err)
		}
		return
	}

	var (
		prefixes  []netip.Prefix
		malformed int
	)

	for _, line := range strings.Split(string(data), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		prefix, err := netip.ParsePrefix(line)
		if err != nil {
			malformed++
			continue
		}
		prefixes = append(prefixes, prefix)
	}

	if malformed > 0 && b.EnableLog {
		log.Printf("initIPs: ignored %d malformed IP prefix line(s) in %q", malformed, path)
	}

	b.storePrefixes(prefixes)
}
```

1. Ensure the file imports the standard `log` package if it is not already present: add `log` to the import list (`import "log"`).
2. This edit assumes `b.EnableLog` is a boolean flag used elsewhere in the file to guard logging. If the logging convention differs (e.g., a method or logger field), replace the `if b.EnableLog { log.Printf(...) }` blocks with the appropriate logging calls consistent with the rest of the codebase.

Owner Author
@sourcery-ai review

Contributor

Sorry @cnlangzi, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery

- Add Sanitize() function in asn/util.go to filter invalid and deduplicate prefixes
- Apply Sanitize() in Store.fetch() before saving to cache
- Filters out /0, loopback, multicast, and unspecified prefixes
- Removes duplicate prefixes in single pass for performance
- Add comprehensive tests for Sanitize() function
- Add error logging for IP cache file read failures in initIPs()
- Performance: ~748ns/op for 8 prefixes with filtering and dedup
- Rename Deduplicate() to Distinct() for more idiomatic naming (SQL-like)
- Merge Sanitize() into Distinct() - now validates and deduplicates in single pass
- Remove redundant Sanitize() function
- Update all tests and benchmarks to use Distinct()
- Simpler API: one function does both filtering and deduplication
@cnlangzi cnlangzi merged commit 028161d into main Jan 12, 2026
3 of 5 checks passed
@cnlangzi cnlangzi deleted the feat/asn branch January 12, 2026 07:16