feat(bot): added verify bot by ASN #14
Conversation
- Move IPTree and ASN cache operations to Bot methods (loadCachedIPs, refreshIPs, initializeASN, refreshASN) to eliminate duplicate logic between initBot and runScheduler
- Centralize state management with ensureIPTree/storePrefixes helpers
- Remove redundant standalone helpers from validator.go
Reviewer's Guide

Implements ASN-based bot IP verification and a shared radix-based IPTree, refactors bot IP/RDNS/ASN lifecycle into Bot methods, replaces the generic RDNS cache type, removes Validator.Reload, wires ASN checks into the validation pipeline, and adds Makefile, ASN fetcher/store infrastructure, CI, docs, and tests to support the new behavior.

Sequence diagram for background scheduler refreshing IP, ASN, and RDNS

sequenceDiagram
participant Validator
participant Scheduler as SchedulerGoroutine
participant Bot
participant HTTPClient
participant AsnStore
participant ASNCache as ASN
participant RDNSCache as RDNS
Validator->>Scheduler: startScheduler(ctx)
loop Every refresh interval
Scheduler->>Validator: runScheduler(HTTPClient)
Validator->>Validator: getBots()
loop for each bot
Validator->>Bot: refreshIPs(HTTPClient, root)
alt bot has URLs and new prefixes
Bot->>HTTPClient: downloadIPs(bot)
HTTPClient-->>Bot: []netip.Prefix
Bot->>Bot: storePrefixes(prefixes)
Bot->>FileSystem: writeIPs(botDir/ips.txt, prefixes)
FileSystem-->>Bot: ok
else no new prefixes
Bot-->>Validator: no-op
end
Validator->>Bot: refreshASN(AsnStore)
alt bot has ASN list
loop for each ASN
Bot->>AsnStore: Refresh(bot.Name, asn)
AsnStore->>AsnStore: fetch(asn) via Fetchers
AsnStore-->>Bot: []netip.Prefix
Bot->>ASNCache: Add(asn, prefixes)
end
Bot->>Bot: asns.Store(newASN)
else no ASN configured
Bot-->>Validator: no-op
end
Validator->>Bot: refreshRDNS()
alt bot.RDNS and bot.rdns not nil
Bot->>RDNSCache: Prune(bot.Domains)
RDNSCache-->>Bot: pruned
Bot->>RDNSCache: Persist()
RDNSCache-->>Bot: ok or error
else RDNS disabled or nil
Bot-->>Validator: no-op
end
end
end
Class diagram for updated bot validation and ASN infrastructure

classDiagram
class Bot {
+string Name
+BotKind Kind
+string Parser
+string UA
+[]string URLs
+[]int ASN
+[]string Domains
+bool RDNS
-atomic.Pointer~IPTree~ ips
-atomic.Pointer~ASN~ asns
-RDNS rdns
-LRU fail
+void storePrefixes(prefixes []netip.Prefix)
+void initIPs(path string)
+void refreshIPs(httpClient *http.Client, root string)
+void initRDNS(path string)
+void refreshRDNS()
+void initASN(store *AsnStore)
+void refreshASN(store *AsnStore)
+bool ContainsIP(ipStr string)
+ResultStatus VerifyRDNS(ipStr string)
}
class BotKind {
<<enum>>
+KindSearchEngine
+KindSocialMedia
+KindAITraining
+KindAIAssist
+KindAIMixed
+KindSEO
+KindMonitor
+KindSecurity
+KindScraper
+KindUnknown
}
class IPTree {
-uint_tree.TreeV4 v4
-uint_tree.TreeV6 v6
+void Add(prefix netip.Prefix)
+bool Contains(ip netip.Addr)
+int Count()
}
class ASN {
-[]int asns
-IPTree tree
+[]int ASNs()
+void Add(asn int, prefixes []netip.Prefix)
+bool Contains(ip netip.Addr)
+int Count()
}
class RDNS {
-atomic.Pointer~map[string]string~ valid
-string file
+string Get(key string) bool
+void Set(key string, value string)
+error Persist()
+void Prune(domains []string)
+int Size()
+error Close()
}
class LRU {
+bool Contains(key string)
+void Add(key string)
}
class Validator {
-string root
-atomic.Pointer~[]*Bot~ bots
-atomic.Pointer~map[byte][]*Bot~ uaIndex
-AsnStore asnStore
-context.CancelFunc cancel
-int failLimit
-bool classifyUA
+Result Validate(ua string, ip string)
+Result verifyIP(bot *Bot, ipStr string)
+void startScheduler(ctx context.Context)
+void runScheduler(httpClient *http.Client)
+error Close()
}
class AsnStore {
-string cacheDir
-[]Fetcher fetchers
+[]netip.Prefix Refresh(botName string, asn int)
+[]netip.Prefix Load(botName string, asn int)
-error save(botName string, asn int, prefixes []netip.Prefix)
-[]netip.Prefix fetch(asn int) error
}
class Fetcher {
<<interface>>
+[]netip.Prefix Fetch(asn int) error
}
class RIPEFetcher {
-http.Client* Client
+[]netip.Prefix Fetch(asn int) error
}
class RouteViewsFetcher {
-http.Client* Client
+[]netip.Prefix Fetch(asn int) error
}
class BGPHEFetcher {
-http.Client* Client
+[]netip.Prefix Fetch(asn int) error
}
class UtilFunctions {
+[]netip.Prefix Deduplicate(prefixes []netip.Prefix)
+[]netip.Prefix FilterInvalidPrefixes(prefixes []netip.Prefix)
+bool ValidatePrefix(prefix netip.Prefix)
}
%% Relationships
Bot --> BotKind
Bot --> IPTree
Bot --> ASN
Bot --> RDNS
Bot --> LRU
ASN --> IPTree
Validator --> Bot
Validator --> AsnStore
AsnStore --> Fetcher
RIPEFetcher ..|> Fetcher
RouteViewsFetcher ..|> Fetcher
BGPHEFetcher ..|> Fetcher
UtilFunctions ..> ASN
UtilFunctions ..> AsnStore
RDNS --> LRU
File-Level Changes
Hey - I've found 9 issues, and left some high level feedback:
- In `verifyIP`, using `netip.MustParseAddr(ipStr)` will panic on invalid IP input, whereas other paths (like `ContainsIP`) parse safely; consider switching to `netip.ParseAddr` and handling the error to avoid crashing on malformed client input.
- The `Bot.VerifyRDNS` method assumes `b.rdns` is non-nil, but `initRDNS` can early-return on `NewRDNS` error while leaving `RDNS` set to true; this can lead to a nil-pointer dereference at runtime and should be guarded with a nil check or by disabling RDNS on init failure.
- The `asn.ASN` type is mutated via `Add` while being read via `Contains` without synchronization, yet `refreshASN` is called from the scheduler while validation can run concurrently; consider making `ASN` immutable with atomic pointer swaps (as done for `IPTree`) or adding proper locking to avoid data races.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `verifyIP`, using `netip.MustParseAddr(ipStr)` will panic on invalid IP input, whereas other paths (like `ContainsIP`) parse safely; consider switching to `netip.ParseAddr` and handling the error to avoid crashing on malformed client input.
- The `Bot.VerifyRDNS` method assumes `b.rdns` is non-nil, but `initRDNS` can early-return on `NewRDNS` error while leaving `RDNS` set to true; this can lead to a nil-pointer dereference at runtime and should be guarded with a nil check or by disabling RDNS on init failure.
- The `asn.ASN` type is mutated via `Add` while being read via `Contains` without synchronization, yet `refreshASN` is called from the scheduler while validation can run concurrently; consider making `ASN` immutable with atomic pointer swaps (as done for `IPTree`) or adding proper locking to avoid data races.
## Individual Comments
### Comment 1
<location> `validator.go:188-189` </location>
<code_context>
}
- // RDNS verification
+ // ASN verification (fast after cache load, ~100ns)
+ if bot.asns != nil && bot.asns.Contains(netip.MustParseAddr(ipStr)) {
+ return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid netip.MustParseAddr here to prevent potential panics on malformed IPs.
`verifyIP` accepts arbitrary `ipStr`, so relying on prior validation makes it fragile if the function is reused or called with untrusted input. Prefer either:
- Parsing once earlier and passing a `netip.Addr` into this function, or
- Using `netip.ParseAddr` here and returning early on error.
This keeps the verifier safe from panics in more usage contexts.
</issue_to_address>
### Comment 2
<location> `asn/asn.go:33-34` </location>
<code_context>
+}
+
+// Add adds all prefixes from an ASN to the cache.
+func (c *ASN) Add(asn int, prefixes []netip.Prefix) error {
+ c.asns = append(c.asns, asn)
+ for _, prefix := range prefixes {
+ addr := prefix.Addr()
</code_context>
<issue_to_address>
**issue (bug_risk):** ASN.Add mutates internal state without synchronization, which can race with concurrent Contains calls.
`ASN` is accessed via `Bot.asns`: `refreshASN` calls `Add` from the scheduler while `Contains` is called from `Validate` in other goroutines. `Add` mutates both the `asns` slice and the radix trees without any locking, so concurrent `Add`/`Contains` can race unless `TreeV4/TreeV6` are documented as safe for concurrent writes (which is uncommon).
Prefer making `ASN` immutable for readers: construct a new `*ASN` with updated trees and then atomically swap `bot.asns` (e.g. using `atomic.Pointer[*asn.ASN]`). This ensures `Contains` always operates on a stable, read‑only snapshot.
</issue_to_address>
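For illustration, a minimal sketch of the copy-on-write pattern suggested here. The `asnCache` type and its fields are made up for the example (the PR's real `asn.ASN` wraps radix trees), but the swap mechanics are the same:

```golang
package asnsketch

import (
	"net/netip"
	"sync/atomic"
)

// asnCache is an illustrative, immutable snapshot; a plain prefix slice
// stands in for the PR's radix trees.
type asnCache struct {
	asns     []int
	prefixes []netip.Prefix
}

// Contains only ever reads the snapshot, so the read path needs no locking.
func (c *asnCache) Contains(ip netip.Addr) bool {
	for _, p := range c.prefixes {
		if p.Contains(ip) {
			return true
		}
	}
	return false
}

type bot struct {
	asns atomic.Pointer[asnCache]
}

// refreshASN builds a brand-new snapshot and swaps it in atomically;
// concurrent Contains callers keep reading the old snapshot until then.
func (b *bot) refreshASN(asn int, prefixes []netip.Prefix) {
	next := &asnCache{
		asns:     []int{asn},
		prefixes: append([]netip.Prefix(nil), prefixes...),
	}
	b.asns.Store(next)
}
```

Readers would call `b.asns.Load()` once per request and use that snapshot for the whole check, so a concurrent refresh can never tear the data they are looking at.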
### Comment 3
<location> `asn/asn.go:79` </location>
<code_context>
+
+// Count returns the total number of prefixes in the cache.
+func (c *ASN) Count() int {
+ return c.v4.CountTags() + c.v6.CountTags()
+}
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** ASN.Add appends duplicate ASN IDs to the asns slice on repeated calls.
Because `Bot.refreshASN` can call `Add` repeatedly for the same ASN, `c.asns` can accumulate duplicates, and `ASNs()` may return an inflated list. To avoid this, consider deduplicating by tracking seen ASNs in a `map[int]struct{}` or by checking recent entries before appending, if call patterns allow that optimization.
Suggested implementation:
```golang
}
}
// Count returns the total number of prefixes in the cache.
func (c *ASN) Count() int {
return c.v4.CountTags() + c.v6.CountTags()
}
import (
"net/netip"
"github.com/kentik/patricia"
"github.com/kentik/patricia/uint_tree"
)
// ASN provides fast O(1) IP lookup using Radix Trees for ASN prefix matching.
type ASN struct {
asns []int
// seen tracks which ASN IDs have already been added to avoid duplicates
seen map[int]struct{}
```
```golang
asnsCopy := make([]int, len(c.asns))
copy(asnsCopy, c.asns)
var seen map[int]struct{}
if len(asnsCopy) > 0 {
seen = make(map[int]struct{}, len(asnsCopy))
for _, id := range asnsCopy {
seen[id] = struct{}{}
}
}
return &ASN{
asns: asnsCopy,
seen: seen,
v4: c.v4.Clone(),
v6: c.v6.Clone(),
}
```
```golang
func (c *ASN) Add(prefix netip.Prefix, asn int) {
// existing logic (e.g. inserting into v4/v6 trees) ...
// Lazily initialize and/or backfill the seen map to avoid duplicates.
if c.seen == nil {
if len(c.asns) > 0 {
c.seen = make(map[int]struct{}, len(c.asns)+1)
for _, existing := range c.asns {
c.seen[existing] = struct{}{}
}
} else {
c.seen = make(map[int]struct{}, 1)
}
}
// Only append ASN IDs we have not seen before.
if _, ok := c.seen[asn]; !ok {
c.seen[asn] = struct{}{}
c.asns = append(c.asns, asn)
}
}
```
I’ve assumed the presence and signature of `ASN.Add` as `func (c *ASN) Add(prefix netip.Prefix, asn int)`. If the actual signature or local variable name for the ASN ID differs:
1. Update the `Add` method replacement block to use the correct parameter name and to merge with the existing body rather than the placeholder comment.
2. Ensure any constructors or factory functions (e.g. `NewASN(...)`) initialize `seen: make(map[int]struct{})` on new instances so deduplication works even before the first `Add` call.
3. If there are other code paths that mutate `c.asns` directly (outside of `Add`/`Clone`), they should either:
- be refactored to go through `Add`, or
- update `c.seen` consistently when they modify `c.asns`.
</issue_to_address>
### Comment 4
<location> `asn/fetcher_bgphe.go:18-19` </location>
<code_context>
+ Client *http.Client
+}
+
+// Match prefix links like: <a href="/net/31.13.24.0/21">31.13.24.0/21</a>
+var bgpHePrefixRe = regexp.MustCompile(`<a href="/net/([0-9./a-fA-F]+)"[^>]*>`)
+
+func NewBGPHE() *BGPHE {
</code_context>
<issue_to_address>
**issue (bug_risk):** BGPHE regex won't match IPv6 prefixes because it excludes ':' from the character class.
Because the pattern only permits digits, `.`, `/`, and hex letters, it will fail to match IPv6 prefixes like `2001:db8::/32` (which contain `:`), so IPv6 entries from bgp.he.net will be dropped.
Consider extending the character class to include `:`, e.g. `([0-9a-fA-F:.]+)`, or a stricter variant that still covers all valid IPv6 prefixes in the HTML.
</issue_to_address>
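As a quick illustration of the suggested character-class change, this standalone snippet shows an extended pattern matching both IPv4 and IPv6 prefix links (the exact pattern the PR ends up using may differ):

```golang
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// ':' added to the class so IPv6 prefixes like 2001:db8::/32 match too.
	re := regexp.MustCompile(`<a href="/net/([0-9a-fA-F:./]+)"[^>]*>`)

	html := `<a href="/net/31.13.24.0/21">31.13.24.0/21</a>
<a href="/net/2001:db8::/32">2001:db8::/32</a>`

	for _, m := range re.FindAllStringSubmatch(html, -1) {
		fmt.Println(m[1]) // prints 31.13.24.0/21, then 2001:db8::/32
	}
}
```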
### Comment 5
<location> `asn/store.go:18-27` </location>
<code_context>
+}
+
+// NewStore creates a new ASN store.
+func NewStore(cacheDir string) (*Store, error) {
+ s := &Store{
+ cacheDir: cacheDir,
+ fetchers: []Fetcher{
+ NewRIPE(),
+ NewRouteViews(),
+ NewBGPHE(),
+ },
+ }
+
+ return s, nil
+}
+
</code_context>
<issue_to_address>
**nitpick:** NewStore returns an error but currently cannot fail; consider simplifying the API or adding validation.
Since the function always returns `nil`, either drop the error from the signature for a cleaner API, or introduce concrete validation of `cacheDir` (e.g., non-empty, writable) so the error path is meaningful.
</issue_to_address>
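If the error return is kept, here is a sketch of what meaningful validation could look like; the single-field `Store` is a stand-in for the real type and the fetcher wiring is omitted:

```golang
package asnsketch

import (
	"fmt"
	"os"
)

type Store struct {
	cacheDir string
}

// NewStore validates cacheDir up front so the error return actually earns its keep.
func NewStore(cacheDir string) (*Store, error) {
	if cacheDir == "" {
		return nil, fmt.Errorf("cacheDir must not be empty")
	}
	// Ensure the directory exists and can be created/written.
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return nil, fmt.Errorf("create cache dir %q: %w", cacheDir, err)
	}
	return &Store{cacheDir: cacheDir}, nil
}
```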
### Comment 6
<location> `asn/store.go:92-101` </location>
<code_context>
+}
+
+func (s *Store) fetch(asn int) ([]netip.Prefix, error) {
+ var lastErr error
+ for _, fetcher := range s.fetchers {
+ prefixes, err := fetcher.Fetch(asn)
+ if err != nil {
+ lastErr = err
+ continue
+ }
+ if len(prefixes) > 0 {
+ return prefixes, nil
+ }
+ }
+
+ return nil, fmt.Errorf("all fetchers failed for ASN %d: %w", asn, lastErr)
+}
</code_context>
<issue_to_address>
**suggestion:** fetch returns an error wrapping a possibly nil lastErr when all fetchers return no prefixes but no error.
When all fetchers return `(nil, nil)` or `([]Prefix{}, nil)`, `lastErr` remains `nil`, so the function returns `fmt.Errorf("all fetchers failed...: %w", nil)`. That produces a non-nil error whose wrapped cause is `<nil>`, which can be misleading.
Consider tracking whether any error actually occurred (e.g. a `hadErr` flag) and either returning a clearer message like "no prefixes from any fetcher" or wrapping the last non-nil error only when one exists.
Suggested implementation:
```golang
func (s *Store) fetch(asn int) ([]netip.Prefix, error) {
var (
lastErr error
hadErr bool
)
for _, fetcher := range s.fetchers {
prefixes, err := fetcher.Fetch(asn)
if err != nil {
hadErr = true
lastErr = err
continue
}
if len(prefixes) > 0 {
return prefixes, nil
}
}
if hadErr {
return nil, fmt.Errorf("all fetchers failed for ASN %d: %w", asn, lastErr)
}
return nil, fmt.Errorf("no prefixes found for ASN %d from any fetcher", asn)
}
// NewStore creates a new ASN store.
func NewStore(cacheDir string) (*Store, error) {
```
1. Ensure the file imports the `fmt` package if it is not already imported:
- If you have an import block like `import (...)`, add `fmt` inside it.
- If there is a single-line import, convert to a block or add another import line for `fmt`.
2. If you have a custom error type or error-wrapping helper used elsewhere in this package, you may want to use that instead of `fmt.Errorf` for consistency.
</issue_to_address>
### Comment 7
<location> `knownbots_test.go:181` </location>
<code_context>
for _, tt := range tests {
+ tree := NewIPTree()
+ tree.Add(netip.MustParsePrefix(tt.cidr))
bot := &Bot{
- custom: &atomic.Pointer[[]IPPrefix]{},
+ ips: &atomic.Pointer[IPTree]{},
</code_context>
<issue_to_address>
**issue (testing):** Add tests that exercise ASN-based verification in the Validator/verifyIP flow
Current tests only cover the new `IPTree` usage in `ContainsIP`; they don’t exercise the ASN-based verification path in the bot/validator flow. Please add tests that:
- Build a `Bot` with an `asn.ASN` containing a known prefix, run it through `Validator`/`verifyIP`, and assert that an IP in that ASN is `StatusVerified` with `IsBot=true` even if it’s not in explicit IP ranges and before RDNS.
- Use an IP not in any configured ASN and assert that verification still fails.
This will confirm the new ASN logic is actually used end-to-end in bot verification.
</issue_to_address>
### Comment 8
<location> `asn/asn_test.go:8` </location>
<code_context>
+ "testing"
+)
+
+func TestNewASN(t *testing.T) {
+ cache := NewASN()
+ if cache == nil {
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for ASN Store (Load/Refresh/save) using a temporary directory and stub fetcher
The core ASN logic is covered, but the `Store` layer that manages JSON persistence and fetchers isn’t. In `store_test.go`, you could: (1) use a temporary directory for `cacheDir`, (2) pass in a fake `Fetcher` that returns a fixed set of prefixes (and optional errors), and (3) assert that `Refresh` writes the expected JSON and `Load` reconstructs the correct `[]netip.Prefix`. It’s also worth adding a case with invalid prefixes in the cache file to ensure they’re skipped cleanly.
Suggested implementation:
```golang
package asn
import (
"context"
"encoding/json"
"errors"
"net/netip"
"os"
"path/filepath"
"testing"
"time"
)
// fakeFetcher is a stub implementation of the ASN Fetcher used by Store.
// Adjust the method signatures to match the real Fetcher interface.
type fakeFetcher struct {
prefixes []netip.Prefix
err error
}
func (f *fakeFetcher) Fetch(ctx context.Context) ([]netip.Prefix, error) {
// Simulate a bit of work and support cancellation
select {
case <-ctx.Done():
return nil, ctx.Err()
case <-time.After(1 * time.Millisecond):
}
if f.err != nil {
return nil, f.err
}
return append([]netip.Prefix(nil), f.prefixes...), nil
}
func TestNewASN(t *testing.T) {
cache := NewASN()
if cache == nil {
t.Fatal("expected non-nil cache")
}
if cache.IPv4Tree() == nil {
t.Error("expected non-nil IPv4Tree")
}
if cache.IPv6Tree() == nil {
t.Error("expected non-nil IPv6Tree")
}
}
// TestStoreRefreshAndLoad verifies that Refresh persists prefixes to disk as JSON
// and Load reconstructs the correct []netip.Prefix from the cache file.
func TestStoreRefreshAndLoad(t *testing.T) {
t.Parallel()
// Temporary directory for this test's cache files
cacheDir := t.TempDir()
// Prepare a deterministic set of prefixes returned by the fake fetcher
wantPrefixes := []netip.Prefix{
netip.MustParsePrefix("1.2.3.0/24"),
netip.MustParsePrefix("2001:db8::/32"),
}
ff := &fakeFetcher{
prefixes: wantPrefixes,
}
// NOTE: adjust constructor and arguments to match the real Store API.
store := NewStore(cacheDir, ff)
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
// First call should fetch from the fake fetcher and persist to disk.
if err := store.Refresh(ctx); err != nil {
t.Fatalf("Refresh() error = %v, want nil", err)
}
// Ensure the cache file exists and contains what we expect in JSON form.
// If the real implementation uses a different filename, update cacheFileName.
cacheFileName := filepath.Join(cacheDir, "asn_cache.json")
data, err := os.ReadFile(cacheFileName)
if err != nil {
t.Fatalf("failed to read cache file %q: %v", cacheFileName, err)
}
var gotJSON []string
if err := json.Unmarshal(data, &gotJSON); err != nil {
t.Fatalf("failed to unmarshal cache JSON: %v", err)
}
if len(gotJSON) != len(wantPrefixes) {
t.Fatalf("unexpected number of prefixes in JSON: got %d, want %d", len(gotJSON), len(wantPrefixes))
}
for i, p := range wantPrefixes {
if gotJSON[i] != p.String() {
t.Errorf("JSON[%d] = %q, want %q", i, gotJSON[i], p.String())
}
}
// Now construct a new Store pointing at the same cacheDir and verify Load
// reconstructs the original prefixes from the JSON cache file.
store2 := NewStore(cacheDir, &fakeFetcher{
// Configure with an error to ensure Load does not call Fetch.
err: errors.New("fetch should not be called during Load"),
})
loaded, err := store2.Load()
if err != nil {
t.Fatalf("Load() error = %v, want nil", err)
}
if len(loaded) != len(wantPrefixes) {
t.Fatalf("Load() returned %d prefixes, want %d", len(loaded), len(wantPrefixes))
}
for i, want := range wantPrefixes {
if loaded[i] != want {
t.Errorf("Load()[%d] = %v, want %v", i, loaded[i], want)
}
}
}
// TestStoreLoadSkipsInvalidPrefixes ensures that invalid prefixes in the cache
// file are skipped cleanly and do not cause Load to fail.
func TestStoreLoadSkipsInvalidPrefixes(t *testing.T) {
t.Parallel()
cacheDir := t.TempDir()
// Manually write a cache file that mixes valid and invalid prefixes.
cacheFileName := filepath.Join(cacheDir, "asn_cache.json")
raw := []string{
"203.0.113.0/24", // valid
"not-a-prefix", // invalid
"2001:db8::/32", // valid
"203.0.113.256/32", // invalid IP
}
data, err := json.Marshal(raw)
if err != nil {
t.Fatalf("failed to marshal test cache JSON: %v", err)
}
if err := os.WriteFile(cacheFileName, data, 0o644); err != nil {
t.Fatalf("failed to write test cache file: %v", err)
}
// A Store whose fetcher should not be used during Load.
store := NewStore(cacheDir, &fakeFetcher{
err: errors.New("fetch should not be called during Load"),
})
got, err := store.Load()
if err != nil {
t.Fatalf("Load() error = %v, want nil", err)
}
// Only the valid prefixes should be returned.
want := []netip.Prefix{
netip.MustParsePrefix("203.0.113.0/24"),
netip.MustParsePrefix("2001:db8::/32"),
}
if len(got) != len(want) {
t.Fatalf("Load() returned %d prefixes, want %d", len(got), len(want))
}
for i, w := range want {
if got[i] != w {
t.Errorf("Load()[%d] = %v, want %v", i, got[i], w)
}
}
}
```
These tests assume the existence of a `Store` abstraction and related APIs. To integrate them with your actual codebase, you may need to:
1. **Adjust the Store API usage:**
- Replace `NewStore(cacheDir, ff)` with the real constructor (e.g. `NewStore(cacheDir, WithFetcher(ff))` or similar).
- Update `store.Refresh(ctx)` and `store.Load()` to match your method names and signatures (for example, `store.Refresh(ctx, force bool)` or `store.Load(ctx)`).
2. **Align with the real Fetcher interface:**
- The stub `fakeFetcher` currently implements `Fetch(ctx context.Context) ([]netip.Prefix, error)`.
- If your real `Fetcher` interface has a different name, method, or return type, update `fakeFetcher` accordingly and ensure it satisfies the interface.
3. **Match the on-disk cache format and filename:**
- Tests currently assume the cache is JSON-encoded as `[]string` of prefix CIDRs and stored in `asn_cache.json` inside `cacheDir`.
- If your implementation uses a different filename, directory structure, or JSON schema (e.g. objects with ASN metadata), update:
- `cacheFileName` paths in tests.
- The `gotJSON` decoding and assertions.
- The `raw` data written in `TestStoreLoadSkipsInvalidPrefixes`.
4. **Confirm when Fetch is called vs Load:**
- The tests assume `Load()` reads only from disk and does not trigger a fetch; if your design differs, either:
- Adjust the expectations (e.g., allow Fetch on Load), or
- Update the implementation to keep Load as a pure disk read, and `Refresh` as the network/Fetcher path.
5. **Parallel test safety:**
- These tests rely on `t.TempDir()` to avoid shared state; ensure that your `Store` doesn’t introduce global state (e.g., fixed cache path) that could conflict across parallel tests.
</issue_to_address>
### Comment 9
<location> `README.md:260` </location>
<code_context>
+ └──────────┘ └──────────────┘ │ RDNS │
</code_context>
<issue_to_address>
**issue (typo):** Fix probable typo `rdns=t` to `rdns=true` in the diagram.
In the RDNS cache box, the label `(rdns=t)` is inconsistent with the earlier `(rdns=true)` usage in the README and original diagram. Please update it to `(rdns=true)` for consistency.
</issue_to_address>
Benchmark Results
- Add Bot method encapsulation explanation to Architecture Decisions
- Update AGENTS.md directory structure (cache.go -> rdns.go, add asn/)
- Update file organization guidelines
Codecov Report

❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main      #14       +/-   ##
===========================================
- Coverage   72.76%   62.61%   -10.16%
===========================================
  Files          15       23        +8
  Lines         661      987      +326
===========================================
+ Hits          481      618      +137
- Misses        136      309      +173
- Partials       44       60       +16

☔ View full report in Codecov by Sentry.
- Delete Reload method as it's no longer needed
- Remove associated tests (TestValidator_Reload, BenchmarkReload)
- Bot lifecycle now handled entirely through initBot during New() and periodic refresh via background scheduler
- Delete Reload from API Reference methods list
- Replace 'Step 4: Reload Configuration' with 'Step 4: Restart Validator' noting that bot config changes require restarting the validator
- Create Makefile with common commands (test, test-short, test-integration, bench, fmt, vet)
- Update test.yml to use make commands for consistency and maintainability
- ASN.Add and ASN.Contains now use atomic.Pointer for v4/v6 trees
- Remove redundant error return from asn.NewStore (always returns nil)
- Add ASN deduplication in Add to prevent duplicate ASN numbers
- Update tests to use new Add signature (void return)
- Replace MustParseAddr with ParseAddr in ASN verification path to prevent panics on malformed client input
- Fix BGPHE regex to include ':' character for IPv6 prefix matching (e.g., 2001:db8::/32)
- Add nil check for bot.rdns to avoid potential nil pointer dereference
- Add comprehensive tests for invalid IP handling across all verification paths
- Add regex unit tests covering IPv4, IPv6 (uppercase/lowercase), and mixed scenarios
Hey - I've found 4 issues, and left some high level feedback:
- The ASN cache uses shared `TreeV4`/`TreeV6` instances with `Add` and `Contains` called from the scheduler and validation concurrently; consider cloning-and-swap or other synchronization instead of mutating the same Patricia trees to avoid data races.
- In `Bot.initIPs` you `return` when `len(b.URLs) > 0`, which prevents URL-based bots from loading their previously persisted `ips.txt` cache on startup; if the intent is to warm the in-memory IPTree from the last snapshot, this condition should be revisited.
- The `bench-output` Makefile target uses `go test ... -o benchmark_output.txt`, which writes a test binary rather than benchmark text; if you intend to capture human-readable benchmark results, drop `-o` and redirect stdout instead, or adjust the workflow naming to reflect that this file is a binary.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The ASN cache uses shared `TreeV4/TreeV6` instances with `Add` and `Contains` called from the scheduler and validation concurrently; consider cloning-and-swap or other synchronization instead of mutating the same Patricia trees to avoid data races.
- In `Bot.initIPs` you `return` when `len(b.URLs) > 0`, which prevents URL-based bots from loading their previously persisted `ips.txt` cache on startup; if the intent is to warm the in-memory IPTree from the last snapshot, this condition should be revisited.
- The `bench-output` Makefile target uses `go test ... -o benchmark_output.txt`, which writes a test binary rather than benchmark text; if you intend to capture human-readable benchmark results, drop `-o` and redirect stdout instead, or adjust the workflow naming to reflect that this file is a binary.
## Individual Comments
### Comment 1
<location> `asn/asn.go:14-23` </location>
<code_context>
+type ASN struct {
</code_context>
<issue_to_address>
**issue (bug_risk):** Concurrent mutation of patricia trees in ASN is not synchronized and can race with concurrent Contains/Clone calls
`atomic.Pointer` only protects the root pointer; the underlying `*uint_tree.TreeV4`/`TreeV6` are mutated in `Add` while being read in `Contains` and cloned in `Clone` without synchronization. Since the patricia trees are not documented as safe for concurrent mutation and read, this can cause data races under concurrent `VerifyIP`/scheduler refresh. Consider either building new trees and atomically swapping the root (copy-on-write) or guarding all tree access with a `sync.RWMutex` instead of relying solely on `atomic.Pointer`.
</issue_to_address>
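For comparison with the copy-on-write approach, here is a small sketch of the `sync.RWMutex` alternative mentioned above; the prefix slice is a stand-in for the `TreeV4`/`TreeV6` pair:

```golang
package asnsketch

import (
	"net/netip"
	"sync"
)

// guardedASN serializes writers and allows concurrent readers.
type guardedASN struct {
	mu       sync.RWMutex
	prefixes []netip.Prefix
}

// Add takes the write lock so refreshes never race with lookups.
func (c *guardedASN) Add(prefixes []netip.Prefix) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.prefixes = append(c.prefixes, prefixes...)
}

// Contains takes the read lock, so many lookups can proceed in parallel.
func (c *guardedASN) Contains(ip netip.Addr) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	for _, p := range c.prefixes {
		if p.Contains(ip) {
			return true
		}
	}
	return false
}
```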
### Comment 2
<location> `asn/asn.go:37-45` </location>
<code_context>
+
+// Add adds all prefixes from an ASN to the cache (thread-safe).
+func (c *ASN) Add(asn int, prefixes []netip.Prefix) {
+ // Deduplicate ASN numbers
+ seen := false
+ for _, existing := range c.asns {
+ if existing == asn {
+ seen = true
+ break
+ }
+ }
+ if !seen {
+ c.asns = append(c.asns, asn)
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** ASN list stored in ASN.asns is mutated without synchronization and exposed directly via ASNs()
This can cause data races under concurrent refreshes and reads, and callers of `ASNs()` can also unintentionally mutate the shared backing array. Please either synchronize access (e.g., guard `asns` with a mutex) or treat it as immutable by atomically swapping a pointer to a new slice (as with the trees) and returning a copy from `ASNs()`.
</issue_to_address>
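A one-line illustration of the defensive-copy option, assuming the `ASNs()` accessor and `asns` field shown in the class diagram:

```golang
// ASNs returns a copy of the tracked ASN numbers so callers cannot
// mutate the cache's internal backing array.
func (c *ASN) ASNs() []int {
	return append([]int(nil), c.asns...)
}
```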
### Comment 3
<location> `validator_invalid_ip_test.go:112-17` </location>
<code_context>
+ t.Fatal("testbot not found")
+ }
+
+ asnCache := asn.NewASN()
+ testBot.asns = asnCache
+
+ testCases := []struct {
+ name string
+ ip string
+ shouldFail bool
+ }{
+ {"valid IPv4", "8.8.8.8", false},
+ {"valid IPv6", "2001:db8::1", false},
+ {"invalid IP", "invalid-ip", true},
+ {"malformed IPv4", "999.999.999.999", true},
+ {"malformed IPv6", "2001:db8::gggg", true},
+ {"empty string", "", true},
+ }
+
+ for _, tc := range testCases {
</code_context>
<issue_to_address>
**issue (testing):** TestVerifyIPWithASN never asserts a successful ASN-based verification, so the positive path isn’t exercised
In `TestVerifyIPWithASN`, you create `asnCache := asn.NewASN()` and assign it to `testBot.asns`, but never add any prefixes. That means `v.verifyIP(testBot, tc.ip)` can’t return `StatusVerified`, so the test only exercises failure/non-panic cases. Please extend this test (or add another) to populate `asnCache` with prefixes covering at least one "valid" IP and assert that `verifyIP` returns `StatusVerified`, so the ASN-based success path is actually verified.
</issue_to_address>
### Comment 4
<location> `asn/fetcher_ripe_test.go:7-16` </location>
<code_context>
+ "testing"
+)
+
+func TestIntegration_RIPE(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping integration test in short mode")
+ }
+
+ fetcher := NewRIPE()
+
+ // Test with Google ASN (AS15169)
+ prefixes, err := fetcher.Fetch(15169)
+ if err != nil {
+ t.Fatalf("failed to fetch prefixes from RIPE: %v", err)
+ }
+
+ if len(prefixes) == 0 {
+ t.Error("expected some prefixes for AS15169, got none")
+ }
+
+ // Verify some expected Google prefixes are present
+ foundGooglePrefix := false
+ for _, p := range prefixes {
+ if p.String() == "8.8.8.0/24" {
+ foundGooglePrefix = true
+ break
+ }
+ }
+ if !foundGooglePrefix {
+ t.Logf("did not find 8.8.8.0/24, but got %d prefixes total", len(prefixes))
+ }
+
+ // Verify all prefixes are valid
+ for _, p := range prefixes {
+ if !ValidatePrefix(p) {
+ t.Errorf("invalid prefix: %s", p.String())
+ }
+ }
+
+ t.Logf("RIPE fetched %d prefixes for AS15169 (Google)", len(prefixes))
+}
</code_context>
<issue_to_address>
**suggestion (testing):** Integration tests for external ASN providers rely on live network services and could benefit from complementary mocked tests
These tests are well-flagged as integration tests and gated by `testing.Short()`, but they still depend on live external services, response formats, and specific ASNs/prefixes, which can become flaky. Please add complementary unit tests that exercise the ASN fetching logic via a stubbed `http.Client` or an interface around the fetchers so you can deterministically cover cases like successful JSON with known prefixes, HTTP errors, malformed JSON, and empty results. That way parsing and error handling are reliably validated, with these integration tests as an additional safety net rather than the only coverage for these paths.
</issue_to_address>
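A self-contained example of the kind of deterministic test suggested here. It spins up an `httptest.Server` and decodes a made-up JSON shape inline rather than calling the PR's real fetchers, so the provider-specific decoding would still need to be adapted:

```golang
package asnsketch

import (
	"encoding/json"
	"io"
	"net/http"
	"net/http/httptest"
	"net/netip"
	"testing"
)

func TestFetchPrefixesFromStubServer(t *testing.T) {
	// The JSON shape is invented for this sketch; real fetchers decode
	// whatever their upstream provider actually returns.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = io.WriteString(w, `{"prefixes":["8.8.8.0/24","2001:4860::/32"]}`)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		t.Fatalf("GET stub server: %v", err)
	}
	defer resp.Body.Close()

	var payload struct {
		Prefixes []string `json:"prefixes"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
		t.Fatalf("decode: %v", err)
	}

	var prefixes []netip.Prefix
	for _, s := range payload.Prefixes {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			t.Errorf("invalid prefix %q: %v", s, err)
			continue
		}
		prefixes = append(prefixes, p)
	}
	if len(prefixes) != 2 {
		t.Fatalf("got %d prefixes, want 2", len(prefixes))
	}
}
```

The same pattern also covers malformed JSON and non-200 responses by swapping the handler body.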
- Move asn.go/asn_test.go from asn/ to root directory
- Simplify ASN to use existing IPTree instead of separate v4/v6 trees
- Remove duplicate patricia tree logic
- ASN struct now immutable with single tree field
- Performance improved to ~16.59ns for Contains operation (from ~19ns)
- Changed Bot.asns from *asn.ASN to *atomic.Pointer[ASN] for thread-safety
- Fixed refreshASN() to rebuild entire ASN cache (Copy-on-Write pattern)
- Move ValidatePrefix to asn/util.go to avoid import cycle
- All tests passing
- Changed from '-o benchmark_output.txt' (writes test binary) to stdout redirect
- Added benchmark_output.txt to .gitignore
- Now properly captures human-readable benchmark results
Hey - I've found 8 issues, and left some high level feedback:
- Consider applying `FilterInvalidPrefixes` and `Deduplicate` to the prefix lists returned from ASN fetchers (e.g., in `asn.Store.fetch` or `Bot.initASN`/`refreshASN`) before adding them to the trees to avoid storing `/0`, loopback/multicast ranges and duplicate entries in the ASN cache.
- `ASN.ASNs()` currently returns the internal slice directly; if you expect external callers to use this, it would be safer to return a copy to prevent accidental mutation of the cache's internal state.
- In `Validator.New`, you eagerly create per-bot directories for all bots even if they do not use IP files, ASN, or RDNS; you could short-circuit directory creation for bots without any on-disk state to reduce filesystem churn during initialization.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider applying `FilterInvalidPrefixes` and `Deduplicate` to the prefix lists returned from ASN fetchers (e.g., in `asn.Store.fetch` or `Bot.initASN/refreshASN`) before adding them to the trees to avoid storing `/0`, loopback/multicast ranges and duplicate entries in the ASN cache.
- `ASN.ASNs()` currently returns the internal slice directly; if you expect external callers to use this, it would be safer to return a copy to prevent accidental mutation of the cache’s internal state.
- In `Validator.New`, you eagerly create per-bot directories for all bots even if they do not use IP files, ASN, or RDNS; you could short-circuit directory creation for bots without any on-disk state to reduce filesystem churn during initialization.
## Individual Comments
### Comment 1
<location> `asn.go:30-39` </location>
<code_context>
+}
+
+// Add adds all prefixes from an ASN to the cache.
+func (c *ASN) Add(asn int, prefixes []netip.Prefix) {
+ seen := false
+ for _, existing := range c.asns {
+ if existing == asn {
+ seen = true
+ break
+ }
+ }
+ if !seen {
+ c.asns = append(c.asns, asn)
+ }
+
+ for _, prefix := range prefixes {
+ c.tree.Add(prefix)
+ }
</code_context>
<issue_to_address>
**suggestion (performance):** Consider deduplicating and validating prefixes before adding them to the IPTree.
`Add` currently inserts every prefix into `IPTree` as-is, and `Store.fetch` / `Store.Refresh` pass through upstream results unchanged. Since you already have `Deduplicate` and `FilterInvalidPrefixes`, consider applying them either inside `Add` or at the call sites (`initASN`, `refreshASN`, `Store.fetch`) to keep the tree free of duplicates and invalid prefixes (loopback, multicast, `/0`), which should reduce memory and speed up lookups.
Suggested implementation:
```golang
// Add adds all prefixes from an ASN to the cache.
func (c *ASN) Add(asn int, prefixes []netip.Prefix) {
// Track which ASNs have been added.
seen := false
for _, existing := range c.asns {
if existing == asn {
seen = true
break
}
}
if !seen {
c.asns = append(c.asns, asn)
}
// Sanitize prefixes before inserting into the IPTree to avoid
// duplicates and invalid prefixes (loopback, multicast, /0, etc.).
prefixes = FilterInvalidPrefixes(prefixes)
prefixes = Deduplicate(prefixes)
for _, prefix := range prefixes {
c.tree.Add(prefix)
}
```
These changes assume the existing helpers have (or can be adjusted to have) the following signatures in the same package as `asn.go`:
- `func FilterInvalidPrefixes(prefixes []netip.Prefix) []netip.Prefix`
- `func Deduplicate(prefixes []netip.Prefix) []netip.Prefix`
If their signatures differ (e.g. methods on `ASN`, returning maps, taking additional parameters), adapt the two calls inside `Add` accordingly. Also ensure `asn.go` already imports `net/netip`; if not, add `net/netip` to the imports.
</issue_to_address>
### Comment 2
<location> `asn/store.go:41-49` </location>
<code_context>
+ return prefixes
+}
+
+func (s *Store) Load(botName string, asn int) []netip.Prefix {
+ path := filepath.Join(s.cacheDir, botName, fmt.Sprintf("AS%d.json", asn))
+ data, err := os.ReadFile(path)
+ if err != nil {
+ return nil
+ }
+
+ var prefixStrings []string
+ if err := json.Unmarshal(data, &prefixStrings); err != nil {
+ return nil
+ }
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Loading cached ASN JSON fails silently; consider surfacing or logging the error.
In `Load`, any error from `os.ReadFile` or `json.Unmarshal` leads to returning `nil`, making a corrupted cache indistinguishable from a missing one. This can hide bad cache files and result in empty prefixes without visibility. When `EnableLog` is on, consider logging these failures so operators can notice and fix corrupted cache files.
Suggested implementation:
```golang
func (s *Store) Load(botName string, asn int) []netip.Prefix {
path := filepath.Join(s.cacheDir, botName, fmt.Sprintf("AS%d.json", asn))
data, err := os.ReadFile(path)
if err != nil {
if s.EnableLog {
log.Printf("asn store: failed to read cache file %q for bot %q ASN %d: %v", path, botName, asn, err)
}
return nil
}
var prefixStrings []string
if err := json.Unmarshal(data, &prefixStrings); err != nil {
if s.EnableLog {
log.Printf("asn store: failed to unmarshal cache file %q for bot %q ASN %d: %v", path, botName, asn, err)
}
return nil
}
```
To fully wire this up you’ll likely need to:
1. Ensure the `Store` type has a boolean field named `EnableLog` (or adjust the code above to match the existing flag name you already use).
2. Make sure `asn/store.go` imports the standard `log` package in its import block:
- Add `"log"` to the existing `import (...)` list, without removing any existing imports.
If you already have a structured logger or helper (e.g., `s.logf` or similar) used elsewhere in this file, you may want to replace the `log.Printf(...)` calls with that helper to stay consistent with your logging conventions.
</issue_to_address>
### Comment 3
<location> `validator.go:185` </location>
<code_context>
- // RDNS verification
- if bot.RDNS {
+ // ASN verification (fast after cache load, ~100ns)
+ if bot.asns != nil {
+ cache := bot.asns.Load()
</code_context>
<issue_to_address>
**suggestion (performance):** IP address is parsed twice in verifyIP; you can parse once and reuse for ASN checks.
`ContainsIP` already parses `ipStr` with `netip.ParseAddr`, and the ASN branch parses it again. If this is a hot path, parse once at the start of `verifyIP`, then pass the `netip.Addr` to both the prefix tree and ASN logic. This also lets you skip ASN verification when parsing fails.
Suggested implementation:
```golang
// Verification order: IP ranges → ASN → RDNS (fastest to slowest)
func (v *Validator) verifyIP(bot *Bot, ipStr string) Result {
// Parse IP once and reuse for range + ASN checks
ip, ipErr := netip.ParseAddr(ipStr)
// Check IP ranges first (fastest, ~200ns)
if ipErr == nil {
// Uses parsed netip.Addr to avoid re-parsing in hot path
if bot.ContainsIPAddr(ip) {
return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
}
} else {
// Fallback to existing string-based check if parsing fails
if bot.ContainsIP(ipStr) {
return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
}
}
// ASN verification (fast after cache load, ~100ns)
if bot.asns != nil && ipErr == nil {
cache := bot.asns.Load()
if cache != nil && cache.Contains(ip) {
return Result{BotName: bot.Name, BotKind: bot.Kind, Status: StatusVerified, IsBot: true}
}
}
// RDNS verification (50-200ms cold, ~450ns cached)
if bot.RDNS && bot.rdns != nil {
```
To make this compile and fully avoid double parsing in the hot path, you will also need to:
1. **Add a parsed-addr helper to `Bot`** (or wherever `ContainsIP` is implemented), for example:
```go
// ContainsIPAddr should perform the same prefix/range logic as ContainsIP,
// but assumes the IP has already been parsed.
func (b *Bot) ContainsIPAddr(ip netip.Addr) bool {
// Reuse the existing prefix tree / range logic here, but DO NOT parse again.
// Example (adapt to your actual implementation):
// return b.prefixTree.Contains(ip)
panic("implement ContainsIPAddr using existing range logic without re-parsing")
}
```
Ensure `validator.go` (and the file defining `Bot`) import `net/netip` if they do not already:
```go
import (
// ...
"net/netip"
)
```
2. **Remove or refactor any remaining internal `netip.ParseAddr` calls** inside the range-checking logic that are no longer necessary when using `ContainsIPAddr(netip.Addr)`, so that valid IPs are parsed exactly once in `verifyIP`.
3. Optionally, once all callers are migrated to `ContainsIPAddr`, you can simplify `ContainsIP(string)` to just parse and delegate to `ContainsIPAddr` to centralize the logic:
```go
func (b *Bot) ContainsIP(ipStr string) bool {
ip, err := netip.ParseAddr(ipStr)
if err != nil {
return false
}
return b.ContainsIPAddr(ip)
}
```
</issue_to_address>
### Comment 4
<location> `bot.go:122-131` </location>
<code_context>
+ b.ips.Store(tree)
+}
+
+func (b *Bot) initIPs(path string) {
+ if len(b.URLs) > 0 {
+ return
+ }
+
+ data, err := os.ReadFile(path)
+ if err != nil {
+ return
+ }
+
+ var prefixes []netip.Prefix
+ for _, line := range strings.Split(string(data), "\n") {
+ line = strings.TrimSpace(line)
+ if line == "" {
+ continue
+ }
+ prefix, err := netip.ParsePrefix(line)
+ if err != nil {
+ continue
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Ignored file and parse errors in initIPs may hide configuration issues.
In `initIPs`, both `os.ReadFile` failures and `ParsePrefix` errors are ignored, so a bad or unreadable `ips.txt` leaves the bot with no prefixes and no indication of why. Since `EnableLog` is already available and used elsewhere, consider logging at least the file read error (and optionally the number of malformed lines) so misconfigured or corrupted cache files are detectable.
Suggested implementation:
```golang
func (b *Bot) initIPs(path string) {
if len(b.URLs) > 0 {
return
}
data, err := os.ReadFile(path)
if err != nil {
if b.EnableLog {
log.Printf("initIPs: failed to read IP prefixes from %q: %v", path, err)
}
return
}
var (
prefixes []netip.Prefix
malformed int
)
for _, line := range strings.Split(string(data), "\n") {
line = strings.TrimSpace(line)
if line == "" {
continue
}
prefix, err := netip.ParsePrefix(line)
if err != nil {
malformed++
continue
}
prefixes = append(prefixes, prefix)
}
if malformed > 0 && b.EnableLog {
log.Printf("initIPs: ignored %d malformed IP prefix line(s) in %q", malformed, path)
}
b.storePrefixes(prefixes)
}
```
1. Ensure the file imports the standard `log` package if it is not already present:
- Add `log` to the import list: `import "log"`.
2. This edit assumes `b.EnableLog` is a boolean flag used elsewhere in the file to guard logging. If the logging convention is different (e.g., a method or logger field), replace the `if b.EnableLog { log.Printf(...) }` blocks with the appropriate logging calls consistent with the rest of the codebase.
</issue_to_address>
### Comment 5
<location> `validator_invalid_ip_test.go:69` </location>
<code_context>
+ }
+}
+
+func TestVerifyIPWithASN(t *testing.T) {
+ tmpDir := t.TempDir()
+
</code_context>
<issue_to_address>
**issue (testing):** TestVerifyIPWithASN never exercises a successful ASN-based verification path
`TestVerifyIPWithASN` builds an ASN cache but never populates it, so `v.verifyIP(testBot, tc.ip)` never exercises a successful ASN match. As written, it only checks that invalid inputs don’t panic, not that ASN-backed verification works. Please add a case that seeds `asnCache` with a known prefix (e.g. `8.8.8.0/24`) and asserts that `verifyIP` returns `StatusVerified`, `IsBot=true`, and correct bot attributes for matching IPs.
</issue_to_address>
### Comment 6
<location> `knownbots_test.go:181` </location>
<code_context>
+func TestContainsIP(t *testing.T) {
</code_context>
<issue_to_address>
**suggestion (testing):** IPTree itself is not directly tested, only indirectly via Bot.ContainsIP
Since `IPTree` is now the core structure backing both custom ranges and ASN prefixes, it should have its own focused tests (e.g. IPv4 vs IPv6, overlapping prefixes, non‑added IPs, `Count()` behaviour, and invalid prefixes from `patricia.ParseFromNetIPPrefix`) to localize failures and strengthen correctness guarantees beyond what we get indirectly via `Bot.ContainsIP`.
</issue_to_address>
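A focused test along these lines, assuming the `NewIPTree`/`Add`/`Contains`/`Count` API shown elsewhere in this PR (and that `Count` reports the number of stored prefixes); it would sit in the root package next to the existing tests:

```golang
func TestIPTree_ContainsAndCount(t *testing.T) {
	tree := NewIPTree()
	tree.Add(netip.MustParsePrefix("192.0.2.0/24"))
	tree.Add(netip.MustParsePrefix("2001:db8::/32"))

	if !tree.Contains(netip.MustParseAddr("192.0.2.10")) {
		t.Error("expected IPv4 address inside an added prefix to match")
	}
	if !tree.Contains(netip.MustParseAddr("2001:db8::1")) {
		t.Error("expected IPv6 address inside an added prefix to match")
	}
	if tree.Contains(netip.MustParseAddr("198.51.100.1")) {
		t.Error("did not expect an address outside all added prefixes to match")
	}
	if got := tree.Count(); got != 2 {
		t.Errorf("Count() = %d, want 2", got)
	}
}
```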
### Comment 7
<location> `bot_asn_refresh_test.go:9-18` </location>
<code_context>
+func TestRefreshASNRebuildsCache(t *testing.T) {
</code_context>
<issue_to_address>
**suggestion (testing):** TestRefreshASNRebuildsCache bypasses Bot.initASN/refreshASN and does not cover their logic
This test only mutates `bot.asns` and never calls `Bot.initASN` or `Bot.refreshASN`, so it doesn’t verify the real refresh path using `asn.Store`. Please add tests that use a fake/temporary `asn.Store` (controlling `Load`/`Refresh` output), invoke `initASN`/`refreshASN`, and assert that the cache is rebuilt correctly (including handling of empty results so stale prefixes aren’t retained).
Suggested implementation:
```golang
package knownbots
import (
"context"
"net/netip"
"sync/atomic"
"testing"
)
// fakeASNStore is a test double for asn.Store that allows tests to control
// the values returned by Load and Refresh.
type fakeASNStore struct {
loadCalled atomic.Bool
refreshCalled atomic.Bool
loadResult map[int][]netip.Prefix
refreshResult map[int][]netip.Prefix
loadErr error
refreshErr error
}
func (f *fakeASNStore) Load(ctx context.Context) (map[int][]netip.Prefix, error) {
f.loadCalled.Store(true)
return f.loadResult, f.loadErr
}
func (f *fakeASNStore) Refresh(ctx context.Context) (map[int][]netip.Prefix, error) {
f.refreshCalled.Store(true)
return f.refreshResult, f.refreshErr
}
// TestInitASNUsesStoreAndPopulatesCache verifies that initASN calls the
// underlying asn.Store and populates the bot's ASN prefix cache.
func TestInitASNUsesStoreAndPopulatesCache(t *testing.T) {
t.Helper()
as15169Prefixes := []netip.Prefix{
netip.MustParsePrefix("8.8.8.0/24"),
netip.MustParsePrefix("8.8.4.0/24"),
}
as13335Prefixes := []netip.Prefix{
netip.MustParsePrefix("1.1.1.0/24"),
}
store := &fakeASNStore{
loadResult: map[int][]netip.Prefix{
15169: as15169Prefixes,
13335: as13335Prefixes,
},
}
bot := &Bot{
Name: "testbot",
ASN: []int{15169, 13335},
asnStore: store,
}
if err := bot.initASN(context.Background()); err != nil {
t.Fatalf("initASN() error = %v", err)
}
if !store.loadCalled.Load() {
t.Fatalf("initASN() did not call Store.Load")
}
for asn, want := range store.loadResult {
got := bot.asns.Load().(map[int][]netip.Prefix)[asn]
if len(got) != len(want) {
t.Fatalf("ASN %d: got %d prefixes, want %d", asn, len(got), len(want))
}
for i, p := range want {
if got[i] != p {
t.Fatalf("ASN %d prefix[%d]: got %v, want %v", asn, i, got[i], p)
}
}
}
}
// TestRefreshASNRebuildsCache verifies that refreshASN uses the asn.Store
// to replace the existing cache with the newly refreshed prefixes.
func TestRefreshASNRebuildsCache(t *testing.T) {
t.Helper()
initialPrefixesAS15169 := []netip.Prefix{
netip.MustParsePrefix("8.8.8.0/24"),
netip.MustParsePrefix("8.8.4.0/24"),
netip.MustParsePrefix("2001:db8::/32"),
}
initialPrefixesAS13335 := []netip.Prefix{
netip.MustParsePrefix("1.1.1.0/24"),
}
refreshedPrefixesAS15169 := []netip.Prefix{
netip.MustParsePrefix("8.8.8.0/24"), // kept
netip.MustParsePrefix("8.8.5.0/24"), // new
}
refreshedPrefixesAS13335 := []netip.Prefix{
netip.MustParsePrefix("1.1.1.0/24"), // kept
netip.MustParsePrefix("1.0.0.0/24"), // new
}
store := &fakeASNStore{
loadResult: map[int][]netip.Prefix{
15169: initialPrefixesAS15169,
13335: initialPrefixesAS13335,
},
refreshResult: map[int][]netip.Prefix{
15169: refreshedPrefixesAS15169,
13335: refreshedPrefixesAS13335,
},
}
bot := &Bot{
Name: "testbot",
ASN: []int{15169, 13335},
asnStore: store,
}
// Initialize from Load.
if err := bot.initASN(context.Background()); err != nil {
t.Fatalf("initASN() error = %v", err)
}
// Now refresh and ensure the cache is replaced with refreshResult.
if err := bot.refreshASN(context.Background()); err != nil {
t.Fatalf("refreshASN() error = %v", err)
}
if !store.refreshCalled.Load() {
t.Fatalf("refreshASN() did not call Store.Refresh")
}
gotCache, ok := bot.asns.Load().(map[int][]netip.Prefix)
if !ok {
t.Fatalf("bot.asns.Load() did not return map[int][]netip.Prefix")
}
for asn, want := range store.refreshResult {
got, ok := gotCache[asn]
if !ok {
t.Fatalf("ASN %d missing from cache after refresh", asn)
}
if len(got) != len(want) {
t.Fatalf("ASN %d: got %d prefixes after refresh, want %d", asn, len(got), len(want))
}
for i, p := range want {
if got[i] != p {
t.Fatalf("ASN %d prefix[%d]: got %v, want %v", asn, i, got[i], p)
}
}
}
}
// TestRefreshASNRemovesStalePrefixes verifies that when Refresh returns an
// empty result for a given ASN, stale prefixes from previous loads do not
// remain in the cache.
func TestRefreshASNRemovesStalePrefixes(t *testing.T) {
t.Helper()
initialPrefixesAS15169 := []netip.Prefix{
netip.MustParsePrefix("8.8.8.0/24"),
netip.MustParsePrefix("8.8.4.0/24"),
}
store := &fakeASNStore{
loadResult: map[int][]netip.Prefix{
15169: initialPrefixesAS15169,
},
// Simulate the backend returning no prefixes on refresh
// so that the cache should no longer contain any entries
// for 15169.
refreshResult: map[int][]netip.Prefix{},
}
bot := &Bot{
Name: "testbot",
ASN: []int{15169},
asnStore: store,
}
if err := bot.initASN(context.Background()); err != nil {
t.Fatalf("initASN() error = %v", err)
}
if err := bot.refreshASN(context.Background()); err != nil {
t.Fatalf("refreshASN() error = %v", err)
}
gotCache, ok := bot.asns.Load().(map[int][]netip.Prefix)
if !ok {
t.Fatalf("bot.asns.Load() did not return map[int][]netip.Prefix")
}
if _, ok := gotCache[15169]; ok {
t.Fatalf("expected ASN 15169 prefixes to be removed after refresh with empty result")
}
}
```
These test changes assume the following, which you may need to align with your existing code:
1. `Bot` has a field `asnStore` of an interface type compatible with `fakeASNStore`:
```go
type Store interface {
Load(ctx context.Context) (map[int][]netip.Prefix, error)
Refresh(ctx context.Context) (map[int][]netip.Prefix, error)
}
type Bot struct {
Name string
ASN []int
asnStore asn.Store
asns atomic.Value // storing map[int][]netip.Prefix
}
```
If your actual `asn.Store` interface or field names differ, adjust `fakeASNStore` and the `Bot` initialization accordingly.
2. `Bot.initASN(ctx)` should:
- Call `asnStore.Load(ctx)` and
- Replace the internal `asns` cache with the map returned by `Load`.
3. `Bot.refreshASN(ctx)` should:
- Call `asnStore.Refresh(ctx)` and
- Replace (not merge) `asns` with the returned map, so that when `Refresh` returns an empty map, previously cached prefixes are removed.
If your implementation currently merges results or keeps stale entries, you should update `refreshASN` to rebuild the cache using only the `Refresh` result for these tests to pass and to match the behavior described in your review comment.
</issue_to_address>
### Comment 8
<location> `asn/fetcher_ripe_test.go:7-16` </location>
<code_context>
+ "testing"
+)
+
+func TestIntegration_RIPE(t *testing.T) {
+ if testing.Short() {
+ t.Skip("Skipping integration test in short mode")
+ }
+
+ fetcher := NewRIPE()
+
+ // Test with Google ASN (AS15169)
+ prefixes, err := fetcher.Fetch(15169)
+ if err != nil {
+ t.Fatalf("failed to fetch prefixes from RIPE: %v", err)
+ }
+
+ if len(prefixes) == 0 {
+ t.Error("expected some prefixes for AS15169, got none")
+ }
+
+ // Verify some expected Google prefixes are present
+ foundGooglePrefix := false
+ for _, p := range prefixes {
+ if p.String() == "8.8.8.0/24" {
+ foundGooglePrefix = true
+ break
+ }
+ }
+ if !foundGooglePrefix {
+ t.Logf("did not find 8.8.8.0/24, but got %d prefixes total", len(prefixes))
+ }
+
+ // Verify all prefixes are valid
+ for _, p := range prefixes {
+ if !ValidatePrefix(p) {
+ t.Errorf("invalid prefix: %s", p.String())
+ }
+ }
+
+ t.Logf("RIPE fetched %d prefixes for AS15169 (Google)", len(prefixes))
+}
</code_context>
<issue_to_address>
**suggestion (testing):** Integration tests depend on external network services and could be flaky without additional safeguards
These tests are correctly marked as integration and skipped in `testing.Short()`, but they still rely on live services and network conditions. To keep CI stable while retaining real-provider coverage:
- Gate them behind an explicit env var (e.g. `if os.Getenv("KNOWNBOTS_ASN_INTEGRATION") == "" { t.Skip(...) }`).
- Add unit tests for each fetcher using `httptest.Server` to simulate success, malformed responses, and non-200 status codes so they can run reliably everywhere.
</issue_to_address>
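The env-var gate could be a tiny helper shared by all the integration tests (requires the `os` and `testing` imports); `KNOWNBOTS_ASN_INTEGRATION` is the name suggested above, not an existing variable:

```golang
// skipUnlessIntegration gates live-network tests behind an explicit opt-in,
// so plain `go test ./...` stays deterministic in CI.
func skipUnlessIntegration(t *testing.T) {
	t.Helper()
	if os.Getenv("KNOWNBOTS_ASN_INTEGRATION") == "" {
		t.Skip("set KNOWNBOTS_ASN_INTEGRATION=1 to run live ASN integration tests")
	}
}
```

Each `TestIntegration_*` function would then call `skipUnlessIntegration(t)` at the top, in addition to (or instead of) the `testing.Short()` check.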
@sourcery-ai review
- Add Sanitize() function in asn/util.go to filter invalid and deduplicate prefixes
- Apply Sanitize() in Store.fetch() before saving to cache
- Filters out /0, loopback, multicast, and unspecified prefixes
- Removes duplicate prefixes in single pass for performance
- Add comprehensive tests for Sanitize() function
- Add error logging for IP cache file read failures in initIPs()
- Performance: ~748ns/op for 8 prefixes with filtering and dedup
- Rename Deduplicate() to Distinct() for more idiomatic naming (SQL-like)
- Merge Sanitize() into Distinct() - now validates and deduplicates in single pass
- Remove redundant Sanitize() function
- Update all tests and benchmarks to use Distinct()
- Simpler API: one function does both filtering and deduplication
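For readers skimming the conversation, a rough sketch of what a combined validate-and-deduplicate pass like the `Distinct()` described above might look like; the exact filtering rules are inferred from the discussion, not copied from the PR:

```golang
package asnsketch

import "net/netip"

// Distinct drops invalid or undesirable prefixes (zero-bit masks, loopback,
// multicast, unspecified) and removes duplicates in a single pass.
func Distinct(prefixes []netip.Prefix) []netip.Prefix {
	seen := make(map[netip.Prefix]struct{}, len(prefixes))
	out := make([]netip.Prefix, 0, len(prefixes))
	for _, p := range prefixes {
		if !p.IsValid() || p.Bits() == 0 {
			continue
		}
		addr := p.Addr()
		if addr.IsLoopback() || addr.IsMulticast() || addr.IsUnspecified() {
			continue
		}
		if _, dup := seen[p]; dup {
			continue
		}
		seen[p] = struct{}{}
		out = append(out, p)
	}
	return out
}
```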
Summary by Sourcery
Add ASN-based bot verification with per-bot ASN configuration and caching, integrate it into the validator’s IP verification pipeline, and refactor bot IP/RDNS handling for better encapsulation and performance.
New Features:
Bug Fixes:
Enhancements:
Build:
CI:
Documentation:
Tests: