fix(conf): fixed amazonbot ip url #11
- Replace broken S3 URL with official Amazon IP list
- Update reference URL to developer.amazon.com

- Add parser_amazon.go to extract IPs from Amazon's HTML page
- Update amazonbot.yaml to use amazon parser with official URL
- Parser handles embedded JSON in HTML and falls back to IP extraction
Reviewer's Guide
Updates Amazonbot configuration to use Amazon’s official IP address source and introduces a dedicated HTML/JSON parser to extract IPv4 prefixes from the Amazonbot IP listing page.
Sequence diagram for Amazonbot IP parsing with AmazonParser
sequenceDiagram
participant BotUpdater
participant HTTPClient
participant AmazonParser
BotUpdater->>HTTPClient: GET https://developer.amazon.com/amazonbot/ip-addresses/
HTTPClient-->>BotUpdater: HTML_with_embedded_JSON
BotUpdater->>AmazonParser: Parse(htmlReader)
AmazonParser->>AmazonParser: io.ReadAll(r)
AmazonParser->>AmazonParser: regexp.FindStringSubmatch for JSON_prefixes
alt JSON_prefixes_found
AmazonParser->>AmazonParser: json.Unmarshal(reconstructed_JSON)
alt JSON_unmarshal_success
AmazonParser-->>BotUpdater: []netip.Prefix_from_prefixes
else JSON_unmarshal_failure
AmazonParser->>AmazonParser: parseIPsFromText(raw_text)
AmazonParser-->>BotUpdater: []netip.Prefix_from_IPs
end
else JSON_prefixes_not_found
AmazonParser-->>BotUpdater: nil
end
Class diagram for new AmazonParser and parser registration
classDiagram
class Parser {
<<interface>>
Name() string
Parse(r io.Reader) []netip.Prefix
}
class AmazonParser {
Name() string
Parse(r io.Reader) []netip.Prefix
parseIPsFromText(data string) []netip.Prefix
}
class ParserRegistry {
RegisterParser(name string, parser Parser)
GetParser(name string) Parser
}
Parser <|.. AmazonParser
ParserRegistry ..> Parser : uses
class PackageParserInit {
init()
}
PackageParserInit ..> ParserRegistry : calls_RegisterParser
PackageParserInit ..> AmazonParser : creates_instance
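The registration pattern in the class diagram can be sketched as a mutex-guarded map populated from `init()`. This is an illustrative sketch, not the repository's code: the registry internals are assumptions, `Parse` is given an `error` result because the review excerpt shows it returning `(nil, nil)` even though the diagram lists a single return value, and the `AmazonParser` body is a stub.

```go
package main

import (
	"errors"
	"io"
	"net/netip"
	"sync"
)

// Parser mirrors the diagram's interface; the error result is an assumption.
type Parser interface {
	Name() string
	Parse(r io.Reader) ([]netip.Prefix, error)
}

var (
	registryMu sync.RWMutex
	registry   = map[string]Parser{}
)

// RegisterParser stores p under name; registering the same name again replaces it.
func RegisterParser(name string, p Parser) {
	registryMu.Lock()
	defer registryMu.Unlock()
	registry[name] = p
}

// GetParser returns the parser registered under name, or nil if there is none.
func GetParser(name string) Parser {
	registryMu.RLock()
	defer registryMu.RUnlock()
	return registry[name]
}

// AmazonParser is a stub in this sketch; the real parsing logic is elided.
type AmazonParser struct{}

func (AmazonParser) Name() string { return "amazon" }

func (AmazonParser) Parse(r io.Reader) ([]netip.Prefix, error) {
	return nil, errors.New("parsing not implemented in this sketch")
}

// init registers the parser when the package loads (PackageParserInit in the diagram).
func init() {
	RegisterParser(AmazonParser{}.Name(), AmazonParser{})
}
```

Registering from `init()` means any binary that imports the parser package gets the `amazon` parser without an explicit setup call, which is presumably how `amazonbot.yaml` can refer to it by name.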
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #11 +/- ##
==========================================
- Coverage 74.84% 71.85% -2.99%
==========================================
Files 14 17 +3
Lines 640 732 +92
==========================================
+ Hits 479 526 +47
- Misses 117 152 +35
- Partials 44 54 +10
Hey, I've found 1 issue and left some high-level feedback:
- In `AmazonParser.Parse`, returning `(nil, nil)` when neither JSON pattern matches makes it hard for callers to distinguish between “no data” and “parse failed”; consider returning a descriptive error in that case.
- The current JSON extraction via HTML regexes (`jsonPattern`/`jsonPattern2`) is quite brittle; if possible, anchor the patterns more strictly to the surrounding markup or use an HTML parser to locate the JSON block to reduce false matches and breakage on minor page changes.
- In `parseIPsFromText`, you silently skip invalid IPs; if this is expected, consider at least counting or logging how many were discarded so callers can detect unexpected format changes, or document that behavior clearly in a comment.
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `parser/parser_amazon.go:25-32` </location>
<code_context>
+ }
+
+ // Extract JSON from HTML page - look for the code block with prefixes
+ jsonPattern := regexp.MustCompile(`\{\s*"creationTime"\s*:\s*"[^"]+"\s*,\s*"prefixes"\s*:\s*\[([^\]]+)\]`)
+ matches := jsonPattern.FindStringSubmatch(string(data))
+ if len(matches) < 2 {
+ // Try alternate pattern - the full JSON object
+ jsonPattern2 := regexp.MustCompile(`\{[^{}]*"prefixes"\s*:\s*\[([^\]]*)\][^{}]*\}`)
+ matches = jsonPattern2.FindStringSubmatch(string(data))
+ if len(matches) < 2 {
+ return nil, nil
+ }
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider falling back to text-based IP extraction when both JSON regexes fail instead of returning nil, nil.
Returning `nil, nil` here makes it impossible for callers to tell whether parsing failed or the page truly had no prefixes, and it misses the chance to reuse the existing `parseIPsFromText` fallback. Invoking that fallback instead would make this path more robust to changes in the JSON structure while still extracting IPs when they’re present in the page text.
```suggestion
if len(matches) < 2 {
// Try alternate pattern - the full JSON object
jsonPattern2 := regexp.MustCompile(`\{[^{}]*"prefixes"\s*:\s*\[([^\]]*)\][^{}]*\}`)
matches = jsonPattern2.FindStringSubmatch(string(data))
if len(matches) < 2 {
// Fall back to text-based IP extraction if JSON parsing fails
return parseIPsFromText(string(data))
}
}
```
</issue_to_address>
- Fix AmazonParser fallback to parse IPs from text when JSON extraction fails
- Add unit tests for AmazonParser (JSON and fallback modes)
- Add integration test for AmazonBot (parsed 519 prefixes)
- Add AhrefsParser and integration test (parsed 9870 prefixes)

…ion test
- Create parser/http.go with shared fetchFromURL helper function
- Remove duplicate fetchFromURL from parser_google_test.go
- Move AmazonBot integration test to parser_amazon_test.go
- Update imports in parser_google_test.go
Summary by Sourcery
Update Amazonbot configuration to use a dedicated Amazon-specific parser and IP address source page.
New Features:
Enhancements: