
Conversation


@lkxlzx lkxlzx commented Dec 2, 2025

No description provided.

- Implement active prefetching (14 development phases)
  * Priority queue and prefetch manager
  * Smart scheduling and dynamic priorities
  * Batch processing and concurrency control
  * Access thresholds and hit tracking

- Implement a prefetch statistics API
  * HTTP REST API server
  * 11 statistics fields (including the enhanced ones)
  * Command-line and YAML configuration support

- Enhanced statistics
  * unique_domains - total number of unique domains
  * total_processed - total items processed
  * total_refreshed/failed - success/failure counters
  * last_refresh_time - time of the last refresh

- Full test coverage
  * Unit tests (queue, manager, integration)
  * API endpoint validation
  * Performance tests

- Complete Chinese documentation
  * Task list and implementation plan
  * API documentation and configuration guide
  * Frontend integration examples
  * Project summary

New files:
- internal/cmd/api.go - API server
- proxy/prefetch_*.go - prefetch implementation
- proxy/*_test.go - test files
- Several MD documents

Modified files:
- internal/cmd/*.go - configuration and integration
- proxy/*.go - cache integration
- README.md, config.yaml.dist - documentation updates

- Implement dynamic refresh-time logic, max(TotalTTL*10%, RefreshBefore); see the sketch after this list
- Introduce a worker pool for concurrent prefetch refreshes
- Fix cache-update bug: internal prefetch requests no longer go through the cache lookup
- Fix 'unpackItem' bug: TTL was not assigned correctly
- Fix 'processQueue' batch-refresh logic: each item's refresh time is now checked independently
- Fix 'hitTracker' bug: hits were not recorded when the threshold was 1
- Add comprehensive stability tests and mixed-TTL tests
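
A minimal sketch of the refresh-time rule above, assuming the manager works in time.Duration values (the function name is illustrative, not the actual identifier):

```
// refreshLead returns how long before expiry an item should be refreshed:
// the larger of 10% of the record's total TTL and the configured RefreshBefore.
func refreshLead(totalTTL, refreshBefore time.Duration) time.Duration {
	dynamic := totalTTL / 10
	if dynamic > refreshBefore {
		return dynamic
	}
	return refreshBefore
}
```
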
1. Fix the refresh-time calculation in prefetch_manager.go to avoid busy waiting.
2. Fix the bug where tasks were dropped when the worker pool was full; they are now re-enqueued instead (see the sketch below).
3. Add prefetch_comprehensive_test.go and prefetch_ultimate_test.go, covering extreme scenarios such as high concurrency, queue overflow, and upstream retries.
4. Optimize the prefetch scheduling algorithm to support dynamic TTL adaptation.
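
A sketch of the re-enqueue behavior from item 2; the workCh and queue identifiers are assumptions, not the actual field names:

```
select {
case pm.workCh <- item:
	// a worker picked the item up
default:
	// the pool is saturated: put the item back instead of dropping it
	pm.queue.Push(item)
}
```
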
Main changes:
- Fix the cache object reference error in proxycache.go (a critical bug)
- Clean up verbose debug logging, keeping only the necessary warning and info logs
- Add CLI argument support (args.go, config.go, proxy.go)
- Streamline log output in prefetch_manager.go
- Add the traffic_gen test tool

Feature verification:
- Real-world testing confirmed that TTL refresh succeeds
- All core features passed verification
- The code is ready for production use

Related documents: PREFETCH_VERIFICATION_REPORT.md, FINAL_DELIVERY_SUMMARY.md

@windsurf-bot windsurf-bot bot left a comment


Other comments (51)
  • proxy/prefetch_benchmark_test.go (22-22) Error handling is missing for the New() function call. If an error occurs, the test will continue with a nil proxy which could cause a panic.
    	p, err := New(config)
    	if err != nil {
    		b.Fatal("Failed to create proxy:", err)
    	}
    
  • proxy/prefetch_benchmark_test.go (52-52) Error handling is missing for the New() function call. If an error occurs, the test will continue with a nil proxy which could cause a panic.
    	p, err := New(config)
    	if err != nil {
    		b.Fatal("Failed to create proxy:", err)
    	}
    
  • proxy/prefetch_manager.go.backup (45-45) The semaphore release in the deferred function is incorrect. It's trying to receive from the channel instead of sending to it.
    			defer func() { pm.semaphore <- struct{}{} }()
    
  • proxy/prefetch_manager.go.backup (96-100) The makeKey function is referenced in line 56 and 104, but its implementation at lines 96-100 is incomplete and not properly formatted as a function. It's using undefined variables (domain, qtype) instead of the parameters that would be passed to the function.
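    A minimal sketch of what the helper could look like, assuming it builds a cache key from the domain and query type (the separator and use of strconv are assumptions):
    	func makeKey(domain string, qtype uint16) string {
    		return domain + "#" + strconv.FormatUint(uint64(qtype), 10)
    	}
    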
  • proxy/prefetch_manager.go.backup (195-197) The Stats method references atomic counters `totalProcessed`, `totalRefreshed`, and `totalFailed` in lines 195-197, but these fields are not defined in the PrefetchQueueManager struct.
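    A sketch of the missing fields using sync/atomic counters (the exact types are assumptions):
    	type PrefetchQueueManager struct {
    		// ... existing fields ...
    		totalProcessed atomic.Int64 // items taken off the queue
    		totalRefreshed atomic.Int64 // successful refreshes
    		totalFailed    atomic.Int64 // failed refreshes
    	}
    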
  • proxy/prefetch_manager.go (438-459) There's a redundant deletion of the refreshing flag. The flag is deleted in the deferred function at lines 378-382, but then explicitly deleted again at lines 457-459 before the retention logic. This could lead to race conditions if the code between these two deletions panics.
    	// Extract actual TTL from DNS response for accurate re-scheduling
    	var actualTTL uint32
    	if err == nil && dctx.Res != nil {
    		actualTTL = calculateTTL(dctx.Res)
    		pm.logger.Debug("prefetch refresh completed",
    			"domain", item.Domain,
    			"qtype", item.QType,
    			"actual_ttl", actualTTL,
    			"success", true)
    	} else {
    		pm.logger.Debug("prefetch refresh completed",
    			"domain", item.Domain,
    			"qtype", item.QType,
    			"success", false,
    			"error", err)
    	}
    
    	// Hybrid Retention Logic
    	var retentionTime time.Duration
    	var shouldCheck bool
    
  • proxy/prefetch_manager.go (31-31) The `semaphore` field is marked as deprecated in the struct definition (line 31), but it's still being used in the code (lines 289, 641). This inconsistency could lead to confusion and potential bugs if the field is removed in the future.
  • proxy/prefetch_retention_test.go (26-26) The error return value from `New(config)` is being ignored. Consider checking this error to ensure the test setup is valid.
  • proxy/cache.go (147-149) Setting `expired = false` when prefetchEnabled is true might lead to inconsistent behavior. The expired flag should accurately reflect the item's expiration status, and modifying it based on prefetch settings could cause issues with cache consistency. Consider keeping the expired flag accurate and handling prefetch logic separately.
  • internal/cmd/api.go (24-28) Consider adding a WriteTimeout to the HTTP server configuration to prevent potential resource exhaustion from slow clients. For example:
    	srv := &http.Server{
    		Addr:         addr,
    		ReadTimeout:  60 * time.Second,
    		WriteTimeout: 60 * time.Second,
    		Handler:      mux,
    	}
    
  • internal/cmd/api.go (50-51) This endpoint should validate the HTTP method to ensure it only responds to the intended method (likely GET for a stats endpoint):
    	return func(w http.ResponseWriter, r *http.Request) {
    		if r.Method != http.MethodGet {
    			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
    			return
    		}
    		stats := p.GetPrefetchStats()
    
  • proxy/prefetch_extended_test.go (114-160) This test uses fixed sleep durations (6s, 25s, 30s) totaling 61 seconds, which makes it slow and potentially flaky in CI environments. Consider using a mock time provider that can be advanced programmatically instead of real sleeps to make the test faster and more reliable.
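    For example, a minimal controllable clock that the manager could accept instead of calling time.Now directly (the interface and names are assumptions):
    	type clock interface{ Now() time.Time }
    	
    	type fakeClock struct {
    		mu  sync.Mutex
    		now time.Time
    	}
    	
    	func (c *fakeClock) Now() time.Time {
    		c.mu.Lock()
    		defer c.mu.Unlock()
    		return c.now
    	}
    	
    	// Advance moves the fake time forward without any real waiting.
    	func (c *fakeClock) Advance(d time.Duration) {
    		c.mu.Lock()
    		c.now = c.now.Add(d)
    		c.mu.Unlock()
    	}
    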
  • proxy/prefetch_extended_test.go (123-139) There are several blocks of commented-out code related to checking upstream calls. These should either be removed or properly implemented if they're important for test verification.
  • proxy/prefetch_extended_test.go (102-102) The test uses `fmt.Println` and `fmt.Printf` for logging. Consider using `t.Logf` instead, which respects test verbosity flags and properly associates output with the specific test.
  • proxy/prefetch_mixed_test.go (125-140) This test relies on specific timing with sleep statements, which could make it flaky in CI environments. Consider making the test more deterministic by either mocking time or using a test-specific clock that can be advanced programmatically.
  • proxy/prefetch_mixed_test.go (133-133) You're discarding the TTL value in some assertions with `ip, _ = query(...)`. Consider checking the TTL values in all cases to ensure the cache behavior is working as expected.
  • proxy/prefetch_real_test.go (16-18) This test runs for 5 minutes by default, which is too long for a standard unit test. Consider adding a `testing.Short()` check to skip this test during normal test runs or mark it with a build tag like `//go:build integration` so it only runs when explicitly requested.
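    For example (the test name is illustrative):
    	func TestPrefetchRealWorld(t *testing.T) {
    		if testing.Short() {
    			t.Skip("skipping 5-minute integration test in -short mode")
    		}
    		// ...
    	}
    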
  • proxy/prefetch_real_test.go (59-59) Replace `fmt.Printf` calls with `t.Logf` to follow Go testing conventions. This ensures proper test output formatting and integration with the testing framework.
  • proxy/prefetch_manager.go (590-590) The `hitTracker.cleanup` method uses a hardcoded value of 11 for the expiry calculation (line 590), which doesn't match the comment that says `maxMultiplier + 1`. This could lead to unexpected behavior if `maxMultiplier` changes from its default value of 10.
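    A one-line sketch of the fix, assuming maxMultiplier and the tracking window are in scope at that point:
    	expireAfter := time.Duration(maxMultiplier+1) * window // instead of the literal 11
    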
  • proxy/prefetch_retention_test.go (113-113) Consider using a named constant instead of the magic number `60` for retention time to make the test more self-documenting.
  • proxy/cache.go (372-382) The prefetch logic is duplicated in both `set` and `setWithSubnet` methods. Consider extracting this common logic into a helper method to avoid duplication and make future maintenance easier.
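    A sketch of the shared helper (the signature and field names are assumptions):
    	func (c *cache) maybeTrackForPrefetch(key []byte, ttl uint32, skipPrefetch bool) {
    		if skipPrefetch || !c.prefetchEnabled || c.prefetchManager == nil {
    			return
    		}
    		// ... the logic currently duplicated in set and setWithSubnet ...
    	}
    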
  • proxy/cache.go (358-358) The `skipPrefetch` parameter is added without documentation explaining its purpose. Consider adding a comment explaining when this parameter should be set to true to help users of this API understand its proper usage.
  • internal/cmd/cmd.go (120-120) The `runAPI` function is called without any error handling. Consider capturing and logging any errors that might occur when starting the API server to prevent silent failures.
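    For example, assuming runAPI blocks until the server exits (the arguments and logger call are illustrative):
    	go func() {
    		if err := runAPI(conf, dnsProxy); err != nil && !errors.Is(err, http.ErrServerClosed) {
    			slog.Error("api server exited", "err", err)
    		}
    	}()
    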
  • proxy/prefetch_short_ttl_test.go (57-70) The test relies on fixed sleep durations (200ms, 1000ms) which could make it flaky on slower CI systems. Consider using a more deterministic approach, such as mocking the time or using a channel-based notification when processing is complete.
  • proxy/cache_internal_test.go (78-78) A new boolean parameter has been added to the `set` and `setWithSubnet` methods, but there's no indication of what this parameter represents. Consider adding a comment or using a named parameter in the test to clarify its purpose.
  • proxy/cache_prefetch_test.go (52-52) The proxy instance is created but never stopped. Consider adding `defer p.Stop()` after creating the proxy to ensure proper cleanup of resources.
  • proxy/prefetch_queue.go (17-28) The `AcquirePrefetchItem` function doesn't initialize the `AddedTime` field, but `ReleasePrefetchItem` resets it. Consider initializing `AddedTime` to the current time in `AcquirePrefetchItem` to ensure consistent state for all fields.
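    For example, assuming the items come from a sync.Pool (itemPool is a stand-in for the actual pool name):
    	func AcquirePrefetchItem() *PrefetchItem {
    		item := itemPool.Get().(*PrefetchItem)
    		item.AddedTime = time.Now() // give every acquired item a consistent starting state
    		return item
    	}
    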
  • proxy/prefetch_queue.go (163-182) The commented code in the `Update` method suggests that the caller is expected to update the priority before calling this method, but this isn't clearly documented. Consider either uncommenting the priority calculation or adding a clear comment in the method documentation about this requirement.
  • proxy/prefetch_queue.go (152-161) The `Peek` method currently uses a write lock (`mu.Lock`), but since it only reads data and doesn't modify the queue, it should use a read lock (`mu.RLock`) instead to improve concurrency.
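    A sketch, assuming the queue is a slice guarded by a sync.RWMutex:
    	func (q *PrefetchQueue) Peek() *PrefetchItem {
    		q.mu.RLock()
    		defer q.mu.RUnlock()
    		if len(q.items) == 0 {
    			return nil
    		}
    		return q.items[0]
    	}
    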
  • proxy/proxycache.go (82-82) When calculating the expiration time for prefetching, consider adding a minimum TTL check to avoid immediate expiration for records with TTL=0:
    // Calculate approximate expiration time based on current time and TTL
    ttl := ci.ttl
    if ttl == 0 {
    	ttl = 60 // Use a minimum TTL value to avoid immediate expiration
    }
    expireTime := time.Now().Add(time.Duration(ttl) * time.Second)
    
  • scripts/traffic_gen/traffic_gen.go (11-11) Consider making the server address configurable via command-line flags rather than hardcoding it. This would make the script more flexible for different testing scenarios.
  • scripts/traffic_gen/traffic_gen.go (17-17) The number of iterations is hardcoded to 60. Consider making this configurable via command-line flags to allow for shorter or longer test runs.
  • scripts/traffic_gen/traffic_gen.go (15-15) The script only generates UDP DNS traffic. Consider adding an option to use TCP as well, as this would provide more comprehensive testing for the DNS proxy.
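    Taken together, the three points above could be addressed with standard flag parsing (the flag names are suggestions):
    	package main
    	
    	import "flag"
    	
    	var (
    		server = flag.String("server", "127.0.0.1:53", "DNS server address to query")
    		iters  = flag.Int("iterations", 60, "number of query rounds")
    		useTCP = flag.Bool("tcp", false, "query over TCP instead of UDP")
    	)
    	
    	func main() {
    		flag.Parse()
    		// ... existing traffic loop, driven by *server, *iters, and *useTCP ...
    	}
    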
  • proxy/prefetch_bugfix_test.go (96-96) This test relies on a specific sleep duration (1500ms) to wait for the prefetch to occur. This timing-based approach could lead to flaky test results on slower systems or under heavy load. Consider implementing a more deterministic approach, such as using a channel notification from the mock upstream when the second call happens, or polling with a timeout until the expected condition is met.
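    For example, polling with testify's Eventually helper, assuming testify, which these tests already use elsewhere (the condition is illustrative; upstream.CallCount is a hypothetical accessor on the mock):
    	assert.Eventually(t, func() bool {
    		return upstream.CallCount() >= 2
    	}, 5*time.Second, 50*time.Millisecond, "prefetch should have re-queried the upstream")
    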
  • proxy/prefetch_stability_test.go (88-88) Using fixed sleep durations (1500ms) might lead to flaky tests in CI environments where execution timing can vary. Consider using a polling approach with timeout or a test-specific clock that can be controlled deterministically.
  • proxy/prefetch_stability_test.go (112-112) Consider replacing `fmt.Printf` calls with `t.Logf` for test output. This ensures output is only shown when tests fail or when running with verbose flag.
  • proxy/prefetch_real_test.go (20-23) This test depends on an external service (8.8.8.8) which could make the test unreliable. Consider mocking the upstream DNS server or providing a way to use a local DNS server for testing.
  • proxy/prefetch_retention_test.go (49-49) Using fixed sleep durations (like `time.Sleep(200 * time.Millisecond)`) can lead to flaky tests in CI environments. Consider using a polling approach with timeouts to make the tests more reliable.
  • proxy/prefetch_manager_test.go (73-73) Using fixed sleep durations (like `time.Sleep(200 * time.Millisecond)`) can lead to flaky tests, especially in CI environments. Consider using a polling approach with timeout or synchronization mechanisms like channels to make the test more deterministic.
  • proxy/prefetch_manager_test.go (126-126) After closing `startCh`, the test waits for a fixed duration before checking results. This approach can be flaky. Consider using a WaitGroup to ensure all prefetch operations have completed before making assertions.
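    A sketch under the assumption that the test can hook a completion callback into the manager (onRefreshDone is a hypothetical test-only hook):
    	var wg sync.WaitGroup
    	wg.Add(expectedOps)
    	pm.onRefreshDone = wg.Done // hypothetical test-only hook
    	close(startCh)
    	wg.Wait() // every prefetch operation has completed; safe to assert now
    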
  • proxy/prefetch_timing_test.go (53-53) There's an inconsistency in the comment for the 'Very Short TTL' test case. The comment states `max(0.2, 5) = 5s` but then mentions `Cap(1) -> 1s`. This suggests there's a capping mechanism that limits the refresh time to 1s for very short TTLs, but this logic isn't clearly explained in the test. Consider adding a comment about this capping behavior or ensuring the comment accurately reflects the actual logic being tested.
  • config_test.yaml (3-4) Consider using English for comments instead of Chinese to improve accessibility for all contributors. For example:
      threshold: 2              # Prefetch after 2 accesses
      threshold_window: 1m      # 1 minute statistics window
    
  • internal/cmd/config.go (206-231) There's potential confusion between the `Prefetch *proxy.PrefetchConfig` field and the individual prefetch configuration fields (PrefetchEnabled, PrefetchBatchSize, etc.). It's not clear how these settings interact - whether the individual fields override the Prefetch struct values or vice versa. Consider either using only the struct or only individual fields for consistency.
  • internal/cmd/cmd.go (119-121) The API server started by `runAPI` might need proper shutdown handling when the program exits. Consider ensuring that the API server is gracefully shut down along with the DNS proxy in the shutdown sequence.
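    A sketch of the shutdown wiring, assuming the *http.Server created in api.go is reachable here as apiSrv:
    	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    	defer cancel()
    	if err := apiSrv.Shutdown(shutdownCtx); err != nil {
    		slog.Error("api server shutdown", "err", err)
    	}
    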
  • proxy/cache_prefetch_test.go (83-84) Directly modifying the cache data bytes is implementation-dependent and fragile. Consider adding a test helper method in the cache implementation that allows setting expiration time for testing purposes.
  • proxy/proxycache.go (76-92) Consider refactoring the prefetch condition check to improve readability by grouping related conditions:
    if d.IsInternalPrefetch {
    	return hit
    }
    
    if p.Config.Prefetch == nil || !p.Config.Prefetch.Enabled || 
       !dctxCache.prefetchEnabled || dctxCache.prefetchManager == nil {
    	return hit
    }
    
    q := d.Req.Question[0]
    
    // CheckThreshold records the hit and returns true if hits >= threshold-1
    if dctxCache.prefetchManager.CheckThreshold(q.Name, q.Qtype, d.ReqECS) {
    	// Calculate approximate expiration time based on current time and TTL
    	expireTime := time.Now().Add(time.Duration(ci.ttl) * time.Second)
    
    	dctxCache.prefetchManager.Add(q.Name, q.Qtype, d.ReqECS, d.CustomUpstreamConfig, expireTime)
    
    	p.logger.Debug("prefetch triggered",
    		"domain", q.Name,
    		"qtype", dns.TypeToString[q.Qtype],
    		"ttl", ci.ttl,
    		"expire_time", expireTime)
    }
    
  • proxy/dnscontext.go (101-102) Consider making `IsInternalPrefetch` a private field (lowercase first letter) with a getter method if external packages need to check this state. This would prevent accidental modification from outside the package that could lead to prefetch loops or incorrect hit counting.
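    A sketch of the suggested encapsulation:
    	type DNSContext struct {
    		// ...
    		isInternalPrefetch bool
    	}
    	
    	// InternalPrefetch reports whether this query was generated by the
    	// prefetch manager rather than by a client.
    	func (dctx *DNSContext) InternalPrefetch() bool {
    		return dctx.isInternalPrefetch
    	}
    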
  • proxy/prefetch_bugfix_test.go (25-32) The mutex `muLock` only protects the `callCount` variable, but not the conditional behavior change in the mock upstream. This could potentially lead to race conditions if the mock is accessed concurrently. Consider protecting the entire behavior of the mock with the mutex.
  • config_test.yaml (7-8) These comments should also be in English for consistency:
      batch_size: 0           # 0 = automatic (recommended)
      check_interval: 10s     # Smart scheduling enabled, usually no need to adjust
    
  • proxy/prefetch_mixed_test.go (101-101) Consider replacing fmt.Println/Printf statements with t.Logf for cleaner test output. This allows the output to be shown only when tests fail or when running with verbose mode.
  • proxy/prefetch_short_ttl_test.go (59-63) There are commented-out debugging notes on lines 60-62 that should be removed as they don't add value to understanding the test logic:
    	// Queue should still have the item (not popped for processing)
    	assert.Equal(t, 1, pm.queue.Len(), "Should NOT be processed yet")
    

💡 To request another review, post a new comment with "/windsurf-review".


### New logic (active)
```
User query -> domain added to cache -> proactive refresh at TTL-5s -> cache stays fresh -> background refresh
```

There's a syntax error in the markdown: the closing code fence for the 'New logic (active)' section is missing, which causes the design-goals section below it to be rendered as part of the code block.

Suggested change: keep the flow line as-is and add a closing code fence (three backticks) immediately after it:

User query -> domain added to cache -> proactive refresh at TTL-5s -> cache stays fresh -> background refresh

Comment on lines +87 to +89
ip := answer.(*dns.A).A.String()

fmt.Printf("[%s] Query #%d: IP=%s, TTL=%d\n", time.Now().Format("15:04:05"), queries, ip, ttl)

The type assertion on line 87 (answer.(*dns.A).A.String()) could panic if the answer is not a DNS A record. Add a type check before the assertion:

Suggested change
ip := answer.(*dns.A).A.String()
fmt.Printf("[%s] Query #%d: IP=%s, TTL=%d\n", time.Now().Format("15:04:05"), queries, ip, ttl)
if a, ok := answer.(*dns.A); ok {
ip := a.A.String()
fmt.Printf("[%s] Query #%d: IP=%s, TTL=%d\n", time.Now().Format("15:04:05"), queries, ip, ttl)
} else {
t.Logf("Unexpected answer type: %T", answer)
}

pc.BatchSize = conf.PrefetchBatchSize
}
if conf.PrefetchCheckInterval > 0 {
pc.CheckInterval = time.Duration(conf.PrefetchCheckInterval)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time.Duration conversions are missing time units. When converting from numeric types to time.Duration, you need to specify the unit (e.g., time.Second, time.Millisecond). Without units, these values will be interpreted as nanoseconds.

Suggested change
pc.CheckInterval = time.Duration(conf.PrefetchCheckInterval)
pc.CheckInterval = time.Duration(conf.PrefetchCheckInterval) * time.Second

Similar changes should be made for RefreshBefore and ThresholdWindow.
