@rlaope rlaope commented Jan 27, 2026

Summary

This PR implements Phases 3-5 of the Argus JVM Profiler expansion plan:

Phase 3: GC Deep Dive

  • Allocation Rate Tracking: Monitor object allocation rate via jdk.ObjectAllocationInNewTLAB, identify top allocating classes
  • Metaspace Monitoring: Track metaspace usage and growth rate via jdk.MetaspaceSummary
  • GC Overhead Calculation: Calculate GC overhead percentage with automatic warnings when > 10%
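A minimal sketch of how the GC overhead percentage could be derived, assuming overhead means total GC pause time divided by the wall-clock length of the observation window. The class and method names are hypothetical and not taken from the Argus code; only the 10% warning threshold comes from this PR.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Argus codebase. */
public final class GcOverheadSketch {
    /** Warning threshold stated in the PR description: overhead above 10%. */
    static final double WARN_THRESHOLD_PERCENT = 10.0;

    /** Returns total GC pause time as a percentage of the observation window. */
    static double overheadPercent(List<Duration> gcPauses, Duration window) {
        if (window.isZero() || window.isNegative()) {
            throw new IllegalArgumentException("window must be positive");
        }
        long pauseNanos = gcPauses.stream().mapToLong(Duration::toNanos).sum();
        return 100.0 * pauseNanos / window.toNanos();
    }

    /** True only when overhead is strictly greater than the threshold. */
    static boolean shouldWarn(double overheadPercent) {
        return overheadPercent > WARN_THRESHOLD_PERCENT;
    }

    public static void main(String[] args) {
        // Two pauses totalling 1.2 s inside a 10 s window.
        List<Duration> pauses = List.of(Duration.ofMillis(400), Duration.ofMillis(800));
        double pct = overheadPercent(pauses, Duration.ofSeconds(10));
        System.out.println("GC overhead percent: " + pct + ", warn: " + shouldWarn(pct));
    }
}
```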

Phase 4: CPU/Thread Deep Dive

  • Method Profiling: Hot method detection via jdk.ExecutionSample (configurable sampling interval)
  • Lock Contention Analysis: Monitor thread synchronization bottlenecks via jdk.JavaMonitorEnter and jdk.JavaMonitorWait
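For readers unfamiliar with these JFR events, the sketch below shows one way to subscribe to them with the JDK's `RecordingStream` API. It is not the Argus extractor code: the class name, the `configure` helper, and the string sink are hypothetical, and the 20 ms period and 10 ms threshold simply mirror the defaults listed in this PR description.

```java
import java.time.Duration;
import java.util.function.Consumer;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingStream;

/** Hypothetical illustration; not part of the Argus codebase. */
public final class JfrStreamSketch {

    /** Enables the three events and forwards a one-line summary of each to sink. */
    static void configure(RecordingStream rs, Duration samplePeriod,
                          Duration contentionThreshold, Consumer<String> sink) {
        rs.enable("jdk.ExecutionSample").withPeriod(samplePeriod);
        rs.enable("jdk.JavaMonitorEnter").withThreshold(contentionThreshold);
        rs.enable("jdk.JavaMonitorWait").withThreshold(contentionThreshold);

        // The top frame of each sample is the method that was executing.
        rs.onEvent("jdk.ExecutionSample", e -> {
            RecordedStackTrace st = e.getStackTrace();
            if (st != null && !st.getFrames().isEmpty()) {
                RecordedMethod m = st.getFrames().get(0).getMethod();
                sink.accept("sample " + m.getType().getName() + "." + m.getName());
            }
        });
        rs.onEvent("jdk.JavaMonitorEnter", e -> sink.accept(describeMonitor("enter", e)));
        rs.onEvent("jdk.JavaMonitorWait", e -> sink.accept(describeMonitor("wait", e)));
    }

    /** Summarizes a monitor event by monitor class and blocked duration. */
    private static String describeMonitor(String kind, RecordedEvent e) {
        RecordedClass monitor = e.getClass("monitorClass");
        String name = monitor == null ? "<unknown>" : monitor.getName();
        return kind + " " + name + " " + e.getDuration().toMillis() + "ms";
    }

    public static void main(String[] args) throws Exception {
        try (RecordingStream rs = new RecordingStream()) {
            configure(rs, Duration.ofMillis(20), Duration.ofMillis(10), System.out::println);
            rs.startAsync();      // stream events on a background thread
            Thread.sleep(2_000);  // observe for two seconds, then close
        }
    }
}
```

The threshold on the monitor events is what keeps short, uncontended lock acquisitions out of the stream; lowering it increases event volume.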

Phase 5: Correlation Analysis

  • GC ↔ CPU Correlation: Detect CPU spikes within 1 second of GC events
  • GC ↔ Pinning Correlation: Identify pinning increases during garbage collection
  • Automatic Recommendations: Actionable insights for:
    • High GC overhead (> 10%)
    • Suspected memory leaks (sustained heap growth)
    • Lock contention hotspots
    • High allocation rates
    • Metaspace growth warnings
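The GC ↔ CPU correlation rule can be sketched as a simple time-window match. This is an illustration under assumptions, not the CorrelationAnalyzer implementation: it treats "within 1 second" as an absolute difference (a spike up to one second before or after a GC event), and the class, record, and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of the Argus codebase. */
public final class GcCpuCorrelationSketch {

    /** A GC event paired with a CPU spike that fell inside the window. */
    record Correlation(Instant gcTime, Instant cpuSpikeTime) {}

    /** Pairs every GC event with every CPU spike at most `window` away from it. */
    static List<Correlation> correlate(List<Instant> gcEvents,
                                       List<Instant> cpuSpikes,
                                       Duration window) {
        List<Correlation> matches = new ArrayList<>();
        for (Instant gc : gcEvents) {
            for (Instant spike : cpuSpikes) {
                // abs() makes the check symmetric: before or after the GC event.
                if (Duration.between(gc, spike).abs().compareTo(window) <= 0) {
                    matches.add(new Correlation(gc, spike));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Instant gc = Instant.parse("2026-01-27T10:00:00Z");
        List<Instant> spikes = List.of(gc.plusMillis(500), gc.plusSeconds(3));
        // Only the spike 500 ms after the GC event falls inside the 1 s window.
        System.out.println(correlate(List.of(gc), spikes, Duration.ofSeconds(1)));
    }
}
```

The nested loop is quadratic; for large event volumes a sorted two-pointer sweep would be the usual replacement.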

New Files (13)

argus-core:

  • AllocationEvent.java - TLAB allocation event record
  • MetaspaceEvent.java - Metaspace usage event record
  • ExecutionSampleEvent.java - CPU profiling sample record
  • ContentionEvent.java - Lock contention event record

argus-agent:

  • AllocationEventExtractor.java - JFR allocation extractor
  • MetaspaceEventExtractor.java - JFR metaspace extractor
  • ExecutionSampleExtractor.java - JFR execution sample extractor
  • ContentionEventExtractor.java - JFR contention extractor

argus-server:

  • AllocationAnalyzer.java - Allocation rate analysis
  • MetaspaceAnalyzer.java - Metaspace tracking
  • MethodProfilingAnalyzer.java - Hot methods analysis
  • ContentionAnalyzer.java - Lock contention analysis
  • CorrelationAnalyzer.java - Cross-metric correlation

New API Endpoints

Endpoint               Description
/allocation-analysis   Allocation rate and top allocating classes
/metaspace-metrics     Metaspace usage and growth
/method-profiling      Hot methods (Top 20)
/contention-analysis   Lock contention hotspots
/correlation           Correlation analysis and recommendations

New Configuration Options

Property                     Default   Description
argus.allocation.enabled     true      Enable allocation tracking
argus.allocation.threshold   1024      Minimum allocation size (bytes)
argus.metaspace.enabled      true      Enable metaspace monitoring
argus.profiling.enabled      false     Enable method profiling (higher overhead)
argus.profiling.interval     20        Profiling sampling interval (ms)
argus.contention.enabled     true      Enable lock contention tracking
argus.contention.threshold   10        Minimum contention duration (ms)
argus.correlation.enabled    true      Enable correlation analysis
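A sketch of how options like these could be resolved with fallbacks. It assumes the properties arrive as ordinary key/value pairs (for example JVM `-D` system properties); how Argus actually loads its configuration is not shown in this PR. The class and record names are hypothetical. The fallback values mirror the table above; note that a later commit in this PR changes the allocation and contention defaults.

```java
import java.util.Properties;

/** Hypothetical illustration; not part of the Argus codebase. */
public final class ArgusConfigSketch {

    /** A subset of the options from the table, with typed fields. */
    record ProfilingConfig(boolean allocationEnabled,
                           long allocationThresholdBytes,
                           boolean profilingEnabled,
                           long profilingIntervalMs,
                           boolean contentionEnabled,
                           long contentionThresholdMs) {}

    /** Reads each property, falling back to the default listed in the table. */
    static ProfilingConfig load(Properties p) {
        return new ProfilingConfig(
                Boolean.parseBoolean(p.getProperty("argus.allocation.enabled", "true")),
                Long.parseLong(p.getProperty("argus.allocation.threshold", "1024")),
                Boolean.parseBoolean(p.getProperty("argus.profiling.enabled", "false")),
                Long.parseLong(p.getProperty("argus.profiling.interval", "20")),
                Boolean.parseBoolean(p.getProperty("argus.contention.enabled", "true")),
                Long.parseLong(p.getProperty("argus.contention.threshold", "10")));
    }

    public static void main(String[] args) {
        // Assumption: values are supplied as -Dargus.profiling.enabled=true and similar.
        System.out.println(load(System.getProperties()));
    }
}
```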

Test plan

  • Build passes: ./gradlew build
  • Run MetricsDemo and verify new endpoints return data
  • Verify allocation-analysis shows top allocating classes
  • Verify metaspace-metrics shows memory usage
  • Verify method-profiling shows hot methods (when enabled)
  • Verify contention-analysis shows lock hotspots
  • Verify correlation endpoint returns correlations and recommendations
  • Verify dashboard displays new charts and sections
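The endpoint checks above could be scripted along these lines. This is a hypothetical smoke check, not something shipped in this PR: the class name is made up, and the base URL must be supplied by the caller because the host and port of the Argus server are not stated here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

/** Hypothetical smoke check; not part of the Argus codebase. */
public final class EndpointSmokeCheck {

    /** The five endpoints added by this PR. */
    static final List<String> ENDPOINTS = List.of(
            "/allocation-analysis",
            "/metaspace-metrics",
            "/method-profiling",
            "/contention-analysis",
            "/correlation");

    /** Joins base URL and path, dropping trailing slashes from the base. */
    static URI endpointUri(String baseUrl, String path) {
        return URI.create(baseUrl.replaceAll("/+$", "") + path);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: EndpointSmokeCheck <base-url>");
            System.exit(2);
        }
        HttpClient client = HttpClient.newHttpClient();
        for (String path : ENDPOINTS) {
            HttpRequest request = HttpRequest.newBuilder(endpointUri(args[0], path)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            // Report status and body size; an empty body suggests no data was collected.
            System.out.println(path + " -> HTTP " + response.statusCode()
                    + ", " + response.body().length() + " chars");
        }
    }
}
```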

🤖 Generated with Claude Code

rlaope and others added 6 commits January 27, 2026 10:12
Phase 3: GC Deep Dive
- Add allocation rate tracking (jdk.ObjectAllocationInNewTLAB)
- Add metaspace monitoring (jdk.MetaspaceSummary)
- Add GC overhead calculation with warnings

Phase 4: CPU/Thread Deep Dive
- Add method profiling via execution sampling (jdk.ExecutionSample)
- Add lock contention analysis (jdk.JavaMonitorEnter/Wait)

Phase 5: Correlation Analysis
- Add GC ↔ CPU spike correlation detection
- Add GC ↔ Pinning correlation detection
- Add automatic recommendations (memory leak, contention hotspots, etc.)

New API endpoints:
- /allocation-analysis
- /metaspace-metrics
- /method-profiling
- /contention-analysis
- /correlation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix CorrelationAnalyzer constructor call (no args)
- Pass correlationAnalyzer to EventBroadcaster

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- allocation.enabled: true -> false (can generate millions of events)
- allocation.threshold: 1KB -> 1MB (higher threshold reduces events)
- contention.enabled: true -> false (opt-in feature)
- contention.threshold: 10ms -> 50ms (reduces noise)

These features have high overhead and should be explicitly enabled.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change heap from 128MB to 512MB
- JFR + Netty + app requires more memory than 128MB

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- allocation.enabled: false (was true)
- allocation.threshold: 1MB (was 1KB)
- contention.enabled: false (was true)
- contention.threshold: 50ms (was 10ms)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@rlaope rlaope self-assigned this Jan 27, 2026
@rlaope rlaope added the enhancement New feature or request label Jan 27, 2026
- Add jdk.ObjectAllocationOutsideTLAB event capture
- Large objects (>TLAB size) are allocated outside TLAB and were not
  being tracked, causing allocation analysis to show 0 despite heavy
  allocation activity
- Both TLAB and outside-TLAB events now use same handler
- Update docs with advanced profiling overhead information
- Add runMetricsDemoFull gradle task for full profiling demo

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@rlaope rlaope merged commit c2829d8 into master Jan 27, 2026
8 checks passed