A Python script for analyzing Adobe Experience Manager (AEM) as a Cloud Service CDN logs. This tool filters out bot traffic, system requests, and errors to provide meaningful insights into your CDN performance and usage patterns.
- Filters out bot traffic (Google, Bing, Facebook, etc.)
- Excludes system requests, health checks, and DDoS attacks
- Analyzes traffic patterns by hour, IP, URL, and country
- Calculates cache hit ratios and average TTFB
- Generates detailed CSV reports with URL counts and top user agents
- Supports both HTML and JSON content type analysis
- Python 3.x (on macOS, available through the Xcode Command Line Tools)
- CDN log files from Adobe Cloud Manager in JSON format
git clone https://github.com/ericvangeem/aem-cdn-usage.git
cd aem-cdn-usage
python3 -m venv venv
source venv/bin/activate
You should see (venv) appear at the beginning of your terminal prompt.
pip install pandas
python analyze_logs.py <path-to-log-file>
python analyze_logs.py cdn-logs-2024-11-17.log
The script generates two types of output:
Displays comprehensive analytics including:
- Total requests (after filtering)
- Requests per hour
- Top 50 IP addresses
- Top 100 URLs
- Top 50 User Agents
- HTTP status code distribution
- Average Time to First Byte (TTFB)
- Cache hit ratio (HIT/MISS/PASS)
- Country distribution
- Content type distribution
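To make two of the metrics above concrete, here is a minimal sketch of how a cache hit ratio and an average TTFB could be computed from parsed log entries. It is not the script's actual implementation: the `summarize` helper is hypothetical, it uses only the standard library rather than pandas, and it assumes the hit ratio means HIT requests divided by all requests.

```python
from collections import Counter

def summarize(entries):
    """Return cache status counts, hit ratio, and mean TTFB for a list of entries."""
    # Count each cache status (HIT/MISS/PASS); entries without the field are grouped as UNKNOWN.
    cache_counts = Counter(e.get("cache", "UNKNOWN") for e in entries)
    total = sum(cache_counts.values())
    # Assumption: hit ratio = HIT / all requests.
    hit_ratio = cache_counts["HIT"] / total if total else 0.0
    # Average only over entries that actually carry a ttfb value.
    ttfbs = [float(e["ttfb"]) for e in entries if e.get("ttfb") is not None]
    avg_ttfb = sum(ttfbs) / len(ttfbs) if ttfbs else 0.0
    return {
        "total": total,
        "cache": dict(cache_counts),
        "hit_ratio": hit_ratio,
        "avg_ttfb": avg_ttfb,
    }

sample = [
    {"cache": "HIT", "ttfb": 10},
    {"cache": "HIT", "ttfb": 20},
    {"cache": "MISS", "ttfb": 90},
    {"cache": "PASS", "ttfb": 40},
]
result = summarize(sample)
print(result["hit_ratio"], result["avg_ttfb"])  # 0.5 40.0
```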
Generates a file named requested_urls_<input-filename>.csv containing:
- Each unique URL
- Request count for each URL
- Top 5 user agents accessing each URL
- Count for each user agent
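The report described above could be assembled roughly as follows. This is a sketch under assumptions, not the script's real code: the `write_url_report` helper, the column names, and the way user agents are packed into a single cell are all hypothetical, since the exact CSV layout is not documented here.

```python
import csv
import io
from collections import Counter, defaultdict

def write_url_report(entries, out, top_n=5):
    """Write one row per URL: request count plus its top user agents with counts."""
    url_counts = Counter()
    agents_by_url = defaultdict(Counter)
    for e in entries:
        url = e.get("url", "")
        url_counts[url] += 1
        agents_by_url[url][e.get("req_ua", "")] += 1

    writer = csv.writer(out)
    # Hypothetical header; the real report may use different columns.
    writer.writerow(["url", "count", "top_user_agents"])
    for url, count in url_counts.most_common():
        top = agents_by_url[url].most_common(top_n)
        writer.writerow([url, count, "; ".join(f"{ua} ({n})" for ua, n in top)])

sample = [
    {"url": "/a.html", "req_ua": "Firefox"},
    {"url": "/a.html", "req_ua": "Firefox"},
    {"url": "/a.html", "req_ua": "Safari"},
    {"url": "/b.html", "req_ua": "Safari"},
]
buffer = io.StringIO()
write_url_report(sample, buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows[1])
```

Writing to an in-memory buffer keeps the example self-contained; passing a file opened with `newline=""` would write the same rows to disk.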
The script expects CDN log files in JSON format with one JSON object per line. Each log entry should contain fields like:
- url - The requested URL
- status - HTTP status code
- res_ctype - Response content type
- cli_ip - Client IP address
- cli_country - Client country
- cache - Cache status (HIT/MISS/PASS)
- ttfb - Time to First Byte
- req_ua - User agent string
- ddos - DDoS flag
- timestamp - Request timestamp
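A file with one JSON object per line can be read as sketched below. The `parse_log_lines` helper is hypothetical, and silently skipping blank or malformed lines is an assumption made for this example; the script itself may handle them differently.

```python
import json

def parse_log_lines(lines):
    """Yield one dict per line that holds a valid JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: malformed lines are skipped, not fatal
        if isinstance(entry, dict):
            yield entry

sample = [
    '{"url": "/content/site/en.html", "status": 200, "cache": "HIT", "ttfb": 12}',
    "",
    "not json",
    '{"url": "/content/site/fr.html", "status": 404, "cache": "MISS", "ttfb": 80}',
]
entries = list(parse_log_lines(sample))
print(len(entries))  # 2
```

Because the function accepts any iterable of strings, an open file object can be passed directly, for example `parse_log_lines(open("cdn-logs.log", encoding="utf-8"))`.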
The script automatically excludes:
- Non-HTML/JSON content types
- HTTP status codes ≥ 300 (redirects, errors)
- Requests to /libs/* paths
- DDoS flagged requests
- Health check requests (/system/probes/health)
- Skyline service warmup requests
- Known bot traffic (Google, Bing, Facebook, etc.)
- manifest.json and favicon.ico requests
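The exclusion rules above can be expressed as a single predicate over a parsed log entry. The sketch below is illustrative only: `keep_entry` and `BOT_MARKERS` are hypothetical names, the bot substrings are examples rather than the script's real list, `ddos` is assumed to be a boolean, and the Skyline warmup rule is left out because this document does not say how those requests are identified.

```python
# Hypothetical user agent substrings; the script's actual bot list is not shown here.
BOT_MARKERS = ("googlebot", "bingbot", "facebookexternalhit")

def keep_entry(entry):
    """Return True if the entry passes every exclusion rule sketched here."""
    # Keep only HTML and JSON responses.
    ctype = str(entry.get("res_ctype", "")).lower()
    if "html" not in ctype and "json" not in ctype:
        return False
    # Drop redirects and errors (status >= 300), and entries without a usable status.
    try:
        status = int(entry["status"])
    except (KeyError, TypeError, ValueError):
        return False
    if status >= 300:
        return False
    # Assumption: ddos is a boolean flag.
    if entry.get("ddos"):
        return False
    url = str(entry.get("url", ""))
    if url.startswith("/libs/") or url.startswith("/system/probes/health"):
        return False
    if url.endswith("manifest.json") or url.endswith("favicon.ico"):
        return False
    # Case-insensitive match against the example bot markers.
    user_agent = str(entry.get("req_ua", "")).lower()
    return not any(marker in user_agent for marker in BOT_MARKERS)

good = {
    "url": "/content/site/en.html",
    "status": 200,
    "res_ctype": "text/html; charset=utf-8",
    "req_ua": "Mozilla/5.0",
    "ddos": False,
}
print(keep_entry(good))                     # True
print(keep_entry({**good, "status": 301}))  # False
```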
When you're done analyzing logs:
deactivate
To run the script again in a later session:
cd ~/path/to/aem-cdn-usage
source venv/bin/activate
python analyze_logs.py your-log-file.log
Make sure you've activated the virtual environment:
source venv/bin/activate
Check that your log file path is correct. Use the full path if needed:
python analyze_logs.py /full/path/to/your/logfile.log
You must provide a log file as an argument:
python analyze_logs.py cdn-logs.log
- Log in to Adobe Cloud Manager
- Navigate to your environment
- Go to the "Logs" section
- Select "CDN" as the log type
- Download the log files (they typically come as .log.gz files)
- Unzip the files: gunzip cdn-logs.log.gz
- Run the analysis script on the unzipped .log file
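If gunzip is not available, the unzip step can also be done with Python's standard library. The `gunzip_log` helper below is a hypothetical convenience, not part of the script. Unlike gunzip, it leaves the original .gz file in place.

```python
import gzip
import shutil
from pathlib import Path

def gunzip_log(gz_path):
    """Decompress a .gz file next to the original and return the new path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # cdn-logs.log.gz -> cdn-logs.log
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the data instead of loading it all at once
    return out_path
```

For example, `gunzip_log("cdn-logs.log.gz")` would write cdn-logs.log in the same directory, ready to pass to analyze_logs.py.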
MIT
Eric Van Geem