Add website import functionality with secure headless browser support and image URL preservation #8

Copilot · 2025-09-16T19:43:19Z

This PR adds the ml-notes import-url <url> command, which imports a web page as a note. It loads the page in a headless browser, waits for dynamic content to render, and converts the resulting HTML to clean markdown.

New Features

Core Import Command

ml-notes import-url <url> - Import any website as a new note
Title Extraction - Automatically uses the page title as the note title
HTML to Markdown - Clean conversion with smart content filtering
Dynamic Content Support - Waits for JavaScript execution and page loading

Content Processing

The import process extracts the main content by:

Prioritizing semantic elements (<article>, <main>, [role="main"])
Falling back to common content classes (.content, .post-content, etc.)
Filtering out navigation, sidebars, footers, scripts, and styles
Cleaning up excessive whitespace and formatting
Preserving Original Image URLs - Converts relative image URLs to absolute URLs while maintaining external/CDN links unchanged
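
For reviewers, the whitespace cleanup step could look roughly like the sketch below. The helper name cleanMarkdown and its exact rules are assumptions made for illustration; the code in cmd/import.go may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blankRuns matches three or more consecutive newlines.
var blankRuns = regexp.MustCompile(`\n{3,}`)

// cleanMarkdown is a hypothetical helper. It normalizes CRLF line endings,
// strips trailing spaces and tabs from every line, collapses runs of blank
// lines into a single blank line, and ends the text with exactly one newline.
func cleanMarkdown(md string) string {
	lines := strings.Split(strings.ReplaceAll(md, "\r\n", "\n"), "\n")
	for i, line := range lines {
		lines[i] = strings.TrimRight(line, " \t")
	}
	out := blankRuns.ReplaceAllString(strings.Join(lines, "\n"), "\n\n")
	return strings.TrimSpace(out) + "\n"
}

func main() {
	raw := "# Title   \n\n\n\nFirst paragraph.\t\n   \n\n\nSecond paragraph.\n\n"
	fmt.Printf("%q\n", cleanMarkdown(raw))
}
```

One trade-off of this approach: stripping trailing spaces also removes markdown hard line breaks written as two trailing spaces, and it does not treat fenced code blocks specially.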

Integration with Existing Features

Tagging Support - Use --tags "tag1,tag2" for manual tagging
Auto-tagging - Use --auto-tag for AI-powered tag generation (requires Ollama)
Search Integration - Imported notes are automatically indexed for semantic search
Timeout Control - Configure page load timeout with --timeout 30s
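
As an illustration of how the --tags and --timeout values might be interpreted, here is a minimal sketch. The helper names parseTags and parseTimeout are hypothetical, and the sketch assumes the timeout uses Go duration syntax (as the 30s and 60s examples suggest); the actual flag handling in this PR may differ.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseTags (hypothetical) splits a comma-separated --tags value,
// trims whitespace around each tag, and drops empty entries.
func parseTags(raw string) []string {
	var tags []string
	for _, part := range strings.Split(raw, ",") {
		if tag := strings.TrimSpace(part); tag != "" {
			tags = append(tags, tag)
		}
	}
	return tags
}

// parseTimeout (hypothetical) converts a value such as "30s" into a
// time.Duration and rejects values that are zero or negative.
func parseTimeout(raw string) (time.Duration, error) {
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("invalid timeout %q: %w", raw, err)
	}
	if d <= 0 {
		return 0, fmt.Errorf("timeout must be positive, got %s", d)
	}
	return d, nil
}

func main() {
	fmt.Println(parseTags("docs, reference,,"))

	d, err := parseTimeout("60s")
	fmt.Println(d, err)
}
```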

Usage Examples

# Basic website import
ml-notes import-url https://blog.example.com/article

# Import with manual tags
ml-notes import-url https://docs.example.com --tags "docs,reference"

# Import with AI auto-tagging
ml-notes import-url https://example.com/tutorial --auto-tag

# Import with custom timeout for slow sites
ml-notes import-url https://heavy-site.com --timeout 60s

Technical Implementation

Dependencies

chromedp - Headless Chrome browser automation
html-to-markdown - Clean HTML to markdown conversion with customizable rules

Secure Browser Configuration

The implementation uses a security-first approach for Chrome configuration:

SSL/TLS Validation - Full certificate validation for secure connections
Web Security Enabled - Standard browser security measures maintained
Smart Sandbox Management - Only disables sandbox in CI/container environments when necessary
Environment Detection - Automatically detects CI, Docker, and AppArmor restrictions
Modern User Agent - Uses current Chrome user agent for better compatibility
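
The environment detection described above could be approximated as follows. This is a sketch under stated assumptions, not the code from this PR: the type envProbe and the function restrictedEnvironment are hypothetical names, the check covers only the CI and GITHUB_ACTIONS environment variables and the /.dockerenv marker file, and AppArmor detection is not covered.

```go
package main

import (
	"fmt"
	"os"
)

// envProbe (hypothetical) abstracts the two system lookups so the decision
// logic can be exercised without touching the real environment.
type envProbe struct {
	getenv     func(string) string
	fileExists func(string) bool
}

// restrictedEnvironment reports whether the process appears to run in CI
// or inside a Docker container. Only in that case would the caller disable
// the Chrome sandbox; on a desktop the sandbox stays enabled.
func restrictedEnvironment(p envProbe) bool {
	for _, name := range []string{"CI", "GITHUB_ACTIONS"} {
		if p.getenv(name) != "" {
			return true
		}
	}
	// Docker creates this marker file at the container root.
	return p.fileExists("/.dockerenv")
}

// osProbe wires the probe to the real operating system.
func osProbe() envProbe {
	return envProbe{
		getenv: os.Getenv,
		fileExists: func(path string) bool {
			_, err := os.Stat(path)
			return err == nil
		},
	}
}

func main() {
	if restrictedEnvironment(osProbe()) {
		fmt.Println("restricted environment: Chrome sandbox would be disabled")
	} else {
		fmt.Println("desktop environment: Chrome sandbox stays enabled")
	}
}
```

Keeping the lookups behind a small struct makes the decision testable with stubs, which matches the "Environment detection validation" item in the test list.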

Image URL Preservation

To maintain visual fidelity of imported content:

Relative to Absolute Conversion - Converts /logo.png to https://example.com/logo.png
External URLs Unchanged - Preserves CDN and external image links
Protocol Preservation - Maintains HTTPS for secure sites
Complex Path Resolution - Handles ../images/photo.jpg style relative paths
Fallback Protection - Gracefully handles malformed URLs
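
A minimal sketch of the resolution rules listed above, built on url.ResolveReference from Go's net/url package. The function name resolveImageURL is hypothetical, and returning the original src on a parse failure is an assumption about what "Fallback Protection" means here.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveImageURL (hypothetical) resolves an <img> src against the URL of
// the page it came from. Absolute URLs pass through unchanged, relative and
// protocol-relative URLs are made absolute, and anything that fails to
// parse is returned as-is.
func resolveImageURL(pageURL, src string) string {
	if src == "" {
		return src
	}
	base, err := url.Parse(pageURL)
	if err != nil {
		return src
	}
	ref, err := url.Parse(src)
	if err != nil {
		return src
	}
	return base.ResolveReference(ref).String()
}

func main() {
	page := "https://example.com/blog/post/index.html"
	sources := []string{
		"/logo.png",
		"../images/photo.jpg",
		"//cdn.example.com/image.png",
		"https://cdn.example.net/a.png",
		"%zz", // malformed escape sequence, returned unchanged
	}
	for _, src := range sources {
		fmt.Printf("%-30s -> %s\n", src, resolveImageURL(page, src))
	}
}
```

Because the scheme is inherited from the page URL, an HTTPS page yields HTTPS image links and an HTTP page yields HTTP ones.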

Error Handling

Comprehensive error handling for:

Invalid URLs and malformed input
Network connectivity issues
Page load timeouts
Empty or inaccessible content
Browser startup failures
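
To illustrate the first item, input validation before the browser starts might look like the sketch below. The function name validateURL and the restriction to http and https are assumptions for illustration, not confirmed details of cmd/import.go.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateURL (hypothetical) accepts only absolute http or https URLs that
// include a host, so obviously bad input fails before a browser is launched.
func validateURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("invalid URL %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q: only http and https are accepted", u.Scheme)
	}
	if u.Hostname() == "" {
		return nil, errors.New("URL has no host")
	}
	return u, nil
}

func main() {
	inputs := []string{
		"https://example.com/article",
		"ftp://example.com",
		"not a url",
		"https://",
	}
	for _, in := range inputs {
		if _, err := validateURL(in); err != nil {
			fmt.Printf("%-30q rejected: %v\n", in, err)
			continue
		}
		fmt.Printf("%-30q accepted\n", in)
	}
}
```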

Security Features

The browser configuration prioritizes security when interacting with live websites:

Certificate Validation - Validates SSL certificates and respects certificate pinning
Sandbox Protection - Uses Chrome's sandbox when possible for isolation
Conditional Security Bypass - Only applies minimal security bypasses in restricted environments (CI/containers)
Normal Browser Behavior - Acts like a standard secure browser for desktop usage

Documentation

Added comprehensive documentation for the new feature:

README.md - Added "Website Import" to features list and detailed CLI usage section with examples
USAGE_GUIDE.md - Added import-url to Essential Commands Summary table
Command Help - Comprehensive help text with usage examples and feature descriptions

Testing

Added comprehensive test coverage including:

URL validation testing
Markdown content cleaning functionality
Environment detection validation
Image URL resolution testing
Integration tests with static and dynamic content
Edge case testing (timeouts, invalid URLs, missing files)
Regression testing to ensure existing functionality remains intact

All existing tests continue to pass, which indicates that the change does not break existing functionality.

Example Output

🌐 Importing from: https://example.com/article
📄 Page title: How to Use ML Notes
📝 Content extracted (1,247 characters)
🏷️  Auto-generated tags: tutorial, documentation, ml

✅ Note imported successfully!
ID: 42
Title: How to Use ML Notes
Tags: tutorial, documentation, ml
Created: 2025-09-16 14:30:15
Source: https://example.com/article

This feature lets users capture web content as structured notes that keep their formatting, images, and searchability, while the browser runs with secure defaults when it visits live websites.


Co-authored-by: streed <805140+streed@users.noreply.github.com>

streed · 2025-09-16T20:04:42Z

@copilot we should use secure settings for the headless chrome. We will be interacting with live websites and we should be acting as a normal browser here.

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Copilot · 2025-09-16T20:10:19Z

@copilot we should use secure settings for the headless chrome. We will be interacting with live websites and we should be acting as a normal browser here.

I've implemented secure Chrome configuration settings that act like a normal browser when interacting with live websites. The changes include:

Security Improvements:

✅ Enabled SSL certificate validation (removed ignore-certificate-errors flags)
✅ Enabled web security (removed disable-web-security flag)
✅ Smart sandbox detection - only disables sandbox in CI/container environments
✅ Updated to modern Chrome user agent

Environment Detection:
The browser now automatically detects restricted environments (CI, Docker, AppArmor) and only applies minimal security bypasses when necessary. For normal desktop usage, it maintains full security.

Commit: a25587c

This ensures ml-notes import behaves securely when accessing real websites while still working in CI environments.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

accounts.google.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=7666 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED491856126 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,1888937985379864243,17653849207025809653,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-REDACTED-networking --disable-breakpad --disable-extensions --metrics-recording-only --safebrowsing-disable-auto-update --mute-audio --force-color-profile=srgb --enable-automation --disable-gpu --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --disable-popup-blocking --no-sandbox --disable-REDACTEDing-occluded-windows --disable-dev-shm-usage --disable-renderer-REDACTEDing --no-first-run --hide-scrollbars --disable-default-apps --disable-ipc-flooding-protection --disable-prompt-on-repost --enable-features=NetworkService,NetworkServiceInProcess --disable-sync --use-mock-keychain --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --headless --disable-hang-monitor --password-store=basic --user-data-dir=/tmp/chromedp-REDACTED491856126 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
clients2.google.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=7666 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED491856126 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,1888937985379864243,17653849207025809653,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-REDACTED-networking --disable-breakpad --disable-extensions --metrics-recording-only --safebrowsing-disable-auto-update --mute-audio --force-color-profile=srgb --enable-automation --disable-gpu --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --disable-popup-blocking --no-sandbox --disable-REDACTEDing-occluded-windows --disable-dev-shm-usage --disable-renderer-REDACTEDing --no-first-run --hide-scrollbars --disable-default-apps --disable-ipc-flooding-protection --disable-prompt-on-repost --enable-features=NetworkService,NetworkServiceInProcess --disable-sync --use-mock-keychain --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --headless --disable-hang-monitor --password-store=basic --user-data-dir=/tmp/chromedp-REDACTED491856126 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
www.google.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=7666 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED491856126 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,1888937985379864243,17653849207025809653,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-REDACTED-networking --disable-breakpad --disable-extensions --metrics-recording-only --safebrowsing-disable-auto-update --mute-audio --force-color-profile=srgb --enable-automation --disable-gpu --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --disable-popup-blocking --no-sandbox --disable-REDACTEDing-occluded-windows --disable-dev-shm-usage --disable-renderer-REDACTEDing --no-first-run --hide-scrollbars --disable-default-apps --disable-ipc-flooding-protection --disable-prompt-on-repost --enable-features=NetworkService,NetworkServiceInProcess --disable-sync --use-mock-keychain --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --headless --disable-hang-monitor --password-store=basic --user-data-dir=/tmp/chromedp-REDACTED491856126 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

streed · 2025-09-16T20:14:41Z

@copilot image links should still point to their original links to maintain the look of the note as close to the website as possible.

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Copilot · 2025-09-16T20:26:21Z

@copilot image links should still point to their original links to maintain the look of the note as close to the website as possible.

I've implemented proper image URL preservation that maintains the original website's appearance. The solution includes:

URL Resolution Features:

✅ Converts relative URLs (/logo.png) to absolute URLs (https://example.com/logo.png)
✅ Preserves HTTPS for secure sites, HTTP for non-secure sites
✅ Leaves external/CDN URLs completely unchanged
✅ Handles protocol-relative URLs (//cdn.example.com/image.png)
✅ Resolves complex relative paths (../images/photo.jpg)

Implementation:

Custom markdown conversion rule for <img> tags
Uses Go's url.ResolveReference() for proper URL handling
Comprehensive test coverage for various URL scenarios

Result: All images now point to their original sources, maintaining the visual fidelity of the imported website content.

Commit: 181def2

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

accounts.google.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
android.clients.google.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-breakpad --disable-hang-monitor --enable-features=NetworkService,NetworkServiceInProcess --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --mute-audio --disable-popup-blocking --disable-prompt-on-repost --safebrowsing-disable-auto-update --enable-automation --headless --disable-default-apps --disable-ipc-flooding-protection --disable-sync --force-color-profile=srgb --metrics-recording-only --password-store=basic --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-client-side-phishing-detection --use-mock-keychain --no-sandbox --no-first-run --hide-scrollbars --disable-REDACTED-networking --disable-gpu --disable-REDACTEDing-occluded-windows --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --user-data-dir=/tmp/chromedp-REDACTED3648146207 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=8874 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2564501405 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,10977512617465755840,17708370500840894483,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
clients2.google.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
example.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
www.google.com
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
- Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

streed · 2025-09-16T20:42:33Z

@copilot let's rename the import command to import-url and add documentation where needed to cover this new feature.

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Copilot · 2025-09-16T20:49:37Z

@copilot let's rename the import command to import-url and add documentation where needed to cover this new feature.

I've renamed the command from import to import-url and added comprehensive documentation as requested.

Command Changes:

✅ Renamed: ml-notes import <url> → ml-notes import-url <url>
✅ Updated all help text and examples

Documentation Added:

✅ README.md: Added "Website Import" to features list and comprehensive CLI usage section
✅ USAGE_GUIDE.md: Added import-url to Essential Commands Summary table
✅ Examples: Basic import, tagging, auto-tagging, and timeout scenarios
✅ Features: Documented headless browser, content extraction, image preservation, and security

Usage Examples:

ml-notes import-url https://blog.example.com/article
ml-notes import-url https://docs.example.com --tags "docs,reference"
ml-notes import-url https://example.com --auto-tag

Commit: eb4347d

Copilot

Pull Request Overview

This PR adds website import functionality to ml-notes, enabling users to import web pages as notes using the new import-url command. The feature uses a headless Chrome browser to handle dynamic content and converts HTML to clean markdown format while preserving image URLs.

Implements ml-notes import-url <url> command with Chrome automation
Adds secure headless browser support with environment-specific security configurations
Preserves image URLs by converting relative paths to absolute URLs

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.


File	Description
go.mod	Updates Go version and adds new dependencies for HTML parsing and Chrome automation
docs/USAGE_GUIDE.md	Adds import-url command to the essential commands summary table
cmd/import_test.go	Comprehensive test coverage for URL validation, content cleaning, and environment detection
cmd/import.go	Core implementation of website import functionality with headless browser support
README.md	Documents the new website import feature with usage examples and feature descriptions


Initial plan

a8c4052

Copilot AI assigned Copilot and streed Sep 16, 2025

Copilot started work on behalf of streed September 16, 2025 19:43

Copilot AI and others added 2 commits September 16, 2025 19:51

Implement website import functionality with headless browser

629fcb1

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Add tests and finalize website import feature

7104a1d

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Copilot stopped work on behalf of streed due to an error September 16, 2025 19:54
Copilot has encountered an error. See logs for additional details.

Copilot AI requested a review from streed September 16, 2025 19:54

Copilot started work on behalf of streed September 16, 2025 20:04

Implement secure Chrome configuration for website import

a25587c

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Copilot AI changed the title ~~Add website import functionality with headless browser support~~ Add website import functionality with secure headless browser support Sep 16, 2025

Copilot finished work on behalf of streed September 16, 2025 20:11

Copilot started work on behalf of streed September 16, 2025 20:14

Preserve original image URLs in imported website content

181def2

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Copilot AI changed the title ~~Add website import functionality with secure headless browser support~~ Add website import functionality with secure headless browser support and image URL preservation Sep 16, 2025

Copilot finished work on behalf of streed September 16, 2025 20:27

Copilot started work on behalf of streed September 16, 2025 20:42

Rename import command to import-url and add documentation

eb4347d

Co-authored-by: streed <805140+streed@users.noreply.github.com>

Copilot finished work on behalf of streed September 16, 2025 20:50

streed marked this pull request as ready for review September 16, 2025 20:56

Copilot AI review requested due to automatic review settings September 16, 2025 20:56

Copilot AI reviewed Sep 16, 2025


streed merged commit cc7a234 into main Sep 16, 2025
8 checks passed

streed deleted the copilot/fix-1391f1db-d7fa-4f9e-889d-1aaf714fb26e branch September 16, 2025 20:58

This was referenced Sep 17, 2025

Lets Convert To Wails #9

Merged

Update Installer Scripts #10

Merged


