Copilot AI commented Sep 16, 2025

This PR implements the ability to import websites as notes using the new ml-notes import-url <url> command. The feature uses a headless browser to load web pages, wait for dynamic content, and convert the HTML to clean markdown format.

New Features

Core Import Command

  • ml-notes import-url <url> - Import any website as a new note
  • Title Extraction - Automatically uses the page title as the note title
  • HTML to Markdown - Clean conversion with smart content filtering
  • Dynamic Content Support - Waits for JavaScript execution and page loading

Content Processing

The import process intelligently extracts the main content by:

  • Prioritizing semantic elements (<article>, <main>, [role="main"])
  • Falling back to common content classes (.content, .post-content, etc.)
  • Filtering out navigation, sidebars, footers, scripts, and styles
  • Cleaning up excessive whitespace and formatting
  • Preserving original image URLs by converting relative image URLs to absolute URLs while leaving external/CDN links unchanged
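
As a rough illustration of the selection order and whitespace cleanup described above, here is a minimal sketch using only the Go standard library. The names pickContentSelector and cleanMarkdown, the exact selector list, and the "body" fallback are assumptions made for this example and are not taken from cmd/import.go; the exists callback stands in for whatever DOM query the real implementation runs in the browser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// contentSelectors lists candidate containers in priority order:
// semantic elements first, then common content classes.
var contentSelectors = []string{"article", "main", `[role="main"]`, ".content", ".post-content"}

// pickContentSelector (hypothetical helper) returns the first selector that
// matches something on the page. "body" as the last resort is an assumption.
func pickContentSelector(exists func(selector string) bool) string {
	for _, sel := range contentSelectors {
		if exists(sel) {
			return sel
		}
	}
	return "body"
}

// blankRuns matches three or more consecutive newlines.
var blankRuns = regexp.MustCompile(`\n{3,}`)

// cleanMarkdown (hypothetical helper) trims trailing whitespace on each line,
// collapses runs of blank lines into a single blank line, and trims both ends.
func cleanMarkdown(md string) string {
	lines := strings.Split(strings.ReplaceAll(md, "\r\n", "\n"), "\n")
	for i, line := range lines {
		lines[i] = strings.TrimRight(line, " \t")
	}
	out := blankRuns.ReplaceAllString(strings.Join(lines, "\n"), "\n\n")
	return strings.TrimSpace(out)
}

func main() {
	// Pretend the page has a <main> element and a .content div but no <article>.
	onPage := map[string]bool{"main": true, ".content": true}
	fmt.Println(pickContentSelector(func(s string) bool { return onPage[s] }))
	fmt.Printf("%q\n", cleanMarkdown("# Title  \n\n\n\nBody text\t\n\n"))
}
```

Passing the existence check in as a function keeps the priority logic testable without launching a browser.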

Integration with Existing Features

  • Tagging Support - Use --tags "tag1,tag2" for manual tagging
  • Auto-tagging - Use --auto-tag for AI-powered tag generation (requires Ollama)
  • Search Integration - Imported notes are automatically indexed for semantic search
  • Timeout Control - Configure page load timeout with --timeout 30s

Usage Examples

# Basic website import
ml-notes import-url https://blog.example.com/article

# Import with manual tags
ml-notes import-url https://docs.example.com --tags "docs,reference"

# Import with AI auto-tagging
ml-notes import-url https://example.com/tutorial --auto-tag

# Import with custom timeout for slow sites
ml-notes import-url https://heavy-site.com --timeout 60s

Technical Implementation

Dependencies

  • chromedp - Headless Chrome browser automation
  • html-to-markdown - Clean HTML to markdown conversion with customizable rules

Secure Browser Configuration

The implementation uses a security-first approach for Chrome configuration:

  • SSL/TLS Validation - Full certificate validation for secure connections
  • Web Security Enabled - Standard browser security measures maintained
  • Smart Sandbox Management - Only disables sandbox in CI/container environments when necessary
  • Environment Detection - Automatically detects CI, Docker, and AppArmor restrictions
  • Modern User Agent - Uses current Chrome user agent for better compatibility
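
One way to picture the environment detection is the sketch below, which keeps the sandbox on unless a CI or Docker marker is present. Everything here is illustrative: envProbe, needsNoSandbox, and chromeFlags are hypothetical names, the CI and GITHUB_ACTIONS variables and the /.dockerenv file are common markers chosen as assumptions, and AppArmor detection is not covered. The flag names are returned as plain strings so the example stays within the standard library.

```go
package main

import (
	"fmt"
	"os"
)

// envProbe abstracts environment lookups so the detection logic can be tested.
type envProbe struct {
	getenv     func(string) string
	fileExists func(string) bool
}

// needsNoSandbox (hypothetical helper) reports whether the process appears to
// run in CI or inside a Docker container, where Chrome's sandbox often fails.
func needsNoSandbox(p envProbe) bool {
	if p.getenv("CI") != "" || p.getenv("GITHUB_ACTIONS") != "" {
		return true
	}
	return p.fileExists("/.dockerenv")
}

// chromeFlags (hypothetical helper) returns Chrome switch names. The sandbox
// bypass is added only when the environment is restricted; certificate and
// web-security checks are never disabled.
func chromeFlags(restricted bool) []string {
	flags := []string{"headless", "disable-gpu"}
	if restricted {
		flags = append(flags, "no-sandbox", "disable-dev-shm-usage")
	}
	return flags
}

func main() {
	probe := envProbe{
		getenv: os.Getenv,
		fileExists: func(path string) bool {
			_, err := os.Stat(path)
			return err == nil
		},
	}
	fmt.Println(chromeFlags(needsNoSandbox(probe)))
}
```

Defaulting to the sandboxed configuration means a failed detection errs on the side of security rather than convenience.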

Image URL Preservation

To maintain visual fidelity of imported content:

  • Relative to Absolute Conversion - Converts /logo.png to https://example.com/logo.png
  • External URLs Unchanged - Preserves CDN and external image links
  • Protocol Preservation - Maintains HTTPS for secure sites
  • Complex Path Resolution - Handles ../images/photo.jpg style relative paths
  • Fallback Protection - Gracefully handles malformed URLs
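
The resolution rules above map closely onto Go's net/url package. The sketch below shows the idea with a hypothetical resolveImageURL helper; returning the original src unchanged on a parse failure is an assumption about what "graceful" fallback means here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveImageURL (hypothetical helper) resolves an <img> src against the URL
// of the page it came from. Absolute URLs pass through unchanged; if either
// URL fails to parse, the original src is returned as-is.
func resolveImageURL(pageURL, src string) string {
	base, err := url.Parse(pageURL)
	if err != nil {
		return src
	}
	ref, err := url.Parse(strings.TrimSpace(src))
	if err != nil {
		return src
	}
	return base.ResolveReference(ref).String()
}

func main() {
	page := "https://example.com/blog/post.html"
	for _, src := range []string{
		"/logo.png",                     // root-relative path
		"../images/photo.jpg",           // parent-directory path
		"https://cdn.example.net/a.png", // external URL, left unchanged
		"%zz",                           // malformed escape, returned as-is
	} {
		fmt.Printf("%-32s -> %s\n", src, resolveImageURL(page, src))
	}
}
```

Because ResolveReference takes the scheme from the base URL, an image on an HTTPS page resolves to an HTTPS URL without any special handling.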

Error Handling

Comprehensive error handling for:

  • Invalid URLs and malformed input
  • Network connectivity issues
  • Page load timeouts
  • Empty or inaccessible content
  • Browser startup failures

Security Features

The browser configuration prioritizes security when interacting with live websites:

  • Certificate Validation - Validates SSL certificates and respects certificate pinning
  • Sandbox Protection - Uses Chrome's sandbox when possible for isolation
  • Conditional Security Bypass - Only applies minimal security bypasses in restricted environments (CI/containers)
  • Normal Browser Behavior - Acts like a standard secure browser for desktop usage

Documentation

Added comprehensive documentation for the new feature:

  • README.md - Added "Website Import" to features list and detailed CLI usage section with examples
  • USAGE_GUIDE.md - Added import-url to Essential Commands Summary table
  • Command Help - Comprehensive help text with usage examples and feature descriptions

Testing

Added comprehensive test coverage including:

  • URL validation testing
  • Markdown content cleaning functionality
  • Environment detection validation
  • Image URL resolution testing
  • Integration tests with static and dynamic content
  • Edge case testing (timeouts, invalid URLs, missing files)
  • Regression testing to ensure existing functionality remains intact

All existing tests continue to pass, ensuring no breaking changes to the existing codebase.

Example Output

🌐 Importing from: https://example.com/article
📄 Page title: How to Use ML Notes
📝 Content extracted (1,247 characters)
🏷️  Auto-generated tags: tutorial, documentation, ml

✅ Note imported successfully!
ID: 42
Title: How to Use ML Notes
Tags: tutorial, documentation, ml
Created: 2025-09-16 14:30:15
Source: https://example.com/article

This feature enables users to quickly capture web content as structured notes while maintaining the rich formatting, visual fidelity, and searchability that ml-notes provides, with robust security measures for safe interaction with live websites.



Copilot AI and others added 2 commits September 16, 2025 19:51
Co-authored-by: streed <805140+streed@users.noreply.github.com>
Co-authored-by: streed <805140+streed@users.noreply.github.com>
Copilot AI changed the title from "[WIP] I want to be able to import a website. I should be able to provide a url to a cli command such as ml-notes import <url> and it'll create a new note based on the title of the page, and the body of the page should be converted to markdown as much as po..." to "Add website import functionality with headless browser support" Sep 16, 2025
Copilot AI requested a review from streed September 16, 2025 19:54
Owner

streed commented Sep 16, 2025

@copilot we should use secure settings for the headless chrome. We will be interacting with live websites and we should be acting as a normal browser here.

Co-authored-by: streed <805140+streed@users.noreply.github.com>
Contributor Author

Copilot AI commented Sep 16, 2025

@copilot we should use secure settings for the headless chrome. We will be interacting with live websites and we should be acting as a normal browser here.

I've implemented secure Chrome configuration settings that act like a normal browser when interacting with live websites. The changes include:

Security Improvements:

  • ✅ Enabled SSL certificate validation (removed ignore-certificate-errors flags)
  • ✅ Enabled web security (removed disable-web-security flag)
  • ✅ Smart sandbox detection - only disables sandbox in CI/container environments
  • ✅ Updated to modern Chrome user agent

Environment Detection:
The browser now automatically detects restricted environments (CI, Docker, AppArmor) and only applies minimal security bypasses when necessary. For normal desktop usage, it maintains full security.

Commit: a25587c

This ensures ml-notes import behaves securely when accessing real websites while still working in CI environments.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • accounts.google.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=7666 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED491856126 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,1888937985379864243,17653849207025809653,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-REDACTED-networking --disable-breakpad --disable-extensions --metrics-recording-only --safebrowsing-disable-auto-update --mute-audio --force-color-profile=srgb --enable-automation --disable-gpu --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --disable-popup-blocking --no-sandbox --disable-REDACTEDing-occluded-windows --disable-dev-shm-usage --disable-renderer-REDACTEDing --no-first-run --hide-scrollbars --disable-default-apps --disable-ipc-flooding-protection --disable-prompt-on-repost --enable-features=NetworkService,NetworkServiceInProcess --disable-sync --use-mock-keychain --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --headless --disable-hang-monitor --password-store=basic --user-data-dir=/tmp/chromedp-REDACTED491856126 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
  • clients2.google.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=7666 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED491856126 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,1888937985379864243,17653849207025809653,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-REDACTED-networking --disable-breakpad --disable-extensions --metrics-recording-only --safebrowsing-disable-auto-update --mute-audio --force-color-profile=srgb --enable-automation --disable-gpu --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --disable-popup-blocking --no-sandbox --disable-REDACTEDing-occluded-windows --disable-dev-shm-usage --disable-renderer-REDACTEDing --no-first-run --hide-scrollbars --disable-default-apps --disable-ipc-flooding-protection --disable-prompt-on-repost --enable-features=NetworkService,NetworkServiceInProcess --disable-sync --use-mock-keychain --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --headless --disable-hang-monitor --password-store=basic --user-data-dir=/tmp/chromedp-REDACTED491856126 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
  • www.google.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=7666 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED491856126 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,1888937985379864243,17653849207025809653,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-REDACTED-networking --disable-breakpad --disable-extensions --metrics-recording-only --safebrowsing-disable-auto-update --mute-audio --force-color-profile=srgb --enable-automation --disable-gpu --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --disable-popup-blocking --no-sandbox --disable-REDACTEDing-occluded-windows --disable-dev-shm-usage --disable-renderer-REDACTEDing --no-first-run --hide-scrollbars --disable-default-apps --disable-ipc-flooding-protection --disable-prompt-on-repost --enable-features=NetworkService,NetworkServiceInProcess --disable-sync --use-mock-keychain --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --headless --disable-hang-monitor --password-store=basic --user-data-dir=/tmp/chromedp-REDACTED491856126 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title from "Add website import functionality with headless browser support" to "Add website import functionality with secure headless browser support" Sep 16, 2025
Owner

streed commented Sep 16, 2025

@copilot image links should still point to their original links to maintain the look of the note as close to the website as possible.

Co-authored-by: streed <805140+streed@users.noreply.github.com>
Contributor Author

Copilot AI commented Sep 16, 2025

@copilot image links should still point to their original links to maintain the look of the note as close to the website as possible.

I've implemented proper image URL preservation that maintains the original website's appearance. The solution includes:

URL Resolution Features:

  • ✅ Converts relative URLs (/logo.png) to absolute URLs (https://example.com/logo.png)
  • ✅ Preserves HTTPS for secure sites, HTTP for non-secure sites
  • ✅ Leaves external/CDN URLs completely unchanged
  • ✅ Handles protocol-relative URLs (//cdn.example.com/image.png)
  • ✅ Resolves complex relative paths (../images/photo.jpg)

Implementation:

  • Custom markdown conversion rule for <img> tags
  • Uses Go's url.ResolveReference() for proper URL handling
  • Comprehensive test coverage for various URL scenarios

Result: All images now point to their original sources, maintaining the visual fidelity of the imported website content.

Commit: 181def2

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • accounts.google.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
  • android.clients.google.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --disable-breakpad --disable-hang-monitor --enable-features=NetworkService,NetworkServiceInProcess --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --mute-audio --disable-popup-blocking --disable-prompt-on-repost --safebrowsing-disable-auto-update --enable-automation --headless --disable-default-apps --disable-ipc-flooding-protection --disable-sync --force-color-profile=srgb --metrics-recording-only --password-store=basic --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-client-side-phishing-detection --use-mock-keychain --no-sandbox --no-first-run --hide-scrollbars --disable-REDACTED-networking --disable-gpu --disable-REDACTEDing-occluded-windows --enable-unsafe-swiftshader --no-default-browser-check --disable-REDACTED-timer-throttling --user-data-dir=/tmp/chromedp-REDACTED3648146207 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=8874 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2564501405 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,10977512617465755840,17708370500840894483,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
  • clients2.google.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
  • example.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
  • www.google.com
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=5920 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED2276477603 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,8815604363791483137,2765502903778978695,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --no-default-browser-check --disable-breakpad --disable-default-apps --disable-hang-monitor --disable-renderer-REDACTEDing --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --disable-REDACTEDing-occluded-windows --use-mock-keychain --disable-REDACTED-networking --disable-REDACTED-timer-throttling --disable-client-side-phishing-detection --disable-prompt-on-repost --disable-sync --force-color-profile=srgb --hide-scrollbars --disable-dev-shm-usage --disable-extensions --disable-gpu --enable-features=NetworkService,NetworkServiceInProcess --disable-features=site-per-process,Translate,BlinkGenPropertyTrees --password-store=basic --mute-audio --metrics-recording-only --safebrowsing-disable-auto-update --no-sandbox --no-first-run --disable-popup-blocking --enable-automation --headless --disable-ipc-flooding-protection --enable-unsafe-swiftshader --user-data-dir=/tmp/chromedp-REDACTED2276477603 --remote-debugging-port=0 --noerrdialogs --ozone-platform=headless --ozone-override-screen-size=800,600 --use-angle=swiftshader-webgl about:blank (dns block)
    • Triggering command: /usr/local/share/chromium/chrome-linux/chrome --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-US --service-sandbox-type=network --no-sandbox --disable-dev-shm-usage --use-angle=swiftshader-webgl --mute-audio --crashpad-handler-pid=6135 --enable-crash-reporter=, --noerrdialogs --user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 --user-data-dir=/tmp/chromedp-REDACTED3648146207 --subproc-heap-profiling --change-stack-guard-on-fork=enable --shared-files=network_parent_dirs_pipe:100,v8_context_snapshot_data:101 --field-trial-handle=3,i,11200105158007422015,1384752722888722307,262144 --enable-features=NetworkService,NetworkServiceInProcess --disable-features=BlinkGenPropertyTrees,PaintHolding,Translate,site-per-process --variations-seed-version (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title from "Add website import functionality with secure headless browser support" to "Add website import functionality with secure headless browser support and image URL preservation" Sep 16, 2025
Owner

streed commented Sep 16, 2025

@copilot let's rename the import command to import-url and add documentation where needed to cover this new feature.

Co-authored-by: streed <805140+streed@users.noreply.github.com>
Contributor Author

Copilot AI commented Sep 16, 2025

@copilot let's rename the import command to import-url and add documentation where needed to cover this new feature.

I've renamed the command from import to import-url and added comprehensive documentation as requested.

Command Changes:

  • ✅ Renamed: ml-notes import <url> → ml-notes import-url <url>
  • ✅ Updated all help text and examples

Documentation Added:

  • README.md: Added "Website Import" to features list and comprehensive CLI usage section
  • USAGE_GUIDE.md: Added import-url to Essential Commands Summary table
  • Examples: Basic import, tagging, auto-tagging, and timeout scenarios
  • Features: Documented headless browser, content extraction, image preservation, and security

Usage Examples:

ml-notes import-url https://blog.example.com/article
ml-notes import-url https://docs.example.com --tags "docs,reference"
ml-notes import-url https://example.com --auto-tag

Commit: eb4347d

streed marked this pull request as ready for review September 16, 2025 20:56
Copilot AI review requested due to automatic review settings September 16, 2025 20:56
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds website import functionality to ml-notes, enabling users to import web pages as notes using the new import-url command. The feature uses a headless Chrome browser to handle dynamic content and converts HTML to clean markdown format while preserving image URLs.

  • Implements ml-notes import-url <url> command with Chrome automation
  • Adds secure headless browser support with environment-specific security configurations
  • Preserves image URLs by converting relative paths to absolute URLs

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.

Summary per file:

  • go.mod - Updates Go version and adds new dependencies for HTML parsing and Chrome automation
  • docs/USAGE_GUIDE.md - Adds import-url command to the essential commands summary table
  • cmd/import_test.go - Comprehensive test coverage for URL validation, content cleaning, and environment detection
  • cmd/import.go - Core implementation of website import functionality with headless browser support
  • README.md - Documents the new website import feature with usage examples and feature descriptions


streed merged commit cc7a234 into main Sep 16, 2025
8 checks passed
streed deleted the copilot/fix-1391f1db-d7fa-4f9e-889d-1aaf714fb26e branch September 16, 2025 20:58
This was referenced Sep 17, 2025