Add website import functionality with secure headless browser support and image URL preservation #8
Co-authored-by: streed <805140+streed@users.noreply.github.com>
ml-notes import <url> and it'll create a new note based on the title of the page, and the body of the page should be converted to markdown as much as po...
@copilot we should use secure settings for the headless Chrome. We will be interacting with live websites and we should be acting as a normal browser here.
Co-authored-by: streed <805140+streed@users.noreply.github.com>
I've implemented secure Chrome configuration settings that act like a normal browser when interacting with live websites. The changes include:
Security Improvements:
Environment Detection:
Commit: a25587c
This ensures ml-notes import behaves securely when accessing real websites while still working in CI environments.
Warning: Firewall rules blocked me from connecting to one or more addresses.
@copilot image links should still point to their original links to keep the look of the note as close to the website as possible.
Co-authored-by: streed <805140+streed@users.noreply.github.com>
I've implemented proper image URL preservation that maintains the original website's appearance. The solution includes:
URL Resolution Features:
Implementation:
Result: All images now point to their original sources, maintaining the visual fidelity of the imported website content.
Commit: 181def2
Warning: Firewall rules blocked me from connecting to one or more addresses.
@copilot let's rename the import command to |
Co-authored-by: streed <805140+streed@users.noreply.github.com>
I've renamed the command from Command Changes:
Documentation Added:
Usage Examples:
ml-notes import-url https://blog.example.com/article
ml-notes import-url https://docs.example.com --tags "docs,reference"
ml-notes import-url https://example.com --auto-tag
Commit: eb4347d
Pull Request Overview
This PR adds website import functionality to ml-notes, enabling users to import web pages as notes using the new import-url command. The feature uses a headless Chrome browser to handle dynamic content and converts HTML to clean markdown format while preserving image URLs.
- Implements `ml-notes import-url <url>` command with Chrome automation
- Adds secure headless browser support with environment-specific security configurations
- Preserves image URLs by converting relative paths to absolute URLs
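The relative-to-absolute conversion can be sketched with Go's standard `net/url` package. This is a minimal illustration, not the code in cmd/import.go: the function name `resolveImageURL` and the example page URL are assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveImageURL is a hypothetical helper: it resolves an image src
// against the URL of the page it was found on, following RFC 3986.
func resolveImageURL(pageURL, src string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", fmt.Errorf("parse page URL: %w", err)
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", fmt.Errorf("parse image src: %w", err)
	}
	// ResolveReference leaves already-absolute URLs untouched.
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://example.com/blog/post/" // assumed page URL
	for _, src := range []string{"/logo.png", "../images/photo.jpg", "https://cdn.example.com/a.png"} {
		abs, err := resolveImageURL(page, src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(src, "->", abs)
	}
}
```

Resolving against the page URL rather than only the site root means `../` style paths are interpreted the same way a browser would interpret them.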
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| go.mod | Updates Go version and adds new dependencies for HTML parsing and Chrome automation |
| docs/USAGE_GUIDE.md | Adds import-url command to the essential commands summary table |
| cmd/import_test.go | Comprehensive test coverage for URL validation, content cleaning, and environment detection |
| cmd/import.go | Core implementation of website import functionality with headless browser support |
| README.md | Documents the new website import feature with usage examples and feature descriptions |
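The table mentions URL validation tests in cmd/import_test.go without showing the rules being tested. As a rough sketch only, a validator of this kind might accept just `http` and `https` URLs that include a host. The function name `validateImportURL` and the exact rules are assumptions, not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateImportURL is a hypothetical check: it rejects anything that is
// not an http(s) URL with a host, such as file:// or bare hostnames.
func validateImportURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("URL must use http or https")
	}
	if u.Hostname() == "" {
		return errors.New("URL must include a host")
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"https://blog.example.com/article",
		"file:///etc/passwd",
		"example.com",
	} {
		fmt.Printf("%q: %v\n", raw, validateImportURL(raw))
	}
}
```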
This PR implements the ability to import websites as notes using the new `ml-notes import-url <url>` command. The feature uses a headless browser to load web pages, wait for dynamic content, and convert the HTML to clean markdown format.
New Features
Core Import Command
- `ml-notes import-url <url>` - Import any website as a new note
Content Processing
The import process intelligently extracts the main content by:
- Looking for semantic HTML elements (`<article>`, `<main>`, `[role="main"]`)
- Looking for common content selectors (`.content`, `.post-content`, etc.)
Integration with Existing Features
- `--tags "tag1,tag2"` for manual tagging
- `--auto-tag` for AI-powered tag generation (requires Ollama)
- `--timeout 30s`
Usage Examples
Technical Implementation
Dependencies
Secure Browser Configuration
The implementation uses a security-first approach for Chrome configuration:
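One way an environment-specific configuration of this kind might be assembled is sketched below. Everything specific here is an assumption for illustration: that CI is detected through the `CI` or `GITHUB_ACTIONS` environment variables, that the Chrome sandbox is disabled only in CI, and the helper names `inCI` and `chromeFlags`. The flags are real Chrome command-line switches, but the PR's actual set is not shown in this description.

```go
package main

import (
	"fmt"
	"os"
)

// inCI is a hypothetical detector: it treats a non-empty CI or
// GITHUB_ACTIONS variable as a sign of a CI environment.
// The lookup function is injected so the logic can be tested.
func inCI(getenv func(string) string) bool {
	return getenv("CI") != "" || getenv("GITHUB_ACTIONS") != ""
}

// chromeFlags returns an illustrative flag set. The sandbox stays
// enabled by default and is relaxed only when ci is true (assumption).
func chromeFlags(ci bool) []string {
	flags := []string{"--headless", "--disable-gpu"}
	if ci {
		flags = append(flags, "--no-sandbox")
	}
	return flags
}

func main() {
	fmt.Println(chromeFlags(inCI(os.Getenv)))
}
```

Keeping the sandbox on outside CI matches the stated goal of behaving like a normal browser when loading live websites.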
Image URL Preservation
To maintain visual fidelity of imported content:
- Converts `/logo.png` to `https://example.com/logo.png`
- Handles `../images/photo.jpg` style relative paths
Error Handling
Comprehensive error handling for:
Security Features
The browser configuration prioritizes security when interacting with live websites:
Documentation
Added comprehensive documentation for the new feature:
- Added `import-url` to Essential Commands Summary table
Testing
Added comprehensive test coverage including:
All existing tests continue to pass, ensuring no breaking changes to the existing codebase.
Example Output
This feature enables users to quickly capture web content as structured notes while maintaining the rich formatting, visual fidelity, and searchability that ml-notes provides, with robust security measures for safe interaction with live websites.