Skip to content

Be able to fetch title of URLs in pages#880

Open
dentarg wants to merge 18 commits intomainfrom
link-titles
Open

Be able to fetch title of URLs in pages#880
dentarg wants to merge 18 commits intomainfrom
link-titles

Conversation

@dentarg
Copy link
Member

@dentarg dentarg commented Feb 2, 2026

@dentarg dentarg marked this pull request as ready for review February 5, 2026 20:45
Comment on lines 90 to 118
@@ -114,8 +116,6 @@ GEM
sawyer (0.9.2)
addressable (>= 2.3.5)
faraday (>= 0.17.3, < 3)
selma (0.4.11)
rb_sys (~> 0.9)
Copy link
Member Author

@dentarg dentarg Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird, but a unclear Bundler behaviour.

Same as a2ef188 that I did to post in the Bundler Slack.

Image

URLs at end of sentences (e.g. "see https://example.com.") had the
trailing period included, which would cause fetch failures. Now strip
common punctuation (.,;:!?) from the end of matched URLs.
LogfmtLogger already handles proper quoting via String#dump, so calling
.inspect on error messages causes double-escaping.
Negative lookbehind (?<!...) requires ES2018+ browsers:
Chrome 62+, Firefox 78+, Safari 16.4+, Node 8.10+
The comment described standalone usage but the partial is included via
haml :'_link_titles' from edit.haml.
Complete the standard log level set (debug, info, warn, error).
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in “fetch link titles” feature in the page editor to convert bare URLs into Markdown links using the target page’s <title>, backed by a new server endpoint and HTTPX-based title fetcher.

Changes:

  • Add /link-title POST endpoint plus LinkTitleFetcher service (HTTPX streaming + redirects + timeout).
  • Add editor UI (“Hämta titlar”) + JS URL extraction/rewrite + CSS styling.
  • Add unit/integration/JS tests and wire JS tests into rake test; introduce structured LogfmtLogger and silence logs in integration tests.

Reviewed changes

Copilot reviewed 16 out of 20 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
views/page/new.haml Adds link-title UI near the content textarea (and renames textarea id).
views/page/edit.haml Adds link-title UI near the content textarea (and renames textarea id).
views/page/_link_titles.haml New partial containing the UI + fetch/replace JS logic.
public/javascripts/link_titles.js URL extraction helper (exported for Node tests).
public/stylesheets/default.css Styles for the inline “Hämta titlar” control and status text.
lib/controllers/page_controller.rb Adds /link-title endpoint that fetches titles and returns JSON.
lib/services/link_title_fetcher.rb New HTTPX-based title fetcher with streaming and byte limit.
lib/services/logfmt_logger.rb New structured logfmt logger.
lib/app.rb Adds App.log / App.log= and requires the new logger.
test/unit/link_title_fetcher_test.rb Unit tests for title parsing/edge cases using a mock HTTP client.
test/integration/app_link_titles_test.rb Integration tests for /link-title.
test/javascript/link_titles_test.js Node tests for URL extraction.
test/unit/logfmt_logger_test.rb Unit tests for logfmt formatting/escaping behavior.
test/integration_test_helper.rb Silences app logging during integration tests.
Rakefile Adds test:javascript and includes it in the default test task.
Gemfile / Gemfile.lock Adds httpx and updates lockfile platforms/deps.
vendor/cache/http-2-1.1.1.gem Vendors new dependency required by HTTPX.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

-# Partial for "Hämta titlar" link and JavaScript
%span.link-titles-inline
%a#link-titles-btn{ onclick: "fetchLinkTitles()" } Hämta titlar
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UI uses an <a> without an href plus an onclick handler. Anchors without href are not keyboard-accessible by default and are problematic for assistive tech. Use a <button type="button"> (or add href="#", role="button", keyboard handlers, and event.preventDefault()), and avoid nesting an interactive control inside a <label> if possible.

Suggested change
%a#link-titles-btn{ onclick: "fetchLinkTitles()" } Hämta titlar
%button#link-titles-btn{ type: "button", onclick: "fetchLinkTitles()" } Hämta titlar

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the flow now, pressing tab goes from title input to content text area. Fetching the titles is something of a niche feature, not always used. I think the current solution is fine.

desc "Run JavaScript tests (requires Node.js)"
task :javascript do
if system('which node > /dev/null 2>&1')
sh 'node --test test/javascript/*_test.js'
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Rake task only checks that node exists, but then runs node --test, which requires newer Node versions. If an older Node is installed, this task will fail. Consider feature-detecting --test support (or checking node --version) and skipping with a clear message when unsupported.

Suggested change
sh 'node --test test/javascript/*_test.js'
if system('node --test -e "" > /dev/null 2>&1')
sh 'node --test test/javascript/*_test.js'
else
puts "Skipping JavaScript tests (installed Node.js does not support `--test` flag)"
end

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are totally fine, Gemini says

The --test flag was first introduced to Node.js as an experimental feature in April 2022 with the release of Node.js v18.0.0.

@@ -0,0 +1,30 @@
// Extract bare URLs from content (URLs not already in markdown link format)
// Note: Uses negative lookbehind (?<!...) which requires ES2018+
// Supported in Chrome 62+, Firefox 78+, Safari 16.4+, Node 8.10+
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment claims negative lookbehind is supported in “Node 8.10+”, but this file uses ES2018 regex lookbehind (and the test harness uses node:test / node --test, which generally implies Node 18+). Update the compatibility note to reflect the actual minimum supported Node/browser versions, or avoid lookbehind if older runtimes must be supported.

Suggested change
// Supported in Chrome 62+, Firefox 78+, Safari 16.4+, Node 8.10+
// Supported in Chrome 62+, Firefox 78+, Safari 16.4+, Node 18+

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds resonable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was addressed


post "/link-title", url: "https://example.com"

assert_equal 302, last_response.status
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint returns 401 when not logged in (see post '/link-title' in the controller), but the test asserts a 302 redirect. Update the expectation to match the endpoint behavior (or change the endpoint to redirect, but that’s atypical for a JSON/XHR endpoint).

Suggested change
assert_equal 302, last_response.status
assert_equal 401, last_response.status

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is wrong, why aren't the test failing? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation had dead code b4fc6bd

Comment on lines 48 to 52
def test_link_title_returns_json
ensure_logged_in_user(@user)

post "/link-title", url: "https://nonexistent.invalid"

Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test posts to /link-title with a real hostname (nonexistent.invalid), which causes the controller to make a real outbound HTTP request via HTTPX. That makes the test suite slower and potentially flaky/blocked in CI environments without egress/DNS. Prefer stubbing the fetcher/HTTP client (e.g., inject a mock LinkTitleFetcher/HTTPX in the app for tests) and assert on the stubbed response instead of doing real network I/O.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds resonable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was addressed

Comment on lines 102 to 106
url = params[:url].to_s
halt 400, "Missing URL" if url.empty?

content_type :json

Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/link-title accepts an arbitrary url and fetches it server-side without validation. This is an SSRF vector (e.g., http://localhost, cloud metadata IPs, private RFC1918 ranges) and redirects can pivot to internal hosts. Add strict validation (http/https only) and block private/loopback/link-local IPs (including after DNS resolution and after redirects) before fetching.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds resonable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was addressed

Comment on lines 48 to 52
var content = textarea.value;
var escapedUrl = url.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
// Negative lookbehind requires ES2018+ (Chrome 62+, Firefox 78+, Safari 16.4+)
var pattern = new RegExp('(?<!\\]\\()' + escapedUrl, 'g');
textarea.value = content.replace(pattern, '[' + data.title + '](' + url + ')');
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fetched <title> is inserted into markdown as [${title}](${url}) without escaping. Titles containing ], (, ), backslashes, or newlines can break the markdown and potentially enable unexpected rendering depending on markdown settings. Escape markdown-special characters in data.title before inserting.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds resonable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was addressed

1. Fix Node version in browser compatibility comment (Node 18+)
2. Use mock fetcher in integration tests to avoid network I/O
3. Add SSRF protection via UrlValidator - blocks private IPs,
   loopback, link-local, and validates after DNS resolution
4. Escape markdown-special characters in fetched titles to
   prevent injection (brackets, backslashes, newlines)
The BaseController before filter already handles this with a redirect.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 22 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +19 to +22
@http || HTTPX.plugin(:stream)
.plugin(:follow_redirects)
.with(timeout: { operation_timeout: TIMEOUT }, follow_redirects: { max_redirects: MAX_REDIRECTS })
end
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LinkTitleFetcher enables :follow_redirects, but /link-title only validates the initial URL via UrlValidator.safe?. A user can supply a public URL that 3xx-redirects to a private/loopback/link-local host and the fetcher will follow it, reintroducing an SSRF path despite the validator. Consider disabling automatic redirects and manually following them while re-validating each Location (and/or validating the final effective URL) before performing the next request.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +39
def safe?
uri = URI.parse(@url)
return false unless %w[http https].include?(uri.scheme)
return false if BLOCKED_HOSTS.include?(uri.host&.downcase)
return false if private_ip?(uri.host)

# Resolve DNS and check resolved IPs
ips = resolve_ips(uri.host)
return false if ips.empty?
return false if ips.any? { |ip| private_ip?(ip) }
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UrlValidator.safe? assumes URI.parse(@url).host is present. Some syntactically-valid URIs (e.g. http:///path) have scheme but no host, which will cause Resolv.getaddresses(nil) to raise a TypeError that is not rescued here, turning an invalid input into a 500. Add an explicit return false unless uri.host (and/or rescue TypeError in resolve_ips / safe?).

Copilot uses AI. Check for mistakes.
Comment on lines +38 to +45
var response = await fetch('/link-title', {
method: 'POST',
body: formData
});

if (response.ok) {
var data = await response.json();
if (data.error) {
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client treats any response.ok as JSON and immediately calls response.json(). If the session has expired, BaseController#prevent_unauthorized_modifications will redirect back, and fetch() will follow the 302 and return a 200 HTML page — response.ok will be true, but response.json() will throw and the user will just see a console error. Consider either (a) sending an Accept: application/json (or X-Requested-With) header and returning a non-redirect 401/403 for XHR/JSON requests, or (b) verifying Content-Type is JSON before parsing and showing a visible status error when it isn't.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Be able to fetch title of URLs in pages

1 participant