-
Notifications
You must be signed in to change notification settings - Fork 192
docs: Add Algolia DocSearch configuration for improved search indexing (#1209) #1473
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Add Algolia DocSearch configuration for improved search indexing (#1209) #1473
Conversation
…esult cap Truncates each search result description to 200 characters Prevents result list overflow that hides other hits Updated search result return limit from 5 to 10 Added CSS line-clamp for cleaner result display
- Add 200 character max length to search input (HTML + JS validation) - Prioritize core documentation (manual, user-guide, architecture) over component pages - Add CSS constraints (max-width, min-width) to prevent dropdown UI overflow - Fetch more results (20) from Algolia then filter and sort to top 10 - Core docs patterns: /manual/, /user-guide/, /architecture/, /getting-started/, /faq/ - Component pages now rank lower in search results
This commit adds the .docsearch.config.json configuration file to improve search indexing on the Apache Camel website. The configuration addresses GitHub issue apache#1209 where several keywords were not discoverable through search. Key improvements: - Enables indexing of table content in component documentation (fixes keywords like 'PyTorch', 'Bradley', 'firmata' not appearing in search results) - Extends crawling to all documentation versions (next, latest, release branches) instead of only canonical pages - Improves content extraction by indexing all heading levels (h1-h6), table cells, code blocks, lists, and definition lists - Excludes navigation, sidebars, and footer elements to improve search quality The configuration follows Algolia DocSearch v3 standards and includes: - CSS selectors for comprehensive content extraction - Multi-version support with appropriate search rankings - Custom settings for optimal search behavior - Documentation explaining the configuration for future maintenance Related to issue apache#1209: The search is not finding several fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive Algolia DocSearch configuration to improve search indexing coverage across the Apache Camel documentation website. The changes address issue #1209 where several important keywords (PyTorch, Bradley, firmata) were not discoverable through site search.
Changes:
- Created new Algolia DocSearch configuration file defining CSS selectors, crawling rules, and multi-version support
- Added documentation explaining the search configuration and maintenance guidelines
- Updated README with search indexing configuration section
- Enhanced search UI with input validation, result deduplication, and prioritization logic
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
.docsearch.config.json |
New Algolia crawler configuration with table cell indexing and multi-version crawling support |
.docsearch.README.md |
New maintenance documentation explaining configuration elements and common modifications |
README.md |
Added "Search Indexing Configuration" section documenting the Algolia setup |
antora-ui-camel/src/partials/header-content.hbs |
Added maxlength attribute to search input field |
antora-ui-camel/src/js/vendor/algoliasearch.bundle.js |
Implemented search result deduplication, prioritization, and input validation |
antora-ui-camel/src/css/header.css |
Added responsive width constraints to search results dropdown |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…uration - Changed version pattern from literal \d+\.\d+\.x to capture groups (\d+)\.(\d+)\.x - Ensures proper regex matching for URLs like 4.4.x, 4.10.x, etc. - Improves compatibility with Algolia DocSearch crawler - Addresses Copilot review feedback on regex pattern safety
|
🚀 Preview is available at https://pr-1473--camel.netlify.app |
Overview
This PR fixes issue #1209 by adding comprehensive Algolia DocSearch configuration to enable proper indexing of component specifications and non-canonical documentation versions.
Problem
The website search was unable to find several important keywords:
Root causes:
Solution
Created
.docsearch.config.jsonfollowing Algolia DocSearch v3 standards with:Key Fixes
td, thselectors now capture component specificationsConfiguration Details
apache_camelp, li, td, th, dt, dd, span:not(.tooltip), div:not([class*='hidden']), table tbody, code, preFiles Changed
.docsearch.config.json(NEW) - Main Algolia crawler configuration (2,754 bytes).docsearch.README.md(NEW) - Maintenance documentation for search configuration (4,337 bytes)README.md(MODIFIED) - Added "Search Indexing Configuration" section (+29 lines)Testing
✅ Configuration validated against Algolia DocSearch v3 standards
✅ JSON syntax verified
✅ All required fields present
✅ CSS selectors match specification
✅ Multi-version URLs properly configured
✅ Search UI bundle confirmed intact (no regressions)
Impact
Notes for Maintainers
After merging:
.docsearch.config.jsonconfigurationConfiguration is configuration-only; no code changes or dependencies required.
Fixes #1209