Harden documentation link validation to prevent false CI passes #452

Merged
imbajin merged 12 commits into apache:master from bitflicker64:fix-link-validation
Feb 12, 2026

@bitflicker64
Contributor

Purpose of the PR

What this PR does

This PR hardens the documentation link validation used in CI.

Previously, CI could pass even when documentation contained broken internal links due to limited link detection and path resolution. This change expands validation coverage and ensures CI fails when unresolved internal links are present.

Key changes

  • Validate all inline Markdown links with accurate file and line numbers
  • Correctly resolve internal paths, including:
    • /docs/* → content/en/docs/*
    • /cn/docs/* → content/cn/docs/*
    • absolute root paths
    • relative links
  • Exclude fenced code blocks from validation without breaking line numbers
  • Skip external links and Hugo shortcodes intentionally
  • Validate non-Markdown assets separately (images, configs, etc.)
  • Fail CI deterministically on unresolved internal links
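The path rules listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the validator from this PR (which the thread describes as a Bash script). The function name `resolve_internal_link` and the `CONTENT_ROOT` constant are hypothetical, and resolving other absolute root paths against the English tree is an assumption made for the example.

```python
import posixpath

# Hypothetical constant: the content tree root implied by the mappings above.
CONTENT_ROOT = "content"


def resolve_internal_link(target: str, source_file: str) -> str:
    """Map a link target to a candidate path under the content tree."""
    # Drop any fragment or query string before resolving the path.
    path = target.split("#", 1)[0].split("?", 1)[0]
    if path.startswith("/docs/"):
        # /docs/* -> content/en/docs/*
        candidate = posixpath.join(CONTENT_ROOT, "en", path.lstrip("/"))
    elif path.startswith("/cn/docs/"):
        # /cn/docs/* -> content/cn/docs/*
        candidate = posixpath.join(CONTENT_ROOT, path.lstrip("/"))
    elif path.startswith("/"):
        # Assumption: other absolute root paths live in the English tree.
        candidate = posixpath.join(CONTENT_ROOT, "en", path.lstrip("/"))
    else:
        # Relative links resolve against the directory of the linking file.
        candidate = posixpath.join(posixpath.dirname(source_file), path)
    # Collapse "." and ".." segments and trailing slashes.
    return posixpath.normpath(candidate)
```

A caller would then check whether the returned path (or a Markdown file derived from it) exists, and report the source file and line number when it does not.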

Notes

This change surfaces existing broken documentation links that CI was previously not detecting.
Those issues can be addressed incrementally in follow-up PRs.

- Add support for relative link validation
- Check absolute root paths (e.g., /language/*)
- Skip asset files (.png, .xml, .css, etc.)
- Strip code blocks from validation
- Add case-insensitive protocol detection

This catches ~54 previously undetected broken links.
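As a rough illustration of the classification step these bullets describe, the sketch below sorts a link target into categories before any path resolution happens. It is written in Python for readability and is not the PR's script. The name `classify_link`, the category labels, and the exact `ASSET_EXTENSIONS` set are assumptions; the commit message only names `.png`, `.xml`, and `.css` as examples.

```python
import posixpath
import re

# A URI scheme followed by ":" (http:, HTTPS:, mailto:), or a
# protocol-relative "//host" prefix. IGNORECASE gives the
# case-insensitive protocol detection mentioned above.
_SCHEME = re.compile(r"^(?:[a-z][a-z0-9+.-]*:|//)", re.IGNORECASE)

# Hypothetical asset list; extend as needed.
ASSET_EXTENSIONS = {".png", ".xml", ".css"}


def classify_link(target: str) -> str:
    """Return one of: shortcode, external, anchor, asset, internal."""
    target = target.strip()
    if "{{" in target:
        # Hugo shortcodes are resolved at build time, so skip them.
        return "shortcode"
    if _SCHEME.match(target):
        return "external"
    if target.startswith("#"):
        # Same-page anchor: no file to resolve.
        return "anchor"
    path = target.split("#", 1)[0].split("?", 1)[0]
    if posixpath.splitext(path)[1].lower() in ASSET_EXTENSIONS:
        return "asset"
    return "internal"
```

Only targets classified as `internal` (and, if assets are checked separately, `asset`) would need a filesystem lookup.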
dosubot bot added the size:L and bug labels on Feb 9, 2026
@bitflicker64
Contributor Author

bitflicker64 commented Feb 9, 2026

@imbajin This PR tightens the link validation so that CI fails when unresolved internal links exist. Previously, some broken links went undetected because validation coverage was limited, which allowed CI to pass.
With the updated validator, those existing issues are now surfaced. This PR focuses on fixing the validation behaviour itself; the underlying documentation links can be cleaned up separately in follow-up PRs if needed.

@imbajin imbajin requested a review from Copilot February 9, 2026 13:31

@bitflicker64
Contributor Author

I improved the link validator by adding canonical path resolution, content directory boundary checks, and accurate file and line-number reporting.
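A content directory boundary check of the kind mentioned here could look like the following Python sketch. It is an assumption about the general technique (canonicalize both paths, then compare), not a transcription of the PR's Bash implementation; `is_within_content_dir` is a hypothetical name.

```python
import os


def is_within_content_dir(candidate: str, content_dir: str) -> bool:
    """True if candidate, once canonicalized, stays inside content_dir."""
    # realpath resolves "..", "." and symlinks, so a link such as
    # "../../outside.md" cannot escape the tree unnoticed.
    root = os.path.realpath(content_dir)
    resolved = os.path.realpath(candidate)
    # commonpath compares whole path components, so "content-other"
    # is not mistaken for a child of "content".
    return os.path.commonpath([root, resolved]) == root
```

Comparing whole components rather than string prefixes is the point of the design: a plain `startswith` check would accept a sibling directory whose name merely begins with the content directory's name.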

@imbajin
Member

imbajin commented Feb 10, 2026

If the total number of changed lines isn't too large, I'd suggest fixing them together in this PR. Otherwise, (Required) CI failures might be troublesome & blocking the merge, and these fixes should have minimal side effects.

@bitflicker64
Contributor Author

> If the total number of changed lines isn't too large, I'd suggest fixing them together in this PR. Otherwise, (Required) CI failures might be troublesome & blocking the merge, and these fixes should have minimal side effects.

I'll start working on fixing the links.

@bitflicker64
Contributor Author

Sorry for committing the batch. I'll fix the issues and continue the URL fixes.

dosubot bot removed the size:L label on Feb 10, 2026
dosubot bot added the size:XL label on Feb 10, 2026
@bitflicker64
Contributor Author

Link validation checks are passing now. Fixed broken relative links, corrected SUMMARY navigation for Hugo routing, and removed an invalid reference to a nonexistent file. The remaining messages are expected Hugo runtime URL warnings only.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.


Note:

> You can get the asynchronous job status by `GET http://localhost:8080/graphspaces/DEFAULT/graphs/hugegraph/tasks/${task_id}` (the task_id here should be 1). See More [AsyncJob RESTfull API](../task)
> You can get the asynchronous job status by `GET http://localhost:8080/graphspaces/DEFAULT/graphs/hugegraph/tasks/${task_id}` (the task_id here should be 1). See More [AsyncJob RESTfull API](./task)

Copilot AI Feb 11, 2026

Typo in link text: “RESTfull” should be “RESTful”.

Note:

> You can get the asynchronous job status by `GET http://localhost:8080/graphspaces/DEFAULT/graphs/hugegraph/tasks/${task_id}` (the task_id here should be 2). See More [AsyncJob RESTfull API](../task)
> You can get the asynchronous job status by `GET http://localhost:8080/graphspaces/DEFAULT/graphs/hugegraph/tasks/${task_id}` (the task_id here should be 2). See More [AsyncJob RESTfull API](./task)

Copilot AI Feb 11, 2026

Typo in link text: “RESTfull” should be “RESTful”.

Note:

> You can get the asynchronous job status by `GET http://localhost:8080/graphspaces/DEFAULT/graphs/hugegraph/tasks/${task_id}` (the task_id here should be 3). See More [AsyncJob RESTfull API](../task) No newline at end of file
> You can get the asynchronous job status by `GET http://localhost:8080/graphspaces/DEFAULT/graphs/hugegraph/tasks/${task_id}` (the task_id here should be 3). See More [AsyncJob RESTfull API](./task) No newline at end of file

Copilot AI Feb 11, 2026

Typo in link text: “RESTfull” should be “RESTful”.

@bitflicker64
Contributor Author

Thanks for the review. I will debug the issues and push a proper fix after my classes.

@bitflicker64
Contributor Author

All validator issues raised in the review thread have been addressed.
Path normalization, absolute path handling, URL decoding, and asset resolution logic have been corrected.
The heuristics for code fence and inline code detection are now explicitly documented.
The script is fail-fast, Bash-compatible, and produces deterministic validation results.
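To illustrate what a code fence and inline code heuristic of this kind might look like, here is a small Python sketch. It is not the merged Bash script. The helper names `mask_fenced_blocks` and `find_links` and both regular expressions are assumptions, and the fence detection is deliberately simplified (any line starting with the same marker closes the block).

```python
import re

# Heuristic: a fence is a line that starts with ``` or ~~~.
_FENCE = re.compile(r"^\s*(```|~~~)")
# Inline Markdown link or image: [text](target "optional title").
_INLINE_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def mask_fenced_blocks(lines):
    """Blank out fenced code blocks while keeping one entry per input line."""
    masked, fence = [], None
    for line in lines:
        match = _FENCE.match(line)
        if fence is None and match:
            fence = match.group(1)      # opening fence
            masked.append("")
        elif fence is not None:
            if match and match.group(1) == fence:
                fence = None            # closing fence
            masked.append("")
        else:
            masked.append(line)
    return masked


def find_links(lines):
    """Return (line_number, target) pairs for links outside code."""
    results = []
    for lineno, line in enumerate(mask_fenced_blocks(lines), start=1):
        # Heuristic: drop inline code spans before matching links.
        line = re.sub(r"`[^`]*`", "", line)
        for match in _INLINE_LINK.finditer(line):
            results.append((lineno, match.group(1)))
    return results
```

Because masked lines are replaced with empty strings rather than removed, the reported line numbers still match the original file.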

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated no new comments.


Member

@imbajin imbajin left a comment

Thanks a lot for your contribution

dosubot bot added the lgtm label on Feb 12, 2026
imbajin merged commit 202d930 into apache:master on Feb 12, 2026
1 check passed
github-actions bot pushed a commit that referenced this pull request Feb 12, 2026
…#452)

* feat: enhance link validator to catch all internal links

- Add support for relative link validation
- Check absolute root paths (e.g., /language/*)
- Skip asset files (.png, .xml, .css, etc.)
- Strip code blocks from validation
- Add case-insensitive protocol detection

202d930

Labels

bug, lgtm, size:XL

Development

Successfully merging this pull request may close these issues.

[Bug] CI link validation misses broken internal documentation links

2 participants