🛡️ Sentinel: [HIGH] Fix unanchored regex in tweet verification #86
Dexploarer wants to merge 1 commit into `main`
Replace vulnerable regex with strict `URL` parsing in `parseTweetUrl` to prevent acceptance of spoofed URLs. Adds comprehensive test coverage in `src/api/twitter-verify.test.ts`.
Summary of Changes
This pull request hardens tweet URL validation by resolving a high-severity vulnerability. The previous regex-based approach was unanchored, so a spoofed URL could pass validation. The updated implementation parses the input with the native `URL` API and checks the hostname and pathname separately.
    const match = parsed.pathname.match(/^\/(\w+)\/status\/(\d+)$/);
    if (!match) return null;
The regular expression used to extract the screen name and tweet ID from the tweet URL is stricter than it needs to be. It only matches paths of the form `/username/status/1234567890` with no trailing slash, so an otherwise valid URL such as `https://x.com/username/status/1234567890/` makes this function return null. Query parameters (e.g., `?s=20`) and fragments are not a problem here: the regex runs against `parsed.pathname`, which excludes both.
Recommended solution:
Update the regex to allow an optional trailing slash. For example:

    const match = parsed.pathname.match(/^\/(\w+)\/status\/(\d+)\/?$/);

Or, better yet, split the pathname and extract the relevant segments, ignoring any extra path components or trailing slashes. This will make the function more robust and user-friendly.
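The segment-splitting idea could look roughly like the sketch below. It is a deliberately stricter variant than the comment proposes: it tolerates one trailing slash but still rejects extra path components. The names `TweetPathParts` and `parseTweetPath` are hypothetical and do not come from the PR.

```typescript
// Hypothetical result shape; the PR does not show the real return type.
interface TweetPathParts {
  screenName: string;
  tweetId: string;
}

// Sketch: validate the path segment by segment instead of with one regex.
function parseTweetPath(pathname: string): TweetPathParts | null {
  // Tolerate exactly one trailing slash.
  const trimmed = pathname.endsWith("/") ? pathname.slice(0, -1) : pathname;

  // "/user/status/123" splits into ["", "user", "status", "123"].
  const segments = trimmed.split("/");
  if (segments.length !== 4) return null;

  const leading = segments[0] ?? "";
  const screenName = segments[1] ?? "";
  const keyword = segments[2] ?? "";
  const tweetId = segments[3] ?? "";

  if (leading !== "" || keyword !== "status") return null;
  if (!/^\w+$/.test(screenName) || !/^\d+$/.test(tweetId)) return null;

  return { screenName, tweetId };
}

// The query string and fragment never reach the path check.
const pathOnly = new URL("https://x.com/user/status/123?s=20#top").pathname;
const parts = parseTweetPath(pathOnly);
```

Keeping the segment count fixed means a path like `/twitter.com/user/status/123` is still rejected, which preserves the intent of the security fix.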
Code Review
This pull request effectively addresses a high-severity security vulnerability related to an unanchored regex for tweet URL validation. The fix is robust, using the native URL API for parsing and implementing strict checks on both the hostname and pathname. The addition of comprehensive unit tests is excellent and ensures the vulnerability is resolved and won't regress. I have one minor suggestion to improve performance and code style by refactoring the list of valid hosts.
    const validHosts = ["twitter.com", "www.twitter.com", "x.com", "www.x.com"];

    if (!validHosts.includes(hostname)) {
      return null;
    }
For better performance and to adhere to best practices, it's recommended to define validHosts as a constant outside the parseTweetUrl function. This prevents the array from being recreated on every function call. Additionally, using a Set for validHosts provides a more performant lookup (O(1)) compared to an array's includes method (O(n)).
    - const validHosts = ["twitter.com", "www.twitter.com", "x.com", "www.x.com"];
    - if (!validHosts.includes(hostname)) {
    -   return null;
    - }
    + const validHosts = new Set(["twitter.com", "www.twitter.com", "x.com", "www.x.com"]);
    + if (!validHosts.has(hostname)) {
    +   return null;
    + }
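The suggested change above still builds the `Set` inside the function. A minimal sketch of hoisting it to module scope, as the comment recommends, might look like this. The names `VALID_TWEET_HOSTS` and `isValidTweetHost` are hypothetical; only the four host strings come from the diff.

```typescript
// Built once at module load rather than on every call.
// Host list copied from the diff; the constant name is an assumption.
const VALID_TWEET_HOSTS: ReadonlySet<string> = new Set([
  "twitter.com",
  "www.twitter.com",
  "x.com",
  "www.x.com",
]);

// Hypothetical helper: exact-match lookup against the allowlist.
function isValidTweetHost(hostname: string): boolean {
  return VALID_TWEET_HOSTS.has(hostname);
}

// The URL API lowercases the host of an https URL, so the lookup
// can stay case-sensitive.
const normalizedHost = new URL("https://X.COM/user/status/1").hostname;
const accepted = isValidTweetHost(normalizedHost);
```

With only four entries the lookup cost is negligible either way; the main gain is that the allowlist is defined once and is easy to find.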
Vulnerability: The regex used to validate tweet URLs (`/(?:twitter\.com|x\.com)\/(\w+)\/status\/(\d+)/`) was unanchored, allowing URLs like `https://evil.com/twitter.com/user/status/123` to bypass validation.

Fix:
- Replaced the regex with `new URL()` parsing.
- Enforced a strict hostname allowlist (`twitter.com`, `x.com`, and `www` variants).
- Enforced an anchored pathname match: `^/(\w+)/status/(\d+)$`.
- Exported `parseTweetUrl` for testing.
- Added `src/api/twitter-verify.test.ts` covering valid and invalid cases.

Verification:
- Ran `bun test src/api/twitter-verify.test.ts`, which confirmed that the new implementation correctly rejects the exploit vector.

PR created automatically by Jules for task 12309087947173640439 started by @Dexploarer
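To make the bypass concrete, here is a minimal sketch that runs the old unanchored pattern and a `URL`-based check against the spoofed URL from the description. `parseTweetUrlSketch` and `ParsedTweet` are hypothetical names and only approximate what the PR describes; the actual `parseTweetUrl` implementation is not shown on this page.

```typescript
// The unanchored pattern quoted in the PR description.
const LEGACY_PATTERN = /(?:twitter\.com|x\.com)\/(\w+)\/status\/(\d+)/;

// Hosts named in the PR description and diff.
const ALLOWED_HOSTS: ReadonlySet<string> = new Set([
  "twitter.com",
  "www.twitter.com",
  "x.com",
  "www.x.com",
]);

// Hypothetical result shape, for illustration only.
interface ParsedTweet {
  screenName: string;
  tweetId: string;
}

function parseTweetUrlSketch(input: string): ParsedTweet | null {
  let parsed: URL;
  try {
    parsed = new URL(input);
  } catch {
    return null; // not an absolute URL
  }

  // Exact hostname match, so "evil.com" and "notx.com" are rejected.
  if (!ALLOWED_HOSTS.has(parsed.hostname)) return null;

  // Anchored at both ends, applied to the path only.
  const match = /^\/(\w+)\/status\/(\d+)$/.exec(parsed.pathname);
  if (!match) return null;

  return { screenName: match[1] ?? "", tweetId: match[2] ?? "" };
}

const spoofed = "https://evil.com/twitter.com/user/status/123";
const legacyAcceptsSpoof = LEGACY_PATTERN.test(spoofed);
const sketchRejectsSpoof = parseTweetUrlSketch(spoofed) === null;
```

The old pattern accepts the spoofed URL because it finds `twitter.com/user/status/123` anywhere in the string, while the sketch rejects it because the parsed hostname is `evil.com`.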