
@Huijiro
Member

@Huijiro Huijiro commented Oct 15, 2025

Summary by CodeRabbit

  • New Features
    • Filters email attachments to only include valid HTTP/HTTPS links with non-localhost hostnames.
  • Bug Fixes
    • Prevents crashes when content-disposition is missing or unparseable by handling it gracefully.
    • Drops invalid or malformed attachment URLs to avoid processing errors.
  • Documentation
    • Updates descriptions to clarify content-disposition parsing and attachment handling behavior.

@coderabbitai
Contributor

coderabbitai bot commented Oct 15, 2025

Walkthrough

The email attachment extraction now parses URLs from content-disposition safely, returning None instead of raising errors when parsing fails. Attachments are filtered to include only HTTP/HTTPS URLs with non-localhost hostnames. URL parsing and hostname validation are added, and docstrings are updated accordingly.
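A minimal sketch of the "return None instead of raising" behavior described above. The function name mirrors the one in the PR, but the body is an illustration only: it assumes the URL travels in a `url` parameter of the header, which this page does not confirm.

```python
from email.message import Message
from typing import Optional
from urllib.parse import urlparse


def parse_url_from_content_disposition(header_value: Optional[str]) -> Optional[str]:
    """Return the URL carried in a Content-Disposition header, or None.

    Assumption: the URL is stored in a `url` parameter. The real header
    layout used by agentuity/io/email.py is not shown on this page.
    """
    if not header_value:
        return None  # missing header: report "no URL" instead of raising

    # Let the standard library split the header into parameters.
    msg = Message()
    msg["Content-Disposition"] = header_value
    url = msg.get_param("url", header="content-disposition")
    if not isinstance(url, str) or not url:
        return None  # parameter absent or in an unexpected encoded form

    try:
        parsed = urlparse(url)
    except ValueError:
        return None  # e.g. malformed IPv6 literal

    # Require at least a scheme and a network location.
    if not parsed.scheme or not parsed.netloc:
        return None
    return url
```

Callers can then treat `None` as "no usable URL" and skip the attachment rather than wrapping every call in a `try`/`except ValueError`.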

Changes

Cohort / File(s) | Summary of Changes
Email attachment parsing and filtering
agentuity/io/email.py
- Replace ValueError with None return in _parse_url_from_content_disposition on missing/unparseable headers
- Import and use urlparse for robust URL handling
- Filter attachments to only HTTP/HTTPS with non-localhost hostnames; drop invalid/missing URLs
- Update docstrings to reflect new parsing and filtering behavior
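The filtering rule summarized above could look roughly like the sketch below. `filter_attachment_urls` and `LOCAL_HOSTS` are hypothetical names for illustration; the exact-match host tuple is the one quoted later in this review, and the rest is an assumption about how the pieces fit together.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse

# Exact-match host list as quoted in the review comment on lines +293 to +295.
LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1")


def filter_attachment_urls(urls: Iterable[Optional[str]]) -> List[str]:
    """Keep only HTTP/HTTPS URLs whose hostname is not a known local name."""
    kept = []
    for url in urls:
        if not url:
            continue  # missing URL: drop the attachment
        try:
            parsed = urlparse(url)
        except ValueError:
            continue  # unparseable URL: drop the attachment
        if parsed.scheme not in ("http", "https"):
            continue
        hostname = parsed.hostname
        if not hostname or hostname.lower() in LOCAL_HOSTS:
            continue
        kept.append(url)
    return kept
```

For example, `filter_attachment_urls(["https://example.com/a.pdf", None, "http://localhost/x"])` keeps only the first entry.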

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant E as EmailParser
    participant CD as _parse_url_from_content_disposition
    participant U as urlparse
    participant F as AttachmentFilter

    E->>CD: Extract URL from Content-Disposition
    CD->>U: Parse URL
    alt Valid URL
        U-->>CD: url (scheme, host, ...)
        CD-->>E: URL
        E->>F: Validate scheme/hostname
        alt http/https and non-localhost
            F-->>E: Keep attachment
        else Invalid host or scheme
            F-->>E: Drop attachment
        end
    else Missing/invalid header
        CD-->>E: None
        E->>F: No URL -> Drop attachment
    end

    note over E: Returns only validated attachments

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I sniffed the mail with whiskers keen,
Trimmed the links, kept only clean—
No localhost in my burrow door,
Just http(s) I can explore.
With gentle hops through headers’ maze,
I drop the duds and keep the blaze. 🐇📬

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title “Made email attachment URL optional” clearly states the primary change of allowing attachments without a URL to avoid raising an error, matching the update that _parse_url_from_content_disposition now returns None. It is concise and specific enough for a reviewer to understand the main behavior change without needing implementation details.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agentuity/io/email.py (1)

90-121: Add None check for _url before use.

The data() method uses self._url at line 105 without checking whether it is None. The attachments property filters out None URLs, but if an IncomingEmailAttachment is instantiated directly and data() is called, it will fail with a non-descriptive error (a TypeError raised when httpx is given None as a URL).

Add a guard at the beginning of the method:

 async def data(self):
     """
     Return a Data object that streams the attachment data asynchronously.
     """
+    if self._url is None:
+        raise ValueError("Attachment URL is not available")
     tracer = trace.get_tracer("email")
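
To illustrate why the early guard helps, here is a small stand-alone sketch. `AttachmentSketch` is a hypothetical stand-in, not the real IncomingEmailAttachment, and the download step is replaced by a placeholder so the example runs without httpx.

```python
import asyncio
from typing import Optional


class AttachmentSketch:
    """Hypothetical stand-in for an attachment whose URL may be missing."""

    def __init__(self, filename: str, url: Optional[str] = None):
        self._filename = filename
        self._url = url

    async def data(self) -> str:
        # Fail early with a descriptive message instead of letting the
        # HTTP client reject None further down the call stack.
        if self._url is None:
            raise ValueError("Attachment URL is not available")
        # Placeholder for the real streaming download.
        return f"would stream {self._filename} from {self._url}"


async def demo() -> str:
    """Call data() on an attachment without a URL and report the error."""
    try:
        return await AttachmentSketch("report.pdf").data()
    except ValueError as exc:
        return f"error: {exc}"
```

Running `asyncio.run(demo())` surfaces the explicit message, which is easier to diagnose than a type error from deep inside the client.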
🧹 Nitpick comments (2)
agentuity/io/email.py (2)

298-299: Consider more specific exception handling.

Catching all exceptions with a broad except Exception may hide legitimate parsing or validation errors that should be logged or handled differently. Consider being more specific about which exceptions to catch.

-            except Exception:
-                continue
+            except (ValueError, TypeError, AttributeError) as e:
+                # Log or handle specific parsing errors if needed
+                continue
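
A sketch of what the narrower handler might look like with logging attached, assuming the loop body is mostly URL parsing. `keep_parseable` and the logger name are hypothetical; the PR's actual loop is not shown on this page.

```python
import logging
from typing import Iterable, List
from urllib.parse import urlparse

logger = logging.getLogger("email.attachments")  # hypothetical logger name


def keep_parseable(urls: Iterable[str]) -> List[str]:
    """Keep HTTP/HTTPS URLs, logging the specific errors that cause a skip."""
    kept = []
    for url in urls:
        try:
            parsed = urlparse(url)  # raises ValueError on a malformed IPv6 literal
            if parsed.scheme in ("http", "https") and parsed.hostname:
                kept.append(url)
        except (ValueError, TypeError, AttributeError) as exc:
            # Only the expected parsing failures are swallowed; anything
            # else propagates so that real bugs stay visible.
            logger.debug("Skipping attachment URL %r: %s", url, exc)
            continue
    return kept
```

Using the bound exception in the log call also gives the `as e` in the suggestion above a purpose.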

387-387: Optional: Remove explicit return None.

Python functions implicitly return None when execution completes without a return statement. The explicit return None is unnecessary but harmless.

                response = await client.post(url, content=email_body, headers=headers)
                response.raise_for_status()
-                return None
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7aafb06 and 50aeb89.

📒 Files selected for processing (1)
  • agentuity/io/email.py (4 hunks)

Comment on lines +293 to +295
hostname_lower = hostname.lower()
if hostname_lower in ('localhost', '127.0.0.1', '::1'):
    continue

⚠️ Potential issue | 🟠 Major

Incomplete localhost/loopback address filtering.

The hostname filtering only checks for exact matches of 'localhost', '127.0.0.1', and '::1'. This misses:

  • Other 127.x.x.x loopback addresses (e.g., 127.0.0.2, 127.1.1.1)
  • Alternative IPv6 loopback representations (e.g., 0:0:0:0:0:0:0:1)
  • Private network ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) that could enable SSRF

Consider using a more comprehensive check:

-                hostname_lower = hostname.lower()
-                if hostname_lower in ('localhost', '127.0.0.1', '::1'):
-                    continue
+                import ipaddress
+                
+                hostname_lower = hostname.lower()
+                if hostname_lower == 'localhost':
+                    continue
+                
+                # Check if hostname is an IP address
+                try:
+                    ip = ipaddress.ip_address(hostname)
+                    if ip.is_loopback or ip.is_private or ip.is_link_local:
+                        continue
+                except ValueError:
+                    # Not an IP address, proceed with domain name
+                    pass
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-                hostname_lower = hostname.lower()
-                if hostname_lower in ('localhost', '127.0.0.1', '::1'):
-                    continue
+                import ipaddress
+                hostname_lower = hostname.lower()
+                if hostname_lower == 'localhost':
+                    continue
+                # Check if hostname is an IP address
+                try:
+                    ip = ipaddress.ip_address(hostname)
+                    if ip.is_loopback or ip.is_private or ip.is_link_local:
+                        continue
+                except ValueError:
+                    # Not an IP address, proceed with domain name
+                    pass
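
The same check can be pulled into a helper so it is testable on its own. This is a sketch rather than the committed code: `is_blocked_host` is a hypothetical name, and the import sits at module level instead of inside the loop.

```python
import ipaddress
from typing import Optional


def is_blocked_host(hostname: Optional[str]) -> bool:
    """Return True for hosts an attachment URL should not point at."""
    if not hostname:
        return True  # no hostname at all: treat as invalid
    host = hostname.lower().rstrip(".")
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a domain name; it is allowed here.
        return False
    # Covers 127.0.0.0/8, ::1 in any spelling, RFC 1918 ranges and
    # link-local addresses.
    return ip.is_loopback or ip.is_private or ip.is_link_local
```

Note that `urlparse(...).hostname` already strips the brackets from IPv6 literals, so its result can be passed in directly. A domain name that resolves to an internal address is not caught by this check; that would require validating the resolved address at request time.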

@Huijiro Huijiro closed this Oct 22, 2025

2 participants