
Copilot AI commented Dec 26, 2025

Identified and removed redundant operations and data structure conversions throughout the IOC extraction pipeline.

Changes

  • extractor method: Fixed the incorrect list[str | None] return type and removed the unnecessary filter(None, pages) and "".join() calls, since pdfminer's extract_text() already returns a single str.

  • get_patterns method: Eliminated the redundant set() → sorted() → set() round trip. Patterns are now converted to a set once, filtered, and sorted only when results are displayed. Fixed the counter logic so it increments only after all filtering is done.

  • detect_language method: Now iterates only over the detected languages via .items() instead of over all 6 supported languages, reducing typical iterations from 6 to 0-2.

  • patts method: Replaced the explicit loop with a dictionary comprehension, keeping the inline URL whitespace cleanup as a conditional expression.
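A minimal sketch of the detect_language and patts changes, assuming detection results arrive as a dict holding only the languages actually found, and raw matches as a dict of pattern name to matched strings. SUPPORTED_LANGUAGES (and its six entries), report_before, report_after, raw_matches, and the "url" key are hypothetical names chosen for illustration, not the project's code.

```python
# Hypothetical list of six supported languages; the project's real list may differ.
SUPPORTED_LANGUAGES = ["Arabic", "Chinese", "Cyrillic", "Hebrew", "Korean", "Thai"]


def report_before(detected: dict[str, int]) -> list[str]:
    """Old shape: visit every supported language, then look up its count."""
    lines = []
    for lang in SUPPORTED_LANGUAGES:  # always 6 iterations
        count = detected.get(lang, 0)
        if count:
            lines.append(f"{lang}: {count}")
    return lines


def report_after(detected: dict[str, int]) -> list[str]:
    """New shape: visit only the languages that were actually detected."""
    # The loop runs len(detected) times, typically 0-2.
    return [f"{lang}: {count}" for lang, count in detected.items() if count]


def patts(raw_matches: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build the pattern -> matches dict with a single comprehension.

    URL matches have all embedded whitespace removed inline through a
    conditional expression; other patterns pass through unchanged.
    """
    return {
        name: ["".join(m.split()) if name == "url" else m for m in matches]
        for name, matches in raw_matches.items()
    }
```

One design consequence worth checking: report_after follows the dict's insertion order rather than the fixed language order, so if output order matters, the results would need sorting before printing.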

Example

Before:

```python
pages = self.extractor(pdf=pdf_path)
text = "".join(filter(None, pages))  # Unnecessary operations

sorted_patterns = sorted(set(pvals))
# ... process ...
self.print_and_write_patterns(key, set(sorted_patterns), output, title)  # Convert back to set
```

After:

```python
text = self.extractor(pdf=pdf_path)  # Direct assignment

unique_patterns = set(pvals)
# ... process ...
if unique_patterns:
    sorted_patterns = sorted(unique_patterns)
    self.print_and_write_patterns(key, sorted_patterns, output, title)
```
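The Before/After change can be checked with a small, self-contained sketch. It assumes the extractor returns one str (as pdfminer's extract_text() does) and that get_patterns receives a dict of raw matches per IOC type. The is_valid hook, the found counter, and returning results instead of calling print_and_write_patterns are hypothetical simplifications, not the project's actual code.

```python
from collections.abc import Callable


def join_pages_old(text: str) -> str:
    """The removed post-processing, applied to a str as extract_text() returns.

    Iterating a str yields one-character strings, none of which are falsy, so
    filter(None, ...) drops nothing and "".join(...) rebuilds the identical
    string: an extra O(n) pass with no effect.
    """
    return "".join(filter(None, text))


def get_patterns(
    matches: dict[str, list[str]],
    is_valid: Callable[[str, str], bool],  # hypothetical filter hook
) -> tuple[dict[str, list[str]], int]:
    """Deduplicate once, filter, count, and sort only what will be displayed."""
    results: dict[str, list[str]] = {}
    found = 0  # hypothetical counter of IOCs that survive filtering
    for key, pvals in matches.items():
        # A single set comprehension both deduplicates and filters.
        unique_patterns = {p for p in pvals if is_valid(key, p)}
        if unique_patterns:
            found += len(unique_patterns)  # incremented only after all filtering
            results[key] = sorted(unique_patterns)  # sorted once, right before display
    return results, found
```

For example, get_patterns({"ipv4": ["10.0.0.2", "10.0.0.1", "10.0.0.2"]}, lambda k, v: True) returns ({"ipv4": ["10.0.0.1", "10.0.0.2"]}, 2): the duplicate is dropped by the single set conversion, and sorting happens only for the category that is actually reported.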

Net reduction: 8 lines of code (30 insertions, 38 deletions)

Warning

Firewall rules blocked me from connecting to the following addresses:

  • astral.sh
    • Triggering command: /usr/bin/curl curl -LsSf REDACTED (dns block)


Original prompt

Identify and suggest improvements to slow or inefficient code



Copilot AI and others added 2 commits December 26, 2025 10:07
…s, use efficient comprehensions

Co-authored-by: dfirsec <12067255+dfirsec@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Identify and suggest improvements to slow code" to "Optimize IOC extraction: remove redundant operations and data structure conversions" Dec 26, 2025
Copilot AI requested a review from dfirsec December 26, 2025 10:12
dfirsec marked this pull request as ready for review December 26, 2025 20:43
