Refactored HackerOne report scraper with error handling and memory optimizations #5

Open
thiezn wants to merge 1 commit into g0ldencybersec:main from thiezn:main

thiezn commented Jul 11, 2024

Hi batman & batman!

Thanks for sharing both the code and interesting talk. Nice example of leveraging AI for fuzzy data analytics.

I have wanted to grab the Hacktivity reports for myself for a long time but never got around to it. I saw that the hackerone.py script had already figured out how the GraphQL and report endpoints work, so I thought I would bite the bullet.

This pull request refactors the hackerone.py script to make it a bit more robust by:

  • handling errors and rate limits
  • using generators so that the reports and URLs do not all have to be kept in memory during retrieval
  • adding reports to the output file incrementally, so a crash of the script won't lose all of the results
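
To illustrate the three points above, here is a minimal sketch of the general pattern. It is not the code from this pull request: the names `RateLimited`, `fetch_with_retry`, `iter_reports` and `save_incrementally` are hypothetical, and `fetch` stands in for whatever function performs the actual HTTP request (for example, a thin wrapper around an httpx call that raises `RateLimited` on an HTTP 429 response).

```python
import json
import time
from typing import Callable, Iterable, Iterator


class RateLimited(Exception):
    """Hypothetical signal that the server asked us to slow down (e.g. HTTP 429)."""


def fetch_with_retry(
    fetch: Callable[[str], dict],
    url: str,
    retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(url), backing off exponentially when rate limited."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except RateLimited:
            # Wait delay, 2*delay, 4*delay, ... before the next attempt.
            sleep(delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {retries} attempts")


def iter_reports(
    urls: Iterable[str], fetch: Callable[[str], dict], **retry_options
) -> Iterator[dict]:
    """Yield reports one at a time instead of building a full list in memory."""
    for url in urls:
        yield fetch_with_retry(fetch, url, **retry_options)


def save_incrementally(reports: Iterable[dict], path: str) -> int:
    """Append each report as one JSON line and flush right away,
    so a crash only loses the report that was in flight."""
    count = 0
    with open(path, "a", encoding="utf-8") as fh:
        for report in reports:
            fh.write(json.dumps(report) + "\n")
            fh.flush()
            count += 1
    return count
```

Because `iter_reports` is a generator, `save_incrementally(iter_reports(urls, fetch), "reports.jsonl")` holds only one report in memory at a time, and the output file grows as the run progresses. The JSON Lines output format is an assumption made for this sketch.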

To try it out, create a new Python virtual environment and install the dependency (friends don't let friends use system-level Python dependencies):

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install httpx

You can run the script as follows (use -h to see the optional CLI arguments, such as the cookie and the output filename):

chmod +x hackerone.py
./hackerone.py

Kind regards,
Robin

Author

thiezn commented Jul 11, 2024

Just to give some idea: when I ran the script today, it took 137 minutes and returned 12,668 reports.
