A complete scraper for 4chan (Now that's fast.)
pepeScraper is a scraper that uses context for your searches and returns exactly what you want. (I'm learning how to make an item look cooler than it actually is)
- Enter keywords, anything you can think of (just be careful what you search for 👀)
- Control the results by date and exclude what you don't want to appear.
- Control the search speed of this program (do not confuse the processing thread with the 4chan thread)
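To make the "processing thread" part concrete, here is a rough sketch of how a worker pool can split the work across boards. This is not pepeScraper's actual code: `scan_board`, `scan_boards` and the fake catalog data are made up for illustration, and only `ThreadPoolExecutor` is the real standard-library class.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a network call; a real scraper would fetch
# each board's catalog from the 4chan API instead of reading a dict.
FAKE_CATALOGS = {
    "g": ["pepe thread", "linux thread"],
    "b": ["random thread"],
}

def scan_board(board):
    """Return (board, titles) for one board (hypothetical helper)."""
    return board, FAKE_CATALOGS.get(board, [])

def scan_boards(boards, workers=4):
    """Scan every board using `workers` processing threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `boards`
        return dict(pool.map(scan_board, boards))

if __name__ == "__main__":
    print(scan_boards(["g", "b"], workers=2))
```

More workers means more boards are scanned at the same time, which is presumably what the speed option controls.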
If you use Windows, just go to releases and download the latest version and then install the dependencies. If you want to help and have access to the source code, use the code below.
git clone https://github.com/JuaanReis/pepeScraper.git
cd pepeScraper
pip install -r requirements.txt
On Linux the steps are the same as above (but Linux sometimes forces you to use that damn venv), so use the code below.
git clone https://github.com/JuaanReis/pepeScraper.git
cd pepeScraper
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
"--key <w>": keywords used as the base for search and scraping
"--thread <n>": 4chan thread where the posts are located
"--exclude, -e <w>": keywords to be excluded from the results
"--date <YYYY/MM/DD>": exact date when the OP post was made
"--before <YYYY/MM/DD>": posts made before the given date
"--after <YYYY/MM/DD>": posts after the given date up to today
"--min-replies <n>": minimum number of replies the thread must have
"--max-replies <n>": maximum number of replies the thread can have
"--board <board_name>": name(s) of the board(s) to search
"-T <n>": number of threads that the program will work with (workers in the ThreadPoolExecutor)
"--op-only, -op": only consider the original post (OP)
"--no-op, -nop": the opposite of the above: ignore the original post (OP) and only consider the replies
"--nsfw, -n": include vulgar (NSFW) posts in the results
"--nsfw-title, -nt": include posts with vulgar titles in the results
"--output, -o": save the results to a text file on your computer (links only).
"--download_image, -di": download all images from the thread.
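For a feel of how filters like --key, --exclude, --min-replies, --max-replies and --date could be applied to one thread, here is a minimal sketch. It is an assumption, not the real implementation: `matches` and `SAMPLE_THREAD` are hypothetical, the field names (`sub`, `com`, `replies`, `time`) are the ones the public 4chan catalog JSON uses as far as I know, and "any keyword matches" is a guess at the matching rule.

```python
from datetime import datetime, timezone

# Hypothetical thread shaped like an entry of the 4chan catalog JSON.
# "time" is a UNIX timestamp; 0 is 1970-01-01 UTC.
SAMPLE_THREAD = {"no": 1, "sub": "Rare Pepe", "com": "post your pepes",
                 "replies": 12, "time": 0}

def matches(thread, keywords, exclude=(), min_replies=0,
            max_replies=None, date=None):
    """Return True if the thread passes every filter (hypothetical helper)."""
    # Search title and comment together, case-insensitively.
    # Note: a real "com" field contains HTML that should be stripped first.
    text = (thread.get("sub", "") + " " + thread.get("com", "")).lower()
    if not any(k.lower() in text for k in keywords):
        return False  # assumption: one matching keyword is enough
    if any(e.lower() in text for e in exclude):
        return False
    replies = thread.get("replies", 0)
    if replies < min_replies:
        return False
    if max_replies is not None and replies > max_replies:
        return False
    if date is not None:
        # Compare calendar days in UTC, using the YYYY/MM/DD format
        posted = datetime.fromtimestamp(thread["time"], tz=timezone.utc)
        wanted = datetime.strptime(date, "%Y/%m/%d")
        if posted.date() != wanted.date():
            return False
    return True

if __name__ == "__main__":
    print(matches(SAMPLE_THREAD, ["pepe"], exclude=["nsfw"], date="1970/01/01"))
```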
I know this meme is awful and the screenshot turned out terrible.
PepeScraper does NOT store anything; it only uses the API and creates a direct link to 4chan.
No logs, no history, no databases, no Facebook copy (maybe you understand).
Everything is stored in RAM and deleted when the program finishes. (That's right, your mom won't find out what you searched for.)
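As an illustration of the "direct link" part, a result can be turned into a link from just the board name and the thread number. This is a sketch under assumptions: `thread_link` is a hypothetical helper, and the URL pattern is the one 4chan thread pages normally use, not something taken from pepeScraper's source.

```python
def thread_link(board, thread_no):
    """Build a direct link to a thread (hypothetical helper)."""
    # Accept "g" as well as "/g/" for the board name
    return f"https://boards.4chan.org/{board.strip('/')}/thread/{thread_no}"

if __name__ == "__main__":
    print(thread_link("g", 12345))
```

Nothing else needs to be kept around: the link is built in memory and points straight at 4chan.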
Please don't sue me, I don't have the money to pay a lawyer.
I'm serious, pornography can destroy your brain, your body, and your family (no matter how many times I write this, you'll ignore it).
python main.py --key "pepe" --exclude "nsfw" --date 1970/01/01
This can perhaps make your searches safer (I don't know if I programmed this right).
The end?

