Skip to content

JuaanReis/pepeScraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pepeScraper - 4chan scraper

A complete scraper for 4chan (Now that's fast.)

pepeScraper is a scraper that uses context for your searches and returns exactly what you want. (I'm learning how to make an item look cooler than it actually is)

  • Enter keywords, anything you can think of (just be careful what you search for 👀)
  • Control the results by date and exclude what you don't want to appear.
  • Control the search speed of this program (do not confuse the processing thread with the 4chan thread)

Table of contents

Weird stuff

Stars   PRs Welcome   Last Commit   Python 3.13  License: GPL   Play Random Video   Play Random Video

Installation

If you use Windows, just go to releases and download the latest version and then install the dependencies. If you want to help and have access to the source code, use the code below.

    git clone https://JuaanReis/pepeScraper.git
    pip install -r requirements.txt

If you use Linux it will also be the same thing above (but Linux sometimes forces you to use that damn venv) so use the code below.

    git clone https://github.com/JuaanReis/pepeScraper.git
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

Flags

    "--key <w>": keywords used as the base for search and scraping  
    "--thread <n>": 4chan thread where the posts are located  
    "--exclude, -e <w>": keywords to be excluded from the results  
    "--date <YYYY/MM/DD>": exact date when the OP post was made  
    "--before <YYYY/MM/DD>": posts before the given date up to today  
    "--after <YYYY/MM/DD>": posts after the given date up to today  
    "--min-replies <n>": minimum number of replies the thread must have  
    "--max-replies <n>": maximum number of replies the thread can have  
    "--board <board_name>": name(s) of the board(s) to search  
    "-T <n>": number of threads that the program will work with (workers in the ThreadPoolExecutor)  
    "--op-only, -op": only consider the original post (OP)  
    "--no-op, -nop": It's the same as above but the opposite
    "--nsfw, -n": to enable vulgar posts
    "--nsfw-title, -nt": to enable title vulgar posts
    "--output, -o": to save the results to a text file (on your computer, just the link).
    "--download_image, -di": download all images from the thread.

Example

I know this meme is awful and the screenshot turned out terrible.

Privacy and Data Storage

PepeScraper does NOT store anything
it only uses the API and creates a direct link to 4chan.
No logs, no history, no databases, no Facebook copy (maybe you understand).

Everything is stored in RAM and deleted when the program finishes. (That's right, your mom won't find out what you searched for.)

Please don't sue me, I don't have the money to pay a lawyer.

Careful with NSFW Content

I'm serious, pornography can destroy your brain, your body, and your family (no matter how many times I write this, you'll ignore it). '

    python main.py --keyword "pepe" --exclude "nsfw" --date 01-01-1970

This can make your research perhaps safer (I don't know if I programmed this right).



The end?