arsfeld/comicsrss

comicsrss.xyz

Important

This is a personal fork of the original comicsrss.com project

This fork is maintained to re-enable some comics that were removed from the original site and is intended for personal use only. Please respect copyright laws and the terms of service of the comic websites.

For the official Comics RSS site, please visit comicsrss.com.


Scrape, Generate and Deploy Comics RSS

Source code for the site generator and RSS feed generator for comicsrss.xyz.

This fork uses GitHub Actions instead of CircleCI for automated scraping and deployment.

Also, all of the site's content is in this repository, as it is hosted by GitHub Pages.

Support the Original Project

If you'd like to help keep the original Comics RSS site going, you can support the creator via PayPal.

Technical Details

I have received many requests to add more comic series to the site, but my time is limited. So if you want to help out, you can make a scraper!

To be able to add comic series to Comics RSS, it is helpful to understand the basics of what is going on.

Comics RSS has scrapers, and the site generator. Each scraper parses a different comic website, and writes a temporary file to the disk. The site generator reads the temporary JSON files, and writes static HTML/RSS files to the disk.

How scrapers work

The scrapers make HTTPS requests to a website (for example, https://www.gocomics.com), parse the responses, and write temporary JSON files to the disk.

On a multi-comic site like https://www.gocomics.com, a scraper has to get the list of comic series (e.g. Agnes, Baby Blues, Calvin and Hobbes, etc). For example, the scraper might request and parse https://www.gocomics.com/comics/a-to-z.

Then, for each comic series, it fetches the most recent comic strip and walks backward one day at a time. When it reaches a strip it has already seen, it moves on to the next comic series, and so on until it has finished the whole website.

Finally, it writes the lists of comic series with their list of strips to a temporary JSON file on the hard drive.
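The walk-backward loop described above can be sketched roughly as follows. Note that `collectNewStrips`, `fetchStrip`, and the fake `archive` are illustrative stand-ins, not the project's actual API; the real scrapers make HTTPS requests and parse HTML instead:

```javascript
// Rough sketch of the scrape-until-seen loop. fetchStrip and the
// archive below are hypothetical stand-ins, not the project's API.
function collectNewStrips(latestDate, fetchStrip, seenDates) {
  const newStrips = [];
  let date = latestDate;
  while (date && !seenDates.has(date)) {
    const strip = fetchStrip(date);  // in reality: HTTPS request + HTML parse
    newStrips.push(strip);
    date = strip.previousDate;       // walk back one day
  }
  return newStrips;
}

// Fake three-day archive where 2024-01-01 was scraped on a previous run.
const archive = {
  '2024-01-03': { date: '2024-01-03', url: '/strip/3', previousDate: '2024-01-02' },
  '2024-01-02': { date: '2024-01-02', url: '/strip/2', previousDate: '2024-01-01' },
  '2024-01-01': { date: '2024-01-01', url: '/strip/1', previousDate: null },
};
const strips = collectNewStrips('2024-01-03', d => archive[d], new Set(['2024-01-01']));
// strips now holds only the two strips not seen before
```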

How the site generator works

The site generator reads the temporary JSON files made by the scrapers. Those files are read into one big list of comic series, each with their list of comic strips. The generator uses templates to generate an index.html file, and rss/{comic}.rss files.
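In miniature, that read-merge-render step looks something like the sketch below. The object shape and the inline template string are assumptions for illustration, not the project's actual data format or templates:

```javascript
// Minimal sketch: take the merged series list (as if read from the
// scrapers' temporary JSON files) and render an RSS fragment per comic.
// The object shape and template here are illustrative, not the real ones.
const series = [
  { title: 'Example Comic', strips: [{ date: '2024-01-03', url: 'https://example.com/3' }] },
];

function renderRss(comic) {
  const items = comic.strips
    .map(s => `<item><title>${comic.title} ${s.date}</title><link>${s.url}</link></item>`)
    .join('\n');
  return `<rss version="2.0"><channel><title>${comic.title}</title>\n${items}\n</channel></rss>`;
}

// In the real generator, each result would be written to rss/{comic}.rss.
```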

When these updated/new files are committed and pushed to this repository, they get hosted on gh-pages, which is how you view the site today.

Run locally

  1. Fork and clone the repository
  2. Run these commands on your command line:
```shell
# in /comicsrss.com
npm install

cd _generator

# If you want to see all the options:
# node bin --help

# Re-generate the site with the cached scraped site data:
node bin --generate

# If you want to run the scrapers (takes a while) then run this:
# node bin --scrape --generate

# I have nginx serving up my whole code directory, so I can go to http://localhost:80/comicsrss.com/
# If you don't have anything similar set up, you can try:
cd ..
npx serve
# Then open http://localhost:3000 in your browser
```

Run your own auto-updating scraper and website using GitHub Actions

This fork uses GitHub Actions instead of the original CircleCI setup, making it easier to deploy your own instance:

  1. Fork this repository
  2. The GitHub Actions workflow is already configured and will run automatically
  3. Enable GitHub Pages in your repository settings:
    • Go to Settings → Pages
    • Set Source to "GitHub Actions"
  4. The workflow runs:
    • Every 6 hours, automatically
    • On manual trigger via the Actions tab
    Each run produces a job summary showing scraping statistics and any errors.
  5. That's it! Your site will be automatically scraped and deployed to https://[your-username].github.io/comicsrss/

Scraper API

To create a scraper for a single-series website that shows multiple days' comic strips per web page, copy the code from dilbert.js and change it as needed. Note: Dilbert has moved to subscription platforms and no longer provides free comics.

To create a scraper for a multi-series website, copy the code from arcamax.js and change it as needed.

If you're not sure which to use, probably start from arcamax.js, or feel free to open a GitHub issue to discuss it with me.
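Whichever file you start from, a scraper ultimately has to hand the runner a list of series, each with its strips. A hypothetical sketch of that shape (the real modules may export something different, so check arcamax.js for the actual contract):

```javascript
// Hypothetical scraper interface -- the names and structure here are
// assumptions for illustration; see arcamax.js for the real contract.
const exampleScraper = {
  slug: 'example',
  // One object per comic series, each with its list of strips; the
  // runner would write this out as a temporary JSON file.
  async scrape() {
    return [
      {
        title: 'Example Comic',
        strips: [{ date: '2024-01-03', url: 'https://example.com/strip/3' }],
      },
    ];
  },
};
```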

License

MIT
