LessWrong Portable

Download the current versions of:

Title	Author	EPUB	MOBI
The Codex	Scott Alexander	📖	📖
Rationality Abridged	Quaerendo	📖	📖
The Abridged Guide to Intelligent Characters	Eliezer Yudkowsky	📖	📖
Legal Systems Very Different From Ours	David Friedman	📖	📖
Replacing Guilt	Nate Soares	📖	📖

About this

This started as the latest in a long history of independent, disorganized projects to scrape collections of posts from LessWrong into ebooks. A few selected examples of others:

...Not to mention the official version of the Sequences.

So, why on earth did I start another? LessWrong 2.0. If LessWrong 2.0 is voted to replace LessWrong Classic (see point 4), all the existing aggregators will break. This isn't a big deal for the ebooks they have already produced, since each aggregator only needs to run once (correctly) to create its ebook, but anyone who wants to modify them and scrape new ebooks won't be able to use them.

As separate rationale, Scott Alexander's Codex is open for reading now that the site is in open beta. Not that all this content wasn't available elsewhere before, but this is the most intentional linearly-organized collection of his best writings I've seen. I want to read it, and as I read most things, I want to do it on my ebook reader.

However, I realized that (with a tiny bit of refactoring) this is flexible enough to work on content outside of LW2.

Where are the Ebook files?

In the output directory.

I want to make my own version! What should I do?

First, Clone this Repository

git clone https://github.com/LessWrong2/LessWrong-Portable
cd LessWrong-Portable/

Now set up your environment:

npm install

Finally, run build.js, along with the name of the book you want to build. Currently, the options include:

default - A dummy package that demonstrates the JSON schema by creating an ebook containing only this post.
codex - The Codex of Scott Alexander
rationalityabridged - Rationality Abridged by Quaerendo
inadequate - Inadequate Equilibria by Eliezer Yudkowsky
meditation - LessWrong on Meditation by LessWrong Authors
intelligent - The Abridged Guide to Intelligent Characters by Eliezer Yudkowsky
replacingguilt - The Replacing Guilt Series by Nate Soares
parenting - Jeff Kaufman on Parenting

There are also meta files for some other content from outside the rationalist community:

hedonistic - The Hedonistic Imperative by David Pearce
wbwelonmusk - Wait but Why on Elon Musk by Tim Urban
scip - The Structure and Interpretation of Computer Programs

So, for example:

nodejs build.js codex

That will download all of the content of the Codex into the cache/ directory, and then assemble it into an EPUB file (output/TheCodex.epub). LW2 pageloads are pretty slow, but otherwise the script runs pretty fast :)

I'm sure I'm forgetting stuff. Let me know.

I want to make a custom book/sequence! How do I do that?

First follow the directions to build your own version. Once you get to the build step (i.e. nodejs build.js <whatever>), instead of building one of the available options, copy the default build meta file to a version named for your own sequence/book.

For example, I wanted to create a book using some LessWrong posts on meditation. Here's what I did:

cp meta/default.json meta/meditation.json

Next, edit meta/meditation.json. Changing most of the fields is optional, except for the URLs. Those are really, really important. I used these posts as a starting point:

Each entry in the urls array of my meta config file isn't the full URL, but the path following "https://www.lesserwrong.com". So, my meta config file should look like this:

{
	"img": "images/lw.png",
	"shorttitle": "LessWrongOnMeditation",
	"metadata": {...},
	"titleSelector": "div.posts-page-content-header-title",
	"contentSelector": "div.posts-page-content-body-html",
	"urls": [
		"/posts/QqSNFcGSZdnARx56E/meditation-insight-and-rationality-part-1-of-3",
		"/posts/QjoTFHzvrxQg9A6j3/meditation-insight-and-rationality-part-2-of-3"
	]
}
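Before building, it can help to sanity-check a new meta file. The following is only a sketch: checkMeta is a hypothetical helper that is not part of build.js, and the list of required fields (and the rule that each urls entry starts with "/") is an assumption drawn from the example config above.

```javascript
// Hypothetical pre-flight check for a meta config object.
// Assumption: shorttitle, titleSelector, contentSelector and urls are required,
// and each urls entry is a path beginning with "/", as in the example config.
function checkMeta(meta) {
  const problems = [];
  for (const key of ['shorttitle', 'titleSelector', 'contentSelector']) {
    if (typeof meta[key] !== 'string' || meta[key].trim() === '') {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  if (!Array.isArray(meta.urls) || meta.urls.length === 0) {
    problems.push('urls must be a non-empty array');
  } else {
    meta.urls.forEach((u, i) => {
      if (typeof u !== 'string' || !u.startsWith('/')) {
        problems.push(`urls[${i}] should be a path starting with "/"`);
      }
    });
  }
  return problems; // an empty array means nothing obviously wrong was found
}

const example = {
  shorttitle: 'LessWrongOnMeditation',
  titleSelector: 'div.posts-page-content-header-title',
  contentSelector: 'div.posts-page-content-body-html',
  urls: ['/posts/QqSNFcGSZdnARx56E/meditation-insight-and-rationality-part-1-of-3'],
};
console.log(checkMeta(example));
```

An empty result only means the shape looks plausible; it says nothing about whether the selectors actually match the target pages.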

If you want to make a book from content outside of LW2, you're going to need to change a few more things. The fields in the metadata object should be more-or-less self-explanatory. metadata.source is used as the base URL for the contents of the urls array, so make sure that putting the two together generates a valid and correct URL. The titleSelector and contentSelector fields, probably less so. If you're not familiar with CSS selectors, this is going to take a little bit of training. Feel free to email me for help.
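To check that metadata.source and a urls entry really do combine into a valid URL, something like the sketch below may be useful. It assumes the two are joined by plain string concatenation, which is a guess about build.js rather than a description of it; resolvePostUrl is a hypothetical name.

```javascript
// Hypothetical helper: joins the base URL and one urls entry by plain
// concatenation (an assumption about how the build combines them), then
// verifies that the result parses as an absolute URL.
function resolvePostUrl(source, urlPath) {
  const joined = source + urlPath;
  // The URL constructor throws a TypeError when the string is not a valid absolute URL.
  return new URL(joined).href;
}

const source = 'https://www.lesserwrong.com';
const urls = [
  '/posts/QqSNFcGSZdnARx56E/meditation-insight-and-rationality-part-1-of-3',
  '/posts/QjoTFHzvrxQg9A6j3/meditation-insight-and-rationality-part-2-of-3',
];
for (const urlPath of urls) {
  console.log(resolvePostUrl(source, urlPath));
}
```

Note that a source ending in "/" combined with a path starting with "/" still parses, but yields a double slash, so it is worth eyeballing the printed URLs as well.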

Now you can build your new book.

nodejs build.js meditation

That should generate a new file entitled output/LessWrongOnMeditation.epub. Enjoy!
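The naming convention used in this walkthrough can be summed up in a short sketch: the build argument picks meta/<name>.json, and the shorttitle field names the EPUB in output/. The function names below are hypothetical, and this illustrates the convention rather than quoting build.js.

```javascript
const path = require('path');

// Hypothetical: the meta file that `nodejs build.js <bookName>` would read.
function metaFileFor(bookName) {
  return path.posix.join('meta', `${bookName}.json`);
}

// Hypothetical: the EPUB path implied by the shorttitle field of a meta config.
function epubFileFor(meta) {
  return path.posix.join('output', `${meta.shorttitle}.epub`);
}

console.log(metaFileFor('meditation'));
console.log(epubFileFor({ shorttitle: 'LessWrongOnMeditation' }));
```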

Best Practice: Commit your new meta config file to your repository and push it upstream. I'm very interested in aggregating other materials, so if you can manage it, submit a pull request!

How did you make the MOBI version?

It turns out that programmatically generating Kindle formats (e.g. AZW, MOBI) is weirdly difficult. Use Calibre or this Weird Script from Amazon.

How do you make a PDF/Text/Markdown/[Whatever] Version?

I haven't gotten there yet. Feel free to fork this repo and figure it out yourself.

How do you make a Word Version?

Go away.

Why does the script call `wget`, instead of using an http library?

I went through four different libraries to try to make synchronous http requests, and they all did this super annoying thing where they would return a page that hadn't rendered the text content yet. Weirdly, when I made (what I thought was) the same request in curl, it gave me the content I needed. So, instead of figuring out the right way to do it, I just did the thing that worked. I switched to wget when I needed to run a build on a Windows machine and wget was easier to get running. This has the added bonus that Ubuntu has wget out of the box, whereas curl must be installed.
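For the curious, calling wget synchronously from Node can look roughly like the sketch below. It is not the code in build.js: wgetArgs and fetchToFile are hypothetical names, and the only wget options used are -q (quiet) and -O (write to the given file).

```javascript
const { execFileSync } = require('child_process');

// Hypothetical helper: argument list for a quiet wget download into `dest`.
function wgetArgs(url, dest) {
  return ['-q', '-O', dest, url];
}

// Blocks until wget exits. execFileSync throws if wget is missing
// or exits with a non-zero status.
function fetchToFile(url, dest) {
  execFileSync('wget', wgetArgs(url, dest));
}

// Example call (commented out so the sketch does not touch the network):
// fetchToFile('https://www.lesserwrong.com/posts/QqSNFcGSZdnARx56E/meditation-insight-and-rationality-part-1-of-3', 'cache/part-1.html');
```

Passing the arguments as an array to execFileSync avoids shell quoting problems with unusual characters in URLs.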

Why synchronous requests?

Because it doesn't need to be done fast, but it does need to be done in a precise sequence. Writing an asynchronous version might save a few seconds at runtime, but would take me at least another hour or two to code up. I strongly doubt the number of times this script will ever be run will add up to the development time cost.

I ran the build myself, but it failed! What gives?

If the server barfs for some reason, the script will continue. After all, why waste bandwidth and effort? Re-run it and it will only try to download the files it didn't get the first time. There may be a couple that aren't downloading for structural, rather than essentially random, reasons. To fill these in for the canonical ebooks, I just manually saved copies of those pages in the cache/ directory.
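The "only download what is missing" behaviour amounts to checking the cache before each request. Here is a minimal sketch of that idea, assuming a made-up cache naming scheme (the last segment of the post path plus ".html"); the real file names in cache/ may differ.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical naming scheme: last segment of the URL path, plus ".html".
function cachePathFor(cacheDir, urlPath) {
  return path.join(cacheDir, path.posix.basename(urlPath) + '.html');
}

// Returns only the URL paths that have no file in the cache directory yet.
function missingFromCache(cacheDir, urlPaths) {
  return urlPaths.filter((p) => !fs.existsSync(cachePathFor(cacheDir, p)));
}

const wanted = [
  '/posts/QqSNFcGSZdnARx56E/meditation-insight-and-rationality-part-1-of-3',
  '/posts/QjoTFHzvrxQg9A6j3/meditation-insight-and-rationality-part-2-of-3',
];
console.log(missingFromCache('cache', wanted));
```

Under this scheme, manually saving a page into the cache directory under the expected name makes the next run skip it.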

If it's having trouble with a custom book you've cooked up, make sure that the CSS selector for the title is exactly correct. It should be precise enough to identify the title and only the title. If the selector comes up empty, LWP assumes it failed and won't generate a book in the end, though it will continue caching content until it reaches the end of the available URLs.
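The "keep caching, but refuse to build" behaviour can be pictured with the sketch below. It is an illustration only: collectChapters is a hypothetical function, and the title extraction is passed in as a plain function so the example does not depend on any particular HTML or CSS-selector library.

```javascript
// pages: array of { url, html }.
// extractTitle: hypothetical function (html) => string, standing in for
// whatever applies titleSelector to the page.
function collectChapters(pages, extractTitle) {
  const chapters = [];
  let ok = true;
  for (const page of pages) {
    const title = (extractTitle(page.html) || '').trim();
    if (title === '') {
      ok = false; // remember that the book should not be generated...
      continue;   // ...but keep walking through the remaining pages
    }
    chapters.push({ url: page.url, title });
  }
  return { ok, chapters };
}

// Toy extractor for the demo: grabs the text of the first <h1>, if any.
const h1Title = (html) => {
  const match = /<h1>(.*?)<\/h1>/.exec(html);
  return match ? match[1] : '';
};

const result = collectChapters(
  [
    { url: '/posts/a', html: '<h1>First</h1><p>text</p>' },
    { url: '/posts/b', html: '<p>no title here</p>' },
    { url: '/posts/c', html: '<h1>Third</h1>' },
  ],
  h1Title
);
console.log(result.ok, result.chapters.length);
```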

What's the deal with Legal Systems Very Different From Ours?

Legal Systems Very Different From Ours is an unpublished manuscript by David Friedman. It's available in two formats: Word and HTML. I decided to try to build an ebook based on the Word documents. Partly, I wanted to extend this script to crunch arbitrary Word Documents, but mostly I wanted the latest edition of the manuscript. As best as I can tell, the HTML version was last touched in late 2011, whereas the Word versions date to 2015ish.

Anyway, that turned out to be a critical error. Munging Word documents is difficult because, frankly, Word encourages the user to adopt undesirable and inconsistent typesetting practices, making it arbitrarily difficult to scrape. I finally managed to scrape Legal Systems, but with a significant amount of hacking and non-generalizable code. Accordingly, I'm keeping the meta file and output, but ditching the code I had to use to build it. If you want to try to build it yourself, please be my guest, but I'm not helping ;)

That said, if you figure out a not-crazy way that would generalize to other Word-based books, please (please please) submit a pull request!

Why are there some meta files for books that aren't in the output folder?

This is left to the reader as an exercise in reading between the lines.

What's the roadmap?

Whatever's in the issues queue.
Maybe organize some new sequences in a way that I find useful and add them.

If you want anything else, let me know and I'll tackle it when I've got some spare time. HAHAHAHA.
