nordstrand/offly

Offly

A multi-purpose tool for capturing high-fidelity static reproductions of highly dynamic web sites.

Based on filternet.js and PhantomJS.

Usage examples

Serializing a site to file

$ node index.js scrape --crawl_url=https://news.ycombinator.com/ --recursive hacker-news.json
Dumping http traffic to hacker-news.json.
Getting: https://news.ycombinator.com/
Getting: http://www.ycombinator.com/apply/
Getting: http://www.nytimes.com/interactive/2013/10/08/science/the-higgs-boson.html#/?g=true&higgs1_slide=0
^C
Closing hacker-news.json

Serving same site from file

$ node index.js serve hacker-news.json
'What Is the Higgs? - Interactive Graphic - NYTimes.com' [168.5 kB] 
http://localhost:8128/interactive/2013/10/08/science/the-higgs-boson.html

'Hacker News' [23.6 kB] 
http://localhost:8128/

Serving 102 http transaction(s) in file hacker-news.json on port 8128

Architecture

To capture a reproduction, Offly spins up a proxy that intercepts and records all traffic passing through it. Traffic can be triggered either manually through a web browser or programmatically by Offly itself.
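
To illustrate the recording idea, here is a minimal sketch of a forward proxy that appends every exchange to an in-memory log. It is not Offly's implementation (which builds on filternet.js): it uses only Node's built-in `http` module, handles plain HTTP only (no HTTPS `CONNECT` tunnelling), and the names `makeTransaction` and `createRecordingProxy` as well as the transaction shape are assumptions made for this example.

```javascript
const http = require('http');

// Hypothetical transaction shape; Offly's actual file format may differ.
// The body is stored as base64 so binary responses survive JSON serialization.
function makeTransaction(method, url, statusCode, headers, bodyBuffer) {
  return {
    request: { method, url },
    response: { statusCode, headers, body: bodyBuffer.toString('base64') },
  };
}

// Returns an (unstarted) HTTP server acting as a forward proxy.
// Every completed exchange is pushed onto `log`.
function createRecordingProxy(log) {
  return http.createServer((clientReq, clientRes) => {
    // A client configured to use a proxy sends the absolute URL in the request line.
    const upstreamReq = http.request(
      clientReq.url,
      { method: clientReq.method, headers: clientReq.headers },
      (upstreamRes) => {
        const chunks = [];
        upstreamRes.on('data', (chunk) => chunks.push(chunk));
        upstreamRes.on('end', () => {
          const body = Buffer.concat(chunks);
          log.push(makeTransaction(clientReq.method, clientReq.url,
            upstreamRes.statusCode, upstreamRes.headers, body));
          // Relay the recorded response back to the client unchanged.
          clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
          clientRes.end(body);
        });
      }
    );
    upstreamReq.on('error', () => {
      clientRes.writeHead(502);
      clientRes.end();
    });
    // Forward the request body (if any) to the origin server.
    clientReq.pipe(upstreamReq);
  });
}
```

Under these assumptions, calling `createRecordingProxy(log).listen(port)` and pointing a browser's HTTP proxy setting at that port would fill `log` with one entry per request; the log could then be written out with `JSON.stringify`.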

A reproduction can be served either through an intercepting proxy that a browser connects to, or by a static web server.
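
The static-server mode can be pictured as a lookup from request path to recorded response. The sketch below is an assumption-laden illustration, not Offly's code: it presumes the recording is an array of `{ request: { url }, response: { statusCode, headers, body } }` objects with base64-encoded bodies, and `buildIndex` and `createStaticServer` are hypothetical names.

```javascript
const http = require('http');

// Builds a Map from URL path to recorded response.
// Host names are dropped, so pages from different sites share one path space.
function buildIndex(transactions) {
  const index = new Map();
  for (const t of transactions) {
    const path = new URL(t.request.url).pathname;
    // Keep the first response recorded for a path; later duplicates are ignored.
    if (!index.has(path)) {
      index.set(path, t.response);
    }
  }
  return index;
}

// Returns an (unstarted) HTTP server that replays recorded responses by path.
function createStaticServer(index) {
  return http.createServer((req, res) => {
    // req.url is path-relative here, so resolve it against a dummy base.
    const path = new URL(req.url, 'http://localhost').pathname;
    const recorded = index.get(path);
    if (!recorded) {
      res.writeHead(404);
      res.end('Not recorded');
      return;
    }
    res.writeHead(recorded.statusCode, recorded.headers);
    res.end(Buffer.from(recorded.body, 'base64'));
  });
}
```

Dropping the host name is what would let a page recorded from another site be replayed under `http://localhost:8128/...`, as in the serve example above. Keeping the first response per path is an arbitrary choice made for this sketch.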

There are extension points for applying custom code (transforms) to HTTP traffic at record time or serve time.
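
One way to think about transforms is as a chain of functions, each taking a recorded transaction and returning a possibly modified copy. The sketch below shows that pattern only; Offly's real extension-point API is not documented here, so `applyTransforms`, `stripCookies`, `addResponseHeader`, and the transaction shape are all hypothetical.

```javascript
// Runs a transaction through each transform in order and returns the result.
function applyTransforms(transaction, transforms) {
  return transforms.reduce((current, transform) => transform(current), transaction);
}

// Example transform: drop Set-Cookie so replayed pages do not set stale cookies.
// Returns a copy; the input transaction is left untouched.
function stripCookies(transaction) {
  const headers = { ...transaction.response.headers };
  delete headers['set-cookie'];
  return { ...transaction, response: { ...transaction.response, headers } };
}

// Example transform factory: add or overwrite a single response header.
function addResponseHeader(name, value) {
  return (transaction) => ({
    ...transaction,
    response: {
      ...transaction.response,
      headers: { ...transaction.response.headers, [name]: value },
    },
  });
}
```

The same chain could in principle run when traffic is recorded or when it is served; which of the two fits depends on whether the change should be baked into the stored file.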

                           +-------------------+                           
                           | Webdriver         |                           
                           | (wd.js)           |                           
                           +-------------------+                           
                                                                           
 +-----------------------+ +-------------------+                           
 | Web browser           | | PhantomJS         |                           
 | (manually operated)   | |                   |                           
 +-----------------------+ +-------------------+                           
                                                                           
 +---------------------------------------------+      +-----------------+  
 | HTTP proxy                                  |<---->| Persistence     |  
 | (filternet.js)                              |      |                 |  
 +---------------------------------------------+      +-----------------+  
++---------------------------------------------+------+--------------------
                                                                           
 +------------+                                                            
 | Web site   |                                                            
 |            |                                                            
 +------------+                                                            

About

Serializes web sites to a single JSON file.
