Scheduling Notebooks #46

@amit1rrr

Description
Problem

  • How can I quickly go from experimentation (.ipynb) to production (typically .py)?

Current Solution

The prevailing method of "productionizing" notebooks is:

  1. Convert the notebook to a Python script
  2. Clean up the script, write some tests, get a code review done
  3. Set up a cloud machine and install all the libraries (or dockerize the script)
  4. Run the script manually or set up a crontab
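To make step 1 concrete: converting is usually done with `jupyter nbconvert --to script`, but since a `.ipynb` file is just JSON, the core of that step can be sketched with the standard library alone. This is a minimal illustration, not a replacement for the real tool (which also handles magics, markdown, and output stripping):

```python
import json

def notebook_to_script(nb_json: str) -> str:
    """Extract the code cells of a notebook's JSON into a .py script body.

    A bare-bones stand-in for `jupyter nbconvert --to script`; real
    conversion also deals with cell magics, markdown cells, and metadata.
    """
    nb = json.loads(nb_json)
    chunks = ["".join(cell["source"])
              for cell in nb["cells"] if cell["cell_type"] == "code"]
    return "\n\n".join(chunks)

# A tiny notebook with one markdown cell and one code cell.
nb = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
]})
print(notebook_to_script(nb))
```

Even this trivial sketch shows the friction: every experiment-to-production round trip repeats the extract/clean/review cycle.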

Is this really efficient?

Challenging the Status Quo

What if we ran notebooks directly in our production workflows? Here are some benefits:

  • Rich output for each execution (the notebook itself!)
  • Quickly go from experimentation to production; no time spent extracting code from .ipynb
  • Failed workflows are easy to debug (thanks to the rich notebook output)

Why do we really need to convert notebooks to Python scripts? Here are a few common objections (I'd love to learn more in the comments):

  • Code review - We can review notebooks directly with ReviewNB & nbdime (.py is not necessary).
  • Testing - We can write tests directly against notebook code with Treon and a few other tools (.py is not necessary here either).
  • Code reuse - This is a legitimate reason. You should definitely convert notebook code into libraries whenever possible; it makes reuse easy and keeps the notebook readable. But we don't need to convert the entire notebook into a script, do we? The final execution can simply be a notebook that imports the libraries we created.
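The code-reuse point above can be sketched as follows. The function name and data are hypothetical; the point is that the reviewable, testable logic lives in an importable module while the notebook shrinks to a thin driver:

```python
# Hypothetical library code extracted from the notebook
# (in practice this would live in its own module, e.g. mylib/metrics.py,
# and the notebook would `from mylib.metrics import weekly_revenue`).
def weekly_revenue(orders):
    """Sum order amounts; the logic worth testing and reviewing lives here."""
    return sum(o["amount"] for o in orders)

# The "production" notebook is then just a thin driver cell:
orders = [{"amount": 120.0}, {"amount": 80.0}]
print(weekly_revenue(orders))  # the executed notebook preserves the rich output
```

The library gets unit tests and code review; the notebook stays as the execution record.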

Proposed Solution

  • You select a notebook from a GitHub repo and set a schedule for it to run (once/daily/weekly etc.).
  • You select the instance type (memory, vCPU) for execution.
  • You can specify different parameters for each run via Papermill.
  • ReviewNB executes the notebook on your specified schedule & preserves the result of each run (as an executed notebook).
  • ReviewNB supports notebook workflows (parallel executions for different parameters, the result of one notebook feeding into the next, etc.).
  • For the environment, we use stable versions of commonly used DS libraries. Users can specify their own environment as well (via a Dockerfile).

FAQ

  • Can we run notebooks on our own hardware?
    Absolutely. You can self-host ReviewNB & hook it up to your own AWS/GCP account to execute notebooks on your own machines.

  • How will I specify sensitive data (e.g. DB credentials) required for execution?
    ReviewNB provides a prompt to set any sensitive data as environment variables that are made available to the notebook at runtime.
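Inside the notebook, reading those injected credentials is plain `os.environ` access. A hedged sketch (the variable names are hypothetical, not ReviewNB's actual convention):

```python
import os

# In practice the scheduler would inject this before the run; we set a
# placeholder here only so the sketch is self-contained.
os.environ.setdefault("DB_PASSWORD", "s3cret")

def db_credentials():
    """Read DB credentials from environment variables, failing loudly if absent."""
    password = os.environ.get("DB_PASSWORD")
    if password is None:
        raise RuntimeError("DB_PASSWORD not set for this run")
    return {"user": os.environ.get("DB_USER", "reporter"), "password": password}

print(sorted(db_credentials()))
```

Failing fast on a missing variable keeps a misconfigured scheduled run from silently producing an empty result notebook.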


Feel free to upvote/downvote the issue to indicate whether you think this is a useful feature. I also welcome additional questions/comments/discussion on the issue.
