Problem
- How can I quickly go from experimentation (.ipynb) to production (typically .py)?
Current Solution
The prevailing method of "productionizing" notebooks is:
- Convert notebooks to python scripts
- Clean up the script, write some tests, get a code review done
- Set up a cloud machine and install all the libraries (or dockerize the script)
- Run the script manually or set up a crontab
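To make the first step of that pipeline concrete: a .ipynb file is plain JSON, so "convert the notebook to a script" essentially means concatenating its code cells. The sketch below is a stdlib-only approximation of what `jupyter nbconvert --to script` does; `notebook_to_script` is a hypothetical helper, not part of any tool.

```python
import json

def notebook_to_script(nb_json: str) -> str:
    """Concatenate a notebook's code cells into one Python script,
    skipping markdown cells. A .ipynb file is just JSON underneath."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            chunks.append("".join(cell["source"]))
    return "\n\n".join(chunks)

# A tiny in-memory notebook standing in for a real .ipynb file.
nb_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Exploration notes"]},
        {"cell_type": "code", "source": ["x = 2 + 2\n", "print(x)"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})

script = notebook_to_script(nb_json)
```

Everything after this point in the conventional workflow (cleanup, tests, review, deployment) operates on the extracted script rather than the notebook.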
Is this really efficient?
Challenging the Status Quo
What if we run notebooks directly for our production workflows? Here are some benefits:
- Rich output for each execution (the notebook itself!)
- Quickly go from experimentation to production. No time spent extracting code from .ipynb
- Failed workflows are easy to debug (thanks to the rich notebook output)
Why do we really need to convert notebooks to Python scripts? Here are a few common objections (I'd love to learn more in comments):
- Code review - We can review notebooks directly with ReviewNB & nbdime (.py is not necessary).
- Testing - We can directly write tests for notebook code with Treon and a few other tools (.py is not necessary here either).
- Code reuse - This is a legitimate reason. You should definitely convert notebook code into libraries whenever possible. It makes reuse super easy and keeps the notebook readable. But we don't need to convert the entire notebook into a script, do we? The final execution can easily be running a notebook that imports the libraries we created.
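As a sketch of the testing point: since a notebook is JSON, a test harness can execute its code cells top-to-bottom and assert on the resulting state, which is roughly the approach Treon-style tools take. `run_notebook_cells` below is an illustrative helper, not Treon's actual API.

```python
import json

def run_notebook_cells(nb_json: str) -> dict:
    """Execute a notebook's code cells in order in a shared namespace
    and return that namespace. Any failing cell raises immediately."""
    nb = json.loads(nb_json)
    namespace = {}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            exec("".join(cell["source"]), namespace)
    return namespace

# A minimal notebook whose logic we want to test without a .py conversion.
nb_json = json.dumps({"cells": [
    {"cell_type": "code", "source": ["def double(n):\n", "    return 2 * n\n"]},
    {"cell_type": "code", "source": ["result = double(21)\n"]},
]})

ns = run_notebook_cells(nb_json)
```

A test suite can then assert directly on `ns["result"]` or on the defined functions, treating the notebook itself as the unit under test.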
Proposed Solution
- You select a notebook from GitHub repo and set a schedule for it to run (once/daily/weekly etc.).
- You select the instance type (memory, vCPU) for execution.
- You can specify different parameters for each run via Papermill.
- ReviewNB executes the notebook on your specified schedule & preserves the result of each run (as an executed notebook).
- ReviewNB supports notebook workflows (parallel executions for different parameters, the result of one notebook feeding into the next, etc.).
- For the environment, we use stable versions of commonly used DS libraries. Users can specify their own environment as well (via a Dockerfile).
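The Papermill-style parameterization mentioned above can be sketched as injecting a parameters cell into the notebook JSON before execution. This is a simplification: real Papermill injects after a cell tagged `parameters` and runs the notebook via `papermill.execute_notebook` (or `-p name value` on the CLI); `inject_parameters` here is a hypothetical, stdlib-only illustration of the idea.

```python
import json

def inject_parameters(nb_json: str, params: dict) -> str:
    """Prepend a code cell that sets the given parameters, so each
    scheduled run can execute the same notebook with different values."""
    nb = json.loads(nb_json)
    cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
    }
    nb["cells"].insert(0, cell)
    return json.dumps(nb)

# The same notebook, parameterized differently per run.
nb_json = json.dumps({"cells": [
    {"cell_type": "code", "source": ["print(region, batch_size)\n"]},
]})
parameterized = inject_parameters(
    nb_json, {"region": "us-east-1", "batch_size": 500}
)
```

This is what makes "parallel executions for different parameters" cheap: the scheduler injects a different parameters cell per run and executes otherwise identical notebooks.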
FAQ
- Can we run notebooks on our own hardware?
  Absolutely. You can self-host ReviewNB & hook it up to your own AWS/GCP account to execute notebooks on your own machines.
- How will I specify sensitive data (e.g. DB credentials) required for execution?
  ReviewNB provides a prompt to set any sensitive data as environment variables that are available to the notebook at runtime.
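Inside the notebook, those credentials would then be read from the environment rather than hard-coded in a cell. The variable names below (DB_HOST, DB_PASSWORD) are illustrative, not a ReviewNB convention; the `setdefault` line exists only to make the snippet self-contained.

```python
import os

# The scheduler would export the secret before launching the notebook;
# we set a placeholder here only so this example runs standalone.
os.environ.setdefault("DB_PASSWORD", "example-secret")

# In the notebook: read configuration and secrets from the environment
# instead of embedding them in cells (where they would leak into the
# committed .ipynb and into preserved run outputs).
db_host = os.environ.get("DB_HOST", "localhost")
db_password = os.environ["DB_PASSWORD"]  # KeyError if the secret is missing
```

Failing loudly on a missing secret (the `KeyError` above) is preferable to silently connecting with an empty password.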
Feel free to upvote/downvote the issue to indicate whether you think this is a useful feature or not. I also welcome additional questions/comments/discussion on the issue.