Description
There are two great new notebooks from Databricks that would fit in very well here. The first is similar to our existing demo, and the second is a new notebook that can be used as bonus material.
This issue covers the second notebook (01-Deltalake...). We'll create a new Delta Lake Optimisations exercise, meant to be run after the Delta Lake Walkthrough.
- Download the two notebooks here: Archive.zip
- Import the 01-Deltalake notebook into Databricks
- Remove all Databricks-demo-specific text that doesn't pertain to our content (e.g. "a cluster has been created for you...")
- Add our per-user workspace selector and stream helpers: https://github.com/data-derp/small-exercises/blob/master/delta-lake-walkthrough/delta-lake-walkthrough.py#L31-L150
- Add at the top of the notebook: "This notebook is adapted from the Delta Lake Demo provided by Databricks".
- Export the notebook as Python source
- Upload it to a new directory called "delta-lake-optimisations" in the small-exercises repo
- Create a new README (similar to the Delta Lake Walkthrough one; feel free to use it as a starting point, but make sure you change the URLs)
- Create a new section in data-derp called "Exercise: Delta Lake Optimisations (Bonus)" just after the Delta Lake Walkthrough. NOTE: don't put this in the existing Delta Lake exercise section; create a NEW section.
- Check that the notebook can be imported by following the README instructions
- Run through both notebooks to check that they have no bugs and can be run on a fresh cluster.
- Record a short video walkthrough explaining the optimisations (modifications made at the metastore level)
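For the "add the attribution", "remove demo-specific text" and "export as Python source" steps, here is a minimal sketch of what the resulting file might look like if assembled by hand. It assumes the usual layout of a Databricks Python source export (a `# Databricks notebook source` header, `# COMMAND ----------` between cells, markdown lines prefixed with `# MAGIC`). The `Cell` type, the helper functions and the `DEMO_PHRASES` list are hypothetical illustrations, not part of the Databricks demo or our existing helpers.

```python
from dataclasses import dataclass

# Assumed layout of a Databricks Python source export.
HEADER = "# Databricks notebook source"
SEPARATOR = "\n\n# COMMAND ----------\n\n"

ATTRIBUTION = "This notebook is adapted from the Delta Lake Demo provided by Databricks"

# Hypothetical list of demo-specific phrases to look for; extend as needed.
DEMO_PHRASES = ("a cluster has been created for you",)


@dataclass
class Cell:
    kind: str  # "md" for a markdown cell, "py" for a Python code cell
    text: str


def render_cell(cell: Cell) -> str:
    """Render one cell; markdown lines get the '# MAGIC' prefix."""
    if cell.kind == "md":
        lines = ["%md"] + cell.text.splitlines()
        # Empty markdown lines become a bare '# MAGIC' (trailing space stripped).
        return "\n".join(f"# MAGIC {line}".rstrip() for line in lines)
    return cell.text


def find_demo_text(cells):
    """Return (cell index, phrase) pairs for leftover demo-specific phrases."""
    hits = []
    for i, cell in enumerate(cells):
        lowered = cell.text.lower()
        for phrase in DEMO_PHRASES:
            if phrase in lowered:
                hits.append((i, phrase))
    return hits


def to_databricks_source(cells):
    """Prepend the attribution cell and join all cells into one source file."""
    all_cells = [Cell("md", ATTRIBUTION)] + list(cells)
    return HEADER + "\n" + SEPARATOR.join(render_cell(c) for c in all_cells) + "\n"


# Usage with two hypothetical cells.
cells = [
    Cell("md", "# Delta Lake Optimisations"),
    Cell("py", "print('hello')"),
]
leftovers = find_demo_text(cells)  # should be empty before exporting
source = to_databricks_source(cells)
```

Running `find_demo_text` before rendering gives a quick check that nothing like "a cluster has been created for you" survived the cleanup; the attribution cell is always emitted first so it lands at the top of the notebook.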