- Build the pipeline
- Push the results
- Pull and reproduce
dvc run -d src/download_data.py -o data/raw/store47-2016.csv python src/download_data.py
dvc run -d data/raw/store47-2016.csv -d src/splitter.py -o data/splitter/train.csv -o data/splitter/validation.csv python src/splitter.py
dvc run -d data/splitter/train.csv -d data/splitter/validation.csv -d src/decision_tree.py -o data/decision_tree/model.pkl -M results/score.txt python src/decision_tree.py
First, push your code changes to GitHub as usual, for instance:
git commit -am "Change model to be more awesome"
git push origin master
Next, push the data files tracked by dvc to the cloud:
dvc push
That's it! Now anyone with access can fetch this repository and use dvc to replicate and build on your work.
First, clone/pull this git repo.
git pull origin master --rebase
Next, pull the data from the cloud with dvc:
dvc pull
Finally, to reproduce the entire pipeline, simply run:
dvc repro model.pkl.dvc
Here, model.pkl.dvc is the dvc file for the last stage of the pipeline, so reproducing it also reproduces all the steps before it.
To change the model, for example, edit src/decision_tree.py as you see fit. You can then retrain it by re-running the pipeline with dvc repro model.pkl.dvc.
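As a rough illustration of which steps need to run again after an edit, the sketch below models the three `dvc run` stages above as a dependency graph. It is not dvc's implementation: the stage labels `download`, `split`, and `train` and the function `stages_to_rerun` are hypothetical names, and it makes the simplifying assumption that a re-run stage always changes its outputs.

```python
# Illustrative model only: this is NOT how dvc is implemented.
# Each stage lists the dependencies (-d) and outputs (-o / -M)
# taken from the dvc run commands above. Stage names are made up.
STAGES = {
    "download": {
        "deps": ["src/download_data.py"],
        "outs": ["data/raw/store47-2016.csv"],
    },
    "split": {
        "deps": ["data/raw/store47-2016.csv", "src/splitter.py"],
        "outs": ["data/splitter/train.csv", "data/splitter/validation.csv"],
    },
    "train": {
        "deps": [
            "data/splitter/train.csv",
            "data/splitter/validation.csv",
            "src/decision_tree.py",
        ],
        "outs": ["data/decision_tree/model.pkl", "results/score.txt"],
    },
}


def stages_to_rerun(changed_files, stages=STAGES):
    """Return the names of the stale stages, in pipeline order."""
    dirty = set(changed_files)
    stale = []
    # The dict is written in pipeline order, so a single pass is enough.
    for name, stage in stages.items():
        if dirty.intersection(stage["deps"]):
            stale.append(name)
            # Assumption: a re-run stage always produces changed outputs.
            dirty.update(stage["outs"])
    return stale


print(stages_to_rerun(["src/decision_tree.py"]))  # ['train']
print(stages_to_rerun(["src/splitter.py"]))       # ['split', 'train']
```

In this model, editing only src/decision_tree.py marks just the training stage as stale, while editing src/splitter.py also invalidates everything downstream of the split.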
Once the model has been trained, build and run the app with Docker:
docker build . -t ci-workshop
docker run -d -p 5005:5005 ci-workshop
You can view the app at http://localhost:5005
Note: try to assign 8 GB of memory and 2 CPUs to Docker when running docker build.
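If you would rather check from a script that the container is serving on port 5005, something like the following may help. It is a minimal sketch using only the Python standard library; the function name `app_is_up` and its retry defaults are assumptions, and it only checks that the server answers at all, not that the model behaves correctly.

```python
import time
import urllib.error
import urllib.request


def app_is_up(url="http://localhost:5005", retries=5, delay=2.0):
    """Return True once the URL answers with any HTTP response, else False."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=3):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status.
            return True
        except (urllib.error.URLError, OSError):
            # Not reachable yet; wait before the next attempt, if any.
            if attempt < retries - 1:
                time.sleep(delay)
    return False
```

Calling `app_is_up()` shortly after `docker run` gives the container a few seconds to start before reporting that the app is unreachable.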
docker pull TBD
docker run -d -p 5005:5005 TBD