Commit 932a187

committed
📝 Update dvc section
1 parent 1d36f3f commit 932a187

15 files changed

+888
-338
lines changed

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ repos:
1818
types: [file]
1919
files: \.(yml|yaml|cff)$
2020
- id: check-added-large-files
21+
args: ['--maxkb=2048']
2122
- id: check-json
2223
types: [file] # override `types: [json]`
2324
files: \.(json|ipynb)$

docs/productive/dvc/dag.rst

Lines changed: 45 additions & 14 deletions
@@ -2,15 +2,19 @@
22
..
33
.. SPDX-License-Identifier: BSD-3-Clause
44
5-
View pipelines
6-
==============
5+
Display pipelines
6+
=================
77

8-
Such data pipelines can be displayed or represented as a dependency graph with
9-
``dvc dag``:
8+
DVC represents a pipeline internally as a directed acyclic graph (DAG).
109

11-
.. code-block:: console
10+
.. seealso::
11+
`DVC DAG <https://dvc.org/doc/user-guide/pipelines/running-pipelines#dag>`_
1212

13-
$ dvc dag
13+
You can use ``dvc dag`` to visualise or export pipelines:
14+
15+
.. code-block:: console
16+
17+
$ uv run dvc dag
1418
1519
+-------------------+
1620
| data/data.xml.dvc |
@@ -40,14 +44,8 @@ Such data pipelines can be displayed or represented as a dependency graph with
4044
| evaluate |
4145
+----------+
4246
43-
data/data.xml.dvc
44-
prepare.dvc
45-
featurize.dvc
46-
train.dvc
47-
evaluate.dvc
48-
49-
* With ``dvc dag --dot`` a ``.dot`` file for `Graphviz
50-
<https://www.graphviz.org>`_ is generated:
47+
* With ``dvc dag --dot``, a :file:`.dot` file for `Graphviz
48+
<https://www.graphviz.org>`_ can also be generated:
5149

5250
.. graphviz::
5351

@@ -63,3 +61,36 @@ Such data pipelines can be displayed or represented as a dependency graph with
6361
"featurize" -> "evaluate";
6462
"train" -> "evaluate";
6563
}
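The dependency graph also determines the order in which stages can run. As a minimal sketch, and not DVC's own implementation, the stage names from the graph can be sorted topologically with Python's standard ``graphlib`` module. The ``dependencies`` mapping is an assumption reconstructed from the diagram.

```python
from graphlib import TopologicalSorter

# Assumed edges, read off the diagram: each stage maps to the
# stages (or .dvc files) it depends on.
dependencies = {
    "prepare": {"data/data.xml.dvc"},
    "featurize": {"prepare"},
    "train": {"featurize"},
    "evaluate": {"featurize", "train"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because each stage here depends on the one before it, only one valid order exists, and ``evaluate`` always comes last.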
64+
65+
With ``dvc status``, you can see whether pipeline stages or the local and
66+
remote storage have changed:
67+
68+
.. code-block:: console
69+
70+
$ uv run dvc status
71+
evaluate:
72+
changed deps:
73+
modified: src/dvc_example/evaluate.py
74+
changed outs:
75+
modified: eval
76+
77+
.. seealso::
78+
`dvc status <https://man.dvc.org/status>`_
79+
80+
In :doc:`CI jobs <../git/advanced/gitlab/ci-cd/index>`, it is usually necessary
81+
to check whether the pipeline is up to date without retrieving or executing
82+
anything. With ``dvc repro --dry``, you can find out which pipeline stages would
83+
need to be executed. However, if data is missing, the command will fail. If
84+
missing data should be ignored, you can use ``dvc repro --dry --allow-missing``.
85+
86+
.. code-block:: console
87+
88+
$ uv run dvc repro --allow-missing --dry
89+
'data/data.xml.dvc' didn't change, skipping
90+
Stage 'prepare' didn't change, skipping
91+
Stage 'featurize' didn't change, skipping
92+
Stage 'train' didn't change, skipping
93+
Stage 'evaluate' is cached - skipping run, checking out outputs
94+
Running stage 'evaluate':
95+
> uv run python src/dvc_example/evaluate.py model.pkl data/features
96+
Use `dvc push` to send your updates to remote storage.
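In a CI job, the dry-run output can be checked by a script. The following is only a sketch: ``stages_to_run`` is a hypothetical helper, and it assumes that the text output keeps the ``Running stage '…':`` form shown above, which DVC does not guarantee to be stable.

```python
import re

# Sample output, shortened from the dry run shown above.
dry_run_output = """\
'data/data.xml.dvc' didn't change, skipping
Stage 'prepare' didn't change, skipping
Stage 'featurize' didn't change, skipping
Stage 'train' didn't change, skipping
Running stage 'evaluate':
> uv run python src/dvc_example/evaluate.py model.pkl data/features
"""


def stages_to_run(output: str) -> list[str]:
    """Return the stage names reported as ``Running stage '<name>':``."""
    return re.findall(r"^Running stage '([^']+)':", output, flags=re.MULTILINE)


pending = stages_to_run(dry_run_output)
print(pending)
```

In a real job, the text would come from capturing the output of ``dvc repro --dry --allow-missing``, and a non-empty list could be used to fail the job.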

docs/productive/dvc/data.rst

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
1+
.. SPDX-FileCopyrightText: 2020 Veit Schiele
2+
..
3+
.. SPDX-License-Identifier: BSD-3-Clause
4+
5+
Manage data
6+
===========
7+
8+
Add data and directories
9+
------------------------
10+
11+
With DVC, you can store and version files, ML models, directories, and
12+
intermediate results with Git without having to check in the file contents to
13+
Git:
14+
15+
.. code-block:: console
16+
17+
$ uv run dvc get https://github.com/iterative/dataset-registry \
18+
get-started/data.xml -o data/data.xml
19+
$ uv run dvc add data/data.xml
20+
21+
This adds the file :file:`data/data.xml` to :file:`data/.gitignore` and writes
22+
the meta information to :file:`data/data.xml.dvc`.
23+
24+
.. seealso::
25+
`.dvc Files <https://dvc.org/doc/user-guide/project-structure/dvc-files>`_
26+
27+
To manage different versions of your project data with Git, simply add :file:`data/.gitignore` and :file:`data/data.xml.dvc`:
28+
29+
.. code-block:: console
30+
31+
$ git add data/.gitignore data/data.xml.dvc
32+
$ git commit -m ":monocle_face: Add data to dvc"
33+
34+
.. seealso::
35+
`External Dependencies and Outputs
36+
<https://dvc.org/doc/user-guide/pipelines/external-dependencies-and-outputs>`_
37+
38+
Saving and retrieving data
39+
--------------------------
40+
41+
The data can be copied from the working directory of your Git repository to the
42+
remote storage location with
43+
44+
.. code-block:: console
45+
46+
$ uv run dvc push
47+
48+
If you want to retrieve more recent data, you can do so with
49+
50+
.. code-block:: console
51+
52+
$ uv run dvc pull
53+
54+
Importing and updating data
55+
---------------------------
56+
57+
As an alternative to ``dvc get``, you can also import data and models from
58+
another project using ``dvc import``, for example:
59+
60+
.. code-block:: console
61+
62+
$ uv run dvc import https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
63+
Importing 'get-started/data.xml (https://github.com/iterative/dataset-registry)' -> 'data/data.xml'
64+
65+
This loads the file from the `dataset-registry
66+
<https://github.com/iterative/dataset-registry>`_ into our :file:`data`
67+
directory, adds it to :file:`.gitignore`, and creates :file:`data.xml.dvc`.
68+
69+
You can use ``dvc update`` to update these data sources before reproducing a
70+
pipeline that depends on them, for example:
71+
72+
.. code-block:: console
73+
74+
$ uv run dvc update data/data.xml.dvc
75+
'data/data.xml.dvc' didn't change, skipping
76+
77+
.. seealso::
78+
* `Discovering and accessing data
79+
<https://dvc.org/doc/user-guide/data-management/discovering-and-accessing-data>`_
80+
* `External Data
81+
<https://dvc.org/doc/user-guide/data-management/importing-external-data>`_
82+
83+
Deleting data
84+
-------------
85+
86+
If you want to remove files or directories from DVC management, you can do so
87+
with `dvc remove <https://dvc.org/doc/command-reference/remove>`_:
88+
89+
.. code-block:: console
90+
91+
$ uv run dvc remove data/data.xml.dvc
92+
93+
You can then use ``dvc gc -w`` to delete all files and their previous versions
94+
from the cache.
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
.. SPDX-FileCopyrightText: 2020 Veit Schiele
2+
..
3+
.. SPDX-License-Identifier: BSD-3-Clause
4+
5+
Experiments
6+
===========
7+
8+
If you now change the parameters in the :file:`params.yaml` file, you can
9+
compare your current working directory with the last commit (``HEAD``):
10+
11+
.. code-block:: console
12+
13+
$ uv run dvc params diff
14+
Path Param HEAD workspace
15+
params.yaml featurize.max_features 100 200
16+
params.yaml featurize.ngrams 1 2
17+
18+
.. code-block:: console
19+
20+
$ uv run dvc metrics diff
21+
Path Metric HEAD workspace Change
22+
eval/metrics.json avg_prec.test 0.9014 0.925 0.0236
23+
eval/metrics.json avg_prec.train 0.95704 0.97437 0.01733
24+
eval/metrics.json roc_auc.test 0.93196 0.94602 0.01406
25+
eval/metrics.json roc_auc.train 0.97743 0.98667 0.00924
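The ``Change`` column is the workspace value minus the ``HEAD`` value. The following sketch recomputes it from the figures above. The ``metrics`` dictionary is only an illustration and not something DVC provides.

```python
# (HEAD, workspace) pairs copied from the ``dvc metrics diff`` table.
metrics = {
    "avg_prec.test": (0.9014, 0.925),
    "avg_prec.train": (0.95704, 0.97437),
    "roc_auc.test": (0.93196, 0.94602),
    "roc_auc.train": (0.97743, 0.98667),
}

# Change = workspace - HEAD, rounded to the five decimals used in the table.
changes = {
    name: round(workspace - head, 5)
    for name, (head, workspace) in metrics.items()
}

for name, change in changes.items():
    print(f"{name:<16}{change}")
```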
26+
27+
.. code-block:: console
28+
29+
$ uv run dvc plots diff
30+
file:///Users/veit/dvc-example/dvc_plots/index.html
31+
32+
.. raw:: html
33+
:file: plots-diff.html
34+
35+
``dvc exp``
36+
-----------
37+
38+
With `dvc exp <https://dvc.org/doc/command-reference/exp>`_, you can also set
39+
the parameters in the command line, for example:
40+
41+
.. code-block:: console
42+
43+
$ uv run dvc exp run \
44+
--set-param 'featurize.max_features=200'
45+
46+
You can also change multiple parameters with a single call:
47+
48+
.. code-block:: console
49+
50+
$ uv run dvc exp run \
51+
-S 'featurize.max_features=200' \
52+
-S 'featurize.ngrams=2'
53+
54+
With ``--queue``, you can also specify multiple values for a parameter:
55+
56+
.. code-block:: console
57+
58+
$ uv run dvc exp run --queue \
59+
-S 'featurize.max_features=200,300,400' \
60+
-S 'featurize.ngrams=2,3,4'
61+
Queueing with overrides '{'params.yaml': ['featurize.max_features=200', 'featurize.ngrams=2']}'.
62+
Queued experiment 'sober-name' for future execution.
63+
Queueing with overrides '{'params.yaml': ['featurize.max_features=200', 'featurize.ngrams=3']}'.
64+
Queued experiment 'erect-loir' for future execution.
65+
Queueing with overrides '{'params.yaml': ['featurize.max_features=200', 'featurize.ngrams=4']}'.
66+
Queued experiment 'tonic-hood' for future execution.
67+
Queueing with overrides '{'params.yaml': ['featurize.max_features=300', 'featurize.ngrams=2']}'.
68+
...
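As the output suggests, the comma-separated values are combined into a grid, with one queued experiment per combination. A small sketch of that expansion, using ``itertools.product`` purely for illustration (it is not how DVC is known to implement it):

```python
from itertools import product

# Values passed with -S in the command above.
max_features = [200, 300, 400]
ngrams = [2, 3, 4]

# One list of overrides per combination, in the order shown in the output.
overrides = [
    [f"featurize.max_features={m}", f"featurize.ngrams={n}"]
    for m, n in product(max_features, ngrams)
]

for combo in overrides:
    print(combo)
```

With three values for each of the two parameters, this gives nine combinations.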
69+
70+
To better identify the experiments, you can also specify ``--name``:
71+
72+
.. code-block:: console
73+
74+
$ uv run dvc exp run --name 'feature-matrix' --queue \
75+
-S 'featurize.max_features=200,300,400' \
76+
-S 'featurize.ngrams=2,3,4'
77+
Queueing with overrides '{'params.yaml': ['featurize.max_features=200', 'featurize.ngrams=2']}'.
78+
Queued experiment 'feature-matrix-1' for future execution.
79+
Queueing with overrides '{'params.yaml': ['featurize.max_features=200', 'featurize.ngrams=3']}'.
80+
Queued experiment 'feature-matrix-2' for future execution.
81+
...
82+
83+
Once you have placed some experiments in the queue, you can run them all with
84+
the following command:
85+
86+
.. code-block:: console
87+
88+
$ uv run dvc exp run --run-all
89+
90+
With the ``--jobs`` option of ``dvc queue start``, you can also use multiple
91+
workers for the experiments:
92+
93+
.. code-block:: console
94+
95+
$ uv run dvc queue start --jobs 8
96+
Started '8' new experiments task queue workers.
97+
98+
.. seealso::
99+
* `Get Started: Experimenting Using Pipelines
100+
<https://dvc.org/doc/start/experiments/experiment-pipelines>`_
101+
* `Running Experiments
102+
<https://dvc.org/doc/user-guide/experiment-management/running-experiments#the-experiments-queue>`_
103+
* `dvc queue <https://dvc.org/doc/command-reference/queue>`_
