From 6782c255a5d7e59a2d8a0334b1eafdf4dcc95f3b Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <unixprog@gmail.com>
Date: Mon, 18 Jan 2021 18:52:45 +0100
Subject: [PATCH 01/20] models: explain ocrd resmgr

---
 site/en/models.md | 247 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 174 insertions(+), 73 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index 98e6f4078..372b425fd 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -12,116 +12,217 @@ its own internal format(s) for models. Some support central storage of models
 at a specific location (tesseract, ocropy, kraken) while others require the full
 path to a model (calamari).
 
+Since [v2.22.0](https://github.com/OCR-D/core/releases/v2.22.0), OCR-D/core
+comes with a framework for managing processor resources uniformly. This means
+that OCR-D/core will take care of lookin in well-defined places in the
+filesystem for resources for specific processors. It also knows how to cache
+file parameters passed as a URL. OCR-D/core also comes with a bundled database
+of known resources, such as OCR models, configurations and other
+processor-specific data. This means that OCR-D users should be able to
+concentrate on fine-tuning their OCR workflows and not bother with implementation
+details like "where do I get models from and where do I put them".
+
+All of the above mentioned functionality can be accessed using the `ocrd
+resmgr` command line tool.
+
+## What models are available?
+
+To get a list of the resources that the OCR-D/core [is aware
+of](https://github.com/OCR-D/core/blob/master/ocrd/ocrd/resource_list.yml):
+
+```sh
+ocrd resmgr list-available
+```
+
+The output will look similar to this:
+
+```
+
+ocrd-calamari-recognize
+- qurator-gt4hist-0.3 (https://qurator-data.de/calamari-models/GT4HistOCR/2019-07-22T15_49+0200/model.tar.xz)
+  Calamari model trained with GT4HistOCR
+- qurator-gt4hist-1.0 (https://qurator-data.de/calamari-models/GT4HistOCR/2019-12-11T11_10+0100/model.tar.xz)
+  Calamari model trained with GT4HistOCR
+
+ocrd-cis-ocropy-recognize
+- LatinHist.pyrnn.gz (https://github.com/chreul/OCR_Testdata_EarlyPrintedBooks/raw/master/LatinHist-98000.pyrnn.gz)
+  ocropy historical latin model by github.com/chreul
+```
+
+As you can see, resources are grouped by the processor they are used by.
+
+The word after the list symbol, e.g. `qurator-gt4hist-0.3`,
+`LatinHist.pyrnn.gz`, define the "name" of the resource, a shorthand you can
+use in parameters without having to specify the full URL (in brackets after the
+name).
+
+The second line of each entry contains a short description of the resource.
+
+## Installing known resources
+
+You can install resources with the `ocrd resmgr download` command. It expects
+the name of the processor as the first argument and either the name or URL of a
+resource as a second argument.
+
 Likewise, model distribution is not currently centralised within OCR-D though we
 are working towards a central model repository.
 
-In the meantime, this guide will show you, for each OCR engine:
+For example, to install the `LatinHist.pyrnn.gz` resource for `ocrd-cis-ocropy-recognize`:
 
-  * which types of models are supported
-  * where to store models locally
-  * which currently available models we recommend
-  * how to invoke the resp. OCR-D wrapper for the engine with a specific model
+```sh
+ocrd resmgr download ocrd-cis-ocropy-recognize LatinHist.pyrnn.gz
+# or
+ocrd resmgr download ocrd-cis-ocropy-recognize https://github.com/chreul/OCR_Testdata_EarlyPrintedBooks/raw/master/LatinHist-98000.pyrnn.gz
+```
 
-## Tesseract / ocrd_tesserocr
+This will look up the resource in the [bundled resource and user databases](#user-database), download it,
+unpack it (where applicable) and store it in the [proper location](#where-is-the-data).
 
-Tesseract models are single files with a `.traineddata` extension.
 
-Tesseract expects models to be in a directory `tessdata` within what Tesseract
-calls `TESSDATA_PREFIX`. When installing Tesseract from Ubuntu packages, that
-location is `/usr/share/tesseract-ocr/4.00/tessdata`. When building from source
-using [ocrd_all](htttps://github.com/OCR-D/ocrd_all), the models are searched
-at `/path/to/ocrd_all/venv/share/tessdata`. If you want to override the
-locations, you can set the `TESSDATA_PREFIX` environment variable, e.g. if you
-want the models location to be `$HOME/tessdata`, you can by adding to your
-`$HOME/.bashrc`: `export TESSDATA_PREFIX=$HOME`.
-
-We recommend you download the following models, either by downloading and
-saving to the right location or by running `make install-models-tesseract` when
-using `ocrd_all`:
-
-  * [equ](https://github.com/tesseract-ocr/tessdata_fast/raw/master/equ.traineddata)
-  * [osd](https://github.com/tesseract-ocr/tessdata_fast/raw/master/osd.traineddata)
-  * [eng](https://github.com/tesseract-ocr/tessdata_fast/raw/master/eng.traineddata)
-  * [deu](https://github.com/tesseract-ocr/tessdata_fast/raw/master/deu.traineddata)
-  * [frk](https://github.com/tesseract-ocr/tessdata_fast/raw/master/frk.traineddata)
-  * [script/Latin](https://github.com/tesseract-ocr/tessdata_fast/raw/master/script/Latin.traineddata)
-  * [script/Fraktur](https://github.com/tesseract-ocr/tessdata_fast/raw/master/script/Fraktur.traineddata)
-  * [@stweil's GT4HistOCR model](https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/Fraktur_5000000/tessdata_fast/Fraktur_50000000.334_450937.traineddata)
-
-If you installed Tesseract with Ubuntu's `apt` package manager, you may want to install
-standard models like `deu` or `script/Fraktur` with `apt`:
+**NOTE:** The special name `*` can be used instead of a resource name/URL to
+download *all* known resources for this processor. To download all Tesseract models:
 
 ```sh
-sudo apt install tesseract-ocr-deu tesseract-ocr-script-frak
+ocrd resmgr download ocrd-tesserocr-recognize '*'
 ```
 
-**NOTE:** When installing with `apt`, he `script/*` models are installed
-without the `script/` prefix, so `script/Latin` becomes just `Latin`,
-`script/Fraktur` becomes `Fraktur` etc.
+(Note that `*` must be in quotes or escaped because of shell wildcard expansion)
 
-OCR-D's Tesseract wrapper,
-[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr) and more
-specifically, the `ocrd-tesserocr-recognize` processor, expects the name of the
-model(s) to be provided as the `model` parameter. Multiple models can be
-combined by concatenating with `+` (which generally improves accuracy but always slows processing):
+## Installing unknown resources
+
+If you need to install a resource that OCR-D doesn't know of, than can be achieved with the `--any-url/-n` flag to `ocrd resmgr download`:
+
+To install a model for `ocrd-tesserocr-recognize` that is located at `https://my-server/mymodel.traineddata`.
+
+```sh
+ocrd resmgr download -n ocrd-tesserocr-recognize https://my-server/mymodel.traineddata
+```
+
+This will download and store the resource in the [proper location](#where-is-the-data) and create a stub entry in the
+[user database](#user-database). You can then refer to it by name in the `model` parameter:
+
+```sh
+ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P model mymodel
+```
+
+## List installed resources
+
+The `ocrd resmgr list-installed` command has the same output format as `ocrd resmgr list-available` but instead
+of the database, it scans the filesystem locations [where data is searched](#where-is-the-data) for existing
+resources and lists URL and description if a database entry exists.
+
+## User database
+
+Whenever the OCR-D/core resource manager encounters an unknown resource in the filesystem or when you install
+a resource with `ocrd resmgr download`, it will create a new stub entry in the user database, which is found at
+`$HOME/.config/ocrd/resources.yml` and created if it doesn't exist.
+
+This allows you to use the OCR-D/core resource manager mechanics, including
+lookup of known resources by name or URL, without relying (only) on the
+database maintained by the OCR-D/core developers.
+
+**NOTE:** If you produced or found resources that are interesting for the wider
+OCR(-D) community, please tell us in the [OCR-D gitter
+chat](https://gitter.im/OCR-D/Lobby) so we can add them to the database.
+
+## Where is the data
+
+The lookup algorithm is [defined in our specifications](https://ocr-d.de/en/spec/ocrd_tool#file-parameters).
+
+In order of preference, a resource `<name>` for a processor `ocrd-foo` is searched at:
+
+* `$VIRTUAL_ENV/share/ocrd-resources/ocrd-foo/<name>`
+* `$HOME/.config/ocrd-resources/ocrd-foo/<name>`
+* `$HOME/.local/share/ocrd-resources/ocrd-foo/<name>`
+* `$HOME/.cache/ocrd-resources/ocrd-foo/<name>`
+* `$PWD/ocrd-resources/ocrd-foo/<name>`
+
+We recommend using the `$VIRTUAL_ENV` location, which is also the default. But
+you can override the location to store data with the `--location` option, which can
+be `cwd`, `virtualenv`, `config`, `data` and `cache` resp.
 
 ```sh
-# Use the deu and frk models
-ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -p '{"model": "deu+frk"}'
-# Use the script/Fraktur model
-ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -p '{"model": "script/Fraktur"}'
+# will download to $PWD/ocrd-resources/ocrd-anybaseocr-dewarp/latest_net_G.pth
+ocrd resmgr download --location cwd ocrd-anybaseocr-dewarp latest_net_G.pth
+# will download to $HOME.cache/ocrd-resources/ocrd-anybaseocr-dewarp/latest_net_G.pth
+ocrd resmgr download --location cache ocrd-anybaseocr-dewarp latest_net_G.pth
 ```
 
-## Ocropy / ocrd_cis
+## Changing the default resource directory
 
-An Ocropy model is simply the neural network serialized with Python's pickle
-mechanism and is generally distributed in a gzipped form, with a `.pyrnn.gz`
-extension.
+The `$VIRTUAL_ENV` default location is reasonable because we heavily advertise
+using virtual environments and is compatible with
+[ocrd_all](https://github.com/OCR-D/ocrd_all).
+
+However, there are use cases where the `config`/`data/`/`cache` or even the
+`cwd` option should be the default (or only) location to store resources and
+resolve file parameters.
 
-Ocropy has a rather convoluted algorithm to look up models, so we recommend you
-explicitly set the `OCROPUS_DATA` variable to point to the directory with
-ocropy's models. E.g. if you intend to store your models in `$HOME/ocropus-models`, add the following
-to your `$HOME/.bashrc`: `export OCROPUS_DATA=$HOME/ocropus-models`.
+To change the default location, adapt the `$HOME/.config/ocrd/config.yml` file
+(it is created if it doesn't exist whenever you execute `ocrd resmgr`) which
+has a `resource_location` key that accepts the same range of values as the
+`ocrd resmgr --location` command line flag.
 
-We recommend you download the following models, either by downloading and
-saving to the right location or by running `make install-models-ocropus` when
-using `ocrd_all`:
 
-  * [en-default.pyrnn.gz](https://github.com/zuphilip/ocropy-models/raw/master/en-default.pyrnn.gz)
-  * [fraktur.pyrnn.gz](https://github.com/zuphilip/ocropy-models/raw/master/fraktur.pyrnn.gz)
-  * [@jze's fraktur.pyrnn.gz](https://github.com/jze/ocropus-model_fraktur/raw/master/fraktur.pyrnn.gz) (save as `fraktur-jze.pyrnn.gz`)
-  * [@chreul's  LatinHist.pyrnn.gz](https://github.com/chreul/OCR_Testdata_EarlyPrintedBooks/raw/master/LatinHist-98000.pyrnn.gz)
+## Notes on specific processors
 
+## Ocropy / ocrd_cis
+
+An Ocropy model is simply the neural network serialized with Python's pickle
+mechanism and is generally distributed in a gzipped form, with a `.pyrnn.gz`
+extension. It can be used as-is; there is no need to unpack it.
 
-To use a specific model with OCR-D's ocropus wrapper in [ocrd_cis](https://github.com/cisocrgroup/ocrd_cis) and more specifically, the `ocrd-cis-ocropy-recognize` processor, use the `model` parameter:
+To use a specific model with OCR-D's ocropus wrapper in
+[ocrd_cis](https://github.com/cisocrgroup/ocrd_cis) and more specifically, the
+`ocrd-cis-ocropy-recognize` processor, use the `model` parameter:
 
 ```sh
-ocrd-cis-ocropy-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-OCRO -p '{"model": "fraktur-jze.pyrnn.gz"}'
+# Model will be downloaded on-demand if it is not locally available yet
+ocrd-cis-ocropy-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-OCRO -P model fraktur-jze.pyrnn.gz
 ```
 
 ## Calamari / ocrd_calamari
 
 Calamari models are Tensorflow model directories. For distribution, this
 directory is usually packed to a tarball or ZIP file. Once downloaded, these
-containers must be unpacked to a directory again.
-
-As calamari does not have a model discovery setup, you must always provide the
-path with a wildcard listing all `*.ckpt.json` ("checkpoint") files.
-
-We recommend you download the following model, either by downloading and
-unpacking manually or by using `make install-models-calamari` if using
-`ocrd_all`:
-
-  * [@mike-gerber's GT4HistOCR model](https://qurator-data.de/calamari-models/GT4HistOCR/2019-12-11T11_10+0100/model.tar.xz)
+containers must be unpacked to a directory again. `ocrd resmgr` handles this
+for you, so you just need the name of the resource in the database.
 
 The Calamari-OCR project also maintains a [repository of models](https://github.com/Calamari-OCR/calamari_models).
 
 To use a specific model with OCR-D's calamari wrapper
 [ocrd_calamari](https://github.com/OCR-D/ocrd_calamari) and more specifically,
-the `ocrd-calamari-recognize` processor, use the `checkpoint` parameter:
+the `ocrd-calamari-recognize` processor, use the `checkpoint_dir` parameter:
+
+```sh
+# To use the "default" model, i.e. the one trained on GT4HistOCR by QURATOR
+ocrd-calamari-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-CALA
+# To use your own trained model
+ocrd-calamari-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-CALA -P checkpoint_dir /path/to/modeldir
+# or, to be able to control which checkpoints to use:
+ocrd-calamari-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-CALA -P checkpoint '/path/to/modeldir/*.ckpt.json'
+```
+
+## Tesseract / ocrd_tesserocr
+
+Tesseract models are single files with a `.traineddata` extension.
+
+Since tesseract only supports model lookup in a single directory, models should
+only be stored in a single location. If the default location (`data`) is
+not the place you want to use for tesseract models, consider [changing the default location
+in the OCR-D config file](#changing-the-default-resource-directory).
+
+OCR-D's Tesseract wrapper,
+[ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr) and more
+specifically, the `ocrd-tesserocr-recognize` processor, expects the name of the
+model(s) to be provided as the `model` parameter. Multiple models can be
+combined by concatenating with `+` (which generally improves accuracy but always slows processing):
 
 ```sh
-ocrd-calamari-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-CALA -p '{"checkpoint": "/path/to/model/*.ckpt.json"}'
+# Use the deu and frk models
+ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P model 'deu+frk'
+# Use the Fraktur model
+ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P Fraktur
 ```
 
 # Model training

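The listing format of `ocrd resmgr list-available` described above consists of a processor heading, then one `- <name> (<url>)` line per resource followed by an indented description. It is simple enough to post-process with standard tools. Here is a minimal sketch. It assumes the output keeps exactly the layout of the example, and it uses a hypothetical shell function `resmgr_list_to_tsv` (not part of OCR-D/core) to turn that output into tab-separated processor, name and URL records:

```sh
# Hypothetical helper (not shipped with OCR-D/core): read the output of
# `ocrd resmgr list-available` on stdin and print one
# "processor<TAB>name<TAB>url" line per resource.
# Assumes the layout shown above: processor headings start in column 1,
# resources look like "- <name> (<url>)", descriptions are indented.
resmgr_list_to_tsv() {
  awk '
    /^[^ -]/ { processor = $1; next }   # processor heading, e.g. ocrd-cis-ocropy-recognize
    /^- / {                             # resource entry: "- <name> (<url>)"
      name = $2
      url = $3
      sub(/^[(]/, "", url)              # drop the opening bracket
      sub(/[)]$/, "", url)              # drop the closing bracket
      printf "%s\t%s\t%s\n", processor, name, url
    }
  '
}
```

For example, `ocrd resmgr list-available | resmgr_list_to_tsv | cut -f2` would print just the resource names, i.e. the shorthands you can pass to `ocrd resmgr download` instead of the full URL.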
From a3e64cdc8558275ab68df0ee75198e53b8b745a2 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <unixprog@gmail.com>
Date: Tue, 19 Jan 2021 19:10:56 +0100
Subject: [PATCH 02/20] models: note on tesseract model storage

---
 site/en/models.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/site/en/models.md b/site/en/models.md
index 372b425fd..c2c204e41 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -212,6 +212,12 @@ only be stored in a single location. If the default location (`virtualenv`) is
 not the place you want to use for tesseract models, consider [changing the default location
 in the OCR-D config file](#changing-the-default-resource-directory).
 
+**NOTE:** For reasons of efficiency and to avoid duplicate models, all `ocrd-tesserocr-*` processors
+reuse the resource directory for `ocrd-tesserocr-recognize`.
+
+If the `TESSDATA_PREFIX` environment variable is set when any of the Tesseract processors
+is called, resources are looked up in that directory instead of the default location.
+
 OCR-D's Tesseract wrapper,
 [ocrd_tesserocr](https://github.com/OCR-D/ocrd_tesserocr) and more
 specifically, the `ocrd-tesserocr-recognize` processor, expects the name of the

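The directory rule for the Tesseract processors can be sketched as a shell function. All `ocrd-tesserocr-*` processors share one directory, and `TESSDATA_PREFIX` overrides it when set. The name `tesserocr_model_dir` is hypothetical. The sketch also assumes two things. First, `TESSDATA_PREFIX` points directly at the directory holding the `.traineddata` files. Second, the default is the `data` location, i.e. `$XDG_DATA_HOME/ocrd-resources`, with `XDG_DATA_HOME` falling back to `$HOME/.local/share`.

```sh
# Hypothetical helper (not part of ocrd_tesserocr): print the directory in
# which an ocrd-tesserocr-* processor would look for .traineddata models.
tesserocr_model_dir() {
  case "$1" in
    ocrd-tesserocr-*) ;;
    *) echo "not an ocrd_tesserocr processor: $1" >&2; return 1 ;;
  esac
  if [ -n "${TESSDATA_PREFIX:-}" ]; then
    # an explicitly set TESSDATA_PREFIX wins over the default location
    printf '%s\n' "$TESSDATA_PREFIX"
  else
    # every ocrd-tesserocr-* processor shares the directory of ocrd-tesserocr-recognize
    printf '%s\n' "${XDG_DATA_HOME:-$HOME/.local/share}/ocrd-resources/ocrd-tesserocr-recognize"
  fi
}
```

Under these assumptions, `tesserocr_model_dir ocrd-tesserocr-segment-region` and `tesserocr_model_dir ocrd-tesserocr-recognize` print the same directory unless `TESSDATA_PREFIX` is set.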
From efcf5317e19c0a49592aa12c482695d731d7035f Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:47:39 +0100
Subject: [PATCH 03/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index c2c204e41..c3266de9a 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -14,8 +14,8 @@ path to a model (calamari).
 
 Since [v2.22.0](https://github.com/OCR-D/core/releases/v2.22.0), OCR-D/core
 comes with a framework for managing processor resources uniformly. This means
-that OCR-D/core will take care of lookin in well-defined places in the
-filesystem for resources for specific processors. It also knows how to cache
+that processors can delegate to OCR-D/core to resolve specific file resources by name,
+looking in well-defined places in the filesystem. This also includes downloading and caching
 file parameters passed as a URL. OCR-D/core also comes with a bundled database
 of known resources, such as OCR models, configurations and other
 processor-specific data. This means that OCR-D users should be able to

From 1346b563c4221c0a49681dc826b9a42f45aa0912 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:47:56 +0100
Subject: [PATCH 04/20] Update site/en/models.md

Co-authored-by: Elisabeth Engl <53007946+EEngl52@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index c3266de9a..25b130396 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -16,7 +16,7 @@ Since [v2.22.0](https://github.com/OCR-D/core/releases/v2.22.0), OCR-D/core
 comes with a framework for managing processor resources uniformly. This means
 that processors can delegate to OCR-D/core to resolve specific file resources by name,
 looking in well-defined places in the filesystem. This also includes downloading and caching
-file parameters passed as a URL. OCR-D/core also comes with a bundled database
+file parameters passed as a URL. Furthermore, OCR-D/core comes with a bundled database
 of known resources, such as OCR models, configurations and other
 processor-specific data. This means that OCR-D users should be able to
 concentrate on fine-tuning their OCR workflows and not bother with implementation

From 9de53dbf16a2133081f47d13d6a51ccaf24b86c2 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:48:12 +0100
Subject: [PATCH 05/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index 25b130396..1d5c34516 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -17,8 +17,8 @@ comes with a framework for managing processor resources uniformly. This means
 that processors can delegate to OCR-D/core to resolve specific file resources by name,
 looking in well-defined places in the filesystem. This also includes downloading and caching
 file parameters passed as a URL. Furthermore, OCR-D/core comes with a bundled database
-of known resources, such as OCR models, configurations and other
-processor-specific data. This means that OCR-D users should be able to
+of known resources, such as models, dictionaries, configurations and other
+processor-specific data files. This means that OCR-D users should be able to
 concentrate on fine-tuning their OCR workflows and not bother with implementation
 details like "where do I get models from and where do I put them".
 

From 9ceed646193781cf1ea86e6ee857c9cd1721ea82 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:48:25 +0100
Subject: [PATCH 06/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/site/en/models.md b/site/en/models.md
index 1d5c34516..b73340db0 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -21,6 +21,7 @@ of known resources, such as models, dictionaries, configurations and other
 processor-specific data files. This means that OCR-D users should be able to
 concentrate on fine-tuning their OCR workflows and not bother with implementation
 details like "where do I get models from and where do I put them".
+In particular, users can now reference file parameters by name.
 
 All of the above mentioned functionality can be accessed using the `ocrd
 resmgr` command line tool.

From c449662bab243e91dce16479fd04eaa2a8ab2ac8 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:48:42 +0100
Subject: [PATCH 07/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index b73340db0..aa7f2fdc9 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -50,7 +50,7 @@ ocrd-cis-ocropy-recognize
   ocropy historical latin model by github.com/chreul
 ```
 
-As you can see, resources are grouped by the processor they are used by.
+As you can see, resources are grouped by the processors which make use of them.
 
 The word after the list symbol, e.g. `qurator-gt4hist-0.3`,
 `LatinHist.pyrnn.gz`, define the "name" of the resource, a shorthand you can

From 5b338537d84e6fdf5594d2d810cebd3e33282573 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:48:57 +0100
Subject: [PATCH 08/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index aa7f2fdc9..fe53b7fd7 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -53,7 +53,7 @@ ocrd-cis-ocropy-recognize
 As you can see, resources are grouped by the processors which make use of them.
 
 The word after the list symbol, e.g. `qurator-gt4hist-0.3`,
-`LatinHist.pyrnn.gz`, define the "name" of the resource, a shorthand you can
+`LatinHist.pyrnn.gz`, defines the _name_ of the resource, which is a shorthand you can
 use in parameters without having to specify the full URL (in brackets after the
 name).
 

From 201feb5a3626ec6b0a50f405b51e373383588cb3 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:49:14 +0100
Subject: [PATCH 09/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index fe53b7fd7..d44443cff 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -65,7 +65,7 @@ You can install resources with the `ocrd resmgr download` command. It expects
 the name of the processor as the first argument and either the name or URL of a
 resource as a second argument.
 
-Likewise, model distribution is not currently centralised within OCR-D though we
+Although model distribution is not currently centralised within OCR-D, we
 are working towards a central model repository.
 
 For example, to install the `LatinHist.pyrnn.gz` resource for `ocrd-cis-ocropy-recognize`:

From 56091bc1940b686dc0b865f58a0b8c699897b317 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:49:35 +0100
Subject: [PATCH 10/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index d44443cff..e3c78433d 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -87,7 +87,7 @@ download *all* known resources for this processor. To download all tesseract mod
 ocrd resmgr download ocrd-tesserocr-recognize '*'
 ```
 
-(Note that `*` must be in quotes or escaped because of shell wildcard expansion)
+(Note that `*` must be in quotes or escaped to avoid wildcard expansion in the shell.)
 
 ## Installing unknown resources
 

From 99c6453acc66c7cf70c591050e9c28ffcbeffaa3 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:49:57 +0100
Subject: [PATCH 11/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index e3c78433d..f7b2e00c8 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -91,7 +91,7 @@ ocrd resmgr download ocrd-tesserocr-recognize '*'
 
 ## Installing unknown resources
 
-If you need to install a resource that OCR-D doesn't know of, than can be achieved with the `--any-url/-n` flag to `ocrd resmgr download`:
+If you need to install a resource which OCR-D doesn't know of, you can do so by passing its URL in combination with the `--any-url/-n` flag to `ocrd resmgr download`:
 
 To install a model for `ocrd-tesserocr-recognize` that is located at `https://my-server/mymodel.traineddata`.
 

From b958afe871ae13306226c9b04d0321a04fe0fb58 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Wed, 20 Jan 2021 11:50:13 +0100
Subject: [PATCH 12/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index f7b2e00c8..1502e02c6 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -108,7 +108,7 @@ ocrd-tesserocr-recognize -P model mymodel
 
 ## List installed resources
 
-The `ocrd resmgr list-installed` command has the same output format as `ocrd resmgr list-available` but instead
+The `ocrd resmgr list-installed` command has the same output format as `ocrd resmgr list-available`. But instead
 of the database, it scans the filesystem locations [where data is searched](#where-is-the-data) for existing
 resources and lists URL and description if a database entry exists.
 

From d886e682bff8d871f770a427f300898193154f87 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <unixprog@gmail.com>
Date: Thu, 21 Jan 2021 16:21:28 +0100
Subject: [PATCH 13/20] models: document mounting models in docker

---
 site/en/models.md | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/site/en/models.md b/site/en/models.md
index c2c204e41..203563f6d 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -231,6 +231,31 @@ ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P model 'deut+frk'
 ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P Fraktur
 ```
 
+# Models and Docker
+
+We recommend a two-step process to make models available in Docker. First,
+download all the models that you want to use on the host system. Then, when running
+the Docker container, mount that local directory into the container alongside
+the data you want to process.
+
+Download the models to `$HOME/.local/share/ocrd-resources`:
+
+```sh
+ocrd resmgr download --location data ocrd-tesserocr-recognize eng.traineddata
+ocrd resmgr download --location data ocrd-calamari-recognize default
+# ...
+```
+
+Run the `ocrd_all` Docker container:
+
+```sh
+docker run --user $(id -u) --workdir /data \
+  --volume $PWD:/data \
+  --volume $HOME/.local/share/ocrd-resources:/usr/local/share/ocrd-resources \
+  ocrd_all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
+```
+
+
 # Model training
 
 With the pretrained models mentioned above, good results can be obtained for many originals. Nevertheless, the

From 6903a84d8abc5849913b2747c6f0631f13f75c76 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Fri, 22 Jan 2021 17:27:01 +0100
Subject: [PATCH 14/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index c121f6970..3c7385892 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -132,15 +132,15 @@ The lookup algorithm is [defined in our specifications](https://ocr-d.de/en/spec
 
 In order of preference, a resource `<name>` for a processor `ocrd-foo` is searched at:
 
-* `$VIRTUAL_ENV/share/ocrd-resources/ocrd-foo/<name>`
-* `$HOME/.config/ocrd-resources/ocrd-foo/<name>`
-* `$HOME/.local/share/ocrd-resources/ocrd-foo/<name>`
-* `$HOME/.cache/ocrd-resources/ocrd-foo/<name>`
 * `$PWD/ocrd-resources/ocrd-foo/<name>`
+* `$XDG_DATA_HOME/ocrd-resources/ocrd-foo/<name>`
+* `/usr/local/share/ocrd-resources/ocrd-foo/<name>`
 
-We recommend using the `$VIRTUAL_ENV` location, which is also the default. But
+(where `XDG_DATA_HOME` defaults to `$HOME/.local/share` if unset).
+
+We recommend using the `$XDG_DATA_HOME` location, which is also the default. But
 you can override the location to store data with the `--location` option, which can
-be `cwd`, `virtualenv`, `config`, `data` and `cache` resp.
+be one of `cwd`, `data` or `system`.
 
 ```sh
 # will download to $PWD/ocrd-resources/ocrd-anybaseocr-dewarp/latest_net_G.pth

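The lookup can be sketched as a small shell function. The sketch assumes the three locations listed above are tried strictly in that order and that the first existing file wins. `find_ocrd_resource` is a hypothetical name rather than a command shipped with OCR-D/core. The sketch deliberately ignores the other ways a file parameter can be given, such as an absolute path or a URL.

```sh
# Hypothetical helper (not part of OCR-D/core): print the path of the first
# existing resource <name> for <processor>, searching the locations listed
# above in order of preference.
find_ocrd_resource() {
  local processor="$1" name="$2" base candidate
  local data_home="${XDG_DATA_HOME:-$HOME/.local/share}"
  for base in "$PWD/ocrd-resources" "$data_home/ocrd-resources" /usr/local/share/ocrd-resources; do
    candidate="$base/$processor/$name"
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"   # first match wins
      return 0
    fi
  done
  echo "resource '$name' not found for $processor" >&2
  return 1
}
```

Run from a workspace directory, `find_ocrd_resource ocrd-cis-ocropy-recognize LatinHist.pyrnn.gz` would print the copy in `$PWD/ocrd-resources` if one exists. Only otherwise does it fall back to the `$XDG_DATA_HOME` and `/usr/local/share` locations.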
From 90fbb9f35b5c85b4f1927223a64a6a32cb405293 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Fri, 22 Jan 2021 17:27:16 +0100
Subject: [PATCH 15/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index 3c7385892..66f1cd909 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -159,10 +159,6 @@ However, there are use cases where the `config`/`data/`/`cache` or even the
 `cwd` option should be the default (or only) location to store resources and
 resolve file parameters.
 
-To change the default location, adapt the `$HOME/.config/ocrd/config.yml` file
-(it is created if it doesn't exist whenever you execute `ocrd resmgr`) which
-has a `resource_location` key that accepts the same range of values as the
-`ocrd resmgr --location` command line flag.
 
 
 ## Notes on specific processors

From 8d4b7deded56f9dedda248a96bf0c7f95d1b3558 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Fri, 22 Jan 2021 17:27:42 +0100
Subject: [PATCH 16/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index 66f1cd909..355dda8cf 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -145,8 +145,8 @@ be `cwd`, `data` and `system` resp.
 ```sh
 # will download to $PWD/ocrd-resources/ocrd-anybaseocr-dewarp/latest_net_G.pth
 ocrd resmgr download --location cwd ocrd-anybaseocr-dewarp latest_net_G.pth
-# will download to $HOME.cache/ocrd-resources/ocrd-anybaseocr-dewarp/latest_net_G.pth
-ocrd resmgr download --location cache ocrd-anybaseocr-dewarp latest_net_G.pth
+# will download to /usr/local/share/ocrd-resources/ocrd-anybaseocr-dewarp/latest_net_G.pth
+ocrd resmgr download --location system ocrd-anybaseocr-dewarp latest_net_G.pth
 ```
 
 ## Changing the default resource directory

From 65d94da1027a829cc43302d3cc9ce81fea9ee415 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Fri, 22 Jan 2021 17:28:12 +0100
Subject: [PATCH 17/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index 355dda8cf..147257e2f 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -151,13 +151,13 @@ ocrd resmgr download --location system ocrd-anybaseocr-dewarp latest_net_G.pth
 
 ## Changing the default resource directory
 
-The `$VIRTUAL_ENV` default location is reasonable because we heavily advertise
-using virtual environments and is compatible with
-[ocrd_all](https://github.com/OCR-D/ocrd_all).
+The default location under `$XDG_DATA_HOME` is a sensible choice because
+models are usually large files that should persist across deployments, whether
+native or containerized, single-module or [ocrd_all](https://github.com/OCR-D/ocrd_all).
+Moreover, that variable can easily be overridden during installation.
 
-However, there are use cases where the `config`/`data/`/`cache` or even the
-`cwd` option should be the default (or only) location to store resources and
-resolve file parameters.
+However, there are use cases where `system` or even `cwd` should be
+used as the location to store resources; this is what the `--location` option is for.
 
 
 

From f20205a0f0717d510037c8f660f3d581febe1423 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Fri, 22 Jan 2021 17:32:59 +0100
Subject: [PATCH 18/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index 147257e2f..7ce4fafb2 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -228,7 +228,7 @@ ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P model 'deut+frk'
 ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P Fraktur
 ```
 
-# Models and docker
+# Models and Docker
 
 We recommend a two-step process to make models available in Docker. First
 download all the models that you want to use on the host system. When running

From f76c0a7b0c64fd1ddf69ed2c9f623708014712c4 Mon Sep 17 00:00:00 2001
From: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
Date: Mon, 25 Jan 2021 17:58:02 +0100
Subject: [PATCH 19/20] rewrite docker model mounting section

---
 site/en/models.md | 46 ++++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/site/en/models.md b/site/en/models.md
index 7ce4fafb2..3df1ab0ae 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -230,28 +230,30 @@ ocrd-tesserocr-recognize -I OCR-D-SEG-LINE -O OCR-D-OCR-TESS -P Fraktur
 
 # Models and Docker
 
-We recommend a two-step process to make models available in Docker. First
-download all the models that you want to use on the host system. When running
-the docker container, mount that local directory into the container alongside
-the data you want to process.
-
-Download the models to `$HOME/.local/share/ocrd-resources`:
-
-```sh
-ocrd resmgr download --location data ocrd-tesserocr-recognize eng.traineddata
-ocrd resmgr download --location data ocrd-calamari-recognize default
-# ...
-```
-
-Run the `ocrd_all` Docker container:
-
-```sh
-docker run --user $(id -u) --workdir /data \
-  --volume $PWD:/data \
-  --volume $HOME/.local/cache/ocrd-resources:/ocrd-resources \
-  ocrd_all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
-```
-
+We recommend keeping all downloaded resources in a persistent host directory,
+separate from the `ocrd/*` Docker container and the data directory, and mounting that
+resource directory into a specific path in the container alongside the data directory.
+The host resource directory can be empty initially. Each time you run the Docker container,
+your processors will access the host directory to resolve resources, and you can download
+additional models into that location using `ocrd resmgr`.
+
+The following assumes (without loss of generality) that your host-side data
+path is `./data` and your host-side resource path is `./models`:
+
+- To download models to `./models` on the host (mounted at `/usr/local/share/ocrd-resources` in the container):
+  ```sh
+  docker run --user $(id -u) --volume $PWD/models:/usr/local/share/ocrd-resources \
+    ocrd/all bash -c 'ocrd resmgr download ocrd-tesserocr-recognize eng.traineddata; ocrd resmgr download ocrd-calamari-recognize default; ...'
+  ```
+- To run processors as usual:
+  ```sh
+  docker run --user $(id -u) --workdir /data \
+    --volume $PWD/data:/data \
+    --volume $PWD/models:/usr/local/share/ocrd-resources \
+    ocrd/all ocrd-tesserocr-recognize -I IN -O OUT -P model eng
+  ```
+
+This principle applies to all `ocrd/*` Docker images, e.g. you can replace `ocrd/all` above with `ocrd/tesserocr` as well.
 
 # Model training
 

From f1ba884896192d5d7d55fa76befc8fd0e47263f6 Mon Sep 17 00:00:00 2001
From: Konstantin Baierer <kba@users.noreply.github.com>
Date: Tue, 26 Jan 2021 12:38:39 +0100
Subject: [PATCH 20/20] Update site/en/models.md

Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
---
 site/en/models.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/site/en/models.md b/site/en/models.md
index 3df1ab0ae..de43c542b 100644
--- a/site/en/models.md
+++ b/site/en/models.md
@@ -87,7 +87,12 @@ download *all* known resources for this processor. To download all tesseract mod
 ocrd resmgr download ocrd-tesserocr-recognize '*'
 ```
 
-(Note that `*` must be in quotes or escaped to avoid wildcard expansion in the shell.)
+**NOTE:** Equally, the special processor `*` can be used instead of a processor and a resource
+to download *all* known resources for *all* installed processors:
+
+    ocrd resmgr download '*'
+
+(In either case, `*` must be in quotes or escaped to avoid wildcard expansion by the shell.)
 
 ## Installing unknown resources