Processor.resolve_resource: support on-demand download of URL values by kba · Pull Request #799 · OCR-D/core

kba · 2022-02-14T09:27:50Z

With this in place, users can pass a URL directly as a parameter value:

ocrd-tesserocr-recognize -P model https://github.com/tesseract-ocr/tessdata_best/raw/main/bos.traineddata

and the resource should be downloaded on demand the first time the URL is encountered, with the URL registered in the user's resource_list.yml. Subsequent calls will use the cached download.
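The download-once-then-cache behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_value`, `fetch_bytes` and the dict-based `registry` are hypothetical names, and the dict merely stands in for the user's resource_list.yml, whose real schema is not reproduced here.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(value):
    """Return True if the parameter value looks like an http(s) URL."""
    return urlparse(value).scheme in ("http", "https")


def fetch_bytes(url):
    """Default fetcher: download the URL body (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def resolve_value(value, cache_dir, registry, fetch=fetch_bytes):
    """Resolve a parameter value to a local path, downloading URLs once.

    `registry` is a plain dict standing in for the user resource list
    (an assumption for this sketch; the real list is a YAML file).
    """
    if not is_url(value):
        # Non-URL values are treated as ordinary paths in this sketch.
        return Path(value)
    if value in registry:
        # Subsequent calls: reuse the cached download.
        return Path(registry[value])
    # First encounter: download, store, and register the URL.
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / Path(urlparse(value).path).name
    target.write_bytes(fetch(value))
    registry[value] = str(target)
    return target
```

Passing the fetcher as an argument keeps the caching logic testable without network access, which is one way around the testing difficulty raised in this thread.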

In practice though I cannot seem to find an example where this works:

ocrd_{tesserocr,cis-ocropy} have a different mechanism of model storage. It's still compatible with ocrd resmgr download in tesserocr's case but does not use the self.resolve_resource method this PR extends.
ocrd_calamari requires a directory of files, or an archive, which is too complex to do on demand in a generalized way IMHO.
ocrd-page-transform is a bashlib processor and won't support this.

So if anybody has a good idea on how to test and/or generalize this to make it available to all the processors, pls let me know.

bertsky

LGTM. Perhaps the resource_type can also be guessed from the URL suffix (/ for directory, .zip etc for archive).
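A rough sketch of such suffix-based guessing could look like the following. The function name `guess_resource_type`, the list of archive suffixes, and the returned labels are assumptions made for illustration; they are not taken from OCR-D core.

```python
from urllib.parse import urlparse

# Suffixes treated as archives in this sketch (an illustrative selection).
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz")


def guess_resource_type(url):
    """Guess a resource type label from the path component of a URL."""
    path = urlparse(url).path  # ignores query string and fragment
    if path.endswith("/"):
        return "directory"
    if path.lower().endswith(ARCHIVE_SUFFIXES):
        return "archive"
    # Anything else is assumed to be a single file.
    return "file"
```

For example, the bos.traineddata URL from the PR description would be classified as a plain file, while a URL ending in `.zip` would be treated as an archive.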

(As a test case, because we would not want to be dependent on an external module, you could define a subclass of DummyProcessor with a file parameter and some resolve_resource logic in the constructor, e.g. printing the file to stdout.)
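The shape of such a test might be sketched as below. To stay self-contained, this uses a hypothetical `StandInProcessor` in place of the real DummyProcessor, and its `resolve_resource` is a simplified local lookup rather than the method in OCR-D core; only the overall pattern (resolve in the constructor, print to stdout, capture in the test) follows the suggestion.

```python
import contextlib
import io
from pathlib import Path


class StandInProcessor:
    """Hypothetical stand-in for a processor base class (not OCR-D's API)."""

    def __init__(self, parameter, resource_dir):
        self.parameter = parameter
        self.resource_dir = Path(resource_dir)

    def resolve_resource(self, value):
        """Look the value up as an absolute path or under resource_dir."""
        candidate = Path(value)
        if not candidate.is_absolute():
            candidate = self.resource_dir / candidate
        if not candidate.is_file():
            raise FileNotFoundError(f"Could not resolve resource {value!r}")
        return candidate


class PrintingProcessor(StandInProcessor):
    """Test subclass: resolves its 'file' parameter and prints the content."""

    def __init__(self, parameter, resource_dir):
        super().__init__(parameter, resource_dir)
        path = self.resolve_resource(self.parameter["file"])
        print(path.read_text(), end="")


def run_and_capture(parameter, resource_dir):
    """Instantiate the test processor and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        PrintingProcessor(parameter, resource_dir)
    return buffer.getvalue()
```

A test can then write a small file into a temporary directory, call `run_and_capture({"file": "hello.txt"}, tmp)`, and compare the captured output with the file content.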

bertsky · 2022-02-14T13:09:12Z

> In practice though I cannot seem to find an example where this works:
>
> ocrd_{tesserocr,cis-ocropy} have a different mechanism of model storage. It's still compatible with ocrd resmgr download in tesserocr's case but does not use the self.resolve_resource method this PR extends

Yes. For Ocropy recognition, your cisocrgroup/ocrd_cis#83 is long overdue.

And for Tesseract, I believe your OCR-D/ocrd_tesserocr#176 could be rewritten such that instead of overriding the constructor (for the list_resources and show_resource cases), one would directly override module_dir, so (assuming core will have a mechanism of ensuring that list_all_resources and resolve_resource abide by its Processor.ocrd_tool['resource_locations']) everything will automagically make only the files from the module directory survive.

> ocrd_calamari requires a directory of files, or an archive which is too complex to do on demand in a generalized way IMHO

Yes, GitHub directory downloads would be hard to implement. But IMO we can assume that model deployment for Calamari, eynollah and sbb_binarize will involve release archives in the future.

> ocrd-page-transform is a bashlib processor and won't support this.

With the recent changes to bashlib, this should work out of the box, though. (What happens is that it delegates to ocrd.cli.ocrd_tool's list-resources and show-resource, which in turn add the ocrd-tool.json directory, but also do the usual lookup under the other resource locations. Since we do not have an ocrd__resolve_resource builtin in bashlib yet, I delegate to ocrd__list_resources. But to ensure maximum interoperability, I just committed bertsky/workflow-configuration@9f68fe8.)

A Pythonic test scenario would be running-downloading blla.mlmodel under OCR-D/ocrd_kraken#33, or one of the ocrd_detectron2 config/model combinations.

Processor.resolve_resource: support on-demand download of URL values

e9c6d07

bertsky approved these changes Feb 14, 2022

kba mentioned this pull request Dec 5, 2022

Got exception using ocrd_detectron 2 with ocrd_all Release v2022-12-01 bertsky/ocrd_detectron2#15

Closed

kba wants to merge 1 commit into master from resource-on-demand
