cannot run with relative path for METS #13

@bertsky

In OCR-D, long ago we moved away from absolute filenames and file:// refs in FLocat.

When calling de.lmu.cis.ocrd.cli.PostCorrectionCommand with an absolute path to the METS, it runs through, but produces output FLocats with absolute paths, which is (now) incorrect.

But when calling with just mets.xml inside the workspace directory, the postprocessor crashes:

22:32:03.614 DEBUG cis.PostCorrectionCommand - loading page
java.lang.NullPointerException
	at de.lmu.cis.ocrd.pagexml.METS$File.openLocalPath(METS.java:175)
	at de.lmu.cis.ocrd.pagexml.METS$File.openInputStream(METS.java:161)
	at de.lmu.cis.ocrd.pagexml.METSFileGroupReader.getPages(METSFileGroupReader.java:41)
	at de.lmu.cis.ocrd.pagexml.METSFileGroupReader.eachWord(METSFileGroupReader.java:54)
	at de.lmu.cis.ocrd.pagexml.METSFileGroupReader.getBaseOCRTokenReader(METSFileGroupReader.java:77)
	at de.lmu.cis.ocrd.pagexml.Workspace.getBaseOCRTokenReader(Workspace.java:33)
	at de.lmu.cis.ocrd.cli.ParametersCommand.getProfile(ParametersCommand.java:92)
	at de.lmu.cis.ocrd.cli.ParametersCommand.getProfile(ParametersCommand.java:61)
	at de.lmu.cis.ocrd.cli.PostCorrectionCommand.predictRankings(PostCorrectionCommand.java:96)
	at de.lmu.cis.ocrd.cli.PostCorrectionCommand.postCorrect(PostCorrectionCommand.java:61)
	at de.lmu.cis.ocrd.cli.PostCorrectionCommand.execute(PostCorrectionCommand.java:37)
	at de.lmu.cis.ocrd.cli.Main.run(Main.java:33)
	at de.lmu.cis.ocrd.cli.Main.main(Main.java:9)

The reason is simply that when opening input files via METS.File.openLocalPath, workspace is null at its first reference,

final Path relative = Paths.get(workspace.toString(), path.toString());

because the file instance gets created in

files.add(new METS.File(path.getParent(), f));

and path.getParent() evaluates to null for the relative path mets.xml.
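
To make the cause easy to check in isolation, here is a minimal sketch using only java.nio.file. The class and method names (ParentOfRelativePath, parentOf) are made up for illustration and are not part of the project; the sketch only shows that Path.getParent() returns null for a bare file name, so any later call on that value throws a NullPointerException.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, not part of the post-correction code base.
final class ParentOfRelativePath {
    // Returns whatever Path.getParent() reports for the given METS path.
    static Path parentOf(String mets) {
        return Paths.get(mets).getParent();
    }

    public static void main(String[] args) {
        // A bare file name has no parent component, so this is null.
        System.out.println(parentOf("mets.xml"));
        // With a directory component the parent is that directory.
        System.out.println(parentOf("ws/mets.xml"));
    }
}
```

Calling toString() on the first result is what would fail in the same way as the stack trace above.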

So IMO the best fix would be to fall back to the current working directory if workspace is indeed empty.
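
A rough sketch of such a fallback, assuming the workspace is derived from the METS path as described above. The names WorkspaceFallback, workspaceOf and resolve are hypothetical and only illustrate the idea; the actual change would belong wherever METS.File is constructed or in openLocalPath.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, not part of the post-correction code base.
final class WorkspaceFallback {
    // Returns the directory containing the METS file. For a bare relative
    // name such as "mets.xml", getParent() is null, so fall back to the
    // empty path, which stands for the current working directory.
    static Path workspaceOf(Path mets) {
        Path parent = mets.getParent();
        return parent != null ? parent : Paths.get("");
    }

    // Resolves a relative FLocat reference against the workspace.
    static Path resolve(Path workspace, String flocat) {
        return workspace.resolve(flocat);
    }

    public static void main(String[] args) {
        Path ws = workspaceOf(Paths.get("mets.xml"));
        System.out.println(resolve(ws, "page1.xml"));
    }
}
```

Using the empty path rather than an absolute working directory keeps the resolved paths relative, which would also fit the goal of not writing absolute paths into FLocat.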
