
ocrd-segment-repair: handle case where points is empty #60

@stefanCCS

Description


Version 0.1.20, ocrd/core 2.33.0

I have a PAGE file that does not have any real content, like this:

    <pc:Page imageFilename="OCR-D-IMG/0038_IMAGE000918_00001.tif" imageWidth="1420" imageHeight="2313" orientation="0.">
        <pc:AlternativeImage filename="OCR-D-BIN/OCR-D-BIN_0038_IMAGE000918_00001.IMG-BIN.png" comments=",binarized"/>
        <pc:TextRegion id="TR-1" orientation="0.">
            <pc:Coords points=""/>
        </pc:TextRegion>
    </pc:Page>
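To find such regions in a workspace before running a processor, a minimal sketch using only the Python standard library could look like this. It wraps a trimmed copy of the snippet in a `pc:PcGts` root with an assumed PAGE 2019-07-15 namespace declaration (the snippet above omits it). `Coords` is matched by local name, so the exact schema version should not matter. The helpers `local_name` and `segments_with_empty_coords` are hypothetical, not part of OCR-D.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the issue's snippet; the PcGts root and namespace URI are assumptions.
PAGE_XML = """<pc:PcGts xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <pc:Page imageFilename="OCR-D-IMG/0038_IMAGE000918_00001.tif" imageWidth="1420" imageHeight="2313">
    <pc:TextRegion id="TR-1">
      <pc:Coords points=""/>
    </pc:TextRegion>
  </pc:Page>
</pc:PcGts>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so matching works for any PAGE schema version."""
    return tag.rsplit("}", 1)[-1]


def segments_with_empty_coords(xml_text: str) -> list:
    """Return the IDs of elements whose direct Coords child has a blank points attribute."""
    root = ET.fromstring(xml_text)
    empty_ids = []
    for element in root.iter():
        for child in element:
            if local_name(child.tag) == "Coords" and not child.get("points", "").strip():
                empty_ids.append(element.get("id"))
    return empty_ids


print(segments_with_empty_coords(PAGE_XML))  # ['TR-1']
```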

If I call ocrd-segment-extract-lines, I get an exception like this:

    09:19:19.733 DEBUG ocrd.workspace.image_from_page - page 'P_0038_IMAGE000918_00001' has  orientation=0 skew=0.00
    09:19:19.733 DEBUG ocrd.workspace.image_from_page - Using AlternativeImage 1 {'', 'binarized'} for page 'P_0038_IMAGE000918_00001'
    09:19:19.734 DEBUG ocrd.workspace.download_file - download_file <OcrdFile fileGrp=OCR-D-BIN ID=OCR-D-BIN_0038_IMAGE000918_00001.IMG-BIN, mimetype=image/png, url=OCR-D-BIN/OCR-D-BIN_0038_IMAGE000918_00001.IMG-BIN.png, local_filename=OCR-D-BIN/OCR-D-BIN_0038_IMAGE000918_00001.IMG-BIN.png]/>  [_recursion_count=0]
    09:19:19.735 DEBUG PIL.PngImagePlugin - STREAM b'IHDR' 16 13
    09:19:19.735 DEBUG PIL.PngImagePlugin - STREAM b'IDAT' 41 65536
    Traceback (most recent call last):
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/bin/ocrd-segment-extract-lines", line 8, in <module>
        sys.exit(ocrd_segment_extract_lines())
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
        return self.main(*args, **kwargs)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/click/core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/click/core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd_segment/cli.py", line 65, in ocrd_segment_extract_lines
        return ocrd_cli_wrap_processor(ExtractLines, *args, **kwargs)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd/decorators/__init__.py", line 88, in ocrd_cli_wrap_processor
        run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd/processor/helpers.py", line 88, in run_processor
        processor.process()
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd_segment/extract_lines.py", line 171, in process
        transparency=self.parameter['transparency'])
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd/workspace.py", line 829, in image_from_segment
        fill=fill, transparency=transparency)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd/workspace.py", line 1012, in _crop
        segment_polygon = coordinates_of_segment(segment, parent_image, parent_coords)
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd_utils/image.py", line 136, in coordinates_of_segment
        polygon = np.array(polygon_from_points(segment.get_Coords().points))
      File "/home/ocrdadmin/ocrd_all/venv/sub-venv/headless-tf1/lib/python3.6/site-packages/ocrd_utils/image.py", line 148, in polygon_from_points
        polygon.append([float(x_y[0]), float(x_y[1])])
    ValueError: could not convert string to float: 
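The last frames suggest why an empty `points` attribute fails instead of being skipped. Below is a hypothetical re-creation of that parsing step, assuming `polygon_from_points` splits the string on single spaces and then on commas (only line 148 is visible in the traceback, so the actual `ocrd_utils` code may differ). `polygon_from_points_sketch` is an illustrative name.

```python
def polygon_from_points_sketch(points: str) -> list:
    """Hypothetical re-creation of the parsing step seen in the traceback.

    Assumes the PAGE format 'x1,y1 x2,y2 ...': pairs separated by single
    spaces, x and y separated by a comma.
    """
    polygon = []
    for pair in points.split(" "):
        x_y = pair.split(",")
        polygon.append([float(x_y[0]), float(x_y[1])])
    return polygon


print(polygon_from_points_sketch("0,0 10,0 10,10"))  # [[0.0, 0.0], [10.0, 0.0], [10.0, 10.0]]

# An empty attribute does not produce an empty list: "".split(" ") == [""],
# so the loop body runs once with pair == "" and float("") raises ValueError.
print("".split(" "))  # ['']
try:
    polygon_from_points_sketch("")
except ValueError as err:
    print("ValueError:", err)
```

Splitting with `str.split()` (no argument) would return `[]` for an empty string instead. Whether an empty polygon is acceptable further down is a separate question, since `coordinates_of_segment` passes the result straight to `np.array`.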

My expectation would be that this PAGE file is simply ignored.
Please clarify.
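One way to get this behaviour is for the processor to skip segments whose coordinates are empty or unparsable and log a warning, so a page containing only such regions yields no output. This is only a sketch of that idea, not how `ocrd-segment-extract-lines` is implemented: `usable_segments`, `parse_points`, the logger name and the `(id, points)` tuples are hypothetical stand-ins for the regions a processor iterates over, and the minimum of 3 points per polygon is an extra assumption.

```python
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("extract_sketch")  # hypothetical logger name


def parse_points(points: str) -> list:
    """Parse 'x1,y1 x2,y2 ...' into [[x, y], ...]; raises on malformed input."""
    polygon = []
    for pair in points.split(" "):
        x_y = pair.split(",")
        polygon.append([float(x_y[0]), float(x_y[1])])
    return polygon


def usable_segments(segments):
    """Yield (segment_id, polygon) pairs, skipping segments with empty or invalid coords.

    `segments` is a hypothetical iterable of (id, points) tuples.
    """
    for segment_id, points in segments:
        if not points.strip():
            LOG.warning("skipping %s: empty Coords/@points", segment_id)
            continue
        try:
            polygon = parse_points(points)
        except (ValueError, IndexError):  # non-numeric value or missing y coordinate
            LOG.warning("skipping %s: invalid Coords/@points %r", segment_id, points)
            continue
        if len(polygon) < 3:  # assumption: a usable polygon needs at least 3 points
            LOG.warning("skipping %s: fewer than 3 points", segment_id)
            continue
        yield segment_id, polygon


regions = [("TR-1", ""), ("TR-2", "0,0 100,0 100,50 0,50")]
print([seg_id for seg_id, _ in usable_segments(regions)])  # ['TR-2']
```

Logging and continuing keeps one bad region from aborting the whole workspace run, while still leaving a trace of what was skipped.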

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
