Quality improvement by searching for common OCR errors (transferred from OL)

Original text from https://github.com/internetarchive/openlibrary/issues/810: 

> Sorry if this is out of place, but I just stumbled across an oddity. It appears that the Google-digitized non-English editions have some habitual OCR problems, which show up in the boilerplate Google inserted.
> 
> For instance, Googling: "carcfully scannod" site:archive.org
> turns up 46,900 results, most of which are scanned from texts in languages that use diacritics. That can't be a coincidence. I'm wondering if it can be put to use for quality improvement. Might they just need a fresh run through OCR with more modern software?

More discussion in the thread.
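
The idea in the quoted report, using a garbled copy of a known boilerplate phrase as a signal of poor OCR, could be sketched roughly as follows. This is only an illustration, not an existing Open Library or Internet Archive tool: the function name `garbled_matches`, the reference phrase "carefully scanned" (assumed to be the correct spelling behind "carcfully scannod"), and the 0.8 similarity threshold are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Assumed reference phrase: the correctly spelled form of the garbled
# "carcfully scannod" string quoted in the report.
REFERENCE = "carefully scanned"


def garbled_matches(text, reference=REFERENCE, threshold=0.8):
    """Return word n-grams that nearly, but not exactly, match `reference`.

    A near match suggests the boilerplate was mis-recognized by OCR.
    The threshold is a hypothetical starting point, not a tuned value.
    """
    n = len(reference.split())
    # Keep runs of letters only (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    hits = []
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        if candidate == reference:
            continue  # clean OCR of the phrase, not an error
        similarity = SequenceMatcher(None, candidate, reference).ratio()
        if similarity >= threshold:
            hits.append(candidate)
    return hits
```

Running `garbled_matches` over the first page of each item's OCR text and collecting the items with non-empty results would give a candidate list for re-OCR. The threshold would likely need tuning against real samples before it could be trusted.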

Quality improvement by searching for common OCR errors (transferred from OL) #97