Feature: distinguish ISO-8859-2 from windows-1250 mojibake #168

@rspeer

ISO-8859-2 covers many of the same characters as Windows-1250, but unfortunately assigns several of them to different byte values.

One awkwardly ambiguous case I've found: ftfy decodes the text SchlĂźsselwĂśrter as SchlßsselwÜrter, treating it as Windows-1250 mojibake, when it was actually ISO-8859-2 mojibake that should have said Schlüsselwörter. Distinguishing the two without additional context would require recognizing the awkward capitalization and the extreme unlikelihood of the sequence "ßss".
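
To make the ambiguity concrete, here is a minimal sketch that reproduces it using only Python's built-in codecs (`iso-8859-2` and `cp1250`), not ftfy itself. The variable names are illustrative. It shows that the same mojibake string re-encodes to valid UTF-8 under either guess, which is why the byte-level decoding alone can't tell the two apart:

```python
# Round-trip the example from this issue through both candidate encodings.
original = "Schlüsselwörter"

# Mojibake: the UTF-8 bytes were misread as ISO-8859-2.
mojibake = original.encode("utf-8").decode("iso-8859-2")

# The same characters sit at different byte values in the two encodings.
print("ź".encode("iso-8859-2"), "ź".encode("cp1250"))

# Undoing it with the wrong guess (Windows-1250) still yields valid UTF-8,
# just not the text that was intended.
wrong_fix = mojibake.encode("cp1250").decode("utf-8")

# Undoing it with the right guess (ISO-8859-2) recovers the original.
right_fix = mojibake.encode("iso-8859-2").decode("utf-8")

print(mojibake)
print(wrong_fix)
print(right_fix)
```

Both decodings succeed without error, so any fix would presumably have to score the plausibility of the results (for example, penalizing "ßss" or a capital letter in the middle of a word) rather than rely on decode failures.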
