This repository was archived by the owner on Sep 28, 2023. It is now read-only.

What is the best way to use string literals containing Unicode and have it produce “safe” LaTeX code for it? #98

@leftaroundabout

The IsString instance is pretty robust as far as ASCII is concerned: it escapes characters that LaTeX would otherwise interpret as special (such as `_`, `~` or `\`) and instead produces the code needed to make the rendered output look like the original string.
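
A minimal standalone sketch of how such an escaping IsString instance can work. The names `Tex` and `render`, and the exact rule set, are assumptions modeled on the `protectChar` rules visible in the diff below; this is not HaTeX's actual definition.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString (..))

-- Hypothetical stand-in for HaTeX's LaTeX type: it just wraps rendered code.
newtype Tex = Tex String deriving (Eq, Show)

render :: Tex -> String
render (Tex s) = s

-- Escape ASCII characters that LaTeX would otherwise treat as syntax.
-- (Assumed rule set, modeled on HaTeX's protectChar.)
protectChar :: Char -> String
protectChar '#'  = "\\#"
protectChar '$'  = "\\$"
protectChar '%'  = "\\%"
protectChar '^'  = "\\^{}"
protectChar '&'  = "\\&"
protectChar '{'  = "\\{"
protectChar '}'  = "\\}"
protectChar '~'  = "\\~{}"
protectChar '\\' = "\\textbackslash{}"
protectChar '_'  = "\\_{}"
protectChar c    = [c]

-- With OverloadedStrings, every literal used at type Tex goes through
-- fromString, so it is escaped as soon as it is converted.
instance IsString Tex where
  fromString = Tex . concatMap protectChar

example :: Tex
example = "50% of $x_1$"
```

Here `render example` yields the LaTeX source `50\% of \$x\_{}1\$`, which typesets as the original text. Non-ASCII characters fall through the last equation unchanged, which is exactly where the question below begins.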

However, when a string contains Unicode, things aren't so clear-cut. While XeLaTeX, unlike pdfLaTeX, has proper support for UTF-8, in my experience it is still not guaranteed that Unicode input will be rendered properly. To be safe, back when I still wrote LaTeX by hand, I always escaped e.g. German umlauts to the safe LaTeX form (like it's na\"ive to dismiss Mot\"orhead).

Now, that doesn't work in HaTeX, because the IsString instance also escapes the backslash itself (to `\textbackslash{}`), and a couple of times I have found myself surprised by missing or inadequately substituted Unicode characters in the output.
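
A reduced sketch makes the backslash problem concrete. It keeps only the backslash rule quoted in the diff below; `protectString`, `handEscaped` and `escapedTwice` are hypothetical names for illustration. The backslash typed by hand is itself escaped, so the document shows a literal backslash and quote mark instead of an ö.

```haskell
-- Only the rule that matters here, as in the protectChar diff:
-- a backslash becomes a printable \textbackslash{}.
protectChar :: Char -> String
protectChar '\\' = "\\textbackslash{}"
protectChar c    = [c]

protectString :: String -> String
protectString = concatMap protectChar

-- What one would write by hand to get "Motörhead": Mot\"orhead
handEscaped :: String
handEscaped = "Mot\\\"orhead"

-- What actually ends up in the .tex file: Mot\textbackslash{}"orhead
escapedTwice :: String
escapedTwice = protectString handEscaped
```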

That seems like a problem we shouldn't be having. It would be easy enough to add suitable rules to protectChar, like

diff --git a/Text/LaTeX/Base/Syntax.hs b/Text/LaTeX/Base/Syntax.hs
index 7801593..61ef225 100644
--- a/Text/LaTeX/Base/Syntax.hs
+++ b/Text/LaTeX/Base/Syntax.hs
@@ -134,6 +134,17 @@ protectChar '}'  = "\\}"
 protectChar '~'  = "\\~{}"
 protectChar '\\' = "\\textbackslash{}"
 protectChar '_'  = "\\_{}"
+protectChar 'ä'  = "\\\"a"
+protectChar 'ë'  = "\\\"e"
+protectChar 'ï'  = "\\\"i"
+protectChar 'ö'  = "\\\"o"
+protectChar 'ü'  = "\\\"u"
+protectChar 'Ä'  = "\\\"A"
+protectChar 'Ë'  = "\\\"E"
+protectChar 'Ï'  = "\\\"I"
+protectChar 'Ö'  = "\\\"O"
+protectChar 'Ü'  = "\\\"U"
+protectChar 'ß'  = "{\\ss}"
 protectChar x = [x]
 
 -- Syntax analysis

One might also wish to add other accents, etc.
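
One way to keep such extensions manageable might be to drive them from a lookup table instead of one equation per character, so that adding accents means adding rows. The table contents (including the acute, grave and tilde entries) and the names `unicodeTable`, `protectAscii` are illustrative assumptions. Note that the Haskell literal needs `\\\"` to emit LaTeX's `\"`.

```haskell
-- Hypothetical substitution table: Unicode character -> LaTeX input.
unicodeTable :: [(Char, String)]
unicodeTable =
  [ ('ä', "\\\"{a}"), ('ö', "\\\"{o}"), ('ü', "\\\"{u}")
  , ('Ä', "\\\"{A}"), ('Ö', "\\\"{O}"), ('Ü', "\\\"{U}")
    -- \ss is a control word, so it is braced to keep it from
    -- swallowing a following space.
  , ('ß', "{\\ss}")
  , ('é', "\\'{e}"), ('è', "\\`{e}"), ('ñ', "\\~{n}")
  ]

-- ASCII specials as in protectChar (subset shown).
protectAscii :: Char -> String
protectAscii '\\' = "\\textbackslash{}"
protectAscii '_'  = "\\_{}"
protectAscii c    = [c]

-- Try the Unicode table first, then fall back to ASCII escaping.
protectChar :: Char -> String
protectChar c = maybe (protectAscii c) id (lookup c unicodeTable)

protectString :: String -> String
protectString = concatMap protectChar
```

For example, `protectString "Straße"` gives `Stra{\ss}e`. The table makes the "where do we end?" question a data question rather than a code question, but it does not answer it.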

But that might well be opening a can of worms: the immediate question would be where to stop. I would personally be inclined to also add substitutions that emit math code like $\mathbb{R}$, but that's clearly rather unsafe.
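
Part of what makes a `$\mathbb{R}$` substitution unsafe is that a literal `$` toggles math mode, so the same replacement breaks when the string is already used inside a formula. A hedged sketch of one common mitigation, which is not part of this issue: LaTeX's `\ensuremath` enters math mode only when needed. It still relies on `amssymb` (or `amsfonts`) being loaded for `\mathbb`, so it would have to stay opt-in. The names `mathSubst` and `protectMath` are hypothetical.

```haskell
-- Hypothetical opt-in rules for blackboard-bold letters.
-- \ensuremath works both in running text and inside math mode,
-- unlike $...$, but \mathbb still needs amssymb/amsfonts in the preamble.
mathSubst :: Char -> Maybe String
mathSubst 'ℝ' = Just "\\ensuremath{\\mathbb{R}}"
mathSubst 'ℕ' = Just "\\ensuremath{\\mathbb{N}}"
mathSubst 'ℤ' = Just "\\ensuremath{\\mathbb{Z}}"
mathSubst _   = Nothing

-- Replace known characters, leave everything else unchanged.
protectMath :: String -> String
protectMath = concatMap (\c -> maybe [c] id (mathSubst c))
```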

Is there a sensible option to offer multiple different IsString instances, in different modules that can be imported depending on what you need in your document?
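
One caveat: Haskell instances cannot be hidden on import, so a second `IsString` instance for the same `LaTeX` type in another module would be an orphan and would overlap with the existing one wherever both are in scope. A more conventional sketch gives each escaping policy its own newtype; the names `Plain` and `Umlauts` and their rule sets are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString (..))

-- Shared ASCII escaping (subset of protectChar's rules).
protectAscii :: Char -> String
protectAscii '\\' = "\\textbackslash{}"
protectAscii '_'  = "\\_{}"
protectAscii '%'  = "\\%"
protectAscii c    = [c]

-- Policy 1: escape ASCII specials, pass all other Unicode through.
newtype Plain = Plain String deriving (Eq, Show)

instance IsString Plain where
  fromString = Plain . concatMap protectAscii

-- Policy 2: additionally rewrite umlauts to LaTeX accent commands.
newtype Umlauts = Umlauts String deriving (Eq, Show)

instance IsString Umlauts where
  fromString = Umlauts . concatMap umlaut
    where
      umlaut 'ä' = "\\\"{a}"
      umlaut 'ö' = "\\\"{o}"
      umlaut 'ü' = "\\\"{u}"
      umlaut 'ß' = "{\\ss}"
      umlaut c   = protectAscii c
```

Each newtype could then live in its own module, with the choice of import deciding the policy, and be turned into HaTeX's document type through something that does not escape again (for instance HaTeX's `raw`).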
