What is the best way to use string literals containing Unicode and have it produce “safe” LaTeX code for it?

The `IsString` instance is pretty robust as far as ASCII is concerned, [escaping characters that would be interpreted as control characters by LaTeX](https://github.com/Daniel-Diaz/HaTeX/blob/master/Text/LaTeX/Base/Syntax.hs#L136) and instead producing the code that's needed for making the _rendered output_ look like the original strings.

However, when a string contains Unicode, things aren't so clear-cut. While XeLaTeX, unlike pdfLaTeX, has proper support for UTF-8, in my experience it is still not a given that Unicode input will be rendered properly. To be sure, I used to always escape e.g. German umlauts to the safe LaTeX form by hand (like `it's na\"ive to dismiss Mot\"orhead`) when I still wrote LaTeX manually.

Now, that doesn't work in HaTeX, and I have a couple of times found myself surprised by missing or inadequately substituted Unicode characters.

That seems like a problem we shouldn't be having. It would be easy enough to add suitable rules to `protectChar`, like
```diff
diff --git a/Text/LaTeX/Base/Syntax.hs b/Text/LaTeX/Base/Syntax.hs
index 7801593..61ef225 100644
--- a/Text/LaTeX/Base/Syntax.hs
+++ b/Text/LaTeX/Base/Syntax.hs
@@ -134,6 +134,17 @@ protectChar '}'  = "\\}"
 protectChar '~'  = "\\~{}"
 protectChar '\\' = "\\textbackslash{}"
 protectChar '_'  = "\\_{}"
+protectChar 'ä'  = "\\\"a"
+protectChar 'ë'  = "\\\"e"
+protectChar 'ï'  = "\\\"i"
+protectChar 'ö'  = "\\\"o"
+protectChar 'ü'  = "\\\"u"
+protectChar 'Ä'  = "\\\"A"
+protectChar 'Ë'  = "\\\"E"
+protectChar 'Ï'  = "\\\"I"
+protectChar 'Ö'  = "\\\"O"
+protectChar 'Ü'  = "\\\"U"
+protectChar 'ß'  = "{\\ss}"
 protectChar x = [x]
 
 -- Syntax analysis
```
One might also wish to add accents, etc.
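To make the idea easier to experiment with outside of HaTeX, here is a minimal standalone sketch of the same substitution as a lookup table. The names `umlautTable` and `escapeUnicode` are made up for illustration and are not part of HaTeX; the sketch works on plain `String` and does none of the ASCII escaping that `protectChar` already performs.

```haskell
module Main (main) where

import Data.Maybe (fromMaybe)

-- | Hypothetical table (not part of HaTeX): characters that have an
-- ASCII-only LaTeX spelling, paired with that spelling.
umlautTable :: [(Char, String)]
umlautTable =
  [ ('ä', "\\\"a"), ('ë', "\\\"e"), ('ï', "\\\"i")
  , ('ö', "\\\"o"), ('ü', "\\\"u")
  , ('Ä', "\\\"A"), ('Ë', "\\\"E"), ('Ï', "\\\"I")
  , ('Ö', "\\\"O"), ('Ü', "\\\"U")
  , ('ß', "{\\ss}")
  ]

-- | Replace every character found in the table and leave all other
-- characters untouched.
escapeUnicode :: String -> String
escapeUnicode = concatMap escapeChar
  where
    escapeChar c = fromMaybe [c] (lookup c umlautTable)

main :: IO ()
main = mapM_ (putStrLn . escapeUnicode) ["naïve", "Motörhead", "Straße"]
```

Keeping the substitutions in a table rather than in pattern-match clauses would make it straightforward to extend with further accents, or to swap in a different table per document.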

But that might well be opening a can of worms. The question would immediately be, where do we end? I personally would be inclined to also add substitutions like `ℝ` → `$\mathbb{R}$`, but that's clearly rather unsafe.

Is there a sensible option to offer multiple different `IsString` instances, in different modules that can be imported depending on what you need in your document?
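For what it's worth, a given type can only carry one `IsString` instance, so one conceivable way to get several behaviours is a separate newtype per behaviour, each of which could live in its own module. The following is only a sketch under that assumption: `AsIs` and `Umlauted` are hypothetical wrappers around `String`, not HaTeX types, and how they would be converted into `LaTeX` values is left open.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.String (IsString (..))

-- | Hypothetical wrapper: the literal is kept exactly as written.
newtype AsIs = AsIs { asIsSource :: String }
  deriving (Eq, Show)

-- | Hypothetical wrapper: umlauts in the literal are rewritten to
-- LaTeX accent commands when the literal is constructed.
newtype Umlauted = Umlauted { umlautedSource :: String }
  deriving (Eq, Show)

instance IsString AsIs where
  fromString = AsIs

instance IsString Umlauted where
  fromString = Umlauted . concatMap escapeChar
    where
      escapeChar 'ä' = "\\\"a"
      escapeChar 'ö' = "\\\"o"
      escapeChar 'ü' = "\\\"u"
      escapeChar 'ß' = "{\\ss}"
      escapeChar c   = [c]

main :: IO ()
main = do
  -- The same literal is interpreted differently depending on its type.
  print (asIsSource "Motörhead")
  putStrLn (umlautedSource "Motörhead")
```

The cost of this design is that the choice shows up in type signatures instead of in the import list, which may or may not be what one wants for a document-wide setting.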
