Parse HTML #606

@Witiko

Currently, the Markdown package makes no effort to parse HTML content, unlike Markdown and YAML content:

How is one supposed to define a renderer for this? The inlineHtmlTag renderer only gets one argument, the HTML tag, and I see no way to get at the content between the two tags to put braces around it.

With difficulty. You would need to scan ahead hoping for a closing HTML tag. There is no attempt to provide comprehensive support for rendering HTML elements at this moment.
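To illustrate why scanning ahead is fragile, here is a minimal Python sketch of the idea. It is not how the Markdown package works; the function name `find_element_content` and the simplified tag pattern are assumptions made for this example. It shows that the scan has to track nesting and can still come back empty-handed when the closing tag is missing.

```python
import re

# Simplified pattern for an opening or closing tag; it captures the tag name.
# Assumption: attribute values never contain "<" or ">".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^<>]*>")

def find_element_content(text, start=0):
    """Return (tag_name, content) for the first element at or after `start`,
    or None when no balanced closing tag can be found."""
    opening = TAG.search(text, start)
    if opening is None or opening.group(1) == "/":
        return None  # no tag at all, or the first tag is a closing tag
    name = opening.group(2).lower()
    if opening.group(0).endswith("/>"):
        return name, ""  # self-closing tag: there is no content to scan for
    depth = 1
    for match in TAG.finditer(text, opening.end()):
        if match.group(2).lower() != name:
            continue  # tags with other names do not affect the balance
        if match.group(0).endswith("/>"):
            continue  # self-closing tags neither open nor close an element
        depth += -1 if match.group(1) else 1
        if depth == 0:
            return name, text[opening.end():match.start()]
    return None  # the closing tag never arrived

print(find_element_content("a <b>x <b>y</b> z</b> w"))
print(find_element_content("<em>oops"))
```

A naive search for the first closing tag would cut the nested example short after `y`, which is the kind of failure a dedicated HTML parsing regime would avoid.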

We might add an option to enable a new parsing regime for HTML that would produce more useful renderers for all aspects of HTML code, similarly to YAML. However, there's currently no detailed proposal (see e.g. #517) for this feature, which would be required to start the implementation.

Since our parser already differentiates between different types of HTML content following CommonMark's model of HTML, we could start by exposing the corresponding PEG parsers as individual renderers. While it's unclear whether this would be sufficient for rendering HTML in TeX, it would be a start and definitely much less work than including a full-blown HTML parser in addition to the current CommonMark parser.
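As a rough illustration of this first step, the Python sketch below classifies a single piece of raw inline HTML into the six kinds that CommonMark distinguishes (open tag, closing tag, comment, processing instruction, declaration, CDATA section) and maps each kind to its own renderer. The regular expressions are deliberately simplified and are not the package's PEG parsers, and all renderer names other than `inlineHtmlTag` are hypothetical.

```python
import re

# Simplified patterns for the kinds of raw inline HTML in CommonMark's model.
# They are looser than the specification and expect exactly one construct.
HTML_KINDS = [
    ("comment", re.compile(r"<!--.*?-->", re.DOTALL)),
    ("cdata", re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)),
    ("processing_instruction", re.compile(r"<\?.*?\?>", re.DOTALL)),
    ("declaration", re.compile(r"<![A-Za-z][^>]*>")),
    ("closing_tag", re.compile(r"</[A-Za-z][A-Za-z0-9-]*\s*>")),
    ("open_tag", re.compile(r"<[A-Za-z][A-Za-z0-9-]*(\s[^<>]*)?/?>")),
]

# Hypothetical renderer names, one per kind; none of them is confirmed
# to exist in the Markdown package.
HYPOTHETICAL_RENDERERS = {
    "comment": "inlineHtmlCommentNode",
    "cdata": "inlineHtmlCdata",
    "processing_instruction": "inlineHtmlProcessingInstruction",
    "declaration": "inlineHtmlDeclaration",
    "closing_tag": "inlineHtmlClosingTag",
    "open_tag": "inlineHtmlOpenTag",
}

def classify(raw):
    """Return the kind of a single raw HTML construct, or None."""
    for kind, pattern in HTML_KINDS:
        if pattern.fullmatch(raw):
            return kind
    return None

def emit(raw):
    """Pair the raw HTML with a renderer name, falling back to inlineHtmlTag."""
    kind = classify(raw)
    return (HYPOTHETICAL_RENDERERS.get(kind, "inlineHtmlTag"), raw)

print(emit('<a href="x">'))
print(emit("</a>"))
```

Even with separate renderers for opening and closing tags, a TeX user would still have to pair them up, which is why it is unclear whether this step alone would be sufficient.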

Originally posted by @u-fischer and @Witiko in #597

Tasks

  • Propose and implement renderers that correspond to CommonMark's model of HTML.
    Produce these renderers if a corresponding option has been enabled.
  • Propose and implement renderers that correspond to HTML nodes.
    Produce these renderers if a corresponding option has been enabled.
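The second task could look roughly like the following Python sketch, which uses the standard library's `html.parser` to turn HTML into node-level events and only does so when an option is enabled. The option name `parse_html_nodes` and the event names are hypothetical; the actual implementation would live in the package's Lua parser and might look quite different.

```python
from html.parser import HTMLParser

class NodeEventCollector(HTMLParser):
    """Collect element and text events in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("elementBegin", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("elementEnd", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

def html_events(source, parse_html_nodes=False):
    """Return node-level events when the (hypothetical) option is enabled,
    otherwise pass the HTML through as a single raw chunk."""
    if not parse_html_nodes:
        return [("rawHtml", source)]
    collector = NodeEventCollector()
    collector.feed(source)
    collector.close()  # flush any buffered text
    return collector.events

print(html_events('<b class="x">hi</b>'))
print(html_events('<b class="x">hi</b>', parse_html_nodes=True))
```

Keeping the raw pass-through as the default mirrors the wording of the tasks: the new renderers would only be produced if the corresponding option has been enabled.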

Metadata

Labels

  • commonmark: Related to making the syntax of markdown follow the CommonMark spec
  • feature request
  • html: Related to our support for HTML content.
  • lua: Related to the Lua interface and implementation
