txml

tiny xml parser in typescript

getting started

to start using it you can install it with npm:

npm install d0x45/txml#master

or if you are using deno just import the github raw url.

how it works

i tried to keep this as simple as possible, so there are three basic functions:

parseXml(xml: XMLInput, canSkipStartingWhitespace = false): TokensIterator

this function consumes a XMLInput and returns a TokensIterator

buildXmlTree(tokens: TokensIterator, parentTagName?: string): Array<XMLNode>

a recursive function that takes a TokensIterator and returns an array of XMLNodes.

note: the second argument is a way to validate closing tags while recursively calling itself (do not provide a value manually)

walkXmlNodes(tokens: TokensIterator, callback: (path: string, node: XMLNode, parents: Array<XMLNode>) => void | true)

this function calls the provided callback on each node reporting the parent nodes, the node itself and a path argument which is a concatenation of parents' tagNames (e.g. feed.entry.media:thumbnail)

if the callback function returns boolean true, the iteration will be stopped

note: the nodes returned by this function don't have the children field as it is not possible to compute that value

here is a little example:

const xml = `
<parent>
    <child>some text</child>
</parent>`;
const tokens = parseXml(xml);
const tree = buildXmlTree(tokens);
console.log(tree);

you can also iterate over tokens as they are being processed:

for (const token of parseXml(xml)) {
    //     ^^^^^ is a XMLToken
    // do something with token
    // like maybe a callback?
    // see https://en.wikipedia.org/wiki/Simple_API_for_XML
}

or turn them into an array all at once:

const tokens = [...parseXml(xml)];
const tree = buildXmlTree(tokens.values());

types

// the input to `parseXml` can be string or just something you put together to read from another buffer.
// as long as your input is able to read a few characters ahead (let's say like 5 or 6) it does the job.
type XMLInput =
  | string
  | {
      length: number;
      charCodeAt: (i: number) => number;
    };

// the parseXml function itself is implemented as a Generator
type TokensIterator =
  | Generator<Readonly<XMLToken>, never | void>
  | IterableIterator<Readonly<XMLToken>>;

enum XMLTokenType {
  TEXT = 0, // any text value
  TAG_OPEN, // opened tag (e.g. `<?`, `<!`, `<[a-zA-Z]`)
  TAG_CLOSE, // closed tag (e.g. `</identifier>`)
  TAG_SELF_CLOSE, // self-closing tags (e.g. `?>`, `/>`, `-->`, `]]>`)
}

enum XMLTag {
  NONE = 0, // not a tag. at least not a known tag. mostly used for TEXT
  DECLARATION, // xml prolog (e.g. `<?identifier ... ?>`)
  COMMENT, // comment block (e.g. `<!-- comment -->`)
  CDATA, // cdata block (e.g. `<![CDATA[ ]]>`)
  ARBITRARY, // any other tag name (e.g. `<identifier ... >`)
}

type XMLTokenAttribute = {
  key: Array<number>; // key is array of bytes
  value: Array<number>; // same goes for value
  term_c: number; // termination character is either " or ' (e.g. key="value")
  start: number; // the start index of the first character of key
  end: number; // the last term_c position
};

type XMLToken = {
  type: XMLTokenType; // type of token (obviously)
  tag: XMLTag; // the tag type of this token (why do i even bother writing this?)
  start: number; // start index of the token
  end: number; // end index of the token
  // array of charCodes for content. (e.g. [97,97,97,97])
  // to turn into string, use getContentAsStr(token.content)
  // this field might contain the content of a CDATA, COMMENT or TEXT block,
  // or it might contain the tagName of ARBITRARY and DECLARATION tags
  // it all depends on the tag value
  content: Array<number>;
  // possible array of XMLTokenAttribute
  // convert to map with: getAttributesAsMap()
  attributes?: Array<XMLTokenAttribute>;
};

// represents any node that might contain children or be a textual node
type XMLNode =
  | {
      type: XMLTag.ARBITRARY | XMLTag.DECLARATION;
      tagName: string;
      attrMap?: Record<string, string>;
      children?: Array<XMLNode>;
    }
  | {
      type: XMLTag.NONE | XMLTag.CDATA | XMLTag.COMMENT;
      content: string;
    };

tl;dr

it's a tiny xml parser that does the job. give it a star if you like it. contributions are welcome

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.prettierrc		.prettierrc
LICENSE		LICENSE
README.md		README.md
cspell.json		cspell.json
package.json		package.json
tsconfig.json		tsconfig.json
txml.ts		txml.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

txml

getting started

how it works

types

tl;dr

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

D0x45/txml

Folders and files

Latest commit

History

Repository files navigation

txml

getting started

how it works

types

tl;dr

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages