Skip to content

dotjulia/VERS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

$VΣRS$ is a Context Free Grammar (CFG) parser in Rust

Given a context-free grammar (CFG) $G = (V, Σ, R, S)$ and a word $w$, VΣRS determines whether $w$ is a member of the language generated by $G$. $VΣRS$ first converts G to Chomsky Normal Form (CNF) at compile time and then applies the Cocke-Younger-Kasami (CYK) algorithm to $G$ in CNF form to decide $w$'s membership in the generated language.

VΣRS! [G = (
    V = {S},
    Σ = {a = "(", b = ")"},
    R = {
        S -> aSb,
        S -> ab,
    },
    S = S
)];
"((()))".parse_grammar(&G).matches // true

Chomsky Normal Form (CNF)

$VΣRS$ converts a CFG $G = (V, Σ, R, S)$ to CNF at compile time. The conversion is done by applying the following rules to $G$:

  • Subsitute the starting symbol $S$ with $S'$ and add the production rule $S' \rightarrow S$, if $S$ appears on the right-hand side of a production rule.
  • For every nullable production rule $A \rightarrow \epsilon$, substitute it with one or more equivalent rules. A production rule is said to be nullable if it is of the form $A \rightarrow \epsilon$, where $\epsilon$ denotes the empty word, or if it is of the form $A \rightarrow B$, where $B$ is a nullable production rule. Next, substitute every production rule with all its permutations while omitting any or all of the nullable production rules.
  • Eliminate all unit rules in the form of $A \rightarrow B$, where $A$ and $B$ are non-terminals.
  • For every production rule that involves more than two non-terminals denoted by $S \rightarrow ABC$, introduce a new symbol, $X_{AB}$, and replace the first two non-terminals with this symbol, while also adding the rule $X_{AB} \rightarrow AB$. This produces the revised rule $S \rightarrow X_{AB}C$.
  • Given a set of terminals $a \in \Sigma$, for all production rules containing more than one terminals or at least one non-terminal and a terminal, substitute these terminals with a newly defined non-terminal $A_a$.

Cocke-Younger-Kasami (CYK) Algorithm

...

Rust macros

$VΣRS$ uses Rust macros to convert a CFG to CNF and then parses the word at runtime using the CYK algorithm. Using rust macros allows giving useful compile-time errors to the user. For instance, if the user specifies a terminal that is not defined in the grammar, the compiler will throw an error. The following is an example of a compile-time error thrown by $VΣRS$:

VΣRS! [G = (
    V = {S},
    Σ = {a = "(", b = ")"},
    R = {
        S -> aSbc,
        S -> ab,
    },
    S = S
)];
error: non-terminal/terminal 'c' is not defined in the grammar
  --> src/main.rs:27:13
   |
27 |             S -> aSbc,
   |             ^

Limitations:

  • Grammar cannot generate the empty word ("" or $ε$) but production rules can contain the empty word. For instance, assuming $S$ is the starting symbol, the production $S \rightarrow ε$ is considered invalid. However, the rule $B \rightarrow ε$ is considered a legitimate production rule.

About

VΣRS is CFG parser written in Rust.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages