-
Notifications
You must be signed in to change notification settings - Fork 6
Description
Disclaimer
I never had any formal education in language parsing, so, even if I had some practical experience in the matter, I apologise if I'm asking something obviously stupid.
Describe the bug
Let's consider this EBNF sample, an excerpt of the actual C11 grammar with everything unrelated stripped. Here, I've replaced some of the C rules with numbers '1' and '2' to make this minimal.
abstract_declarator: '1'
abstract_declarator: '1'? direct_abstract_declarator
direct_abstract_declarator: '(' abstract_declarator ')'
direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'In this issue, we'll try to parse the token sequence "[ ]" according to the direct_abstract_declarator rule of this EBNF. I think that it should be parsed correctly (though it's obviously open for discussion; it's possible that such parser construction should lead to infinite recursion).
The particularly problematic rules here are the ones that create recursion for a direct_abstract_declarator. I expect the parser for this EBNF to be something along the lines of (pseudocode):
parseDirectAbstractDeclarator() {
if (nextToken == '(') { // rule 1
// parse remaining part of '(' abstract_declarator ')'
}
// rule 2:
if (tryParseDirectAbstractDeclarator()) {
// parse remaining part of `direct_abstract_declarator? '[' '2'? ']'`
} else {
// parse `'[' '2'? ']'`
}
}But actually, when I generate a parser for this definition, I'm getting something closer to
parseDirectAbstractDeclarator() {
if (nextToken == '(') { // rule 1
// parse remaining part of '(' abstract_declarator ')'
}
// rule 2:
if (tryParseDirectAbstractDeclarator()) {
// parse remaining part of `direct_abstract_declarator? '[' '2'? ']'`
} else {
return error
}
}I.e. the generated parser ignores the part about left recursion being optional.
Workaround
Modify the EBNF by making the optional part explicit:
abstract_declarator: '1'
abstract_declarator: '1'? direct_abstract_declarator
direct_abstract_declarator: '(' abstract_declarator ')'
direct_abstract_declarator: '[' '2'? ']'
direct_abstract_declarator: direct_abstract_declarator '[' '2'? ']'Which libraries does it affect?
<PackageReference Include="Yoakke.Parser.Generator" Version="2022.1.6-2.38.45-nightly" />To Reproduce
Prepare a C# project with the following packages installed:
<PackageReference Include="Yoakke.C.Syntax" Version="2022.1.6-2.38.45-nightly" />
<PackageReference Include="Yoakke.Parser" Version="2022.1.6-2.38.45-nightly" />
<PackageReference Include="Yoakke.Parser.Generator" Version="2022.1.6-2.38.45-nightly" />Now, write the following code (all in one file of C# 10 for convenience):
using Yoakke.C.Syntax;
using Yoakke.Lexer;
using Yoakke.Parser.Attributes;
var parser = new Foo.CParser(new CLexer("[ ]"));
Console.WriteLine(parser.ParseDirectAbstractDeclarator().Ok);
public record AbstractDeclarator(IToken? T, DirectAbstractDeclarator? DAD);
public record DirectAbstractDeclarator(DirectAbstractDeclarator? DAD, IToken O, AbstractDeclarator? AD, IToken? Two, IToken C);
namespace Foo
{
[Parser(typeof(CTokenType))]
public partial class CParser
{
[Rule("abstract_declarator: '1'")]
private static AbstractDeclarator MakeAbstractDeclarator(IToken t) => new(t, null);
[Rule("abstract_declarator: '1'? direct_abstract_declarator")]
private static AbstractDeclarator MakeAbstractDeclarator(
IToken? pointer,
DirectAbstractDeclarator directAbstractDeclarator) => new(pointer, directAbstractDeclarator);
[Rule("direct_abstract_declarator: '(' abstract_declarator ')'")]
private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
IToken o,
AbstractDeclarator abstractDeclarator,
IToken c) => new DirectAbstractDeclarator(null, o, abstractDeclarator, null, c);
[Rule("direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'")]
private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
DirectAbstractDeclarator? @base,
IToken o,
IToken? two,
IToken c) => new DirectAbstractDeclarator(@base, o, null, two, c);
}
}I expect this program to write the message to the console, but it will throw an exception:
System.InvalidCastException: Unable to cast object of type 'Yoakke.Parser.ParseError' to type 'Yoakke.Parser.ParseOk`1[DirectAbstractDeclarator]'.
at Yoakke.Parser.ParseResult`1.get_Ok()
at Program.<Main>$(String[] args) in C:\Temp\ConsoleApp5\ConsoleApp5\Program.cs:line 6
This means the sample wasn't parsed.
Workaround
Let's apply the workaround with explicit rule separation:
using Yoakke.C.Syntax;
using Yoakke.Lexer;
using Yoakke.Parser.Attributes;
var parser = new Foo.CParser(new CLexer("[ ]"));
Console.WriteLine(parser.ParseDirectAbstractDeclarator().Ok);
public record AbstractDeclarator(IToken? T, DirectAbstractDeclarator? DAD);
public record DirectAbstractDeclarator(DirectAbstractDeclarator? DAD, IToken O, AbstractDeclarator? AD, IToken? Two, IToken C);
namespace Foo
{
[Parser(typeof(CTokenType))]
public partial class CParser
{
[Rule("abstract_declarator: '1'")]
private static AbstractDeclarator MakeAbstractDeclarator(IToken t) => new(t, null);
[Rule("abstract_declarator: '1'? direct_abstract_declarator")]
private static AbstractDeclarator MakeAbstractDeclarator(
IToken? pointer,
DirectAbstractDeclarator directAbstractDeclarator) => new(pointer, directAbstractDeclarator);
[Rule("direct_abstract_declarator: '(' abstract_declarator ')'")]
private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
IToken o,
AbstractDeclarator abstractDeclarator,
IToken c) => new DirectAbstractDeclarator(null, o, abstractDeclarator, null, c);
// [Rule("direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'")]
// private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
// DirectAbstractDeclarator? @base,
// IToken o,
// IToken? two,
// IToken c) => new DirectAbstractDeclarator(@base, o, null, two, c);
[Rule("direct_abstract_declarator: '[' '2'? ']'")]
private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
IToken o,
IToken? two,
IToken c) => new DirectAbstractDeclarator(null, o, null, two, c);
[Rule("direct_abstract_declarator: direct_abstract_declarator '[' '2'? ']'")]
private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
DirectAbstractDeclarator @base,
IToken o,
IToken? two,
IToken c) => new DirectAbstractDeclarator(@base, o, null, two, c);
}
}Now, this program will work, as I explained above.
Analysis
Let's take a quick look at the code generated for the parser:
private Yoakke.Parser.ParseResult<DirectAbstractDeclarator> parseDirectAbstractDeclarator(int offset)
{
Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a14;
Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a15;
Yoakke.Parser.ParseResult<(Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>, AbstractDeclarator,
Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)> a16;
Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a17;
if (this.TokenStream.TryLookAhead(offset, out var a18) && a18.Text == "(")
{
a17 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a18, offset + 1);
}
else
{
this.TokenStream.TryLookAhead(offset, out var got);
a17 = Yoakke.Parser.ParseResult.Error("(", got, a18.Range.Start, "direct_abstract_declarator");
}
if (a17.IsOk)
{
Yoakke.Parser.ParseResult<AbstractDeclarator> a19;
a19 = parseAbstractDeclarator(a17.Ok.Offset);
if (a19.IsError && (!this.TokenStream.TryLookAhead(a17.Ok.Offset, out var a20) ||
ReferenceEquals(a20, a19.Error.Got)))
{
a19 = Yoakke.Parser.ParseResult.Error("abstract_declarator", a19.Error.Got, a19.Error.Position,
"direct_abstract_declarator");
}
a19 = a19 | a17.Ok.FurthestError;
if (a19.IsOk)
{
Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a21;
if (this.TokenStream.TryLookAhead(a19.Ok.Offset, out var a22) && a22.Text == ")")
{
a21 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a22,
a19.Ok.Offset + 1);
}
else
{
this.TokenStream.TryLookAhead(a19.Ok.Offset, out var got);
a21 = Yoakke.Parser.ParseResult.Error(")", got, a22.Range.Start, "direct_abstract_declarator");
}
a21 = a21 | a19.Ok.FurthestError;
if (a21.IsOk)
{
a16 = Yoakke.Parser.ParseResult.Ok((a17.Ok.Value, a19.Ok.Value, a21.Ok.Value), a21.Ok.Offset,
a21.Ok.FurthestError);
}
else
{
a16 = a21.Error;
}
}
else
{
a16 = a19.Error;
}
}
else
{
a16 = a17.Error;
}
if (a16.IsOk)
{
var (a23, a24, a25) = a16.Ok.Value;
a15 = Yoakke.Parser.ParseResult.Ok(MakeDirectAbstractDeclarator(a23, a24, a25), a16.Ok.Offset,
a16.Ok.FurthestError);
}
else
{
a15 = a16.Error;
}
if (a15.IsOk)
{
var placeholder = a15;
a14 = a15.Ok;
while (true)
{
Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a26;
Yoakke.Parser.ParseResult<(DirectAbstractDeclarator?, Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>,
Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?, Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)> a27;
Yoakke.Parser.ParseResult<DirectAbstractDeclarator?> a28;
Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a29;
a29 = placeholder;
if (a29.IsOk)
a28 = Yoakke.Parser.ParseResult.Ok<DirectAbstractDeclarator?>(a29.Ok.Value, a29.Ok.Offset,
a29.Ok.FurthestError);
else a28 = Yoakke.Parser.ParseResult.Ok(default(DirectAbstractDeclarator?), offset, a29.Error);
if (a28.IsOk)
{
Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a30;
if (this.TokenStream.TryLookAhead(a28.Ok.Offset, out var a31) && a31.Text == "[")
{
a30 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a31,
a28.Ok.Offset + 1);
}
else
{
this.TokenStream.TryLookAhead(a28.Ok.Offset, out var got);
a30 = Yoakke.Parser.ParseResult.Error("[", got, a31.Range.Start, "direct_abstract_declarator");
}
a30 = a30 | a28.Ok.FurthestError;
if (a30.IsOk)
{
Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?> a32;
Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a33;
if (this.TokenStream.TryLookAhead(a30.Ok.Offset, out var a34) && a34.Text == "2")
{
a33 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a34,
a30.Ok.Offset + 1);
}
else
{
this.TokenStream.TryLookAhead(a30.Ok.Offset, out var got);
a33 = Yoakke.Parser.ParseResult.Error("2", got, a34.Range.Start, "direct_abstract_declarator");
}
if (a33.IsOk)
a32 = Yoakke.Parser.ParseResult.Ok<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?>(
a33.Ok.Value, a33.Ok.Offset, a33.Ok.FurthestError);
else
a32 = Yoakke.Parser.ParseResult.Ok(default(Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?),
a30.Ok.Offset, a33.Error);
a32 = a32 | a30.Ok.FurthestError;
if (a32.IsOk)
{
Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a35;
if (this.TokenStream.TryLookAhead(a32.Ok.Offset, out var a36) && a36.Text == "]")
{
a35 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a36,
a32.Ok.Offset + 1);
}
else
{
this.TokenStream.TryLookAhead(a32.Ok.Offset, out var got);
a35 = Yoakke.Parser.ParseResult.Error("]", got, a36.Range.Start,
"direct_abstract_declarator");
}
a35 = a35 | a32.Ok.FurthestError;
if (a35.IsOk)
{
a27 = Yoakke.Parser.ParseResult.Ok((a28.Ok.Value, a30.Ok.Value, a32.Ok.Value, a35.Ok.Value),
a35.Ok.Offset, a35.Ok.FurthestError);
}
else
{
a27 = a35.Error;
}
}
else
{
a27 = a32.Error;
}
}
else
{
a27 = a30.Error;
}
}
else
{
a27 = a28.Error;
}
if (a27.IsOk)
{
var (a37, a38, a39, a40) = a27.Ok.Value;
a26 = Yoakke.Parser.ParseResult.Ok(MakeDirectAbstractDeclarator(a37, a38, a39, a40), a27.Ok.Offset,
a27.Ok.FurthestError);
}
else
{
a26 = a27.Error;
}
if (a26.IsOk)
{
placeholder = a26;
a14 = a26.Ok;
}
else
{
break;
}
}
}
else
{
a14 = a15.Error;
}
return a14;
}To me, it looks like it will ever follow the '[' '2'? ']' pattern iff a15.IsOk, and a15 will be Ok only in case it was able to parse the left, optional instance of direct_abstract_declarator in the rule direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'.
Environment:
- OS: Windows 10
- .NET version: .NET 6
- Library version: 2022.1.6-2.38.45-nightly