Skip to content

Optional left recursion doesn't seem to work properly #121

@ForNeVeR

Description

@ForNeVeR

Disclaimer

I never had any formal education in language parsing, so, even if I had some practical experience in the matter, I apologise if I'm asking something obviously stupid.

Describe the bug

Let's consider this EBNF sample, an excerpt of the actual C11 grammar with everything unrelated stripped. Here, I've replaced some of the C rules with numbers '1' and '2' to make this minimal.

abstract_declarator: '1'
abstract_declarator: '1'? direct_abstract_declarator

direct_abstract_declarator: '(' abstract_declarator ')'
direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'

In this issue, we'll try to parse the token sequence "[ ]" according to the direct_abstract_declarator rule of this EBNF. I think that it should be parsed correctly (though it's obviously open for discussion; it's possible that such parser construction should lead to infinite recursion).

The particularly problematic rules here are the ones that create recursion for a direct_abstract_declarator. I expect the parser for this EBNF to be something along the lines of (pseudocode):

parseDirectAbstractDeclarator() {
  if (nextToken == '(') { // rule 1
    // parse remaining part of '(' abstract_declarator ')'
  }
  // rule 2:
  if (tryParseDirectAbstractDeclarator()) {
    // parse remaining part of `direct_abstract_declarator? '[' '2'? ']'`
  } else {
    // parse `'[' '2'? ']'`
  }
}

But actually, when I generate a parser for this definition, I'm getting something closer to

parseDirectAbstractDeclarator() {
  if (nextToken == '(') { // rule 1
    // parse remaining part of '(' abstract_declarator ')'
  }
  // rule 2:
  if (tryParseDirectAbstractDeclarator()) {
    // parse remaining part of `direct_abstract_declarator? '[' '2'? ']'`
  } else {
    return error
  }
}

I.e. the generated parser ignores the part about left recursion being optional.

Workaround

Modify the EBNF by making the optional part explicit:

abstract_declarator: '1'
abstract_declarator: '1'? direct_abstract_declarator

direct_abstract_declarator: '(' abstract_declarator ')'
direct_abstract_declarator: '[' '2'? ']'
direct_abstract_declarator: direct_abstract_declarator '[' '2'? ']'

Which libraries does it affect?

      <PackageReference Include="Yoakke.Parser.Generator" Version="2022.1.6-2.38.45-nightly" />

To Reproduce

Prepare a C# project with the following packages installed:

      <PackageReference Include="Yoakke.C.Syntax" Version="2022.1.6-2.38.45-nightly" />
      <PackageReference Include="Yoakke.Parser" Version="2022.1.6-2.38.45-nightly" />
      <PackageReference Include="Yoakke.Parser.Generator" Version="2022.1.6-2.38.45-nightly" />

Now, write the following code (all in one file of C# 10 for convenience):

using Yoakke.C.Syntax;
using Yoakke.Lexer;
using Yoakke.Parser.Attributes;

var parser = new Foo.CParser(new CLexer("[ ]"));
Console.WriteLine(parser.ParseDirectAbstractDeclarator().Ok);

public record AbstractDeclarator(IToken? T, DirectAbstractDeclarator? DAD);
public record DirectAbstractDeclarator(DirectAbstractDeclarator? DAD, IToken O, AbstractDeclarator? AD, IToken? Two, IToken C);

namespace Foo
{
    [Parser(typeof(CTokenType))]
    public partial class CParser
    {
        [Rule("abstract_declarator: '1'")]
        private static AbstractDeclarator MakeAbstractDeclarator(IToken t) => new(t, null);

        [Rule("abstract_declarator: '1'? direct_abstract_declarator")]
        private static AbstractDeclarator MakeAbstractDeclarator(
            IToken? pointer,
            DirectAbstractDeclarator directAbstractDeclarator) => new(pointer, directAbstractDeclarator);

        [Rule("direct_abstract_declarator: '(' abstract_declarator ')'")]
        private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
            IToken o,
            AbstractDeclarator abstractDeclarator,
            IToken c) => new DirectAbstractDeclarator(null, o, abstractDeclarator, null, c);

        [Rule("direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'")]
        private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
            DirectAbstractDeclarator? @base,
            IToken o,
            IToken? two,
            IToken c) => new DirectAbstractDeclarator(@base, o, null, two, c);
    }
}

I expect this program to write the message to the console, but it will throw an exception:

System.InvalidCastException: Unable to cast object of type 'Yoakke.Parser.ParseError' to type 'Yoakke.Parser.ParseOk`1[DirectAbstractDeclarator]'.
   at Yoakke.Parser.ParseResult`1.get_Ok()
   at Program.<Main>$(String[] args) in C:\Temp\ConsoleApp5\ConsoleApp5\Program.cs:line 6

This means the sample wasn't parsed.

Workaround

Let's apply the workaround with explicit rule separation:

using Yoakke.C.Syntax;
using Yoakke.Lexer;
using Yoakke.Parser.Attributes;

var parser = new Foo.CParser(new CLexer("[ ]"));
Console.WriteLine(parser.ParseDirectAbstractDeclarator().Ok);

public record AbstractDeclarator(IToken? T, DirectAbstractDeclarator? DAD);
public record DirectAbstractDeclarator(DirectAbstractDeclarator? DAD, IToken O, AbstractDeclarator? AD, IToken? Two, IToken C);

namespace Foo
{
    [Parser(typeof(CTokenType))]
    public partial class CParser
    {
        [Rule("abstract_declarator: '1'")]
        private static AbstractDeclarator MakeAbstractDeclarator(IToken t) => new(t, null);

        [Rule("abstract_declarator: '1'? direct_abstract_declarator")]
        private static AbstractDeclarator MakeAbstractDeclarator(
            IToken? pointer,
            DirectAbstractDeclarator directAbstractDeclarator) => new(pointer, directAbstractDeclarator);

        [Rule("direct_abstract_declarator: '(' abstract_declarator ')'")]
        private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
            IToken o,
            AbstractDeclarator abstractDeclarator,
            IToken c) => new DirectAbstractDeclarator(null, o, abstractDeclarator, null, c);

        // [Rule("direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'")]
        // private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
        //     DirectAbstractDeclarator? @base,
        //     IToken o,
        //     IToken? two,
        //     IToken c) => new DirectAbstractDeclarator(@base, o, null, two, c);

        [Rule("direct_abstract_declarator: '[' '2'? ']'")]
        private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
            IToken o,
            IToken? two,
            IToken c) => new DirectAbstractDeclarator(null, o, null, two, c);

        [Rule("direct_abstract_declarator: direct_abstract_declarator '[' '2'? ']'")]
        private static DirectAbstractDeclarator MakeDirectAbstractDeclarator(
            DirectAbstractDeclarator @base,
            IToken o,
            IToken? two,
            IToken c) => new DirectAbstractDeclarator(@base, o, null, two, c);
    }
}

Now, this program will work, as I explained above.

Analysis

Let's take a quick look at the code generated for the parser:

private Yoakke.Parser.ParseResult<DirectAbstractDeclarator> parseDirectAbstractDeclarator(int offset)
{
    Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a14;
    Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a15;
    Yoakke.Parser.ParseResult<(Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>, AbstractDeclarator,
        Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)> a16;
    Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a17;
    if (this.TokenStream.TryLookAhead(offset, out var a18) && a18.Text == "(")
    {
        a17 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a18, offset + 1);
    }
    else
    {
        this.TokenStream.TryLookAhead(offset, out var got);
        a17 = Yoakke.Parser.ParseResult.Error("(", got, a18.Range.Start, "direct_abstract_declarator");
    }

    if (a17.IsOk)
    {
        Yoakke.Parser.ParseResult<AbstractDeclarator> a19;
        a19 = parseAbstractDeclarator(a17.Ok.Offset);
        if (a19.IsError && (!this.TokenStream.TryLookAhead(a17.Ok.Offset, out var a20) ||
                            ReferenceEquals(a20, a19.Error.Got)))
        {
            a19 = Yoakke.Parser.ParseResult.Error("abstract_declarator", a19.Error.Got, a19.Error.Position,
                "direct_abstract_declarator");
        }

        a19 = a19 | a17.Ok.FurthestError;
        if (a19.IsOk)
        {
            Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a21;
            if (this.TokenStream.TryLookAhead(a19.Ok.Offset, out var a22) && a22.Text == ")")
            {
                a21 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a22,
                    a19.Ok.Offset + 1);
            }
            else
            {
                this.TokenStream.TryLookAhead(a19.Ok.Offset, out var got);
                a21 = Yoakke.Parser.ParseResult.Error(")", got, a22.Range.Start, "direct_abstract_declarator");
            }

            a21 = a21 | a19.Ok.FurthestError;
            if (a21.IsOk)
            {
                a16 = Yoakke.Parser.ParseResult.Ok((a17.Ok.Value, a19.Ok.Value, a21.Ok.Value), a21.Ok.Offset,
                    a21.Ok.FurthestError);
            }
            else
            {
                a16 = a21.Error;
            }
        }
        else
        {
            a16 = a19.Error;
        }
    }
    else
    {
        a16 = a17.Error;
    }

    if (a16.IsOk)
    {
        var (a23, a24, a25) = a16.Ok.Value;
        a15 = Yoakke.Parser.ParseResult.Ok(MakeDirectAbstractDeclarator(a23, a24, a25), a16.Ok.Offset,
            a16.Ok.FurthestError);
    }
    else
    {
        a15 = a16.Error;
    }

    if (a15.IsOk)
    {
        var placeholder = a15;
        a14 = a15.Ok;
        while (true)
        {
            Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a26;
            Yoakke.Parser.ParseResult<(DirectAbstractDeclarator?, Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>,
                Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?, Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)> a27;
            Yoakke.Parser.ParseResult<DirectAbstractDeclarator?> a28;
            Yoakke.Parser.ParseResult<DirectAbstractDeclarator> a29;
            a29 = placeholder;
            if (a29.IsOk)
                a28 = Yoakke.Parser.ParseResult.Ok<DirectAbstractDeclarator?>(a29.Ok.Value, a29.Ok.Offset,
                    a29.Ok.FurthestError);
            else a28 = Yoakke.Parser.ParseResult.Ok(default(DirectAbstractDeclarator?), offset, a29.Error);
            if (a28.IsOk)
            {
                Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a30;
                if (this.TokenStream.TryLookAhead(a28.Ok.Offset, out var a31) && a31.Text == "[")
                {
                    a30 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a31,
                        a28.Ok.Offset + 1);
                }
                else
                {
                    this.TokenStream.TryLookAhead(a28.Ok.Offset, out var got);
                    a30 = Yoakke.Parser.ParseResult.Error("[", got, a31.Range.Start, "direct_abstract_declarator");
                }

                a30 = a30 | a28.Ok.FurthestError;
                if (a30.IsOk)
                {
                    Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?> a32;
                    Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a33;
                    if (this.TokenStream.TryLookAhead(a30.Ok.Offset, out var a34) && a34.Text == "2")
                    {
                        a33 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a34,
                            a30.Ok.Offset + 1);
                    }
                    else
                    {
                        this.TokenStream.TryLookAhead(a30.Ok.Offset, out var got);
                        a33 = Yoakke.Parser.ParseResult.Error("2", got, a34.Range.Start, "direct_abstract_declarator");
                    }

                    if (a33.IsOk)
                        a32 = Yoakke.Parser.ParseResult.Ok<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?>(
                            a33.Ok.Value, a33.Ok.Offset, a33.Ok.FurthestError);
                    else
                        a32 = Yoakke.Parser.ParseResult.Ok(default(Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>?),
                            a30.Ok.Offset, a33.Error);
                    a32 = a32 | a30.Ok.FurthestError;
                    if (a32.IsOk)
                    {
                        Yoakke.Parser.ParseResult<Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>> a35;
                        if (this.TokenStream.TryLookAhead(a32.Ok.Offset, out var a36) && a36.Text == "]")
                        {
                            a35 = Yoakke.Parser.ParseResult.Ok((Yoakke.Lexer.IToken<Yoakke.C.Syntax.CTokenType>)a36,
                                a32.Ok.Offset + 1);
                        }
                        else
                        {
                            this.TokenStream.TryLookAhead(a32.Ok.Offset, out var got);
                            a35 = Yoakke.Parser.ParseResult.Error("]", got, a36.Range.Start,
                                "direct_abstract_declarator");
                        }

                        a35 = a35 | a32.Ok.FurthestError;
                        if (a35.IsOk)
                        {
                            a27 = Yoakke.Parser.ParseResult.Ok((a28.Ok.Value, a30.Ok.Value, a32.Ok.Value, a35.Ok.Value),
                                a35.Ok.Offset, a35.Ok.FurthestError);
                        }
                        else
                        {
                            a27 = a35.Error;
                        }
                    }
                    else
                    {
                        a27 = a32.Error;
                    }
                }
                else
                {
                    a27 = a30.Error;
                }
            }
            else
            {
                a27 = a28.Error;
            }

            if (a27.IsOk)
            {
                var (a37, a38, a39, a40) = a27.Ok.Value;
                a26 = Yoakke.Parser.ParseResult.Ok(MakeDirectAbstractDeclarator(a37, a38, a39, a40), a27.Ok.Offset,
                    a27.Ok.FurthestError);
            }
            else
            {
                a26 = a27.Error;
            }

            if (a26.IsOk)
            {
                placeholder = a26;
                a14 = a26.Ok;
            }
            else
            {
                break;
            }
        }
    }
    else
    {
        a14 = a15.Error;
    }

    return a14;
}

To me, it looks like it will ever follow the '[' '2'? ']' pattern iff a15.IsOk, and a15 will be Ok only in case it was able to parse the left, optional instance of direct_abstract_declarator in the rule direct_abstract_declarator: direct_abstract_declarator? '[' '2'? ']'.

Environment:

  • OS: Windows 10
  • .NET version: .NET 6
  • Library version: 2022.1.6-2.38.45-nightly

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions