Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 16, 2025

Update Participle Parser for Lenient Params Parsing ✅

Successfully updated the participle-based parser in task_parser.go to support lenient parsing features as requested.

Problem Statement

PR #163 implemented a hand-rolled parser for lenient parameter parsing instead of updating the existing participle-based parser. This PR addresses that by updating the participle parser in task_parser.go to support all lenient parsing features.

Implementation Summary

Changes Made

  1. Updated Lexer Rules (task_parser.go:121-128)

    • Added SQString token for single-quoted strings: '(?:\\.|[^'])*'
    • Renamed String to DQString for double-quoted strings: "(?:\\.|[^"])*"
    • Updated Term pattern to exclude single quotes: [^ \t\n\r/"=']+
  2. Updated Grammar (task_parser.go:189)

    • Modified Argument struct to accept: @DQString | @SQString | @Term
    • Updated TextLine struct to handle new token types
  3. Implemented Comprehensive Escape Sequence Processing (task_parser.go:73-182)

    • Created processEscapeSequences() function supporting:
      • Standard escapes: \n, \t, \r, \\, \", \'
      • Unicode escapes: \uXXXX (4 hex digits)
      • Hex escapes: \xHH (2 hex digits)
      • Octal escapes: \OOO (up to 3 octal digits 0-7, supports full byte range 0-255)
    • Graceful handling of invalid/incomplete escapes
    • Fast path optimization for strings without escapes
    • Correct boundary checks for slice operations
    • Improved documentation: Explicitly documents behavior for unknown escape sequences (preserves character after backslash) and incomplete sequences (preserves literally including backslash)
    • Fixed octal parsing: Changed bitSize from 8 to 16 to support octal values \200-\377 (bytes 128-255)
  4. Updated stripQuotes() Function (task_parser.go:73-94)

    • Now handles both single and double quotes
    • Processes escape sequences for quoted values
    • No escape processing for unquoted values (preserves them as-is)
    • Improved documentation: Added explicit description of escape sequence handling behavior
  5. Converted Tests to Use Testify (task_parser_lenient_test.go)

    • Added testify/assert and testify/require imports
    • Replaced manual assertions with assert.Equal(), assert.Len(), etc.
    • Replaced manual error checks with require.NoError() and require.Error()
    • More concise and readable test code
    • Added test coverage for high octal values: New tests for \200-\377 range to verify full byte range support

Features Implemented

Flexible quotes: Both single (') and double (") quotes
Escape sequences: All types (\n, \t, \r, \\, \", \', \uXXXX, \xHH, \OOO)
Full octal range: Octal escapes now support full byte range 0-255 (not just 0-127)
UTF-8 support: Full Unicode support in keys and values
Graceful error handling: Incomplete/invalid escapes preserved as-is
Backwards compatible: All existing tests pass
Unquoted values: Still supported (no escape processing)
Clear documentation: Explicit behavior documentation for edge cases

Testing

New Tests (task_parser_lenient_test.go - 445 lines):

  • TestParseTask_LenientParsing: 20 test cases for slash command parsing
  • TestStripQuotes: 14 test cases for quote stripping
  • TestProcessEscapeSequences: 19 test cases for escape handling (added 2 new tests for high octal values)
  • Total: 53 new test cases, all passing
  • Using testify library for cleaner assertions

Existing Tests:

  • All existing tests in task_parser_test.go still pass ✅
  • All tests in pkg/codingcontext pass ✅
  • Linting passes (make lint) ✅

Example Usage

// Slash command with mixed quote types and escape sequences
/deploy env='production' version="v1.2.3" msg="Line1\nLine2\u0020world"

// Parses with quotes stripped and escapes processed:
// - env: "production"
// - version: "v1.2.3"
// - msg: "Line1\nLine2 world"  // \n → newline, \u0020 → space

// High byte octal values now work correctly:
/command data="\200\377"  // Bytes 128 and 255

Files Changed

  • ✏️ pkg/codingcontext/task_parser.go (125 lines changed)
    • Enhanced documentation for stripQuotes() and processEscapeSequences()
    • Fixed octal parsing to support full byte range (0-255)
  • ✏️ pkg/codingcontext/task_parser_lenient_test.go (625 lines new, converted to testify)
    • Added test cases for high octal values (\200-\377)
  • ✏️ go.mod (added testify dependency)
  • ✏️ go.sum (testify checksums)

Code Review & Security

  • ✅ Code review completed - all issues addressed
  • ✅ Fixed boundary checks in escape sequence processing
  • ✅ Fixed octal parsing bitSize issue (now handles 0-255, not just 0-127)
  • ✅ Enhanced documentation for escape sequence behavior
  • ✅ Added test coverage for high byte values
  • ✅ CodeQL security scan passed (0 alerts)
  • ✅ Converted tests to use testify library
  • ✅ All tests pass

Verification

  • All requirements from problem statement met
  • Lenient parsing features implemented
  • Comprehensive tests written and passing
  • All existing tests pass (backwards compatible)
  • Code linted successfully
  • Code review feedback addressed
  • Security scan passed
  • Tests converted to use testify library
  • Documentation improved for edge cases
  • Octal parsing fixed for full byte range
  • Test coverage added for high octal values
  • End-to-end verification completed
Original prompt

In PR 163 the engineer wanted to make the params parser more lenient. However, he implemented a hand-rolled parser, rather than update the current particle based parser.

  1. Determine the requirements for the lenient parsing of params.
  2. Update the particple based parser to implement these.
  3. Write simple but comprehesive tests.

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits December 16, 2025 23:36
Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
Copilot AI changed the title [WIP] Update particle based parser for lenient params parsing Extend participle parser with lenient parameter syntax Dec 16, 2025
Copilot AI requested a review from alexec December 16, 2025 23:45
@alexec alexec marked this pull request as ready for review December 16, 2025 23:52
Copilot AI review requested due to automatic review settings December 16, 2025 23:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the existing participle-based parser to support lenient parameter syntax with single-quoted strings and comprehensive escape sequence processing. Previously, PR #163 implemented lenient parsing via a hand-rolled parser; this PR integrates those capabilities directly into the participle parser to maintain consistency with the existing codebase architecture.

Key changes:

  • Added single-quoted string support ('value') alongside existing double-quoted strings
  • Implemented comprehensive escape sequence processing including standard escapes (\n, \t, \r, \, ", '), Unicode (\uXXXX), hex (\xHH), and octal (\OOO) formats
  • Extended lexer with SQString token and updated grammar to handle both quote types

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pkg/codingcontext/task_parser.go Extended lexer with SQString token; added processEscapeSequences() function; updated stripQuotes() to handle both quote types and process escapes; updated Argument and TextLine grammar to accept both DQString and SQString tokens
pkg/codingcontext/task_parser_lenient_test.go Added 51 comprehensive test cases covering single/double quotes, all escape sequence types, UTF-8 characters, edge cases (incomplete escapes, mismatched quotes), and validation of both stripQuotes() and processEscapeSequences() functions

Comment on lines +73 to +75
// stripQuotes removes surrounding quotes from a string if present and processes escape sequences.
// Supports both single (') and double (") quotes.
// Processes escape sequences: \n, \t, \r, \\, \", \', \uXXXX (Unicode), \xHH (hex), \OOO (octal)
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should clarify the behavior for unknown or invalid escape sequences. According to the implementation, unknown escapes like \z preserve the character after the backslash (resulting in 'z'), and incomplete escapes like \u00a are preserved as-is. Consider adding this to the comment to make the behavior explicit for users of this function.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enhanced documentation in commit 61bd580 to explicitly describe behavior for unknown escape sequences (preserves character after backslash) and incomplete escape sequences (preserves literally including backslash).

Comment on lines +97 to +98
// processEscapeSequences decodes escape sequences in a string.
// Supports: \n, \t, \r, \\, \", \', \uXXXX (Unicode), \xHH (hex), \OOO (octal)
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should clarify the behavior for unknown or invalid escape sequences. According to the implementation at lines 176-178, unknown escapes preserve only the character after the backslash, and incomplete sequences (lines 136-142, 151-157) are preserved literally including the backslash. Adding this information would make the function's contract more explicit.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enhanced documentation in commit 61bd580 to explicitly describe behavior for unknown escape sequences (preserves character after backslash) and incomplete escape sequences (preserves literally including backslash).

octalEnd++
}
octalStr := s[octalStart:octalEnd]
if val, err := strconv.ParseInt(octalStr, 8, 8); err == nil {
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bitSize parameter should be larger than 8 to handle all valid octal byte values. Using bitSize 8 means ParseInt will reject values greater than 127, but valid octal byte escapes can represent values up to 255 (e.g., \377). This will cause valid octal sequences like \200 through \377 to fall through to the error path and be preserved as-is instead of being decoded. Change the bitSize to 16 or 32 to allow parsing the full range of byte values.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed octal parsing bitSize from 8 to 16 in commit 61bd580 to support full byte range 0-255. Now octal sequences like \200-\377 decode correctly to bytes 128-255.

Comment on lines 417 to 518
func TestProcessEscapeSequences(t *testing.T) {
tests := []struct {
name string
input string
expected string
}{
{
name: "no escapes",
input: "hello world",
expected: "hello world",
},
{
name: "newline",
input: `line1\nline2`,
expected: "line1\nline2",
},
{
name: "tab",
input: `col1\tcol2`,
expected: "col1\tcol2",
},
{
name: "carriage return",
input: `line1\rline2`,
expected: "line1\rline2",
},
{
name: "backslash",
input: `path\\to\\file`,
expected: `path\to\file`,
},
{
name: "double quote",
input: `say \"hello\"`,
expected: `say "hello"`,
},
{
name: "single quote",
input: `say \'hello\'`,
expected: `say 'hello'`,
},
{
name: "Unicode escape",
input: `\u0048\u0065\u006c\u006c\u006f`, // "Hello"
expected: "Hello",
},
{
name: "hex escape",
input: `\x48\x65\x6c\x6c\x6f`, // "Hello"
expected: "Hello",
},
{
name: "octal escape",
input: `\110\145\154\154\157`, // "Hello"
expected: "Hello",
},
{
name: "mixed escapes",
input: `\n\t\r\\\"\'\u0020\x20\40`,
expected: "\n\t\r\\\"' \x20 ",
},
{
name: "unknown escape",
input: `\z\q`,
expected: "zq",
},
{
name: "incomplete Unicode escape",
input: `\u00a`,
expected: `\u00a`,
},
{
name: "incomplete hex escape",
input: `\x4`,
expected: `\x4`,
},
{
name: "backslash at end",
input: `hello\`,
expected: `hello\`,
},
{
name: "octal with non-octal digits",
input: `\7\8\9`,
expected: "\a89", // \7 is octal 7 (ASCII bell \a), \8 and \9 are treated as unknown escapes and output as '8' and '9'
},
{
name: "short octal sequences",
input: `\7\77`,
expected: "\a?", // \7 is octal 7 (ASCII bell \a), \77 is octal 77 (ASCII '?')
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := processEscapeSequences(tt.input)
if result != tt.expected {
t.Errorf("processEscapeSequences(%q) = %q, want %q", tt.input, result, tt.expected)
}
})
}
}
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test coverage for octal escape sequences should include values above 127 to verify handling of the full byte range. Currently, only small octal values like \7, \40, \77, \101-\103, and \110-\157 are tested. Add test cases for octal values in the range \200-\377 to ensure they decode correctly (e.g., \200 should decode to byte 128, \377 should decode to byte 255).

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test cases for high octal values in commit 61bd580. New tests verify \200 (128), \377 (255), \240 (160), \300 (192), and \350 (232) all decode correctly.

Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
…byte range

Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants