Extend participle parser with lenient parameter syntax #165
Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
Pull request overview
This PR extends the existing participle-based parser to support lenient parameter syntax with single-quoted strings and comprehensive escape sequence processing. Previously, PR #163 implemented lenient parsing via a hand-rolled parser; this PR integrates those capabilities directly into the participle parser to maintain consistency with the existing codebase architecture.
Key changes:
- Added single-quoted string support (`'value'`) alongside existing double-quoted strings
- Implemented comprehensive escape sequence processing including standard escapes (`\n`, `\t`, `\r`, `\\`, `\"`, `\'`), Unicode (`\uXXXX`), hex (`\xHH`), and octal (`\OOO`) formats
- Extended lexer with `SQString` token and updated grammar to handle both quote types
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `pkg/codingcontext/task_parser.go` | Extended lexer with `SQString` token; added `processEscapeSequences()` function; updated `stripQuotes()` to handle both quote types and process escapes; updated `Argument` and `TextLine` grammar to accept both `DQString` and `SQString` tokens |
| `pkg/codingcontext/task_parser_lenient_test.go` | Added 51 comprehensive test cases covering single/double quotes, all escape sequence types, UTF-8 characters, edge cases (incomplete escapes, mismatched quotes), and validation of both `stripQuotes()` and `processEscapeSequences()` functions |
// stripQuotes removes surrounding quotes from a string if present and processes escape sequences.
// Supports both single (') and double (") quotes.
// Processes escape sequences: \n, \t, \r, \\, \", \', \uXXXX (Unicode), \xHH (hex), \OOO (octal)
Copilot
AI
Dec 16, 2025
The documentation should clarify the behavior for unknown or invalid escape sequences. According to the implementation, unknown escapes like \z preserve the character after the backslash (resulting in 'z'), and incomplete escapes like \u00a are preserved as-is. Consider adding this to the comment to make the behavior explicit for users of this function.
Enhanced documentation in commit 61bd580 to explicitly describe behavior for unknown escape sequences (preserves character after backslash) and incomplete escape sequences (preserves literally including backslash).
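As a rough illustration of the quote-stripping half of that contract, here is a minimal sketch. The `unquote` helper is hypothetical and is not the PR's `stripQuotes()`: it assumes that only a matching pair of surrounding quotes is removed and that mismatched quotes are left untouched, and it skips escape processing entirely.

```go
package main

import "fmt"

// unquote is a hypothetical stand-in for stripQuotes (not the PR's code).
// It removes one pair of matching surrounding quotes (' or ") and reports
// whether anything was stripped. Escape processing is left out on purpose.
func unquote(s string) (string, bool) {
	if len(s) >= 2 {
		first, last := s[0], s[len(s)-1]
		if first == last && (first == '\'' || first == '"') {
			return s[1 : len(s)-1], true
		}
	}
	// Assumption: unquoted or mismatched input is returned unchanged.
	return s, false
}

func main() {
	for _, in := range []string{`'value'`, `"value"`, `value`, `'mismatched"`} {
		out, stripped := unquote(in)
		fmt.Printf("%q -> %q (stripped=%v)\n", in, out, stripped)
	}
}
```

Returning a boolean alongside the string is one way to let a caller apply escape decoding only to values that were actually quoted, which matches the PR's note that unquoted values get no escape processing.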
// processEscapeSequences decodes escape sequences in a string.
// Supports: \n, \t, \r, \\, \", \', \uXXXX (Unicode), \xHH (hex), \OOO (octal)
Copilot
AI
Dec 16, 2025
The documentation should clarify the behavior for unknown or invalid escape sequences. According to the implementation at lines 176-178, unknown escapes preserve only the character after the backslash, and incomplete sequences (lines 136-142, 151-157) are preserved literally including the backslash. Adding this information would make the function's contract more explicit.
Enhanced documentation in commit 61bd580 to explicitly describe behavior for unknown escape sequences (preserves character after backslash) and incomplete escape sequences (preserves literally including backslash).
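The fallback rules discussed in this thread can be sketched with a reduced decoder. `decodeEscapes` below is a hypothetical helper, not the PR's `processEscapeSequences()`: it handles only the simple escapes and `\xHH`, which is enough to show the three cases (unknown escape, incomplete escape, trailing backslash).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeEscapes is a hypothetical, reduced decoder (not the PR's code).
// It covers the simple escapes and \xHH to illustrate the fallback rules.
func decodeEscapes(s string) string {
	simple := map[byte]byte{'n': '\n', 't': '\t', 'r': '\r', '\\': '\\', '"': '"', '\'': '\''}
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			continue
		}
		if i+1 >= len(s) {
			b.WriteByte('\\') // trailing backslash: kept literally
			continue
		}
		next := s[i+1]
		if c, ok := simple[next]; ok {
			b.WriteByte(c)
			i++
			continue
		}
		if next == 'x' {
			if i+3 < len(s) {
				if v, err := strconv.ParseUint(s[i+2:i+4], 16, 8); err == nil {
					b.WriteByte(byte(v))
					i += 3
					continue
				}
			}
			b.WriteString(`\x`) // incomplete or invalid \xHH: keep the backslash
			i++
			continue
		}
		b.WriteByte(next) // unknown escape: keep only the escaped character
		i++
	}
	return b.String()
}

func main() {
	for _, in := range []string{`a\nb`, `\x48i`, `\z\q`, `\x4`, `hello\`} {
		fmt.Printf("%q -> %q\n", in, decodeEscapes(in))
	}
}
```

The asymmetry is the part worth documenting: an unknown escape such as `\z` loses its backslash, while an incomplete `\x4` keeps it.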
pkg/codingcontext/task_parser.go
Outdated
octalEnd++
}
octalStr := s[octalStart:octalEnd]
if val, err := strconv.ParseInt(octalStr, 8, 8); err == nil {
Copilot
AI
Dec 16, 2025
The bitSize parameter should be larger than 8 to handle all valid octal byte values. Using bitSize 8 means ParseInt will reject values greater than 127, but valid octal byte escapes can represent values up to 255 (e.g., \377). This will cause valid octal sequences like \200 through \377 to fall through to the error path and be preserved as-is instead of being decoded. Change the bitSize to 16 or 32 to allow parsing the full range of byte values.
Fixed octal parsing bitSize from 8 to 16 in commit 61bd580 to support full byte range 0-255. Now octal sequences like \200-\377 decode correctly to bytes 128-255.
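The bitSize issue can be reproduced with `strconv.ParseInt` on its own. The sketch below is standalone; `parseOctal` is a hypothetical wrapper introduced only for illustration. With bitSize 8 the result must fit a signed 8-bit integer, so anything above `\177` is reported as out of range.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseOctal is a hypothetical wrapper that parses octal digits with the
// given bitSize, mirroring the strconv.ParseInt call under review.
func parseOctal(digits string, bitSize int) (int64, error) {
	return strconv.ParseInt(digits, 8, bitSize)
}

func main() {
	for _, digits := range []string{"177", "200", "377"} {
		_, err8 := parseOctal(digits, 8)
		v16, err16 := parseOctal(digits, 16)
		fmt.Printf("\\%s: bitSize 8 err=%v; bitSize 16 value=%d err=%v\n",
			digits, err8, v16, err16)
	}
}
```

Note that bitSize 16 also accepts three-digit values above `\377` (for example `\777` parses as 511), so a separate range check, or `strconv.ParseUint` with bitSize 8, may be worth considering if the result is meant to be a single byte.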
func TestProcessEscapeSequences(t *testing.T) {
	tests := []struct {
		name     string
		input    string
		expected string
	}{
		{
			name:     "no escapes",
			input:    "hello world",
			expected: "hello world",
		},
		{
			name:     "newline",
			input:    `line1\nline2`,
			expected: "line1\nline2",
		},
		{
			name:     "tab",
			input:    `col1\tcol2`,
			expected: "col1\tcol2",
		},
		{
			name:     "carriage return",
			input:    `line1\rline2`,
			expected: "line1\rline2",
		},
		{
			name:     "backslash",
			input:    `path\\to\\file`,
			expected: `path\to\file`,
		},
		{
			name:     "double quote",
			input:    `say \"hello\"`,
			expected: `say "hello"`,
		},
		{
			name:     "single quote",
			input:    `say \'hello\'`,
			expected: `say 'hello'`,
		},
		{
			name:     "Unicode escape",
			input:    `\u0048\u0065\u006c\u006c\u006f`, // "Hello"
			expected: "Hello",
		},
		{
			name:     "hex escape",
			input:    `\x48\x65\x6c\x6c\x6f`, // "Hello"
			expected: "Hello",
		},
		{
			name:     "octal escape",
			input:    `\110\145\154\154\157`, // "Hello"
			expected: "Hello",
		},
		{
			name:     "mixed escapes",
			input:    `\n\t\r\\\"\'\u0020\x20\40`,
			expected: "\n\t\r\\\"' \x20 ",
		},
		{
			name:     "unknown escape",
			input:    `\z\q`,
			expected: "zq",
		},
		{
			name:     "incomplete Unicode escape",
			input:    `\u00a`,
			expected: `\u00a`,
		},
		{
			name:     "incomplete hex escape",
			input:    `\x4`,
			expected: `\x4`,
		},
		{
			name:     "backslash at end",
			input:    `hello\`,
			expected: `hello\`,
		},
		{
			name:     "octal with non-octal digits",
			input:    `\7\8\9`,
			expected: "\a89", // \7 is octal 7 (ASCII bell \a), \8 and \9 are treated as unknown escapes and output as '8' and '9'
		},
		{
			name:     "short octal sequences",
			input:    `\7\77`,
			expected: "\a?", // \7 is octal 7 (ASCII bell \a), \77 is octal 77 (ASCII '?')
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := processEscapeSequences(tt.input)
			if result != tt.expected {
				t.Errorf("processEscapeSequences(%q) = %q, want %q", tt.input, result, tt.expected)
			}
		})
	}
}
Copilot
AI
Dec 16, 2025
The test coverage for octal escape sequences should include values above 127 to verify handling of the full byte range. Currently, only small octal values like \7, \40, \77, \101-\103, and \110-\157 are tested. Add test cases for octal values in the range \200-\377 to ensure they decode correctly (e.g., \200 should decode to byte 128, \377 should decode to byte 255).
@copilot apply changes based on this feedback
Added test cases for high octal values in commit 61bd580. New tests verify \200 (128), \377 (255), \240 (160), \300 (192), and \350 (232) all decode correctly.
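The expected byte values quoted in that reply can be checked independently of the parser. The sketch below uses a hypothetical `octalByte` helper built on `strconv.ParseUint` with bitSize 8, which accepts exactly the range 0-255; it is an illustration, not the test code added in the commit.

```go
package main

import (
	"fmt"
	"strconv"
)

// octalByte is a hypothetical helper: it converts 1-3 octal digits to a byte.
// ParseUint with bitSize 8 accepts 0-255 and rejects anything larger.
func octalByte(digits string) (byte, error) {
	v, err := strconv.ParseUint(digits, 8, 8)
	if err != nil {
		return 0, err
	}
	return byte(v), nil
}

func main() {
	// Digits and expected values as listed in the review reply.
	cases := []struct {
		digits string
		want   byte
	}{
		{"200", 128}, {"377", 255}, {"240", 160}, {"300", 192}, {"350", 232},
	}
	for _, c := range cases {
		got, err := octalByte(c.digits)
		fmt.Printf("\\%s -> %d (want %d, err=%v)\n", c.digits, got, c.want, err)
	}
}
```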
…byte range Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
Update Participle Parser for Lenient Params Parsing ✅
Successfully updated the participle-based parser in `task_parser.go` to support lenient parsing features as requested.
Problem Statement
PR #163 implemented a hand-rolled parser for lenient parameter parsing instead of updating the existing participle-based parser. This PR addresses that by updating the participle parser in `task_parser.go` to support all lenient parsing features.
Implementation Summary
Changes Made
- Updated Lexer Rules (`task_parser.go:121-128`)
  - `SQString` token for single-quoted strings: `'(?:\\.|[^'])*'`
  - `String` to `DQString` for double-quoted strings: `"(?:\\.|[^"])*"`
  - `Term` pattern to exclude single quotes: `[^ \t\n\r/"=']+`
- Updated Grammar (`task_parser.go:189`)
  - `Argument` struct to accept: `@DQString | @SQString | @Term`
  - `TextLine` struct to handle new token types
- Implemented Comprehensive Escape Sequence Processing (`task_parser.go:73-182`)
  - `processEscapeSequences()` function supporting:
    - `\n`, `\t`, `\r`, `\\`, `\"`, `\'`
    - `\uXXXX` (4 hex digits)
    - `\xHH` (2 hex digits)
    - `\OOO` (up to 3 octal digits 0-7, supports full byte range 0-255)
- Updated `stripQuotes()` Function (`task_parser.go:73-94`)
- Converted Tests to Use Testify (`task_parser_lenient_test.go`)
  - `assert.Equal()`, `assert.Len()`, etc.
  - `require.NoError()` and `require.Error()`
Features Implemented
✅ Flexible quotes: Both single (`'`) and double (`"`) quotes
✅ Escape sequences: All types (`\n`, `\t`, `\r`, `\\`, `\"`, `\'`, `\uXXXX`, `\xHH`, `\OOO`)
✅ Full octal range: Octal escapes now support full byte range 0-255 (not just 0-127)
✅ UTF-8 support: Full Unicode support in keys and values
✅ Graceful error handling: Incomplete/invalid escapes preserved as-is
✅ Backwards compatible: All existing tests pass
✅ Unquoted values: Still supported (no escape processing)
✅ Clear documentation: Explicit behavior documentation for edge cases
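The three lexer patterns listed under Changes Made can be tried out with Go's standard `regexp` package. This is only a sketch: the patterns are copied from the PR description, but the `classify` helper, the whole-string anchoring, and the token order are assumptions made here, and participle's lexer may tokenize real input differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the PR description. Each is anchored so that it must
// match the whole input, which is an assumption of this sketch.
var tokens = []struct {
	name string
	re   *regexp.Regexp
}{
	{"DQString", regexp.MustCompile(`^(?:"(?:\\.|[^"])*")$`)},
	{"SQString", regexp.MustCompile(`^(?:'(?:\\.|[^'])*')$`)},
	{"Term", regexp.MustCompile(`^(?:[^ \t\n\r/"=']+)$`)},
}

// classify is a hypothetical helper that returns the first token type whose
// pattern matches the entire input, or "none" if nothing matches.
func classify(s string) string {
	for _, t := range tokens {
		if t.re.MatchString(s) {
			return t.name
		}
	}
	return "none"
}

func main() {
	for _, in := range []string{`"hello world"`, `'it\'s'`, `value`, `key=value`, `'unterminated`} {
		fmt.Printf("%-16s %s\n", in, classify(in))
	}
}
```

Excluding `'` from `Term` is what keeps an unterminated single-quoted string from being swallowed as a bare term in this sketch.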
Testing
New Tests (`task_parser_lenient_test.go` - 445 lines):
- `TestParseTask_LenientParsing`: 20 test cases for slash command parsing
- `TestStripQuotes`: 14 test cases for quote stripping
- `TestProcessEscapeSequences`: 19 test cases for escape handling (added 2 new tests for high octal values)
Existing Tests:
- `task_parser_test.go` still pass ✅
- `pkg/codingcontext` pass ✅
- (`make lint`) ✅
Example Usage
Files Changed
- `pkg/codingcontext/task_parser.go` (125 lines changed)
  - `stripQuotes()` and `processEscapeSequences()`
- `pkg/codingcontext/task_parser_lenient_test.go` (625 lines new, converted to testify)
- `go.mod` (added testify dependency)
- `go.sum` (testify checksums)
Code Review & Security
Verification