Description
I parse pipe-separated data using libcsv. The data may contain quote characters, and those should be completely ignored. What is the recommended practice? Would it work to set the quote character to the value 1, which I guess is the code for control-A?
csv_set_quote(&cp, quote);
I'm asking because the following pipe-separated input line reached us recently, and when libcsv processed it the library yielded an impossibly long output record, 1,864,280,683 bytes to be precise:
ABC|jkkdf|1664550195943489|28|0|"wxyz.th|"wxyz.th|::|||17301
I consider this a bug. To make it even trickier to debug, the behavior depends on the build. When my program is compiled on OSX with Apple clang, libcsv processes an input file containing this line as I expect, with no oversized output. When my program is compiled on Ubuntu bullseye with gcc, we see the bad behavior.
Maybe you feel I have misused the library? Here's how I am using it. First, I'm using default (not strict) checking of quotes:
struct csv_parser cp;
int rc = csv_init(&cp, 0);
Second, I left the parser's config at the default value of 0 for the quote character.
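For comparison, here is a minimal sketch of the result I expect when quotes carry no special meaning. It does not use libcsv at all: `split_fields` is a hypothetical helper written only for this illustration, and it assumes one NUL-terminated line with the trailing newline already removed. Every byte other than the delimiter, including `"`, is treated as ordinary data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcsv.
 * Splits `line` in place on `delim`, overwriting each delimiter with '\0'.
 * Quote characters get no special treatment; they stay in the field text.
 * Stores at most `max_fields` field pointers in `fields` and returns the
 * total number of fields found, which may exceed `max_fields`. */
size_t split_fields(char *line, char delim, char **fields, size_t max_fields)
{
    size_t count = 0;
    char *start = line;

    for (;;) {
        char *end = strchr(start, delim);

        if (count < max_fields)
            fields[count] = start;   /* record where this field begins */
        count++;

        if (end == NULL)             /* no more delimiters: last field */
            break;

        *end = '\0';                 /* terminate the current field */
        start = end + 1;             /* next field starts after the delimiter */
    }
    return count;
}
```

Applied to the input line shown earlier with `'|'` as the delimiter, this sketch yields 11 fields: the sixth and seventh are each `"wxyz.th` with the leading quote kept as data, and the ninth and tenth are empty.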
Thanks in advance.
P.S. I revised this issue to ask my question; I am no longer trying to report a bug. I am still struggling to produce a sanitized data file that reproduces the problem. The line shown above is the content that begins the impossibly long line in the output, but when it is processed alone in a one-line file, the library signals an error immediately.