Escape backslashes when building tab-separated row#317
Escape backslashes when building tab-separated row#317mitchellhenke wants to merge 1 commit intopgRouting:developfrom
Conversation
WalkthroughA Boost header for string replacement was added to Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utilities/utilities.cpp (1)
37-48: Add escaping for tabs, newlines, and carriage returns intab_separated().PostgreSQL's COPY text format requires that backslashes, newlines, carriage returns, and the delimiter character must be preceded by a backslash if they appear as part of a column value. The current implementation at
src/utilities/utilities.cpp:37-48only escapes backslashes but omits the three other required escape sequences. Sincetab_separated()uses tab as the delimiter, actual tab characters in the data would corrupt the format.Add the following escape operations before line 44 (after the backslash escape):
boost::replace_all(column, "\n", "\\n"); boost::replace_all(column, "\r", "\\r"); boost::replace_all(column, "\t", "\\t");This fix applies to all calls to
tab_separated()in the codebase (lines 466 insrc/database/Export2DB.cpp, line 79 insrc/configuration/tag_key.cpp, and line 97 ininclude/database/Export2DB.h), particularly for way names and other OSM-sourced data that may contain these characters.
🧹 Nitpick comments (2)
src/utilities/utilities.cpp (2)
40-40: Simplify the empty check.The condition
column.empty() || column == ""is redundant sinceempty()already checks for zero length.Apply this diff:
- if (column.empty() || column == "") { + if (column.empty()) {
39-39: Optional: Consider optimizing to avoid copying strings without backslashes.The loop uses pass-by-value (
auto column), which copies every string even when no escaping is needed. For better performance when backslashes are rare, consider usingconst auto&and only copying when replacement is necessary.Example optimization:
- for (auto column: columns) { + for (const auto& column: columns) { if (column.empty()) { result += "\\N\t"; + } else if (column.find('\\') != std::string::npos) { + auto escaped = column; + boost::replace_all(escaped, "\\", "\\\\"); + result += escaped + "\t"; } else { - boost::replace_all(column, "\\", "\\\\"); result += column + "\t"; }Note: This adds complexity, so only worthwhile if profiling shows string copies are a bottleneck.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/utilities/utilities.cpp(2 hunks)
🔇 Additional comments (1)
src/utilities/utilities.cpp (1)
23-23: LGTM: Header addition is correct.The Boost string replace header is necessary for the
boost::replace_allfunction used below.
|
All tests are failing, Create the #318 and merged your changes |
|
Thank you for the quick review and merge! |
Thanks for making and maintaining this package, it's really excellent! I have been running into the issue mentioned in #259 (comment) where a name of
"\"causes the process to fail with:I noticed that in the past there was escaping of backslashes (a120fc2), but I wasn't able to track down the removal, so I've taken a similar approach here. I've included a minimal test OSM file below that fails without the patch and succeeds with it. It's based on a much larger file that fails and succeeds in the same way.
Test File
Changes proposed in this pull request:
@pgRouting/admins
Summary by CodeRabbit