Fix: Handle Partial Log Line Writes in CSV Parsing #17
Description
pg_statsinfo reads PostgreSQL CSV logs, parses each entry, and inserts the data into monitoring tables. In certain cases, such as delayed log flushing or log rotation, it may encounter a partially written CSV log line and incorrectly treat it as complete.
This can result in:
Skipping the remaining portion of the actual log line on the next read
Malformed parsing due to an incomplete field set
Insert failures like: ERROR: invalid input syntax for type timestamp with time zone: "03:32.829 JST"
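The failure mode above can be illustrated with a small sketch. It is not pg_statsinfo code: the log line is a simplified, made-up CSV record (a real csvlog entry has many more columns), and the split point is an assumption chosen to match the fragment in the error message.

```python
import csv
import io

# Hypothetical, simplified log record; real PostgreSQL csvlog lines have more fields.
full_line = '2025-01-01 12:03:32.829 JST,postgres,mydb,"connection authorized"\n'

# Assume the reader wakes up when only the bytes up to the first ':' were flushed.
split_at = full_line.index(":") + 1


def parse_fields(text):
    """Parse one CSV record into a list of fields."""
    return next(csv.reader(io.StringIO(text)))


first_read = full_line[:split_at]    # treated as a "complete" line by mistake
second_read = full_line[split_at:]   # the remainder, seen on the next read

print(parse_fields(first_read))      # a single truncated field
print(parse_fields(second_read)[0])  # the fragment that would reach the timestamp column
```

If the offset is advanced after the first read, the second read starts in the middle of the timestamp, which is consistent with the kind of value shown in the insert error.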
Fix Summary
File offset is now updated only after successfully parsing a complete log entry.
Partial lines are not consumed: the read position is left unchanged, so the line is read again once it has been fully written.
Introduced internal tracking variables:
Improved logic for log rotation and partial flush scenarios to ensure stable and reliable log ingestion
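The offset handling described above might be sketched as follows. This is a minimal Python illustration, not the actual pg_statsinfo implementation (which is written in C); the function name `read_complete_records` and the quote-balance check used to detect a complete record are assumptions made for the example.

```python
def read_complete_records(path, offset):
    """Read complete CSV records starting at `offset`.

    Returns (records, new_offset). new_offset only moves past records that
    end with a newline outside of a quoted field; a trailing partial record
    is left in place so the next call re-reads it from its first byte.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()

    records = []
    start = 0          # start of the current record within `data`
    in_quotes = False  # True while inside a double-quoted CSV field

    for i, byte in enumerate(data):
        if byte == 0x22:  # '"' toggles quoting; an escaped "" toggles twice
            in_quotes = not in_quotes
        elif byte == 0x0A and not in_quotes:  # newline that terminates a record
            records.append(data[start:i].decode("utf-8"))
            start = i + 1

    # Bytes from `start` onward belong to an incomplete record: do not consume them.
    return records, offset + start
```

A caller would store the returned offset and pass it to the next call. If the file ends in the middle of a record, the offset stays at the start of that record, so nothing is skipped or parsed with a truncated field set.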
Additional Notes
This fix enhances robustness, especially in environments with:
The behavior was verified through both manual and automated test scenarios using a dedicated partial write reproduction script (see tools/reproduce_partial_write.py)