Description
- Version: 3.0.10
When using autodetect_column_names, the behaviour is not what might be expected if either
- Logstash is stopped and restarted in the middle of reading a CSV file, or
- Logstash finishes reading a file with one column layout and starts reading a different file with a different column layout.
In the first case, the column names will be re-read from the first event processed after the restart, at the point where Logstash left off before being stopped, so the event data in that row becomes the column names for the rest of the file.
In the second case, column names are not re-read on starting a new file, so the data in the new file is treated as if it were in the format of the previous file.
Additionally, I think in the second case, the header line of the new file will be ingested as data, even though it contains column names.
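To make the two cases concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not the plugin's actual code: the class name AutodetectOnceParser, the sample rows, and the "restart" (modelled as a fresh parser object fed from the middle of the file) are all assumptions made for illustration.

```python
class AutodetectOnceParser:
    """Toy model (hypothetical): the first row seen after start-up
    becomes the column names and is never re-read afterwards."""

    def __init__(self):
        self.columns = None  # unknown until the first row arrives

    def parse(self, line):
        values = line.split(",")
        if self.columns is None:
            # First row since start-up is taken as the header.
            self.columns = values
            return None  # no event is emitted for the header row
        return dict(zip(self.columns, values))


# Made-up sample data for the illustration.
file_a = ["name,age", "alice,30", "bob,41", "carol,25"]
file_b = ["city,country", "Paris,France"]

# Case 1: a restart after two lines of file_a have already been read.
# The fresh parser treats the next data row as the header.
restarted = AutodetectOnceParser()
case1 = [restarted.parse(line) for line in file_a[2:]]

# Case 2: one long-running parser reads file_a and then file_b.
# The columns of file_a are kept, and file_b's header becomes an event.
parser = AutodetectOnceParser()
case2 = [parser.parse(line) for line in file_a + file_b]
events_b = case2[len(file_a):]

print(case1)
print(events_b)
```

In this model, case1 ends up keyed by "bob" and "41" instead of "name" and "age", and events_b contains the header row of file_b as an ordinary event keyed by the columns of file_a.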
We should add a caveat in the docs to cover these scenarios.
Something along the lines of a note like:
When autodetect_column_names is set to true, the column names information is only parsed when Logstash starts. Refrain from using this setting if there's a chance Logstash will restart while in the middle of a file, or if you are ingesting multiple CSV files which each have column names as the first line.