Hello! I'm using the latest versions of Logstash and Filebeat. My goal is the usual one: harvest logs from a Windows machine. But I have an unusual case: the proprietary software appends ~2.5 MB of nulls to its log file every 5 seconds (to avoid disk fragmentation). Because of that, a lot of lines are missing. Can you please advise me what to do?
Xylakant
linjan: does the software append 2.5 MB of nulls to the log file, or does it overwrite existing log lines with nulls?
bjorn_
Depending on your log shipper, you might be able to exclude empty lines. What's your shipper setup on your Windows nodes? Filebeat?
linjan
Xylakant: the first one: it writes a lot of nulls, then later replaces the nulls with actual log lines
bjorn_: yes, filebeat
Xylakant
linjan: well, that sucks. and I don't think there's a way to fix that
other than fixing the software.
filebeat remembers where it stopped in the file, so once it has read all the nulls and forwarded them, it will never re-read that portion of the file.
bjorn_
Is the log file location fixed?
Xylakant
you could try and patch filebeat to not continue reading the file if it encounters null bytes.
stavinsky
Xylakant: this software writes a lot of log data, about 1 GB per day per file. They're trying to prevent fragmentation, so we can't fix that proprietary software ( I'm thinking about writing a simple parser in Python and sending the output to Logstash
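A minimal sketch of such a parser, under stated assumptions: a Logstash `tcp` input listening on port 5000 (hypothetical host/port), and the padding scheme described above, where everything after the first NUL byte is preallocated space that later gets overwritten with real lines.

```python
import socket
import time

NUL = b"\x00"

def clean_chunk(data: bytes) -> bytes:
    """Keep only the bytes before the first NUL; the rest is padding."""
    i = data.find(NUL)
    return data if i == -1 else data[:i]

def ship(path: str, host: str = "localhost", port: int = 5000) -> None:
    """Tail `path`, skip the NUL padding, and forward complete lines to a
    Logstash tcp input. The offset never advances past the padding, so the
    lines that later replace the nulls are picked up on a later pass."""
    offset = 0
    with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
        while True:
            f.seek(offset)
            data = clean_chunk(f.read())
            end = data.rfind(b"\n") + 1  # forward only complete lines
            if end:
                sock.sendall(data[:end])
                offset += end
            time.sleep(1)
```

The key design point is that the read offset stops at the first NUL instead of at EOF, which is exactly the behavior Filebeat lacks here.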
linjan
bjorn_: fixed, yep
bjorn_
Perhaps there's a way to catch the file content in transit
Xylakant
1GB per day is less than 12KB per second
that's not a lot.
that's actually tiny.
stavinsky
Xylakant: on NTFS the fragmentation is crazy )
Xylakant: on Linux it's tiny )))
linjan
Xylakant: thank you for the advice. Is there maybe another solution besides patching Filebeat?
bjorn_
I would've tried to intercept the file location, but I have no idea whether that's doable in Windows.
Filtering out null bytes and only forwarding to filebeat when there's real content.
Xylakant
linjan: build your own log shipper. maybe it's also possible to convince the application to log to something like syslog or similar.
I'd also be willing to place bets that this is a case of premature optimization.
the software tries to optimize for a problem that does not truly exist.
but in practice, that doesn't matter since you can't change the software.
bjorn_
I have no idea whether this would work, so I'm just throwing it out there: Read the full log file in regular intervals, and implement something that only picks up events that haven't been seen before. Perhaps the fingerprint filter could be used.
To make this work you might need to install Logstash on the Windows node, not only Filebeat.
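A hedged sketch of that idea, assuming the re-read file is fed through Logstash with Elasticsearch as the output (hosts and field names are placeholders): the fingerprint filter hashes each line, and using the hash as the document id makes a re-sent event overwrite its earlier copy instead of duplicating it.

```
filter {
  fingerprint {
    source => "message"
    target => "[@metadata][fp]"
    method => "SHA1"
    key    => "dedup-key"
  }
}
output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    document_id => "%{[@metadata][fp]}"
  }
}
```

Note that identical log lines would also collapse into one document with this scheme, so the fingerprint source may need to include a timestamp or sequence field if exact duplicates are legitimate.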
stavinsky
Xylakant: this is very old financial software ) they've been supporting it since the 2000s. But you're right, some of their decisions are a little bit stupid )
bjorn_
Filebeat can accept data on STDIN, so you can essentially pipe the log file to Filebeat once a minute. Then you'd have to do some deduplication on the receiving side.
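A hedged sketch of that pipeline, assuming Filebeat 5.x-style configuration keys (the input key names differ across Filebeat versions, so check the docs for yours) and hypothetical paths/hosts:

```yaml
# stdin.yml -- assumed key names for a stdin-fed Filebeat
filebeat.prospectors:
- input_type: stdin
output.logstash:
  hosts: ["logstash.example.com:5044"]
```

A Windows scheduled task could then periodically run something like `type C:\logs\app.log | filebeat.exe -e -c stdin.yml`. The null padding would still need stripping, and, as noted, everything already sent gets re-sent, so the receiving side must deduplicate.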
Xylakant
i'd probably rather go and change filebeat :)
it currently reads until EOF, so it shouldn't be too hard to add "read until EOF or NULL"
(famous last words of a programmer)
bjorn_
:D
Xylakant
or maybe have whatever reads the file and writes to filebeat stall on null bytes
advantage: vanilla filebeat.
disadvantage: yet another moving part.
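One hedged sketch of that extra moving part: a script that appends only the real content to a clean side file, which a vanilla Filebeat then harvests. It stalls at the first NUL byte and resumes from the same offset once the application has overwritten the padding with real lines. File names and the 5-second interval are assumptions.

```python
def copy_clean(src: str, dst: str, offset: int) -> int:
    """Append to dst everything in src from `offset` up to the first NUL
    byte, and return the new offset to resume from on the next pass."""
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(offset)
        data = fin.read()
        nul = data.find(b"\x00")
        clean = data if nul == -1 else data[:nul]
        fout.write(clean)
        return offset + len(clean)

# Hypothetical driver, run as a scheduled task alongside Filebeat,
# which harvests app.clean.log as a normal log file:
#
#     import time
#     offset = 0
#     while True:
#         offset = copy_clean("app.log", "app.clean.log", offset)
#         time.sleep(5)
```

Since the clean copy only ever grows line-by-line, Filebeat's offset tracking works on it as intended.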
bjorn_
I don't think this case can be solved without having to accept some disadvantage :-)
Xylakant
life's a trade.
¯\_(ツ)_/¯
linjan
Xylakant: bjorn_: thanks a lot, guys
rofl____
how can I add a timestamp field for when the event is processed by Logstash, not the log entry's own timestamp?
to see latency between shippers/indexers
linjan
rofl____: Unless a message has a @timestamp field when it enters Logstash, it'll create that field and initialize it with the current time. Depending on your configuration you might be able to just save that timestamp (possibly in another field)
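One hedged way to save that ingest-time value, assuming the field name `received_at` (hypothetical): copy @timestamp into its own field early in the filter chain, before a date filter overwrites @timestamp with the log entry's own time. The difference between the two then shows the shipper-to-indexer latency.

```
filter {
  mutate {
    add_field => { "received_at" => "%{@timestamp}" }
  }
  # ... a date filter may later replace @timestamp with the event's own time
}
```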