rashidkpc: I guess I'm not understanding why in the table the request field is not split, and in terms it is?
gster_ is now known as gster
gster
Does it make sense to only use grok for extracting the date and then pass it to the date filter? It does not seem to be recommended, as I am seeing this "# I'm still on the fence about using grok to perform the time match" in the grok patterns. Can anyone share their experience?
torrancew
gster: right now, it's the best way to go
the comment is essentially a note that there may be a more performant way to do it
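(A minimal sketch of the grok-then-date approach under discussion, assuming a syslog-style timestamp; the field name `ts` and the date formats are illustrative, not taken from the conversation:)

```
filter {
  # Pull the timestamp out of the raw message into its own field.
  grok {
    match => [ "message", "%{SYSLOGTIMESTAMP:ts} %{GREEDYDATA:rest}" ]
  }
  # Parse that field and use it as the event's @timestamp.
  date {
    match => [ "ts", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
}
```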
whack
gster: that was actually a comment describing a commented-out pattern
# TIME %{POSINT<24}:%{POSINT<60}(?::%{POSINT<60}(?:\.%{POSINT})?)?
torrancew
hahaha
nice
whack
that pattern is not valid in the current incarnation of grok
you cannot do predicates like that anymore because Ruby's regexp engine can't do it (and my patch adding support for it was rejected by matz 6 years ago)
sadly
torrancew
o.O
_Bryan_
whack: how high of a throughput have you seen in a real-world setting, not just testing? I am seeing a lot slower speeds when I have real logs to parse
whack
_Bryan_: "not just testing" ?
_Bryan_: not sure what that means, but I've seen clusters of 7 machines do 100,000 events/sec
_Bryan_
most of my logs have between 50-100 fields being parsed
whack
in practice few folks *need* that kind of throughput
_Bryan_: well that'll slow things down
_Bryan_
whack: when I was just throwing data through I was hitting 230,000-250,000 eps....but now with real logs I am having issues keeping up with 50k eps
whack
"real logs"
seems... like a strange concept
why not test with real logs?
_Bryan_
but this is with 2 ES nodes (2 shards 1 replica), 4 indexers, 2 redis servers in cache only mode, and 4 shippers
whack
afk for a bit
_Bryan_
whack: was working with real log files, just was not fully parsing the data out...
when you just pull the data and insert into ES it is faster than when you add in the parsing... 8-) I was originally just trying to see how fast data would go into ES
now I am working on the real logs...and full parsing
one thing I have been wondering... so most people put the indexers, redis, and ES nodes all on separate systems? or do they combine some of them onto a single physical system?
Maior
_Bryan_: distinct
_Bryan_: esp when having srs load
_Bryan_
and what kind of network are the systems running on that does 100,000 eps...
Maior
y'know
gster
torrancew: ok thx. For now I am not using grok or any date filter. This seems to be the reason why some of my logs are being indexed in the same document.
_Bryan_
Maior: mine are all separate, just was not sure what the general consensus out there was
Maior
in a week or two I'll be doing this kind of cap planning :P
mb3
wow, the "workers" option on the elasticsearch output really does the trick!
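(For reference, the "workers" option mentioned here goes on the elasticsearch output block; the host and worker count below are placeholders:)

```
output {
  elasticsearch {
    host => "es.example.com"
    # Run multiple output worker threads to parallelize indexing.
    workers => 4
  }
}
```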
torrancew
gster: yeah, you really want to use the date filter, more often than not
_Bryan_
I will still be doing it then... so ping me if you want any numbers... I have been on this for about 3 weeks now and should be increasing the hardware footprint a LOT next week for production-level testing
my best guess is that setting the date filter would solve the problem, but I still find this behavior weird
torrancew
gster: it looks more like a multiline filter gone wrong
JoeJulian
davuxx: My first test failed because I tested on one of my openstack servers and I have a release version of beaver installed there. When testing against master your regex worked fine.
torrancew
gster: I don't see an obvious error in your multiline, but it may be worth disabling/tweaking that to see if it changes things
davuxx
JoeJulian: weird
DigiAngel
Question all
Oh
torrancew
What you are seeing is /not/ a result of overlapping dates
davuxx
JoeJulian: can you please paste the conf you used and a sample input?
DigiAngel
How does one NOT get lines that you don't want to see from say syslog?
torrancew
DigiAngel: you can use the grep filter to drop messages
_Bryan_
DigiAngel: Check with a conditional and drop the event
DigiAngel
I looked at that, but what spooked me was:
A hash of matches of field => regexp. If multiple matches are specified, all must match for the grep to be considered successful
torrancew
... ok, what of it?
DigiAngel
If I want to match snort OR kernel
davuxx
JoeJulian: oh, not weird actually: I installed beaver using PIP, so I guess it's a released version as well
hugespoon
DigiAngel: you can use more than one grep filter in your config
torrancew
DigiAngel: then you'd use a pattern that can match either, or 2 grep filters
hugespoon
right
DigiAngel
So....(snort|kernel)
_Bryan_
DigiAngel: I do this for raw java dumps in the log files....if any line starts with "{" or white space I drop it for that log
torrancew
It really comes down to how easy the data you want to filter on is to get at
DigiAngel
It's easy :)...I think ;)
torrancew
DigiAngel: if it's a field like "type", a conditional and drop{} may be best
hugespoon
_Bryan_: what, you dont want full stack traces? :-p
DigiAngel
Ok ya....I have the fields set
torrancew
if it's something you have to parse out, grep{} is probably more efficient than doing something like running a grok *then* a drop
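(Both approaches suggested above, sketched for the snort-or-kernel case; the field name `program` is an assumption about how the syslog events were parsed, and the two filter blocks are alternatives, not meant to be used together:)

```
filter {
  # Option 1: grep filter -- by default, keep only events whose
  # "program" field matches; set negate => true to drop matches instead.
  grep {
    match => [ "program", "(snort|kernel)" ]
  }
}

filter {
  # Option 2 (Logstash 1.3+): a conditional plus drop{}.
  if [program] !~ /(snort|kernel)/ {
    drop { }
  }
}
```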
DigiAngel
The aptly named snort and kernel fields ;)
Ok I'll give that a go for testing
_Bryan_
hugespoon: also used it to "encourage" my developers to make their JSON output single-line for parsing, rather than pretty-printed over many lines for legibility in the raw log file
DigiAngel
Thanks for the info...gonna go and try that now
Thanks all...appreciate it...I'll report my findings later :)
gster
torrancew: thank you for your input. I'll try to configure a different multiline pattern for tomcat exceptions
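(A common starting point for a multiline filter on Java/Tomcat stack traces, joining continuation lines onto the previous event; the pattern is a generic sketch and will need tuning for the actual log format:)

```
filter {
  multiline {
    # Lines starting with whitespace or "Caused by:" are continuations
    # of the previous line, i.e. part of a stack trace.
    pattern => "(^\s)|(^Caused by:)"
    what => "previous"
  }
}
```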