jamesturnbull: have you used statsd count before? would it be as basic as this, if all i want to do is count the number of hashes that are the same?
whack
LuxuryMode: sometimes the brain makes you do things without you knowing, then you spend an hour trying to figure out what the hell is going on!
rystic
count => [ "%{hash}"]
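For context, that snippet is presumably the counting setting of the statsd output. A fuller sketch of what the output block might look like — the `hash` field name comes from the chat, but the host and the choice of `increment` are assumptions:

```
output {
  statsd {
    host => "localhost"            # assumed statsd address
    # one counter per distinct %{hash} value, so events with
    # identical hashes accumulate in the same counter
    increment => [ "%{hash}" ]
    # statsd's count setting instead pairs a metric name with an
    # amount, e.g. count => [ "%{hash}", "1" ]
  }
}
```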
avleen
whack: i did some tidying up of my config today, removed some unnecessary mutates and prunes. My apache access log was only ~1hour behind at peak time. that's a marked improvement over before. still not certain where the new bottleneck is, but I'll find it ;-)
whack
avleen: do let me know if you find moar bottlenecks
LuxuryMode
whack works now, thanks a lot
whack
avleen: you *should* be able to test throughput with the generator input, if that helps
LuxuryMode: high five!
LuxuryMode
now to figure out a sane logging format
whack
LuxuryMode: glad it was something simple to fix ;)
LuxuryMode
yeah me too
logstash is very friendly
so easy to set up

now to figure out what kind of EC2 instance i really need for this. and what kind of volume should be attached
semiosis
whack: u experrt! s/\\/\//
LuxuryMode
so for now i just took an existing AMI and launched that
avleen
whack: generator goes damn fast, as it should :D my hunch right now is that it's a lot of work to just parse my logs. each apache log line is ~3k, and has 33 fields, so you can imagine the regex for grok is pretty big. that's why i'm also really keen to try this patch
LuxuryMode
anyone know how to change the username and password that kibana uses?
jamesturnbull
rystic: try the metrics plugin
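A rough sketch of the metrics filter jamesturnbull is pointing at, again assuming a `hash` field (the metric naming is illustrative). The filter keeps a meter per metric name and periodically emits synthetic events carrying count and rate fields, which the `metric` tag lets you route separately:

```
filter {
  metrics {
    meter   => [ "hash.%{hash}" ]   # one meter per distinct hash value
    add_tag => [ "metric" ]         # marks the synthetic events it emits
  }
}
```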
avleen
whack: oh, a question actually. if i drop a plugin in my plugins dir, which has the same name as a built in plugin.. could i expect mine to take precedence over the built in one?
whack
avleen: I mean generator -> your filters
avleen: your plugin should override an existing one
because --pluginpath prepends to the RUBYLIB directory list
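Concretely (paths and invocation here are illustrative, not from the chat): --pluginpath expects a directory whose layout mirrors the shipped plugin tree, so a file dropped at the matching path shadows the built-in of the same name:

```
my-plugins/
└── logstash/
    └── filters/
        └── grok.rb    # loaded ahead of the shipped grok filter

# invocation varies by install; with the era's monolithic jar, roughly:
java -jar logstash-monolithic.jar agent --pluginpath ./my-plugins -f logstash.conf
```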
avleen: would you be open to trying json logs in apache?
avleen: would remove any need to use grok for this specific case
avleen
100% yes. Once these apache logs are migrated over fully to logstash, that's my intention :)
semiosis
probably also fail
whack
avleen: you can do it simultaneously, btw
since apache can output logs to multiple places
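A minimal httpd.conf sketch of running both formats side by side — the JSON field list is illustrative, not avleen's 33-field format, and a hand-rolled JSON LogFormat like this can emit invalid JSON if a logged field contains quotes:

```
# keep the existing combined log while also writing a JSON copy
LogFormat "{ \"timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"clientip\": \"%a\", \"verb\": \"%m\", \"request\": \"%U%q\", \"response\": %>s, \"bytes\": %B }" json

CustomLog logs/access_log combined    # existing consumers keep working
CustomLog logs/access_json.log json   # logstash tails this one, no grok needed
```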
avleen
well, i suppose i could write the file out twice
whack
yeah
avleen
:D JINX!
whack
I mean, whatever works ;)
just saying you don't have to do a cold cutover
avleen
I'll make a ticket for myself to try that next monday or wednesday.
i know my logstash boxes aren't anywhere near 100% CPU
it's 20%-30% CPU at most
semiosis
avleen: make it monday!
whack
avleen: interesting
avleen
but.. that doesn't mean they aren't spinning and context switching a lot, etc
semiosis believes in avleen
whack
avleen: can you take a sample log using the generator input, through your filters, and use stdout output? try to measure peak throughput?
avleen
semiosis: dude, if i could, i would! I might be able to. it's possible.
whack: I'll do it in about 5 mins, let's see what happens.
rystic
jamesturnbull: will metrics work for my use case? i want to find unique hashes over time. e.g. in the last 7 days, i want to find hashes that have the lowest counts
whack
avleen: woot
avleen: I measure throughput (poorly) by doing: generator -> filters -> output { stdout { codec => dots } }
avleen: piped to: | pv -War > /dev/null
avleen
dude i love that
whack
one dot == one event, pv outputs bytes/sec, ergo, events per second.
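whack's recipe as a sketch, under the assumption that the real filter chain replaces the empty `filter {}` block (the sample message is a placeholder):

```
# throughput.conf
input  { generator { message => "a sample apache log line here" } }
filter { }                            # your real grok/mutate filters go here
output { stdout { codec => dots } }   # one dot (one byte) per event
```

Invocation depends on the install; with the monolithic jar it would be roughly `java -jar logstash-monolithic.jar agent -f throughput.conf | pv -War > /dev/null`. Since each dot is one byte, pv's bytes/sec reads directly as events/sec (-W waits for the first byte, -a shows average rate, -r current rate).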
hacky, but it's a beautiful thing.
avleen: in your case, use same filterworker count as you do today