0:03 AM
tchiang has quit
0:03 AM
badbenarnold has quit
0:05 AM
artbikes has quit
0:06 AM
behrends_ joined the channel
0:08 AM
behrendsj has quit
0:15 AM
behrends_ has quit
0:15 AM
skinp_ joined the channel
0:16 AM
Jerry__ joined the channel
0:17 AM
juarez has quit
0:17 AM
hooper has quit
0:18 AM
skinp has quit
0:21 AM
Jerry__ has quit
0:21 AM
Jerry90 joined the channel
0:30 AM
badbenarnold joined the channel
0:35 AM
juarez joined the channel
0:38 AM
badbenarnold has quit
0:39 AM
santiagoR has quit
0:40 AM
duck_cpd joined the channel
0:40 AM
Jerry90 has quit
0:40 AM
Jry joined the channel
0:44 AM
bdpayne has quit
0:47 AM
zwi1 has quit
0:48 AM
darrend joined the channel
0:49 AM
jjfalling is now known as jjfalling_off
0:49 AM
juarez has quit
0:53 AM
_Bryan_ joined the channel
0:53 AM
lukewaite joined the channel
0:53 AM
pu22l3r joined the channel
0:59 AM
lukewaite has quit
1:03 AM
Wolland_ has quit
1:05 AM
badbenarnold joined the channel
1:08 AM
pu22l3r has quit
1:08 AM
zwi joined the channel
1:12 AM
neurodrone has quit
1:14 AM
antons_ has quit
1:14 AM
whack
_Bryan_: sup?
1:14 AM
badbenarnold has quit
1:14 AM
antons joined the channel
1:14 AM
_Bryan_
whack: is there anything I need to catch other than gc_logs that might tell me why my indexers are dying north of 70k/s?
1:18 AM
I am seeing this problem a lot more since I added another 1000 shippers
1:19 AM
Damn
and what do you use to write to logstash directly? redis?
1:19 AM
rabbit?
1:20 AM
|PiP| joined the channel
1:20 AM
|PiP|
I'm trying to use logstash with the embedded elasticsearch/kibana, but get this warning: WARN: org.elasticsearch.discovery: [logstash-logs-1118-4006] waited for 30s and no initial state was set by the discovery
1:21 AM
I'm running logstash with this command: bin/logstash agent -f conf.conf web
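For reference, that discovery warning generally means the process serving the web UI waited 30 seconds without finding an elasticsearch node to join. A minimal sketch of what conf.conf would need for the embedded case (logstash 1.x-era syntax; the stdin input is illustrative, not from the user's actual config):

```
input { stdin { } }

output {
  elasticsearch {
    embedded => true   # run the in-process elasticsearch node
  }
}
```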
1:23 AM
skinp_ has quit
1:23 AM
bnzmnzhnz joined the channel
1:24 AM
zwi has quit
1:25 AM
behrendsj joined the channel
1:25 AM
jlintz
_Bryan_: you able to get a heap dump or stack trace?
1:26 AM
_Bryan_
I am watching them now....my site reliability team turned on handlers and was bringing it back up...
1:26 AM
is there a way to tell java to not overwrite the gc log file on startup
1:26 AM
swc|666 has quit
1:27 AM
I am running these options with a 24GB heap
1:27 AM
-Djava.io.tmpdir=${LS_HOME} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/data-fio/ELK1/logs/gc_log.log
1:28 AM
and 24 workers
1:28 AM
OH..I can edit the init.d script to save log....I will go do that...
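As an aside, editing init.d is not the only option: HotSpot (roughly 7u2 and later, so it should apply to java-1.7.0_25) has built-in GC log rotation flags. A sketch, with file count and size values that are illustrative only:

```shell
# Rotate GC logs instead of truncating a single file on restart
# (HotSpot flags; counts/sizes below are illustrative).
JAVA_OPTS="$JAVA_OPTS -Xloggc:/data-fio/ELK1/logs/gc_log.log"
JAVA_OPTS="$JAVA_OPTS -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=10M"

# Or give each start its own file by timestamping the -Xloggc path:
# JAVA_OPTS="$JAVA_OPTS -Xloggc:/data-fio/ELK1/logs/gc_$(date +%Y%m%d-%H%M%S).log"
```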
1:28 AM
colinsurprenant joined the channel
1:29 AM
zbp has quit
1:29 AM
szhem has quit
1:30 AM
zwi joined the channel
1:30 AM
PaulCzar has quit
1:30 AM
neurodrone joined the channel
1:30 AM
jalaziz has quit
1:30 AM
Zolmeister joined the channel
1:32 AM
jmreicha_ has quit
1:32 AM
jmreicha_ joined the channel
1:33 AM
szhem joined the channel
1:33 AM
jlintz
what java are you using?
1:34 AM
_Bryan_
this seems to be there at many of the crashes
1:34 AM
jlintz
-XX:+PrintCompilation may be helpful as well
1:34 AM
_Bryan_
java-1.7.0-oracle-1.7.0.25-1jpp.1.el6_4.x86_64
1:34 AM
jlintz
paired with -XX:+TraceClassLoading and unloading
1:35 AM
also -XX:-PrintConcurrentLocks
1:36 AM
whack
_Bryan_: dying how?
1:36 AM
_Bryan_
-XX:+TraceClassLoading -XX:+TraceClassUnloading
1:36 AM
currently running but trying to set it up to get the info when it dies...
1:36 AM
I expect it to crash this evening sometime..as I added a lot of load to it about 2 hours ago
1:36 AM
if not tonight when real production kicks up tomorrow
1:37 AM
jmreicha_ has quit
1:37 AM
jlintz: is that right for the unloading option?
1:37 AM
this is now java opts
1:37 AM
-Djava.io.tmpdir=${LS_HOME} -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/data-fio/ELK1/logs/gc_log.log -XX:+PrintCompilation -XX:+TraceClassLoading -XX:-PrintConcurrentLocks
1:38 AM
zbp joined the channel
1:38 AM
jlintz
_Bryan_: ya, looks right, although that last one should be a + instead of a minus
1:38 AM
for concurrent locks
1:38 AM
_Bryan_
k
1:38 AM
I will make the change on all 5 indexers
1:39 AM
federated_life1 has quit
1:39 AM
whack
_Bryan_: my laptop can do 140k eps with a single logstash agent.
1:39 AM
_Bryan_: what error/crash/symptoms do you see?
1:39 AM
jlintz
whack: can i put your laptop into production here =)
1:39 AM
_Bryan_
in that link
1:39 AM
_Bryan_
that seems to always be there when it dies...
1:40 AM
but I am setting up to catch them..have had issues since it overwrites the log file on restart...
1:40 AM
badbenarnold joined the channel
1:40 AM
I edited the init.d so when the handlers restart it the log will be saved
1:40 AM
whack
_Bryan_: what output is that? -verbose:gc?
1:40 AM
what are the symptoms?
1:41 AM
what makes you think it's GC?
1:41 AM
_Bryan_
when the load gets steady for a while over about 70k/s the logstash daemon basically stops its flow...and it starts processing only 10-20 eps...not thousands..just 10-20
1:42 AM
I don't think it is GC..I don't know what it is..and I am looking for ways to get more information on it
1:42 AM
whack
look at cpu usage first
1:42 AM
what outputs? what logstash config?
1:42 AM
_Bryan_
never over a load of 6
1:42 AM
24 cores..32GB ram..24GB heap...
1:42 AM
let me grab config
1:43 AM
whack
can you show your config and logstash command line?
1:43 AM
and if it gets stuck again, capture jstack and top -Hp <logstash_pid> output
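A sketch of whack's suggestion: grab a Java thread dump and per-thread CPU usage the moment an indexer wedges. The pgrep pattern is an assumption; adjust it to match however the init script actually launches logstash.

```shell
# Capture diagnostics from a wedged logstash indexer.
# Assumes jstack (JDK) and procps top are installed; pgrep pattern is a guess.
pid=$(pgrep -f 'logstash.*agent' | head -n1)
ts=$(date +%Y%m%d-%H%M%S)
if [ -n "$pid" ]; then
  jstack -l "$pid" > "/tmp/logstash-jstack-$ts.txt"     # Java thread stacks + locks
  top -b -n 1 -Hp "$pid" > "/tmp/logstash-top-$ts.txt"  # per-thread CPU usage
fi
```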
1:43 AM
_Bryan_
yeah gimmie a sec
1:43 AM
ok will add to notes for what to grab
1:45 AM
Sht0 has quit
1:46 AM
rhavens has quit
1:47 AM
do you need all filters?
1:47 AM
will take me a few mins to redact if so
1:49 AM
badbenarnold has quit
1:49 AM
swc|666 joined the channel
1:50 AM
_Bryan_
that is most of the filters....I have removed others that are just json { source => "message" } and mutate to drop message
1:51 AM
the commandline is at the bottom
1:52 AM
icebourg joined the channel
1:52 AM
zwi has quit
1:52 AM
icebourg has quit
1:53 AM
icebourg joined the channel
1:55 AM
whack
flush_size => 50000 is probably not helping you
1:55 AM
how big is your ES cluster?
1:55 AM
given the behavior, this is probably ES not being able to keep up with the load
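A sketch of a more conservative elasticsearch output (logstash 1.x-era syntax; the host name and batch value are illustrative, not taken from the pasted config):

```
output {
  elasticsearch {
    host => "es-data-1"   # hypothetical data node; not a master
    flush_size => 5000    # illustrative; far below the 50000 in the pasted config
  }
}
```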
1:55 AM
_Bryan_
8 nodes running 2 ES instances each
1:56 AM
on Fusion-io drives, and the indexers are connected via 10Gb
1:56 AM
whack
is 192.168.150.11 a load balancer?
1:56 AM
_Bryan_
primary master
1:56 AM
zwi joined the channel
1:56 AM
whack
don't do that.
1:56 AM
_Bryan_
there is a 3 master cluster
1:56 AM
whack
don't send any requests to your master
1:57 AM
if you have isolated master roles (very recommended), never send requests there
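A sketch of what isolated roles look like in elasticsearch.yml (0.90-era settings, which match this deployment's vintage):

```yaml
# On the dedicated masters:
node.master: true
node.data: false

# On the data nodes (point logstash and other clients only at these):
node.master: false
node.data: true
```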
1:57 AM
why 2 ES instances?
1:57 AM
24gb heap is not recommended for 32gb ram
1:57 AM
_Bryan_: you'd benefit from support :P
1:57 AM
_Bryan_
ES has 146GB ram..and running 2 28GB heaps
1:58 AM
indexers have 32GB ram with 24GB heap