#logstash


      • rastro
        yeukhon: each index consumes heap space, so if they have the same (or even nearly the same) retention, then you should keep them in the same index.
      • wt0f joined the channel
      • fullerja has quit
      • kjstone00_ joined the channel
      • yeukhon
        rastro thanks. and i would rather use index_type to group similar items, to make queries slightly more efficient.
      • sorry i know this is more of an ES/Kibana question :-)
      • asimzaidi has quit
      • rpetre
        somewhat related, i'm looking into tips on how to retain data as long as possible
      • leaving data compression aside, what limitations are regarding number of indexes/shards on a single node?
      • yeukhon
        rpetre ES? Single node you lose HA and scalability.
      • rpetre
        i notice ES spawns a lot of threads, maybe one per shard?
      • yeukhon: the number of nodes and the hardware is pretty much given
      • yeukhon
        rpetre oh i read it the wrong way :-)
      • rpetre
        i'm trying to see how long i can promise to keep data searchable
      • rastro
        rpetre: you will eventually run out of heap with more open indexes/shards.
      • rpetre: the idea is to add more nodes with more ram.
      • rpetre
        i've read a lot about space optimizations, but i see having lots of indexes is not very fun
      • rastro
        rpetre: also, you can reduce the heap footprint with doc_values
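rastro's doc_values suggestion is a mapping-time setting; a hedged sketch for the ES 1.x era (field names here are invented for illustration — it must be set before the index is created, usually via a template):

```json
{
  "mappings": {
    "logs": {
      "properties": {
        "status":   { "type": "integer", "doc_values": true },
        "clientip": { "type": "string", "index": "not_analyzed", "doc_values": true }
      }
    }
  }
}
```

doc_values move fielddata from the JVM heap to on-disk columnar structures; in the ES versions of this era they only apply to not_analyzed string and numeric/date fields.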
      • rpetre
        paramedic tends to kill my browser, for instance :)
      • wt0f has quit
      • yeukhon
        we have a cluster with a lot of shards and we would hit the recommended 64k file descriptor limit and have to set something higher
      • ulimit is something we dislike having to raise.
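For reference, a minimal sketch of the checks yeukhon alludes to (the limits.conf lines are illustrative values for a user named `elasticsearch`, not a recommendation):

```shell
# Current open-file limit for this shell; shard-heavy ES clusters often
# need this raised well above the distro default.
ulimit -n

# A persistent change is usually made in /etc/security/limits.conf, e.g.:
#   elasticsearch  soft  nofile  65536
#   elasticsearch  hard  nofile  65536
```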
      • rpetre
        rastro: any resources i could read related to that?
      • the machine is pretty much dedicated to this, so i don't care about various os tune-ups i might make
      • rpetre
        rastro: so cpu cores is not a limiting factor, usually?
      • rastro
        rpetre: i find LS to be cpu-limited, and ES to be RAM-limited.
      • rpetre
        i'm not very experienced in managing jvm apps
      • rastro: aha, good to know
      • I'll probably keep giving LS more workers until it keeps redis empty
      • crshman
        ah found my issue
      • logstashbot
        Title: tcp output plugin does not send newlines · Issue #1650 · elastic/logstash · GitHub (at github.com)
      • crshman
        that was a head scratcher
      • rpetre
        rastro, yeukhon : any idea if it's worth keeping monthly indexes instead of daily?
      • habanero has quit
      • they'll probably get pretty big, but i'll have to see if storage permits
      • rastro
        rpetre: it's harder to change mappings, and you lose granularity with retention. Also, there's some limit on recommended shard size, i think.
      • yeukhon
        rpetre: i just started thinking about this lately, but from what i gather above, match the index period with your data pattern and retention. if i want to look at 14 or 21 days, it'd probably be easier to just do daily; if my data follows a monthly pattern, monthly makes more sense? but rastro can correct me.
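The daily-versus-monthly choice comes down to the date pattern used in the elasticsearch output; a sketch using the stock logstash index naming:

```
output {
  elasticsearch {
    # daily indexes: logstash-2015.06.01, logstash-2015.06.02, ...
    index => "logstash-%{+YYYY.MM.dd}"
    # monthly instead would be: index => "logstash-%{+YYYY.MM}"
  }
}
```

Daily granularity lets retention tooling drop exactly N days at a time, at the cost of many more indexes (and thus more heap) for the same span of data.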
      • JDiPierro has quit
      • rpetre
        i currently keep a year's worth of webserver logs, i'd love to be able to expose those to users instead of me awk-ing and perl-ing through them every now and then
      • brandt_tullis_ has quit
      • rastro
        rpetre: absolutely!
      • rpetre: another option is to close the index, so it's still around but won't be searched until reopened.
      • rpetre
        i need to have at least yearly retention of data for audit purposes as well, i was a bit bummed when i realized it's probably not feasible in elk
      • yeukhon
        rpetre: yeah searchability and retention are two things
      • we retain the logs in a backup (like S3 and then Glacier)
      • rpetre
        yeah, us as well
      • but at least webserver logs are useful
      • "are clients using this obsolete interface?"
      • spuder joined the channel
      • "have we seen this guy before?"
      • habanero joined the channel
      • FoosMasta
        hey…how can I check if logstash is running when deployed as a service? I thought “ps aux | grep logstash” would do the trick
      • but I don’t see anything…
      • pcmerc_work has quit
      • on linux
      • rastro
        FoosMasta: then it's not running :)
      • rpetre: start building 64GB machines to add to the cluster. lather, rinse, repeat.
      • rpetre: doc_values seems to be giving us a 10x gain.
      • rpetre
        rastro: i have _one_ 64GB machine (actually two, but i'll have to use the other one for graphite, which will kill my io)
      • yeukhon
        wait, if you want more than 32GB of heap and to take advantage of it, you must run 64?
      • rastro
        rpetre: moving to three will give you more heap in the cluster and also an IO boost.
      • rpetre
        i've actually added the other machine as an ES node, but i'm keeping logstash off it, it's there just for kibana and grafana redundancy
      • rastro
        yeukhon: you're supposed to give 50% of the ram to ES, but below 32GB, so a machine with 64GB is the "largest" you would need.
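rastro's sizing rule can be sketched as a tiny function (illustrative only, not official sizing advice): half the box's RAM, capped just under the ~32GB compressed-oops threshold, which is why a 64GB machine is the largest a single node can fully use.

```python
def recommended_heap_gb(ram_gb: float) -> float:
    """Half of RAM, capped just under the 32GB compressed-oops limit."""
    return min(ram_gb / 2, 31.0)

print(recommended_heap_gb(16))   # 8.0
print(recommended_heap_gb(64))   # 31.0 -- half would be 32, crossing the limit
print(recommended_heap_gb(128))  # 31.0 -- the extra RAM goes to the filesystem cache
```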
      • rpetre
        yeah, well, this one will have to justify its own budget first
      • rastro
        rpetre: of course
      • rpetre
        why is the 32GB limit there?
      • torrancew
        jvm
      • rpetre
        i've read it last night as well
      • torrancew
        the jvm performance starts to deteriorate with heaps over 32gb IIRC
      • yeukhon
        rastro ah you are right :-) well if you want 64 you'd go 128 then… but I really think no one runs that size. Usually a larger heap can mean longer GC pauses. I think there are jvm options to make that less painful.
      • rpetre
        oh, i see
      • probably one can have multiple ES instances on the same box
      • rastro
        yeukhon: if all you had was a machine with 128G of ram, then run two nodes on that one box.
      • torrancew
        yeah, that's a work-around
      • split up into different instances of jvm
      • yeukhon
        right.
      • torrancew
        ^^^ like rastro just said!
      • rastro
        yeukhon: but you're sharing disk and other busses.
      • rpetre
        i noticed in the routing settings it can tell if there are multiple nodes on the same server
      • and power supplies ;)
      • torrancew
        rastro: well, you're assuming they are (and you're likely right)
      • but you could certainly point each es at different mount points, and they could certainly be attached over different buses
      • no refuting the power supply point, though!
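The routing setting rpetre noticed is, I believe, this one in elasticsearch.yml; a sketch:

```yaml
# When several ES nodes share one physical box, keep a primary shard and
# its replica from both landing on that host:
cluster.routing.allocation.same_shard.host: true
```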
      • nemothekid joined the channel
      • rpetre
        are there any other good ES maintenance tools besides curator i should look at?
      • sfeinste joined the channel
      • torrancew
        do you have head/kopf/$something already?
      • rpetre
        yes, but i mean for automated index management
      • jbehrends has quit
      • torrancew
        not that I know of
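For the automated index management rpetre asked about, curator is typically run from cron; an illustrative crontab sketch (flags vary by Curator version — this assumes the 3.x CLI and daily logstash-YYYY.MM.dd indexes):

```
# close indexes older than 90 days, delete older than 365 (example values)
0 1 * * * curator close indices --older-than 90 --time-unit days --timestring '%Y.%m.%d'
0 2 * * * curator delete indices --older-than 365 --time-unit days --timestring '%Y.%m.%d'
```

Closing before deleting matches rastro's earlier suggestion: closed indexes stay on disk without consuming heap, and can be reopened for occasional searches.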
      • rpetre
        kopf is pretty nice, it helps me write json better than in cli :)
      • double-p
        jq can be of help, too
      • rpetre
        i'm thinking maybe after a while i can do some other tricks to indexes, like reimporting data with fewer fields
      • i keep telling myself to learn jq's minilanguage better, but i keep postponing it
      • maybe once we start using ec2 more
      • kepper joined the channel
      • kepper has quit
      • kepper joined the channel
      • hemphill has quit
      • pcmerc_work joined the channel
      • ian_mac
        the rationale, as I understand it, behind limiting heap size to 32gigs, is that the JVM can compress pointers and only use a 32bit pointer up to 32gigs of heap. Once you pass 32gigs of heap the JVM has to use 64bit pointers and so you effectively lose heap size.
      • logstashbot
        <http://tinyurl.com/kjezw4v> (at blog.codecentric.de)
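A back-of-the-envelope check of ian_mac's point (illustrative): with compressed oops the JVM stores 32-bit references scaled by the 8-byte object alignment, so the addressable heap is 2**32 * 8 bytes = 32 GiB; past that, references widen to 64 bits and the larger pointers eat part of the extra heap.

```python
# bytes reachable with compressed 32-bit oops at 8-byte object alignment
addressable = 2**32 * 8
print(addressable / 2**30)  # 32.0 (GiB)
```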
      • kepper has quit
      • rpetre
        it mentions this reason in the memory usage page linked ;)
      • Knuit_M36 joined the channel
      • Knuit_Mobile has quit
      • Knuit_M36 has quit
      • Knuit_Mobile joined the channel
      • ian_mac
        ah I didn't see a link. I only saw reference to larger heap size resulting in longer GC pauses
      • which isn't really the primary reason.
      • Knuit_Mobile has quit
      • rpetre
        the fielddata limit circuit breaker is interesting as well
      • logger has quit
      • dberry joined the channel
      • dberry has quit
      • dberry joined the channel
      • ian_mac
        I assume I have to use something like the json codec to pass events over udp and keep the fields intact, right?
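What ian_mac describes would look roughly like this (a sketch; the host and port are invented): without a codec the udp output sends only the rendered message line, so structured fields are lost, while the json codec serializes the whole event.

```
output {
  udp {
    host  => "collector.example.com"   # hypothetical destination
    port  => 5514
    codec => json
  }
}
# ...and on the receiving side:
input {
  udp {
    port  => 5514
    codec => json
  }
}
```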
      • duck_cpd has quit
      • colinsurprenant has quit
      • duck_cpd joined the channel
      • michaelhart has quit
      • wt0f joined the channel
      • Rapture has quit
      • colinsurprenant joined the channel
      • Sartsj joined the channel
      • kepper joined the channel
      • kepper has quit
      • dm3 joined the channel
      • kepper joined the channel
      • FoosMasta has quit
      • rastro has quit
      • kepper has quit
      • iamchrisf joined the channel
      • radiocats joined the channel
      • rastro joined the channel
      • t4nk926 joined the channel
      • t4nk926
        there is a field in the log that shows up as : james.myers
      • how can i combine it into one field. also, sometimes there is only one name :(
      • sangdrax joined the channel
      • kepper joined the channel
      • gentunian joined the channel
      • sangdrax
        Looking for some help using field substitution in a string field of the DNS filter. It seems to not substitute my variable.