#twisted

      • runciter
        simpson: so i remember there was some weirdness related to lambda reusing persistent processes?
      • simpson
        I'd *like* to tell you to use https://github.com/nficano/python-lambda . I'd like to tell you that I sent patches to it to make Twisted work. But I can't.
      • Man, I haven't seen that movie in years. So good though.
      • simpson recalls things
      • runciter
        wait which movie was that
      • simpson
        The Incredibles.
      • Okay, so IIRC the way it works is that Lambda will reuse your Python processes, so you can't just react() to victory.
      • runciter
        OMG right
      • maybe i'm just gonna watch the incredibles tonight instead of this whole programming thing
      • simpson: bah
      • simpson
        You can either use Crochet, which is what I ended up doing, or you can use the signal module to kill your worker shortly after the request completes. I don't know which is better, but I suspect that Crochet is the less-unreliable of the two.
      • runciter
        simpson: hmm, ok!
      • simpson
        Aside from that, the only thing to know is that sometimes you might have to vendor DSOs. The target is Amazon Linux and if e.g. Cryptography wants some updated OpenSSL DSOs, you have to bundle them by popping them in your deployment dir.
      • runciter
        simpson: i kind of wonder what aws thinks if you repeatedly murder your lambda processes
      • probably more $$$
      • simpson
        Probably! I'm not sure; I left that route because it was not very stable.
      • python-lambda will vendor *everything* that you have in the deployment dir. Which is great as long as you didn't accidentally put keys in there.
      • runciter
        yeeeeesh
      • jfhbrook
        python-lambda? this sounds relevant to my interests
      • runciter
        lambda's deployment story is pretty bad
      • jfhbrook
        in general tooling around bundling lamb----yes
      • runciter
        i hear they're working on it though
      • jfhbrook
        that
      • runciter
        chalice actually doesn't do things the worst possible way now
      • it does things pretty well
      • but you have to use chalice
      • simpson
        I sent in this patch precisely because of this caveat: https://github.com/nficano/python-lambda/commit...
      • jfhbrook
        it's ok with node cause node works that way by default
      • zipballing on the cli still sucks but your node_modules are local anyway
      • anyway
      • runciter
        simpson: bless you
      • jfhbrook reminds himself that he doesn't "do" node anymore
      • jfhbrook
        simpson: a .lambdaignore would be cool
      • simpson
        runciter: This is what you really want; it makes Twisted work: https://github.com/nficano/python-lambda/commit...
      • jfhbrook
        if not already supported
      • simpson: wait what?
      • simpson
        jfhbrook: That sounds like a great idea! Sketch it out and send a PR. The maintainer is super-open to this kind of safety-belt work.
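
One way the `.lambdaignore` idea could be sketched, using only the standard library. This is not python-lambda's API: the file name, its glob-per-line format, and the functions `load_patterns` and `files_to_bundle` are all assumptions made for illustration.

```python
import fnmatch
import os


def load_patterns(root, name=".lambdaignore"):
    """Read glob patterns from a hypothetical ignore file in root.

    Blank lines and lines starting with '#' are skipped.
    """
    path = os.path.join(root, name)
    if not os.path.exists(path):
        return []
    with open(path) as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]


def files_to_bundle(root, patterns):
    """Return sorted relative paths under root that match no pattern."""
    kept = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            rel = rel.replace(os.sep, "/")
            # A pattern may match the relative path or the bare file name.
            ignored = any(
                fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(filename, p)
                for p in patterns
            )
            if not ignored:
                kept.append(rel)
    return sorted(kept)
```

With a `.lambdaignore` containing `*.pem`, a stray key file in the deployment dir would be left out of the list that gets zipped.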
      • runciter
        simpson: fff it trolls sys.path for packages?
      • jfhbrook
        yeah, that's my understanding ^
      • simpson
        Yeah. Run in a virtualenv that has nothing but your function's requirements.
      • jfhbrook
        simpson: if only I used lambda XD this might be relevant to my interests coming up in q4/q1 but not yet
      • hoping I can get devops to write the lambda function I currently want
      • is that with open/pass pattern the same as a touch?
      • simpson
        Yeah.
      • jfhbrook
        almost surprised there isn't a touch convenience method
      • simpson
        IKR
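
For reference, a small sketch of the two ways to "touch" a file in Python. The log does not show which open mode the commit used, so append mode is an assumption here; it creates a missing file without truncating an existing one. The standard library does have a convenience method, `pathlib.Path.touch()`, which also updates the modification time of an existing file.

```python
from pathlib import Path


def touch_with_open(path):
    # Append mode creates the file if it is missing and leaves any
    # existing contents alone. Mode "w" would truncate instead.
    with open(path, "a"):
        pass


def touch_with_pathlib(path):
    # Creates the file if missing; otherwise bumps its modification time.
    Path(path).touch()
```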
      • jfhbrook
        that's one thing I liked about early 2010s node, people would package anything
      • in python if it's less than 20 lines people will say it's not worth it
      • runciter
        simpson: thank you
      • i'm sure i'll have to deal with lambda again, and now i can use a twisted!
      • simpson
        runciter: No problem. I hated Lambda but it has a definite niche and the tooling wasn't that bad.
      • runciter
        yeah
      • i was pretty unhappy with the tooling but at that job we were pretty bad at everything
      • jfhbrook
        I think lambda is really good for gluing aws services together
      • runciter
        use fewer aws services
      • jfhbrook
        like, we have one that auto-loads s3 blobs into redshift
      • runciter
        cheaper, easier to test, and saner failure modes
      • jfhbrook
        I want one that parses s3 blobs and puts them into logentries
      • so there are some aws services that are pretty janky
      • I've heard nothing but bad things about ecs for instance
      • but redshift is actually nice
      • it's like oracle olap but cheaper and close enough to postgres that 80% of pg stuff works
      • though, warning, if you ask #postgresql redshift questions they'll hate you even if you make it clear you're not asking pg questions
      • they're huge jerks
      • runciter
        redshift requires a lot of optimization to perform well, and it's different enough from postgres that it's really hard to test locally
      • well, who knows what amazon's done this time
      • jfhbrook
        oh yeah, impossible to test locally
      • you need a stag env and custom schemas
      • runciter
        yeah, that's a deal breaker
      • jfhbrook
        for local testing
      • well it's 6 bucks a day which sucks
      • still, I'd rather that than running oracle
      • and like
      • bigquery as an alternative
      • also impossible to run in a local capacity
      • runciter
        i don't have a use for either of those
      • jfhbrook
        and the billing model means you can accidentally make yourself go bankrupt with a select * equivalent, womp womp
      • well yeah, you'd only use redshift/oracle/bigquery if you were data warehousing
      • runciter
        we were, at the last company, and it sucked
      • we should have just put the data into a regular postgres instance
      • jfhbrook
        so I do think there's an argument for that
      • like I do believe that redshift can handle scales that postgres cannot
      • but
      • runciter
        nobody has that much data
      • jfhbrook
        not all "warehousing" actually takes up that much space
      • simpson
        jfhbrook: There's a fascinating rumor going around that Lambda actually has been *internally* available at AWS for a long time, and that it has been an integral part of some of their older offerings.
      • runciter
        i mean, like 10 companies in the world have that much data
      • jfhbrook
        that *is* fascinating
      • what I mean runciter is that if you're only scraping the google analytics api every hour or two, that's only a couple hundred rows a day depending on how you're segmenting/storing data
      • different scale from, say, importing raw GA sessions into BQ
      • runciter
        how much traffic does your site actually get?
      • jfhbrook
        and I think there is a thing where people are like, oh we're doing DWing that means we need a DW db
      • runciter
        does it generate a terabyte every day?
      • jfhbrook
        hold on, I can get real numbers
      • runciter
        that would be 365 terabytes a year
      • that might be worth putting into redshift
      • is it a GB a day? that's not worth putting into redshift
      • jfhbrook
        we generate like 100m sessions a month
      • runciter
        sure, but how many bytes is that?
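
The back-of-the-envelope arithmetic in this exchange can be sketched as below. The daily figures come from runciter's messages; the bytes-per-session parameter is unknown in this log, so `daily_bytes_from_sessions` is a hypothetical helper that takes it as an input rather than guessing.

```python
GB = 10**9
TB = 10**12


def yearly_bytes(daily_bytes, days=365):
    """Total volume accumulated over a year at a constant daily rate."""
    return daily_bytes * days


def daily_bytes_from_sessions(sessions_per_month, bytes_per_session,
                              days_in_month=30):
    """Convert a monthly session count into bytes per day.

    bytes_per_session must be measured; nothing in the chat gives it.
    """
    return sessions_per_month * bytes_per_session / days_in_month


# runciter's two cases: a terabyte a day versus a gigabyte a day.
print(yearly_bytes(TB) // TB, "TB per year")
print(yearly_bytes(GB) // GB, "GB per year")
```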
      • jfhbrook
        which is distinct from pageviews
      • I don't remember
      • you asked how much traffic we get, not how many records or how big those records are
      • runciter
        computers have a lot of bytes these days!
      • jfhbrook
        we don't store raw sessions anymore
      • because it takes a few *days* for those records to become complete
      • runciter
        jfhbrook: sorry, i meant to ask how much data you had to warehouse every day
      • jfhbrook
        frankly I don't know that it's enough to justify redshift
      • I believe the raw logs *were*
      • ask me on tuesday if you really wanna know runciter and I'll look up our redshift's size
      • runciter
        jfhbrook: hm, ok!
      • that's really cool of you :)
      • anyway, poor data quality made things impossible at the last job, and it was often down to unreliable ETL processes jammed into lambdas
      • jfhbrook
        tbhirlimo I don't actually know that we need redshift dataset-size wise, but the way we CTAS everything might make it a compelling compute engine
      • it's very mapreduce/spark-y
      • runciter
        there was literally no way to test it, so there was no way out
      • jfhbrook
        ouch
      • yeah I, uh, our ETL is custom python running in beanstalk and I don't like it
      • I wanna do a jenkins cluster shelling out to python programs
      • runciter
        or it was down to API gateway 500ing
      • jfhbrook: whatever it is, make it easy to test, easy to debug, and easy to deploy
      • speed and scale is probably irrelevant
      • jfhbrook
        with a little luigi, move the table generation out of our BI tool (looker supports ELT workflows better than our ingestion pipeline does)
      • runciter
        are* probably
      • jfhbrook
        our biggest roadblock right now, honestly, is GA quotas
      • we're hitting 50k reqs/day because the only way to get unsampled data from the api is to make a single request per content_item-day
      • actually more like 35k/day
      • 50 is the cap, we hit it when trying to do historical ingests of GA data
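
A rough sketch of how long a historical ingest would take under these quotas. The 50k cap, the roughly 35k daily baseline, and the one-request-per-content_item-day rule come from jfhbrook's messages; treating the difference as spare capacity for backfill is an assumption, and `backfill_days` is a hypothetical helper.

```python
import math


def backfill_days(content_items, history_days,
                  daily_cap=50_000, daily_baseline=35_000):
    """Estimate calendar days needed to backfill unsampled data.

    Assumes one API request per (content item, day) pair, and that only
    the quota left over after the regular daily load can go to backfill.
    """
    spare = daily_cap - daily_baseline
    if spare <= 0:
        raise ValueError("no spare quota left for a backfill")
    requests_needed = content_items * history_days
    return math.ceil(requests_needed / spare)
```

With the defaults, every 15,000 item-days of history costs one extra calendar day of ingesting.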
      • runciter: one cool thing w/ jenkins that I think makes things a little better, is openstack has a project that will generate/update jenkins jobs/settings based on yaml configs
      • runciter
        hm
      • back in the day i used the jenkins API
      • jfhbrook
        still some testing problems (install jenkins locally and have some test settings?) but a def improvement
      • runciter
        but the right answer, of course, is buildbot ;)
      • jfhbrook
        hahaha
      • I actually did some cursory research into that
      • I couldn't really justify it to my team
      • but I gave it a relatively serious consideration as far as recs go
      • I feel like buildbot would be dope from a deployment perspective if your company was twisted-centric enough to have a system in place for deploying based on twistd/.tac files
      • ours is mostly scala/jvm w/ a dash of python cron jobs and the odd wsgi app
      • course docker makes that less of a thing I think, oh well
      • Spr0cket joined the channel