Man, I haven't seen that movie in years. So good though.
simpson recalls things
runciter
wait which movie was that
simpson
The Incredibles.
Okay, so IIRC the way it works is that Lambda will reuse your Python processes, so you can't just react() to victory.
runciter
OMG right
maybe i'm just gonna watch the incredibles tonight instead of this whole programming thing
simpson: bah
simpson
You can either use Crochet, which is what I ended up doing, or you can use the signal module to kill your worker shortly after the request completes. I don't know which is better, but I suspect that Crochet is the less-unreliable of the two.
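The signal-module route simpson describes can be sketched like this (`handler` and `do_work` are made-up names, not part of Lambda's API): schedule the worker to exit shortly after the response goes out, so the next invocation gets a fresh process instead of a reused one.

```python
# Sketch of the "kill your worker shortly after the request completes"
# approach. Lambda reuses Python processes, so a reactor you stopped on
# the last invocation won't be there for the next one; exiting outright
# sidesteps that, at the cost of losing warm starts.
import os
import signal

def do_work(event):
    # stand-in for the real handler logic
    return {"ok": True}

def handler(event, context):
    result = do_work(event)
    # deliver SIGALRM in 1 second, after Lambda has taken our return value
    signal.signal(signal.SIGALRM, lambda signum, frame: os._exit(0))
    signal.alarm(1)
    return result
```

The Crochet route avoids this entirely by starting the reactor once in a background thread and never stopping it, which is presumably why simpson found it less unreliable.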
runciter
simpson: hmm, ok!
simpson
Aside from that, the only thing to know is that sometimes you might have to vendor DSOs. The target is Amazon Linux and if e.g. Cryptography wants some updated OpenSSL DSOs, you have to bundle them by popping them in your deployment dir.
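The DSO-vendoring step might look roughly like this; every path and library version below is a placeholder, since the exact `.so` names depend on the wheel and the Amazon Linux image in use.

```shell
# Build dir for the deployment package
mkdir -p deploy

# Copy whatever OpenSSL DSOs cryptography actually links against into
# the deployment dir so they ship inside the zip (paths are examples;
# if the globs match nothing, nothing is copied).
for so in /usr/local/lib64/libssl.so* /usr/local/lib64/libcrypto.so*; do
  if [ -e "$so" ]; then cp "$so" deploy/; fi
done

# Then zip the deployment dir as usual, e.g.:
# (cd deploy && zip -r ../function.zip .)
```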
runciter
simpson: i kind of wonder what aws thinks if you repeatedly murder your lambda processes
probably more $$$
simpson
Probably! I'm not sure; I left that route because it was not very stable.
python-lambda will vendor *everything* that you have in the deployment dir. Which is great as long as you didn't accidentally put keys in there.
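A minimal safety-belt for the accidentally-bundled-keys problem might look like this (a sketch, not python-lambda's actual behavior; the suffix list and function names are made up): walk the deployment dir before zipping and refuse if anything looks like a secret.

```python
# Pre-zip check: flag files in the deployment dir that look like
# credentials, since python-lambda vendors *everything* in there.
import os

SUSPECT_SUFFIXES = (".pem", ".key", ".env")

def suspicious_files(deploy_dir):
    """Return paths under deploy_dir that look like credentials."""
    hits = []
    for root, _dirs, files in os.walk(deploy_dir):
        for name in files:
            if name.endswith(SUSPECT_SUFFIXES) or name == "credentials":
                hits.append(os.path.join(root, name))
    return hits
```

Running this right before bundling, and bailing if it returns anything, is the kind of check a deploy script can do in a few lines.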
runciter
yeeeeesh
jfhbrook
python-lambda? this sounds relevant to my interests
runciter
lambda's deployment story is pretty bad
jfhbrook
in general tooling around bundling lambdas----yes
runciter
i hear they're working on it though
jfhbrook
that
runciter
chalice actually doesn't do things the worst possible way now
simpson
jfhbrook: That sounds like a great idea! Sketch it out and send a PR. The maintainer is super-open to this kind of safety-belt work.
runciter
simpson: fff it trolls sys.path for packages?
jfhbrook
yeah, that's my understanding ^
simpson
Yeah. Run in a virtualenv that has nothing but your function's requirements.
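The sys.path-trawling behavior runciter asks about can be pictured like this (an illustrative sketch, not chalice's or python-lambda's real code): anything installed on the current interpreter's site-packages is a candidate for vendoring, which is exactly why a clean virtualenv matters.

```python
# List the installed distributions a bundler would see on this
# interpreter. In a clean virtualenv this is just your function's
# requirements; in a system Python it's everything, and it all ships.
import os
import site

def vendorable_distributions():
    found = set()
    for root in site.getsitepackages():
        if not os.path.isdir(root):
            continue
        for entry in os.listdir(root):
            if entry.endswith((".dist-info", ".egg-info")):
                found.add(entry.split("-")[0])
    return sorted(found)

print(vendorable_distributions())
```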
jfhbrook
simpson: if only I used lambda XD this might be relevant to my interests coming up in q4/q1 but not yet
hoping I can get devops to write the lambda function I currently want
is the with open/pass pattern the same as a touch?
simpson
Yeah.
jfhbrook
almost surprised there isn't a touch convenience method
simpson
IKR
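For reference, the with open/pass idiom being discussed, next to the convenience method that does exist in the stdlib: `pathlib.Path.touch`, available since Python 3.4.

```python
# The open/pass idiom: opening for append creates the file if missing.
from pathlib import Path

with open("marker.flag", "a"):
    pass  # nothing written; the open itself creates the file

# The stdlib convenience method, closest to shell `touch` (it also
# updates the mtime of an existing file):
Path("marker2.flag").touch()
```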
jfhbrook
that's one thing I liked about early 2010s node, people would package anything
in python if it's less than 20 lines people will say it's not worth it
runciter
simpson: thank you
i'm sure i'll have to deal with lambda again, and now i can use a twisted!
simpson
runciter: No problem. I hated Lambda but it has a definite niche and the tooling wasn't that bad.
runciter
yeah
i was pretty unhappy with the tooling but we were pretty bad at everything at that job
jfhbrook
I think lambda is really good for gluing aws services together
runciter
use fewer aws services
jfhbrook
like, we have one that auto-loads s3 blobs into redshift
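The S3-to-Redshift glue Lambda jfhbrook describes might be shaped like this (table and IAM role names are made up; actually executing the statement would need a DB connection, e.g. via psycopg2, which is elided here). The handler reads the S3 event Lambda delivers and builds a Redshift `COPY` statement per object.

```python
# Build a Redshift COPY statement for each object in an S3 event.
def copy_statement(bucket, key, table="events",
                   iam_role="arn:aws:iam::123456789012:role/redshift-copy"):
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS JSON 'auto';"
    )

def handler(event, context):
    statements = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        statements.append(copy_statement(bucket, key))
    return statements
```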
runciter
cheaper, easier to test, and saner failure modes
jfhbrook
I want one that parses s3 blobs and puts them into logentries
so there are some aws services that are pretty janky
I've heard nothing but bad things about ecs for instance
but redshift is actually nice
it's like oracle olap but cheaper and close enough to postgres that 80% of pg stuff works
though, warning, if you ask #postgresql redshift questions they'll hate you even if you make it clear you're not asking pg questions
they're huge jerks
runciter
redshift requires a lot of optimization to perform well, and it's different enough from postgres that it's really hard to test locally
well, who knows what amazon's done this time
jfhbrook
oh yeah, impossible to test locally
you need a staging env and custom schemas
runciter
yeah, that's a deal breaker
jfhbrook
for local testing
well it's 6 bucks a day which sucks
still, I'd rather that than running oracle
and like
bigquery as an alternative
also impossible to run in a local capacity
runciter
i don't have a use for either of those
jfhbrook
and the billing model means you can accidentally make yourself go bankrupt with a select * equivalent, womp womp
well yeah, you'd only use redshift/oracle/bigquery if you were data warehousing
runciter
we were, at the last company, and it sucked
we should have just put the data into a regular postgres instance
jfhbrook
so I do think there's an argument for that
like I do believe that redshift can handle scales that postgres cannot
but
runciter
nobody has that much data
jfhbrook
not all "warehousing" actually takes up that much space
simpson
jfhbrook: There's a fascinating rumor going around that Lambda actually has been *internally* available at AWS for a long time, and that it has been an integral part of some of their older offerings.
runciter
i mean, like 10 companies in the world have that much data
jfhbrook
that *is* fascinating
what I mean runciter is that if you're only scraping the google analytics api every hour or two, that's only a couple hundred rows a day depending on how you're segmenting/storing data
different scale from, say, importing raw GA sessions into BQ
runciter
how much traffic does your site actually get?
jfhbrook
and I think there is a thing where people are like, oh we're doing DWing that means we need a DW db
runciter
does it generate a terabyte every day?
jfhbrook
hold on, I can get real numbers
runciter
that would be 365 terabytes a year
that might be worth putting into redshift
is it a GB a day? that's not worth putting into redshift
jfhbrook
we generate like 100m sessions a month
runciter
sure, but how many bytes is that?
jfhbrook
which is distinct from pageviews
I don't remember
you asked how much traffic we get, not how many records or how big those records are
runciter
computers have a lot of bytes these days!
jfhbrook
we don't store raw sessions anymore
because it takes a few *days* for those records to become complete
runciter
jfhbrook: sorry, i meant to ask how much data you had to warehouse every day
jfhbrook
frankly I don't know that it's enough to justify redshift
I believe the raw logs *were*
ask me on tuesday if you really wanna know runciter and I'll look up our redshift's size
runciter
jfhbrook: hm, ok!
that's really cool of you :)
anyway, poor data quality made things impossible at the last job, and it was often down to unreliable ETL processes jammed into lambdas
jfhbrook
tbh imo I don't actually know that we need redshift dataset-size wise, but the way we CTAS everything might make it a compelling compute engine
it's very mapreduce/spark-y
runciter
there was literally no way to test it, so there was no way out
jfhbrook
ouch
yeah I, uh, our ETL is custom python running in beanstalk and I don't like it
I wanna do a jenkins cluster shelling out to python programs
runciter
or it was down to API gateway 500ing
jfhbrook: whatever it is, make it easy to test, easy to debug, and easy to deploy
speed and scale is probably irrelevant
jfhbrook
with a little luigi, move the table generation out of our BI tool (looker supports ELT workflows better than our ingestion pipeline does)
runciter
are* probably
jfhbrook
our biggest roadblock right now, honestly, is GA quotas
we're hitting 50k reqs/day because the only way to get unsampled data from the api is to make a single request per content_item-day
actually more like 35k/day
50 is the cap, we hit it when trying to do historical ingests of GA data
runciter: one cool thing w/ jenkins that I think makes things a little better, is openstack has a project that will generate/update jenkins jobs/settings based on yaml configs
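The openstack project being described is jenkins-job-builder; a minimal job definition in its YAML format might look like this (the job name, schedule, and shell command are made up for illustration).

```yaml
- job:
    name: etl-nightly
    description: 'Managed by jenkins-job-builder; do not edit in the UI.'
    triggers:
      - timed: 'H 2 * * *'
    builders:
      - shell: |
          python run_etl.py --date "$(date -u +%F)"
```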
runciter
hm
back in the day i used the jenkins API
jfhbrook
still some testing problems (install jenkins locally and have some test settings?) but a def improvement
runciter
but the right answer, of course, is buildbot ;)
jfhbrook
hahaha
I actually did some cursory research into that
I couldn't really justify it to my team
but I gave it a relatively serious consideration as far as recs go
I feel like buildbot would be dope from a deployment perspective if your company was twisted-centric enough to have a system in place for deploying based on twistd/.tac files
ours is mostly scala/jvm w/ a dash of python cron jobs and the odd wsgi app
course docker makes that less of a thing I think, oh well