17:34 PM
faldridge joined the channel
17:34 PM
faldridge has quit
17:36 PM
brodolfo has quit
17:39 PM
faldridge joined the channel
17:57 PM
faldridge has quit
18:05 PM
frode_ has quit
18:06 PM
faldridge joined the channel
18:07 PM
jamesaxl has quit
18:08 PM
faldridge has quit
18:08 PM
jamesaxl joined the channel
18:15 PM
frode_ joined the channel
18:39 PM
faldridge joined the channel
19:01 PM
faldridge has quit
19:03 PM
jordanl has quit
19:08 PM
faldridge joined the channel
19:19 PM
garetjax has quit
19:43 PM
aix joined the channel
19:43 PM
aix
Hi
19:44 PM
I have a little bit of a situation
19:45 PM
I have 5 million irc addresses and I need to connect once to each of them and store the output that they provide on connecting
19:45 PM
I have a primitive script but it doesn't throttle and I run out of file descriptors pretty quickly
19:49 PM
asdf
aix, if you'd like some rate limiting, first, you need a way to know when a connection is done
19:50 PM
__marco joined the channel
19:50 PM
so imagine eg. your "bot" function creates a `d = Deferred()` and passes that along with all the other info into the factory, so the protocol has access to it. Then when the protocol decides it's done, so eg. in the connectionLost method, it calls `d.callback(some_return_value)`
19:50 PM
tonythomas has quit
19:50 PM
and let's have the "bot" method return that deferred
19:50 PM
if you just do that, then you'll be able to use DeferredSemaphore with that function
19:51 PM
so, instead of `bot(spl[0], 6667, blahblah)`, you would first prepare a semaphore `sem = DeferredSemaphore(100)`, and then just keep running `sem.run(bot, spl[0], 6667, blahblah)`
19:51 PM
Tooty joined the channel
19:51 PM
aix
ah
19:51 PM
alright
19:52 PM
asdf
that will still create all 5m deferreds in memory up front
19:52 PM
Tooty has quit
19:52 PM
aix
and 100 is the maximum fds at any one time?
19:52 PM
asdf
but meh that's like what, less than 100mb probably
19:52 PM
yep that's how many of the bot() functions can run at a time
19:52 PM
the semaphore will examine the deferred that the function returns to know when to run the next one
19:53 PM
aix
this is exactly why i love twisted
19:54 PM
asdf
btw if you used the more recent endpoints api, you could get rid of that factory entirely so the code would be a lot shorter
19:55 PM
aix
alright
19:58 PM
asdf
20:06 PM
aix
20:07 PM
asdf
hmm .callback() needs an argument actually, can't fire the deferred with nothing
20:07 PM
so you could store some data on the protocol while the connection is under way and return it there
20:08 PM
well you'll also want to actually write the __init__ with storing .nickname and all that :)
20:09 PM
aix
alright so
20:09 PM
the connectionlost callback?
20:10 PM
asdf
indentation is wrong there too btw, you're mixing tabs and spaces
20:10 PM
yep, i mean, `self.finished.callback()` takes a mandatory argument
20:10 PM
can just do `.callback(None)` if you don't want to return anything
20:11 PM
aix
ah okay
20:11 PM
asdf
btw you copied the `.addCallback(all_done)` but didn't actually write an 'all_done' function
20:11 PM
aix
do i need an all_done method?
20:11 PM
asdf
if you want one
20:12 PM
you didn't actually run the reactor either btw
20:12 PM
aix
right now, this script kinda does nothing
20:12 PM
oh
20:12 PM
that'd be why
20:12 PM
asdf
so `twisted.internet.task.react` is a newer way to run the reactor: it takes a function that returns a deferred
20:13 PM
like `@react def main(reactor): results = []; blahblahblah; return gatherResults(results)`
20:13 PM
and it'll exit when that gatherResults fires
20:13 PM
but just doing plain old reactor.run() is fine too
20:15 PM
aix
alrighty
20:15 PM
doesn't seem like i'm getting any output
20:16 PM
asdf
reactor.run() is blocking btw, so a gatherResults line after it won't run until the reactor stops
20:16 PM
aix
and I think the reactor is either not getting called or blocking something
20:16 PM
asdf
hmm looks fine, probably some error somewhere, let's see..
20:19 PM
except for the indentation with mixed tabs and spaces >.>
20:19 PM
so what output are you getting? anything at all?
20:19 PM
the 'all_done' func needs to take an argument btw, it'll be a list with all the results
20:21 PM
aix
well, it prints addbot loads of times
20:21 PM
then waits
20:22 PM
definitely doing something
20:22 PM
asdf
loads as in should be 100?
20:24 PM
aix
keeps printing addbot
20:24 PM
not sure how many it actually loads
20:24 PM
asdf
yeah, it'll run the whole loop up front preparing the calls, and then it'll be calling one by one, up to 100 concurrently
20:24 PM
aix
I'll try with a million
20:25 PM
jordanl joined the channel
20:25 PM
okay maybe just 10k
20:26 PM
uh
20:26 PM
looks like it loads all of them at once
20:26 PM
and i don't think all_done ever gets called
20:27 PM
asdf
it does load them all at once yes
20:27 PM
it calls `sem.run()` all 10k times
20:27 PM
and then sem will actually run the functions
20:27 PM
with the rate limit
20:27 PM
aix
ah okay
20:28 PM
oh
20:28 PM
i screwed up the indentation
20:29 PM
asdf
you'll want the gatherResults line before the reactor.run() btw
20:29 PM
aix
20:30 PM
HAH
20:30 PM
it works!
20:30 PM
asdf
nice
20:30 PM
aix
very slowly
20:30 PM
I think my first script did it faster
20:30 PM
asdf
crank it up to 1k at a time or something i suppose
20:31 PM
aix
it's at 900
20:31 PM
or maybe there's only 2 irc networks in the first 10k lines
20:31 PM
i'll run it on a better box with a higher ulimit and see what happens
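Before moving boxes, the soft file-descriptor limit can often be raised from inside the script, up to the hard limit, which an unprivileged process is allowed to do. A sketch using Python's Unix-only `resource` module. `raise_fd_limit` is a hypothetical helper, and 2048 is just a number comfortably above the concurrency of 900 mentioned above.

```python
import resource


def raise_fd_limit(wanted):
    """Raise the soft RLIMIT_NOFILE toward `wanted`, capped at the hard limit.

    Returns the soft limit in effect afterwards; never lowers it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling `raise_fd_limit(2048)` before starting the reactor keeps the semaphore's concurrency below the limit; raising the hard limit itself still needs root or a system-level `ulimit` change.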
20:32 PM
asdf
oh btw remember to call the superclass in connectionLost
20:33 PM
otherwise it'll leak LoopingCalls, because IRCClient uses one for the heartbeat ping
20:33 PM
just like `irc.IRCClient.connectionLost(self, reason)` if you're on py2, because i think that's still an old-style class
20:34 PM
aix
the superclass?
20:34 PM
asdf
i mean like super()? but you can't actually use super() with old-style classes
20:34 PM
so you need to write the method yourself
20:35 PM
i mean, spell out the base class
20:46 PM
aix
okay, there
20:47 PM
i'll leave this running overnight and see what happens
20:47 PM
asdf
break 10 minutes after you detach, what else ;))
20:53 PM
danilo joined the channel
21:05 PM
jamesaxl has quit
21:10 PM
aix
okay other than my box freezing up i don't think anything is happening
21:12 PM
asdf
how's the RAM use?
21:12 PM
clokep has quit
21:12 PM
aix
1.89/1.93GB
21:12 PM
asdf
maybe loading all the data up front wasn't a great idea after all
21:12 PM
huh, that's a lot
21:12 PM
faldridge has quit
21:12 PM
it goes there right after loading the lines?
21:13 PM
aix
if i try something like 10k it works fine
21:13 PM
pretty fast too
21:14 PM
asdf
hmm ok maybe let's try the other way then
21:14 PM
sec i'll show an example
21:15 PM
aix
asdf: hold up
21:15 PM
i'm gonna have to sleep soon
21:15 PM
perhaps we can continue this tomorrow?
21:16 PM
asdf
yeah me too but this should be fast to check
21:16 PM
aix
alright
21:18 PM
asdf
21:18 PM
at the bottom i've put the code from that article
21:18 PM
so basically this will only go through the file as needed
21:18 PM
as opposed to loading the whole thing up front
21:18 PM
aix
alright
21:19 PM
asdf
so the DeferredSemaphore needed like O(lines) of memory, this just needs O(concurrency)
21:20 PM
so this is all generators, right
21:20 PM
the file loader, and the `work` inside the 'parallel' function
21:20 PM
aix
`finished = parallel(data, 100, run_bot)`: what does the 100 mean in this line?
21:20 PM
asdf
that's the concurrency
21:20 PM
aix
ah
21:21 PM
alright, trying it out
21:22 PM
asdf
oh
21:23 PM
add the base class call to connectionLost again
21:23 PM
because i didn't in this paste :)