if not, i will tackle -- i'm working on kafka-python all week
ok
MrScout
I saw!
dpkp
oh congrats! where are you now?
MrScout
I've been pretty jazzed by the huge amount you've accomplished lately
I'm at Facebook
dpkp
well hot diggity
MrScout
Haha
:)
dpkp
guess i can't get you to join rdio now!
MrScout
;-)
dpkp
i'm planning to write the high-level consumer API in python, but without the auto-balancing
MrScout
Fancy
Back in a few minutes, I've gotta run to a meeting
MrScout has quit
MrScout joined the channel
back
:)
At least, unless the presenter tells us to close laptops. ;-)
dpkp
wow you guys have fast meetings
MrScout
Nah, just got to it
Had to walk across campus
It's a training session. So much cool shit here at FB, seriously.
so, isinstance(s, (basestring, bytes))
?
Except python 3 did away with basestring because they hate compatibility
:-/
dpkp
well that could be fixed in a larger python 3 support PR
but also bytes is a subclass of basestring, right?
MrScout
hmmm, checking
looks like, yes
Actually
>>> bytes
<type 'str'>
it might actually just *be* type str
I keep meaning to take a look at that Python 3 PR, but I'm afraid it just needs to be rewritten. :-/
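(A minimal compatibility sketch of the isinstance check discussed above: basestring exists only on Python 2, so fall back to (str, bytes) on Python 3. The names here are illustrative, not from kafka-python.)

```python
# basestring is a NameError on Python 3, so probe for it once.
try:
    string_types = (basestring, bytes)  # Python 2: str and unicode both match
except NameError:
    string_types = (str, bytes)  # Python 3

def is_stringlike(s):
    # True for text or raw bytes, False for anything else.
    return isinstance(s, string_types)

is_stringlike(b'raw')   # True
is_stringlike(u'text')  # True
is_stringlike(42)       # False
```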
dpkp
bigger issue is that we need a consistent way to struct.pack the message into a string of bytes
so to support unicode we could type check with isinstance(s, basestring)
and pack with struct.pack('>i%ds' % len(s), len(s), str(s))
but that doesn't fix the avro problem
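(A hedged sketch of the framing dpkp describes: a big-endian int32 length prefix followed by the raw payload bytes. Encoding unicode as UTF-8 here is an assumption for the sake of a runnable example, not kafka-python's actual API.)

```python
import struct

def pack_message(s):
    # Illustrative helper: length-prefix a payload for the wire.
    # UTF-8 for text input is an assumed default, not from the library.
    if not isinstance(s, bytes):
        s = s.encode('utf-8')
    # '>i' = big-endian int32 length, '%ds' = the raw bytes themselves
    return struct.pack('>i%ds' % len(s), len(s), s)

packed = pack_message(u'hello')
length, = struct.unpack('>i', packed[:4])
# length == 5 and packed[4:] == b'hello'
```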
MrScout
need a conversion layer
I have a plan for that
I've done it before
dpkp
and also i'm not sure that str(unicode()) is the right way to encode those bytes
or we could just try to convert the message via str() or even bytes(). if it works, encode it; if not, fail
MrScout
It's the simple way
dpkp
and require that any message the user wants to publish has a __str__ method or otherwise can be converted via str() / bytes()
bytes() being the python 3 way?
MrScout
Yep
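(A sketch of the convert-or-fail idea above, with illustrative names. On Python 3, str() never fails and bytes(int) zero-fills, so this version restricts itself to the unambiguous cases and raises for everything else; the UTF-8 choice is an assumption.)

```python
def to_wire_bytes(msg):
    # Accept raw bytes as-is, encode text, reject everything else.
    if isinstance(msg, bytes):
        return msg
    if isinstance(msg, str):
        return msg.encode('utf-8')  # assumed encoding, not a library default
    raise TypeError('message of type %s is not convertible to bytes'
                    % type(msg).__name__)

rejected = False
try:
    to_wire_bytes([1, 2, 3])  # no sensible byte representation -> fail
except TypeError:
    rejected = True
```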
MrScout has quit
MrScout joined the channel
dpkp
so thinking about this a bit more
i think the larger fix has to be to support custom encode / decode functions when producing / consuming
the problem with adding an encoding layer now to support unicode -- something like data.encode()
is that we would also need to add data.decode() when consuming
and I don't think we can always assume that the data we're consuming is encoded in whatever the default is (like utf-8)
for example, msgpack.dumps([1,2,3]).decode() does not work
so i think for now we should just update the docs, perhaps add an explicit type-check to kafka.producer.send, and then refactor encoding/decoding in a future release
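(The msgpack point above, reproduced with only the stdlib: msgpack.dumps([1, 2, 3]) produces a fixarray header byte 0x93 followed by three fixints, and 0x93 is not a valid UTF-8 lead byte, so a blanket .decode() on the consumer side raises.)

```python
# The byte string msgpack produces for [1, 2, 3]: fixarray header
# (0x93) plus three positive fixints. Not valid UTF-8.
binary_payload = b'\x93\x01\x02\x03'

try:
    binary_payload.decode()  # defaults to utf-8
    decode_ok = True
except UnicodeDecodeError:
    decode_ok = False
# decode_ok is False: consuming can't assume a text encoding
```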
MrScout
You can always pitch it as "you must send bytes to this api"
which is probably what the api was meant for
MrScout_ joined the channel
MrScout has quit
dpkp
just pushed an update to tox.ini that adds a list of 10 slowest tests w/ timings to default test runs
MrScout_
Fancy
dpkp
pypy not happy w/ our gzip test: "test.test_producer_integration.TestKafkaProducerIntegration.test_produce_100k_gzipped: 322.8867s"
ha
MrScout_
Pypy is making me really disappointed lately.
It's not as fast as I'd really like it to be. :-/
I'm looking forward to figuring out what they can do with python 3 type annotations