Well... maybe hooray; maybe not. gRPC Python shouldn't, and I believe doesn't, do anything with the channel arguments other than pass them to gRPC Core.
tedward
nathanielmanista: the behavior we're seeing is that on linux it always round-robins, regardless of which policy is specified, and on windows it always picks first
(or it always sends requests to the same endpoint)
nathanielmanista
Sadly I don't think the keys and values of channel arguments are printed with GRPC_VERBOSITY=debug GRPC_TRACE=api. Maybe with GRPC_TRACE=all?
g3p0
we set GRPC_TRACE=round_robin
tedward
we can see from grpc core log output that the round_robin.cc code path is being invoked
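A minimal sketch of how the tracing variables discussed above could be set for a client run. GRPC_VERBOSITY and GRPC_TRACE are the variables named in the conversation; the helper name grpc_debug_env and the script name client.py are hypothetical, and which tracer names are available depends on the gRPC Core version in use.

```python
import os


def grpc_debug_env(tracers, verbosity="debug", base=None):
    """Return a copy of the environment with gRPC Core tracing enabled.

    tracers is a list of tracer names, e.g. ["round_robin"] or ["api"].
    GRPC_TRACE takes them as a comma-separated list.
    """
    env = dict(os.environ if base is None else base)
    env["GRPC_VERBOSITY"] = verbosity
    env["GRPC_TRACE"] = ",".join(tracers)
    return env


# Hypothetical usage: run the client in a child process so the variables
# are in place before gRPC Core is loaded.
#   import subprocess, sys
#   subprocess.run([sys.executable, "client.py"],
#                  env=grpc_debug_env(["round_robin"]))
```

Setting the variables on a child process avoids any question of whether they were exported before the gRPC library was first imported.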
nathanielmanista
This is sounding more and more like a Core problem. |Pixel|, are you feeling similarly?
endpoint = 'dns:///cluster.foo.com' where cluster.foo.com resolves to four A records.
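For reference, a sketch of how a load-balancing policy is commonly requested from gRPC Python through channel arguments. The log does not show the actual client code, so the "grpc.lb_policy_name" argument, the insecure channel, and the helper names here are assumptions rather than what was actually run.

```python
# Endpoint from the discussion; it resolves to four A records.
ENDPOINT = "dns:///cluster.foo.com"


def lb_channel_options(policy):
    """Build the channel-argument list that requests a given LB policy."""
    if policy not in ("pick_first", "round_robin"):
        raise ValueError("unexpected LB policy: %r" % (policy,))
    # gRPC Python passes this key/value pair through to gRPC Core unchanged.
    return [("grpc.lb_policy_name", policy)]


def make_channel(policy, target=ENDPOINT):
    """Create a channel using the requested policy (assumes grpcio is installed)."""
    import grpc  # imported lazily so lb_channel_options works without grpcio

    return grpc.insecure_channel(target, options=lb_channel_options(policy))
```

Because the Python layer only forwards these arguments, a policy that behaves differently per platform with identical options points toward Core rather than the wrapper.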
nathanielmanista
As for compiling gRPC Python with a debug flag on: we don't particularly support any compilation over any other but if you find us doing something particularly obstructive please let us know about it?
g3p0
Oh the environment variables were key. Thanks.
nathanielmanista
g3p0: have you learned something? Something you're able to share?
here, i've grepped the above logs for the log line where it selects the subchannel: https://paste.ee/p/OYTbs
you can see that in linux it selects pretty randomly from indexes 0-3
and in windows it mostly selects index 1, sometimes 0
btw, we're only making 3 grpc requests in each of these runs, so i'm not sure why it hits the round_robin.cc select code so many times
g3p0
Poking through the logs, in windows land we are seeing that the subchannel lists and channels are being canceled and unreffed, but the same code on linux land just works.
notcarl joined the channel
nathanielmanista
g3p0, tedward: apologies for passing you along like a hot potato but we're beyond the frontiers of my familiarity with the code. I'm trying to find you another team member to come along. :-)
g3p0
Thanks!
notcarl
well, it's friday afternoon in the bay, you might need to start a thread in our support forum
g3p0
Always but who knows who will show up.
I think we traced it down to the round_robin policy not exactly working on windows for some reason.
In linux land it appears to be round_robin, even though it is using pick_first. It just happens quicker so it seems like it is balanced.
notcarl
rr is pretty balanced, every new rpc gets a new subchannel
well, the next one in the list anyways
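The difference between the two policies as described above can be illustrated with a toy model. This is only a sketch of the expected pick order, not gRPC Core's implementation; it assumes every subchannel is connected and ready, and the function names are made up for illustration.

```python
from itertools import cycle, islice


def round_robin_picks(subchannels, n):
    """Each new RPC gets the next subchannel in the list, wrapping around."""
    return list(islice(cycle(subchannels), n))


def pick_first_picks(subchannels, n):
    """Every RPC goes to the same subchannel (here, the first in the list)."""
    return [subchannels[0]] * n


if __name__ == "__main__":
    indexes = [0, 1, 2, 3]  # four A records, as in the discussion
    print(round_robin_picks(indexes, 6))
    print(pick_first_picks(indexes, 6))
```

Comparing the logged subchannel indexes against these two patterns is one way to tell which policy a run actually followed.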
g3p0
Yeah, but on windows land the channels and channel_lists when using the rr policy end up being cancelled and unreffed
notcarl
cancelled because the connection is torn down, or because it was never valid to begin with?