only linux can set the buffer size, so not really a solution, could be interesting to tweak for performance
and that won't help if the buffer is cleared
ionelmc
asksol: what if you set the buffer to 0
or some small value
for linux there's F_SETPIPE_SZ
asksol
still would not help unless the parent was constantly reading
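For reference, a minimal sketch of what tweaking the pipe buffer looks like from Python. It assumes Linux; the fallback numbers are the raw `F_SETPIPE_SZ`/`F_GETPIPE_SZ` values from the kernel headers, used only where the `fcntl` module does not expose the names (Python before 3.10). `shrink_pipe` is a hypothetical helper, not part of billiard or celery. As far as I know the kernel rounds small requests up to at least one page, so the buffer cannot actually be set to 0.

```python
import fcntl
import os

# Linux-only commands; fall back to the raw values from <linux/fcntl.h>
# when the fcntl module does not define them (Python < 3.10).
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def shrink_pipe(fd, size):
    """Ask the kernel for a pipe buffer of `size` bytes; return what it granted."""
    fcntl.fcntl(fd, F_SETPIPE_SZ, size)
    return fcntl.fcntl(fd, F_GETPIPE_SZ)


r, w = os.pipe()
default_size = fcntl.fcntl(w, F_GETPIPE_SZ)
granted = shrink_pipe(w, 1)  # request the smallest buffer the kernel allows
```

Even with the smallest buffer, a write that fits in it still returns before the parent has read anything, which is the point made above.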
ionelmc
asksol: is it hard to switch to dgram ?
asksol: how about this idea: before death, the child sends an "exit sentinel" and then waits on a read for around 1 second. when the parent gets the "exit sentinel" it sends the child an "ok to die" sentinel.
asksol
that's probably easier to do than switching to dgram
don't think it needs to wait for one second, it can just wait and exit if the fd is closed
but hmm
if the exitcode is 155
can you not be sure that the result was written?
the result is only used for logs
and that will even change in 3.2
where the logging will happen in the child
so it doesn't really need the return value of the task, it only needs to know that it completed
so it could simply log a success without an actual return value until 3.2
as in, it succeeded but we don't know what it returned, the state is still updated
rarely do you need to see the return value in logs
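A rough sketch of the "log a success without a return value" idea. The meaning of exit code 155 ("the task finished") is taken from this discussion and treated as an assumption; `EX_DONE` and `describe_outcome` are hypothetical names, and a throwaway subprocess stands in for the pool child.

```python
import subprocess
import sys

# Assumption from the discussion: a child exiting with 155 finished its task.
EX_DONE = 155


def describe_outcome(returncode, result=None):
    """Build the log message from the exit code when the result never arrived."""
    if result is not None:
        return "succeeded: %r" % (result,)
    if returncode == EX_DONE:
        return "succeeded (return value not received)"
    return "failed with exit code %d" % returncode


# Stand-in for a pool child that completed its task and exited with 155.
proc = subprocess.run([sys.executable, "-c", "import sys; sys.exit(155)"])
message = describe_outcome(proc.returncode)
```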
ionelmc
well it does help sometimes
asksol: hmmm
asksol: the child needs to wait on something otherwise we're back to the original race condition
asksol
what is that?
ionelmc
the close from child/read from parent race
what is the synq for?
asksol
we don't need the return value, only need to know if the task completed (which 155 implies)
synq is not completed, it's intended for safe termination
of tasks
ionelmc
yeah but i don't like to discard results
i want to explore alternatives a bit
what's the 'safe termination'
asksol
it's so that revoke+terminate means terminate the task, not the process
terminate is for manual administration, if used improperly it may very well terminate a different task
so synq would be a feature that can be enabled
ionelmc
asksol: it could terminate a different task? both celery 3.0 and 3.1 ?
asksol
it's for terminating the process, not the task
not to be used programmatically
but if you see a process that is stuck it's fine
ionelmc
asksol: about the sentinel
so i'm looking at state_handlers in pool.py:ResultHandler
i would add a new state handler for DEATH and send that token from the child
right before exit
and then i wait for confirmation (get another token from the queue)
asksol
that will work, but another vector for a deadlock
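The sentinel handshake being discussed can be sketched outside of billiard with two plain pipes and `os.fork` (Unix only). Everything here is hypothetical: the token values, `child_main` and `run_once` are made up for illustration, and a real implementation would need proper message framing instead of checking for a trailing byte. The child stops waiting when the ack arrives, when the parent closes its end (EOF), or when the timeout expires, whichever comes first.

```python
import os
import select

EXIT_SENTINEL = b"X"  # hypothetical token, child -> parent: "about to exit"
OK_TO_DIE = b"K"      # hypothetical token, parent -> child: "everything was read"


def child_main(result_w, ack_r, payload, timeout=1.0):
    try:
        os.write(result_w, payload + EXIT_SENTINEL)
        # Returns on ack, on EOF (parent closed the pipe), or after `timeout`.
        select.select([ack_r], [], [], timeout)
    finally:
        os._exit(0)


def run_once(payload=b"result"):
    result_r, result_w = os.pipe()
    ack_r, ack_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(result_r)
        os.close(ack_w)
        child_main(result_w, ack_r, payload)
    os.close(result_w)
    os.close(ack_r)
    data = b""
    while not data.endswith(EXIT_SENTINEL):
        chunk = os.read(result_r, 4096)
        if not chunk:
            break  # child went away without sending the sentinel
        data += chunk
    try:
        os.write(ack_w, OK_TO_DIE)
    except BrokenPipeError:
        pass  # child already gave up waiting and exited
    os.close(ack_w)
    _, status = os.waitpid(pid, 0)
    os.close(result_r)
    result = data[:-1] if data.endswith(EXIT_SENTINEL) else None
    return result, status


result, status = run_once(b"hello")
```

The timeout only bounds how long the child lingers. It does not remove the deadlock concern if the parent never gets around to reading.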
ionelmc
now i want to send that token from the handler in state_handlers
hmmmm, how?
asksol
how long does the child wait for confirmation?
there is no safe timeout
ionelmc
i suppose we could have some timeout mechanism
can't we?
asksol
what does a timeout mean?
ionelmc
recv should have a timeout
i mean, we can set the timeout on the pipe
and then switch the pipe to blocking
oh wait, the timeouts in python sockets aren't actual timeouts, python just selects for a while on the socket right ?
asksol
it's already blocking in the child, so I guess the timeout will result in an error for the task
ionelmc
well no, the result would already be sent by then
no way to have an error that way
asksol
that's how all socket timeouts are implemented I believe
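"Doing the same" on a pipe amounts to selecting on the fd for a while before reading. A minimal sketch, where `read_with_timeout` is a hypothetical helper and not an existing billiard function:

```python
import os
import select


def read_with_timeout(fd, nbytes, timeout):
    """Return up to nbytes from fd, b"" on EOF, or None if nothing arrived in time."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    return os.read(fd, nbytes)


r, w = os.pipe()
empty = read_with_timeout(r, 1, 0.05)  # nothing written yet, so this times out
os.write(w, b"K")
token = read_with_timeout(r, 1, 0.05)  # one byte is waiting
os.close(w)
eof = read_with_timeout(r, 1, 0.05)    # writer closed: readable, read gives b""
```

The fd itself stays blocking. Only the wait is bounded, and a closed writer shows up as an empty read rather than a timeout.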
ionelmc
well i guess we could do the same
no big deal
now i can't figure how to put something in the queue from the state handler
i've tried to pass _quick_put to the ResultHandler constructor but turns out it's None - what the hell is going on ?
asksol
ionelmc: result handler cannot do that you need to use _write_job
or more exactly, you need to use pool.apply_async()
sending a different payload than TASK would be a lot of work for a temporary workaround till 3.2