I set the CELERY_DEFAULT_QUEUE option to a queue, and then unset it
but now all tasks go to that queue
the setup is something like this: I have a machine running `celery multi start ...` that serves as the 'default' queue (I don't have a -Q option set)
and then I have another machine running `celery multi start ... -Q my_queue`, which I only want to run specific tasks on
previously, it was all working as expected: if I specified the queue using send_task(queue='my_queue'), the task would execute on that queue, and all other tasks would go to the default queue
now, even after unsetting the CELERY_DEFAULT_QUEUE option, tasks get routed to BOTH the default queue and my_queue
using redis as a backend... any reason this might be happening?
it seems that my queue definition was changed permanently
I don't really want to define the queues manually; is there a way to purge the queue definition so that I can just use the -Q option?
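If declaring the queues explicitly turns out to be acceptable after all, a minimal sketch of the relevant settings (old-style `CELERY_*` setting names assumed; queue names taken from the discussion above):

```python
from kombu import Queue

# Explicit queue declarations make routing unambiguous: tasks only
# reach my_queue when sent there explicitly via send_task(queue=...).
CELERY_QUEUES = (
    Queue("celery"),    # default queue, consumed by the worker without -Q
    Queue("my_queue"),  # dedicated queue, consumed by the -Q my_queue worker
)
CELERY_DEFAULT_QUEUE = "celery"
```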
goschtl1
hi, how can I stop a consumer built with kombu? Ctrl+C does not work
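One common pattern for this (a stdlib-only sketch of the idea, not kombu's actual API; kombu's `ConsumerMixin` exposes the same idea through its `should_stop` attribute): turn Ctrl+C into a flag that the consume loop checks between messages, instead of letting the interrupt kill a blocking read.

```python
import signal
import threading

# Stop-flag pattern: the SIGINT handler sets a flag, and the consume
# loop checks it between iterations for a clean shutdown.
stop = threading.Event()

def handle_sigint(signum, frame):
    stop.set()  # request shutdown instead of dying mid-message

if threading.current_thread() is threading.main_thread():
    signal.signal(signal.SIGINT, handle_sigint)  # handlers only install in the main thread

def consume_loop(get_message):
    """Drain messages via get_message() until stopped or exhausted."""
    processed = 0
    while not stop.is_set():
        msg = get_message()  # in a real consumer: drain_events(timeout=...)
        if msg is None:
            break
        processed += 1
    return processed
```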
schmiddy
Hi, I am running into problems with stuck celery workers, which seems to happen when the workers lose their database (RDS Postgres) connection. Somehow the workers become permanently stuck afterwards. Strace of the workers looks a bit similar to https://github.com/celery/celery/issues/2080 . Anything I should be checking into to help diagnose the problem?
asksol
schmiddy: is that the strace of a child process?
I find it hard to believe something will be stuck on gettimeofday, so it's more likely it's stuck in the task
bmbouter
schmiddy: you could attach to it with gdb (with python-debuginfo installed) and show its stack trace
I literally was doing that this morning to look into a stuck parent process
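A stdlib alternative to attaching gdb, if restarting the worker is an option (the registration has to run at startup): `faulthandler` can dump every thread's Python stack on demand via a signal.

```python
import faulthandler
import signal

# Register once at worker startup; afterwards `kill -USR1 <pid>`
# prints every thread's Python stack to stderr without stopping the process.
faulthandler.register(signal.SIGUSR1)
```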
asksol
are you using the database to store task results? otherwise it wouldn't make sense for celery itself to be affected by it
bmbouter
(not celery's fault in this case, just occurring inside celery)
asksol
remember also `celery -A proj inspect active`, which will show you a list of running tasks
schmiddy
asksol: hm, I am not sure whether I was looking at the parent or child. One task I am testing is invoked with "-c 8", has PID 22070, with PPID 22061. That parent PID, 22061, belongs to circusd.
asksol
you can `pip install setproctitle` to see the difference in ps listings (but need to restart worker)
schmiddy
asksol, yes I tried `celery -A proj inspect active` -- last time all tasks were hung it showed all tasks as being idle
asksol
another cumbersome way to find it is to call celery -A proj inspect stats
which will include a list of child process pids
schmiddy
and yep, I've got setproctitle installed already
asksol
if tasks are still running it means that the tasks are blocking the worker
you can use --time-limit
but better to handle timeouts manually, and only use --time-limit as a backup
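"Handle timeouts manually" here means giving each blocking call inside the task its own deadline, so the task fails fast instead of wedging the worker. A minimal stdlib sketch of that idea (the function name, host, and port are hypothetical):

```python
import socket

def fetch_banner(host, port, deadline=5.0):
    """Read up to 1024 bytes, but never block longer than `deadline` seconds."""
    try:
        with socket.create_connection((host, port), timeout=deadline) as sock:
            sock.settimeout(deadline)  # bound each recv(), not just the connect()
            return sock.recv(1024)
    except (socket.timeout, OSError):
        return None  # report the failure; the worker stays responsive
```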
schmiddy
yeah, I've tried --time-limit=100 .. saw log entries about "Hard time limit (100.0s) exceeded for ... Process 'Worker-8' pid:18264 exited with 'signal 9 (SIGKILL)' ... ", but the task still seemed stuck
asksol: ah, so from looking at `celery -A proj inspect stats`, I see for a queue with concurrency of 8, 8 child processes --- guess I was looking at the strace of the parent process before
fwiw, with setproctitle installed, all the child tasks seem to show up with a process title like "/opt/venvs/1.29.4/bin/python -c from billiard.forking import main; main() --billiard-fork 47"
So, the child processes seem to all be stuck at: "read(20, "
and the parent process is doing a ton of the "epoll_ctl(3, EPOLL_CTL_DEL, ... ) = -1 ENOENT" , which seemed to be similar to cameronmaske's strace in Issue #2080
asksol
schmiddy: are you on windows?
I guess not, given the epoll, but it seems like you are using force_execv when you shouldn't
schmiddy
asksol: Ubuntu 12.04 x86_64 (EC2 instance)
btw, ALL of those epoll_ctl calls return ENOENT after the hang happens.. I watched with `sudo strace -p 22070 2>&1 | grep epoll_ctl | grep -v ENOENT` for a few minutes without seeing any matches
I can try bmbouter's suggestion about getting a backtrace from gdb with python-dbg installed next. The issue is pretty easy for me to reproduce by forcing an RDS failover.