#celery

      • Bonaparte
        Hello. group() is not working for me. http://paste2.org/ye0BNtt5
      • It just hangs forever
      • asksol
        Bonaparte: make sure worker is using same result backend as client
      • joh: rabbitmq has the concept of stopping flow, but I don't think there's any setting for queue size
      • joh: amqp consumers can use flow control to stop producers from publishing more messages
      • 500k jobs sounds reasonable to me, are you experiencing any problems?
      • using the 'amqp' result backend in this setup would be a bad idea; otherwise I think it should work, as long as the messages don't contain lots of data
      • You could also use multiple brokers that forward to one main broker, maybe
      • domino14
        is there a spec on how many tasks Celery Beat can handle?
      • a few years ago we had issues with scheduling more than ~1-2K tasks. tasks would just start skipping beats.
      • for example if a task was scheduled hourly, it would actually run every hour, then randomly skip a few hours, then run, then skip, etc.
      • asksol
        domino14: we have refactored beat in 4.0
      • domino14
        cool, what kind of improvements were made?
      • that's the main thing that's worrying me
      • asksol
        it was O(n); now it should be O(1), or similar
      • domino14
        :D
      • domino14
        how many tasks has it been tested with?
      • asksol
        we use a heapq to keep the next task first in the list
      • domino14
        and when is 4.0 coming out?
      • asksol
        I haven't tested it with a huge number of tasks, but theoretically it should handle them. There could still be bottlenecks at very large scale, so it would be nice to test
      • 4.0 is almost ready, just polishing and fixing bugs
      • some people already use it in production
      • it uses a similar algorithm to the internal timer in celery, and that handles millions of scheduled events without problems
      • I guess the bottleneck would be if a large number of events happen at the same time
      • but you could set up multiple beat instances with different schedules too
      • if you have many periodic tasks scheduled at the same time, we could maybe introduce some jitter to spread them apart
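The heapq approach described above can be sketched roughly like this (not Celery's actual code; the names are invented). The soonest deadline always sits at `heap[0]`, so checking whether anything is due is O(1) and rescheduling an entry is O(log n), instead of scanning every entry each tick:

```python
import heapq
import time

schedule = []  # heap of (next_run_time, task_name, interval) tuples

def add_entry(name, interval, now):
    heapq.heappush(schedule, (now + interval, name, interval))

def tick(now):
    """Run whatever is due and push it back with a fresh deadline."""
    due = []
    # Only the heap front is ever inspected: O(1) when nothing is due.
    while schedule and schedule[0][0] <= now:
        _when, name, interval = heapq.heappop(schedule)
        due.append(name)
        heapq.heappush(schedule, (now + interval, name, interval))
    return due

now = time.monotonic()
add_entry('hourly-report', 3600, now)
add_entry('cleanup', 60, now)
due = tick(now + 90)  # 90s later, only the 60-second task is due
```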
      • joh
        asksol: ok, so you're saying there shouldn't really be a problem with 500k jobs in the queue per se? My amqp server is consuming lots of memory and cpu usage is very high. The messages do contain quite a lot of data though (sending rather big numpy arrays as arguments to the task).
      • asksol
        joh: it's certainly not optimal, but if that's the volume you need to process then flow control may not make it much better
      • as then you're blocking the clients, which is rarely an option if you're sending tasks from webservers
      • it may be easier to provision a more powerful machine than to set up some elaborate flow control scheme
      • giant numpy arrays can be a problem if the message is megabytes big, and you have 500k of them
      • rabbitmq may duplicate messages, so the memory required may exceed the message size; it's usually better to store large data in shared storage, especially if the same array is shared between multiple tasks
      • there are distributed filesystems that try to optimize for data locality too
      • so e.g. you can send a job to a worker that is closer to the data
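A hedged sketch of the shared-storage idea (the helper names are invented, and a local temp dir stands in for real shared storage such as a network mount or object store): write the array once, and send only a small reference in the task message:

```python
import os
import tempfile

import numpy as np

def store_array(arr, directory):
    """Save the array once; return a small reference for the message."""
    fd, path = tempfile.mkstemp(suffix='.npy', dir=directory)
    os.close(fd)
    np.save(path, arr)
    return path

def load_array(path):
    """Called inside the task, on a worker that can reach the storage."""
    return np.load(path)

shared_dir = tempfile.mkdtemp()  # stand-in for real shared storage
ref = store_array(np.arange(1_000_000), shared_dir)
# The task message now carries a short path instead of ~8 MB of data,
# and many tasks can share the same array without re-sending it.
arr = load_array(ref)
```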
      • joh
        asksol: ok, I think I need to find a better way to distribute my data arrays then, as they are quite big and the same for a lot of the tasks!
      • asksol
        that's a good idea; also try to send fewer task messages for tasks that operate on the same data, e.g. instead of having a chain of tasks a.s() | b.s() | c.s(), call them as one task
      • Zeedox
        I have some issues with getting logging working correctly with Django. I can only get warning or higher levels to print.
      • But my root logger is set to info level, with a streamhandler to stdout as handler.
      • asksol
        task logging? or celery logs in general? The task loggers do not propagate to the root logger, as they have a different log format
      • at least as set up by the celery logging configuration; if you disable CELERYD_HIJACK_ROOT_LOGGER or listen to the setup_logging signal, it won't configure any loggers at all
      • Zeedox
        Ah, the standard logging module.
      • For logging inside task execution.
      • asksol
        sorry, disabling hijack root logger will still set up the loggers, but if something is listening to the setup_logging signal it won't
      • so:
            @celery.signals.setup_logging.connect
            def setup_celery_logging(**kwargs):
                pass
      • will force celery to use the Django logging configuration
      • Zeedox
        I've added task_logger calls at debug level and up, logger calls at debug level and up, and a print statement to a debug task. Only statements at warning level or higher are output, plus the print statement.
      • asksol
        if you mess up you may end up with the worker not emitting anything at all, which is why we hijack the root logger
      • Zeedox
        asksol: Ok! Interesting, then maybe one of our dependencies is doing something behind my back.
      • Might CELERY_REDIRECT_STDOUTS_LEVEL interfere?
      • asksol
        That should only affect what you print to stdout/stderr in the task
      • if you print to stdout/stderr redirect_stdouts_level will be the severity used when logging what you print
      • Zeedox
        Ah, I see.
      • asksol
        The worker will set the loglevel of the root logger and of the task logger to the --loglevel argument passed to the worker, but not if you listen to the setup_logging signal
      • Zeedox
        Ah, great, thanks!
      • Listening to the setup_logging signal did the trick.
      • asksol
        we would like to respect outside configuration by default, but there's a lot of misbehaving libraries out there and people get confused when the worker doesn't emit anything
      • (which is understandable:)
      • Zeedox
        Most of our code is handled by the root logger and expects to be output to stdout, so that causes issues I guess.
      • asksol
        I guess we could have an actual configuration variable for this, it must be way more common now that Django has logging support
      • Zeedox
        What's the "recommended" way of logging with celery, if you don't mind me asking?
        There seem to be a few ways to define it.
      • asksol
        The only recommendation is to define loggers at module level, usually one per module
      • Zeedox
        Using get_task_logger or standard library?
      • asksol
        get_task_logger if you want to have the task name in the log format etc
      • the logging stdlib module is lacking guidelines for libraries, so it's been more of a journey
      • Zeedox
        Ah, right!
      • asksol
        It's just using the logging library so you can do pretty much anything you want, we just try to configure the loggers by default to ensure we have output