#readthedocs


      • agj
        yeah, that ^
      • yeah not really sure
      • it would be helpful if someone more familiar with conda could test the image
      • i suspect changing the permission to install under docs user might make a difference
      • Blendify_
        ericholscher, is snide on irc at all?
      • humitos
        agj: ericholscher I will ask a friend for help on this, maybe he can point me to something that we are not finding
      • agj
        Blendify_: what do you need?
      • snide did the initial work on the theme, but is not an active maintainer anymore
      • if you have any design questions i can answer
      • Blendify_
        I opened a pr for a repo of his that we use as a dependency
      • agj
        humitos: that might help us. i don't have time to get pulled much deeper into this
      • humitos: i think testing the permission change might make sense though
      • this could be tested manually by running the image and manually updating conda
      • humitos
        agj: I just tried on my local pc (outside the RTD env and everything): I created one of the failing conda projects and installed the packages, and it worked without problems; I will try to test the permissions issue now
      • agj yes
      • xafer has quit
      • Blendify_
        https://github.com/snide/wyrm/pull/8 should fix some of our warnings
      • Blendify_ has quit
      • xafer joined the channel
      • MarkAtwood joined the channel
      • Blendify_ joined the channel
      • Blendify_ is now known as Blendify|afk
      • humitos
        I have a question that maybe doesn't have an easy answer: what's the main idea behind pinning the package versions and not requiring a `requirements.txt` file? I would say this is because RTD wants to support users without a sphinx/python project at all, who don't care/know about it. So, when creating a project on Read the Docs and hitting the URL, the user is suggested to create the `index.rst` to see their docs. If that is
      • the case, do you know how many projects RTD has with that kind of setup? (I mean, using all the default settings)
      • ericholscher
        yeah, it's meant to make it easy to onboard
      • also it allows us to build basic sphinx docs that don't have any python requirements without users having to futz with python packaging
      • humitos
        ok, great
      • should I have write permission in rtfd/readthedocs-docker-images repo?
      • ericholscher
        probably :)
      • humitos
        so, there is something that I'm doing wrong or I don't have :D
      • I will do a fork
      • ericholscher
        yep
      • fixed it
      • humitos
        there it goes, THE comment about the issue with conda :)
      • i don't understand why people get so angry when commenting on issues... it disappoints me :(
      • maintaining a FOSS project doesn't seem to be as cool as it looks, haha
      • ericholscher
        yuuup
      • humitos
        what's the _real_ value for the build time limit? the documentation says 15 min but it seems that it's only 10 (http://docs.readthedocs.io/en/latest/builds.htm...)
      • Blendify|afk has quit
      • r04r joined the channel
      • agj
        humitos: the setting is 15m on the build servers
      • humitos
        why do most of the failing builds end at ~590 seconds, then?
      • I jumped into the code and I saw this https://github.com/rtfd/readthedocs.org/blob/re...
      • agj
        is this with builds failing without an error response?
      • humitos
      • agj
        hrm, so this is the issue i've wanted to track down
      • docker's timeout is 15m
      • the celery timeout is 10m apparently?
      • it should be something like 15m + 20% overhead
      • that is, we want the celery task to outlive the docker container with enough overhead to account for VCS cloning, as that doesn't happen in the container
      • oh!
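The overhead rule agj describes could be sketched like this (a sketch of the intended relationship, not the actual RTD settings; the 900s value is the build servers' container timeout from this conversation):

```python
# The celery task timeout should outlive the docker container timeout
# by ~20%, since VCS cloning happens outside the container.
DOCKER_TIME_LIMIT = 900  # 15m: how long the container may run

CELERY_TIME_LIMIT = int(DOCKER_TIME_LIMIT * 1.2)  # 1080s = 18m

# The task must always survive longer than the container it supervises.
assert CELERY_TIME_LIMIT > DOCKER_TIME_LIMIT
```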
      • humitos
        I need to check what the difference is between soft_time_limit and time_limit, it's not clear to me
      • because you are increasing only the time_limit
      • agj
        one is a hard time limit, where the build is just killed
      • ok. found 1/2 the bug
      • the DOCKER_LIMITS setting isn't set on our web servers, so it defaults to 600, even though our builders specify 900
      • this is because the task is triggered from the webs
      • humitos
      • agj
        oh this explains everything actually!
      • so the builders are set with a timeout of 900s on the containers
      • so when the task is triggered, the task gets a timeout of 600s
      • this means the container will *never* reach a timeout state
      • instead the task will *always* be killed by celery
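The mismatch agj found could be sketched with the values that come up in this conversation (the 200m/600s defaults and the 1GB/900s builder overrides are taken from the chat, not verified against production):

```python
# Base-settings defaults vs the build servers' overrides, as described
# in this conversation.
BASE_DOCKER_LIMITS = {"memory": "200m", "time": 600}
BUILDER_DOCKER_LIMITS = {"memory": "1g", "time": 900}

# The celery timeout is set where the task is *queued* (the web servers,
# which only have the base defaults); the container timeout is enforced
# where the build *runs* (the build servers).
celery_timeout = BASE_DOCKER_LIMITS["time"]
container_timeout = BUILDER_DOCKER_LIMITS["time"]

# 600 < 900: celery always kills the task before the container can
# ever reach its own timeout.
assert celery_timeout < container_timeout
```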
      • humitos
        > the task is triggered from the webs
      • what that means?
      • agj
        the celery task is queued from the web servers
      • the build servers will pick up this task from celery
      • humitos
      • yeah, but it defaults to 600s because the DOCKER_LIMITS value is not overridden on the server?
      • agj
        on the web servers, correct
      • on the build servers, this is higher
      • humitos
        where did you get the 900s from? I don't find that in the code
      • agj
        i checked directly on the servers for the setting
      • humitos
        ahhhh! got it... build server and web server :)
      • agj
        we default to 600s in the base settings though, you are correct
      • humitos
        the web server triggers some celery task with a time limit of 600s, and the build server _should_ trigger the celery task with a time limit of 900s, but as it doesn't have the setting overridden it gets the default
      • agj
        oh sorry, yeah, i didn't explain that part
      • humitos: close
      • humitos
        the art of guessing, I practice that every day, don't worry :P
      • agj
        there are two timeouts: celery's task and docker's container timeout
      • the build servers have a docker container timeout of 900s
      • if builds hit this limit, they are killed, but the failure is reported
      • however
      • humitos
        yeah, up to there I'm following you
      • agj
        the web servers, when queueing these tasks are setting a celery timeout of 600s (because they are missing the 900s DOCKER_LIMITS['time'] = 900 in their production settings)
      • humitos
        cool!
      • agj
        so, the builds execute a task that is set to time out through celery in 600s and in docker at 900s
      • celery then kills this task and doesn't report, because we aren't handling the soft timeout properly
      • sorry if that's a lot of information :)
      • humitos
        no, I'm almost 100% with you, but it's tricky since I don't have access to the production settings, so... to me the documentation looked wrong by saying 1GB and 900s since I see 200m and 600s :)
      • agj
        yup, we deploy our servers with a custom settings file with some changes
      • humitos
        it seems to be something easy to fix by just defining DOCKER_LIMITS with the `time` key
      • agj
        i see two fixes here: add handling for celery.exceptions.SoftTimeLimitExceeded in our project build task (to report on celery killing the task) and patch our provisioning to give the web servers the correct docker limits
      • yup, i can take care of the second one, as it requires server provisioning
      • would you want to try at fixing the first part? i'd be happy to pair if you want
      • it should be a pretty straightforward addition