it would be helpful if someone more familiar with conda could test the image
i suspect changing the permissions so the install runs under the docs user might make a difference
Blendify_
ericholscher, is snide on irc at all?
humitos
agj: ericholscher I will ask a friend for help on this, maybe he can point me to something that we are not finding
agj
Blendify_: what do you need?
snide did the initial work on the theme, but is not an active maintainer anymore
if you have any design questions i can answer
Blendify_
I opened a pr for a repo of his that we use as a dependency
agj
humitos: that might help us. i don't have time to get pulled much deeper into this
humitos: i think testing the permission change might make sense though
this could be tested manually by running the image and manually updating conda
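A hypothetical way to run that manual check (the image tag, user name, and conda commands here are assumptions for illustration, not the actual RTD build image details):

```shell
# Enter the build image as the unprivileged docs user
# (image name and user are assumptions):
docker run -it --user docs readthedocs/build:latest /bin/bash

# Inside the container, try updating conda and installing a package,
# to see whether permissions on the install prefix cause the failure:
conda update --yes conda
conda install --yes sphinx
```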
humitos
agj: I just tried on my local pc (outside the RTD env and everything): I created one of the failing conda projects, installed all of the packages, and it worked without problems; I will try to test the permissions issue now
I have a question that maybe doesn't have an easy answer: what's the main idea behind pinning the package versions and not requiring a `requirements.txt` file? I would say it's because RTD wants to support users without a sphinx/python project at all, who don't care/know about it. So, when creating a project on read the docs and hitting the url, the user is prompted to create the `index.rst` to see their docs. If that's
the case, do you know how many projects RTD has with that kind of setup? (I mean, using all the default settings)
ericholscher
yeah, it's meant to make it easy to onboard
also it allows us to build basic sphinx docs that don't have any python requirements without users having to futz with python packaging
humitos
ok, great
should I have write permission in rtfd/readthedocs-docker-images repo?
ericholscher
probably :)
humitos
so, there is something I'm doing wrong, or a permission I don't have :D
I will do a fork
ericholscher
yep
fixed it
humitos
there it goes, THE comment about the issue with conda :)
i don't understand why people get so angry when commenting on issues... it disappoints me :(
maintaining a FOSS project doesn't seem to be as cool as it looks, haha
that is, we want the celery task to outlive the docker container with enough overhead to account for VCS cloning, as that doesn't happen in the container
oh!
humitos
I need to check what the difference between soft_time_limit and time_limit is, it's not clear to me
because you are increasing only the time_limit
agj
one is a hard time limit, where the build is just killed
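In celery terms: `soft_time_limit` raises `SoftTimeLimitExceeded` inside the task so it can clean up and report, while `time_limit` has the worker killed outright. A rough stdlib-only simulation of the soft behaviour (the exception name is borrowed from celery; this is an analogy on Unix signals, not celery itself):

```python
import signal
import time

class SoftTimeLimitExceeded(Exception):
    """Stand-in for celery.exceptions.SoftTimeLimitExceeded."""

def _on_alarm(signum, frame):
    raise SoftTimeLimitExceeded()

def run_with_soft_limit(task, seconds):
    # Like celery's soft_time_limit: the task gets an exception it
    # can catch, so it has a chance to report the failure.
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        task()
        return "success"
    except SoftTimeLimitExceeded:
        return "timed out (reported)"
    finally:
        signal.alarm(0)  # cancel any pending alarm

def slow_build():
    time.sleep(5)  # simulate a docs build that runs too long

print(run_with_soft_limit(slow_build, 1))  # → timed out (reported)
# A hard time_limit is different: celery SIGKILLs the worker process,
# so the task body never regains control and nothing is reported.
```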
ok. found 1/2 the bug
the DOCKER_LIMITS setting isn't set on our web servers, so it defaults to 600, even though our builders specify 900
this is because the task is triggered from the webs
so the builders are set with a timeout of 900s on the containers
so when the task is triggered, the task gets a timeout of 600s
this means the container will *never* reach a timeout state
instead the task will *always* be killed by celery
humitos
> the task is triggered from the webs
what does that mean?
agj
the celery task is queued from the web servers
the build servers will pick up this task from celery
humitos
yeah, but it defaults to 600s because the DOCKER_LIMITS value is not overridden on the server?
agj
on the web servers, correct
on the build servers, this is higher
humitos
where did you get the 900s from? I can't find that in the code
agj
i checked directly on the servers for the setting
humitos
ahhhh! got it... build server and web server :)
agj
we default to 600s in the base settings though, you are correct
humitos
the web server triggers the celery task with a time limit of 600s, and the build server _should_ trigger the celery task with a time limit of 900s, but as it doesn't have the setting overridden it gets the default
agj
oh sorry, yeah, i didn't explain that part
humitos: close
humitos
the art of guessing, I practice that every day, don't worry :P
agj
there are two timeouts: celery's task and docker's container timeout
the build servers have a docker container timeout of 900s
if builds hit this limit, they are killed, but the failure is reported
however
humitos
yeah, up to there I'm following you
agj
the web servers, when queueing these tasks, are setting a celery timeout of 600s (because they are missing `DOCKER_LIMITS['time'] = 900` in their production settings)
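Spelled out as a settings sketch (the `memory` values and the exact shape of the dict beyond the `time` key are assumptions based on the numbers mentioned in this conversation):

```python
# Base settings — what the web servers end up using, since they
# don't override this: tasks they queue time out via celery at 600s.
DOCKER_LIMITS = {'memory': '200m', 'time': 600}

# Build-server production settings — the override the web servers
# are missing, which sets the docker container timeout to 900s:
DOCKER_LIMITS = {'memory': '1g', 'time': 900}
```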
humitos
cool!
agj
so, the builds execute a task that is set to time out through celery in 600s and in docker at 900s
celery then kills this task and doesn't report, because we aren't handling the soft timeout properly
humitos
now, I'm almost 100% with you, but it's tricky since I don't have access to the production settings, so... for me the documentation was wrong saying 1GB and 900s, since I see 200m and 600s :)
agj
yup, we deploy our servers with a custom settings file with some changes
humitos
it seems to be something easy to fix by just defining `DOCKER_LIMITS` with the `time` key
agj
i see two fixes here: add handling for celery.exceptions.SoftTimeLimitExceeded in our project build task (to report on celery killing the task) and patch our provisioning to give the web servers the correct docker limits
yup, i can take care of the second one, as it requires server provisioning
would you want to try fixing the first part? i'd be happy to pair if you want
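A sketch of that first fix, assuming the build task body looks roughly like this (the function and reporter names are hypothetical; in the real code the exception comes from `celery.exceptions`, stubbed here so the sketch is stdlib-only):

```python
class SoftTimeLimitExceeded(Exception):
    """Stub for celery.exceptions.SoftTimeLimitExceeded."""

def update_docs(build_steps, report):
    """Hypothetical build task: run each step, and report when celery's
    soft time limit fires instead of dying silently."""
    try:
        for step in build_steps:
            step()
        report("Build succeeded")
    except SoftTimeLimitExceeded:
        # With soft_time_limit set below time_limit, celery raises this
        # inside the task, so we can record the failure before the
        # hard kill ever happens.
        report("Build killed: exceeded the time limit")

# Simulate celery interrupting a step with the soft-timeout exception:
results = []
def timed_out_step():
    raise SoftTimeLimitExceeded()

update_docs([timed_out_step], results.append)
print(results[0])  # → Build killed: exceeded the time limit
```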