#deis

      • s1w
        is anyone around that's able to help me try and debug #4378?
      • joshua-anderson
        s1w: I'm around but I haven't spent much time with the registry. The Deis team mostly operates on PST and MDT, so if you ask during business hours they are much more likely to be around.
      • s1w
        joshua-anderson: yeah - opposite side of the world haha :(
      • NOTICE: [deis] carmstrong opened issue #4385: Stress testing results in client Error: invalid character '<' looking for beginning of value http://git.io/vGgJ6
      • wollner
        is anything strange happening with deis + aws on 1.9.0 or 1.9.1 (maybe any aws changes that could have impacted the deis components somehow)? Earlier yesterday I had a stable working cluster which simply stopped working... so I provisioned a new one, which was working until now, and it is not anymore...
      • again out of nowhere
      • joshua-anderson
        :( Were you getting the same error on both clusters?
      • wollner
        yes
      • I just did a fresh install about 1 hour ago
      • so far so good...and suddenly
      • my app is not answering anymore
      • joshua-anderson
        Have you used 'deisctl list' to see what components are up?
      • And deisctl journal to get the logs of failed components?
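As a rough illustration of the check suggested here, the sketch below scans a unit listing such as the output of 'deisctl list' for units that are not running. The column layout (UNIT, MACHINE, LOAD, ACTIVE, SUB), the sample text, and the helper name `find_unhealthy_units` are assumptions made for illustration; none of it is output captured from the cluster discussed in this log.

```python
from typing import List


def find_unhealthy_units(listing: str) -> List[str]:
    """Return unit names whose ACTIVE/SUB columns are not 'active'/'running'.

    Assumes whitespace-separated columns with a header row that names
    UNIT, ACTIVE and SUB (an assumption about the CLI's output layout).
    """
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    unit_i = header.index("UNIT")
    active_i = header.index("ACTIVE")
    sub_i = header.index("SUB")
    unhealthy = []
    for ln in lines[1:]:
        cols = ln.split()
        if len(cols) < len(header):
            continue  # skip rows that do not match the header layout
        if cols[active_i] != "active" or cols[sub_i] != "running":
            unhealthy.append(cols[unit_i])
    return unhealthy


# Illustrative sample only; not real output from any cluster.
SAMPLE = """\
UNIT                   MACHINE             LOAD    ACTIVE  SUB
deis-builder.service   machine-a/10.0.0.1  loaded  active  running
deis-router@1.service  machine-b/10.0.0.2  loaded  failed  failed
"""

if __name__ == "__main__":
    print(find_unhealthy_units(SAMPLE))
```

Anything it flags still needs a human look, since some unit types may legitimately sit in a state other than active/running; the next step would be 'deisctl journal' for the flagged component.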
      • wollner
        Detail:
      • aborting, failed to create some containers
      • trying to deis ps:scale my app
      • the journal says everything is up and running
      • joshua-anderson
        What about ceph and etcd status?
      • And router logs
      • wollner
        I don't see anything wrong with etcd
      • joshua: how can i access ceph logs?
      • joshua-anderson
        You can run 'deisctl install store-admin' and 'deisctl start store-admin'. Then ssh onto one machine in your cluster and enter the store-admin container with 'deisctl ssh deis-controller nse deis-store-admin'. Then try 'ceph -s'
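Once 'ceph -s' prints something, the part that matters most is the overall health status. A minimal sketch, assuming the status text contains one of Ceph's HEALTH_OK, HEALTH_WARN, or HEALTH_ERR tokens; the helper name `ceph_health` and the sample strings are hypothetical.

```python
import re
from typing import Optional

# Ceph reports overall cluster health as one of these three tokens.
HEALTH_RE = re.compile(r"\bHEALTH_(OK|WARN|ERR)\b")


def ceph_health(status_text: str) -> Optional[str]:
    """Return the first HEALTH_* token found in the given text, or None."""
    match = HEALTH_RE.search(status_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    # Illustrative input only; paste the real 'ceph -s' output instead.
    print(ceph_health("health HEALTH_WARN"))
```

Anything other than HEALTH_OK would be worth reading in full before looking elsewhere in the platform.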
      • wollner
        joshua-anderson: doing it right now... but it's really strange that the app stopped responding out of nowhere
      • joshua-anderson
        There are a few things that could be going wrong, but ceph and etcd are the most likely things to bring down a cluster.
      • wollner
        joshua: when I executed it, I got: Error: SSH_AUTH_SOCK environment variable is not set. Verify ssh-agent is running inside the cluster
      • joshua-anderson
        huh. Which command threw that error?
      • wollner
        deisctl ssh deis-controller nse deis-store-admin
      • joshua-anderson
        I'd just ssh in via aws or normal ssh then
      • Then run 'nse deis-store-admin'
      • wollner
      • joshua-anderson
        Well, the good news is that's working. :) Is there anything interesting in the router or registry logs? Also, are you seeing your units in 'fleetctl list-units'?
      • ^ Apps, not units
      • wollner
        my apps are not shown in fleetctl list-units
      • the router logs just print normal traffic (people are trying to access the website)
      • joshua-anderson
        Also look at the builder. The fact that they're not in fleet is really weird.
      • wollner
        joshua-anderson: I don't see anything wrong... the only thing strange to me is that a stable app, working for more than 3 months without issues, has been crashing for two days now
      • The first time it crashed I simply built a new cluster (didn't want to waste any more time)
      • joshua-anderson
        Do new apps work?
      • wollner
        Now the new cluster has the same problem
      • let me try
      • joshua-anderson
        I'd try scouring the logs of all the components to see if you can find any errors. I'd also take a look at `journalctl` on the coreos hosts to see if anything was logged there.
      • What version of coreos are you running?
      • wollner
        joshua-anderson: the deploy (git push deis master) is working fine so far... let's see
      • joshua-anderson: 647.2.0
      • the one recommended by Deis
      • joshua-anderson
        Cool. I was just making sure, as I'm running out of ideas ;)
      • wollner
        joshua-anderson: It is really freaky to me... I can't understand why it stopped responding like that... second time today, and now with a brand new 3 node cluster
      • joshua-anderson: version 1.9.x has given me so many headaches
      • joshua-anderson: version 1.6.x worked far better for me... had it running for 6 months with almost zero downtime
      • joshua-anderson
        Yeah. Normally this sort of thing only starts to happen if etcd or ceph starts to go down. I'd recommend looking into fleet if you can't find any errors in the deis platform itself.
      • wollner
        joshua-anderson: I'm not a sysadmin... deis up to now "just worked" for me... looking through all these logs is a nightmare to me...
      • joshua-anderson
        As far as I know nobody else has reported an error like this... if you figure out anything about the source of the error, opening an issue could be useful to get some maintainers looking into it.
      • wollner
        joshua-anderson: well, what would cause an app to stop responding to the deis controller?
      • joshua-anderson
        There are a few things that could be going on: (1) The router isn't routing to the app (the publisher component isn't reporting it as running, or etcd isn't working to let the router know). (2) The app isn't being run by the scheduler (from what you are telling me, this is the case). (3) Fleet is trying to run the app but failing because it can't pull the docker image from the registry, or docker can't run the image.
      • The 'failed to create some containers' error is interesting and hints towards the last two possibilities.
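The possibilities listed here can be read as a small decision procedure. The sketch below is one hedged way to write it down: the `Observations` fields, the `likely_cause` name, and the order in which the checks are tried are all assumptions made for illustration, not something Deis provides.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    # Do the app's units appear in 'fleetctl list-units'?
    app_units_in_fleet: bool
    # Does 'docker ps' on the hosts show the app's containers?
    containers_running: bool
    # Hypothetical flag: does the router know about the app
    # (publisher reporting it and etcd passing it on)?
    router_sees_app: bool


def likely_cause(obs: Observations) -> str:
    """Map observations to the candidate causes named in the conversation."""
    if not obs.app_units_in_fleet:
        return "scheduler: the app is not being run by the scheduler"
    if not obs.containers_running:
        return "image: fleet cannot pull the image from the registry, or docker cannot run it"
    if not obs.router_sees_app:
        return "routing: publisher or etcd is not telling the router about the app"
    return "unknown: none of the listed causes matches"


if __name__ == "__main__":
    # The situation described in this log: apps missing from fleet.
    print(likely_cause(Observations(False, False, False)))
```

Checking the scheduler first is a judgment call: if the units are not in fleet at all, the image and routing questions cannot be answered yet.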
      • wollner
        joshua-anderson: I hope this is not something related to memory
      • joshua-anderson
        What instances are you running deis on?
      • wollner
        joshua-anderson: I'm running the cluster on m3-large instances
      • joshua-anderson
        That should be fine, especially if none of your apps are running (I'd run 'docker ps' to check)
      • :P
      • wollner
        joshua-anderson: About deploying the new app
      • joshua-anderson
        Did it work?
      • wollner
        joshua-anderson: the deploy was fine, but then 'no build associated with this release to publish'
      • joshua-anderson
        Huh. The deploy didn't throw any errors?
      • wollner
        joshua-anderson: no
      • joshua-anderson
        What? Now I'm really confused. I'm not really sure how that is possible. That would imply that the builder isn't sending build information to the controller
      • wollner
        joshua-anderson: I ran a simple git push deis master
      • the deploy was all fine
      • but then tried to access the app... not responding
      • joshua-anderson
        What does 'deis releases' show and what does 'deis builds' show?
      • wollner
        then I looked into the logs with deis logs --app=prod
      • joshua-anderson: deis releases shows v1 and v2, btw
      • joshua-anderson
        But builds doesn't?
      • wollner
        builds prints the hash too
      • joshua-anderson
        But you get the error again when you try to scale?
      • wollner
        true
      • Scaling processes... but first, coffee!
      • 400 BAD REQUEST
      • Detail:
      • No build associated with this release
      • deis ps:scale web=1 --app=prod
      • joshua-anderson
        gh#4160 is a little similar
      • [o__o]
        Containers are out of sync with releases: https://github.com/deis/deis/issues/4160
      • joshua-anderson
        But it's not quite the same
      • wollner
        joshua-anderson: It is very strange anyway, bc this is a brand new cluster... nothing should be wrong now... I can't really investigate further (wish I could help you guys somehow). I am either going back to Heroku or downgrading to 1.6.x
      • which I know was pretty stable with the provisioned amazon aws
      • It wasn't a good strategy upgrading to 1.9.x right now
      • joshua-anderson
        Please open an issue if you can figure out anything else about this. Sorry you're having such a hard time (I promise it's not the norm :P)
      • wollner: Do you also know about migration upgrades? You can have 2 clusters running at the same time and migrate to the new one slowly. That's a really safe strategy for upgrading (you can always roll back if something goes wrong).
      • wollner
        joshua-anderson: Yes, I have been using Deis for 6 months now and I am happy with it, but the 1.9.x version was really a mistake... two new clusters with the same problem (btw, I followed the documentation, which is very easy)
      • joshua-anderson
        It's really odd though, cause lots of people use aws just like you are and nobody else has reported this.
      • wollner
        If there is anything specific to aws, then something is wrong with the cloudformation template, because I am using it to provision
      • joshua-anderson
        If it's an issue with clean clusters, it should have been caught by now.
      • wollner
        joshua-anderson: Are you sure that m3-larges can handle Deis? The default is m3-xlarge
      • joshua-anderson: the only strange thing I would suspect now is memory
      • joshua-anderson
        I could be wrong. I know that ci uses m3-medium, but that's very short lived.
      • A lot of people do use xlarge
      • pwalsh
        wollner: Deis Pro defaults to M3 large, and I've been playing with that (without real traffic) with success. If that is default on Deis Pro, I'd expect it is a good starting point