#docker-dev

      • shykes
        So, 2 things
      • 1) yes we should allow every and all customization of the system networking stack. Specify IP, specify bridge, multiple interfaces, macvlan, tunnels, custom netfilter rules, whatever
      • 2) But only in a way that it doesn't affect inter-container communication (ie. application-level networking)
      • that means, first, we have to upgrade links to no longer depend on the underlying IP stack
      • and use file descriptor passing instead (with optional in-namespace adapters)
      • BeanDip
        darren0: ":" will not work well when you are also specifying a mac address
      • shykes
        So I am going to sequence things
      • darren0
        BeanDip: parsing will still work fine. the first : is the delimiter.
      • lk4d4
        maybe --net=bridge --net=ip:... --net=gateway:... ?
      • cpuguy83
        BeanDip: mac just wouldn't use the colons in it.
      • lk4d4
        "," looks like hell
      • vish_
        shykes: go on.
      • shykes
        first, let's upgrade links
      • then, when links are shielded from people fucking around with their system networking stack, we can merge these system networking changes
      • darren0
        wait... are you saying that before you can do vish_'s proposal you want to fix links?
      • shykes
        yes
      • unless it doesn't break links
      • does it?
      • darren0
        why? vish_'s proposal is simple and small in scope and doesn't break the contract with libcontainer. so no portability issue or anything
      • lk4d4
        nope
      • vish_
        shykes: Specifying IPs should not break links
      • shykes
        this is not about libcontainer
      • darren0
        we're not asking for all possible changes, just don't use ipallocator and let the user tell you the IP
      • shykes
        will inter-container traffic still be routable?
      • lk4d4
        darren0: hmmm, ipallocator checks for IP collisions
      • why should we avoid it?
      • darren0
        lk4d4: ip collisions are up to the user now
      • shykes: yes, the inter-container traffic will be routable through the host networking, but it's really up to the user to ensure that works properly since they are taking over management of the IP space
      • shykes
        it's up to the user if it's the user calling docker directly
      • if it's a third-party orchestration tool, it's up to that tool not to break it
      • and I'm seeing examples of orchestration tools abusing those options to break links
      • and implement their own service discovery system
      • lk4d4
        shykes: because links aren't very cool right now :)
      • shykes
        I've been pretty tolerant of that because we didn't really have a good solution to portable service discovery anyway
      • but now we do
      • darren0
        shykes: why's that bad? if they respect the current format of the link ENV vars, why not let them?
      • shykes
        that's what I mean, they don't
      • darren0
        then shame on them, they broke the contract
      • shykes
        fair enough
      • Ok, so specifying IP seems reasonable. What about different bridges?
      • Wouldn't that break routing?
      • darren0
        i think higher level orchestration tools should be able to inject their own service discovery as long as they conform to the docker "interface", which today is only the link ENV vars
      • cpuguy83
        If the host can route it, it shouldn't break anything... but also I don't think his PR has custom bridges, I could be wrong.
      • darren0
        i think the custom bridge will be a separate PR, vish_, right?
      • vish_
        darren0: Yes. That is PR #6704
      • lk4d4
        darren0: I never use link ENV vars, only /etc/hosts records :)
      • darren0
        lk4d4: how do you know the PORT then?
      • cpuguy83
        lk4d4: Yes, this was the most awesome feature added to Docker ever.
      • lk4d4
        why do I need the port? I run all my applications on default ports as long as they're on different IPs
      • cpuguy83
        darren0: You generally have to know the port using the envvars anyway.
      • vish_
        shykes: When users specify the bridge to use, the host networking is outside of docker's contract right?
      • lk4d4
        all my postgreses on 5432 for example
      • darren0
        shykes: links today are L3 routing, so if i have two containers on two bridges, they *should* route fine. i'm thinking there could be a hiccup with the NAT rules, but probably not
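        [Aside: a rough illustration of the routing point, not something stated in the discussion: with two bridges on one host, say docker0 on 172.17.0.0/16 and a custom br1 on 172.18.0.0/16, the kernel's connected routes plus IP forwarding already carry traffic between them; it is mainly docker's MASQUERADE/DNAT rules that might need a second look. Addresses below are illustrative.]
            # illustrative checks on the host
            ip addr show docker0            # e.g. 172.17.42.1/16
            ip addr show br1                # e.g. 172.18.42.1/16
            sysctl net.ipv4.ip_forward      # should print 1 for inter-bridge forwarding
            ip route                        # connected routes exist for both subnets
            iptables -t nat -L POSTROUTING  # docker's MASQUERADE rules live here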
      • BeanDip
        Service discovery can definitely live within the docker realm if you are 100% dockerized in your app stack
      • lk4d4
        we have sort of service discovery with fleet from coreos and ambassadors
      • BeanDip
        However, there is a high degree of likelihood that people are going to be using containers as a component of a hybrid stack and have need to utilize a third party service discovery piece
      • lk4d4
        but network engineers are pretty afraid of ambassadors
      • they want multihost links
      • darren0
        cpuguy83: lk4d4: you're supposed to look for TCP_MYSQL_3306 (or whatever the format is), and that tells you 1.1.1.1:45123. if you're just looking up the IP and then using the container's 3306, you're doing it wrong and that will make fancy service discovery harder in the future
      • erikh
        the problem with ENV is re-linking
      • BeanDip
        lk4d4: that's because ambassadors provide a very rudimentary security and policy model
      • cpuguy83
        darren0: But that's not what happens, it does MYSQL_PORT_3306_TCP_PORT=3306
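        [Aside: for reference, the link ENV vars being debated look roughly like this inside a container started with --link db:mysql against a container exposing 3306/tcp; the address is illustrative.]
            MYSQL_NAME=/web/mysql
            MYSQL_PORT=tcp://172.17.0.5:3306
            MYSQL_PORT_3306_TCP=tcp://172.17.0.5:3306
            MYSQL_PORT_3306_TCP_ADDR=172.17.0.5
            MYSQL_PORT_3306_TCP_PORT=3306
            MYSQL_PORT_3306_TCP_PROTO=tcp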
      • darren0
        i think we've gotten off on a tangent. shykes, do you have further comments?
      • erikh
        e.g., if the web service is already started, and the mysql container needs to be replaced
      • lk4d4
        and this is hell
      • darren0
        cpuguy83: you're not supposed to know that =3306, you're screwing with the contract really.
      • cpuguy83
        But we are talking about vish_'s PR, which is about being able to skip the IP allocator and just use a custom IP
      • vish_
        shykes: Are you ok with changing '--net' to take key-value pairs, which will include IP, gateway and possibly MAC in the future?
      • This should not break links.
      • lk4d4
        I'm for an array of pairs for this
      • like -v and --link
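        [Aside: a sketch of how a repeatable key:value form could look; this is illustrative only, the keys and exact syntax were not settled in this meeting.]
            # hypothetical syntax, repeatable like -v and --link;
            # values would split on the first ':' so MAC addresses still parse
            docker run \
                --net bridge \
                --net ip:172.17.0.100 \
                --net gateway:172.17.42.1 \
                --net mac:02:42:ac:11:00:64 \
                busybox ip addr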
      • shykes
        I don't think it's a tangent
      • but yeah, I'm fine with it
      • darren0
        lk4d4: does that internally break the docker remote API? I assume the remote API
      • lk4d4: has it as a string today
      • shykes
        I'm fine with adding these capabilities
      • not sure about the exact syntax
      • anything that breaks backward compatibility of flags and api is going to be hard to merge
      • vish_
        shykes: This change should be backwards compatible.
      • Ok I will go ahead and send a PR out some time soon. We can continue this discussion on that PR.
      • tianon: I think you can move on to the next topic now... :)
      • tianon
        haha
      • good call
      • it's closely related
      • #topic gh#6101 - allocating IPs from CIDR (fkautz)
      • [o__o]
        Implement allocating IPs from CIDR within bridge network : https://github.com/dotcloud/docker/pull/6101
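        [Aside: the capability gh#6101 is after, sketched with daemon flags; the flag names here are illustrative of the idea, not necessarily what existed at the time or what was merged.]
            # give the bridge 10.10.0.1/16 but hand out container IPs
            # only from the 10.10.1.0/24 sub-range
            docker -d --bip=10.10.0.1/16 --fixed-cidr=10.10.1.0/24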
      • tianon
        lk4d4's favorite topic
      • vish_
        :)
      • lk4d4
        shykes: it's really waiting for your word :)
      • it's the seventh week already :D
      • hah, that's funny
      • channel full of who cares
      • dockerbot
        [docker] vieux closed pull request #6938: Style fixes (master...style_fixes) http://git.io/uUvo5g
      • tianon
        looks like fkautz isn't here ;)
      • vbatts|work
        :-\
      • he was around just a bit ago
      • lk4d4
        what's more important is that shykes isn't here :)
      • fkautz
        I'm on
      • Shykes isn't here :p
      • I'll find another way to ping him
      • Next topic
      • tianon
        will do
      • #topic gh#6791 - run docker from a unit file (darren0)
      • [o__o]
        Propose better way to run docker from a unit file: https://github.com/dotcloud/docker/issues/6791
      • darren0
        I'd like to talk about gh#6791 which is about running docker containers from a systemd unit.
      • I'd like to keep this discussion focused solely on one specific issue, not a general systemd/docker discussion.
      • This will be a bit of a monologue, just hang in there and let me explain this first.
      • The main problem is that the only way to launch docker from a systemd unit today is to have systemd actually monitor the docker client, not the docker server
      • You have to do ExecStart=docker start -a or =docker run, without the -d. This will keep the docker client running in the foreground.
      • So the state of your systemd unit really has nothing to do with the state of your docker container.
      • You can kill the docker client, the container is still running, and systemd will say the unit is stopped/failed.
      • To try to get this to work nicely is really hard, and in the end just doesn't work.
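        [Aside: the pattern darren0 is describing looks roughly like this; systemd ends up supervising the docker client, not the container's PID 1. All names here are illustrative.]
            # /etc/systemd/system/myapp.service (illustrative)
            [Unit]
            Description=myapp container
            After=docker.service
            Requires=docker.service

            [Service]
            # clean up any stale container from a previous run; ignore failure
            ExecStartPre=-/usr/bin/docker rm -f myapp
            # systemd tracks this client process; the container outlives it
            ExecStart=/usr/bin/docker run --name myapp myimage
            ExecStop=/usr/bin/docker stop myapp

            [Install]
            WantedBy=multi-user.target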
      • After some hackery I've found that to get systemd to properly monitor the docker container and not the client, you need two things
      • First, you need to tell systemd what pid to monitor, so pid 1 of your container needs to be communicated to systemd
      • Second, you need the pid 1 of your container to be in the name=systemd cgroup of your systemd unit.
      • If you minimally do those two things, the state of your systemd unit and the docker container seem to stay in sync
      • You can reliably start/stop/kill from systemd, and you can also docker stop/kill, or you can directly kill the PID 1. It all seems to stay in sync.
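        [Aside: the two requirements, shown by hand for a unit named myapp.service; the cgroup path and the use of systemd-notify are illustrative, not part of the proposal's wording.]
            # 1) put the container's PID 1 into the unit's name=systemd cgroup
            echo "$CONTAINER_PID1" > /sys/fs/cgroup/systemd/system.slice/myapp.service/cgroup.procs
            # 2) tell systemd which PID it should actually supervise
            #    (requires NotifyAccess to permit the sending process)
            systemd-notify MAINPID="$CONTAINER_PID1"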
      • What I'm currently proposing is the following systemd specific change.
      • I'd like to add an option "docker run --from-systemd-unit" which is used specifically for when running docker in a systemd unit
      • This flag will minimally do two things
      • First, it will pick up the $NOTIFY_SOCKET and the cgroup path of the name=systemd cgroup and pass them on to the docker daemon
      • Now when the docker daemon launches your container, we don't use cgroup/systemd but instead go back to cgroup/fs (we can discuss why when I'm done with this monologue)
      • So we set up the cgroups using cgroup/fs, but the only difference is that we put the PID of PID 1 into name=systemd/cgroup.procs.
      • Then, right before we exec the user's process in dockerinit, we notify systemd of the PID by writing "MAINPID=1234" to $NOTIFY_SOCKET.
      • That's it, if we do that minimally, systemd should monitor the docker container and not the docker client and docker from a systemd unit will be far more feasible.
      • Thoughts? Proceed to point out the flaws in my logic...
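        [Aside: under the proposal, a unit could then look something like this. The --from-systemd-unit flag is the one proposed above and does not exist; Type=notify and NotifyAccess=all are assumptions about how the MAINPID message coming from dockerinit would be accepted.]
            # illustrative sketch of the proposed usage
            [Service]
            Type=notify
            NotifyAccess=all
            ExecStart=/usr/bin/docker run --from-systemd-unit --name myapp myimage
            ExecStop=/usr/bin/docker stop myapp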
      • lk4d4
        darren0: did you look at coreos fleet?
      • we use it in production
      • darren0
        yes, the root of this is an issue with fleet
      • fleet calls systemd calls docker
      • systemd->docker is unreliable, therefore fleet is unreliable
      • lk4d4
        and unit is controlled by etcd records IIRC
      • darren0
        so fleet doesn't solve anything