#kubevirt

      • doonhammer has quit
      • jniederm__ joined the channel
      • fabiand_ joined the channel
      • fabiand has quit
      • doonhammer joined the channel
      • vladikr has quit
      • vladikr joined the channel
      • jniederm__ has quit
      • biakymet has quit
      • rmohr joined the channel
      • mkletzan joined the channel
      • fabiand_
        rmohr, morning
      • rmohr
        fabiand_: hi!
      • fabiand_
        rmohr, I need your support to debug the issues
      • rmohr
        fabiand_: 5 minutes
      • fromani joined the channel
      • fabiand_: I need the events
      • fabiand_: make sure to collect them with --all-namespaces included
      • fabiand_: otherwise I don't see much
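(For reference, collecting the events rmohr asks for would look something like the following; a sketch using plain kubectl, not a command quoted from the chat.)

    # Dump events from every namespace; without --all-namespaces
    # kubectl only shows the current namespace's events.
    kubectl get events --all-namespaces --sort-by='.lastTimestamp'

    # Or stream them live while the tests run:
    kubectl get events --all-namespaces -w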
      • fabiand_
        aren't they?
      • I thought those were the events with all-namespaces
      • no ..
      • okay
      • let me re-run
      • rmohr
        fabiand_: I just saw the events from the default namespace in what you have collected yesterday
      • mzamazal joined the channel
      • fabiand_: also, there is a test replicaset included, which creates three VMs; you can deploy it and see how long it takes until all three VMs are ready
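(Trying that suggestion would look roughly like this; the manifest file name is hypothetical, standing in for the test replica set rmohr mentions.)

    # Deploy the test replica set, then watch until all three VMs are ready.
    kubectl create -f vm-replicaset.yaml
    kubectl get pods --all-namespaces -w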
      • fabiand_: I know now why it fails for you!
      • fabiand_
        oh
      • rmohr
        fabiand_: I think the issue is that it is pulling down the docker image
      • fabiand_
        tell me :)
      • here is an up to date paste:
      • I see a few weird lines:
      • default 2017-10-05 09:06:09 +0200 CEST 2017-10-05 09:06:09 +0200 CEST 1 haproxy-574c8d574-94w4z.14ea9abee0014edc Pod spec.containers{haproxy} Normal Killing kubelet, master Killing container with id docker://haproxy:Need to kill Pod
      • rmohr
        since your network connection is slow, it is not fast enough with downloading it
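(If a slow pull were the culprit, pre-fetching the image on each node would rule it out; the image name is the one discussed just below.)

    # On each node: pull ahead of time so the kubelet finds the
    # image locally instead of downloading it during pod start.
    docker pull kubevirt/cirros-registry-disk-demo:devel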
      • fabiand_
        docker://haproxy:Need
      • which image?
      • ain't all images present on the vms?
      • rmohr, ^^
      • default 2017-10-05 09:06:08 +0200 CEST 2017-10-05 09:06:08 +0200 CEST 7 libvirt.14ea9abe8327a0ad DaemonSet Warning FailedCreate daemon-set Error creating: pods "libvirt-" is forbidden: service account default/kubevirt-infra was not found, retry after the service account is created
      • rmohr
        fabiand_: I thought that one: kubevirt/cirros-registry-disk-demo:devel
      • lbednar joined the channel
      • fabiand_: but it says Container image "kubevirt/cirros-registry-disk-demo:devel" already present on machine
      • fabiand_: so that is not it
      • fabiand_: the service account is not an issue
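(The service-account warning from the event above can be checked directly; a sketch.)

    # Verify the account the daemon set complained about now exists.
    kubectl get serviceaccount kubevirt-infra -n default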
      • fabiand_
        kubevirt-test-default 2017-10-05 09:12:04 +0200 CEST 2017-10-05 09:12:04 +0200 CEST 1 virt-launcher-testvm8rm7b-----6ztx4.14ea9b1182a19103 Pod spec.containers{disk0} Warning Unhealthy kubelet, node0 Readiness probe errored: rpc error: code = Unknown desc = container not running (3702c136d527472f0796252ec8c15f332e7d7efbfeef111cca588fe36f958a3b)
      • rmohr
        fabiand_: it will retry and after a few times it is there
      • fabiand_: Error: failed to start container "disk0": Error response from daemon: cannot join network of a non running container: 09fc33c58f98565d8311dab910c5705722c9cb0d4009d77f6aa640dbacb35f75
      • fabiand_
        kubevirt-test-default 2017-10-05 09:12:15 +0200 CEST 2017-10-05 09:12:15 +0200 CEST 1 virt-launcher-testvmnmnrr-----2vhg2.14ea9b14074cf33d Pod spec.containers{compute} Warning Unhealthy kubelet, node0 Readiness probe errored: rpc error: code = Unknown desc = container not running (e348e53624151d2521a48f5ae5b2273721482f7ee5617e7d75ce836e338a094a)
      • rmohr
        fabiand_: Error: failed to start container "disk0": Error response from daemon: cannot join network of a non running container: 09fc33c58f98565d8311dab910c5705722c9cb0d4009d77f6aa640dbacb35f75
      • that one should be it
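(Digging into that failure could start from the container ID in the error; a sketch reusing the IDs and pod names from the pasted events.)

    # Why is the container whose network namespace disk0 tries to join not running?
    docker inspect --format '{{.State.Status}} {{.State.Error}}' \
        09fc33c58f98565d8311dab910c5705722c9cb0d4009d77f6aa640dbacb35f75

    # Per-container state of the launcher pod, as Kubernetes sees it.
    kubectl describe pod virt-launcher-testvm8rm7b-----6ztx4 -n kubevirt-test-default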
      • biakymet joined the channel
      • biakymet has quit
      • fabiand_
        other tests were not related to registry disks
      • rmohr, did you see the stack trace at the end of https://paste.fedoraproject.org/paste/sRkKz4vIp...
      • rmohr
        fabiand_: yes the stack trace is not so much an issue
      • fabiand_
        ok
      • rmohr
        fabiand_: our tests will try to clean up the namespace for one minute
      • fabiand_
        so is
      • VirtualMachineReplicaSet
      • /root/go/src/kubevirt.io/kubevirt/tests/replicaset_test.go:162
      • A valid VirtualMachineReplicaSet given
      • /root/go/src/kubevirt.io/kubevirt/tests/replicaset_test.go:161
      • should update readyReplicas once VMs are up [It]
      • pkotas joined the channel
      • /root/go/src/kubevirt.io/kubevirt/tests/replicaset_test.go:109
      • Timed out after 60.000s.
      • Expected
      • <int>: 0
      • to equal
      • <int>: 2
      • also using registry disk?
      • rmohr
        fabiand_: yes
      • fabiand_
        ah
      • rmohr
        fabiand_: so the stacktrace is printed because ginkgo kills all goroutines
      • fabiand_: so for some reason, the disk0 container does not start for ages because of this error (but sooner or later it seems to succeed)
      • fabiand_: this error: cannot join network of a non running container
      • fabiand_: so our tests time out
      • fabiand_
        hm
      • I'm reverting to before vossel's rework
      • let's see if they pass there
      • actually, vossel provided a patch for some issue
      • rmohr
        fabiand_: btw. they pass for me with vossel's change
      • fabiand_: they also pass on CI
      • fabiand_
        right
      • then it's really interesting to understand how this issue can be debugged
      • rmohr
        fabiand_: could be a bug in our code or in k8s, hard to tell: https://github.com/kubernetes/kubernetes/issues...
      • fabiand_: others see that too, but not much data there :/
      • fabiand_
        I'm checking now before vossel's changes, and then with his changes + his PRs
      • biakymet joined the channel
      • what-a-bot joined the channel
      • NOTICE: [kubevirt] rmohr opened pull request #490: Update debugging guide on how to collect events for all namespaces (master...debugging) https://git.io/vdBZi
      • what-a-bot has left the channel
      • rmohr, when did you recreate your vagrant env the last time?
      • rmohr
        fabiand_: yesterday
      • fabiand_
        hm
      • and NUM_NODES=1 ?
      • rmohr
        yes
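(Recreating the environment as discussed; NUM_NODES is the variable named in the chat, assuming the kubevirt Vagrantfile reads it.)

    # Tear down and rebuild the Vagrant cluster with one extra node.
    vagrant destroy -f
    NUM_NODES=1 vagrant up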
      • fabiand_
        ok
      • two failures
      • volumes for pod "virt-launcher-testvmpvhn8-----vv29b_kubevirt-test-default(fca2e188-a99c-11e7-a0ad-525400e23a00)": timeout expired waiting for volumes to attach/mount for pod "kubevirt-test-default"/"virt-launcher-testvmpvhn8-----vv29b". list of unattached/unmounted volumes=[sockets default-token-5fjcj]
      • actually
      • kubevirt-test-default 2017-10-05 09:17:45 +0200 CEST 2017-10-05 09:17:45 +0200 CEST 1 virt-launcher-testvmpvhn8-----vv29b.14ea9b60e4265409 Pod Warning FailedMount kubelet, node0 Unable to mount volumes for pod "virt-launcher-testvmpvhn8-----vv29b_kubevirt-test-default(fca2e188-a99c-11e7-a0ad-525400e23a00)": timeout expired waiting for volumes to attach/mount for pod "kubevirt-test-default"/"virt-launcher-testvmpvhn8-----vv29b". list of unattached/unmounted volumes=[sockets default-token-5fjcj]
      • rmohr, could you please take a look at the events again?
      • correlation between test and events is difficult
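(The FailedMount event above is easiest to correlate from the node side; a sketch assuming the kubelet on node0 runs under systemd.)

    # On node0: the kubelet's view of the stuck sockets/token mounts.
    journalctl -u kubelet --since '15 minutes ago' | grep -i 'testvmpvhn8\|mount'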
      • rmohr
        fabiand_: I see the same errors again
      • fabiand_
        so, how can I fix it?
      • rmohr,
      • rmohr
        fabiand: I don't know. Either an error in the registry disk code or in k8s
      • fabiand: the failed mounts are btw. always there
      • fabiand
        do you know why?
      • rmohr
        fabiand: no, but they appear in general, with varying frequency across k8s versions, also on ordinary pods
      • fabiand: so I think the only relevant one is "cannot join network of a non running container", since it will prevent virt-launcher from becoming "ready", and therefore the VM will not be started ...
      • fabiand goes back to v0.0.2-141-g6e76b8a and retries
      • fabiand
        right
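(v0.0.2-141-g6e76b8a is git-describe output; the suffix after the "g" names the commit, so going back to it is just a checkout.)

    # Check out the commit named by the describe string.
    git checkout 6e76b8a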
      • rmohr
        fabiand: I am not sure if that will help you, since you will still have the newer k8s version
      • fabiand
        are you suggesting to pin an older k8s version?
      • rmohr
        fabiand: could you try to update weave to the latest version?
      • fabiand
        currently rebuilding vagrant
      • biakymet has quit
      • rmohr
        fabiand: btw, are you using the docker cache?
      • fabiand
        not at all
      • rmohr
        fabiand: too bad
      • fabiand
        I'd say that it depends on the POV :)
      • rmohr
        fabiand: yes, if you have plenty of time why not
      • fabiand
        … or if I want to make sure that the whole chain works.
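(A pull-through cache of the kind rmohr hints at can be sketched with the stock registry image; host names and ports here are illustrative, not kubevirt's actual docker-cache setup.)

    # Run a pull-through cache in front of the Docker Hub.
    docker run -d -p 5000:5000 \
        -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
        --name registry-mirror registry:2

    # Then point the VMs' docker daemons at it via /etc/docker/daemon.json:
    #   { "registry-mirrors": ["http://<host-ip>:5000"] }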
      • rmohr
      • fabiand: that will install the latest weave version
      • fabiand: maybe the errors go away then
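(The link rmohr pasted is missing from this log; Weave Net's documented installer at the time looked like this, worth double-checking against the current docs.)

    # Install or upgrade Weave Net to the latest released version.
    kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"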