Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial start of SSVM sometimes fails resulting in failed builds #87

Open
remibergsma opened this issue Oct 1, 2016 · 0 comments
Open

Comments

@remibergsma
Copy link
Member

remibergsma commented Oct 1, 2016

We see builds hanging waiting on the templates to become ready. This never happens, because the deployment/start of the ssvm fails somehow.

On the KVM agent the VM is never started. All it gets is a stopCommand to clean it up:

2016-10-01 03:55:33.179 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] com.cloud.agent.service.Agent            : Request:Seq 2-2335397881768312842:  { Cmd , MgmtId: 90520732674657, via: 2, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"checkBeforeCleanup":false,"isProxy":false,"vmName":"s-2-VM","executeInSequence":false,"wait":0}}] }
2016-10-01 03:55:33.179 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] com.cloud.agent.service.Agent            : Processing command: com.cloud.agent.api.StopCommand
2016-10-01 03:55:33.179 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.k.r.LibvirtComputingResource       : Processing cmd com.cloud.agent.api.StopCommand
2016-10-01 03:55:33.179 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.kvm.resource.LibvirtConnection     : Looking for libvirtd connection at: qemu:///system
2016-10-01 03:55:33.181 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.kvm.resource.LibvirtConnection     : Can not find KVM connection for Instance: s-2-VM, continuing.
2016-10-01 03:55:33.181  WARN (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.kvm.resource.LibvirtConnection     : Can not find a connection for Instance s-2-VM. Assuming the default connection.
2016-10-01 03:55:33.181 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.kvm.resource.LibvirtConnection     : Looking for libvirtd connection at: qemu:///system
2016-10-01 03:55:33.184 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.k.r.LibvirtComputingResource       : Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 's-2-VM'
2016-10-01 03:55:33.185 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.k.r.LibvirtComputingResource       : Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 's-2-VM'
2016-10-01 03:55:33.185 DEBUG (logid: ade9abca) 10333 --- [agentRequest-Handler-1] com.cloud.agent.service.Agent            : Request:Seq 2-2335397881768312843:  { Cmd , MgmtId: 90520732674657, via: 2, Ver: v1, Flags: 100111, [{"org.apache.cloudstack.storage.command.CopyCommand":{"srcTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/1/3/","origUrl":"http://cloudstack.apt-get.eu/systemvm/4.6/systemvm64template-4.6.0-kvm.qcow2.bz2","uuid":"f470b845-698c-4d5b-af58-3682b75f58e2","id":3,"format":"QCOW2","accountId":1,"checksum":"c059b0d051e0cd6fbe9d5d4fc40c7e5d","hvm":false,"displayText":"SystemVM Template (KVM)","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://192.168.22.1:/data/storage/secondary/MCCT-SHARED-1","_role":"Image"}},"name":"routing-3","hypervisorType":"KVM"}},"destTO":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"origUrl":"http://cloudstack.apt-get.eu/systemvm/4.6/systemvm64template-4.6.0-kvm.qcow2.bz2","uuid":"f470b845-698c-4d5b-af58-3682b75f58e2","id":3,"format":"QCOW2","accountId":1,"checksum":"c059b0d051e0cd6fbe9d5d4fc40c7e5d","hvm":false,"displayText":"SystemVM Template (KVM)","imageDataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"812ea6a3-7ad0-30f4-9cab-01e3f2985b98","id":1,"url":"NetworkFilesystem://192.168.22.1/data/storage/primary/MCCT-KVM-1/?ROLE=Primary&STOREUUID=812ea6a3-7ad0-30f4-9cab-01e3f2985b98","poolType":"NetworkFilesystem","host":"192.168.22.1","path":"/data/storage/primary/MCCT-KVM-1","port":2049}},"name":"routing-3","hypervisorType":"KVM"}},"executeInSequence":true,"options":{},"options2":{},"wait":10800}}] }
2016-10-01 03:55:33.185 DEBUG (logid: ade9abca) 10333 --- [agentRequest-Handler-1] com.cloud.agent.service.Agent            : Processing command: org.apache.cloudstack.storage.command.CopyCommand
2016-10-01 03:55:33.185 DEBUG (logid: ade9abca) 10333 --- [agentRequest-Handler-1] c.c.h.k.r.LibvirtComputingResource       : Processing cmd org.apache.cloudstack.storage.command.CopyCommand
2016-10-01 03:55:33.185 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.k.r.LibvirtComputingResource       : Failed to get dom xml: org.libvirt.LibvirtException: Domain not found: no domain with matching name 's-2-VM'
2016-10-01 03:55:33.185 DEBUG (logid: 3bd85172) 10333 --- [agentRequest-Handler-4] c.c.h.k.r.LibvirtComputingResource       : Executing: /opt/cosmic/agent/./scripts/vm/network/security_group.py destroy_network_rules_for_vm --vmname s-2-VM 

On the mgt server side we see that it wants to spin a new SSVM:

2016-10-01 03:55:32.698 DEBUG [c.c.s.s.SecondaryStorageManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Zone 1 is ready to launch secondary storage VM
2016-10-01 03:55:32.698 INFO  [c.c.s.s.SecondaryStorageManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Scanning secondary storage pool 1
2016-10-01 03:55:32.699 INFO  [c.c.s.s.SecondaryStorageManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Found less (0) secondary storage VMs than image stores (1) in dcId=1, starti
ng 1 new VMs
2016-10-01 03:55:32.699 INFO  [c.c.s.s.SecondaryStorageManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Resizing secondary storage pool (dcId=1) with action com.cloud.vm.AfterScanA
ction@3fd95ab1[action=EXPAND,value=1]
2016-10-01 03:55:32.699 DEBUG [c.c.s.SystemVmManagerBase] (logid: 4a2cdd27) (ctx: ca3f04b1) Expanding pool [iteration 1/1]
2016-10-01 03:55:32.700 INFO  [c.c.s.s.SecondaryStorageManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) No stopped secondary storage vm is available, need to allocate a new seconda
ry storage vm
2016-10-01 03:55:32.714 DEBUG [c.c.s.s.SecondaryStorageManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Assign secondary storage vm from a newly started instance for request from d
ata center : 1

It provisions a new one:

2016-10-01 03:55:32.733 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Found existing network configuration for offering [Network Offering [1-Public-System-Public-Network]: Ntwk[200|Public|1]
2016-10-01 03:55:32.733 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Releasing lock for Acct[61609720-8779-11e6-8d58-5254001daa61-system]
2016-10-01 03:55:32.735 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Found existing network configuration for offering [Network Offering [3-Control-System-Control-Network]: Ntwk[202|Control|3]
2016-10-01 03:55:32.735 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Releasing lock for Acct[61609720-8779-11e6-8d58-5254001daa61-system]
2016-10-01 03:55:32.736 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Found existing network configuration for offering [Network Offering [2-Management-System-Management-Network]: Ntwk[201|Management|2]
2016-10-01 03:55:32.736 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Releasing lock for Acct[61609720-8779-11e6-8d58-5254001daa61-system]
2016-10-01 03:55:32.742 DEBUG [c.c.v.VirtualMachineManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocating entries for VM: VM[SecondaryStorageVm|s-2-VM]
2016-10-01 03:55:32.743 DEBUG [c.c.v.VirtualMachineManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocating nics for VM[SecondaryStorageVm|s-2-VM]
2016-10-01 03:55:32.744 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocating nic for vm VM[SecondaryStorageVm|s-2-VM] in network Ntwk[200|Public|1] with requested profile NicProfile[0-0-null-null-null
2016-10-01 03:55:32.752 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocating nic for vm VM[SecondaryStorageVm|s-2-VM] in network Ntwk[202|Control|3] with requested profile null
2016-10-01 03:55:32.753 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 47775c69) (ctx: 723322da) Submitting Async Job class org.apache.cloudstack.framework.jobs.impl.VmWorkJobVO for object type VmWorkJobQueue
2016-10-01 03:55:32.754 DEBUG [o.a.c.e.o.NetworkOrchestrator] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocating nic for vm VM[SecondaryStorageVm|s-2-VM] in network Ntwk[201|Management|2] with requested profile null
2016-10-01 03:55:32.757 DEBUG [c.c.v.VirtualMachineManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocating disks for VM[SecondaryStorageVm|s-2-VM]
2016-10-01 03:55:32.761 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 47775c69) (ctx: 723322da) Sync job-14 execution on object VmWorkJobQueue.1
2016-10-01 03:55:32.767 DEBUG [c.c.v.VirtualMachineManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocation completed for VM: VM[SecondaryStorageVm|s-2-VM]
2016-10-01 03:55:32.769 DEBUG [c.c.a.SecondaryStorageVmAlertAdapter] (logid: 4a2cdd27) (ctx: ca3f04b1) received secondary storage vm alert

That seems to complete just fine (deployment/resources/etc):

2016-10-01 03:55:32.767 DEBUG [c.c.v.VirtualMachineManagerImpl] (logid: 4a2cdd27) (ctx: ca3f04b1) Allocation completed for VM: VM[SecondaryStorageVm|s-2-VM]

This is the start job:

2016-10-01 03:55:33.406 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (logid: d4f62c28) (ctx: 38e4a9ae) (job: 13/job: 15) Add job-15 into job monitoring
2016-10-01 03:55:33.410 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 3bd85172) (ctx: 38e4a9ae) (job: 13/job: 15) Executing AsyncJobVO {id:15, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 90520732674657, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sat Oct 01 03:55:32 CEST 2016}

This however fails:

2016-10-01 03:55:33.939 ERROR [c.c.v.VmWorkJobHandlerProxy] (logid: 3bd85172) (ctx: f0c74c94) (job: 13/job: 15) Invocation exception, caused by: java.lang.ClassCastException: net
.sf.cglib.proxy.Enhancer$EnhancerFactoryData cannot be cast to net.sf.cglib.proxy.Factory
2016-10-01 03:55:33.939 INFO  [c.c.v.VmWorkJobHandlerProxy] (logid: 3bd85172) (ctx: f0c74c94) (job: 13/job: 15) Rethrow exception java.lang.ClassCastException: net.sf.cglib.proxy.Enhancer$EnhancerFactoryData cannot be cast to net.sf.cglib.proxy.Factory
2016-10-01 03:55:33.939 DEBUG [c.c.v.VmWorkJobDispatcher] (logid: 3bd85172) (job: 13/job: 15) Done with run of VM work job: com.cloud.vm.VmWorkStart for VM 2, job origin: 13
2016-10-01 03:55:33.940 ERROR [c.c.v.VmWorkJobDispatcher] (logid: 3bd85172) (job: 13/job: 15) Unable to complete AsyncJobVO {id:15, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAAnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 90520732674657, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sat Oct 01 03:55:32 CEST 2016}, job origin:13
com.cloud.exception.CloudException: Caught exception while handling a VmWorkJob
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:104)
        at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:3477)
        at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:89)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$1.runInContext(AsyncJobManagerImpl.java:260)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:39)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:38)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:84)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:35)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:36)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:206)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: net.sf.cglib.proxy.Enhancer$EnhancerFactoryData cannot be cast to net.sf.cglib.proxy.Factory
        at org.apache.cloudstack.framework.async.AsyncCallbackDispatcher.getTarget(AsyncCallbackDispatcher.java:79)
        at org.apache.cloudstack.storage.volume.VolumeServiceImpl.createBaseImageAsync(VolumeServiceImpl.java:160)
        at org.apache.cloudstack.storage.volume.VolumeServiceImpl.createVolumeFromTemplateAsync(VolumeServiceImpl.java:756)
        at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.recreateVolume(VolumeOrchestrator.java:1187)
        at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.prepare(VolumeOrchestrator.java:1028)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1936)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1769)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:88)
        ... 14 common frames omitted
2016-10-01 03:55:33.954 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 3bd85172) (job: 13/job: 15) Complete async job-15, jobStatus: FAILED, resultCode: 0, result: rO0ABXNyACJjb20uY2xvdWQuZXhjZXB0aW9uLkNsb3VkRXhjZXB0aW9ueeiOG_HkRMcCAAJMAAtjc0Vycm9yQ29kZXQAE0xqYXZhL2xhbmcvSW50ZWdlcjtMAAZpZExpc3R0ABVMamF2YS91dGlsL0FycmF5TGlzdDt4cgATamF2YS5sYW5nLkV4Y2VwdGlvbtD9Hz4aOxzEAgAAeHIAE2phdmEubGFuZy5UaHJvd2FibGXVxjUnOXe4ywMABEwABWNhdXNldAAVTGphdmEvbGFuZy9UaHJvd2FibGU7TAANZGV0YWlsTWVzc2FnZXQAEkxqYXZhL2xhbmcvU3RyaW5nO1sACnN0YWNrVHJhY2V0AB5bTGphdmEvbGFuZy9TdGFja1RyYWNlRWxlbWVudDtMABRzdXBwcmVzc2VkRXhjZXB0aW9uc3QAEExqYXZhL3V0aWwvTGlzdDt4cHNyABxqYXZhLmxhbmcuQ2xhc3NDYXN0RXhjZXB0aW9ugAAFzs5n5VwCAAB4cgAaamF2YS5sYW5nLlJ1bnRpbWVFeGNlcHRpb26eXwZHCjSD5QIAAHhxAH4AA3EAfgAMdABcbmV0LnNmLmNnbGliLnByb3h5LkVuaGFuY2VyJEVuaGFuY2VyRmFjdG9yeURhdGEgY2Fubm90IGJlIGNhc3QgdG8gbmV0LnNmLmNnbGliLnByb3h5LkZhY3Rvcnl1cgAeW0xqYXZhLmxhbmcuU3RhY2tUcmFjZUVsZW1lbnQ7AkYqPDz9IjkCAAB4cAAAABpzcgAbamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW50YQnFmiY23YUCAARJAApsaW5lTnVtYmVyTAAOZGVjbGFyaW5nQ2xhc3NxAH4ABkwACGZpbGVOYW1lcQB-AAZMAAptZXRob2ROYW1lcQB-AAZ4cAAAAE90AD1vcmcuYXBhY2hlLmNsb3Vkc3RhY2suZnJhbWV3b3JrLmFzeW5jLkFzeW5jQ2FsbGJhY2tEaXNwYXRjaGVydAAcQXN5bmNDYWxsYmFja0Rpc3BhdGNoZXIuamF2YXQACWdldFRhcmdldHNxAH4AEAAAAKB0ADZvcmcuYXBhY2hlLmNsb3Vkc3RhY2suc3RvcmFnZS52b2x1bWUuVm9sdW1lU2VydmljZUltcGx0ABZWb2x1bWVTZXJ2aWNlSW1wbC5qYXZhdAAUY3JlYXRlQmFzZUltYWdlQXN5bmNzcQB-ABAAAAL0cQB-ABZxAH4AF3QAHWNyZWF0ZVZvbHVtZUZyb21UZW1wbGF0ZUFzeW5jc3EAfgAQAAAEo3QAPW9yZy5hcGFjaGUuY2xvdWRzdGFjay5lbmdpbmUub3JjaGVzdHJhdGlvbi5Wb2x1bWVPcmNoZXN0cmF0b3J0ABdWb2x1bWVPcmNoZXN0cmF0b3IuamF2YXQADnJlY3JlYXRlVm9sdW1lc3EAfgAQAAAEBHEAfgAccQB-AB10AAdwcmVwYXJlc3EAfgAQAAAHkHQAJmNvbS5jbG91ZC52bS5WaXJ0dWFsTWFjaGluZU1hbmFnZXJJbXBsdAAeVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbC5qYXZhdAAQb3JjaGVzdHJhdGVTdGFydHNxAH4AEAAABulxAH4AInEAfgAjcQB-ACRzcQB-ABD____-dAAkc3VuLnJlZmxlY3QuTmF0aXZlTWV0aG9kQWNjZXNzb3JJbXBsdAAdTmF0aXZlTWV0aG9kQWNjZXNzb3JJbXBsLmphdmF0AAdpbnZva2Uwc3EAfgAQAAAAPnEAfgAncQB-ACh0AAZpbnZva2VzcQB-ABAAAAArdAAoc3VuLnJlZmxlY3QuRGVsZWdhdGluZ01ldGhvZEFjY2Vzc29ySW1wbHQAIURlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcGwuamF2YXEAfgArc3EAfgAQAAAB8nQAGGphdmEubGFuZy5yZWZsZWN0Lk1ldGhvZHQAC01ldGhvZC5qYXZhcQB-ACtzcQB-ABAAAABYdAAiY29tLmNsb3VkLnZtLlZtV29ya0pvYkhhbmRsZXJQcm94eXQAGlZtV29ya0pvYkhhbmRsZXJQcm94eS5qYXZhdAAPaGFuZGxlVm1Xb3JrSm9ic3EAfgAQAAANlXEAfgAicQB-ACNxAH4ANXNxAH4AEAAAAFl0ACBjb20uY2xvdWQudm0uVm1Xb3JrSm9iRGlzcGF0Y2hlcnQAGFZtV29ya0pvYkRpc3BhdGNoZXIuamF2YXQABnJ1bkpvYnNxAH4AEAAAAQR0AD9vcmcuYXBhY2hlLmNsb3Vkc3RhY2suZnJhbWV3b3JrLmpvYnMuaW1wbC5Bc3luY0pvYk1hbmFnZXJJbXBsJDF0ABhBc3luY0pvYk1hbmFnZXJJbXBsLmphdmF0AAxydW5JbkNvbnRleHRzcQB-ABAAAAAndAA-b3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29udGV4dFJ1bm5hYmxlJDF0ABtNYW5hZ2VkQ29udGV4dFJ1bm5hYmxlLmphdmF0AANydW5zcQB-ABAAAAAmdABCb3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5pbXBsLkRlZmF1bHRNYW5hZ2VkQ29udGV4dCQxdAAaRGVmYXVsdE1hbmFnZWRDb250ZXh0LmphdmF0AARjYWxsc3EAfgAQAAAAVHQAQG9yZy5hcGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuaW1wbC5EZWZhdWx0TWFuYWdlZENvbnRleHRxAH4ARXQAD2NhbGxXaXRoQ29udGV4dHNxAH4AEAAAACNxAH4ASHEAfgBFdAAOcnVuV2l0aENvbnRleHRzcQB-ABAAAAAkdAA8b3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29udGV4dFJ1bm5hYmxlcQB-AEFxAH4AQnNxAH4AEAAAAM5xAH4APHEAfgA9cQB-AEJzcQB-ABAAAAH_dAAuamF2YS51dGlsLmNvbmN1cnJlbnQuRXhlY3V0b3JzJFJ1bm5hYmxlQWRhcHRlcnQADkV4ZWN1dG9ycy5qYXZhcQB-AEZzcQB-ABAAAAEKdAAfamF2YS51dGlsLmNvbmN1cnJlbnQuRnV0dXJlVGFza3QAD0Z1dHVyZVRhc2suamF2YXEAfgBCc3EAfgAQAAAEdnQAJ2phdmEudXRpbC5jb25jdXJyZW50LlRocmVhZFBvb2xFeGVjdXRvcnQAF1RocmVhZFBvb2xFeGVjdXRvci5qYXZhdAAJcnVuV29ya2Vyc3EAfgAQAAACaXQALmphdmEudXRpbC5jb25jdXJyZW50LlRocmVhZFBvb2xFeGVjdXRvciRXb3JrZXJxAH4AV3EAfgBCc3EAfgAQAAAC6XQAEGphdmEubGFuZy5UaHJlYWR0AAtUaHJlYWQuamF2YXEAfgBCc3IAJmphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVMaXN0_A8lMbXsjhACAAFMAARsaXN0cQB-AAh4cgAsamF2YS51dGlsLkNvbGxlY3Rpb25zJFVubW9kaWZpYWJsZUNvbGxlY3Rpb24ZQgCAy173HgIAAUwAAWN0ABZMamF2YS91dGlsL0NvbGxlY3Rpb247eHBzcgATamF2YS51dGlsLkFycmF5TGlzdHiB0h2Zx2GdAwABSQAEc2l6ZXhwAAAAAHcEAAAAAHhxAH4AY3h0ACtDYXVnaHQgZXhjZXB0aW9uIHdoaWxlIGhhbmRsaW5nIGEgVm1Xb3JrSm9idXEAfgAOAAAAD3NxAH4AEAAAAGhxAH4AM3EAfgA0cQB-ADVzcQB-ABAAAA2VcQB-ACJxAH4AI3EAfgA1c3EAfgAQAAAAWXEAfgA4cQB-ADlxAH4AOnNxAH4AEAAAAQRxAH4APHEAfgA9cQB-AD5zcQB-ABAAAAAncQB-AEBxAH4AQXEAfgBCc3EAfgAQAAAAJnEAfgBEcQB-AEVxAH4ARnNxAH4AEAAAAFRxAH4ASHEAfgBFcQB-AElzcQB-ABAAAAAjcQB-AEhxAH4ARXEAfgBLc3EAfgAQAAAAJHEAfgBNcQB-AEFxAH4AQnNxAH4AEAAAAM5xAH4APHEAfgA9cQB-AEJzcQB-ABAAAAH_cQB-AFBxAH4AUXEAfgBGc3EAfgAQAAABCnEAfgBTcQB-AFRxAH4AQnNxAH4AEAAABHZxAH4AVnEAfgBXcQB-AFhzcQB-ABAAAAJpcQB-AFpxAH4AV3EAfgBCc3EAfgAQAAAC6XEAfgBccQB-AF1xAH4AQnEAfgBheHNyABFqYXZhLmxhbmcuSW50ZWdlchLioKT3gYc4AgABSQAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAABCzc3EAfgBiAAAAAHcEAAAAAHg
2016-10-01 03:55:33.955 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 3bd85172) (job: 13/job: 15) Publish async job-15 complete on message bus
2016-10-01 03:55:33.955 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 3bd85172) (job: 13/job: 15) Wake up jobs related to job-15
2016-10-01 03:55:33.955 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 3bd85172) (job: 13/job: 15) Update db status for job-15
2016-10-01 03:55:33.956 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 3bd85172) (job: 13/job: 15) Wake up jobs joined with job-15 and disjoin all subjobs created from job- 15
2016-10-01 03:55:33.961 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (logid: 3bd85172) (job: 13/job: 15) Done executing com.cloud.vm.VmWorkStart for job-15
2016-10-01 03:55:33.963 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (logid: 3bd85172) (job: 13/job: 15) Remove job-15 from job monitoring

Code shows it's rethrowing exception (VmWorkJobHandlerProxy.java:104):

                // legacy CloudStack code relies on checked exception for error handling
                // we need to re-throw the real exception here
                if (cause != null && cause instanceof Exception) {
                    s_logger.info("Rethrow exception " + cause);
                    throw new CloudException("Caught exception while handling a VmWorkJob", cause);
                }

Probably this is the real problem:

Caused by: java.lang.ClassCastException: net.sf.cglib.proxy.Enhancer$EnhancerFactoryData cannot be cast to net.sf.cglib.proxy.Factory
        at org.apache.cloudstack.framework.async.AsyncCallbackDispatcher.getTarget(AsyncCallbackDispatcher.java:79)
        at org.apache.cloudstack.storage.volume.VolumeServiceImpl.createBaseImageAsync(VolumeServiceImpl.java:160)
        at org.apache.cloudstack.storage.volume.VolumeServiceImpl.createVolumeFromTemplateAsync(VolumeServiceImpl.java:756)
        at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.recreateVolume(VolumeOrchestrator.java:1187)
        at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.prepare(VolumeOrchestrator.java:1028)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1936)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1769)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:88)

Method createBaseImageAsync hints on the systemvm template. Although the cpvm does work fine. It also shows the failed SSVM als being 'stopped':

screen shot 2016-10-01 at 12 08 30

When I click 'start" all is well:
screen shot 2016-10-01 at 12 09 51

After this, the templates get ready and the build continue just fine.

Full logs here:

management.log.txt
agent-kvm2.log.txt
agent-kvm1.log.txt

@remibergsma remibergsma changed the title Initial start of SSVM sometimes fails resulting in hanging builds Initial start of SSVM sometimes fails resulting in failed builds Oct 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant