NSX VXLAN Enable Agent fails on ESXi hosts with error “Cannot complete the operation.” (SKKB1004)

It's only fair to share...Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+Digg thisPin on PinterestShare on TumblrPrint this pageEmail this to someone

In this post we will take a look at problem you might experience when using VMware NSX/VXLAN network virtualization with VMware ESXi hosts.

The following environment has been used in all of the examples below:

  • Active Directory Domain called vmware.com
  • Windows Server 2012 R2 (with FQDN lan1dc1.vmware.com)
    • Domain Controller
    • DNS Server.
    • PowerShell v4 Host.
    • .NET 4 Framework installed (4.0.30319)
  • Windows Server 2008 R2 SP1 (with FQDN lan1dm1.vmware.com)
    • Domain Member
    • PowerShell v3 Host.
    • .NET 4 Framework installed (4.0.30319)
  • VMware vCenter 5.5.0 Update 2
  • VMware ESXi 5.5.0 Update 2 hosts
  • VMmware NSX Manager 6.1.0

 

Problem

When you reboot your ESXi hosts you might see the follosing error being logged in vCenter:

Enable Agent: Cannot Complete the operation. See the event log for details

If you check the /var/log/netcpa.log on the host you can see similar being logged:

~ # cat /var/log/netcpa.log
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Default’] Log path: /var/log/vmware/
Starting rmqClient_NETEventLoop Using Unused Arg (nil)
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Default’] host-id is host-31
Registration Init for App Type VNET App Id host-31 Client Type 1
Connect Returned  -1
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Default’] CtrlConnPool initialized!
2014-09-30T05:38:28.312Z [FFB47B70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Handle checker’] Setting system limit of 64
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Handle checker’] Set system limit to 64
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Memory checker’] Check resources every 60 secs, soft limit 30720, hard limit 51200.
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Default’] Vxlan: vmklink up
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Default’] connected to unix socket llcolor-server
2014-09-30T05:38:28.312Z [FF9E07F0 info ‘Default’] Vxlan: llcolord link up
2014-09-30T05:38:28.313Z [FF9E07F0 error ‘Default’] VxlanTester not initialized.
2014-09-30T05:38:28.313Z [FF9E07F0 error ‘Default’] VdrbTester not initialized.
2014-09-30T05:38:28.313Z [FFB88B70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:28.313Z [FFBC9B70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:28.313Z [FFC0AB70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:28.313Z [FFC4BB70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:28.313Z [FFC8CB70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:28.313Z [FFCCDB70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:28.314Z [FFD0EB70 info ‘ThreadPool’] Thread enlisted
Registration Init for App Type VNET App Id host-31 Client Type 1
2014-09-30T05:38:33.314Z [FFD4FB70 info ‘ThreadPool’] Thread enlisted
2014-09-30T05:38:33.315Z [FFD90B70 info ‘ThreadPool’] Thread enlisted
Connect Returned  -1

Shortly after the host has powered on you should see in the NSX Manager that the host has unresolved issues:

SSH to the host and you will see not VDR Instance information

 

Cause

This might happen if during reboot of the host, the host doesn’t have connection to the NSX Manager appliance. Once the NSX Manager appliance is again reachable you can either reboot the host or use the solution below.

 

Solution

After you click Resolve you shold see installation in progress.

SSH to the host and run /etc/init.d/netcpad restart to restart the netcpad deamon.

In the /var/log/netcpa.log you will see similar being logged showing VDR and VXLAN information and configuration:

~ # cat /var/log/netcpa.log
2014-09-30T06:07:52.566Z [FF9BA7F0 error ‘Default’] VxlanTester not initialized.
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding elem ID is created for 5000
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding is not ready for ID 5000
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding elem ID is created for 5002
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding is not ready for ID 5002
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding elem ID is created for 5001
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding is not ready for ID 5001
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding elem ID is created for 5003
2014-09-30T06:07:52.566Z [FFB41B70 info ‘Default’] Vxlan: sharding is not ready for ID 5003
2014-09-30T06:07:52.566Z [FF9BA7F0 error ‘Default’] VdrbTester not initialized.
2014-09-30T06:07:52.567Z [FFD08B70 info ‘Default’] App CORE : 0 register connection to new controller 192.168.1.201 port 1234
Received Message on Socket fdmax = 5
Received Data for Client 0x1f01b248 length 66 from Client fd 5
Client 5 Received Len 66
2014-09-30T06:07:53.558Z [FFA1DB70 info ‘Default’] Got Message in Function Callback Type 2
2014-09-30T06:07:53.558Z [FFA1DB70 info ‘Default’] Got Message in App Id host-31 Type VNET
2014-09-30T06:07:53.558Z [FFA1DB70 info ‘Default’] Received Registration ACK, can Start Sending Messages
Sent Data to Client 5
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Core: Hello sent: 192.168.1.201:0
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Vxlan: received freqCtrlPeriod 1000 freqCtrlQuery 100 freqCtrlUpdate 20
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Vxlan: received bteAgeingTime 300
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Vxlan: received arpAgeingTime 300
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Core: Max Pkt Len of peer 192.168.1.201: 4096
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Core: KeepAlive Interval of peer 192.168.1.201: 10
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Core: Msg Frequency of peer 192.168.1.201: 200
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Core: ShardingSlice length of peer 192.168.1.201: 4194304
2014-09-30T06:07:53.739Z [FFD49B70 error ‘Default’] Core: unknown data type for hello message: 192.168.1.201: 6
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] sharding slice length is set to 4194304
2014-09-30T06:07:53.739Z [FFD49B70 info ‘Default’] Core: Controller is ready: 192.168.1.201:0
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Core: Sharding Segment Update message: server 192.168.1.201 startSliceId 0 numSlices 1024
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: ID 5000’s controller is changed from NULL to 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: Change sid 729860360 to controller 192.168.1.201:0
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] App VXLAN : 0 register connection to existing controller to 192.168.1.201 port 1234
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: core app ready on 192.168.1.201:0
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: send VNI Membership Update(Join) to the controller: VNI 5000 controller 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: ID 5001’s controller is changed from NULL to 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: Change sid 3384296121 to controller 192.168.1.201:0
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: send VNI Membership Update(Join) to the controller: VNI 5001 controller 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: ID 5002’s controller is changed from NULL to 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: Change sid 1743764586 to controller 192.168.1.201:0
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: send VNI Membership Update(Join) to the controller: VNI 5002 controller 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: ID 5003’s controller is changed from NULL to 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: Change sid 103233051 to controller 192.168.1.201:0
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vxlan: send VNI Membership Update(Join) to the controller: VNI 5003 controller 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vdrb: ID 1460487507’s controller is changed from NULL to 192.168.1.201
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vdrb: Change sid 2496014691 to controller 192.168.1.201:0
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] App VDRB : 0 register connection to existing controller to 192.168.1.201 port 1234
2014-09-30T06:07:53.779Z [FF9BA7F0 info ‘Default’] Vdrb: core app ready on 192.168.1.201:0
2014-09-30T06:07:53.781Z [FFD49B70 info ‘Default’] Vxlan: receive VNI Membership Update(Join) from the controller: VNI 5000 controller 192.168.1.201 len 23
2014-09-30T06:07:53.781Z [FFD49B70 info ‘Default’] Vxlan: set VNI 5000 (mcast proxy: Enabled, arp proxy: Enabled)
2014-09-30T06:07:53.782Z [FFD08B70 info ‘Default’] Vxlan: receive VNI Membership Update(Join) from the controller: VNI 5001 controller 192.168.1.201 len 23
2014-09-30T06:07:53.782Z [FFD08B70 info ‘Default’] Vxlan: set VNI 5001 (mcast proxy: Enabled, arp proxy: Enabled)
2014-09-30T06:07:53.785Z [FFD08B70 info ‘Default’] Vxlan: receive VNI Membership Update(Join) from the controller: VNI 5003 controller 192.168.1.201 len 23
2014-09-30T06:07:53.785Z [FFD08B70 info ‘Default’] Vxlan: set VNI 5003 (mcast proxy: Enabled, arp proxy: Enabled)
2014-09-30T06:07:53.785Z [FFD8AB70 info ‘ThreadPool’] Thread enlisted
2014-09-30T06:07:53.786Z [FFD08B70 info ‘Default’] Vxlan: receive VNI Membership Update(Join) from the controller: VNI 5002 controller 192.168.1.201 len 23
2014-09-30T06:07:53.786Z [FFD08B70 info ‘Default’] Vxlan: set VNI 5002 (mcast proxy: Enabled, arp proxy: Enabled)
Looking up Client 5
Send returned 330 Err No 0, While Sending Len 330
Error 0 While Sending VSM Payload for Session Id 5
Looking up Client 5
Send returned 243 Err No 0, While Sending Len 243
Error 0 While Sending VSM Payload for Session Id 5
Looking up Client 5
Send returned 253 Err No 0, While Sending Len 253
Error 0 While Sending VSM Payload for Session Id 5
Looking up Client 5
Send returned 255 Err No 0, While Sending Len 255
Error 0 While Sending VSM Payload for Session Id 5
Received Message on Socket fdmax = 5
Received Data for Client 0x1f01b248 length 2000 from Client fd 5
Client 5 Received Len 2000
Received Message on Socket fdmax = 5
Received Data for Client 0x1f01b248 length 2000 from Client fd 5
Client 5 Received Len 2000
2014-09-30T06:07:57.590Z [FFA1DB70 info ‘Default’] Got Message in Function Callback Type 3
2014-09-30T06:07:57.590Z [FFA1DB70 info ‘Default’] Got Message in App Id host-31 Type VNET
Sent Data to Client 5
Received Message on Socket fdmax = 5
Received Data for Client 0x1f01b248 length 666 from Client fd 5
Client 5 Received Len 666
2014-09-30T06:07:57.590Z [FFA1DB70 info ‘Default’] Got Message in Function Callback Type 3
2014-09-30T06:07:57.590Z [FFA1DB70 info ‘Default’] Got Message in App Id host-31 Type VNET
2014-09-30T06:07:57.590Z [FFA5EB70 info ‘Default’] Received host credential message …
Sent Data to Client 5
Looking up Client 5
2014-09-30T06:07:57.591Z [FFA5EB70 info ‘Default’] No change of key and certificate
2014-09-30T06:07:57.591Z [FFA5EB70 info ‘Default’] Received controller info message numConn 1 …
Send returned 469 Err No 0, While Sending Len 469
Error 0 While Sending VSM Payload for Session Id 5
Looking up Client 5
2014-09-30T06:07:57.591Z [FFA5EB70 info ‘Default’] Updated conn 192.168.1.201:1234::13:6E:36:8C:7E:4D:83:68:51:4F:D4:0A:54:DE:6C:99:6B:EC:1A:4D:ssl enabled(updated by VSM)
Send returned 468 Err No 0, While Sending Len 468
Error 0 While Sending VSM Payload for Session Id 5
2014-09-30T06:07:57.592Z [FFA5EB70 info ‘Default’] Core: Receive controller list(#num 1) from ConfigManager
Received Message on Socket fdmax = 5
Received Data for Client 0x1f01b248 length 1078 from Client fd 5
Client 5 Received Len 1078
2014-09-30T06:07:57.614Z [FFA1DB70 info ‘Default’] Got Message in Function Callback Type 3
2014-09-30T06:07:57.614Z [FFA1DB70 info ‘Default’] Got Message in App Id host-31 Type VNET
Sent Data to Client 5
2014-09-30T06:07:57.614Z [FFA5EB70 info ‘Default’] Received vdr dvs info message, number of entries 1 …
2014-09-30T06:07:57.614Z [FFA5EB70 info ‘Default’] Number of dvs info entries 1, operation set_vxlan
Looking up Client 5
2014-09-30T06:07:57.615Z [FFA5EB70 info ‘Default’] Updated vdr dvs entry b3 d6 19 50 c1 4f 86 0c-56 c5 75 74 82 ce 0a f4:(updated by VSM):vxlanOnly:teamingPolicy = FAILOVER_ORDER:numUplink = 4:numActiveUplink = 1:uplinkPortNames = dvUplink1,dvUplink2,dvUplink3,dvUplink4
Send returned 494 Err No 0, While Sending Len 494
Error 0 While Sending VSM Payload for Session Id 5
Received Message on Socket fdmax = 5
Received Data for Client 0x1f01b248 length 966 from Client fd 5
Client 5 Received Len 966
2014-09-30T06:07:57.638Z [FFA1DB70 info ‘Default’] Got Message in Function Callback Type 3
2014-09-30T06:07:57.638Z [FFA1DB70 info ‘Default’] Got Message in App Id host-31 Type VNET
Sent Data to Client 5
2014-09-30T06:07:57.638Z [FFA5EB70 info ‘Default’] Received vdr instance message numVdrId 1 …
Looking up Client 5
Send returned 495 Err No 0, While Sending Len 495
Error 0 While Sending VSM Payload for Session Id 5
2014-09-30T06:07:57.639Z [FFA5EB70 info ‘Default’] Updated vdr instance vdr name = default+edge-3, vdr id = 1460487507, auth token = ec257a4f-2413-4dd9-998d-56e839b8de38

Checking the VDR instance info on the host again will show similar output showing the VDR configuration:

Special thanks to my college Timo Sugliani!

It's only fair to share...Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+Digg thisPin on PinterestShare on TumblrPrint this pageEmail this to someone

One thought on “NSX VXLAN Enable Agent fails on ESXi hosts with error “Cannot complete the operation.” (SKKB1004)

Leave a Reply

Your email address will not be published. Required fields are marked *