"socket path does not exist or cannot be found" only when running cisco.ios.ios_bgp_address_family #682

Closed
netexgb opened this issue Oct 31, 2022 · 14 comments · Fixed by #687
Assignees
Labels
bgp_address_family bgp_address_family resource module

Comments

@netexgb

netexgb commented Oct 31, 2022

SUMMARY

Only when I run a "cisco.ios.ios_bgp_address_family" task I get the error "socket path does not exist or cannot be found", working with other cisco.ios modules is fine.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

cisco.ios.ios_bgp_address_family

ANSIBLE VERSION
xxxx@xxxx:~$ ansible --version
  3202 1667244033.36045: starting run
ansible [core 2.13.5]
  config file = /home/xxxx/ansible.cfg
  configured module search path = ['/home/xxxx/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/xxxx/.local/lib/python3.8/site-packages/ansible
  ansible collection location = /home/xxxx/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/xxxx/.local/bin/ansible
  python version = 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
  jinja version = 3.1.2
  libyaml = True
COLLECTION VERSION
ansible-galaxy collection list cisco.ios
  3244 1667244282.90811: starting run
  3244 1667244282.91758: Validate TLS certificates for https://galaxy.ansible.com: True

# /home/xxxx/.local/lib/python3.8/site-packages/ansible_collections
Collection Version
---------- -------
cisco.ios  4.0.0

# /usr/local/lib/python3.8/dist-packages/ansible_collections
Collection Version
---------- -------
cisco.ios  4.0.0

# /home/xxxx/.ansible/collections/ansible_collections
Collection Version
---------- -------
cisco.ios  4.0.0
CONFIGURATION
xxxx@xxxx:~$ ansible-config dump --only-changed
  3247 1667244312.84236: starting run
CALLBACKS_ENABLED(/home/xxxx/ansible.cfg) = ['timer', 'profile_tasks', 'profile_roles']
DEFAULT_DEBUG(env: ANSIBLE_DEBUG) = True
DEFAULT_LOG_PATH(env: ANSIBLE_LOG_PATH) = /home/xxxx/ansible.log
ENABLE_TASK_DEBUGGER(/home/xxxx/ansible.cfg) = True
TASK_DEBUGGER_IGNORE_ERRORS(/home/xxxx/ansible.cfg) = False
STEPS TO REPRODUCE

I just configured the YAML file as seen below and ran it against the Cisco Router and got the problem as shown below. The strange thing is, I also ran tasks using other cisco.ios modules, i.e. cisco.ios.ios_config and cisco.ios.ios_route_maps and they all work fine. The YAML configuration below is pretty simple, just wanting to add an aggregate-address command to no avail.

---
- name: Configure BGP address-family cloud_transit
  hosts:
    - xxxx
  gather_facts: False

  tasks:
    - name: BGP Address Family cloud_transit
      cisco.ios.ios_bgp_address_family:
        config:
          as_number: 1
          address_family:
            - afi: ipv4
              safi: unicast
              vrf: cloud_transit
              aggregate_address:
                - address: 10.192.0.0
                  netmask: 255.255.0.0
                  attribute_map: test
        state: merged
EXPECTED RESULTS

I would have expected the following configuration in the router:

router bgp 1
 address-family ipv4 vrf cloud_transit
  aggregate-address 10.192.0.0 255.255.0.0 attribute-map test
DEBUGGING
2022-10-31 14:50:30,897 p=3329 u=xxxx n=ansible | ansible-playbook [core 2.13.5]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/home/xxxx/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/xxxx/.local/lib/python3.8/site-packages/ansible
  ansible collection location = /home/xxxx/.ansible/collections
  executable location = /home/xxxx/.local/bin/ansible-playbook
  python version = 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
  jinja version = 3.1.2
  libyaml = True
2022-10-31 14:50:30,897 p=3329 u=xxxx n=ansible | Using /etc/ansible/ansible.cfg as config file
2022-10-31 14:50:32,731 p=3329 u=xxxx n=ansible | setting up inventory plugins
2022-10-31 14:50:32,748 p=3329 u=xxxx n=ansible | host_list declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
2022-10-31 14:50:32,748 p=3329 u=xxxx n=ansible | script declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
2022-10-31 14:50:32,748 p=3329 u=xxxx n=ansible | auto declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
2022-10-31 14:50:32,767 p=3329 u=xxxx n=ansible | Parsed /etc/ansible/hosts inventory source with yaml plugin
2022-10-31 14:50:32,852 p=3329 u=xxxx n=ansible | Loading collection cisco.ios from /home/xxxx/.ansible/collections/ansible_collections/cisco/ios
2022-10-31 14:50:32,857 p=3329 u=xxxx n=ansible | Loading callback plugin default of type stdout, v2.0 from /home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/callback/default.py
2022-10-31 14:50:32,874 p=3329 u=xxxx n=ansible | Skipping callback 'default', as we already have a stdout callback.
2022-10-31 14:50:32,874 p=3329 u=xxxx n=ansible | PLAYBOOK: test_cloud_bgp_proctor.yml ********************************************************************************************************************************************************************************************************
2022-10-31 14:50:32,875 p=3329 u=xxxx n=ansible | 1 plays in test_cloud_bgp_proctor.yml
2022-10-31 14:50:32,878 p=3329 u=xxxx n=ansible | PLAY [Configure BGP address-family cloud_transit] *******************************************************************************************************************************************************************************************
2022-10-31 14:50:32,927 p=3329 u=xxxx n=ansible | META: ran handlers
2022-10-31 14:50:32,945 p=3329 u=xxxx n=ansible | TASK [BGP Address Family cloud_transit] *****************************************************************************************************************************************************************************************************
2022-10-31 14:50:32,961 p=3334 u=xxxx n=ansible | Loading collection ansible.netcommon from /home/xxxx/.ansible/collections/ansible_collections/ansible/netcommon
2022-10-31 14:50:32,984 p=3334 u=xxxx n=ansible | redirecting (type: terminal) ansible.builtin.ios to cisco.ios.ios
2022-10-31 14:50:32,987 p=3334 u=xxxx n=ansible | redirecting (type: cliconf) ansible.builtin.ios to cisco.ios.ios
2022-10-31 14:50:33,000 p=3334 u=xxxx n=ansible | <router> attempting to start connection
2022-10-31 14:50:33,000 p=3334 u=xxxx n=ansible | <router> using connection plugin ansible.netcommon.network_cli
2022-10-31 14:50:33,000 p=3334 u=xxxx n=ansible | Found ansible-connection at path /home/xxxx/.local/bin/ansible-connection
2022-10-31 14:50:33,343 p=3340 u=xxxx n=ansible | Loading collection ansible.netcommon from /home/xxxx/.ansible/collections/ansible_collections/ansible/netcommon
2022-10-31 14:50:33,361 p=3340 u=xxxx n=ansible | redirecting (type: terminal) ansible.builtin.ios to cisco.ios.ios
2022-10-31 14:50:33,363 p=3340 u=xxxx n=ansible | Loading collection cisco.ios from /home/xxxx/.ansible/collections/ansible_collections/cisco/ios
2022-10-31 14:50:33,366 p=3340 u=xxxx n=ansible | redirecting (type: cliconf) ansible.builtin.ios to cisco.ios.ios
2022-10-31 14:50:33,417 p=3334 u=xxxx n=ansible | <router> local domain socket does not exist, starting it
2022-10-31 14:50:33,417 p=3334 u=xxxx n=ansible | <router> control socket path is /home/xxxx/.ansible/pc/cc85743af8
2022-10-31 14:50:33,421 p=3334 u=xxxx n=ansible | network_os is set to ios
2022-10-31 14:50:33,421 p=3334 u=xxxx n=ansible | <router> ssh type is set to auto
2022-10-31 14:50:33,422 p=3334 u=xxxx n=ansible | <router> autodetecting ssh_type
2022-10-31 14:50:33,422 p=3334 u=xxxx n=ansible | [WARNING]: ansible-pylibssh not installed, falling back to paramiko

2022-10-31 14:50:33,422 p=3334 u=xxxx n=ansible | <router> ssh type is now set to paramiko
2022-10-31 14:50:33,422 p=3334 u=xxxx n=ansible | <router>   3335 1667245833.32981: trying /home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/connection/__pycache__
  3335 1667245833.32996: trying /home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/connection
  3335 1667245833.33682: trying /home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/doc_fragments
  3335 1667245833.33716: Loading ModuleDocFragment 'connection_pipelining' from /home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/doc_fragments/connection_pipelining.py
  3335 1667245833.33744: Loaded config def from plugin (connection/ssh)
  3335 1667245833.33750: Loading Connection 'ssh' from /home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/connection/ssh.py (searched paths: /home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/connection/__pycache__:/home/xxxx/.local/lib/python3.8/site-packages/ansible/plugins/connection) (found_in_cache=False, class_only=True)

2022-10-31 14:50:33,422 p=3334 u=xxxx n=ansible | <router> local domain socket path is /home/xxxx/.ansible/pc/cc85743af8
2022-10-31 14:50:33,423 p=3334 u=xxxx n=ansible | <router> Using network group action cisco.ios.ios for cisco.ios.ios_bgp_address_family
2022-10-31 14:50:33,428 p=3334 u=xxxx n=ansible | <router> ANSIBLE_NETWORK_IMPORT_MODULES: enabled
2022-10-31 14:50:33,620 p=3334 u=xxxx n=ansible | <router> ANSIBLE_NETWORK_IMPORT_MODULES: found cisco.ios.ios_bgp_address_family  at /home/xxxx/.ansible/collections/ansible_collections/cisco/ios/plugins/modules/ios_bgp_address_family.py
2022-10-31 14:50:33,620 p=3334 u=xxxx n=ansible | <router> ANSIBLE_NETWORK_IMPORT_MODULES: running cisco.ios.ios_bgp_address_family
2022-10-31 14:50:33,622 p=3340 u=xxxx n=ansible | <router> ESTABLISH PARAMIKO SSH CONNECTION FOR USER: xxxx on PORT 22 TO router
2022-10-31 14:50:34,040 p=3340 u=xxxx n=p=3340 u=xxxx | paramiko [router] | Connected (version 2.0, client Cisco-1.25)
2022-10-31 14:50:34,883 p=3340 u=xxxx n=p=3340 u=xxxx | paramiko [router] | Auth banner: b'\r\n********************************************************\r\nUse is restricted to Schlumberger authorized users who\r\nmust comply with the Information Security User Standard.\r\nUsage is monitored; unauthorized use will be prosecuted.\r\n********************************************************\r\n'
2022-10-31 14:50:35,097 p=3340 u=xxxx n=p=3340 u=xxxx | paramiko [router] | Authentication (password) successful!
2022-10-31 14:51:08,935 p=3340 u=xxxx n=ansible | persistent connection idle timeout triggered, timeout value is 30 secs.
See the timeout setting options in the Network Debug and Troubleshooting Guide.
2022-10-31 14:51:09,037 p=3340 u=xxxx n=ansible | shutdown complete
2022-10-31 14:59:03,245 p=3334 u=xxxx n=ansible | <router> ANSIBLE_NETWORK_IMPORT_MODULES: complete
2022-10-31 14:59:03,249 p=3329 u=xxxx n=ansible | fatal: [router]: FAILED! => {
    "changed": false,
    "module_stderr": "socket path /home/xxxx/.ansible/pc/cc85743af8 does not exist or cannot be found. See Troubleshooting socket path issues in the Network Debug and Troubleshooting Guide",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"
}
2022-10-31 14:59:03,253 p=3329 u=xxxx n=ansible | PLAY RECAP **********************************************************************************************************************************************************************************************************************************
2022-10-31 14:59:03,253 p=3329 u=xxxx n=ansible | router : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
@bentole
Contributor

bentole commented Nov 1, 2022

Hi Gian! Long time no see 😄

I could be off on this but could you try libssh instead of paramiko?

pip install ansible-pylibssh

Read this for more info: New LibSSH Connection Plugin for Ansible Network Replaces Paramiko

Hope it works!

@netexgb
Author

netexgb commented Nov 1, 2022

Hi @bentole, long time indeed :) I would love to, however I filed another bug #642 not so long ago about libSSH :)

In any case, this problem is only present with cisco.ios.ios_bgp_address_family. Playbooks using cisco.ios.ios_config and cisco.ios.ios_prefix_lists work fine with Paramiko.

@bentole
Contributor

bentole commented Nov 1, 2022

Sorry about that. Bad research from my side. Figured you had a reason for using paramiko :)

I noticed the trace says persistent connection idle timeout triggered, timeout value is 30 secs

Have you tried increasing the timeout by adding the following to ansible.cfg:

[persistent_connection]
connect_timeout = 60

It appears to me that the ios_bgp_address_family module is a bit slower to start executing than other modules. Would you agree? I must admit I'm not being very scientific about this :) I'll look into it more.

@bentole
Contributor

bentole commented Nov 1, 2022

I did some research. The module is doing some heavy duty parsing on the neighbor statements which makes it appear slower than other modules and in some cases triggers the persistent connection idle timeout mentioned in the previous post. Increasing the timeout value should work, though. Could you verify @netexgb ?
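
To make the timing concrete, here is a small sketch that computes the gaps between a few timestamps copied from the debug log in the original report. The event labels are my own shorthand (an assumption, not Ansible terminology); only the timestamps come from the log.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

# Timestamps copied from the debug log in this issue.
# The dictionary keys are informal labels, not names used by Ansible.
EVENTS = {
    "module_started": "2022-10-31 14:50:33,620",
    "ssh_authenticated": "2022-10-31 14:50:35,097",
    "idle_timeout_triggered": "2022-10-31 14:51:08,935",
    "module_finished": "2022-10-31 14:59:03,245",
}


def seconds_between(start_label, end_label):
    """Return the elapsed seconds between two labelled log events."""
    start = datetime.strptime(EVENTS[start_label], LOG_TIME_FORMAT)
    end = datetime.strptime(EVENTS[end_label], LOG_TIME_FORMAT)
    return (end - start).total_seconds()


idle_gap = seconds_between("ssh_authenticated", "idle_timeout_triggered")
module_runtime = seconds_between("module_started", "module_finished")

print(f"idle before socket shutdown: {idle_gap:.1f} s")
print(f"module run time: {module_runtime:.1f} s")
```

The persistent connection process shut down roughly 34 seconds after authentication, while the module kept running for about eight and a half minutes. That is consistent with the socket already being gone by the time the module tried to use it.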

@KB-perByte
Collaborator

Hey @netexgb, as @bentole suggested, increasing the persistent connection timeout is definitely a good option, but the parsers for bgp_address_family should not be a core reason for the module to lose its connection or for the error you got.
Considering this a possibility, could you please share a representative configuration from show running-config | section ^router bgp? I can test on the basis of that.
Regards

@netexgb
Author

netexgb commented Nov 2, 2022

Hi @bentole, I changed command_timeout to 90 under [persistent_connection], to no avail. When I ran the playbook, I still got the error:

TASK [BGP Address Family cloud_transit] ********************************************************************************
[WARNING]: ansible-pylibssh not installed, falling back to paramiko
fatal: [router]: FAILED! => {"changed": false, "module_stderr": "socket path /home/gbloise/.ansible/pc/7a1bec8d96 does not exist or cannot be found. See Troubleshooting socket path issues in the Network Debug and Troubleshooting Guide", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error"}

@bentole
Contributor

bentole commented Nov 2, 2022

Thanks Gian! That is some major league BGP configuration :)

@KB-perByte I haven't spent much time looking into this, but I noticed that removing objs = bgp_af_parser.parse() from the facts module speeds things up considerably. I tried @netexgb's configuration on my CSR1k and I didn't even bother to wait for the parser to finish. Took forever :)

@netexgb I notice you're running a recent IOS-XE version. I feel like I am cheating on the IOS module by saying this, but have you considered ansible.netcommon.netconf_config? Sounds like a better fit for your configuration volume. Let me know if you need a sample.

@netexgb
Author

netexgb commented Nov 2, 2022

Hi @bentole, I did consider using NETCONF. However, (a) not all our devices are on IOS-XE, and (b) it would require upgrading and changing our whole landscape to enable it :)

@KB-perByte
Collaborator

@bentole @netexgb I am working on optimizing the code a bit for the issue reported and will be sharing the update asap.

@KB-perByte KB-perByte added the bgp_address_family bgp_address_family resource module label Nov 10, 2022
@KB-perByte KB-perByte self-assigned this Nov 10, 2022
@bentole
Contributor

bentole commented Nov 13, 2022

Hi @KB-perByte! I noticed you have a PR up that addresses this issue. One of my customers is experiencing the same issue as @netexgb, so I spent some time the other day digging into this.

Two things have caught my attention so far. First, there are "catastrophic backtracking" issues with regular expressions when peer-group names are too long. Second, the parser funcs at the very core of ansible.netcommon have potential for optimization. It's too much to explain here, so I have created a project to demonstrate this in a pure Python environment here, using @netexgb's BGP configuration. As you (hopefully) can see, running the optimized parser takes seconds while the original parser takes minutes.
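
For readers unfamiliar with catastrophic backtracking, here is a minimal sketch of the effect. Both patterns are hypothetical and written only for illustration; they are not the regexes used in cisco.ios or ansible.netcommon. The point is that a nested quantifier gives the regex engine exponentially many ways to split a long name before it can conclude that a line does not match.

```python
import re
import time

# HYPOTHETICAL patterns for illustration only; not taken from cisco.ios.
# The nested quantifier (\S+)+ can split a name in exponentially many ways.
BACKTRACKING_PATTERN = re.compile(r"^neighbor ((\S+)+) peer-group$")
# A single \S+ accepts the same valid lines with only one way to split the name.
LINEAR_PATTERN = re.compile(r"^neighbor (\S+) peer-group$")


def timed_match(pattern, line):
    """Return the match object (or None) and the time the attempt took."""
    start = time.perf_counter()
    match = pattern.match(line)
    return match, time.perf_counter() - start


# A line that almost matches: a long name followed by the wrong keyword.
# The length is kept small so the demo finishes quickly; the cost of the
# nested pattern roughly doubles with every extra character.
long_name = "A" * 18
non_matching_line = f"neighbor {long_name} remote-as 1"

slow_match, slow_seconds = timed_match(BACKTRACKING_PATTERN, non_matching_line)
fast_match, fast_seconds = timed_match(LINEAR_PATTERN, non_matching_line)

print(f"nested quantifier: {slow_seconds:.4f} s")
print(f"single quantifier: {fast_seconds:.6f} s")
```

Neither pattern matches the line, but the nested one has to exhaust every split of the name before giving up. Increasing the name length by a handful of characters is enough to turn milliseconds into minutes.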

I also have a PR ready for this, but I'm wondering: are we even "allowed" to override ansible.netcommon funcs for custom optimization? The funcs in question are parse and _deepformat in NetworkTemplate.

@KB-perByte
Collaborator

@bentole I appreciate you taking the time to investigate it. Yes, the parsers for bgp_address_family needed some defragmentation and the facts processing needed improvement. With the PR that I have added, the running time has come down significantly, as a lot of the config-parsing heavy lifting is now done via regex rather than jinja templates, which was the case previously.
That should ideally solve the issue we are facing here.
In my environment, the previous parsers took ~11 min to process, which was sad, but with the new parsers the same config lines take 20 secs max to generate the whole facts.

I am unsure about the part where you ask about overriding netcommon functions for a particular use case.

Talking about parse specifically, we are looking at ways to optimize it for faster crunching of config lines (any ideas or PRs are appreciated). Overriding a function might work just fine for the specific collection you are looking at, but it may have consequences for the performance of other collections relying on the same function.
We can discuss it further.
Regards

@bentole
Contributor

bentole commented Nov 14, 2022

Hi again @KB-perByte! I pulled your PR and ran it in my test case, but couldn't see any effect yet. I notice it's a WIP so I should probably not whine about it 😄

Anyhow, I put my PR #688 up. It differs from your approach by optimizing the parsing of the existing jinja2 templates and fixing the broken regex. It's tested against @netexgb's configuration and playbook, and it makes the run about 40 times faster than the original.

But I do agree, replacing jinja2 with regex is a better approach altogether so keep up the good work 😃

Oh and btw, @netexgb: to make the playbook above idempotent, you should leave out unicast from the address-family when using VRFs. IOS will accept the command address-family ipv4 unicast vrf x but it will, as we all know, store it as address-family ipv4 vrf x in the configuration, which breaks idempotency.
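
A toy model of why this breaks idempotency, assuming a naive line-by-line comparison. The helper names are made up for this sketch, and the module's real logic compares structured facts rather than raw lines, so treat this only as an illustration of the mismatch between what is requested and what IOS stores.

```python
def stored_by_ios(command):
    """Toy model of the behaviour described above: IOS accepts 'unicast'
    together with 'vrf' but stores the line without it (simplified assumption)."""
    words = command.split()
    if "vrf" in words and "unicast" in words:
        words.remove("unicast")
    return " ".join(words)


def reports_changed(wanted, running_config):
    # Naive line comparison, standing in for the module's real diff logic.
    return wanted not in running_config


def run_twice(wanted):
    """Apply the same wanted line twice and record whether each run changed anything."""
    running_config = set()
    results = []
    for _ in range(2):
        changed = reports_changed(wanted, running_config)
        if changed:
            running_config.add(stored_by_ios(wanted))
        results.append(changed)
    return results


print(run_twice("address-family ipv4 unicast vrf cloud_transit"))  # [True, True]
print(run_twice("address-family ipv4 vrf cloud_transit"))          # [True, False]
```

With unicast in the request, the stored line never equals the requested one, so every run reports a change. Without it, the second run finds nothing to do.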

@netexgb
Author

netexgb commented Nov 14, 2022

Hi @bentole Thank you for the suggestion. Once the programming has been optimized, I will give it a go without using safi: unicast in the playbook 👍

@KB-perByte KB-perByte linked a pull request Nov 16, 2022 that will close this issue
@bentole
Contributor

bentole commented Nov 16, 2022

@KB-perByte I peeked at your PR and noticed you've done some serious crunching the last couple of days. Respekt! Since the WIP was removed I pulled and tested your changes using @netexgb's configuration and playbook and it brought down the deployment time on my system from about 8 minutes to mere seconds. So as far as I see it this issue is now fixed. I've closed PR #688. No need for a workaround by overriding superclasses when things are done properly 😄 Thanks for all the hard work! 😃

3 participants