Python Networking Tools, Libraries, and Frameworks
For anyone who may just be starting in their network automation journey and is looking to simply get their head around things a bit, this page is a curation / summation / collection of notes regarding doing network DevOps work that in some form involves Python.
If it helps save at least one person time, whether in learning, figuring out what not to spend time on, etc., then it will have served its purpose. (It is also for myself as a refresher, as nothing helps one learn like having to write it down or teach it to someone else.)
So here is the skinny on all this:
- Ansible is a tool. It is now owned by Red Hat, which in turn is owned by IBM.
- Key point: TOOL.
Ansible happens to be written in Python, but Ansible is a tool in the end. You can simply install/use Ansible without knowing Python. (You do need to know YAML, though.) That said, there is a point where knowing Python and Jinja2 becomes important, as without them, escaping and dereferencing variables in templates/etc. becomes challenging.
- Originally written to do sysadmin work like managing Linux servers, it has expanded to handling network gear due to its unique position in the space. Unlike CFEngine, Chef, Puppet, and SaltStack, which are all agent-based (meaning you have to install an agent on each node you wish to manage), Ansible is agentless. [1]
- Ansible relies on SSH to do its work.
- Ansible opens concurrent connections to devices, which explains its speed. Update 10 devices, or update 50: it can easily take about the same amount of time.
- Ansible uses YAML playbook files, which let you do a lot... IF the particular vendor you are trying to interact with is well supported. (e.g., Cisco is. Extreme, at least the Enterasys line... is not. YMMV.)
- Ansible is idempotent. Fancy word meaning "if you run the playbook once or you run it 50x, the end result is the same." That is, Ansible playbooks define some end result you want (e.g., "I want all Cisco IOS routers to have their DNS settings pointing to Cloudflare"). This is known as a declarative model. You specify WHAT you want, not HOW to get there. And when you execute the playbook, if it is successful, you know what the state of the system will be. If you run the same playbook again, and the system is already in the right state, then nothing changes. No harm done.
- Key point: TOOL.
- SaltStack is a tool. It is now under VMware.
- Salt is an agent-based tool written in Python that uses 0MQ (ZeroMQ) by default to create a persistent bus between its agents (called minions) and the server (called the master).
- Salt is unique in the agent-based space. Think of it as interrupt-driven vs. polling-driven. Where other solutions like Puppet and Chef have agents that "check in" every so often (e.g., Puppet's default is every 30 minutes for agents to check in and look at the manifest for their node), Salt's minions build and maintain a persistent path back to the master using a message bus topology. It is a "phone home to mommy" architecture. Minions can easily be behind firewalls. Once the connection is established, the master can "ride back through" it to the nodes. More importantly, the master can send a message on the bus that all the minions receive, allowing for a more event-driven approach. You do not have to update a node's manifest and wait for it to check in. Instead, the master signals the minions via the bus, in the moment, to do something. Similarly, minions can signal the master when an event occurs. Much like a botnet that uses IRC for command and control, this is extremely powerful, as it gives you an architecture that lets master and minion react and interact more quickly.
- Salt uses a lot of salt-related metaphors, including
- grains (pieces of info),
- pillars (key-value stores of info, so several grains), and
- mines (area on the master where results from regularly executed commands on minions can be stored).
- There are also
- reactors (mechanism for triggering actions in response to generated events),
- runners (modules that execute on the master), and
- returners (specify alternative locations where results of an action run on a minion will be sent).
- Salt also has a decent security model. You configure minions to point at a master. But just knowing a master is there does not give you access. The master must acknowledge/accept the minion. Minions are identified by a key that they generate and send to the master. Once a master accepts a key, then and only then can the minion participate. This allows for quick/easy deployment while maintaining control over which nodes are managed/have access. Keys can be rejected (e.g., if a node is stolen/replaced). The security model is simple yet elegant.
- Even though Salt is an agent-based solution, it offers something called a proxy minion. This is a minion which runs on a device such as a Linux node, Raspberry Pi, or other computer, and which "proxies" connections to network devices that otherwise do not allow agents to be installed (e.g., older Cisco IOS gear). So it is possible to drop a box behind a firewall running one proxy minion per network device, thereby still giving you access to gear that otherwise might be unreachable.
- Paramiko is a Python library for doing SSH. Yep, that about covers it.
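- Netmiko is a Python library that leverages Paramiko to specifically deal with network equipment (vs. managing Linux hosts, for example). This is where a ton of drivers exist for various network vendors. As of this writing, the entries in the package (vendor drivers, mixed in with a few internal helper modules such as ssh_dispatcher and utilities) include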
- ['a10', 'accedian', 'adtran', 'alcatel', 'apresia', 'arista', 'aruba', 'base_connection', 'broadcom', 'calix', 'centec', 'checkpoint', 'ciena', 'cisco', 'cisco_base_connection', 'citrix', 'cloudgenix', 'coriant', 'dell', 'dlink', 'eltex', 'endace', 'enterasys', 'ericsson', 'extreme', 'f5', 'file_transfer', 'flexvnf', 'fortinet', 'hp', 'huawei', 'ipinfusion', 'juniper', 'keymile', 'linux', 'log', 'logging', 'mellanox', 'mikrotik', 'mrv', 'netapp', 'netgear', 'netmiko_globals', 'nokia', 'oneaccess', 'ovs', 'paloalto', 'platforms', 'pluribus', 'progress_bar', 'quanta', 'rad', 'raisecom', 'redispatch', 'ruckus', 'ruijie', 'scp_functions', 'scp_handler', 'sixwind', 'sophos', 'ssh_autodetect', 'ssh_dispatcher', 'ssh_exception', 'terminal_server', 'tplink', 'ubiquiti', 'utilities', 'vyos', 'watchguard', 'yamaha', 'zte']
- Netmiko does NOT offer concurrency itself. That burden lies with you. I mention this as many folks start by using Netmiko, then eventually complain that their sequentially-looping code runs slow. Just sayin'.
- NAPALM is a Python library that is an abstraction above Netmiko, giving you easier access to network gear as an API... again, IF the vendor is well supported. There is still no concurrency here. That is still on you. The list of vendor types is WAY shorter here, but they provide easier access to info/etc. (a bit like Ansible network modules). As noted in the NAPALM documentation, currently they include
- Arista EOS
- Cisco IOS
- Cisco IOS-XR
- Cisco NX-OS
- Juniper JunOS
- Nornir is a Python framework. Nornir is like the love child of Ansible and Django. That is, it is a Python framework that leverages similar concepts to Ansible like an inventory of hosts, groups, etc. (defined with YAML files by default). And like Ansible, it does concurrency. Also like Django, there is the core, and then it has additional plugins, including ones for Netmiko, NAPALM, Ansible, Netbox, and more.
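To make the idempotent, declarative idea from the Ansible notes above a bit more concrete, here is a minimal sketch in plain Python. It is not how Ansible is implemented; the function name ensure_dns_servers and the dictionary standing in for a router's config are made up for illustration. The point is only that you describe the end state, and a second run changes nothing.

```python
from __future__ import annotations


def ensure_dns_servers(config: dict, desired: list[str]) -> bool:
    """Make config['dns_servers'] match `desired`.

    Returns True if a change was made, False if the config was already
    in the desired state (hypothetical helper, for illustration only).
    """
    if config.get("dns_servers") == desired:
        return False  # already in the desired state: do nothing
    config["dns_servers"] = list(desired)
    return True


# A dict standing in for a device's running config (an assumption).
router = {"hostname": "r1", "dns_servers": ["8.8.8.8"]}
desired = ["1.1.1.1", "1.0.0.1"]  # "point DNS at Cloudflare"

first_run = ensure_dns_servers(router, desired)   # changes the config
second_run = ensure_dns_servers(router, desired)  # no-op: state already correct

print(first_run, second_run, router["dns_servers"])
```

Run it once or 50 times; after the first successful run, the state is the same and every later run reports "no change."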
Terraform... wait, why are you looking here? I said "that in some form involves Python". Terraform is written in Go. Come back if/when I write about that.
But put simply, as their main page says,
Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services.
Most folks use Terraform when they are deploying to the public cloud providers (e.g., Amazon AWS, Microsoft Azure, Google Cloud Platform (GCP)). I cannot say much more, as I have not yet found a need for it in my own work.
It should be noted that ALL of the above are open-source. Some have commercial support/backing (e.g., Ansible and Salt), which should help to keep them going. Well, that and provide you "one throat to choke" if you are willing to pay for it.
Which To Use
As to which to use, as often is the case in tech, "it depends".
e.g., Ansible vs. Netmiko vs. Nornir
For example, where
- Ansible is a tool that utilizes an inventory and you have to write playbooks using YAML,
- Nornir is a framework that utilizes an inventory and you write in pure Python.
They each have their benefits.
Ansible is by far quicker to get started with (i.e., it has a lower "friction level"). This is likely why it is as popular as it is for network automation. That and the fact it doesn't require agents, of course. It also does not require knowing Python to use it.
And for simpler tasks, Ansible is easier/quicker to finish (assuming your vendor of choice is well supported, of course). Ansible is also very handy for doing quick, ad-hoc commands against a series of devices, thanks to its concurrent nature and being a tool vs. a library or framework.
However, there comes a time where the logic complexity of your playbooks hits a tipping point. You find yourself trying to shoehorn more and more code logic into YAML [2], and you find that you are spending more time than you would using Nornir (ONCE you have learned it) and coding straight-up Python. [NOTE: If you do not hit this point, awesome! Stick to Ansible.]
Now this is where most folks start looking into using Netmiko. It has a ton of modules, similar to Ansible, so it is the logical progression from that perspective. The thing about Netmiko in the long run is that, at some point, shy of only working with a few devices, you eventually run into the fact that it does not handle any concurrency on its own. And once you write code that works well for one device, and then you unleash that on 100 devices in a sequential loop, you will quickly see the issue.
Now you can handle this yourself, using such things as the Python standard library's concurrent.futures module to add concurrency to your code. But you will need to understand what that entails and the challenges that go with it, such as avoiding race conditions and deadlocks. [3] And if that fits your workflow, absolutely consider doing that. But this is where things shift towards Nornir. Nornir actually leverages that same module, but it has all the bits built in, handling error messaging, etc.
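As a rough sketch of that do-it-yourself route, the example below compares a sequential loop with a thread pool from concurrent.futures. The function show_version is a hypothetical stand-in that just sleeps; in real code that is where your Netmiko (or other SSH) session would go. The host names and the sleep time are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def show_version(host: str) -> str:
    """Hypothetical stand-in for a per-device SSH session."""
    time.sleep(0.1)  # simulate network and device latency
    return f"{host}: version 1.0"


hosts = [f"switch{n}" for n in range(1, 6)]


def run_sequential(hosts):
    """One device after another: total time grows with the device count."""
    return {host: show_version(host) for host in hosts}


def run_concurrent(hosts, max_workers=5):
    """One worker thread per device, up to max_workers at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(show_version, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # one bad device should not sink the rest
                results[host] = f"FAILED: {exc}"
    return results


start = time.perf_counter()
sequential_results = run_sequential(hosts)
sequential_secs = time.perf_counter() - start

start = time.perf_counter()
concurrent_results = run_concurrent(hosts)
concurrent_secs = time.perf_counter() - start

print(f"sequential: {sequential_secs:.2f}s, concurrent: {concurrent_secs:.2f}s")
```

Threads suit this kind of work because the code spends nearly all its time waiting on the network. Note that each worker only writes its own result; the moment workers start sharing state, the race-condition warnings above apply.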
Now Nornir has a "higher bar to entry". It requires coding in Python, so you need to know that. And then you need to learn to use the Nornir framework itself. And depending on your use case, you may still need to go back and at least grasp how Netmiko works if you end up using the Nornir module that leverages that.
So Nornir takes longer to get up to speed. However, once you have your head around it all, you get the best of all worlds: inventory and concurrent connections combined with all the logic/manipulation Python offers. Not only that, but you can leverage Netmiko, NAPALM, etc. as well.
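The inventory idea that Ansible and Nornir share (hosts, groups, and defaults) can be sketched in a few lines of plain Python. This is not either tool's API: the Host class, the select helper, and all the data below are invented for illustration, and real inventories normally live in YAML files. It shows one common convention, where a value set on the host wins over a value from its groups, which wins over the defaults.

```python
from dataclasses import dataclass, field

# Hypothetical inventory data, hard-coded here to keep the sketch self-contained.
DEFAULTS = {"username": "automation", "dns": "9.9.9.9"}
GROUPS = {
    "ios": {"platform": "cisco_ios", "dns": "1.1.1.1"},
    "eos": {"platform": "arista_eos"},
}


@dataclass
class Host:
    name: str
    groups: list = field(default_factory=list)
    data: dict = field(default_factory=dict)

    def get(self, key):
        """Look up a value: host first, then its groups in order, then defaults."""
        if key in self.data:
            return self.data[key]
        for group in self.groups:
            if key in GROUPS.get(group, {}):
                return GROUPS[group][key]
        return DEFAULTS.get(key)


INVENTORY = [
    Host("rtr1", groups=["ios"]),
    Host("rtr2", groups=["ios"], data={"dns": "8.8.8.8"}),  # host-level override
    Host("sw1", groups=["eos"]),
]


def select(inventory, **wanted):
    """Return the hosts whose resolved values match every keyword given."""
    return [h for h in inventory if all(h.get(k) == v for k, v in wanted.items())]


ios_hosts = [h.name for h in select(INVENTORY, platform="cisco_ios")]
dns_by_host = {h.name: h.get("dns") for h in INVENTORY}

print(ios_hosts)
print(dns_by_host)
```

Once the inventory is ordinary Python objects, filtering and branching on it is just ordinary Python, which is much of the appeal.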
There has also been discussion of performance. Nornir executes faster than Ansible. With smaller data sets this is not a big deal (and likely not very noticeable). But the larger the set of devices involved, the more noticeable the gap.
Then there is SaltStack. Under certain circumstances, SaltStack's architecture can be a real boon. Keep in mind that for Ansible or Python code written with Netmiko, NAPALM, and/or Nornir, the expectation is that the Ansible control node or box where your Python code runs can reach out to the network devices you wish to manage. If those devices sit behind firewalls at remote sites, then you will need to do a bit of work up front basically "punching holes" in the firewalls in some form (whether ACLs, via VPNs, etc.) in order to let Ansible or your code reach those devices. With Salt's design, as long as something behind the firewall is running a salt minion, in most cases you should be fine. And Salt, too, can leverage things like Netmiko and NAPALM.
Not only that, but SaltStack brings an event-driven model to bear as well. Ansible and other Python code would need to be run either manually or via some other scheduled setup, whether basic cron jobs or using a Web UI tool such as AWX/Ansible Tower. And this would still be a one-way trip. With SaltStack, once in place, it would be possible to have events that occur on a network device trigger a minion to send an event on the bus. That event could be watched for by a reactor, which in turn executes some code, and that code might then send messages out on the bus for either the same or other network devices to trigger yet more actions.
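The event, reactor, action chain described above can be modeled with a toy, in-process event bus. To be clear, this is not Salt's API or its wire protocol; the EventBus class, the tag names, and the handlers are all hypothetical. It only shows the shape of the flow: an event is fired with a tag, any reactor whose pattern matches the tag runs, and a reactor may fire further events.

```python
from fnmatch import fnmatchcase


class EventBus:
    """Toy stand-in for a message bus with tag-matching reactors."""

    def __init__(self):
        self._reactors = []  # list of (tag pattern, handler) pairs
        self.log = []        # every tag seen on the bus, in order

    def react_to(self, pattern, handler):
        """Register a handler for every tag matching a glob pattern."""
        self._reactors.append((pattern, handler))

    def fire(self, tag, data):
        """Publish an event and run every reactor whose pattern matches."""
        self.log.append(tag)
        for pattern, handler in self._reactors:
            if fnmatchcase(tag, pattern):
                handler(self, tag, data)


performed = []  # actions "carried out" on devices, recorded for inspection


def on_interface_down(bus, tag, data):
    # React to one device's problem by asking its neighbor to check the link.
    bus.fire(f"device/{data['neighbor']}/check_link", {"reported_by": data["device"]})


def on_check_link(bus, tag, data):
    target = tag.split("/")[1]
    performed.append((target, data["reported_by"]))


bus = EventBus()
bus.react_to("device/*/interface/down", on_interface_down)
bus.react_to("device/*/check_link", on_check_link)

# A minion (or proxy) notices a link drop on sw1 and reports it.
bus.fire("device/sw1/interface/down", {"device": "sw1", "neighbor": "sw2"})

print(bus.log)
print(performed)
```

Nothing here polls or runs on a schedule; the second action happens only because the first event occurred, which is the difference from a cron-driven playbook run.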
So again... "it depends."
Hopefully this has helped to at least give a baseline from which to go off and learn more. There are far too many resources available for me to list here. But when time permits, I will try to append to this with some ideas/possibilities.
For starters go with the project sites themselves, as most have decent documentation. Ansible by far has more tutorials, videos, etc., when it comes to network automation. There are also folks online like Kirk Byers (creator of Netmiko and one of the NAPALM maintainers) who run training/tutorial programs, both free and paid, to help folks get up to speed. And then there are courses on LinkedIn Learning, Udemy, and elsewhere.
[1] For clients which support/have Python installed, like Linux boxes, Ansible sends an agent script written in Python via the SSH connection that it makes to the device, and it runs that agent code in memory on the far end. The moment the SSH session ends, so does the agent. The key thing is no work need be done in advance to "prep" a device, as compared to agent-based solutions.
[2] YAML was intended to be human-readable, and it is really more a declarative language. It offers certain logic such as simple looping, but you reach a point of diminishing returns soon after that.