What Did You Do to Get Rid of Manual VLAN Provisioning?
I love(d) listening to the Packet Pushers podcast and came to expect the following rant in every SDN-focused episode: “I’m sick and tired of using CLI to manually provision VLANs”. Sure, we’re all in the same boat, but did you ever do something to get rid of that problem?
After all, you don’t need more than a few tens of VLANs in a typical enterprise data center or private cloud (clouds with thousands of tenants are obviously a totally different story) and most vendors have some sort of VMware-focused automatic edge port VLAN provisioning, from on-switch solutions like VM Tracer (Arista) or Automatic Migration of Port Profiles (Brocade) to network management applications (like Junos Space). Are you using them? If not, why not? What’s stopping you?
But let’s assume you’re unfortunate and use switches that have no hypervisor integration tools. Would it be THAT hard to write an application that would read the LLDP or CDP tables on ToR switches (populated by LLDP or CDP updates from the vSphere hosts), build a connectivity table, and allow server/hypervisor administrators to provision their own VLANs (within limits) on server-facing switch ports? I know that an intern could do it in a week (given reasonably complete functional specs), but we never did it, because doing automatic VLAN provisioning simply wasn’t worth the effort.
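To make the "intern project" concrete, here's a minimal sketch of what it could look like: parse LLDP neighbor output into a port-to-host connectivity table, then generate the switch commands that add a VLAN to a host's edge ports, enforcing a limited VLAN range. The input format, the allowed-VLAN policy, and all names are made up for illustration; actually pushing the commands to a device (SSH/NETCONF) is left out.

```python
# Sketch of the "intern project": map hypervisor hosts to switch ports
# from LLDP neighbor data, then generate VLAN provisioning commands.
# The 'show lldp neighbors' output format and the allowed-VLAN policy
# are assumptions, not any vendor's real API.

ALLOWED_VLANS = range(100, 200)  # VLANs server admins may self-provision

def parse_lldp_neighbors(show_output):
    """Parse simplified 'show lldp neighbors' text into {port: host}."""
    table = {}
    for line in show_output.strip().splitlines()[1:]:  # skip header row
        port, host = line.split()[:2]
        table[port] = host
    return table

def vlan_commands(connectivity, host, vlan):
    """Return config commands adding a VLAN to the host's edge port(s)."""
    if vlan not in ALLOWED_VLANS:
        raise ValueError("VLAN %d outside the allowed range" % vlan)
    ports = [p for p, h in connectivity.items() if h == host]
    if not ports:
        raise LookupError("host %s not found in LLDP table" % host)
    cmds = []
    for port in ports:
        cmds += ["interface %s" % port,
                 " switchport trunk allowed vlan add %d" % vlan]
    return cmds

output = """\
Port        Neighbor
Eth1/1      esx-host-01
Eth1/2      esx-host-02
"""
conn = parse_lldp_neighbors(output)
print(vlan_commands(conn, "esx-host-01", 120))
```

Wrap that in a web form for the server admins and you have the self-service portal described above; the hard part (as the rest of this post argues) isn't the code.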
Assuming we’re truly sick-and-tired of manual VLAN provisioning in enterprise data centers, there must be other reasons we’re not deploying the vendor-offered features or rolling out our own secret sauce. It might have to do with the critical impact of the networking gear.
Let’s assume you manage to mess up a server configuration with Puppet – you lose a server, and hopefully you’re using a cluster or a scale-out application, so the impact is negligible.
If a vSphere host crashes, you lose all the VMs running on it. That could be 50-100 VMs if you’re using a recent high-end server, but if you care about their availability, you have an HA cluster and they get restarted automatically.
Now imagine the vendor-supplied or home-brewed pixie dust badly misconfigures or crashes a ToR switch. Worst case (the switch hangs and the links to the servers are not lost), you lose connectivity to tens of physical servers, which could mean a few thousand VMs; best case, those same VMs lose half their bandwidth.
Faced with this reality, it’s understandable we’re scared of software automatically configuring our networking infrastructure. Now please help me understand how that’s going to change with third-party SDN applications.
I’m describing various VM-aware networking solutions in numerous webinars, including Introduction to Virtual Networking, VMware Networking Technical Deep Dive, Cloud Computing Networking and Data Center Fabric Architectures.
If we leave too much to automation, who is going to verify that the configuration works as expected?
I know some people who provision all VLANs at once; it's easy to script. The downside is the number of STP instances if you run RPVST+, but with MST it's not an issue.
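The RPVST+ vs MST difference comes down to simple arithmetic: RPVST+ runs one spanning-tree instance per VLAN, so every VLAN carried on a trunk counts as a logical port, while MST maps all VLANs onto a handful of instances. A back-of-the-envelope sketch (the numbers are illustrative, not limits of any specific box):

```python
# Why pre-provisioning every VLAN hurts RPVST+ but not MST.

def rpvst_logical_ports(trunk_ports, vlans_per_trunk):
    """RPVST+ runs one STP instance per VLAN, so each trunk
    contributes one logical port per carried VLAN."""
    return trunk_ports * vlans_per_trunk

def mst_logical_ports(trunk_ports, mst_instances):
    """MST maps all VLANs onto a few instances regardless of
    how many VLANs each trunk carries."""
    return trunk_ports * mst_instances

print(rpvst_logical_ports(48, 1000))  # 48000 logical ports with RPVST+
print(mst_logical_ports(48, 4))       # 192 with four MST instances
```

With 1000 VLANs pre-provisioned on 48 trunks, RPVST+ burns 48,000 logical ports, easily beyond what many switches support; the same topology under MST with four instances uses 192.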
The picture everybody paints is that with SDN (for example) or some other kind of automatic handling of network resources, we get rid of those old dinosaurs called network engineers.
Next step: everybody wants automatic deployment of everything, but when some people run into a problem (in the network, say), they rush back to the dinosaurs for help, once again pointing out that the network is the issue. The fact that everybody wants control over network resources without understanding the technical background? Well, apparently that's not an issue.
I see SDN as an innovative technology, but I don't see it as the magic pill which will replace knowledge and experience.
I don't want to offend anybody, but I've met people working with VMware products who had no idea how the product actually works.
It's not VMware's fault, don't get me wrong.
He was explaining a VM like "you click here and then click there"... OK, OK, but what's going on in the background? How does the vSwitch communicate with the physical network, for example? Silence.
Is this the direction we want to go in? Click here and there? I understand that we can do more with less brain usage now than 20 years ago, but that's only because there are "dinosaurs" who keep reading, learning and using their brains for more than day-to-day activities.
Don't worry: if things keep going down this path and nobody understands what's underneath, while everybody uses terms like SDN to hide the real problems, then in another 20 years we'll be able to click here and click there to eliminate the last IT "dinosaurs".
Then the server/virtualization team would configure/reconfigure vSwitches via SDN/whatever, while the transport network stays stable and secure. Everybody wins: the networking team doesn't have to deal with high volumes of moves/adds/changes, there are no weird-ass protocols to track VMs, and the server guys can do whatever they want without endangering the whole shebang.
This doesn't (of course) prevent a server guy from borking a vSwitch and complaining to the network guys. THAT'S where SDN comes into play. SDN should allow the network to be provisioned dynamically and automagically at that lower level, the Ethernet transport infrastructure. Ideally, SDN would allow a client to send a tagged frame (with some form of handshake, I presume) and the SDN faeries would provision the access ports and ensure that any trunk ports connecting to a switch with the same VLAN in use are configured to allow it.
Of course, both of those still rely on some form of STP, which is a waste. If we're redefining the DC infrastructure, surely we can "flatten" it out a bit.
Talking about Puppet: Juniper recently launched Puppet for JUNOS: http://www.juniper.net/techpubs/en_US/junos-puppet0.8/topics/concept/automation-junos-puppet-overview.html . But it requires you to install a UNIX-like daemon on the box, which comes "as-is and without any warranty", so basically nobody sensible will install it (hello, memory leaks!)...
Data Center: With Cisco, provisioning and unpruning new VLANs on pruned VMware trunks is quite easy with port-profiles and your configuration management tool of choice. Never touch a port again; just update the VLAN database, the MST region, and the port-profile. Not a hassle at all.
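For readers who haven't used them: a port-profile lets you change many edge ports by editing a single object. A minimal NX-OS-style sketch (the profile and interface names are made up, the VLAN numbers illustrative):

```
port-profile type ethernet ESX-TRUNK
  switchport mode trunk
  switchport trunk allowed vlan 100-150
  state enabled
!
interface Ethernet1/10
  inherit port-profile ESX-TRUNK
```

Adding a VLAN to the allowed list inside the profile updates every interface inheriting it, which is exactly the "never touch a port again" workflow described above.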
For physical devices in large data centers, Puppet seems to make sense for VLAN configs if lots of devices are constantly being added/removed. For the enterprise I'm not sure yet, but CLI and single-device management isn't the way forward.
It's a hard problem for the network alone to solve, as it typically doesn't know which nodes are meant to communicate; we can't tell whether changing a VLAN is right or not, so a person does it.
We need a way to express, at the application level, which nodes are meant to talk, and then an automated way to deterministically verify that the communication is valid before changes are made to device configurations.
The problem of course was cost. As soon as the first major bank moved to TCP/IP and IPX routers (and it worked), all the other banks were at a competitive disadvantage. So in the 90's everybody moved from a stable, high-cost network to a less stable, lower-cost network, because it worked most of the time and cost a hell of a lot less. (Let's not forget that it was mostly networking that broke the back of what was an expensive, proprietary and arrogant vendor.)
Today the network is a bigger problem than it was back then.
It's highly UNSTABLE (just ask anyone who has to do an IOS or NX-OS bug scrub)
It has a high capital cost
And to your point, Ivan, it has a very high OPEX. Whilst network engineers are the greatest guys in IT (by a mile), you're much slower to respond than a computer running a program. You sleep, you take lunch, you drive a car.
You can call it SDN or whatever. But what we are talking about is automation. That is the revolution that is coming. And Network Guys can either understand that and embrace it.
Or, as Calin intimated, in a few years the Human Resources department will be "mouse clicking" you out of the building.
You end up having to manage VLANs and when you consider that you might one day have a need for VLANs to span multiple data centers, you need to reserve a set of VLANs for that purpose. I know how adamant you are against that and frankly, I agree that there are very few needs for it.
One other thing you have to watch out for in these environments is your VLAN port count (STP logical interfaces, in Cisco speak). A cloud provider can run up against that number long before hitting the supported VLAN limit. Every time you add a VLAN to a VNIC, it creates a new STP logical interface, and that's a limited resource on the N5Ks, etc...
Layer 2 sucks.
You prove the point again. You Network Engineers are extremely smart. But that is most of the problem. To do your job, you HAVE to be really smart ! The incumbent vendor insists that you work all this out for yourself. Your incumbent vendor insists that you write your own scripts. You say it's fairly simple. Maybe it is. But all your CIO sees is OPEX OPEX OPEX!
* OPEX in that it's manual
* OPEX in that you have to spend lots of time working this out, and then selling it to other members of the team.
* OPEX in that you are really smart - meaning I have to pay you twice as much as a server admin who can point and click in vCenter - because he DOESN'T HAVE TO UNDERSTAND what is really going on under the covers (any more than a developer needs to understand the x86 instruction set)
Your CIO can't understand why he can have abstraction in everything else IT - but not the Holy network.
When some customers start to replace "you never get fired for buying... types", with the same kind of pioneering Engineers that threw out their FEP's, 3270 Terminals and Token Ring for a better/faster/cheaper alternative - guess what? Your CIO will start to as well.
In the last few years, we've seen new [non-STP] bridging technologies that don't require VLAN provisioning on core-facing links -- VXLAN, QFabric, FabricPath, SPB, etc. These are all overlay-based bridging technologies. With these solutions, the major reason for slow VLAN deployment goes away. What remains is the lack of a standards-based solution for the network to autonomically attach access ports to VLANs. VDP as a solution seems to have gone nowhere, possibly because of bloat. However, we can expect an "MVRP UNI" to arrive soon enough, coupled with overlay-based (e.g. VXLAN) core bridging networks.
In the MVRP UNI approach, a hypervisor sends a VLAN declaration to a ToR when a VM requiring that VLAN shows up. The ToR attaches the port/channel to the required VLAN, and the rest is handled by the overlay protocol. The ToR never propagates or declares a VLAN to its neighbors (including hypervisors). This is an unconventional use of MVRP, but it works fine for autonomic VLAN configuration and satisfies the needs of the average enterprise. Linux will have MVRP support (http://comments.gmane.org/gmane.linux.network/244153). Now we just need OpenStack support and the rest will follow.
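The ToR-side behavior just described can be modeled in a few lines. This is a toy sketch, not an MVRP implementation: the class, the port names and the VLAN-to-VNI policy (VNI = VLAN + 10000) are all invented for illustration; real MVRP state machines and timers are omitted.

```python
# Toy model of the "MVRP UNI" idea: the ToR listens for VLAN
# declarations on hypervisor-facing ports, attaches the port to the
# VLAN and maps the VLAN to a VXLAN VNI for the overlay core, but
# never declares the VLAN back out to its neighbors.

class TorSwitch:
    def __init__(self):
        self.port_vlans = {}   # port -> set of attached VLANs
        self.vlan_to_vni = {}  # VLAN -> VXLAN VNI (made-up mapping)

    def mvrp_declare(self, port, vlan):
        """Handle an MVRP VLAN declaration from a hypervisor."""
        self.port_vlans.setdefault(port, set()).add(vlan)
        self.vlan_to_vni.setdefault(vlan, 10000 + vlan)

    def mvrp_withdraw(self, port, vlan):
        """Last VM in the VLAN left the host: detach the port."""
        self.port_vlans.get(port, set()).discard(vlan)

tor = TorSwitch()
tor.mvrp_declare("Eth1/1", 120)   # VM in VLAN 120 shows up on host 1
tor.mvrp_declare("Eth1/2", 120)   # and on host 2
tor.mvrp_withdraw("Eth1/1", 120)  # vMotion moves it off host 1 again
print(tor.port_vlans, tor.vlan_to_vni)
```

Note that edge-port membership tracks VM placement automatically, while the VLAN-to-VNI mapping (and hence the overlay core) never needs per-move reconfiguration, which is the whole point of the approach.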
This isn't the glorious approach, but for most companies good enough will do for now, and hopefully some measure of sanity will be restored. There are a number of benefits for the average enterprise, which I'll leave for another day.