Network Node Shutdown Is a Process, not an Event

In theory, you should shut down a network device with a well-defined procedure:

  • Drain the traffic from the device;
  • Verify the device is no longer forwarding traffic;
  • Turn off the device.

In practice, network devices don’t have a shutdown command, and reload typically just restarts the network OS.

Graceful Shutdown

Every major vendor claims to have graceful shutdown functionality, but there’s a small problem: the shutdown is usually not very graceful. For example:

  • A BGP router tears down its BGP sessions;
  • An OSPF router sends a hello packet listing no neighbors (tearing down all adjacencies).

In both cases, the neighbors immediately remove the routes advertised by the device, go into panic mode, and try to find alternate paths.

The only benefit of the so-called graceful shutdown is that the neighbors discover the session or adjacency loss immediately instead of waiting for a TCP timeout, BGP hold timer, or OSPF dead interval to expire.

Is there a better way?

Of course – you could use the overload bit in IS-IS (and IS-IS based fabrics like Cisco’s FabricPath), max-metric router-lsa in OSPF, and route revocation in BGP. The first two can be configured with a single configuration command on many routers and data center switches; the last one requires a bit more work.
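As a rough illustration of the two single-command options, here is a minimal Python sketch that returns the configuration lines for each protocol. The command strings follow Cisco IOS-style syntax; the OSPF process ID, the untagged IS-IS process, and the helper name drain_config are assumptions made for this example, so check your platform's documentation before using them. BGP is deliberately left out because it needs more than one command.

```python
# Cisco IOS-style syntax; the OSPF process ID (1) and the untagged IS-IS
# process are placeholders chosen for this example.
DRAIN_COMMANDS = {
    "ospf": ["router ospf 1", "max-metric router-lsa"],
    "isis": ["router isis", "set-overload-bit"],
}


def drain_config(protocols):
    """Return config lines that steer transit traffic away from the device.

    Only protocols with a single-command drain are covered; anything else
    (BGP, for example) raises ValueError.
    """
    lines = []
    for proto in protocols:
        key = proto.lower()
        if key not in DRAIN_COMMANDS:
            raise ValueError(f"no single-command drain known for {proto!r}")
        lines.extend(DRAIN_COMMANDS[key])
    return lines
```

Calling drain_config(["ospf", "isis"]) yields the four lines you would paste into configuration mode on a device running both protocols.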

Next steps

If you care about traffic loss following a network device shutdown or restart, the very minimum you should do is follow a more controlled shutdown process. Instead of typing reload right away, reconfigure the routing process (see above), wait for the network to converge (10 seconds should be more than enough), and then execute reload, making sure the latest changes are not saved to the permanent configuration.
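The controlled process described above could be automated along these lines. This is only a sketch: send is a hypothetical callable that delivers one CLI line to the device (over SSH or whatever transport you use), the "configure terminal" and "end" lines assume an IOS-style CLI, and the function name is made up for this example.

```python
import time


def controlled_reload(send, drain_lines, converge_wait=10.0, sleep=time.sleep):
    """Drain the device, wait for convergence, then reload without saving.

    send        -- hypothetical callable that pushes one CLI line to the device
    drain_lines -- routing-process changes that move traffic off the device
    """
    send("configure terminal")
    for line in drain_lines:
        send(line)
    send("end")

    # Give the neighbors time to recompute paths around this device.
    sleep(converge_wait)

    # No save command is ever sent, so the drain settings stay out of the
    # startup configuration. If the platform asks whether to save the modified
    # configuration on reload, the transport behind send must answer "no".
    send("reload")
```

For a dry run, pass print as send and a small converge_wait to see the exact command sequence before pointing it at a real device.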

You can also change the FHRP priority while waiting, and it’s pretty easy to automate the whole process.

Cisco Nexus OS and Arista EOS also support maintenance mode, which shuts down all interfaces apart from the management port – an ideal alternative to power-off or reload.

Finally, you could sprinkle some magic SDN dust on top of this solution: verify that adjacent devices have stopped using the network device before shutting it down. You could use BGP-LS for OSPF or IS-IS, or BGP Monitoring Protocol or a BGP-based SDN controller for BGP.
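The verification step might look something like the sketch below. It assumes you have already collected the neighbors' selected paths into a simple list (for example from a controller); the BestPath record, its fields, and both function names are hypothetical and not part of any real BMP or BGP-LS library.

```python
from typing import Iterable, List, NamedTuple


class BestPath(NamedTuple):
    """One selected path on a neighbor (hypothetical record layout)."""
    router: str    # neighbor that selected this path
    prefix: str    # destination prefix
    next_hop: str  # next-hop address of the selected path


def paths_still_via(paths: Iterable[BestPath],
                    device_addrs: Iterable[str]) -> List[BestPath]:
    """Return the paths whose next hop is one of the device's addresses."""
    addrs = set(device_addrs)
    return [p for p in paths if p.next_hop in addrs]


def safe_to_shut_down(paths: Iterable[BestPath],
                      device_addrs: Iterable[str]) -> bool:
    """True when no neighbor still forwards traffic through the device."""
    return not paths_still_via(paths, device_addrs)
```

Logging the result of paths_still_via before the reload also shows which prefixes have no alternate path, which is often more useful than a plain yes/no answer.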

More details

Watch the BGP-based SDN webinar to see how you could solve the problem in BGP-based data center fabrics, or Facebook’s RIPE71 presentation to see how Facebook uses similar functionality in their network.

9 comments:

  1. Good post. These "knobs" are tools that should always be considered. I have designed my FabricPath-based networks to utilize the overload bit. I even created a menu and CLI alias commands for a client so they can use OL during maintenance windows too. It works very well.
  2. Has anyone ever seen a vendor/model-specific cheat sheet that shows how to gracefully shut down OSPF, BGP, IS-IS, etc.?
  3. Offloading BGP through local-preference and prepending instead of route revocation has the advantage of keeping the path through the device available as backup for as long as possible until the reload is actually performed, and will help you identify traffic that erroneously does not have an alternative path.

    Any pitfalls?
    Replies
    1. Local preference on the other side might keep an eBGP peer from dropping the routes until they are revoked, even with significant prepending. If the peer supports communities that manipulate local preference, that could resolve the issue.
  4. A network node shutdown is an event. You can have a process leading up to that event if needed, but it's clearly an event.

    Terms do matter.

  5. Several scientific papers have analysed how these graceful shutdown operations could be performed without harming the network, see e.g.:
    For BGP : http://inl.info.ucl.ac.be/publications/avoiding-disruptions-during-maintenan
    http://inl.info.ucl.ac.be/publications/requirements-graceful-shutdown-bgp-sessions-0
    http://inl.info.ucl.ac.be/publications/improving-network-agility-seamless-bgp-reconfigurations
    For OSPF/IS-IS : http://inl.info.ucl.ac.be/publications/disruption-free-topology-reconfigurat
  6. I'll speak from my IOS-XR-focused point of view and take this opportunity to share my own field experience.

    XR supports BGP GSHUT since 5.3.2. Here is a good paper by Bertrand Duvivier about this feature: http://fr.slideshare.net/bduvivie/bgp-graceful-shutdown-ios-xr, covered by the RFC Olivier mentioned.

    From a network perspective, I think everything is already covered in this post.

    From an IOS-XR system and infrastructure point of view, it's recommended to put the RSP/RP in rommon (config-register 0x0 location all) and then reload the router (reload location all). This avoids potential filesystem corruption. Once the router is in rommon, you can safely proceed with the power isolation.
    Note you'll have to manually reconfigure the config-register to 0x102 or 0x2102 when turning on the router later to boot on the committed software.

    Fred

  7. Or... run Shortest Path Bridging in your network and you won't have to do any of this. Failover is automatic without user session loss.
    Replies
    1. And there's bandwidth fairy and magical pixie dust... Please don't quote vendor marketing materials and don't conflate packet loss (and consequent performance issues) with session loss.

      TCP can survive for 30 seconds, so you effectively claimed SPB can converge in 30 seconds. Hooray!

      See http://blog.ipspace.net/2015/10/what-happens-when-data-center-fabric.html for actual technical details, and they apply to every single fabric architecture (the differences between various solutions are in failure detection, flooding and convergence timers).