David Gee on Automated Workflows

David Gee is coming back to the Building Network Automation Solutions online course: in early March 2019 he’ll talk about the hygiene of network automation. Christoph Jaggi interviewed him to learn more about the details of his talk, and they quickly diverted into an interesting area: automated workflows.

Automation is about automated workflows. What kind of workflows can be automated in IT and networking?

Workflows most often fall into three categories: build, operations and remediation.

To take a moment to define workflows: they are processes converted into mechanizable flow charts that describe the logic required to perform one or more tasks. They account for data input and output, transformations and validations. It follows that if statically typed input can be obtained programmatically, it can be used in decision making and passed to the API/RPC calls executed within a workflow. So what kind of workflows can be automated in IT and networking? Almost anything!

Each of these categories includes testing elements such as pre-checks, validations and post-checks. It's also common for build workflows to end with a final state check, which becomes the seed for current versus desired state checks.
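One way to picture the pre-check, task and post-check structure is the minimal Python sketch below. The `Workflow` class, the `drift` helper and key names such as `reachable` and `vlan10` are hypothetical, chosen only for illustration; they do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]  # hypothetical key/value view of a system's state


@dataclass
class Workflow:
    name: str
    pre_checks: List[Callable[[State], bool]]
    tasks: List[Callable[[State], None]]
    post_checks: List[Callable[[State], bool]]

    def run(self, state: State) -> str:
        # Pre-checks: refuse to start if the starting conditions are not met.
        if not all(check(state) for check in self.pre_checks):
            return "pre-check failed"
        # Tasks: each one changes the state (here, an in-memory dict).
        for task in self.tasks:
            task(state)
        # Post-checks: confirm the tasks had the intended effect.
        if not all(check(state) for check in self.post_checks):
            return "post-check failed"
        return "ok"


def drift(current: State, desired: State) -> Dict[str, tuple]:
    """Final state check: keys whose current value differs from the desired one."""
    return {
        key: (current.get(key), want)
        for key, want in desired.items()
        if current.get(key) != want
    }


def add_vlan(state: State) -> None:
    # Stand-in for a real provisioning call (assumption for the example).
    state["vlan10"] = "present"


build = Workflow(
    name="add-vlan",
    pre_checks=[lambda s: s.get("reachable") == "yes"],
    tasks=[add_vlan],
    post_checks=[lambda s: s.get("vlan10") == "present"],
)

desired = {"reachable": "yes", "vlan10": "present"}
state = {"reachable": "yes"}
print(drift(state, desired))  # vlan10 is reported as missing before the build
print(build.run(state))
print(drift(state, desired))  # empty once current matches desired
```

The same `drift` comparison can be reused later by operations and remediation workflows, which is one reading of the final state check becoming a seed for current versus desired state checks.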

Build deals with provisioning and topology creation, and even covers onboarding new nodes or systems into monitoring and billing systems.

Operations covers everything from daily business-as-usual changes, such as those at the access layer, to software upgrades.

Remediation targets the automatic fixing of problems through workflows that use proven, codified processes. It also includes workflows that gather and triage data so that a human is prepared to solve more complex problems.

The Operations and Remediation phases deal with the transition between the current and desired state, while Build moves a system from zero to a desired state.

What can trigger and what can end an automated workflow?

There are two answers to this question, so let’s start with the simplest.

In the early days of an automation journey, humans execute workflows, using business events as the trigger. At run-time, the operator gathers input data and enters it as arguments to the appropriate command or script, which is a manifestation of a workflow.

When the workflows prove to be effective, it’s typical to see data gathering being done by the workflows through key/value stores or more complex databases.

After some time, trust builds and it’s common to see the triggering mechanism come from sourced events. Orchestrated maintenance windows, control-panel updates and even errors and faults can trigger workflows with input data, eventually resulting in event-driven automation.
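As a rough illustration of event-driven triggering, the sketch below maps event types to workflows and passes the event payload along as input data. The event names, the `on` decorator and the handler functions are hypothetical; a real deployment would receive events from a message bus or monitoring system rather than from in-process dictionaries.

```python
from typing import Any, Callable, Dict, Optional

Event = Dict[str, Any]
HANDLERS: Dict[str, Callable[[Event], str]] = {}


def on(event_type: str):
    """Register a workflow as the handler for one event type."""
    def register(func: Callable[[Event], str]) -> Callable[[Event], str]:
        HANDLERS[event_type] = func
        return func
    return register


@on("interface_down")
def triage_interface(event: Event) -> str:
    # The event itself carries the input data the operator used to type in.
    return f"collect diagnostics for {event['device']} {event['interface']}"


@on("maintenance_window_open")
def start_upgrade(event: Event) -> str:
    return f"start upgrade on {event['device']}"


def dispatch(event: Event) -> Optional[str]:
    handler = HANDLERS.get(event["type"])
    if handler is None:
        return None  # unknown events are ignored rather than guessed at
    return handler(event)


print(dispatch({"type": "interface_down", "device": "sw1", "interface": "Gi0/1"}))
```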

Two types of workflow exist in principle: run-to-completion and long-lived. Each has its own termination pattern. Run-to-completion workflows end when all the tasks are complete, irrespective of task success. Long-lived workflows may never exit and instead spawn child sub-workflows which execute tasks, return run-time information and exit. These kinds of workflows can have lots of loops and require finite-state machine handling of logic. In some ways, long-lived workflow feedback is similar to amplifier feedback.
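The difference between the two termination patterns can be sketched in a few lines of Python. The state names (`healthy`, `remediating`, `escalated`) and event names (`fault`, `ack`) are assumptions made for the example. A real long-lived workflow would read from an unbounded event source; a finite list is used here only so that the sketch terminates.

```python
from typing import Callable, Iterable, List


def run_to_completion(tasks: Iterable[Callable[[], None]]) -> List[bool]:
    """Run every task once, record success or failure, then end."""
    results = []
    for task in tasks:
        try:
            task()
            results.append(True)
        except Exception:
            results.append(False)  # a failed task does not stop the workflow
    return results


def long_lived(events: Iterable[str], child: Callable[[], bool]) -> List[str]:
    """A small finite-state machine that spawns a child workflow on faults."""
    state = "healthy"
    history = [state]
    for event in events:
        if state == "healthy" and event == "fault":
            state = "remediating"
            history.append(state)
            # The child runs to completion and reports back; its result is
            # the feedback that drives the next transition.
            state = "healthy" if child() else "escalated"
        elif state == "escalated" and event == "ack":
            state = "healthy"
        else:
            continue  # no transition defined for this state/event pair
        history.append(state)
    return history


print(run_to_completion([lambda: None, lambda: 1 / 0, lambda: None]))

outcomes = iter([True, False])  # first remediation works, second does not
print(long_lived(["fault", "ok", "fault", "ack"], lambda: next(outcomes)))
```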

Is it worthwhile to automate every workflow that can be automated?

The answers to this question vary due to organizational culture and the perceptions of time that all organizations have. Let’s take an example.

On 31 December every year, IPEngineer PLC's operations team executes a workflow that takes about twenty minutes. The workflow involves multiple touch points like databases, web servers and middleware. The operations team estimates three days to convert this to an automated process. The team is energized to learn about automation but has lots of daily tasks that need converting first. The time investment isn’t worthwhile.

Now, imagine the same task for an organization with high automation coverage and a mature automation culture. If all of the low-hanging fruit has been consumed and the team is running super-efficiently, an extra twenty minutes is a huge gain. If the culture is built on solid hygiene, then the conversion will not take three days, thanks to reusable components and patterns.
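The arithmetic behind the example can be made explicit with a small break-even calculation. The twenty minutes and three days come from the example above; the eight-hour working day, the assumption that an automated run costs no operator time, and the two-hour conversion figure for the mature organization are illustrative assumptions only.

```python
import math


def break_even_runs(build_minutes: float, saved_minutes_per_run: float) -> int:
    """Number of runs before the time spent automating is paid back."""
    return math.ceil(build_minutes / saved_minutes_per_run)


SAVED_PER_RUN = 20            # minutes per manual run, from the example
THREE_DAYS = 3 * 8 * 60       # assumes an eight-hour working day
MATURE_ORG_BUILD = 2 * 60     # hypothetical: two hours with reusable components

# With one run per year, the number of runs equals the number of years.
print(break_even_runs(THREE_DAYS, SAVED_PER_RUN))
print(break_even_runs(MATURE_ORG_BUILD, SAVED_PER_RUN))
```

Under these assumptions the first organization needs 72 yearly runs to recover its three days, while the hypothetical mature organization breaks even after 6. The calculation measures time only and ignores consistency and failure rates.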

The TL;DR is this: “Attack Goliath first”. Aim for high-gain workflows, hone your approach and reapply learnings whilst working your way through the workflows which cost the highest time or deliver the highest failure rates.

Want to know more? Register for the Building Network Automation Solutions online course.

8 comments:

  1. This comment has been removed by a blog administrator.
    Replies
    1. You can be a grumpy nay-sayer, or you can get your hands dirty and eventually get the job done. I know plenty of networking engineers in both categories. The choice is yours.
    2. I'll make my hands dirty with writing you some useful comments.
  2. I truly do not get where the marketing campaign is, Anonymous.
  3. I believe that Anonymous runs a competing blog where he/she shares really cool stuff for free.
  4. https://xkcd.com/1205/

    There is of course the factor of time and money saved in automating remediation workflows. The same holds true for monitoring. Instead of repetitive 'ops tasks', these tasks should ideally only be to analyze, document and automate anomalies.
    I only posted the link to xkcd because there is a tendency amongst some to act a wee bit on the overzealous side and to spend days on converting tasks into ansible/salt/... when the actual steps of the task could easily have been executed and documented in at most an hour, including checking if the task resulted in the desired changes.
    Replies
    1. "I only posted the link to xkcd because there is a tendency amongst some to act a wee bit on the overzealous side and to spend days on converting tasks into ansible/salt/... when the actual steps of the task could easily have been executed and documented in at most an hour, including checking if the task resulted in the desired changes."

      While I totally agree with that sentiment (and refer to the XKCD comic in my automation webinars), there are also "consistency" and "rare tasks" considerations. I automated things that took me 10 minutes per quarter just so I wouldn't have to think about them (and reverse-engineer what needs to be done from too-sparse documentation) every quarter.
    2. Yes, you hear everywhere: "you must automate, or at least collect some logs automatically to use automation" regardless of the cost. Sometimes it looks like a solution looking for a problem (just to sell the solution).

      BTW, I have been doing automation for 10+ years (system development area) for real business cases, but I cannot understand why one would automate just to automate.