Enhancing Network Management Capabilities

01: Overview

Introduction

Application Quality of Service (AppQoS) lets users create policies that manage how data travels over data links and under what configurations. This helps classify the important data in the traffic and ensures that each kind of data is carried under the appropriate configurations and policies.

The Problem

As part of a new SaaS release, I was asked to improve the AppQoS capability by identifying existing issues and recommending changes to the design and architecture of the feature while keeping the timeline and scope in check.

The Solution

As part of the solution:

Understood Technicality

Understood the detailed technical and architectural complexity behind AppQoS.

Identified Challenges

Identified existing challenges across 3 different architectural layers in the AppQoS stack.

Recommended Improvements

Recommended design and framework changes to create a unified experience.

Selected Screens:

My Role and The Team

As the only UX Designer, I led the design of the AppQoS enhancements across Traffic Type, SLA Profiles, and SD-WAN Policies, supported by a Researcher and two Visual Designers.

For research and discovery, I collaborated with two Sales Engineers and one Technical Marketing Engineer. I also relied on them for quick feedback on my designs.

To make designs a reality, I worked alongside two Network Architects and two Product Managers in the US, four Back-end Platform engineers in India, and two Front-end engineers in China.

02: Discover
Kickoff with Stakeholders
At the core of the process were the kickoff meetings with a multidisciplinary team of SD-WAN experts, which focused on aligning all members on the capabilities and scope of the project.

The Outcome:

From the sessions, I understood the business objectives, got in-depth information about what we wanted to offer as a product, and gained insight into the benefits of SD-WAN for customers.

Last but not least, I got to know each member of the team involved in this effort, and set up 1:1 conversations to gather more detailed information about what they knew.

Business Objective

Understood the business objective of being a competitor in the cloud business.

SD-WAN Technicalities

Was exposed to what SD-WAN is and how it works, especially the complex technicalities within its layers.

Introduction to the Team

I got to know each member of the multidisciplinary team, which was located in the USA, India, and China.

How does SD-WAN Work?

As a very simple example with a hub and a spoke, an SD-WAN policy defines how traffic between them moves over the underlay links that connect them. So, one SD-WAN policy can say that “Media traffic” from Site 1 takes “Link 1” to go to the Hub, and another SD-WAN policy can say that “Collaboration” traffic from Site 1 takes “Link 2”, as it is a more reliable and faster link.

An SD-WAN policy can further be configured to say that all “Collaboration” traffic from Site 1 takes “Link 2” as long as the quality of the traffic meets a certain threshold. If the threshold is not being met, the traffic switches over to Link 1 so that quality is retained. The threshold that needs to be met is defined as an SLA (Service Level Agreement) Profile. The quality of the traffic between two sites is denoted by parameters like “Packet Loss”, “Latency”, and “Jitter”.
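
The switch-over behavior described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the class names, field names, units, and threshold values are all hypothetical. Only the three quality parameters (Packet Loss, Latency, Jitter) and the "prefer Link 2, fall back to Link 1" behavior come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkQuality:
    """Measured quality of a link (units are assumed for illustration)."""
    packet_loss_pct: float
    latency_ms: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaProfile:
    """Quality thresholds that the traffic must stay within."""
    max_packet_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float

    def is_met_by(self, quality: LinkQuality) -> bool:
        return (
            quality.packet_loss_pct <= self.max_packet_loss_pct
            and quality.latency_ms <= self.max_latency_ms
            and quality.jitter_ms <= self.max_jitter_ms
        )


def choose_link(preferred: str, fallback: str,
                measured: dict[str, LinkQuality], sla: SlaProfile) -> str:
    """Keep the preferred link while it meets the SLA, otherwise switch over."""
    if sla.is_met_by(measured[preferred]):
        return preferred
    return fallback


# Hypothetical thresholds for "Collaboration" traffic.
collaboration_sla = SlaProfile(max_packet_loss_pct=1.0,
                               max_latency_ms=150.0,
                               max_jitter_ms=30.0)
measured = {
    "Link 1": LinkQuality(packet_loss_pct=0.5, latency_ms=120.0, jitter_ms=20.0),
    "Link 2": LinkQuality(packet_loss_pct=3.0, latency_ms=90.0, jitter_ms=10.0),
}
# Link 2 is preferred, but its packet loss exceeds the threshold here.
selected = choose_link("Link 2", "Link 1", measured, collaboration_sla)
```

In this made-up measurement, Link 2 breaks the packet loss threshold, so the traffic moves to Link 1 even though Link 2 is the preferred path.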

When traffic is sent from one campus to another, it travels over a data link. Each link is divided into 8 queues, one for each of 8 classes of data. For example, there could be a separate queue for all the “Voice” traffic on the link, a separate queue for “Video”, and a separate queue for “Important-Data” if a certain type of data is classified that way.

Each of these queues can have its own configuration that defines how traffic moves through it, and QoS settings enable the network admin to define those properties for each queue.
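
As a rough mental model, a link can be pictured as a list of 8 queue configurations, one per class of data. The sketch below assumes two hypothetical per-queue settings (`bandwidth_pct` and `priority`); the text does not say which properties a queue actually exposes, and only the first three class names appear in it.

```python
from dataclasses import dataclass

QUEUES_PER_LINK = 8  # from the description: 8 queues for 8 classes of data


@dataclass(frozen=True)
class QueueConfig:
    traffic_class: str
    bandwidth_pct: int  # hypothetical QoS setting
    priority: str       # hypothetical QoS setting


# "Voice", "Video", and "Important-Data" come from the text;
# the remaining class names and all values are placeholders.
link_queues = [
    QueueConfig("Voice", 20, "high"),
    QueueConfig("Video", 30, "high"),
    QueueConfig("Important-Data", 20, "medium"),
] + [QueueConfig(f"Class-{n}", 6, "low") for n in range(4, QUEUES_PER_LINK + 1)]


def queue_for(traffic_class: str, queues: list[QueueConfig]) -> int:
    """Return the index of the queue that carries the given class of data."""
    for index, queue in enumerate(queues):
        if queue.traffic_class == traffic_class:
            return index
    raise KeyError(f"no queue configured for {traffic_class!r}")
```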

How do you define an SD-WAN Policy?

For a Tenant network admin to define an SD-WAN Policy, they have to take a stepped approach. First, they familiarize themselves with a “Traffic Profile”, which defines the different traffic types and the configurations for the queues that carry each class of data.
Once familiarized, the Tenant Admin defines an SLA Profile, or quality threshold, per traffic type, saying “For my Video traffic, I want these quality parameters, and I want the traffic to take this link”. These "quality parameters" are stored in the SLA Profile as "SLA Parameters", and the preferred link is denoted as the "Path Preference".
Once an SLA Profile is created, the Tenant Admin defines the policy, saying “For my Social Applications from Campus 1, I want the data to maintain the quality parameters defined in SLA Profile 1”. In doing so, they select the Source Location, the Application, and the SLA Profile with its SLA Parameters.
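
The three steps above form a dependency chain, which can be sketched as a small data model. The field names mirror the terms in the text (SLA Parameters, Path Preference, Source Location, Application); the class names, parameter keys, and values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaProfile:
    name: str
    traffic_type: str                 # one of the types defined by the Traffic Profile
    sla_parameters: dict[str, float]  # the "quality parameters"
    path_preference: str              # the preferred link


@dataclass(frozen=True)
class SdwanPolicy:
    source_location: str
    application: str
    sla_profile: SlaProfile


# Step 1: traffic types the Tenant Admin learns from the Traffic Profile.
traffic_types = {"Voice", "Video", "Important-Data"}

# Step 2: an SLA Profile per traffic type (values are made up).
sla_profile_1 = SlaProfile(
    name="SLA Profile 1",
    traffic_type="Video",
    sla_parameters={"packet_loss_pct": 1.0, "latency_ms": 150.0, "jitter_ms": 30.0},
    path_preference="Link 2",
)

# Step 3: the policy ties a source, an application, and the SLA Profile together.
policy = SdwanPolicy("Campus 1", "Social Applications", sla_profile_1)
```

The point of the sketch is the ordering: a policy cannot be written until an SLA Profile exists, and an SLA Profile cannot be written without knowing the traffic types.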
Understanding User Goals via Sales Engineers and Technical Marketing Engineers
While the kickoff with stakeholders gave me insights into the business requirements and feature requests, I wanted to think beyond that and identify who the end users were and what they were trying to achieve.
Given the enterprise nature of the product and the legal concerns, getting frequent access to customers is time-consuming. To get the closest insights about the end users, I reached out to Sales Engineers (SEs) and Technical Marketing Engineers (TMEs).

The Outcome:

As a result of my discussions with SEs and TMEs, I identified the target users along with their behaviors and motivations.

User Personas

Identified the different users and their Behaviors, Motivations, and Goals
Service Provider Network Admin

ABOUT
  • Hands-on people who implement solutions for their customers.
  • Manage some or all customers worldwide.

BEHAVIORS
  • Works with other network admins to get the requirements and then makes recommendations for the solutions they want.
  • On a daily basis, receives tickets about things not working and then tries to find the cause of the problem as soon as possible.

MOTIVATIONS
  • The solution should strive to communicate technical information regarding what they can do with it, so that they can see the value.
  • Problem-solving ability to push through difficult items fast.

Tenant Network Admin

ABOUT
  • Does everything from requirement gathering to road mapping to helping other tenant admins.
  • Would like someone to assist them, but knows that it will take a lot of time to onboard someone.

BEHAVIORS
  • Manages all network appliances worldwide.
  • First thing in the morning, checks whether everything is working by going through a checklist.

MOTIVATIONS
  • Values solutions which can help them understand a product's technical features and benefits in the shortest time possible.
  • Increase overall productivity per hour.
Deep Dives with Platform Engineers
After the deep-dive sessions with SEs and TMEs to better understand the domain, I set up meetings to understand how the various components that support SD-WAN work.

The Outcome:

I understood how the architectural models worked and clarified the components, relations, and interactions of the SD-WAN environment (Traffic Profiles, SLA Profiles, and SD-WAN Policies).

These gave me insights into the functional constraints that the design needed to account for, so that the solutions satisfied the users’ goals and were also feasible for the engineers to build.
03: Define
Synthesis from Research

I: Challenges at "Traffic Type" level:

Single Set of 8 Profiles

Only a single set of 8 Traffic Profiles was applied to the global network. It did not cater to the different requirements of different tenants.

No Access to Traffic Profiles

The Service Provider admin created and monitored these Traffic Profiles. Tenant admin was not exposed to the details of these profiles.

No Control Over Configurations

Only the Service Provider admin could create, edit or modify these configurations which affected all the Tenants and their sites.

A single set of 8 profiles for the whole network

  • All the Tenants (and all the campuses within them) across a single Service Provider used only a single set of 8 Traffic Profiles.
  • This meant that all the traffic across all the different tenants, and the different campuses within them, had to be classified within a single set of 8 Traffic Types.
  • Thus, the same kind of traffic coming from multiple sites across all tenants had to follow a single configuration.

Tenants had no access to the Traffic Profiles

  • Tenants inherited the Traffic Profiles from the Service Provider Admins, who defined the Traffic Types for the tenants to use.
  • The tenants could not see the configurations or how the different Traffic Types were configured.
  • The tenants had to reach out to the Service Provider admin, or consult documentation provided by them, to understand the Traffic Types and the configurations associated with each.

Tenants had no control over the configurations.

  • The tenants had to figure out the configurations set by the Service Provider Admin, and once they did, they only had a set of 8 configurations to apply to different kinds of traffic across their system.
  • This meant that they could not tweak or change the configurations set by the Service Provider Admin.
Before: Traffic Profiles were the most granular entity that defined Traffic Types. Only a single set of 8 of these profiles was used throughout the system by multiple Tenants and the hundreds of sites within each of them.

II: Challenges at "SLA Profile" level:

1 Profile For 3 Different Cases

A single "SLA Profile" did 3 jobs. Many of the users did not even realize other capabilities that existed.

Links Defined Only on Link Type

Links could be selected only by their Type, and not by any other classification. This was restrictive in many ways.

One profile definition for three different non-related use cases

  • On further understanding how SLA Profiles were configured, I realized that a single SLA Profile covered 3 different use cases, all crammed into one profile.
  • As a result, users did not realize that 2 other capabilities were also present, because the name "SLA Profile" threw them off and hindered discoverability.

Too many data points to configure

  • Because one single profile did the job of 3 different use cases, there were a lot of data fields which were presented to the user.
  • Even though a lot of data fields were present, the user would just have to deal with one or two of the data fields in almost 90% of the use cases.

Could only select preferred links based on Link type

  • One of the main jobs of an SLA Profile was to specify the quality thresholds to maintain during traffic flow, and to define the preference of which link to take while maintaining those thresholds.
  • But, on discussing the current implementation with engineers, I found out that the preference for links could be specified only by link type.
  • This was problematic, because there were several implementations where all the links from a site were of the same type, which made the preference meaningless.
  • Also, the most preferred link type could differ between sites within the same tenant. This would require duplicating a policy rule for separate link types, even though the rules served the same purpose.
Before: SLA Profile definition had an overload of data points as it solved multiple purposes which were not clearly represented to the user.

III: Challenges at "SD-WAN Rule" level:

Create Framework From Scratch

To deploy an SD-WAN Policy, the user had to start from Traffic Profile > SLA Profile > SD-WAN Policy.

Figure out Thresholds and Apply Manually

The users had to figure out best SLA thresholds, create SLA profiles for them, and attach them to the policy.

Create the framework from scratch to get SD-WAN Policy Deployed

  • As mentioned above, users had to start at the Traffic Type level, configure SLA Profiles, and then configure SD-WAN Policy rules before they could get an SD-WAN Policy deployed.
  • This resulted in users having to do a lot of heavy lifting just to get even the basics deployed.

Figure out what are the best SLA Thresholds and apply them to appropriate Sites and Applications

  • Once the user had manually created the SLA Profiles, they would then have to select sites, select applications, and select the associated SLA Profile for that particular application or set of applications.
  • The onus was on the user to create SLA Profiles, with appropriate SLA Parameters, for appropriate applications or application groups.
04: Ideate
Ideation and Participatory Design with System Architects
After the deep dives with SEs, TMEs and System Engineers, I had a good understanding of how things worked across the different layers of SD-WAN Policy and what challenges the user faced.

My primary tool to ideate was a giant whiteboard and markers, to outline shortcomings in the current model and how we could tackle current challenges and accommodate future capabilities. I would often have informal meetings with engineers and pull them over to clarify a doubt or validate an idea.

I also conducted structured participatory sessions to define and validate the architectural models I was proposing and to get input from System Architects and Engineers. This created a sense of ownership among them: later, in meetings with various Stakeholders, they would prove to be my biggest advocates and would echo my reasoning.

The Outcome:

During the creation of the architectural models with the team, the approach was to start by designing the perfect experience, without thinking about technology, resources, or even the current product. From that outcome, we adjusted the model according to the capabilities that our system was able to support.
05: Design
Design Iteration and Design Sharing Sessions
With the help of the various deep dives and collaborative sessions, I had a good understanding of what the problem was and what would architecturally be the best way to solve this problem. Now, I focused on the Interaction Design aspect of the problem, and how we could achieve what we wanted architecturally via the interface.

To help me establish a continuous conversation with stakeholders, I introduced weekly walkthroughs of the design concepts. Each weekly walkthrough produced refined information that updated the components and relations of the user experience.
As part of the design process, the design team met weekly to do voluntary internal design reviews. The purpose was to provide another set of eyes to the problems that as a designer I was trying to solve and to corroborate that we were tackling the right issues that would benefit our users the most before taking feedback from stakeholders.

The Outcome:

Helped Identify Overlap With Other Projects

Sharing designs internally with the design team helped identify overlap with other projects and how we needed to align to have a cohesive experience.

Built Ownership Across Stakeholders

This method built ownership with Stakeholders and earned their trust, with the team assuming responsibility for the choices made.
Design Enhancements

I: Enhancements at the "Traffic Type" level:

Multiple Sets of 8, Not One

Introduced CoS profiles, which were multiple sets of 8 traffic types. This provided flexibility in configurations.

Providing Control To Tenant

Tenant admins could override some of the settings that the SP admin had configured, to suit their needs.

Intent Approach to Assign CoS

Introduced an intent approach to deploy these CoS profiles across different Tenants and their sites.

1) Introducing CoS Profiles: multiple sets of 8 Traffic Types, instead of just one set of 8 Traffic Types

For Service Provider (SP) Admin:
Having multiple sets of Traffic Type configurations enabled the Service Provider Admin to create different sets of configurations for different Tenants. He could create one for each tenant, or could create multiple sets of configurations for one single tenant.

For Tenant Admin:
Having multiple sets of Traffic Type configurations enabled the Tenant Admin to apply these at various sites or site links, which gave them more flexibility in the Traffic Type configurations for the same type of traffic coming from different locations.

This single entity, which held a set of 8 Traffic Type configurations, became the most granular form of control for Traffic Types and was called a Class of Service (CoS) Profile.

2) Apart from SP admins, providing Tenant admins access and control over the configurations as CoS Profiles

Tenant admins could now view and control each set of Traffic Type configurations as a CoS Profile. This enabled them to override values set by the SP admin for those CoS Profiles, and also to control which sites, and which links within those sites, the CoS Profiles applied to.

Challenge:
However, giving Tenant admins access to CoS Profiles came with a challenge: certain parts of the configuration had to be controlled only by the SP admin, and only a limited part of the configuration should be editable by Tenant admins.

Resolution: To solve this problem, I designed the configuration panel so that the parts the tenant had to control sat on a single page, with the more detailed configurations on separate tabs for that profile. By urging engineers to implement Role-Based Access Control (RBAC), I ensured that the same profile looked different to SP admins and Tenant admins.
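
A minimal sketch of how one profile can render differently per role is shown below. The section names are the ones this case study uses for a CoS Profile; the role identifiers and the exact split of who may edit what are assumptions for illustration, not the shipped permission model.

```python
# Sections of a CoS Profile as named in this case study.
SECTIONS = ("Queue Configuration", "Probe Configuration", "Scheduler Configuration")

# Assumed permission split: the SP admin edits everything, while the
# Tenant admin edits only the scheduler (bandwidth) part.
EDITABLE_SECTIONS = {
    "sp_admin": set(SECTIONS),
    "tenant_admin": {"Scheduler Configuration"},
}


def view_for(role: str) -> dict[str, str]:
    """Return each section with the access mode the given role should see."""
    editable = EDITABLE_SECTIONS[role]
    return {
        section: ("edit" if section in editable else "read-only")
        for section in SECTIONS
    }


tenant_view = view_for("tenant_admin")
sp_view = view_for("sp_admin")
```

The interface can then use the returned mode to decide whether a section renders as editable fields or as a read-only summary.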

3) Implementing an intent-based approach to assigning CoS Profiles to sites and their links

Challenge: Assigning a set of configurations to different sites, and to different links within them, was hard to implement and maintain. While applying a configuration in bulk was still manageable, one of the biggest challenges was making edits across hundreds of sites, such as replacing one profile with another.

Resolution: To solve this problem, I leveraged an intent-based policy model in which the tenant admin just specifies the sites and then selects the links to which a particular CoS Profile applies. Giving them the option to choose sites by site group or department, and to select links by "Link Type", "Link Tag", or "Link number", ensured that all possible cases were taken care of.
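
The intent-based model can be sketched as a selector that is resolved against the inventory of links, rather than a per-site assignment. Everything below (class names, fields, site and tag names) is hypothetical; only the selection criteria (site group, Link Type, Link Tag, Link number) come from the text.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Link:
    site: str
    site_group: str
    number: int
    link_type: str
    tag: str


@dataclass(frozen=True)
class CosIntent:
    cos_profile: str
    site_group: str
    link_type: Optional[str] = None    # None means "any"
    link_tag: Optional[str] = None
    link_number: Optional[int] = None

    def matches(self, link: Link) -> bool:
        return (
            link.site_group == self.site_group
            and self.link_type in (None, link.link_type)
            and self.link_tag in (None, link.tag)
            and self.link_number in (None, link.number)
        )


def resolve(intent: CosIntent, links: list[Link]) -> list[Link]:
    """Expand an intent into the concrete links it applies to."""
    return [link for link in links if intent.matches(link)]


links = [
    Link("Site 1", "Retail", 1, "MPLS", "gold"),
    Link("Site 1", "Retail", 2, "Internet", "silver"),
    Link("Site 2", "Retail", 1, "Internet", "gold"),
    Link("Site 3", "Finance", 1, "MPLS", "gold"),
]
intent = CosIntent(cos_profile="CoS Profile A", site_group="Retail", link_tag="gold")
targets = resolve(intent, links)

# Replacing the profile everywhere is one edit to the intent, not one per site.
updated_intent = replace(intent, cos_profile="CoS Profile B")
```

Because the intent is resolved on demand, swapping a profile across hundreds of sites means changing a single field instead of touching each site.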

II: Enhancements at the "SLA Profile" level:

Splitting 1 Profile into 3 Different Logical Profiles

Introduced 3 different profiles to make the 3 different aspects clear.

Removing Fields and Leveraging Recommendations

Removed input fields and introduced recommendations based on broader choices for best practices.

Introducing Link Tags for Better Link Management and Control

Enabled users to tag links into a group and use that tag to select links.

1) Splitting one single SLA Profile into 3 logical profiles: SLA Based Profile, Path Based Profile, and Breakout Profile

Instead of using only one profile to configure settings which did 3 different things, we used 3 different profiles, namely SLA Based Profile, Path Based Profile, and Breakout Profile.

This brought a clear distinction between what the 3 profiles catered to, and further, made it easy to apply these profiles in the SD-WAN Policy definition.

As a result of this, the SLA profile lost excess fields and became easy to digest and configure.

3) Introducing Link Tags for better link preference

Having "Link Type" as the only way to set the link preference was limiting. The user could only select the preferred link by Link Type: MPLS or Internet. To overcome this issue, we introduced Link Tags as a link preference method.
This enabled the user to tag links based on their capacity and performance, while being agnostic of the type. This helped in creating policies that steer traffic over the best link available, regardless of link type.
During site creation, the tenant could mark links with specific Link Tags, and the links under a tag would function as one group, irrespective of type. This enabled administrators to select links based on their performance and capacity, instead of the type of the link.
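
The difference between preferring by type and preferring by tag shows up in a small example. The sites, link names, and tag names below are invented for illustration; only the two link types (MPLS, Internet) come from the text.

```python
links = [
    {"site": "Site A", "name": "Link 1", "type": "Internet", "tag": "premium"},
    {"site": "Site A", "name": "Link 2", "type": "Internet", "tag": "backup"},
    {"site": "Site B", "name": "Link 1", "type": "MPLS", "tag": "premium"},
    {"site": "Site B", "name": "Link 2", "type": "Internet", "tag": "backup"},
]


def preferred_links(links, *, key, value):
    """Return (site, link name) pairs whose attribute `key` equals `value`."""
    return [(link["site"], link["name"]) for link in links if link[key] == value]


# Preferring by type cannot tell Site A's two Internet links apart.
by_type = preferred_links(links, key="type", value="Internet")

# Preferring by tag picks the best link at each site, whatever its type.
by_tag = preferred_links(links, key="tag", value="premium")
```

Here the type-based preference matches both links at Site A, so it expresses no preference there, while the tag-based preference selects exactly one link per site: an Internet link at Site A and an MPLS link at Site B.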

III: Enhancements at the "SD-WAN Policy" level:

From Bottom-Up to Top-Down

Went from a bottom-up approach to a top-down approach, with capability to enable a rule with a single click.

Providing Recommendations

Created default profiles for different traffic types and used those as recommendations to the user.

Showing Policy Rule Details

Introduced a view which showed the complex relationship between the entities of an SD-WAN rule for clarity.

1) Moving from a Bottom-Up approach to a Top-Down approach

Rather than having the user work from the bottom up to create an SD-WAN Policy Rule, we decided to move to a top-down approach. The focus shifted from asking the user to create the infrastructure to enabling the user to tweak the infrastructure if required.

This was enabled by leveraging domain expertise and shipping with pre-defined SD-WAN Policy rules, based on industry and technical best practices.

As a result of this, the user just had to enable a policy rule to have a full-fledged SD-WAN Policy implemented. If needed, the user could go into further deeper levels like SLA Profiles and CoS Profiles to make changes.

2) Leveraging domain expertise and providing recommended SLA Profiles

In addition to providing the infrastructure by offering predefined SD-WAN Policy rules, we decided to further make it easier for the user to make edits or create new rules.

By creating system generated SLA Profiles, which contained industry and domain best practices as SLA Parameters, we could recommend specific SLA Profiles, which catered to specific Applications or Application Groups.

This further allowed the user to leverage the system platform to get the best possible configuration.
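
The top-down idea can be sketched as a catalog of predefined rules that ship disabled, each already wired to a recommended SLA Profile, so that the user's only required action is to enable one. The application groups, profile names, and function names here are hypothetical placeholders.

```python
from dataclasses import dataclass, replace

# Hypothetical system-generated SLA Profiles, keyed by application group.
RECOMMENDED_SLA_PROFILE = {
    "Voice": "Recommended-Voice",
    "Video": "Recommended-Video",
    "Collaboration": "Recommended-Collaboration",
}


@dataclass(frozen=True)
class PolicyRule:
    application_group: str
    sla_profile: str
    enabled: bool = False


def predefined_rules() -> list[PolicyRule]:
    """Rules that ship with the product, each with a recommended SLA Profile."""
    return [
        PolicyRule(group, profile)
        for group, profile in RECOMMENDED_SLA_PROFILE.items()
    ]


def enable(rule: PolicyRule) -> PolicyRule:
    """The 'single click': turn on a rule without configuring anything below it."""
    return replace(rule, enabled=True)


rules = predefined_rules()
voice_rule = enable(rules[0])
```

A user who needs something different can still edit the rule's SLA Profile, but nothing has to be built from scratch first.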

Viewing a policy rule:

Creating a policy rule:

3) Showing Policy Details for better understanding

An SD-WAN Policy rule consists of several parameters, like the Source sites, targeted Application Groups, associated SLA Profiles, the parameters within those profiles, the Traffic Type and the associated CoS details with it.

While this information spanned across various levels of hierarchy, it was important for network administrators to view them, or double check that information before deploying policy rules.

We introduced a detailed side panel view that held all the information associated with an SD-WAN Policy rule that the network administrator would want to view, without having to jump between sections to find different pieces of information and then tie them together mentally.
06: Test
User Testing with Select Customers
Once I had ideated and narrowed down to a particular idea with the help of feedback from engineers, architects, and designers, I felt we were in a position to get initial feedback from some select customers.

We conducted a usability test session to understand whether the proposed enhancements across "Traffic Type", "SLA Profile", and "SD-WAN Policy" resonated with their use cases and mental model.

The Outcome:

07: Feedback
Feedback From User Testing

I: Challenges at the "CoS Profile" level:

Version Control and Duplication

To provide different bandwidth settings the admin would have to duplicate the CoS profiles and change the settings on each. This would introduce duplication.

Unable to Give Multiple Sets of Bandwidths

Having the capability to choose different CoS settings was good, but Tenants also offer various bandwidths on each CoS setting.

2) Version Control and Duplication

In order to provide multiple bandwidths for different traffic types, the Tenant Admin would have to:
  • Be able to have multiple "Scheduler Configurations", to provide multiple bandwidths.
  • But remain under the same "Queue Configurations" and "Probe Configurations", as those were set and controlled by the SP Admin.
With the newly proposed implementation, the Tenant Admin would have to create a copy of the CoS Profile to retain the "Queue" and "Probe" configurations, and then change the "Scheduler Configuration" on the copy to get a new configuration (resulting in a new bandwidth) that could be applied to a different site.

Apart from that, it became really complicated to track which CoS Profile copy came from which original CoS Profile, and how the copy would be updated when the SP Admin made changes to the Queue and Probe Configurations.
08: Improvements
Post-User Testing Enhancements
After discussing probable solutions with engineers and network architects, we converged on the following solution to the above problem:

I: Enhancements at the "CoS Profile" level:

Introducing Scheduler Profiles

Removed "Scheduler" settings from CoS Profiles and introduced them as a separate profile that can be attached during SD-WAN policy creation. Users could now mix and match CoS Profiles and Scheduler Profiles.

Taking "Scheduler Configuration" out of the CoS Profile and keeping it as a separate profile called "Scheduler Profile"

To provide more flexibility, we decided to move the scheduler configuration out into a separate profile, as it provided the following advantages:
  • Less complication around version control and duplication. The CoS Profiles could now live as their own entity; the Tenant admin could not edit them at all, only apply them across their network.
  • The freedom of multiple Scheduler Profiles enabled the Tenant admin to create multiple Scheduler Configurations to provide multiple bandwidths across their sites, even on the same CoS Profile settings.
  • More flexibility, fewer entities. CoS Profiles and Scheduler Profiles could now be mixed and matched in a CoS Policy, thus requiring just "N + M" profiles, instead of the "N x M" needed when creating copies of the old CoS Profiles.
The new CoS Profiles included just the "Queue Configuration" and "Probe Configuration".
With the help of CoS Policy, different combinations of "CoS Profile" and "Scheduler Profile" could be created and applied not only across different sites, but also across different links within a single site.

This gave the Tenant admin tremendous control and a wide variety of configuration options, without having to create and manage multiple CoS profiles, and their duplicates.
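
The "N + M" versus "N x M" saving is easy to check with a few lines of Python. The profile names and the counts below are made up; the arithmetic is the point.

```python
from itertools import product

cos_profiles = ["CoS-A", "CoS-B", "CoS-C"]       # N = 3 (hypothetical names)
scheduler_profiles = ["Sched-1", "Sched-2"]      # M = 2 (hypothetical names)

# Every pairing a CoS Policy could reference.
combinations = list(product(cos_profiles, scheduler_profiles))

# With separate profiles, only the building blocks are maintained: N + M.
profiles_when_split = len(cos_profiles) + len(scheduler_profiles)

# With copies of the old CoS Profile, each pairing is its own profile: N x M.
profiles_when_copied = len(combinations)


def profiles_needed(n: int, m: int) -> tuple[int, int]:
    """Return (split, copied) profile counts for n CoS and m scheduler settings."""
    return n + m, n * m
```

With 3 CoS Profiles and 2 Scheduler Profiles the difference is small (5 versus 6), but it grows quickly: 10 CoS Profiles and 5 Scheduler Profiles mean 15 profiles to maintain instead of 50.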