The A’s and E’s of modern network test and assurance


What do modern network test and assurance tools and strategies need to look like, to satisfy operator needs and provide a positive user experience?

Complexity is one of the fundamental issues that the wireless industry—and the test segment in particular—has to deal with constantly. Making network services work can be challenging. What does modern network test and assurance need to look like, in order to bring simplicity to what is fundamentally complicated?

Here are the essentials, summarized into what we’re calling the A’s and E’s of modern network test.

Adaptable. The push toward 5G Standalone is driving rapid change in how networks are architected, moving from physical to cloud infrastructure and expanding at the edge. This means that network test strategies have to meet both current and future testing needs.

“Carriers are going from consumers of network infrastructure, to providers of the very fabric that these network services run on,” said Ross Cassan, senior director of assurance strategy for Spirent Communications. “That comes with a lot of demand for testing and being able to support not only the infrastructure, but then the proliferation of network functions and protocols that ride on top of that.” Meanwhile, legacy technologies and services still need to be supported.

UScellular has launched a 5G SA core and recently began offering network slicing. Greg Agami, the carrier’s director of network solutions, said that as the carrier has moved to cloud-native architectures, it has had to redesign its testing processes.

The deployment of a 5G core, cloud-native development and smaller microservices mean that the company can iterate faster—which has changed the way UScellular tests, shifting to a continuous integration, deployment and testing (CI/CD/CT) process that has testing fundamentally built into development. Validation has to cover a wide range of scenarios, Agami noted: 5G SA itself; coexistence with NSA and LTE on both the network and device sides; multiple spectrum bands; mobile versus fixed wireless traffic; and, for IoT, performance across device classes.
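To make that concrete, here is a minimal sketch of what a CI/CT regression matrix like the one Agami describes might look like, using pytest parametrization to sweep core mode, spectrum band and traffic type on every build. The scenario names, thresholds and the run_session stub are illustrative, not UScellular’s actual harness.

```python
"""Sketch of a CI/CT regression matrix; all names and thresholds are illustrative."""
import itertools
from dataclasses import dataclass

import pytest

CORE_MODES = ["5g-sa", "5g-nsa", "lte"]        # SA plus coexistence coverage
BANDS = ["low", "mid", "mmwave"]               # spectrum coverage
TRAFFIC = ["mobile", "fixed-wireless", "iot"]  # service coverage

# Minimum acceptable downlink throughput per traffic type, in Mbps (invented).
MIN_THROUGHPUT = {"mobile": 50.0, "fixed-wireless": 100.0, "iot": 0.1}

@dataclass
class SessionResult:
    attach_ok: bool
    throughput_mbps: float

def run_session(core_mode: str, band: str, traffic: str) -> SessionResult:
    """Placeholder for a real lab-automation call that drives a test agent."""
    return SessionResult(attach_ok=True, throughput_mbps=150.0)

@pytest.mark.parametrize(
    "core_mode,band,traffic", itertools.product(CORE_MODES, BANDS, TRAFFIC)
)
def test_attach_and_throughput(core_mode, band, traffic):
    # Every build runs the full matrix before it can be promoted.
    result = run_session(core_mode, band, traffic)
    assert result.attach_ok, f"attach failed: {core_mode}/{band}/{traffic}"
    assert result.throughput_mbps >= MIN_THROUGHPUT[traffic]
```

The appeal of the matrix style is that adding a new dimension, say a new band, automatically multiplies coverage rather than requiring test cases to be written by hand.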

Authentic. Sameh Yamany, CTO of Viavi Solutions, points out that optimizing the network doesn’t start from a single point. Really, it starts in the lab—where a lot of network operators don’t particularly want to spend a lot of time. They tend to jump straight to thinking about the upgrades they want and how to manage vendors, Yamany said. Why the avoidance of the lab? Because traditionally, it hasn’t effectively or accurately mimicked what people actually saw in the field. 

Yamany argues that lab-based digital twins are increasingly capable and can be more dynamic and reflective of real-world network conditions, especially if they have access to real-time network data. But digital twins shouldn’t only be used to reflect the current state of the network, Yamany said—they can also be used to create and explore new ones. What happens if you upgrade a regional or national train system with 5G? How does the network respond to a hurricane?

“It’s not just creating reality, it’s also trying to create extra reality,” he offered. “You’re taking it to the next step, because we really want resiliency.” Similarly, digital twins can play an important role in what-if scenarios, optimization and figuring out technically challenging new network services like NTN—but only if they sufficiently reflect real-world conditions. 

Active and automated. Network testing and service assurance have to keep up with the speed of change in cloud-native networks—that CI/CD/CT pipeline. This means being more proactive than the testing frameworks of the past.

“We’ve largely relied on passive analysis in the past for things like service assurance,” said Cassan. “If you just think about that term passive, it means that you’ve detected something after it happened, right? It’s already happened to you, it’s already happened to your customer. … What we’re seeing is a move towards active assurance.” 

Cassan outlined several recent case studies of active service assurance. In one, an operator was providing a mobile slice for enterprises, but every change required a huge amount of manual testing to ensure that other customers’ SLAs weren’t being impacted—so much testing, in fact, that the operator couldn’t validate changes across every one of the large, Fortune 500 customers using the service.

“They really didn’t have good end-to-end visibility and were often getting calls from their customers when things weren’t going well—and that was the first time they would hear about an issue, when their customer called in,” Cassan said. With active testing and automated fault isolation, he said, “we’ve been able to evolve that to the point where within a five-minute window, all 5,000 of those slices are being tested.”
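As a rough illustration of how a sweep like that can fit inside a five-minute window, the sketch below fans synthetic probes out across 5,000 slices with bounded concurrency and flags any SLA breaches. The probe_slice stub and latency threshold are hypothetical stand-ins, not Spirent’s implementation.

```python
"""Sketch: active assurance sweeping thousands of slices in a fixed window."""
import asyncio
import random
import time

SLA_MAX_LATENCY_MS = 20.0   # hypothetical per-slice latency SLA
CONCURRENCY = 200           # parallel synthetic test agents

async def probe_slice(slice_id: int) -> float:
    """Stand-in for one active test, e.g. a synthetic round-trip probe."""
    await asyncio.sleep(0.01)             # simulated probe duration
    return random.uniform(5.0, 25.0)      # simulated round-trip time, ms

async def sweep(n_slices: int) -> list[int]:
    """Probe every slice, bounded by CONCURRENCY, and return violators."""
    sem = asyncio.Semaphore(CONCURRENCY)

    async def guarded(slice_id: int):
        async with sem:
            latency = await probe_slice(slice_id)
            return slice_id if latency > SLA_MAX_LATENCY_MS else None

    results = await asyncio.gather(*(guarded(i) for i in range(n_slices)))
    return [s for s in results if s is not None]

start = time.monotonic()
violations = asyncio.run(sweep(5000))
print(f"swept 5,000 slices in {time.monotonic() - start:.1f}s; "
      f"{len(violations)} SLA violations flagged")
```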

In another instance, Spirent focused on turning routers at customer premises into test agents for a fixed-line service, using automation to run tests from a central location so that engineers were only dispatched to the field for very specific needs. The changes not only reduced time to detection, but also cut operating costs by 25%.

Active assurance helps to address the gap between lab scenarios and the behavior of the real network. “There’s always things that are going to be different in the live network. You just can’t create that scale. You can’t create the dynamism of everything your customer is going to do,” said Cassan. “So you need to be able to constantly have the eyes and ears ready—whether it’s closed-loop or open-loop systems—to be gathering that information and ideally, trying to get ahead of those issues before they impact customers.”

Part of that means distilling data and increasing automation. “We need to make the job easy for humans,” he said. “So we want to do things like automating the triage procedures, and making sure that we’re providing as much information as we can to the folks that are solving the problem and also getting the right folks in the room.”
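Here is a minimal sketch of what that kind of automated triage can look like: enrich an alarm with context and route it to the right team before a human ever sees the ticket. The alarm fields, routing table and lookup stubs are all hypothetical.

```python
"""Sketch of automated triage: enrich an alarm, then route it.
All fields, routes and stubs are hypothetical."""
from dataclasses import dataclass, field

@dataclass
class Alarm:
    source: str            # e.g. "active-probe" or "passive-monitor"
    service: str           # e.g. "enterprise-slice-42"
    symptom: str           # e.g. "latency-sla-breach"
    context: dict = field(default_factory=dict)

# Hypothetical routing table: symptom -> (owning team, runbook).
ROUTES = {
    "latency-sla-breach": ("transport-noc", "runbooks/latency.md"),
    "attach-failure": ("core-noc", "runbooks/attach.md"),
}

def lookup_recent_changes(service: str) -> list[str]:
    return []              # placeholder: query the change-management system

def correlate(alarm: Alarm) -> list[str]:
    return []              # placeholder: pull overlapping alarms from neighbors

def triage(alarm: Alarm) -> Alarm:
    """Attach everything a responder needs before the ticket is cut."""
    alarm.context["recent_changes"] = lookup_recent_changes(alarm.service)
    alarm.context["correlated_alarms"] = correlate(alarm)
    team, runbook = ROUTES.get(alarm.symptom, ("tier1-noc", "runbooks/default.md"))
    alarm.context["assigned_team"] = team
    alarm.context["runbook"] = runbook
    return alarm

ticket = triage(Alarm("active-probe", "enterprise-slice-42", "latency-sla-breach"))
print(ticket.context["assigned_team"], ticket.context["runbook"])
```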

Artificial intelligence-powered. AI, of course, is the technology with perhaps the biggest potential for change within the test and assurance space. While network testing, validation and assurance have used AI or machine learning for years, the capabilities of generative AI have yet to be applied at scale. 

Chris Hristov, AVP of network engineering and automation for AT&T, said that AT&T has used closed-loop, rules-based, AI/ML-decision-making to predict, model and improve network performance for years. For the newest genAI capabilities, he said, “I think there’s been some struggles. I think finding the right use cases, where it can bring value to the business, is probably where we spend a lot of time.”

“Some of the capabilities with LLMs, of gathering … information and providing actionable insights, is definitely a path forward and has made a lot of growth just in the last year or two,” said UScellular’s Agami. But, he continued, having a controlled lab environment with automation to run all sorts of tests, from modeling to stress testing, is still important. In the field, the focus is on drive testing and the use of network probes to build a picture of performance, plus anomaly detection to highlight areas that need more investigation: whether features or functions aren’t working, or whether optimization is needed.

Hristov also pointed out that over time, the business priorities and strategy for the network have changed—and algorithms need to change to reflect that. Five or so years ago, when 5G first launched, operators were focused primarily on network speed.

“It was very, very much performance driven,” Hristov said. “So a lot of the automation algorithms focused on that. I would say nowadays … performance is table stakes, pretty much. Now it’s about, how can I run a network at the lowest cost per gigabyte while delivering that excellent customer experience? So a lot of the automation … is changing a little bit. The use cases are changing more towards lower opex. I think that’s driving also a lot of the innovation,” Hristov said.

AI to watch the AI. As operators start to operationalize AI, they are also going to need tools to check up on how those AIs operate.

“If you really want to test what AI is doing, you have to use AI,” Yamany said—anything else is effectively bringing a knife to a gunfight. He sees another role for digital twins here: using AI to mimic both a real network and its built-in AI capabilities, in order to see how those internal AI agents react under normal conditions, impairments or large-scale disasters, and to look for upgrade opportunities and places where new AI functions could be implemented, trained on real and real-time data from the network.
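As a toy sketch of that idea, the code below drives a simplified twin through a set of impairments and checks whether the embedded agent’s responses keep a single health score above an SLA floor. The impairment impacts, recovery values and agent_policy are invented for illustration; a real twin would model far more than one number.

```python
"""Toy harness: exercise an embedded AI agent inside a simplified digital twin.
Every value below is invented for illustration."""
IMPAIRMENTS = ["fiber_cut", "cell_outage", "traffic_surge"]
IMPACT = {"fiber_cut": 0.6, "cell_outage": 0.3, "traffic_surge": 0.2}
RECOVERY = {"reroute": 0.45, "add_capacity": 0.25}

def agent_policy(impairment: str) -> str:
    """Stand-in for the network's built-in AI agent under test."""
    return {"fiber_cut": "reroute", "cell_outage": "reroute",
            "traffic_surge": "add_capacity"}[impairment]

def run_scenario(impairment: str, min_health: float = 0.9) -> bool:
    health = 1.0 - IMPACT[impairment]              # twin applies the impairment
    health += RECOVERY[agent_policy(impairment)]   # the agent responds
    return min(health, 1.0) >= min_health          # did the response hold the SLA?

# Sweep the scenario library; a failing case is exactly what the twin is for.
for imp in IMPAIRMENTS:
    print(f"{imp}: {'pass' if run_scenario(imp) else 'FAIL'}")
```

In this toy run, the agent’s reroute response is not enough to survive a fiber cut, which is precisely the kind of finding an operator would rather surface in the lab than in the field.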

However, Yamany also says that it would be naive to think that we currently understand how AI is going to be tested, because so many things are not standardized, and because the models themselves are so dynamic.

“But I will also say this: one of the fundamental things about AI is that we know that it’ll be biased. It depends on data,” he adds. “So if you don’t govern what the data is that you’re using to train your AI, by default, you’ll be biased.” That bias may never be completely eliminated, but it can be monitored with something like a bias meter, which compares bias and model drift over time.

“I think we have a good idea of how you can measure these kinds of biases,” Yamany reflects. “I always say, if you don’t measure, you can’t control it. If you don’t measure, you can’t optimize. So that’s how we are thinking: there is a new era for testing right now, and it is AI.”
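For a sense of what such a bias meter might compute, here is a sketch that measures drift between training-time data and live traffic using the population stability index (PSI), a common drift metric. The bins, the threshold rule of thumb and the simulated distributions are illustrative.

```python
"""Sketch of a 'bias meter': population stability index (PSI) between the data
a model was trained on and what it sees live."""
import math
import random

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0          # guard against a degenerate range

    def proportions(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # Floor each bin at a tiny proportion so the log term stays defined.
        return [max(c / len(xs), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
training = [random.gauss(50, 10) for _ in range(5000)]  # e.g. training-time RTTs
live = [random.gauss(58, 12) for _ in range(5000)]      # shifted live traffic
print(f"PSI = {psi(training, live):.3f}")  # rule of thumb: > 0.25 is major drift
```

Tracked over time, a score like this gives exactly the measure-to-control loop Yamany describes: the number itself matters less than whether it is rising.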

Emphasize end-user experience. So what should all of this testing and optimization ultimately produce? The metric that trumps them all: a good end-user experience. Which also happens to be wildly subjective and, in many cases, more difficult to pinpoint than one might think. No single data source or tool is perfect when it comes to understanding what the end-user experience is across the totality of device capabilities; why it’s good or degraded; and, if it’s degraded, what is causing the issue and how it can be fixed. Sometimes the available stats simply don’t match up.

“There’s so many times where—at an event, let’s just say—our metrics look great,” lamented Hristov. “We’re maintaining calls. Throughput looks great. Quality of experience looks great. And then I get some executive that complains—they say, oh my God, it was horrible. I was at that basketball game and it was horrible.”

So more, better and varied data is needed to understand the end-user experience. “The more metrics we have, the better we can do a job in terms of testing and optimizing our networks,” UScellular’s Agami said. Network probes are still necessary, he added, but UScellular also sees increasing value in crowd-sourced metrics, which are based on end-user experience.

In terms of data sources, Hristov said, “I think that you kind of have to take each one with its weaknesses and strengths.” Crowdsourced data is likely to lean toward Android over iOS, he pointed out—a big issue for a company like AT&T, which has an iPhone-heavy device base. Granularity from some tools may be limited to 15-minute cycles and aggregated data, as opposed to isolating individual user experiences.

And at some point, even the best data becomes overwhelming in terms of volume, obscuring any insights. Once more, AI is seen as a potential solution. “I do think that’s where AI can come in and help a lot in terms of anomaly detection and look at patterns, and understand why it’s most likely X or Y,” Hristov added.
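As a flavor of the simplest version of that idea, the sketch below flags KPI samples that fall far outside a rolling baseline using a z-score. The window size, threshold and injected dip are illustrative; a production system would layer much more sophistication on top.

```python
"""Sketch: flagging anomalous KPI samples with a rolling z-score."""
import math
from collections import deque

def anomalies(samples, window=30, threshold=3.0):
    """Yield (index, value) pairs that sit far outside the recent baseline."""
    recent = deque(maxlen=window)
    for i, x in enumerate(samples):
        if len(recent) == window:
            mean = sum(recent) / window
            var = sum((v - mean) ** 2 for v in recent) / window
            std = math.sqrt(var) or 1.0      # avoid dividing by a zero spread
            if abs(x - mean) / std > threshold:
                yield i, x
        recent.append(x)

# Steady throughput with one injected dip, as a stand-in for live KPI data.
series = [100.0] * 60
series[45] = 40.0
print(list(anomalies(series)))   # -> [(45, 40.0)]
```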

Interested in learning more about modern network test and assurance strategies and tools? Check out sessions from this year’s Test and Measurement Forum, available on-demand.
