Five Things You Should Know About Machine Learning - www.infinera.com

Five Things You Should Know About Machine Learning

September 14, 2020
By Teresa Monteiro
Director of Solutions, Software and Automation

Understanding the value of machine learning in telecommunication networks

The maturity of machine learning (ML) solutions has been stealthily growing in recent years. An interesting survey from Algorithmia on the current state of enterprise ML shows that while its operationalization across industry sectors is still in its initial stages, budgets for ML programs are growing and companies expect this technology to reduce costs, generate customer insights, and improve customer experience.

Looking around, we can see ML silently permeating many domains in our lives – it helps our GPS navigation software find the best way around town, it helps detect and mitigate cyber-attacks and online credit card fraud, it suggests the next great streaming series to watch, and it is driving progress in speech recognition and natural language processing, as well as in real-time video content analysis such as facial recognition and motion/object/event detection.

Where does the strength of machine learning lie?

Traditional computing approaches require that a computer be programmed by a human with an algorithm (a set of rules or a sort of recipe) in order to solve a problem. However, many problems are too complex to be well described by codable algorithms: traffic patterns in navigation systems, intrusion patterns in credit card fraud, movie preferences, etc. This is where ML stands out.

ML is the ability of a computer to learn to solve a problem without having been programmed for it specifically. During the learning process, ML engines derive models from sample data, the “training data.” Once the learning phase is over, those models are applied to new data, enabling ML to make predictions or decisions. With the right ML algorithm and training methodology, ML is successful at providing solutions for a variety of problem types: classification, regression (prediction), pattern finding, and ranking.
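
To make the split between learning and applying concrete, here is a minimal sketch in Python of a nearest-centroid classifier, one of the simplest classification techniques. The feature names, the sample values, and the `train` and `predict` helpers are illustrative assumptions, not taken from a real network or product.

```python
from statistics import mean

def train(samples):
    """Learn one centroid per label from (features, label) training pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    # The "model" is simply the per-label mean of every feature column.
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))

# Hypothetical training data: (link utilization, error rate) -> link state.
training_data = [
    ((0.20, 0.01), "healthy"),
    ((0.30, 0.02), "healthy"),
    ((0.90, 0.30), "degraded"),
    ((0.80, 0.40), "degraded"),
]

model = train(training_data)          # learning phase
print(predict(model, (0.25, 0.03)))   # model applied to new, unseen data
print(predict(model, (0.85, 0.35)))
```

Nothing in `predict` encodes a hand-written rule about what "degraded" means; the boundary between the two states comes entirely from the training samples.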

The planning, implementation, and operation of telecommunications transport networks are areas where these types of problems arise and are complex to solve. Rule-based systems are showing their limitations. What are the top things to know about ML as it applies to transport networks?

Five Things You Should Know:

  1. It’s all about the data!

ML as an academic subject has been studied for decades. But an effective ML system requires massive amounts of example data, repetitions of similar events from which to learn. Until a few years ago, it was hard to find such large volumes of data on any given subject.

A few recent technological shifts and advances have contributed to the ubiquity of data, the most relevant for this particular discussion being the availability of sensor data from all kinds of hardware, including networking devices.

Today, routers, switches, ROADMs, optical amplifiers, and transponders continuously monitor rich sets of real-time data and push it via streaming telemetry, using protocols such as gRPC, toward collectors (data lakes or data repositories) that subscribe to each data stream. As an example, Infinera’s latest optical engine, ICE6, monitors hundreds of parameters that can be leveraged for Infinera’s own use (engine optimization, flight data recording) or exposed to operators via user interfaces.

The amount, scope, and quality of data available from a modern network, and the ability to stream it and store it, are enablers for the use of ML techniques. For more on data quality and quantity, read How Not to Fail at Network Automation.

  2. …and about computing power

Hand in hand with large data sets comes large computational scale.

The learning process requires enough processing power to take advantage of big data in a reasonable timeframe, and for a long time this was a bottleneck for efficient ML applications. Processor evolution (including specialized hardware optimized for “intelligent tasks” such as learning), processor virtualization techniques, and cloud computing approaches have made computing power cheaper and more accessible than ever.

  3. ML has a lot to offer to telecommunication networks

Over the last few years, telecommunication solution vendors have been working together with operators on the application of ML to networking problems.

The following use cases highlight how ML can contribute to the network’s future mode of operation:

  • Network failure prediction and preventive maintenance is perhaps the most talked-about application of ML to networking. It involves forecasting network problems based on patterns learned from historical network data sets and addressing them before they occur with closed-loop mechanisms. An interesting illustration can be found in this proof of concept presented at MEF19. This approach will provide additional network availability, help operators meet their SLAs, and improve customer experience.
  • Root-cause analysis and troubleshooting is another typical ML application. It consists of using an ML algorithm trained with a large range of network failure snapshots to classify events according to the underlying root cause, speeding up their resolution and reducing downtime.
  • Network augmentation recommendation is a use case in which an ML algorithm, trained with large sets of network data to discover traffic and occupancy trends, predicts traffic growth and load evolution and suggests capacity upgrades ahead of time to better match the timing of expenditure to service revenue.
  • Routing problems are also good candidates for ML techniques. An ML engine can provide forecasts on traffic patterns, growth, and load evolution to a path computation engine, enabling it to dynamically match bandwidth to network load in a closed-loop automation process.
    This kind of automation will result in cost reductions and investment maximization for operators.
  • Accurate and fast optical performance evaluation for arbitrary paths in open optical networks is another topic where ML can play a role. This type of assessment enables optical constraint-based routing and offers operators the means to maximize transmission capacity while ensuring that the desired optical performance is met – another way to reduce costs and maximize investments.
  • Energy-efficient network elements use an ML engine to monitor unused resources and infer, based on historical data, when to shut them down.
  • Non-linear compensation (NLC) in a future optical engine using a machine learning algorithm is an innovative approach that Infinera has presented, demonstrating the effectiveness of this technique. NLC is notoriously demanding on computing power in the coherent DSP. Using ML techniques for this type of functionality will help achieve superior long-haul/subsea optical engine performance, ultimately bringing transport costs down.
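
As a rough illustration of the network augmentation use case above, the Python sketch below fits a straight-line trend to past load and estimates when a link would cross a utilization threshold. The traffic figures, the 80% threshold, and the function names are hypothetical, and a production system would likely use a richer model than a least-squares line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def months_until_threshold(history_gbps, capacity_gbps, threshold=0.8):
    """Months after the last sample until the trend reaches threshold * capacity.

    Returns 0.0 if the trend is already above the threshold and None if
    traffic is flat or shrinking (no upgrade is suggested by the trend).
    """
    months = list(range(len(history_gbps)))
    slope, intercept = fit_line(months, history_gbps)
    if slope <= 0:
        return None
    crossing_month = (threshold * capacity_gbps - intercept) / slope
    return max(0.0, crossing_month - months[-1])

# Hypothetical monthly peak load on one link, in Gb/s.
history = [400, 420, 440, 460, 480, 500]
print(months_until_threshold(history, capacity_gbps=800))
```

The returned lead time is what would let an operator schedule an upgrade shortly before it is needed rather than long in advance.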

There is no denying the enormous potential impact of these approaches in networking. The maturity of these ML solutions is also reaching a point where implementation in real-life networks is feasible. What else is needed?

  4. ML will not “magically” solve all problems

It is important to set the correct expectations for ML. Most ML solutions are not “off-the-shelf” products that can be installed and ready to run without some human guidance: each project is unique and ML results are also subject to human error.

  • A careful analysis and definition of the problem at hand is the first step to success.
  • ML requires data preparation.
    • The quality of the collected data needs to be assessed and the data needs to be cleansed; statistical noise may need to be corrected, and information that could skew the model outcome and result in bias must be removed.
    • Excluding, when possible, input variables that are sure to be irrelevant to the task simplifies the problem and improves results – this potentially involves not only know-how in data science and networking, but also business understanding.
    • The data then needs to be converted into a form that is suitable for modeling, and to be split into training and evaluation samples, making sure both samples come from the same distribution.
  • Next, models need to be evaluated and the best one selected; this involves identifying performance metrics for evaluating a model and defining the testing methodology and test cases.
  • The steps before the productization of the ML solution, such as integration into a software environment and into the business process, also need to be considered.
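
The data preparation and evaluation steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the records, the single-feature threshold model, and all function names are hypothetical, and real cleansing and model selection involve far more than is shown here.

```python
import random

def clean(records):
    """Drop records with missing feature values (a stand-in for real cleansing)."""
    return [r for r in records if None not in r[0]]

def split(records, eval_fraction=0.25, seed=0):
    """Shuffle before splitting so both samples come from the same distribution."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - eval_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_threshold(train_set, positive="fail"):
    """Learn a one-feature rule: the midpoint between the two class means."""
    pos = [f[0] for f, label in train_set if label == positive]
    neg = [f[0] for f, label in train_set if label != positive]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda features: positive if features[0] > threshold else "ok"

def precision_recall(model, eval_set, positive="fail"):
    """Score a model (any callable features -> label) on held-out data."""
    tp = fp = fn = 0
    for features, label in eval_set:
        predicted = model(features)
        if predicted == positive and label == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif label == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical records: (error rate, temperature) -> outcome.
records = [((i / 10, 40 + i), "fail" if i > 5 else "ok") for i in range(10)]
records.append(((None, 45), "ok"))  # incomplete sample, removed by clean()

train_set, eval_set = split(clean(records))
model = train_threshold(train_set)
print(precision_recall(model, eval_set))
```

Precision and recall are only one possible choice of performance metric; which metric matters depends on whether a missed failure or a false alarm is more costly for the operator.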

But the complex process of bringing ML into operation in a network is no reason to hinder its adoption – one way to ease into it is to use the expertise and experience provided by the solution vendor’s consulting services.

A different aspect to be considered when adopting ML is that these systems are not deterministic but probabilistic. Additionally, ML decisions cannot always be described in terms of simple physical parameters, and the network operator may not always be able to understand the why behind them. This is a paradigm change with respect to the traditional mode of operating networks, and it may require vendors and operators to work jointly in building trust in ML solutions, starting from low-risk applications and evolving toward fully automated detection and resolution of network issues.
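
One way to picture this gradual build-up of trust is to gate automation on the model's confidence. The Python sketch below assumes a model that outputs a failure probability; the thresholds and action names are hypothetical and would be tuned jointly by vendor and operator.

```python
def decide(failure_probability, act_threshold=0.95, alert_threshold=0.60):
    """Map a probabilistic prediction to an action with increasing autonomy."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if failure_probability >= act_threshold:
        return "auto-remediate"    # high confidence: closed-loop action
    if failure_probability >= alert_threshold:
        return "alert-operator"    # medium confidence: a human decides
    return "log-only"              # low confidence: record for later analysis

for probability in (0.30, 0.75, 0.99):
    print(probability, decide(probability))
```

Lowering `act_threshold` over time, as operators gain confidence in the predictions, is one way to move from low-risk applications toward fuller automation.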

  5. Machine learning is already benefiting network operators today

The growing use of ML applications in a variety of sectors is contributing to increased bandwidth demands on the network. Many typical ML deployments integrate a set of distributed sensors and basic processing at the edge, but, due to local computing constraints, rely on massive central data storage and processing, putting new loads on network connectivity.

As long as the right expectations are set and best practices are followed, the future is promising for ML applications in the transport network domain.

After all, there is a certain serendipity to the fact that the same techniques at the source of new bandwidth demands will soon be used by operators to simplify and improve the delivery of that same network capacity.