Networking from the City by the Bay
By Bill Kautz
Director of Analyst Relations & Market Intelligence
Highlights from NANOG 78 in San Francisco
I recently attended NANOG 78, held by the North American Network Operators' Group in San Francisco, California. It was great to be back in the City by the Bay and to see how little has changed with the cable cars, Fisherman’s Wharf, and of course the iconic Golden Gate Bridge. This stood in stark contrast to the content of the conference itself, where much of the focus was on change – open and software-defined networking, disaggregation, analytics, and automation.
There were many interesting presentations, too many to discuss in detail, but I will highlight a few examples below.
An implementation of open networking software and disaggregation was covered in a presentation titled “SONiC: Software for Open Networking in the Cloud” by Rita Hui from Microsoft. The goals of this open-source network operating system are to enable fast technology evolution, reduce operational costs, provide a modular and composable software base, and reduce vendor lock-in through switch abstraction. SONiC is deployed in the Microsoft Azure cloud network. As I have commented before, I believe we will see more disaggregated and open switching and routing applications, including in service provider networks. A relevant example is Infinera’s deployment of our open, hardware-agnostic CNOS routing software and DRX disaggregated router family with Telefonica Deutschland.
In terms of analytics and network performance, there was a fascinating presentation called “Comparing the Network Performance of AWS, Azure, GCP, IBM Cloud and Alibaba Cloud” from Angelique Medina, ThousandEyes, that detailed the performance and connectivity architecture variations between public cloud providers. Data on the end-to-end network metrics of latency, loss, and jitter was collected at 10-minute testing intervals over a 30-day period in September 2019. It turned out that loss and jitter were negligible, and that latency differences were the most prominent finding. Findings from the presentation included:
- A variation of cloud routing preferences – backbone vs. internet-centric vs. hybrid
- With the exception of Alibaba Cloud, inter-region traffic stays on the cloud provider’s own network
- Inter-AZ latency is less than 2 ms
- AWS Global Accelerator performance varies but optimization continues
- GCP Europe-to-India backbone route is still in pre-rollout
- All cloud providers pay a performance toll for traffic crossing China’s Great Firewall
In addition, the presentation covered performance metrics gathered from six broadband ISPs (AT&T, Verizon, Comcast, CenturyLink, Cox, and Charter) in six cities across North America. The findings indicate that U.S. broadband-to-cloud connectivity is strong, but that routing anomalies can affect performance. There was a lot more detailed data presented for both broadband connectivity and cloud provider performance, so I would recommend taking a closer look at the presentation, which is posted on the NANOG website.
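ThousandEyes did not walk through its exact computations in the talk, but as a rough illustration of how per-window metrics like these are typically derived from probe samples, here is a minimal sketch. The function name and the jitter definition (mean absolute difference between consecutive round-trip times, in the spirit of RFC 3550 inter-arrival jitter) are my own illustrative choices, not the presenter's methodology.

```python
import statistics

def summarize_window(rtts_ms):
    """Summarize round-trip probe samples from one 10-minute test window.

    rtts_ms: list of RTT samples in milliseconds; None marks a lost probe.
    Returns (mean latency ms, loss %, jitter ms), where jitter is the mean
    absolute difference between consecutive received RTTs.
    """
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency = statistics.mean(received)
    jitter = (statistics.mean(abs(a - b) for a, b in zip(received, received[1:]))
              if len(received) > 1 else 0.0)
    return latency, loss_pct, jitter
```

Aggregating these windows over 30 days, per cloud region pair, is what lets a study like this separate a stable latency gap (architecture) from transient loss or jitter (congestion).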
While the performance data collected in the above exercise did not report any significant network downtime, we have all become so dependent on cloud networks that best-effort networking is no longer sufficient. Amin Vahdat from Google discussed how Google views reliability in his presentation, “Failing Last and Failing Least: Design Principles for Highly Available Networks.” He pointed out that while the internet was built on soft-state, best-effort, decentralized routing protocols, these must be augmented with large-scale distributed systems techniques to build an even better network. Part of this is leveraging centralized state and software-defined networking where it makes sense.
Amin underscored that Google can build highly reliable networks because they have seen a lot of failures and learned from them. There was a discussion of a specific event, GCP Incident #19009, and the design principles that Google drew from the experience: make no global-scope changes, maintain a single source of truth, have a conservative hitless upgrade process with a low-dependency safe mode, and require high-resolution visibility into both control-plane and data-plane health. I found this presentation to be very open and transparent about the evolution of thinking on, and implementation of, reliability at Google. We all need to focus relentlessly on network reliability, which is at the core of all of Infinera’s solutions.
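To make the “no global-scope changes” principle concrete, here is a minimal sketch of how an orchestration tool might push a change zone by zone, halting at the first unhealthy canary so the blast radius stays bounded. All names here are invented for illustration; this is not how Google's tooling actually works.

```python
def progressive_rollout(zones, apply_change, health_check):
    """Apply a change one zone at a time instead of globally.

    zones: ordered list of zone names to update.
    apply_change(zone): pushes the change to one zone.
    health_check(zone): returns True if the zone looks healthy afterward.
    Returns (zones successfully updated, zone where rollout halted or None).
    """
    done = []
    for zone in zones:
        apply_change(zone)
        if not health_check(zone):
            # Stop immediately: only this zone is affected, not the fleet.
            return done, zone
        done.append(zone)
    return done, None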
The one presentation that I thought captured the heart of the transition underway in networking was “Networking 3.0” by Bikash Koley from Google, who discussed how networking has changed through three phases. Networking 1.0 was connection-led and human-operated, with primarily closed, chassis-based equipment. Networking 2.0 was data-led, with “scale-out” fabrics, the introduction of software-defined networking, and a focus on automation. Now we have Networking 3.0, which is application-led, built on public and hybrid cloud networking with disaggregated and open solutions, where machine learning evolves toward artificial intelligence and self-driving automation.
The presentation also brought up some interesting things to think about in terms of the future. While many of us are not ready for machines and artificial intelligence to take over operating our networks, it appears to be a logical progression based on the path that we are on. Case in point: Infinera is also leveraging machine learning capabilities in our Transcend intelligent automation solutions, as we demonstrated in the Proof of Concept Showcase at MEF19 with CenturyLink and Telia Company.
If anyone is worried about ending up in the dystopian world of Skynet (from the Terminator movies), consider that intent-based networking (part of Networking 3.0) still requires humans to define what we want or expect the network to do, and then the machines use machine learning and artificial intelligence to determine how to do it. In this way we can stay in control of things, as long as we know what we want…
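That what-versus-how split can be illustrated with a toy “intent compiler”: the human declares a connectivity goal, and the software works out a path that satisfies it. The topology, city names, and function are all invented for this sketch and bear no relation to any real intent-based networking product.

```python
# Toy topology: (a, b) -> one-way link latency in ms. Invented numbers.
LINKS = {
    ("sfo", "chi"): 45,
    ("chi", "nyc"): 20,
    ("sfo", "nyc"): 70,
}

def path_latency(path):
    """Sum link latencies along a path, treating links as bidirectional."""
    return sum(LINKS.get((a, b), LINKS.get((b, a), float("inf")))
               for a, b in zip(path, path[1:]))

def compile_intent(intent, nodes):
    """Turn a declarative intent (what) into a concrete path (how).

    intent: {"connect": (src, dst), "max_latency_ms": bound}
    Considers the direct path and all single-transit paths, picks the
    lowest-latency one, and returns it only if it satisfies the bound.
    """
    src, dst = intent["connect"]
    middles = [n for n in nodes if n not in (src, dst)]
    candidates = [(src, dst)] + [(src, m, dst) for m in middles]
    best = min(candidates, key=path_latency)
    return list(best) if path_latency(best) <= intent["max_latency_ms"] else None
```

The operator never names a router or a link; if the intent is infeasible, the compiler says so rather than guessing, which is exactly the "we stay in control" point above.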
I had great conversations with many network operators at the Beer and Gear session, and noted that my conversations were about equally balanced between Infinera’s Groove Network Disaggregation Platform, 800G ICE6 innovation, and DRX/CNOS disaggregated routing solution.
I will end with the thought that while there has been a great deal of innovation in networking, I suspect we are still near the beginning of advancements in both networking hardware and software. One thing I am sure of is that those advancements will involve many people like the ones I meet at NANOG. I look forward to continuing our networking conversations at NANOG 79 in Boston, June 1-3, 2020.