Google goes down after major BGP mishap routes traffic through China
Google lost control of several million of its IP addresses for more than an hour on Monday in an event that intermittently made its search and other services unavailable to many users and also caused problems for Spotify and other Google cloud customers. While Google said it had no reason to believe the mishap was a malicious hijacking attempt, the leak appeared suspicious to many, in part because it misdirected traffic to China Telecom, the Chinese government-owned provider that was recently caught improperly routing traffic belonging to a raft of Western carriers through mainland China.
The leak started at 21:13 UTC when MainOne Cable Company, a small ISP in Lagos, Nigeria, suddenly updated tables in the Internet’s global routing system to improperly declare that its autonomous system 37282 was the proper path to reach 212 IP prefixes belonging to Google. Within minutes, China Telecom improperly accepted the route and announced it worldwide. The move by China Telecom, aka AS4809, in turn caused Russia-based Transtelecom, aka AS20485, and other large service providers to also follow the route.
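To make the domino effect concrete, here is a toy simulation, a sketch under heavy simplifying assumptions rather than a model of real BGP: each network keeps the first announcement it hears (real routers weigh many attributes), prepends its own AS number to the path, and passes it on. AS 37282, 4809 and 20485 come from the account above and 15169 is Google’s AS; the neighbor map and the placeholder networks `TIER1_X` and `ISP_Y` are hypothetical.

```python
from collections import deque

# Hypothetical adjacency: who hears announcements from whom.
# 37282 (MainOne), 4809 (China Telecom), 20485 (Transtelecom) are from the
# article; "TIER1_X" and "ISP_Y" are made-up stand-ins for other providers.
NEIGHBORS = {
    37282: [4809],             # the leaking session: MainOne -> China Telecom
    4809: [20485, "TIER1_X"],  # China Telecom re-announces to its neighbors
    20485: ["ISP_Y"],
    "TIER1_X": ["ISP_Y"],
    "ISP_Y": [],
}

def propagate(origin_path, first_hop, accepts):
    """Flood an announcement outward, prepending each AS to the path.

    origin_path: AS path as received by the first leaking AS (e.g. [15169]).
    accepts: function(asn, path) -> bool, a stand-in for import filtering.
    Returns the AS path each reached network ends up using.
    """
    learned = {first_hop: [first_hop] + origin_path}
    queue = deque([first_hop])
    while queue:
        asn = queue.popleft()
        path = learned[asn]
        for nbr in NEIGHBORS.get(asn, []):
            # Skip networks already reached, and paths containing the
            # neighbor's own AS (BGP's basic loop prevention).
            if nbr in learned or nbr in path:
                continue
            if not accepts(nbr, path):
                continue
            learned[nbr] = [nbr] + path
            queue.append(nbr)
    return learned

# No filtering anywhere: the leak reaches every network in the toy graph.
paths = propagate([15169], 37282, accepts=lambda asn, path: True)
```

Passing an `accepts` function that rejects the route at AS4809, such as `lambda asn, path: asn != 4809`, confines the leak to MainOne, which is the kind of import filtering a large carrier is expected to apply.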
The redirections, BGPmon said on Twitter, came in five distinct waves over a 74-minute period. The redirected IP ranges carried some of Google’s most sensitive communications, including the company’s corporate WAN infrastructure and the Google VPN. This graphic from regional Internet registry RIPE NCC shows how the domino effect played out over a two-hour span. The image below shows an abbreviated version of those events.
“Almost certainly an error”
BGPmon said MainOne made a second announcement on Monday that caused traffic sent to Cloudflare-owned IP addresses to follow an almost identical roundabout path. As was the case with the Google IP addresses, China Telecom improperly accepted the Cloudflare route and announced it to its peers. Transtelecom accepted the route and other large service providers soon followed, causing the route to propagate worldwide. This BGPlay graphic shows it playing out. Below is a snapshot:
The misdirection of the Cloudflare-owned IP addresses added to the suspicions of foul play, even as Cloudflare CEO Matthew Prince told Ars that Monday’s routing event “was almost certainly an error,” rather than a deliberate move intended to hijack potentially sensitive Internet traffic. In an email, Prince explained:
Some circumstantial background to back that up: there was a large network meeting in Nigeria a couple weeks ago (NgNOG). Those meetings always spur more peering—interconnecting networks that previously weren’t directly connected. While setting up a new interconnection, the Nigerian ISP almost certainly inadvertently leaked the routing information to China Telecom who then leaked it out to the rest of the world. If there was something nefarious afoot there would have been a lot more direct, and potentially less disruptive/detectable, ways to reroute traffic. This was a big, ugly screw up. Intentional route leaks we’ve seen to do things like steal cryptocurrency are typically far more targeted.
The impact on us was minimal. Cloudflare’s systems automatically noticed the leak and changed our routing to mitigate the effects.
Why Google and Cloudflare? Because we’re both present in Nigeria’s Internet Exchange and, I believe, peered with this ISP. Most other large providers aren’t yet in Nigeria. We only turned up our presence there a few weeks ago. I believe Google has been there for a while. Martin Levy, from our team and cc:ed here, was at NgNOG and can add more color and correct anything I’ve gotten wrong.
Long term, the solution is for us to make BGP more robust. We’ve started this with our efforts around RPKI:
https://blog.cloudflare.com/rpki/
If we, as a community, can drive more routes to be cryptographically signed and verified we can then begin to reject routes that are improperly announced. The merely trust-based BGP routing infrastructure remains one of the last remaining core bugs of the Internet and today we saw it rear its ugly head. High time we fix it.
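RPKI route origin validation, as specified in RFC 6811, compares a route’s prefix and origin AS against signed Route Origin Authorizations (ROAs). The sketch below shows only that classification step; it assumes the ROAs have already been fetched and cryptographically verified, and it uses documentation prefixes and AS numbers rather than real ones. Origin validation checks only the last AS in the path, so a leak that re-announces a prefix with its legitimate origin intact would still classify as “valid”; catching that requires path-level checks on top.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

@dataclass(frozen=True)
class ROA:
    """A Route Origin Authorization, assumed to be already verified."""
    prefix: IPNetwork
    max_length: int
    asn: int

def validate_origin(prefix_str: str, origin_asn: int, roas: list) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    prefix = ipaddress.ip_network(prefix_str)
    # ROAs whose prefix covers (equals or contains) the announced prefix.
    covering = [r for r in roas
                if r.prefix.version == prefix.version and prefix.subnet_of(r.prefix)]
    if not covering:
        return "not-found"  # no ROA speaks for this address space
    for roa in covering:
        # Valid only if a covering ROA authorizes this origin at this length.
        if roa.asn == origin_asn and prefix.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"        # covered, but no ROA authorizes this announcement

# Hypothetical ROA using documentation address space and a documentation ASN.
ROAS = [ROA(ipaddress.ip_network("192.0.2.0/24"), max_length=24, asn=64496)]
```

With that ROA, an announcement of 192.0.2.0/24 from AS64496 is valid, the same prefix from any other AS is invalid, and a more specific 192.0.2.0/25 is invalid because it exceeds `max_length`; routers configured to drop invalid routes would reject the last two.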
In a statement, Google representatives wrote: “We’re aware that a portion of internet traffic was affected by incorrect routing of IP addresses, and access to some Google services was impacted. The root cause of the issue was external to Google and there was no compromise of Google services.”
A Google representative told Ars that company officials also believe the routing event was an accidental prefix leak, not a malicious hijack, caused by MainOne mistakenly advertising a range of IP addresses it didn’t own. The representative also said all affected traffic was encrypted, a measure that limited the harm that could result from malicious hijackings.
Slamming into China’s Great Firewall
Unlike the previously reported 30-month event that routed Internet traffic on a roundabout path through China, traffic in Monday’s incident involving Google never arrived at its intended destination. Instead, as the following traceroute shows, the traffic terminated at an edge router inside China Telecom.
The dropped traffic further supported the narrative that the routing event was a mistake. BGP hijackings are more effective when they go undetected by end users instead of causing an obvious outage. Still, there was no doubt that even if the mishap was inadvertent, it amounted to a major disruption. A blog post published by network intelligence firm ThousandEyes observed:
This incident at a minimum caused a massive denial of service to G Suite and Google Search. However, this also put valuable Google traffic in the hands of ISPs in countries with a long history of Internet surveillance. Overall ThousandEyes detected over 180 prefixes affected by this route leak, which covers a vast scope of Google services. Our analysis indicates that the origin of this leak was the BGP peering relationship between MainOne, the Nigerian provider, and China Telecom. MainOne has a peering relationship with Google via IXPN in Lagos and has direct routes to Google, which leaked into China Telecom. While we don’t know if this was a misconfiguration or a malicious act, these leaked routes propagated from China Telecom, via TransTelecom to NTT and other transit ISPs. We also noticed that this leak was primarily propagated by business-grade transit providers and did not impact consumer ISP networks as much.
IXPN refers to the Internet Exchange Point of Nigeria, where regional ISPs with peering agreements meet to exchange traffic free of charge. The existence of peering agreements between Google and MainOne and between Cloudflare and MainOne further supports assurances that the routing mishap wasn’t intentional.
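ThousandEyes’ explanation rests on a common rule of thumb for BGP export policy: routes learned from a peer or a provider should be passed only to customers, never to other peers or providers. The sketch below flags ASes in a path that break that rule. The relationship table is an assumption: the peering between MainOne and Google comes from the article, the MainOne to China Telecom session is modeled as a peering because ThousandEyes describes it that way (the real business relationship may differ), and AS64500 is a hypothetical MainOne customer from the documentation range.

```python
# How the first AS sees the second: "customer", "peer" or "provider".
# Hypothetical table; see the assumptions stated above.
RELATIONSHIPS = {
    (37282, 15169): "peer",      # MainOne <-> Google at IXPN
    (15169, 37282): "peer",
    (37282, 4809): "peer",       # assumption: modeled as a peering session
    (4809, 37282): "peer",
    (37282, 64500): "customer",  # hypothetical MainOne customer
    (64500, 37282): "provider",
}

def find_leaks(as_path, rel):
    """Return ASes that passed a peer/provider route to another peer/provider.

    as_path is in BGP order: nearest AS first, origin AS last.
    Links missing from `rel` are not judged either way.
    """
    leakers = []
    for i in range(1, len(as_path) - 1):
        asn = as_path[i]
        learned_from = as_path[i + 1]  # neighbor toward the origin
        exported_to = as_path[i - 1]   # neighbor that received it from asn
        got = rel.get((asn, learned_from))
        gave = rel.get((asn, exported_to))
        if got in ("peer", "provider") and gave in ("peer", "provider"):
            leakers.append(asn)
    return leakers
```

Under these assumptions the path [4809, 37282, 15169] flags AS37282, while a customer path such as [64500, 37282, 15169] passes, because MainOne is allowed to hand a peer’s routes to its own customers.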
Readers are reminded that the border gateway protocol, which routes Internet traffic from autonomous system to autonomous system around the globe, is as fragile as it is intricate. Its foundation of trust was never designed to withstand the hostile actors that so often populate today’s Internet, and its intricacies alone are enough to make major blunders a fact of life. While it’s too early to declare this routing mishap an accident, indications at this point don’t support suspicions that it was a deliberate hijacking.
Either way, the event and its ability to go undetected until end users began to report dropped traffic underscore the continued inability of major providers to address the performance and security limitations of BGP.
“Through a small Nigerian ISP, Google’s prefixes were leaked to worldwide Tier 1 carriers, bringing traffic to a halt,” Kris Slevens, a technical account engineer specializing in network security at Continuous Networks, told Ars. “This still shows that China Telecom hasn’t reined in their infrastructure for any type of filtering, and how inherently fragile BGP is being based on trust. Also with a paper coming out last week about [China Telecom’s] past with traffic rerouting, this isn’t new. And more important, peering exchanges need stricter prefix limits or filtering. That is highly overlooked.”
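Slevens’ two remedies, prefix filtering and prefix limits, can be sketched as an import policy on a single exchange session. This is a toy illustration, not any router vendor’s configuration syntax: `PeeringSession`, the allowlist, and the teardown-on-limit behavior are hypothetical choices, and real implementations differ in details such as whether the limit counts prefixes before or after filtering.

```python
import ipaddress

class PeeringSession:
    """Toy import policy for one exchange peering session (hypothetical)."""

    def __init__(self, peer_asn, allowed_prefixes, max_prefixes):
        self.peer_asn = peer_asn
        # Address space this peer is expected to announce.
        self.allowed = [ipaddress.ip_network(p) for p in allowed_prefixes]
        self.max_prefixes = max_prefixes
        self.accepted = set()
        self.shut_down = False

    def _permitted(self, prefix):
        return any(prefix.version == a.version and prefix.subnet_of(a)
                   for a in self.allowed)

    def receive(self, prefix_str):
        """Return True if the announcement is accepted."""
        if self.shut_down:
            return False
        prefix = ipaddress.ip_network(prefix_str)
        if not self._permitted(prefix):
            return False           # prefix filter: not this peer's to announce
        self.accepted.add(prefix)
        if len(self.accepted) > self.max_prefixes:
            self.shut_down = True  # limit tripped: drop the session's routes
            self.accepted.clear()
            return False
        return True

# Hypothetical peer allowed to announce only 192.0.2.0/24, capped at 2 prefixes.
session = PeeringSession(64500, ["192.0.2.0/24"], max_prefixes=2)
```

Filtering stops a peer from announcing address space it has no business originating or transiting, while the prefix limit is a blunt backstop for the case where a misconfiguration suddenly floods the session with far more routes than expected.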