January 30, 2026 · 9 min read · Infrastructure

Building a Private Global Backbone — Lessons Learned

RouteKey backbone interconnect map — 3 countries

Two and a half years ago we started building a private backbone with the naive assumption that getting from 1 PoP to 4 was mostly a procurement and deployment problem. It's not. Here's what we actually learned.

Why Build Private at All?

The obvious question is why not just use the public internet with a CDN overlay, the way most companies do. The honest answer is that for latency-critical workloads, the public internet has a structural ceiling that you can't engineer around with caching or TCP optimization.

Public transit routing is optimized for cost, not latency. The paths your traffic takes across the public internet are determined by commercial peering agreements, most of which are opaque and frequently suboptimal. A packet from São Paulo to Singapore might traverse five different transit providers' networks, each making independent routing decisions that have nothing to do with your application's latency requirements.

When we measured actual latency on key city pairs using public internet paths versus our early private backbone, the gap was consistently 30–45%. That's not a marginal optimization — it's the difference between an architecture being feasible or not for real-time applications.
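The comparison itself is simple arithmetic once you have RTT samples for both paths. Here is a minimal sketch of that computation; the function name and the sample numbers are illustrative (chosen to land in the 30–45% range described above), not RouteKey's actual measurement data.

```python
# Sketch: given RTT samples (ms) for the same city pair over public
# transit and over a private backbone, compute the median latency
# reduction the private path delivers.
from statistics import median

def latency_gap(public_rtts_ms, private_rtts_ms):
    """Return the private path's median latency reduction as a fraction."""
    pub, priv = median(public_rtts_ms), median(private_rtts_ms)
    return (pub - priv) / pub

# e.g. a São Paulo → Singapore pair (illustrative samples)
public = [352, 348, 361, 355, 349]
private = [228, 231, 226, 230, 229]
print(f"{latency_gap(public, private):.0%} lower on the private path")
```

Using medians rather than means keeps a single congested sample from skewing the comparison.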

Colocation Is the Hard Part

Software and hardware problems are relatively easy. The genuinely difficult part of building global infrastructure is physical: finding the right colocation facilities, negotiating competitive pricing in markets where you have no leverage, and acquiring the cross-connects that actually matter.

In mature markets — US, UK, Germany, Singapore, Japan — the colocation ecosystem is deep and competitive. Getting a cage in Equinix SG1 is a procurement exercise. Getting fiber diversity to three different submarine cable landing stations in the same city is still a multi-month negotiation.

In emerging markets it's categorically harder. When we were building out West Africa, we found that the only facilities with meaningful submarine cable access in Lagos were also the most expensive per-rack in the world on a PPP-adjusted basis, because there were effectively no alternatives. Negotiating from that position is humbling.

What worked for us was treating colocation providers as long-term partners rather than commodity vendors. Several of our contracts include provisions for shared infrastructure improvements — we commit to minimum contract terms in exchange for the provider investing in additional cross-connects or power upgrades that benefit us both. It adds complexity but it's the only way to get reasonable economics in constrained markets.

routekey-cli — backbone health
$ routekey backbone health --summary
SEGMENT    TYPE       LATENCY   STATUS
SIN–TYO    private    9ms       ● nominal
LHR–AMS    private    6ms       ● nominal
JNB–NBO    private    47ms      ● nominal
GRU–MIA    private    91ms      ● nominal
LAG–LHR    transit    68ms      ◐ partial
backbone coverage: 3 countries · private segments: 83% · transit fallback: 17%

The Peering Ecosystem

Private backbone doesn't mean fully isolated. Traffic has to leave your network at some point, and the quality of those handoffs matters enormously. We peer at every major Internet Exchange where we have a presence, and we maintain bilateral peering agreements with the 40 largest ASNs that carry meaningful traffic to our customers.

Getting peering right took longer than we expected. The large transit providers are straightforward — they want your traffic. The tricky part is the mid-tier ISPs that have meaningful regional coverage but no formal peering policy, or worse, a peering policy optimized for the 2010s that doesn't account for the traffic patterns that modern applications generate.

We've had three situations where a regional ISP terminated peering after we grew faster than their IX port could handle. In one case in Southeast Asia, we were injecting traffic at 10x the rate the ISP had provisioned for, and they simply dropped the BGP session without notice. We now monitor our traffic contribution to every peering relationship and provision proactively.
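The proactive check is conceptually straightforward: track your peak traffic contribution on each session and flag it well before it approaches the peer's provisioned port capacity. This is a hedged sketch of that kind of monitor; the session fields, ASNs, and 70% threshold are hypothetical, not RouteKey's real tooling or config.

```python
# Sketch: flag peering sessions whose measured peak traffic is
# approaching the provisioned IX port capacity, so capacity can be
# upgraded before the peer's port saturates (or the peer drops you).
from dataclasses import dataclass

@dataclass
class PeeringSession:
    peer_asn: int
    port_capacity_gbps: float
    peak_traffic_gbps: float  # our measured peak contribution

def needs_upgrade(session: PeeringSession, threshold: float = 0.7) -> bool:
    """True once peak traffic crosses `threshold` of port capacity,
    leaving headroom to provision before the port fills."""
    return session.peak_traffic_gbps >= threshold * session.port_capacity_gbps

sessions = [
    PeeringSession(peer_asn=64501, port_capacity_gbps=10, peak_traffic_gbps=8.2),
    PeeringSession(peer_asn=64502, port_capacity_gbps=100, peak_traffic_gbps=31.0),
]
for s in sessions:
    if needs_upgrade(s):
        print(f"AS{s.peer_asn}: provision additional peering capacity")
```

The threshold buys lead time: IX port upgrades and cross-connect orders take weeks, so alerting at saturation is already too late.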

What We'd Do Differently

If we were starting over, we'd invest earlier in redundant submarine cable access. Our first-generation PoPs in several regions had single-cable exposure — acceptable during the low-traffic period of early growth, but a real operational risk by the time we had enterprise customers with SLAs. Retrofitting fiber diversity into existing colocation arrangements is significantly more expensive than building it in from the start.

We'd also invest earlier in automated PoP onboarding tooling. The first 40 PoPs were partly hand-configured, which created subtle inconsistencies in routing policy and monitoring coverage that took 18 months to fully remediate. The last 80 PoPs have been fully automated from day one, and the operational difference is substantial.
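The core idea behind that tooling is to make a PoP a validated, declarative definition rather than a sequence of hand-typed commands. A minimal sketch of the validation step follows; the field names and rules are hypothetical, assumed for illustration rather than taken from RouteKey's actual onboarding system.

```python
# Sketch: validate a declarative PoP definition against a required
# schema before deployment, so every site gets identical routing-policy
# and monitoring settings — the drift hand-configuration introduces.
REQUIRED_FIELDS = {"site_code", "routing_policy", "monitoring_profile", "fiber_paths"}

def validate_pop(pop: dict) -> list[str]:
    """Return a list of problems; an empty list means the PoP is deployable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - pop.keys())]
    if len(pop.get("fiber_paths", [])) < 2:
        problems.append("needs at least two diverse fiber paths")
    return problems

# A typical hand-configured site: partially specified, single-cable exposure.
hand_configured = {"site_code": "LAG1", "routing_policy": "v1", "fiber_paths": ["cable-a"]}
for problem in validate_pop(hand_configured):
    print(problem)
```

Running every site definition through the same validator is what makes inconsistencies show up at review time instead of eighteen months later in an incident postmortem.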

Where the Backbone Is Today

As of Q1 2026: 4 Premium PoPs across 3 countries, 83% dedicated routing coverage on our busiest city-pair routes, and 17% transit fallback for long-tail paths we haven't yet built dedicated capacity for. Average backbone latency on our busiest routes is 11–14% below what commodity transit could achieve for the same path.

The remaining 17% transit coverage is our roadmap. Some of it will become private routes as we expand in H1 2026. Some of it will remain transit — there are routes where the economics of private capacity don't work at our current traffic volumes. That's fine. The backbone exists to serve customers, not to hit a metric.
