Moving Beyond End-to-End Path Information to Optimize CDN Performance
Rupa Krishnan, Harsha V. Madhyastha, Sridhar Srinivasan, Sushant Jain,
Arvind Krishnamurthy, Thomas Anderson, Jie Gao
Abstract: Replicating content across a geographically distributed set
of servers and redirecting clients to the closest server in terms of
latency has emerged as a common paradigm for improving client
performance. In this paper, we analyze latencies measured from servers
in Google’s content distribution network (CDN) to clients all
across the Internet to study the effectiveness of latency-based server
selection. Our main result is that redirecting every client to the
server with least latency does not suffice to optimize client
latencies. First, even though most clients are served by a
geographically nearby CDN node, a sizeable fraction of clients
experience latencies several tens of milliseconds higher than other
clients in the same region. Second, we find that queueing delays often
override the benefits of a client interacting with a nearby server.
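The two observations above can be made concrete with a small sketch. It assumes, purely for illustration, that each client prefix has a list of RTT samples and a region label. It treats a prefix's minimum RTT as its queueing-free path latency, compares that with the best minimum RTT seen in the same region, and uses median minus minimum as a rough proxy for queueing delay. The prefixes, the region name, and the 50 ms threshold are hypothetical; this is not WhyHigh's actual code.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-prefix RTT samples in ms, keyed by (region, prefix).
# Region names, prefixes, and values are illustrative assumptions.
samples = {
    ("us-west", "192.0.2.0/24"): [22, 25, 24, 90],
    ("us-west", "198.51.100.0/24"): [85, 88, 86, 140],
    ("us-west", "203.0.113.0/24"): [20, 21, 60, 75],
}

INFLATION_THRESHOLD_MS = 50  # assumed cut-off, not taken from the paper


def analyze(samples, threshold=INFLATION_THRESHOLD_MS):
    """Return {prefix: (path_inflation_ms, queueing_ms, inflated)}."""
    # Minimum RTT approximates path latency with queueing stripped out.
    min_rtt = {key: min(rtts) for key, rtts in samples.items()}

    # The lowest minimum RTT in each region serves as that region's baseline.
    region_best = defaultdict(lambda: float("inf"))
    for (region, _), m in min_rtt.items():
        region_best[region] = min(region_best[region], m)

    result = {}
    for (region, prefix), rtts in samples.items():
        # How much worse this prefix's best-case path is than its region's best.
        path_inflation = min_rtt[(region, prefix)] - region_best[region]
        # Median minus minimum: a rough proxy for queueing delay.
        queueing = median(rtts) - min_rtt[(region, prefix)]
        result[prefix] = (path_inflation, queueing, path_inflation > threshold)
    return result


for prefix, (inflation, queueing, flagged) in analyze(samples).items():
    print(f"{prefix}: inflation={inflation} ms, queueing={queueing} ms, flagged={flagged}")
```

In this toy data, 198.51.100.0/24 is flagged for path inflation even though its samples are stable, while 203.0.113.0/24 has the region's best path yet a large median-minus-minimum gap, which mirrors the distinction the abstract draws between inflated paths and queueing delays.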
To help the administrators of Google’s CDN cope with these
problems, we have built a system called WhyHigh. First, WhyHigh
measures client latencies across all nodes in the CDN and correlates
measurements to identify the prefixes affected by inflated latencies.
Second, since clients in several thousand prefixes have poor latencies,
WhyHigh prioritizes problems based on the impact that solving them
would have, e.g., by identifying either an AS path common to several
inflated prefixes or a CDN node where path inflation is widespread.
Finally, WhyHigh diagnoses the causes of inflated latencies using
active measurements such as traceroutes and pings, in combination with
datasets such as BGP paths and flow records. Typical causes discovered
include lack of peering, routing misconfigurations, and side-effects of
traffic engineering. We have used WhyHigh to diagnose several instances
of inflated latencies, and our efforts over the course of a year have
significantly helped improve the performance offered to clients by
Google’s CDN.
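The prioritization step described in the abstract can be sketched as follows. The sketch assumes each inflated prefix has already been annotated with the CDN node that serves it and the AS path toward it, and it approximates "impact" by the number of inflated prefixes that share a candidate cause. The node names and AS numbers (taken from the documentation range) are hypothetical, and this is an illustration rather than WhyHigh's implementation.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class InflatedPrefix:
    prefix: str
    cdn_node: str             # hypothetical label of the serving CDN node
    as_path: tuple[str, ...]  # AS path from the CDN toward the prefix


def rank_problems(inflated, top=3):
    """Rank candidate causes by how many inflated prefixes each one covers."""
    by_path = Counter(p.as_path for p in inflated)
    by_node = Counter(p.cdn_node for p in inflated)
    candidates = [(n, "as_path", " ".join(path)) for path, n in by_path.items()]
    candidates += [(n, "cdn_node", node) for node, n in by_node.items()]
    # Largest impact first; ties broken by kind, then label, for stable output.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates[:top]


# Hypothetical inflated prefixes, as a detection step might produce them.
inflated = [
    InflatedPrefix("192.0.2.0/25", "node-A", ("AS64496", "AS64500")),
    InflatedPrefix("192.0.2.128/25", "node-A", ("AS64496", "AS64500")),
    InflatedPrefix("198.51.100.0/24", "node-B", ("AS64496", "AS64500")),
    InflatedPrefix("203.0.113.0/24", "node-A", ("AS64497",)),
]

for count, kind, label in rank_problems(inflated):
    print(f"{count} inflated prefixes share {kind} {label}")
```

Each prefix is counted under both its AS path and its serving node, so the two kinds of candidates overlap. Weighting each prefix by its client traffic instead of counting it once would be a natural variant when impact matters more than breadth.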
Accepted to the 2009 Internet Measurement Conference.
The dataset used in the paper is available.