Measuring broadband speed

“Speed” is the primary way people describe their Internet connection, but actual speed test results are often misleading or completely wrong.  A recent paper, Understanding broadband speed measurements, by Steve Bauer, David Clark and William Lehr reports on their review of 400,000 tests made using the FCC-endorsed M-Labs network diagnostic tool.  Fully 38% of those tests never managed to fill the access link.  In other words, 38% of the tests never measured the available speed!

Speed measurements are a big concern for netBlazr as we need to know we’re actually delivering the performance we expect.  We also plan to display our complete network status, in real time and historically, and we want to be sure netBlazr members can duplicate our measurements using widely available tools.  So we need to understand the measurement problem.

I’ve accumulated some relevant knowledge over many years of measuring and investigating my own Internet connections, but the paper by Bauer et al. trumped that in a moment, as they’ve analyzed 400,000 sessions in detail and talked with the staff at many of the speed test measurement sites.  If you want the in-depth story, their 39-page paper is well worth the effort.  If you want the short story, here goes.

It turns out that correctly measuring the speed of an Internet connection is rather complex.  There are many potential bottlenecks besides the actual Internet access link: some are elsewhere in the Internet; some are in the local area network; some are configuration issues with the PC running the tests; and some are design limitations at the network test sites.

The biggest issues revolve around the TCP part of TCP/IP.  TCP, or Transmission Control Protocol, is the most widely used Internet protocol.  The word control is key.  TCP controls the rate at which a source transmits data in response to the network’s ability to carry the data and the destination’s ability to absorb it.  Because a control protocol can only react when it gets signals (in this case, from the network or from the destination), there are some time lags.  So it’s not surprising that TCP’s behavior is influenced by the round trip time between the source and the destination.

Here is an interesting graph (Figure 11 in the paper by Bauer et al.) based on the M-Labs data from 2009:

Figure 11 in Bauer et al.

This graph has a dot for each of the tests in the US that failed to saturate their access link (or indeed any part of their connection).  For each such test, it compares the measured download speed against the average round trip time (RTT).  Individual tests are color-coded based on the size of the buffer at the receiving end (the “receive window” or Rwnd).  For example, blue dots represent tests where the client PC’s TCP receive window was set to 64 KB, the typical default for Windows XP.

From this one graph we see two major issues.  First, round trip time has a dramatic impact on achievable speeds, and second, for many people’s PCs, the receive window setting is a major limitation.
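The shape of those color-coded bands follows from a simple rule: TCP can have at most one receive window of unacknowledged data in flight per round trip, so throughput can never exceed Rwnd divided by RTT.  Here is a minimal sketch of that ceiling (my own illustration, not code from the paper):

```python
def max_tcp_throughput_mbps(rwnd_bytes: int, rtt_seconds: float) -> float:
    """Ceiling on TCP throughput: at most one receive window of data
    can be in flight (unacknowledged) per round trip, so
    throughput <= Rwnd / RTT."""
    return rwnd_bytes * 8 / rtt_seconds / 1e6

# Windows XP's default 64 KB receive window at a few round trip times:
for rtt_ms in (20, 50, 100):
    mbps = max_tcp_throughput_mbps(64 * 1024, rtt_ms / 1000)
    print(f"RTT {rtt_ms:3d} ms -> at most {mbps:.1f} Mbps")
```

At 100 ms of round trip time, a 64 KB window caps the connection at roughly 5 Mbps no matter how fast the access link is, which is exactly the kind of curve the blue dots trace out.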

Suppose we’re trying to measure a 50 Mbps or 100 Mbps connection (the sort of connection that’s widely available in Stockholm, Seoul, Tokyo, Hong Kong, Amsterdam, etc.).  Even if our PC’s operating system is configured for a TCP receive window of 512 KB, we’d still need a round trip time of less than 30 milliseconds between our client PC and the test server.  Typically that means the test server should be in the same city, with no bottlenecks between the access network and the network hosting the speed test server.
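To see where that figure comes from, we can invert the Rwnd/RTT ceiling and ask what RTT a 512 KB window can tolerate while still reaching 100 Mbps (this arithmetic is mine, not the paper’s):

```python
def max_rtt_ms(rwnd_bytes: int, target_mbps: float) -> float:
    """Largest round trip time at which a receive window of rwnd_bytes
    can still sustain target_mbps (since throughput <= Rwnd / RTT)."""
    return rwnd_bytes * 8 / (target_mbps * 1e6) * 1000

print(f"{max_rtt_ms(512 * 1024, 100):.1f} ms")
```

The theoretical ceiling works out to about 42 ms, but real TCP sessions rarely sustain the ceiling (slow start, loss recovery, and measurement overhead all eat into it), so something under 30 ms is a sensible practical target.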

In practice, the most complete diagnostic information is available when you use the M-Labs broadband diagnostic tools, but for me (in Boston) the closest test server is in Atlanta.  I’m writing this in a coffee shop where I just measured the round trip time to that Atlanta server at 690 ms!  That’s fine for a coffee shop where the DSL modem limits everything to ~750 Kbps, but it’s hopeless for measuring any of the netBlazr wireless trunks (which typically run at over 90 Mbps).
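If you want to check your own round trip time to a test server before trusting a speed result, you don’t need special tools.  Here’s a rough sketch that times a TCP three-way handshake from Python; the hostname in the comment is a placeholder, not a real M-Labs server name:

```python
import socket
import time

def tcp_rtt_ms(host: str, port: int = 80, samples: int = 3) -> float:
    """Rough RTT estimate: time a TCP three-way handshake (connect()),
    keeping the best of several samples to filter out scheduling noise."""
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass  # connect() returning means the handshake completed
        best = min(best, time.perf_counter() - start)
    return best * 1000.0

# tcp_rtt_ms("ndt.example.net")  # substitute the server your test tool uses
```

This measures the handshake rather than an ICMP ping, so it reflects the path TCP actually takes, though it will read slightly high on a loaded machine.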

The second, even more common problem is with the receive window size.  Here are some excerpts from the test I just ran in the coffee shop:

— Client System Details —
OS data: Name = MAC OS X, Architecture = x86_64, Version = 10.6.5
Java data:  Vendor = Apple Inc., Version = 1.6.0_22

— Web100 Detailed Analysis —
This connection is receiver limited 69.85% of the time.
Increasing the client’s receive buffer (64.0 KB) will improve performance.

And this was using a one-year-old Mac running the latest operating system version, yet it’s TCP-receive-buffer limited on a coffee shop DSL line!
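A quick back-of-the-envelope check with the numbers above shows why the damage is limited in this particular spot (my own arithmetic, using the 64 KB buffer and 690 ms RTT reported earlier):

```python
# Receive-window ceiling for the coffee shop test: a 64 KB window
# over the 690 ms round trip to the Atlanta server.
ceiling_bps = 64 * 1024 * 8 / 0.690
print(f"ceiling ~ {ceiling_bps / 1000:.0f} kbps")
```

The ceiling comes out around 760 kbps, barely above the ~750 Kbps the DSL modem allows anyway.  On any faster connection, the very same laptop settings would become the bottleneck.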

For my next act I plan some experiments on netBlazr’s network in Boston’s Back Bay neighborhood.  Stay tuned for the next post.