TCP and HTTP for web developers

When developing web applications, it is easy to make mistakes related to basic peculiarities of TCP and HTTP. This post explains some of the pain points and shows approaches to avoid such problems.

TCP connection establishment

Establishing a TCP connection takes time because TCP uses a three-way handshake: three packets (SYN, SYN-ACK, ACK) must be exchanged before the connection can be used for HTTP requests. This adds at least one network round trip of latency before the page content can start travelling to the user’s browser.

The solution is simple: use HTTP 1.1 instead of HTTP 1.0, as it supports reusing a TCP connection for several requests. To be honest, most applications get this right, because modern browsers and web servers handle it correctly by default. Nevertheless, this is useful information for web developers: if you see HTTP 1.0 somewhere (e.g. in a log file), something is potentially wrong.
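Connection reuse can be observed with a small local experiment. The following is a minimal sketch using only the Python standard library; the handler class, the `count_connections` helper and the `/resource/...` paths are made up for illustration. It starts a throwaway HTTP 1.1 server on the loopback interface, sends several requests through one client connection and counts how many distinct TCP connections the server saw.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# One entry per handled request: the source port of the client connection.
seen_client_ports = []


class KeepAliveHandler(BaseHTTPRequestHandler):
    # Answering with HTTP/1.1 makes the server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # A Content-Length is required so the client knows where the
        # response ends without the connection being closed.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default request logging


def count_connections(num_requests):
    """Send num_requests GETs over one client connection and return how
    many distinct TCP connections the server observed."""
    seen_client_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for i in range(num_requests):
            conn.request("GET", f"/resource/{i}")
            conn.getresponse().read()  # read fully before the next request
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(set(seen_client_ports))


if __name__ == "__main__":
    print("TCP connections used for 3 requests:", count_connections(3))
```

With persistent connections, all requests share a single handshake. With HTTP 1.0 semantics, each request would pay for its own.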

Effects of latency and bandwidth

The latency and available bandwidth of the user’s internet connection massively influence the usability of your web application, and it is easy to make mistakes here. The reason is that HTTP doesn’t use the duplex capability of TCP connections. Even with HTTP 1.1, requests on a connection are in practice processed serially (pipelining exists in the specification but is rarely enabled): a new request can only be sent over an existing TCP connection once the response to the previous request has been completely transferred.

If your application consists of many small resources and the user’s internet connection has high latency, the TCP connection spends most of its time waiting, with only short bursts of data transfer. This wastes both time and bandwidth.
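A back-of-the-envelope model makes the effect visible. The sketch below is a deliberate simplification, not a measurement: it assumes a single, already established connection, strictly serial requests, one round trip of waiting per request, and it ignores slow start and header overhead. The function name and the example numbers are made up for illustration.

```python
def serial_load_time(num_resources, bytes_per_resource, rtt_s, bandwidth_bps):
    """Estimate the seconds needed to fetch resources one after another
    over a single connection (simplified model, see assumptions above)."""
    # Each request waits one round trip before its response starts arriving.
    waiting = num_resources * rtt_s
    # Pure transfer time for all payload bytes at the given bandwidth.
    transfer = num_resources * bytes_per_resource * 8 / bandwidth_bps
    return waiting + transfer


if __name__ == "__main__":
    rtt = 0.1              # 100 ms round-trip time
    bandwidth = 8_000_000  # 8 Mbit/s
    many_small = serial_load_time(50, 2_000, rtt, bandwidth)
    one_big = serial_load_time(1, 100_000, rtt, bandwidth)
    print(f"50 resources of 2 kB each: {many_small:.2f} s")
    print(f"1 resource of 100 kB:      {one_big:.2f} s")
```

The same 100 kB of payload is transferred in both cases. The difference comes entirely from the round trips spent waiting.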

In summary, you should prefer a few larger resources over many small ones in your web application. But how can you achieve this?

TCP slow start

On Wikipedia, TCP slow start is explained as follows:

Slow-start is part of the congestion control strategy used by TCP, the data transmission protocol used by many Internet applications. Slow-start is used in conjunction with other algorithms to avoid sending more data than the network is capable of transmitting, that is, to avoid causing network congestion. The algorithm is specified by RFC 5681.

The slow start algorithm is explained this way:

Slow-start is one of the algorithms that TCP uses to control congestion inside the network. It is also known as the exponential growth phase.

During the exponential growth phase, slow-start works by increasing the TCP congestion window each time the acknowledgment is received. It increases the window size by the number of segments acknowledged. This happens until either an acknowledgment is not received for some segment or a predetermined threshold value is reached. […]

In summary, a new TCP connection does not use all of the available bandwidth right away; it needs some time to warm up.
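To get a feeling for the warm-up, the exponential growth phase can be simulated in a few lines. This is an idealized sketch, not how a real TCP stack is implemented: it assumes an initial congestion window of 10 segments (the value proposed in RFC 6928; older stacks start lower), a segment size of 1460 bytes, no packet loss, and a window that doubles every round trip without ever reaching the threshold. The function name is made up for illustration.

```python
import math


def round_trips_in_slow_start(size_bytes, initial_window=10, mss=1460):
    """Return how many round trips an idealized slow start needs to
    deliver size_bytes (assumptions: no loss, no threshold reached)."""
    segments_left = math.ceil(size_bytes / mss)
    cwnd = initial_window  # congestion window, counted in segments
    round_trips = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window per round trip
        cwnd *= 2              # every acknowledged segment grows the window by one
        round_trips += 1
    return round_trips


if __name__ == "__main__":
    for size in (14_600, 102_400, 1_048_576):
        print(size, "bytes ->", round_trips_in_slow_start(size), "round trips")
```

Under these assumptions, a response that fits into the initial window arrives after one round trip, while every doubling of the response size beyond that costs roughly one more round trip.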

You should do two things to reduce the negative effects of TCP slow start:

  1. Reduce the overall size of the resources to be loaded
    • Use HTTP compression
    • Minify your CSS/JavaScript files
    • Use pure CSS to achieve some styling instead of using images
  2. Prioritize the important stuff
    • Include JavaScript for secondary functionality at the end of the HTML page
    • Load optional functionality on demand
    • Fill in optional content with JavaScript, e.g. so that images belonging to dynamic content are loaded after the important stuff
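As a rough illustration of the first point in the list above, the following sketch compresses a stylesheet with gzip, one of the encodings commonly used for HTTP compression, and compares the sizes. The sample CSS and the helper name are made up. The sample is highly repetitive, so it compresses far better than a real stylesheet would. In a real setup the web server usually performs the compression; the sketch only shows the effect on the payload.

```python
import gzip

# Artificial, very repetitive stylesheet (an assumption for this demo).
SAMPLE_CSS = (
    ".button { color: #333; padding: 4px 8px; border-radius: 3px; }\n" * 200
).encode("utf-8")


def gzip_sizes(payload):
    """Return (original_size, compressed_size) in bytes."""
    return len(payload), len(gzip.compress(payload))


if __name__ == "__main__":
    original, compressed = gzip_sizes(SAMPLE_CSS)
    print(f"original:   {original} bytes")
    print(f"compressed: {compressed} bytes")
```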

Connection bottleneck

Section 8.1.4 of RFC 2616 (HTTP 1.1) states:

[…] A single-user client SHOULD NOT maintain more than 2 connections with any server or proxy. […]

In reality, most current browsers open up to 6 or even 8 connections per server. Regardless of the exact number, this means that only a limited number of resources can be transferred in parallel. You can see this in the network analysis view of your browser’s developer tools. It will look like this:

[Image: connection_bottleneck]

With this in mind, and given that HTTP potentially wastes available bandwidth (for the reasons above), you should again try to reduce the number of requests needed to load your application. Ideally, no more than 6 requests should be needed in parallel, so that resource loading isn’t deferred in any current browser.
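The deferral can be reproduced with a tiny scheduling model. The sketch below is an assumption-laden simplification: every request is known up front, requests are started in order, and each one occupies a connection for a fixed duration. The function name and the numbers are made up for illustration.

```python
import heapq


def schedule(durations_ms, max_connections=6):
    """Return a (start, end) pair per request when at most
    max_connections requests can run in parallel."""
    # Min-heap holding the time at which each connection becomes free.
    free_at = [0.0] * max_connections
    heapq.heapify(free_at)
    timeline = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available connection
        end = start + duration
        heapq.heappush(free_at, end)
        timeline.append((start, end))
    return timeline


if __name__ == "__main__":
    for i, (start, end) in enumerate(schedule([100] * 8)):
        print(f"request {i}: start {start:.0f} ms, end {end:.0f} ms")
```

With eight requests of equal length and six connections, the last two requests cannot start until one of the first six has finished, which is the staircase pattern visible in the browser’s network view.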

HTTP Caching

Caching is one of the most powerful tools for optimizing your application: a resource that isn’t transferred at all can’t suffer from the effects explained above.

In theory, caching of resources is easily achieved by setting the proper HTTP headers, but it’s not that easy in practice: your application must be designed for cacheability. This means that a resource that is to be cached must not change afterwards, as you can’t be sure which version of the file a client will use. To be able to use caching anyway, every cached resource that changes needs a new file name or path. For icons, this isn’t a big problem, as they typically don’t change very often. JavaScript files are a counterexample: they typically change frequently, and the time needed to adjust their file names manually is not acceptable.
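The renaming described above can be automated by deriving the file name from the file content, so that a new name appears exactly when the content changes. The following is one possible sketch of that idea, not a description of any particular tool: the function name, the 8-character hash length and the example path are assumptions. The header value shown is a common choice for resources whose name changes with their content.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path, content):
    """Return the path with a short content hash inserted before the
    file extension, e.g. js/app.js -> js/app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Since a changed file gets a new name, the old one may be cached for a
# long time (here: one year).
LONG_LIVED_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}


if __name__ == "__main__":
    print(fingerprinted_name("js/app.js", b"console.log('v1');"))
    print(fingerprinted_name("js/app.js", b"console.log('v2');"))
```

The HTML page that references these files must itself stay revalidated or uncached, so that clients pick up the new names.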

To make it work reliably, you should have a look at perfect caching.

Some thoughts on optimization

There is a lot of tooling available to help you do things the right way. When writing JavaScript applications, you will need to invest some time in implementing optimizations, but they will make your application perform much better.

One note for Java developers: When using GWT, most of the aforementioned best practices are automatically done for you.

The future of HTTP

HTTP/2 will solve many of the problems that exist in HTTP 1.1, but HTTP/2 isn’t final yet (late 2014). Even once it is final, it will take some years until the new protocol is adopted and supporting client and server software is really available in companies. So we will have to deal with these problems for many years.
