HTTP/2 under the hood
How HTTP/2's request/response multiplexing, header compression, and server push boost web site performance
HTTP/2’s overriding objective is to improve the experience of web application users. As a binary protocol (HTTP 1.1 is a text-based protocol), it has all the benefits of being lightweight, secure, and fast. HTTP/2 maintains the semantics of the original HTTP protocol, but changes the way that data is transmitted between systems. These intricacies are mostly managed by the client and server, so that web sites and applications can benefit from the advantages of HTTP/2 without significant changes.
In this article you’ll get an overview of HTTP/2, including the problems it seeks to resolve and its host of new performance-enhancing features—including request/response multiplexing, header compression, and server push.
History of HTTP
Before we dive into the details of the HTTP/2 protocol, let’s step back in time and review its origins in HTTP.
The protocol first came to light in 1989, in its incarnation as HTTP 0.9. Initially outlined by Sir Timothy Berners-Lee at CERN, near Geneva, Switzerland, it consisted of just one line. The sole method was GET, and a request was as simple as GET /index.html. The response was equally straightforward, containing just the requested file.
HTTP 0.9 was not an official standard, and was referenced this way to distinguish it from the official version that followed. HTTP 1.0 was introduced by the IETF in 1996, under RFC 1945. Shortcomings in this first major release prompted a minor revision, HTTP 1.1, published as RFC 2616 in 1999, which ushered in a myriad of optional features and fiddly details, and therein lay the devil.
Almost no browser (or server) implementation adopted every aspect of the protocol, which led to inconsistencies in user experience across different browsers. Notably, browser vendors failed to implement the performance-enhancing feature of HTTP pipelining, introduced in HTTP 1.1.
As web usage became more ubiquitous, performance needs increased sharply, and the demands on HTTP took their toll. Developers started creating hacks to overcome the protocol’s inadequacies. HTTP’s inefficient use of TCP sockets put a damper on performance, for example, so developers resorted to using elaborate racks of servers to meet application demands. In this way, the failure to get pipelining working properly spurred the need for a major rethink of HTTP.
More than 15 years passed before the HTTPbis working group convened to formally identify intractable issues with the protocol, and eventually draft expectations for HTTP/2. Tasked with significantly improving end-user perception of latency over HTTP 1.1, the working group’s protocol recommendation featured solutions for the “head of line blocking” problem, header compression, and server push. Altogether, RFC 7540 (HTTP/2), along with RFC 7541 (HPACK), promised an evolutionary growth spurt for web application performance.
Figure 1. Total transfer size and total requests (2012-2017), source: HTTPArchive
Hacks and workarounds
While it’s true that the Internet is capable of delivering highly complex content at great speeds, this happens despite the HTTP 1.1 protocol, not because of it. In its current incarnation, HTTP is not able to handle the demands of today’s web experience. As a result, web developers have come up with a range of workarounds for these performance issues. Let’s consider some of the more popular hacks and the problems they patch.
Head of line blocking
HTTP 1.0 permitted only one request at a time over a single TCP connection. This led to the so-called “head of line blocking” issue, which forced the browser to wait for tardy responses. HTTP 1.1 addressed this with pipelining, which lets a browser send several requests without waiting for each response. Browser vendors experienced implementation difficulties with pipelining, however, and most browsers (including Firefox) ship with the feature disabled by default. Chrome has removed it completely.
Multiple TCP connections
TCP connections are expensive to open and there is little information about how clients should use them. The only protocol stipulation is that a maximum of two connections can be opened per host. Given just two TCP connections, developers struggled to deliver the number of resources required for a modern web page—and so found a way to circumvent that limitation.
Using a popular technique known as domain sharding, developers are able to create multiple hosts, each serving a subset of the resources required by the site. Sharding has become fairly ubiquitous, and as a result the average number of TCP connections opened during page loading has hit around 35 (source: HTTPArchive).
Not to be outdone, browser vendors have also defied the protocol, by arbitrarily increasing the number of open connections allowed in browser implementations. This helps parallelize resource loading within individual browsers, but is an inefficient use of TCP sockets. The following table shows the maximum number of open connections allowed per hostname, and how it differs among the top three browsers.
Table 1. Maximum concurrent open TCP connections (source: browserscope.org)
|Browser|Maximum parallel connections per hostname|
|Internet Explorer 12|11|
Inconsistency in browser implementation means that the quality of a user’s web surfing experience is determined by their choice of browser, rather than how well the site has been designed and constructed.
Resource inlining and concatenation
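Domain sharding isn’t the only clever trick that web application developers employ in the quest for better performance. Resource inlining, which embeds styles, scripts, or encoded images directly in the page, and concatenation, which merges several files into one, both cut the number of requests a page makes.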
Neither of these techniques is desirable, least of all from a design perspective. In both cases, the structure of the page is mixed with the style, and time is consumed decoding images. Caching also cannot be easily achieved.
Nevertheless, if the goal is simply to reduce the number of files requested, these workarounds succeed. And with fewer file requests comes a need for fewer open TCP sockets.
HTTP/2 owes most of its features to work initiated by Google on the SPDY protocol. By the time the HTTPbis working group commenced drafting the first version of the HTTP/2 RFC, SPDY had already proved that a major HTTP version update was workable. Because SPDY had been deployed and adoption had begun, there was evidence that an updated protocol performed better in the wild.
It was vital to HTTP/2’s success that it accomplish substantially improved performance while maintaining HTTP paradigms, as well as HTTP and HTTPS schemes. The working group stipulated that the migration to HTTP/2 must be transparent, and end users should not experience any disruption.
The protocol’s headline features are:
- New upgrade path
- Binary framing
- Request/response multiplexing
- Header compression
- Stream prioritization
- Server push
- Flow control
Let’s consider each of these features.
New upgrade path
The HTTP/2 upgrade path is slightly different from normal and shortcuts some negotiations. Requesting a protocol switch via the Upgrade header and receiving a reassuring “101 Switching Protocols” HTTP status is not available for secure connections over HTTP/2. Instead, using a new TLS extension called Application Layer Protocol Negotiation (ALPN), the client advises the server of the communication protocols it understands, in order of preference. The server then responds using the first protocol from the list that it also understands.
SPDY requires a secure connection, but the HTTP/2 specification did not make such a connection mandatory, despite community pressure to do so. All major browser vendors implement HTTP/2 over TLS only, however, and do not support unsecured connections. This effectively forces web application implementers to use TLS for all HTTP/2 traffic (source: caniuse.com). The upgrade path via the HTTP Upgrade header is still available to curl users, because curl implements both cleartext and secure connections.
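The selection rule described above can be sketched in a few lines of Python. The `select_protocol` helper is hypothetical and only mirrors the rule as this article states it; it is not part of any library. The `ssl` calls show how a client might advertise its list with the standard library, assuming the local Python build supports ALPN.

```python
import ssl

def select_protocol(client_offer, server_supported):
    """Return the first protocol in the client's preference list that the
    server also understands, or None if there is no overlap.
    (Hypothetical helper mirroring the rule described in the text.)"""
    for protocol in client_offer:
        if protocol in server_supported:
            return protocol
    return None

# "h2" is the ALPN identifier for HTTP/2 over TLS.
client_offer = ["h2", "http/1.1"]

# A client advertises its list through the standard-library ssl module.
context = ssl.create_default_context()
if ssl.HAS_ALPN:  # guard: ALPN support depends on the linked TLS library
    context.set_alpn_protocols(client_offer)

print(select_protocol(client_offer, {"http/1.1", "h2"}))
print(select_protocol(client_offer, {"http/1.1"}))
```

On a live connection, the protocol that was actually agreed can be read from the wrapped socket with `selected_alpn_protocol()`.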
Binary framing
Perhaps the most important change with HTTP/2 is the switch to a binary protocol. For developers, this is arguably the epicenter of performance enhancements. Known as the binary framing layer, the new protocol redesigns the encoding mechanism without altering the familiar semantics of methods, verbs, and headers.
Most importantly, all communication is carried over a single TCP connection, which remains open throughout a conversation. This is possible thanks to how the binary protocol breaks communication down into frames: they are interweaved in a bidirectional logical stream between the client and server.
Topology of a connection
In the new paradigm of HTTP/2, as I’ve mentioned, a single TCP connection is established between the client and the server, and is open for the duration of the interaction. Over this connection, messages are passed through logical streams. A message consists of a complete sequence of frames. When collated, these represent a response or request.
Figure 2 illustrates the relationship between connection components, showing a connection through which multiple streams have been established. In stream 1, a request message is sent and the corresponding response message returned.
Figure 2. Topology of an HTTP/2 connection
We’ll look at each of these concepts separately.
Connections and streams
A single connection is established with a peer, and multiple streams flow over that connection. Because streams can be interweaved, multiple streams may be in flight at the same time.
Messages are a collection of frames. When reconstructed at the peer, these frames form a complete request or response. Frames of a particular message are sent over the same stream, meaning that a request or response can be mapped to a single identifiable stream.
The basic unit of communication is the frame. Each frame has a header which contains its length and type, some boolean flags, a reserve bit, and a stream identifier, as shown in Figure 3.
Figure 3. Breakdown of a frame
The length field records the size of the frame, which can carry up to 2^24 bytes (about 16 MB) in a DATA frame, although the default maximum is set at 2^14 bytes (16 KB). Frame size can be negotiated upwards.
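To make the layout concrete, here is a minimal Python sketch of packing and parsing a frame header. It assumes the nine-byte layout defined in the HTTP/2 specification (24-bit length, 8-bit type, 8-bit flags, one reserved bit, and a 31-bit stream identifier); the function names are illustrative only.

```python
import struct

FRAME_HEADER_LENGTH = 9  # bytes: 3 (length) + 1 (type) + 1 (flags) + 4 (R + stream id)

def pack_frame_header(length, frame_type, flags, stream_id):
    """Serialize a frame header. The reserved bit is always sent as 0."""
    if not 0 <= length < 2**24:
        raise ValueError("length must fit in 24 bits")
    # The 24-bit length is packed as one high byte plus a 16-bit remainder.
    return struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                       frame_type, flags, stream_id & 0x7FFFFFFF)

def parse_frame_header(data):
    """Split the first nine bytes of a frame into its header fields."""
    high, low, frame_type, flags, word = struct.unpack(
        ">BHBBI", data[:FRAME_HEADER_LENGTH])
    return {
        "length": (high << 16) | low,
        "type": frame_type,
        "flags": flags,
        "stream_id": word & 0x7FFFFFFF,  # mask off the reserved bit
    }

# A DATA frame (type 0x0) of the default maximum size on stream 1.
header = pack_frame_header(length=2**14, frame_type=0x0, flags=0x1, stream_id=1)
print(parse_frame_header(header))
```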
The type field identifies the purpose of the frame, and can be one of 10 types:
HEADERS: The frame contains only HTTP header information.
DATA: The frame contains all or a part of the message’s payload.
PRIORITY: Specifies the importance to give to the stream.
RST_STREAM: Signals an error or the rejection of a push promise, and terminates the stream.
SETTINGS: Specifies connection configurations.
PUSH_PROMISE: Notifies of an intent to push resources to the client.
PING: Measures round-trip time and checks that the connection is still alive (a heartbeat).
GOAWAY: Desist notice to stop producing streams for the current connection.
WINDOW_UPDATE: Used to manage the flow control of streams.
CONTINUATION: Used to continue a sequence of header fragments.
See the specification section 11.2 for more detail on the functioning of each frame type.
The flags field holds a set of boolean flags that carry state information about the frame:
DATA frames can define two boolean flags:
END_STREAM, which when set signifies the end of the data stream; and
PADDED, which indicates that padding is present.
HEADERS frames can specify the same flags as the DATA frame, plus two additional flags:
END_HEADERS, which when set indicates the end of the header block; and
PRIORITY, which indicates that a stream priority has been set.
PUSH_PROMISE frames can set
Of the remaining frame types, SETTINGS and PING define an ACK flag and CONTINUATION defines END_HEADERS; the others define no flags.
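As a rough illustration, the flags of a DATA or HEADERS frame can be decoded from the flags byte with simple bit masks. The bit values below are the ones assigned in the HTTP/2 specification; the helper name and the dictionaries are assumptions made for this sketch.

```python
# Bit values assigned by the HTTP/2 specification.
DATA_FLAGS = {"END_STREAM": 0x1, "PADDED": 0x8}
HEADERS_FLAGS = {"END_STREAM": 0x1, "END_HEADERS": 0x4,
                 "PADDED": 0x8, "PRIORITY": 0x20}

def decode_flags(flags_byte, known_flags):
    """Return the names of every known flag whose bit is set."""
    return [name for name, bit in known_flags.items() if flags_byte & bit]

# A HEADERS frame that ends both the header block and the stream.
print(decode_flags(0x5, HEADERS_FLAGS))
# A DATA frame carrying the last chunk of a message.
print(decode_flags(0x1, DATA_FLAGS))
```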
The stream identifier is used to track the frame’s membership of a logical stream. Membership is exclusive to just one message and stream at a time. A stream can advise of priority, which helps determine the network resources allocated to it. I’ll explain more about stream prioritization in a moment.
Request/response multiplexing
The problem with a single TCP connection under HTTP 1.1 is that only one request can be in progress at a time, so the client must wait for a response before making another request. This is the “head of line blocking” issue. As I discussed earlier, the typical workaround is to open multiple connections, one for each request. However, if it were possible to break down the message into smaller, independent parts and send those over the wire, then this problem would be solved.
This is exactly what HTTP/2 does. Messages are broken into frames, given a stream identifier, and sent independently over a single TCP connection. This technique enables full bidirectional multiplexing of request and response messages, as shown below.
Figure 4. Frames interweaved over TCP connection
The diagram in Figure 4 shows three streams in flight over a single connection. The server sends two responses and the client sends one request.
In stream 1, the server sends the HEADERS frame for a response; in stream 2, it sends the HEADERS frame for a different response, followed by the DATA frames for both responses. The two responses are interwoven as shown. While the server sends the responses, the client makes a request by sending the DATA frames of a new message. These are also interwoven with the response frames, as shown below.
Figure 5. HTTP/2 interweaves request/response streams
All of the frames are reassembled at the other end to form the complete request or response message.
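The idea can be sketched as a toy model in Python: each message is cut into chunks tagged with a stream identifier, the chunks are interleaved on one "wire", and the receiver regroups them by identifier. The round-robin order and the function names are assumptions made for illustration; a real implementation orders frames according to priorities and flow control.

```python
from collections import defaultdict
from itertools import zip_longest

def to_frames(stream_id, payload, frame_size):
    """Cut one message into (stream_id, chunk) frames."""
    return [(stream_id, payload[i:i + frame_size])
            for i in range(0, len(payload), frame_size)]

def interleave(*frame_lists):
    """Round-robin the frames of several streams onto one connection."""
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire

def reassemble(wire):
    """Group frames by stream identifier and rebuild each message."""
    messages = defaultdict(bytes)
    for stream_id, chunk in wire:
        messages[stream_id] += chunk
    return dict(messages)

wire = interleave(to_frames(1, b"<html>...</html>", 4),
                  to_frames(3, b"body { margin: 0 }", 4))
print([stream_id for stream_id, _ in wire])  # frames of both streams alternate
print(reassemble(wire))
```

Because every chunk carries its stream identifier, neither message has to finish before the other can make progress.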
The benefits of frame interweaving are multiple:
- All requests and responses occur over a single socket.
- No response or request can block any other.
- Reduced latency.
- Faster page loading.
- Eliminates the need for HTTP 1.1 hacks.
Mapping an HTTP request to frames
Figure 6. Mapping an HTTP request to HTTP/2 frames
On the left, we have an HTTP request mapped to a HEADERS frame on the right.
In the HEADERS frame, two flags are set. The first is END_STREAM, which is set to true (as indicated by the plus sign), indicating the frame is the last one for the given request. The END_HEADERS flag is also set to true, indicating the frame is the last one in the stream containing header information.
The header properties in the HEADERS frame reflect those set in the HTTP 1.1 request. This must be the case because HTTP/2 is bound to maintain the HTTP protocol semantics.
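A small Python sketch shows how the fields of a request carry over. It assumes the pseudo-header names defined by the HTTP/2 specification (`:method`, `:scheme`, `:path`, `:authority`); the function name and the `example.com` request are hypothetical and do not reproduce the figure.

```python
def request_to_header_fields(request_text, scheme="https"):
    """Convert the head of an HTTP 1.1 request into an HTTP/2 header list."""
    lines = request_text.strip().splitlines()
    method, path, _version = lines[0].split()
    # The request line becomes pseudo-header fields, which come first.
    fields = [(":method", method), (":scheme", scheme), (":path", path)]
    for line in lines[1:]:
        name, _, value = line.partition(":")
        name, value = name.strip().lower(), value.strip()
        if name == "host":
            # HTTP/2 carries the Host value in the :authority pseudo-header.
            fields.insert(3, (":authority", value))
        else:
            fields.append((name, value))  # field names are lowercase in HTTP/2
    return fields

request = """GET /index.html HTTP/1.1
Host: example.com
Accept: text/html
"""
for field in request_to_header_fields(request):
    print(field)
```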
Next, we’ll have a look at the response to this request.
Mapping an HTTP response to frames
On the left in Figure 7 is an HTTP 1.1 header response. On the right is this same response represented using two HTTP/2 frames:
Figure 7. Mapping an HTTP response to HTTP/2 frames
In the HEADERS frame, the END_STREAM flag indicates that the frame is not the last one in the stream, while END_HEADERS indicates it is the last frame with header information. In the DATA frame, the END_STREAM flag indicates that it is the last frame.
Header compression
The HTTP/2 protocol is accompanied by HPACK. The objective of HPACK is to reduce the overhead caused by duplication of header information between client requests and server responses. Header compression is achieved by requiring both the client and the server to maintain a list of header fields previously seen. This list is used to build future messages that reference the seen-headers list.
Figure 8. Header compression for two requests over the same connection
Between the two requests in Figure 8, header information is duplicated. The only difference is the resource requested, as highlighted in yellow. This is where HPACK header compression comes in. After the first request, the client can encode each repeated header as a short index into the list of previously seen headers, because the server maintains the same list. Only new or changed fields, such as the requested path, need to be sent in full.
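The following Python sketch illustrates the shared-list idea only. It is a deliberately simplified toy, not HPACK itself: it omits the static table, Huffman coding, table size limits, and the binary wire format, and the class name and sample headers are invented for the example.

```python
class ToyHeaderTable:
    """A list of previously seen header fields, kept in step by both peers."""

    def __init__(self):
        self.fields = []

    def encode(self, headers):
        """Replace fields already in the table with an index reference."""
        encoded = []
        for field in headers:
            if field in self.fields:
                encoded.append(("index", self.fields.index(field)))
            else:
                self.fields.append(field)
                encoded.append(("literal",) + field)
        return encoded

    def decode(self, encoded):
        """Rebuild the header list, learning new literals along the way."""
        headers = []
        for item in encoded:
            if item[0] == "index":
                headers.append(self.fields[item[1]])
            else:
                field = (item[1], item[2])
                self.fields.append(field)
                headers.append(field)
        return headers

client, server = ToyHeaderTable(), ToyHeaderTable()
first = [(":method", "GET"), (":path", "/index.html"), ("user-agent", "demo")]
second = [(":method", "GET"), (":path", "/style.css"), ("user-agent", "demo")]

server.decode(client.encode(first))   # every field travels as a literal
compact = client.encode(second)       # only the new path is a literal
print(compact)
print(server.decode(compact) == second)
```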
Stream prioritization
Message frames are sent over streams. Each stream is allocated a priority, which determines the order in which it will be processed and, by extension, the amount of resources it will receive.
The priority is entered into the HEADERS frame or the PRIORITY frame for the given stream, and is expressed as a weight between 1 and 256.
Dependencies can be defined to allow one resource to be loaded before another. Priorities can also be mixed into a dependency tree, giving the developer more control over the importance allocated to each stream.
Figure 9. A dependency tree for stream prioritization
In Figure 9, the letters represent stream identifiers and the numbers represent the weight given to each stream. The root of the tree is stream A, which is allocated resources ahead of its dependents, streams B and C. Stream B is allocated 40 percent of available resources, while stream C receives 60 percent. Stream C is the parent of streams D and E, each of which receives an equal allocation of resources from its parent.
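The arithmetic behind such a tree can be sketched in Python. The weights below are hypothetical values chosen to reproduce the 40/60 split described above, and the function follows this article's reading that each stream divides its share among its dependents in proportion to their weights.

```python
def allocate_shares(children, weights, parent, parent_share=1.0):
    """Split a parent's share among its children in proportion to weight,
    then recurse into each child's own dependents."""
    shares = {}
    siblings = children.get(parent, [])
    total = sum(weights[stream] for stream in siblings)
    for stream in siblings:
        shares[stream] = parent_share * weights[stream] / total
        shares.update(allocate_shares(children, weights, stream, shares[stream]))
    return shares

# Stream A is the root; B and C depend on A; D and E depend on C.
children = {"A": ["B", "C"], "C": ["D", "E"]}
weights = {"B": 40, "C": 60, "D": 16, "E": 16}  # hypothetical weights

shares = allocate_shares(children, weights, "A")
for stream, share in sorted(shares.items()):
    print(stream, round(share, 2))
```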
Stream priority is only a suggestion to the server and can be changed on the fly, or ignored completely. In drafting the HTTP/2 protocol, the working group determined it would be incorrect to allow the client to obligate a server to adhere to a particular resource allocation. Instead, the server is free to adjust priorities to match its own capabilities.
Server push
Server push enables the server to anticipate the resource requirements of a client request. It can then send those resources to the client before the request processing has completed.
So how does HTTP/2 manage server push without overloading the client? The server sends a PUSH_PROMISE frame for each resource it wants to send, but the client can reject the push (for instance, if the resource is already in the browser’s cache) by responding with an RST_STREAM frame. It’s important that all the PUSH_PROMISE frames are sent before the response data, so the client knows which resources are on their way and does not request them itself.
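The client's side of that decision might look like the following Python sketch. The function name, the cache contents, and the promised paths are all hypothetical; the only protocol detail assumed is that server-initiated streams use even-numbered identifiers.

```python
def handle_push_promises(promises, cache):
    """Decide which promised streams to keep and which to reset.

    promises: list of (promised_stream_id, path) pairs taken from
              PUSH_PROMISE frames.
    cache:    set of paths the client already holds.
    """
    accepted, rejected = [], []
    for promised_stream_id, path in promises:
        if path in cache:
            # The client would answer with RST_STREAM on this stream.
            rejected.append(promised_stream_id)
        else:
            accepted.append(promised_stream_id)
    return accepted, rejected

promises = [(2, "/style.css"), (4, "/app.js")]
cache = {"/style.css"}

accepted, rejected = handle_push_promises(promises, cache)
print("accept:", accepted)
print("reset:", rejected)
```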
Flow control
Flow control manages the transport of data so that the receiver is not overwhelmed by the sender. It allows the receiver to stop or reduce the quantity of data being sent. For example, consider a streaming service that offers videos on demand. While the viewer is watching a video stream, the server is sending data to the client. If the video is paused, the client informs the server to stop sending video data, in order to not exhaust its cache.
As soon as a connection is opened, the server and client exchange SETTINGS frames that establish the size of the flow-control window. By default, the size is 65,535 bytes (about 64 KB), and the receiver can enlarge the window by issuing a WINDOW_UPDATE frame that adds to the current size.
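A sender's bookkeeping for one window can be sketched in Python as follows. The class is a simplified illustration with invented names: it tracks a single window, whereas the protocol maintains a window per stream as well as one for the whole connection.

```python
DEFAULT_WINDOW = 65_535  # default initial window size, in bytes

class SendWindow:
    """Tracks how many bytes the sender may still transmit."""

    def __init__(self, size=DEFAULT_WINDOW):
        self.available = size

    def send(self, data):
        """Send as much of data as the window allows; return the bytes sent."""
        allowed = data[:self.available]
        self.available -= len(allowed)
        return allowed

    def window_update(self, increment):
        """Apply a WINDOW_UPDATE from the receiver, which adds to the window."""
        self.available += increment

window = SendWindow()
sent = window.send(b"x" * 70_000)
print(len(sent), window.available)   # the window is exhausted

print(window.send(b"more"))          # nothing can be sent while it is empty

window.window_update(10)             # the receiver grants ten more bytes
print(window.send(b"more"), window.available)
```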
HTTP/2 in the wild
HTTP/2 adoption by vendors has been almost universal. In the browser space, all major browsers currently support the new protocol over TLS only. Global support is more than 80 percent at the time of writing.
Server support is advanced, with all major server families supporting HTTP/2 in current versions. There is a good chance that your hosting provider already supports HTTP/2. You can track all known server implementations of HTTP/2 on the specification’s Wiki page.
Tool support is also extensive, with all your favorite utilities supporting HTTP/2. Wireshark is the most important of these for developers wishing to debug HTTP/2 communication between server and client.
HTTP/2 and you
Web users don’t care what protocol you use to deliver content, just as long as it’s fast. By optimizing the way sites load resources, you’re already working to give your customers what they want. With HTTP/2, you no longer need to concatenate files, collate icons into one image, set up numerous domains, or inline resources.
Put simply, HTTP/2 obviates the need for workarounds. In fact, continuing to use the performance hacks I’ve described in this article could inhibit your site from benefiting from HTTP/2 performance enhancements.
So the million dollar question for most developers is: Is it time to refactor my web site for HTTP/2? In my mind, it largely depends on factors related to the application makeup and the browsers in use. It’s a balancing act: you don’t want to penalize users with older browsers, but you do want to deliver an overall faster user experience.
Optimizing for HTTP/2 is an unknown, particularly with regard to best practices. It’s not just a matter of removing the workarounds and hoping for the best. Each of us must do our own research. In the process, we’ll discover new ways to squeeze out performance, how HTTP/2 works in the wild, which server has the most performant implementation, and more.
HTTP/2 represents a brave new world for web development. Adventurous developers will reap the benefits as we accept the challenges that it presents.