Implementing HTTP: When does it end?

Note: When I say “HTTP” in this article, I am referring to HTTP/1.1, though these principles do extend to the rarer, more recent versions of HTTP.

What I cannot create I do not understand
– Richard Feynman

HTTP is the text-based protocol that powers the web and much else besides. It runs on top of TCP, a connection-abstraction protocol. All this I knew when I decided to implement HTTP myself in Go for a laugh.

TCP provides an open connection over which to send bytes to the listener at the other end. An HTTP client writes an HTTP request message to such a connection. The TCP connection is also a readable stream, and the client awaits a response written back over the same connection.

Trying to read

In order to correctly interpret the response, the HTTP client must read every byte from the stream. The idea of a stream is that you can read it in chunks, telling the reader where to read up to. It’ll stop based on what you tell it. There are broadly 3 approaches:

  • Read until the end, i.e. until we hit EOF
  • Read until a delimiter, e.g. new line
  • Read n bytes

Reading until the end is the easiest approach, and works great when the total data being read is low. I kicked off by trying to read the TCP connection this way, which didn’t work. It hung indefinitely. I quickly realised that I wasn’t going to get an EOF because the TCP connection remains open for further messages, until I as the client close it. So that’s not going to work for reading a message.

Next I tried reading line by line, and that seemed to work, but then towards the end of the message, it would hang. I soon spotted the problem here too. The message might not end with a new line, and even if it did, I didn’t know which was the last line, and therefore when to stop the loop.

N & No More

So I came to the n bytes approach, but I didn’t have a way of knowing how many bytes I should read. Except of course, I did. The Content-Length HTTP header tells me exactly how many bytes the payload is going to be. So if I can determine when the payload starts, I can read exactly that many bytes, and when I’ve got them I’ll be done.

How can I know when the payload starts? Well, an HTTP response message looks like this:

status line e.g. 
header
header
...
header
[\n]
body start
more body
...
more body
last line of body

The salient detail here is that between the headers and the body, there is always an empty line (strictly, a line containing nothing but the CRLF line ending that HTTP uses). So that means I can go through the headers, reading line by line, parsing the content length on the way.

When I hit an empty line, I know the body is about to start. Now I can swap reading line by line (reader.ReadBytes('\n') in Go) for reading in chunks of N bytes. If you’ve got a particularly large payload, you’re going to want to read the body in chunks, but however you approach it, you should read up to Content-Length bytes in total, and then you know you’re done. In Go you can do that with reader.Read(b) where b := make([]byte, desiredChunkSize). Note that Read may return fewer bytes than len(b), so count what you actually receive, or use io.ReadFull to fill the buffer.

Googling Google

This worked perfectly with the jsonbin endpoint I was testing with. So I decided to try something more extravagant and pointed my new client at google.com… and it didn’t work. Google, as it turned out, didn’t return a Content-Length header. What?! How was I supposed to know how long the body was?

I scratched my head and read through the headers I did get back from Google. I Googled (not with my HTTP client) some of the headers and struck gold with Transfer-Encoding. In particular I was receiving Transfer-Encoding: chunked. This means that the body is sent in chunks, each chunk new-line separated and preceded by the byte-length of the chunk, in hex format.

So it looked something like this:

2e3
some HTML....

4ad
some more HTML...

etc

So armed with this second header, I knew that if I got this one, I should now read the first line of the body, parse it as a hex-encoded integer, read that many bytes for a chunk, and then repeat until I hit a chunk of size zero, which marks the end of the body.

And it worked. The little client happily read the Google home page.

That was my little lesson in building an HTTP client, and it’s the small implementation details around these protocols that I find oddly fascinating. If I decide to flesh out the client to implement more of HTTP, and maybe have a go at HTTP/2, I’m sure there’ll be plenty more interesting surprises.
