What is the "S" in HTTPS

HTTP (short for Hypertext Transfer Protocol) is a protocol for transmitting hypermedia documents and related resources, such as HTML, XML, scripts, CSS, images and JSON text. It can be used for communication between browsers and web servers, between mobile applications and web servers, or among web servers.

A typical HTTP transaction consists of two parts: an HTTP request (from client to server) and an HTTP response (from server to client). Both include a protocol version (such as “HTTP/1.1”), headers and an optional entity body.

An HTTP request header:

GET / HTTP/1.1
Host: qconferences.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) ...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Referer: https://www.google.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6
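As a rough sketch of what goes on the wire, the request above is just lines of text, each ending in CRLF, with an empty line marking the end of the headers. The helper `build_request` below is hypothetical and only builds a request head without a body.

```python
from typing import Dict, Optional


def build_request(method: str, host: str, path: str = "/",
                  headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build a raw HTTP/1.1 request head (hypothetical helper, no body)."""
    # Request line: method, request target and protocol version.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    # Extra headers, one "Name: value" pair per line.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends with CRLF; an empty line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "qconferences.com", headers={
    "Connection": "keep-alive",
    "Accept-Encoding": "gzip, deflate, br",
})
print(request.decode("ascii"))
```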

An HTTP response header:

HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Language: en
Content-Type: text/html; charset=utf-8
Server: Apache/2.4.7 (Ubuntu)
Content-Length: 17089
Connection: keep-alive
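On the receiving side, a client splits the response head into a status line and headers. The hypothetical `parse_response_head` below is a simplified sketch: it ignores repeated headers and obsolete line folding, and it lower-cases header names because HTTP header names are case-insensitive.

```python
from typing import Dict, Tuple


def parse_response_head(raw: bytes) -> Tuple[str, int, str, Dict[str, str]]:
    """Split a raw HTTP/1.1 response head into version, status, reason, headers."""
    # The head ends at the first empty line; anything after it is the body.
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Length: 17089\r\n"
       b"Connection: keep-alive\r\n\r\n")
version, status, reason, headers = parse_response_head(raw)
print(version, status, reason, headers["content-length"])  # HTTP/1.1 200 OK 17089
```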

HTTP over SSL/TLS = HTTPS

HTTP is not secure, meaning that:

  1. data in transit is not encrypted and is therefore exposed to sniffing attacks;
  2. data can be tampered with by man-in-the-middle attacks;
  3. the client has no way to verify that a web server is genuine.

HTTPS is a secure version of HTTP, as shown in the diagram below (quoted from Reference 1):

HTTP is an application-layer protocol built on top of TCP, a transport-layer protocol that provides reliable, ordered and error-checked delivery of data. HTTPS makes HTTP secure by adding a security layer between HTTP and TCP. This layer was originally SSL (Secure Sockets Layer), which was later superseded by TLS (Transport Layer Security).
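A minimal sketch of this layering with Python's standard `ssl` module: a TCP connection is opened first, TLS is wrapped around it, and plain HTTP text is then written through the TLS socket. The helper names `open_tls_connection` and `https_get` are hypothetical, and actually calling `https_get` assumes network access to a reachable HTTPS server.

```python
import socket
import ssl


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection, then put a TLS layer on top of it."""
    # Transport layer: a plain TCP connection (443 is the default HTTPS port).
    tcp_sock = socket.create_connection((host, port), timeout=10)
    # Security layer: the default context verifies the server certificate
    # against the system trust store and checks that it matches `host`.
    context = ssl.create_default_context()
    try:
        # wrap_socket performs the TLS handshake before returning.
        return context.wrap_socket(tcp_sock, server_hostname=host)
    except Exception:
        tcp_sock.close()
        raise


def https_get(host: str, path: str = "/") -> bytes:
    """Send an ordinary HTTP request through the TLS layer and read the reply."""
    request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
               "Connection: close\r\n\r\n").encode("ascii")
    with open_tls_connection(host) as tls_sock:
        tls_sock.sendall(request)  # encrypted by TLS, then carried by TCP
        chunks = []
        while chunk := tls_sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Note that the HTTP request text is exactly what would be sent over plain HTTP; only the socket it is written to differs.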

What happens when we access an HTTPS website in a browser?

i. The browser establishes a TCP connection to port 443 (which is the default port for HTTPS) on the web server.

ii. The browser and server initialize the SSL/TLS layer in a process known as the SSL/TLS handshake. Its main purposes are to authenticate the website and to establish a secret key for data encryption, as depicted in the diagram below (quoted from Reference 1):

  1. The browser receives the server's digital certificate and runs a series of checks on it (discussed in a previous post on Asymmetric Cryptography). If the browser does not trust the certificate, it shows a warning and lets the user decide whether to proceed.
  2. The browser extracts the server's public key from the certificate, generates a random secret key, encrypts it with the public key and sends it to the server. Since only the server holds the matching private key, only the server can decrypt it. This is how the browser and server establish a shared secret key.

iii. The browser builds an HTTP request and sends it over SSL/TLS, which encrypts the request data with the secret key before it is sent over TCP. When the encrypted request arrives at the server through TCP, SSL/TLS decrypts it and passes it up to the application layer, where the application handles the request. The same process applies when an HTTP response is sent back over the network.
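The key exchange in step ii.2 can be illustrated with textbook RSA on deliberately tiny numbers (the classic p = 61, q = 53 example). This is a toy sketch, assuming unpadded RSA: real TLS uses 2048-bit or larger keys with padding, in TLS 1.2 and earlier the client encrypts a pre-master secret from which the session keys are derived, and TLS 1.3 replaced this RSA key transport with ephemeral Diffie-Hellman. The function names are hypothetical, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
import secrets

# Toy RSA key pair with textbook-sized primes (insecure; illustration only).
p, q = 61, 53
n = p * q                # public modulus (3233)
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent; (n, e) plays the role of the certificate's key
d = pow(e, -1, phi)      # private exponent (2753), kept secret by the server


def client_pick_and_encrypt(public_key):
    """Browser side: pick a random secret and encrypt it with the server's public key."""
    modulus, exponent = public_key
    secret = secrets.randbelow(modulus - 2) + 2  # random integer in [2, modulus - 1]
    return secret, pow(secret, exponent, modulus)


def server_decrypt(ciphertext, private_key):
    """Server side: recover the secret using the private exponent."""
    modulus, exponent = private_key
    return pow(ciphertext, exponent, modulus)


client_secret, ciphertext = client_pick_and_encrypt((n, e))
server_secret = server_decrypt(ciphertext, (n, d))
print(client_secret == server_secret)  # True: both sides now hold the same secret
```

An eavesdropper sees only `n`, `e` and `ciphertext`; recovering the secret would require `d`, which in turn requires factoring `n` (trivial here, infeasible at real key sizes).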

This is how SSL/TLS makes HTTP secure.
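The symmetric encryption in step iii, together with protection against the tampering mentioned earlier, can be sketched with a toy encrypt-then-MAC scheme built only from the standard library. This is an assumption-laden illustration, not what TLS actually runs: real TLS uses vetted ciphers such as AES-GCM or ChaCha20-Poly1305, and the SHA-256 counter-mode keystream and the names `seal` and `open_sealed` below are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 12, 32


def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from the shared secret."""
    enc_key = hmac.new(key, b"toy-enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"toy-mac", hashlib.sha256).digest()
    return enc_key, mac_key


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(enc_key || nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(enc_key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt with the keystream, then append an HMAC tag (encrypt-then-MAC)."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh for every record
    ciphertext = _xor(plaintext, _keystream(enc_key, nonce, len(plaintext)))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(key: bytes, record: bytes) -> bytes:
    """Check the tag first; decrypt only if the record is unmodified."""
    if len(record) < NONCE_LEN + TAG_LEN:
        raise ValueError("record too short")
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = record[:NONCE_LEN], record[NONCE_LEN:-TAG_LEN], record[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("record was tampered with")
    return _xor(ciphertext, _keystream(enc_key, nonce, len(ciphertext)))


shared_key = secrets.token_bytes(32)  # stands in for the key agreed in step ii
request = b"GET / HTTP/1.1\r\nHost: qconferences.com\r\n\r\n"
record = seal(shared_key, request)
assert open_sealed(shared_key, record) == request

# Flip one bit of the ciphertext, as a man-in-the-middle might.
tampered = bytearray(record)
tampered[NONCE_LEN] ^= 0x01
try:
    open_sealed(shared_key, bytes(tampered))
    detected = False
except ValueError:
    detected = True
print("tampering detected:", detected)
```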

Why secure HTTP by introducing a new layer?

Now that we have covered the What and the How, it is time to discuss Why HTTPS is designed this way.

Reason 1. HTTP is a cornerstone of the internet. It is well known and widely used, so it would be extremely difficult to replace it with a new secure protocol, even a perfectly designed one.

Reason 2. Adding a security layer between HTTP and TCP requires no changes to either protocol. The security layer can be viewed as an extra wrapper around HTTP messages, so it is easy to add or remove, in keeping with the principles of the layered model of computer networking.

Reason 3. Because SSL/TLS is decoupled from HTTP, it can be applied flexibly to other scenarios, such as securing other application protocols.
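Reasons 2 and 3 show up directly in code: an HTTP routine that only writes bytes to a socket works unchanged whether that socket is plain TCP or one returned by `ssl.SSLContext.wrap_socket()`. In this hypothetical sketch, `send_http_request` is an illustrative name and `socket.socketpair()` stands in for the network so the example runs offline.

```python
import socket


def send_http_request(stream: socket.socket, host: str, path: str = "/") -> None:
    """HTTP layer: write request bytes to whatever stream it is given.

    It neither knows nor cares whether `stream` is a plain TCP socket or an
    ssl.SSLSocket; the security layer is an optional wrapper underneath.
    """
    stream.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii"))


# Offline demo: a connected socket pair stands in for the network.
client, server = socket.socketpair()
with client, server:
    send_http_request(client, "qconferences.com")
    received = server.recv(1024)
print(received.decode("ascii").splitlines()[0])  # GET / HTTP/1.1
```

Swapping `client` for a TLS-wrapped socket would secure the exchange without touching `send_http_request`, and the same wrapper could carry any other application protocol.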

Note that HTTPS must not introduce too much overhead, so asymmetric cryptography, which is much slower than symmetric cryptography, is not used to encrypt and decrypt the data itself. Symmetric cryptography is fast enough, but it first requires solving the problem of establishing a shared secret key over a public channel. There are several solutions to this problem; as introduced above, the most widely adopted one today relies on digital certificates and asymmetric cryptography.

References

1. HTTP: The Definitive Guide