Making your website super fast-loading

The time it takes to download a webpage matters a great deal for usability: if it takes too long, users get distracted, or they may even get irritated and leave your site.

Low graphics

The first thing to do is self-explanatory: use as few graphics as possible; don't use frames (as each frame is a separate page); use png where graphics are needed, as it compresses better than gif; and use style sheets (a style sheet is downloaded only once, and you avoid having all those <FONT> tags all over the place, which just increase the html file size).

Marking the pages as cacheable

If the pages are marked cacheable, the browser can keep a copy in its own cache, and intermediate caches can hold on to the object for a while. This has advantages for both the server and the client: the server gets fewer requests, and the site is more responsive (i.e. loads faster) for end users. You also use less bandwidth on your server, which can make it less expensive to run.

This results in an excellent performance gain. For a while I had a counter on my website (which had to be marked uncacheable, otherwise it wouldn't work), and by far the most requests were for the counter; the other pages were simply served from a cache.

Some people worry that with caching you won't be able to know how many users have visited your site. This means you haven't thought it through very well. First of all, think of how accurate "page impressions" and "unique visitors" are: not at all. If that doesn't convince you, why not at least mark the images as cacheable? You're not going to count those, are you?

For more information, here is an excellent introduction.

Compressing the pages

As of HTTP/1.1, you can compress html pages, which can result in a huge performance increase. Users on a low-bandwidth connection (think modems) benefit especially.

Note that if you compress the pages, caching won't work for most pages. This will change over time but for now it's really one or the other as current caches don't implement content-negotiation properly. So, I recommend compressed pages only for dynamic content (as in, generated for every request).

How does it work?

If the browser sends the http request-header:

Accept-Encoding: gzip

Then the content can be compressed. Older browsers like Internet Explorer 3.x don't send this header (as it is new in HTTP/1.1) and don't understand gzip'ed content.

So it is safest to compress the response only if this header is set AND it lists gzip. Compressing is done by gzipping the content and adding the header Content-Encoding: gzip to the http response headers.
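The check described above can be sketched in Python. This is only an illustration, not how any particular server does it: the names accepts_gzip and encode_response are made up for this example, and the parsing of the q value is a simplified reading of the Accept-Encoding syntax.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip with a non-zero q."""
    if not accept_encoding:
        # Header absent: assume the user-agent cannot handle gzip.
        return False
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def encode_response(body, accept_encoding):
    """Return (headers, payload) for a response body given as bytes."""
    headers = {}
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body
    headers["Content-Length"] = str(len(payload))
    return headers, payload
```

A request without the header, or one that sends gzip;q=0, gets the plain body back unchanged.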

Possible compression algorithms

The possibilities are gzip, compress and deflate. compress uses LZW, the others don't.

LZW compression and decompression are licensed under Unisys Corporation's 1984 U.S. Patent 4,558,302 and equivalent foreign patents. This kind of patent isn't legal in most countries of the world (including the UK), the USA being the exception. Patents in the UK can't describe algorithms or mathematical methods.

Because of this and the fact that gzip is simply better, use gzip.
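To get a feel for the two LZW-free encodings, here is a small Python comparison. It assumes that HTTP's deflate means the zlib format, which is what Python's zlib.compress produces; compress is left out because the Python standard library has no LZW implementation. The sample text is made up.

```python
import gzip
import zlib

# Made-up, repetitive sample; real pages will compress differently.
sample = b"<p>Making your website super fast-loading</p>\n" * 200

gzip_body = gzip.compress(sample)     # what Content-Encoding: gzip carries
deflate_body = zlib.compress(sample)  # zlib format, assumed for "deflate"

print(len(sample), len(gzip_body), len(deflate_body))
```

Both formats use the same underlying compression method and differ mainly in their header and checksum, so the sizes come out close to each other.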

Which browsers support this?

Internet Explorer 4.0 for Windows and higher support gzip, and so does Netscape 4 on all platforms. All versions of Mozilla (and thus Netscape 6) support it, so does Opera. The only common browser that does not support it is Internet Explorer for the Macintosh; the last version I tested was 5.5.

In conclusion, it will work for most users, but we have to accommodate user-agents that don't support it. For this purpose there is content negotiation.

Which files?

Notes on caches and proxies

In order to cache negotiated content, the proxy (or cache) needs to support the Vary: header. The most common proxy, squid, has support for it in version 2.4, but this version is fairly recent and is not installed at every ISP.
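Why a cache needs Vary can be shown with a small Python sketch. This is a simplified model and not how squid implements it; cache_key is a hypothetical helper. The idea is that the request headers named in Vary become part of the key under which the response is stored.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A missing header counts as an empty value, so "no Accept-Encoding"
    # and "Accept-Encoding: gzip" end up under different keys.
    selected = tuple((n, lowered.get(n, "")) for n in sorted(names))
    return (url, selected)


gzip_client = cache_key("/", {"Accept-Encoding": "gzip"}, "accept-encoding")
old_client = cache_key("/", {}, "accept-encoding")
print(gzip_client != old_client)
```

A cache that ignores Vary would use the URL alone as the key and could hand the gzipped copy to a browser that never asked for it.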

Squid has more information on its Vary header support.

How to implement

It is easy in php. There is support for it as of php-4.0.4 (note that this version has several security issues). You need to compile php with zlib support (pass --with-zlib to ./configure). When this is done, simply add the following at the top of the page:

ob_start ("ob_gzhandler");

That's it. Note that this must be executed before any output is generated. php compresses the page if the user-agent supports it.

Support for static pages is also available in Apache. You can either use mod_gzip (which is not part of Apache but available under the Apache license) or the built-in content negotiation.

This site uses content negotiation in Apache. As you may notice, there are no extensions on the html files; for example, this page should have the URL http://www.msxnet.org/fast-website. This has two advantages:

All .html files are stored like this:

-rw-r--r--  1 sean  sean  2510 Feb  3 18:45 index.html
-rw-r--r--  1 sean  sean  1311 Feb  3 18:59 index.html.gz
-rw-r--r--  1 sean  sean  6572 Feb  4 12:31 fast-website.html
-rw-r--r--  1 sean  sean  2956 Feb  3 18:59 fast-website.html.gz

And in the .htaccess I have:

Options +MultiViews
DirectoryIndex index
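The .gz copies have to be created somehow. One way to do that is a small Python script like the one below; this is a sketch and not the script used for this site, and precompress is a name made up for the example. It writes name.html.gz next to every name.html and skips files whose compressed copy is already up to date.

```python
import gzip
import shutil
from pathlib import Path


def precompress(root):
    """Write name.html.gz next to each name.html under root; return paths written."""
    written = []
    for src in sorted(Path(root).rglob("*.html")):
        dst = src.with_name(src.name + ".gz")
        # Skip files whose compressed copy is at least as new as the source.
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue
        with src.open("rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst)
    return written
```

Calling precompress on the document root after every change to the site keeps the two variants of each page in step.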

The same trick can be used for images, so you can serve both png and gif images and let browsers that do not support png get the gif. However, because of the Unisys patent problems, it's better not to use gif at all, IMHO.

To enable caching, I have added the following lines. Note that mod_expires needs to be compiled in (or as a module, obviously).

ExpiresActive on
ExpiresDefault A2592000
ExpiresByType text/html A86400
ExpiresByType text/plain A86400
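In these directives, A means "counted from the time of access" and the number is in seconds: 86400 is one day and 2592000 is thirty days. The arithmetic can be sketched in Python as follows; expiry_headers is a hypothetical helper that mimics the headers, it is not part of mod_expires.

```python
import calendar
from email.utils import formatdate


def expiry_headers(access_time, seconds):
    """Headers for an 'A<seconds>' rule: expire `seconds` after the access time."""
    return {
        "Cache-Control": "max-age=%d" % seconds,
        "Expires": formatdate(access_time + seconds, usegmt=True),
    }


# Access time taken from the Date header in the recorded session further down.
accessed = calendar.timegm((2002, 9, 6, 16, 2, 36, 0, 0, 0))
print(expiry_headers(accessed, 86400))
print(2592000 // 86400, "days for everything else")
```

For the html rule this should give the same Cache-Control and Expires values as the response in the recorded session.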

Note that if you compress documents on the fly, this does mean a performance hit. Compressing is generally expensive in CPU time; on a Pentium II / 266MHz, gzipping 40 KB of html takes about 20 ms of CPU time. This time adds to the response time and also increases the load on the server.
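To see what on-the-fly compression costs on your own hardware, a rough measurement like the following Python sketch will do. The 40 KB page is synthetic and very repetitive, and gzip_cpu_seconds is a made-up helper, so treat the number as an indication only.

```python
import gzip
import time


def gzip_cpu_seconds(payload, repeats=50):
    """Average CPU seconds needed to gzip `payload` once."""
    start = time.process_time()
    for _ in range(repeats):
        gzip.compress(payload)
    return (time.process_time() - start) / repeats


# Synthetic 40 KB page; real html is less repetitive and may take longer.
page = (b"<p>Some reasonably repetitive html content.</p>\n" * 1000)[:40 * 1024]
print("%.3f ms" % (gzip_cpu_seconds(page) * 1000))
```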

Recording of a session

sean@behemoth:~$ telnet www.msxnet.org 80
Trying 194.134.73.94...
Connected to atlantis.8hz.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.msxnet.org
Accept-Encoding: gzip
Connection: close

HTTP/1.1 200 OK
Date: Fri, 06 Sep 2002 16:02:36 GMT
Server: Apache/1.3.12 (Unix) PHP/3.0.16
Content-Location: index.html.gz
Vary: negotiate,accept-encoding
TCN: choice
Cache-Control: max-age=86400
Expires: Sat, 07 Sep 2002 16:02:36 GMT
Last-Modified: Sun, 01 Sep 2002 22:27:50 GMT
ETag: "f065-57b-3d729466;3d729467"
Accept-Ranges: bytes
Content-Length: 1403
Connection: close
Content-Type: text/html
Content-Encoding: gzip

<<< index.html.gz follows >>>


sean at mess dot org