Making your website super fast-loading
How long a web page takes to download matters a great deal for usability: if it takes too long, the user gets distracted, or may even get irritated and leave your site.
The first steps are self-explanatory: use as few graphics as possible; don't use frames (each frame is a separate page to download); use png where graphics are needed, as it usually compresses better than gif; and use style sheets (a style sheet is downloaded only once, and you avoid having <FONT> tags all over the place, which just increase the html file size).
If pages are marked cacheable, the browser can keep a copy in its own cache, and intermediate caches (such as an ISP's proxy) can hold on to the object for a while. This has advantages for both the server and the client: the server gets fewer requests, and the site is more responsive (i.e. loads faster) for end users. You also use less bandwidth on your server, which can make it cheaper to run.
This results in an excellent performance gain. For a while I had a counter on my website (which had to be marked uncacheable, otherwise it wouldn't work), and by far the most requests were for the counter; the other pages were simply served from caches.
Some people worry that with caching you won't be able to know how many users have visited your site. This means you haven't thought it through very well. First of all, think about how accurate "page impressions" and "unique visitors" really are: not at all. And if you can't accept that, why not at least mark the images as cacheable? You're not going to count those, are you?
For more information, here is an excellent introduction.
As of HTTP/1.1, you can compress html pages, which can result in a huge performance increase. Users on a low-bandwidth connection (think modems) benefit especially.
Note that if you compress the pages, caching won't work for most of them. This will change over time, but for now it's really one or the other, as current caches don't implement content negotiation properly. So I recommend compressed pages only for dynamic content (that is, content generated for every request).
If the browser sends the http request-header:
Accept-Encoding: gzip
then the content can be compressed. Older browsers like Internet Explorer 3.x don't send this header (it is new in HTTP/1.1) and don't understand gzip'ed content.
So the safe rule is to compress the response only if this header is present AND lists gzip. Compressing means gzipping the content and adding the header Content-Encoding: gzip to the http response headers.
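The rule above can be sketched in a few lines of Python. This is only an illustration, not how any particular server implements it: the function names accepts_gzip and encode_response are made up for this example, and the q-value handling is a simplified reading of the Accept-Encoding syntax.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip with a non-zero q.

    Hypothetical helper for illustration; a missing header means "no".
    """
    if not accept_encoding:
        return False  # header absent: assume the client cannot handle gzip
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "gzip":
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparsable weight: play safe, don't compress
        return q > 0
    return False


def encode_response(body, accept_encoding):
    """Return (headers, payload), gzipping only when the client asked for it."""
    headers = {"Content-Type": "text/html"}
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

Calling encode_response(page, None) for a client that sent no Accept-Encoding header returns the page untouched, which is the behaviour wanted for old browsers.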
The possible encodings are gzip, compress and deflate. compress uses LZW; the others don't.
LZW compression and decompression are licensed under Unisys Corporation's 1984 U.S. Patent 4,558,302 and equivalent foreign patents. This kind of patent isn't legal in most countries of the world (including the UK), with the USA as the exception. Patents in the UK can't cover algorithms or mathematical methods.
Because of this and the fact that gzip is simply better, use gzip.
Internet Explorer 4.0 for Windows and higher support gzip, and so does Netscape 4 on all platforms. All versions of Mozilla (and thus Netscape 6) support it, so does Opera. The only common browser that does not support it is Internet Explorer for the Macintosh; the last version I tested was 5.5.
In conclusion, it will work for most users, but we have to accommodate user-agents that don't support it. Content negotiation exists for this purpose.
In order to cache negotiated content, the proxy (or cache) needs to support the Vary: header. The most common proxy, squid, has support for it in version 2.4, but this version is fairly recent and is not installed at every ISP.
Squid has more information on its Vary header support.
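To see why a cache needs Vary support, here is a toy cache in Python that keeps a separate copy per value of the request headers named in Vary. It is a sketch under simplifying assumptions, not a description of how squid works: the class VaryCache and its methods are invented for this example, and header values are compared as exact strings, whereas a real cache would normalise them.

```python
class VaryCache:
    """Toy cache storing one variant per combination of Vary'd request headers."""

    def __init__(self):
        self._vary = {}      # url -> tuple of header names listed in Vary
        self._entries = {}   # (url, header values) -> stored response body

    @staticmethod
    def _key(url, names, request_headers):
        # Header names are case-insensitive, so compare them lowercased.
        lowered = {k.lower(): v.strip() for k, v in request_headers.items()}
        return (url, tuple(lowered.get(n, "") for n in names))

    def store(self, url, request_headers, response_headers, body):
        vary = response_headers.get("Vary", "")
        names = tuple(sorted(n.strip().lower() for n in vary.split(",") if n.strip()))
        self._vary[url] = names
        self._entries[self._key(url, names, request_headers)] = body

    def lookup(self, url, request_headers):
        names = self._vary.get(url)
        if names is None:
            return None  # nothing cached for this URL yet
        return self._entries.get(self._key(url, names, request_headers))


cache = VaryCache()
cache.store("/", {"Accept-Encoding": "gzip"}, {"Vary": "Accept-Encoding"}, b"GZ")
# A client that did not send Accept-Encoding must not get the gzipped copy.
print(cache.lookup("/", {}))
```

A cache that ignores Vary would key on the URL alone and could hand the gzipped copy to a browser that cannot decode it, which is why compression and caching don't mix on older proxies.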
Compressing output is easy in php. Support is available as of php-4.0.4 (note that this version has several security issues). You need to compile php with zlib support (pass --with-zlib to ./configure). Once that is done, simply add the following at the top of the page:
ob_start ("ob_gzhandler");
That's it. Note that this must be executed before any output is generated. php compresses the page if the user-agent supports it.
Support for static pages is also available in Apache. You can either use mod_gzip (which is not part of Apache but available under the Apache license) or the built-in content negotiation.
This site uses content negotiation in Apache. As you may notice, there are no extensions on the html files; for example, this page should have the URL http://www.msxnet.org/fast-website. This has two advantages:
All .html files are stored like this:
-rw-r--r-- 1 sean sean 2510 Feb 3 18:45 index.html
-rw-r--r-- 1 sean sean 1311 Feb 3 18:59 index.html.gz
-rw-r--r-- 1 sean sean 6572 Feb 4 12:31 fast-website.html
-rw-r--r-- 1 sean sean 2956 Feb 3 18:59 fast-website.html.gz
And in the .htaccess I have:
Options +MultiViews
DirectoryIndex index
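The effect of this setup can be pictured as follows: a request for an extension-less name is mapped to the .html.gz file when the browser lists gzip, and to the plain .html file otherwise. The Python sketch below is a deliberately simplified model, not Apache's actual negotiation algorithm; pick_variant is a hypothetical name, and q-values are ignored.

```python
def pick_variant(name, files, accept_encoding=""):
    """Pick name.html.gz when the client lists gzip, otherwise name.html.

    Simplified model of the MultiViews setup; not Apache's real algorithm.
    Returns (filename, extra response headers), or (None, {}) if no file fits.
    """
    tokens = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
    plain, packed = name + ".html", name + ".html.gz"
    if "gzip" in tokens and packed in files:
        return packed, {"Content-Encoding": "gzip", "Content-Location": packed}
    if plain in files:
        return plain, {"Content-Location": plain}
    return None, {}


files = {"index.html", "index.html.gz", "fast-website.html", "fast-website.html.gz"}
print(pick_variant("fast-website", files, "gzip"))
print(pick_variant("fast-website", files))
```

Because both files are prepared in advance, nothing has to be compressed at request time.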
The same trick can be used for images, so you can serve png images and fall back to gif for browsers that do not support png. However, because of the Unisys patent problems, it's better not to use gif at all, IMHO.
To enable caching, I have added the following lines. Note that mod_expires needs to be available, either compiled in or loaded as a module.
ExpiresActive on
ExpiresDefault A2592000
ExpiresByType text/html A86400
ExpiresByType text/plain A86400
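In these directives the leading A means "counted from the time of access" and the number is in seconds, so A86400 is one day and A2592000 is 30 days (2592000 / 86400 = 30). The Python sketch below works out the resulting headers for a given access time. It only illustrates the arithmetic; expiry_headers and the two constants are names made up for this example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Lifetimes in seconds, mirroring the directives above (names are hypothetical).
EXPIRES_BY_TYPE = {"text/html": 86400, "text/plain": 86400}
EXPIRES_DEFAULT = 2592000  # 30 days


def expiry_headers(content_type, access_time):
    """Return the caching headers for a response served at access_time (UTC)."""
    seconds = EXPIRES_BY_TYPE.get(content_type, EXPIRES_DEFAULT)
    expires = access_time + timedelta(seconds=seconds)
    return {
        "Cache-Control": "max-age=%d" % seconds,
        "Expires": format_datetime(expires, usegmt=True),
    }


served = datetime(2002, 9, 6, 16, 2, 36, tzinfo=timezone.utc)
print(expiry_headers("text/html", served))
print(expiry_headers("image/png", served))
```

So html and plain text are refetched after a day, while everything else (images, style sheets) may stay in a cache for a month.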
Note that compressing documents on the fly does carry a performance cost. Compression is generally expensive in CPU time: on a 266 MHz Pentium II, gzipping 40 KB of html takes about 20 ms of CPU time. This adds to the response time and also increases the load on the server.
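If you want a rough figure for your own machine, a measurement along these lines will do. This Python sketch is not the method used for the number above; gzip_cpu_time is a hypothetical helper, and the sample input is highly repetitive, so it compresses faster and smaller than a real page would.

```python
import gzip
import time


def gzip_cpu_time(payload, level=6):
    """Return (CPU seconds spent, compressed size) for gzipping payload once."""
    start = time.process_time()
    compressed = gzip.compress(payload, compresslevel=level)
    return time.process_time() - start, len(compressed)


# Roughly 40 KB of made-up html; substitute a real page for a fairer number.
html = b"<p>The quick brown fox jumps over the lazy dog.</p>\n" * 800
seconds, size = gzip_cpu_time(html)
print("%d bytes -> %d bytes in %.3f ms CPU" % (len(html), size, seconds * 1000))
```

A single run is noisy; repeat it a few times and average before drawing conclusions.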
sean@behemoth:~$ telnet www.msxnet.org 80
Trying 194.134.73.94...
Connected to atlantis.8hz.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: www.msxnet.org
Accept-Encoding: gzip
Connection: close

HTTP/1.1 200 OK
Date: Fri, 06 Sep 2002 16:02:36 GMT
Server: Apache/1.3.12 (Unix) PHP/3.0.16
Content-Location: index.html.gz
Vary: negotiate,accept-encoding
TCN: choice
Cache-Control: max-age=86400
Expires: Sat, 07 Sep 2002 16:02:36 GMT
Last-Modified: Sun, 01 Sep 2002 22:27:50 GMT
ETag: "f065-57b-3d729466;3d729467"
Accept-Ranges: bytes
Content-Length: 1403
Connection: close
Content-Type: text/html
Content-Encoding: gzip

<<< index.html.gz follows >>>
sean at mess dot org