All caches have a set of rules that they use to determine when
to serve an object from the cache, if it's available. Some of these
rules are set in the protocols (HTTP 1.0 and 1.1), and some are set
by the administrator of the cache (either the user of the browser
cache, or the proxy administrator).
Generally speaking, these are the most common rules that are
followed for a particular request (don't worry if you don't
understand the details; they are explained below):
- If the object's headers tell the cache not to keep the object,
it won't. Also, if no validator is present and there isn't any
freshness information, most caches will mark the object as uncacheable.
- If the object is authenticated or secure, it won't be
cached.
- A cached object is considered fresh (that is, able to
be sent to a client without checking with the origin server) if:
  - It has an expiry time or other age-controlling directive set,
  and is still within the fresh period.
  - A browser cache has already seen the object, and has been
  set to check once a session.
  - A proxy cache has seen the object recently, and it was
  modified relatively long ago.
Fresh documents are served directly from the cache, without
checking with the origin server.
- If an object is stale, the origin server will be asked to
validate the object, or tell the cache whether the copy that
it has is still good.
Together, freshness and validation are the most important ways
that a cache works with content. A fresh object will be available
instantly from the cache, while validating an object avoids sending
the entire object over again if it hasn't changed.
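The rules above can be sketched as a small decision function. This is only an illustration, not how any particular cache is implemented: the `StoredResponse` record and its field names are hypothetical, and the heuristic cases (a browser checking once a session, a proxy judging by modification time) are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredResponse:
    """Hypothetical record of what a cache knows about one response."""
    no_store: bool = False           # headers told the cache not to keep it
    authenticated: bool = False      # object is authenticated or secure
    has_validator: bool = False      # Last-Modified or ETag was present
    fresh_for: Optional[int] = None  # explicit freshness lifetime in seconds
    age: int = 0                     # seconds since the object was obtained


def cache_decision(resp: StoredResponse) -> str:
    """Return 'do-not-cache', 'serve-from-cache', or 'validate'."""
    # Rules 1 and 2: the headers forbid keeping it, or it is authenticated.
    if resp.no_store or resp.authenticated:
        return "do-not-cache"
    # No validator and no freshness information: nothing to work with.
    if not resp.has_validator and resp.fresh_for is None:
        return "do-not-cache"
    # Fresh: still inside the explicit freshness lifetime.
    if resp.fresh_for is not None and resp.age < resp.fresh_for:
        return "serve-from-cache"
    # Stale (or no explicit lifetime): ask the origin server.
    return "validate"
```

For example, an object with a one-hour lifetime that was fetched ten seconds ago is served from the cache, while the same object fetched two hours ago is validated first.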
There are several tools that Web designers and Webmasters can
use to fine-tune how caches will treat their sites. It may require
getting your hands a little dirty with the server configuration,
but the results are worth it. For details on how to use these tools
with your server, see the Implementation
sections below.
HTML authors can put tags in a document's <HEAD> section
that describe its attributes. These Meta tags are often
used in the belief that they can mark a document as uncacheable, or
expire it at a certain time.
Meta tags are easy to use, but aren't very effective. That's
because they're usually only honored by browser caches (which
actually read the HTML), not proxy caches (which almost never read
the HTML in the document). While it may be tempting to slap a
Pragma: no-cache meta tag on a home page, it won't necessarily
keep the page fresh if it passes through a shared cache.
On the other hand, true HTTP headers give you a lot of
control over how both browser caches and proxies handle your
objects. They can't be seen in the HTML, and are usually
automatically generated by the Web server. However, you can control
them to some degree, depending on the server you use. In the
following sections, you'll see what HTTP headers are interesting,
and how to apply them to your site.
- If your site is hosted at an ISP or hosting farm and they don't
give you the ability to set arbitrary HTTP headers (like Expires
and Cache-Control), complain loudly; these are tools necessary for
doing your job.
HTTP headers are sent by the server before the HTML, and only
seen by the browser and any intermediate caches. Typical HTTP 1.1
response headers might look like this:
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html
The HTML document would follow these headers, separated by a
blank line.
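To see how a cache might read such a response, here is a minimal Python sketch that splits the status line from the headers and looks a few of them up. It uses the standard library's `email.parser`, which happens to understand this header syntax; the function name `split_response_head` is made up for this example, and a real cache would use a proper HTTP parser.

```python
from email.parser import Parser
from email.utils import parsedate_to_datetime

# The example response head from above (no body).
RAW_RESPONSE_HEAD = """\
HTTP/1.1 200 OK
Date: Fri, 30 Oct 1998 13:19:41 GMT
Server: Apache/1.3.3 (Unix)
Cache-Control: max-age=3600, must-revalidate
Expires: Fri, 30 Oct 1998 14:19:41 GMT
Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
ETag: "3e86-410-3596fbbc"
Content-Length: 1040
Content-Type: text/html
"""


def split_response_head(raw: str):
    """Return (status_line, headers); header lookup is case-insensitive."""
    status_line, _, header_text = raw.partition("\n")
    headers = Parser().parsestr(header_text, headersonly=True)
    return status_line.strip(), headers


status, headers = split_response_head(RAW_RESPONSE_HEAD)

# Expires and Date are both HTTP dates, so they can be compared directly.
lifetime = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
print(status)                    # the status line
print(headers["Cache-Control"])  # the raw Cache-Control value
print(lifetime.total_seconds())  # seconds between Date and Expires
```

In this response the gap between Date and Expires is one hour, which agrees with max-age=3600.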
Many people believe that assigning a Pragma: no-cache HTTP
header to an object will make it uncacheable. This is not
necessarily true: the HTTP specification does not set any
guidelines for Pragma response headers. It only discusses Pragma
request headers (the headers that a browser sends to a server).
Although a few caches may honor this header, the majority won't,
and it won't have any effect. Use the headers below instead.
The Expires HTTP header is the basic means of controlling
caches. It tells all caches how long the object is fresh for; after
that time, caches will always check back with the origin server to
see if the document has changed. Expires headers are supported by
practically every client.
Most Web servers allow you to set Expires response headers in a
number of ways. Commonly, they will allow setting an absolute time
to expire, a time based on the last time that the client saw the
object (last access time), or a time based on the last
time the document changed on your server (last modification
time).
Expires headers are especially good for making static images
(like navigation bars and buttons) cacheable. Because they don't
change much, you can set an extremely long expiry time on them, making
your site appear much more responsive to your users. They're also
useful for controlling caching of a page that is regularly changed.
For instance, if you update a news page once a day at 6am, you can
set the object to expire at that time, so caches will know when to
get a fresh copy, without users having to hit 'reload'.
The only valid value in an Expires header is an
HTTP date; anything else will most likely be interpreted as 'in the
past', so that the object is uncacheable. Also, remember that the
time in an HTTP date is Greenwich Mean Time (GMT), not local
time.
For example:
Expires: Fri, 30 Oct 1998 14:19:41 GMT
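If you generate the header from a script, it is easy to get the date format wrong. The following sketch builds a valid Expires line with Python's standard library. The helper names `next_daily_expiry` and `expires_header` are invented for this example, and it assumes the daily update from the news page example happens at 6am GMT rather than local time.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def next_daily_expiry(now: datetime, hour: int = 6) -> datetime:
    """Return the next occurrence of hour:00 strictly after `now`.

    `now` is assumed to be a timezone-aware datetime in UTC (GMT).
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def expires_header(when: datetime) -> str:
    """Format a timezone-aware datetime as an Expires header line in GMT."""
    return "Expires: " + format_datetime(when.astimezone(timezone.utc), usegmt=True)


now = datetime(1998, 10, 30, 13, 19, 41, tzinfo=timezone.utc)
print(expires_header(next_daily_expiry(now)))
```

Passing `usegmt=True` makes the formatter write "GMT" instead of a numeric offset, which is the form an HTTP date requires.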
Although the Expires header is useful, it is still somewhat
limited; there are many situations where content is cacheable, but
the HTTP 1.0 protocol lacks methods of telling caches what it is,
or how to work with it.
HTTP 1.1 introduces a new class of headers, the
Cache-Control response headers, which allow Web publishers to
define how pages should be handled by caches. They include
directives to declare what should be cacheable, what may be stored
by caches, modifications of the expiration mechanism, and
revalidation and reload controls.
Useful Cache-Control response directives include:
- max-age=[seconds] - specifies the maximum amount of time
that an object will be considered fresh. Similar to Expires, but this directive
allows more flexibility. [seconds] is the number of seconds from the time of
the request you wish the object to be fresh for.
- s-maxage=[seconds] - similar to max-age, except that it
only applies to proxy (shared) caches.
- public - marks the response as cacheable, even
if it would normally be uncacheable. For instance, if your pages
are authenticated, the public directive makes them cacheable.
- no-cache - forces caches (both proxy
and browser) to submit the request to the origin server for
validation before releasing a cached copy, every time. This is
useful to assure that authentication is respected (in
combination with public), or to maintain rigid object freshness,
without sacrificing all of the benefits of caching.
- must-revalidate - tells caches that they must obey
any freshness information you give them about an object. HTTP allows
caches to take liberties with the freshness of objects; by specifying this
directive, you're telling the cache that you want it to strictly follow your
rules.
- proxy-revalidate - similar to must-revalidate,
except that it only applies to proxy caches.
For example:
Cache-Control: max-age=3600, must-revalidate
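As a rough illustration of how a cache might interpret such a header, the sketch below splits the value into directives and works out a freshness lifetime, letting s-maxage take precedence for shared caches. The function names are hypothetical, and the parsing is deliberately simple: it assumes no directive value contains a comma inside quotes.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def _seconds(directives: Dict[str, Optional[str]], name: str) -> Optional[int]:
    """Return the directive's argument as an int, or None if absent or invalid."""
    value = directives.get(name)
    return int(value) if value is not None and value.isdigit() else None


def freshness_lifetime(directives: Dict[str, Optional[str]],
                       shared_cache: bool) -> Optional[int]:
    """Return the freshness lifetime in seconds, or None if none was given."""
    if "no-cache" in directives:
        return 0  # always validate before releasing a cached copy
    if shared_cache:
        s_maxage = _seconds(directives, "s-maxage")
        if s_maxage is not None:
            return s_maxage  # only proxy (shared) caches honor s-maxage
    return _seconds(directives, "max-age")


directives = parse_cache_control("max-age=3600, must-revalidate")
print(freshness_lifetime(directives, shared_cache=False))
```

A result of None means the header gave no lifetime, so the cache would fall back to Expires or to its own rules.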
If you plan to use the Cache-Control headers, you should have a
look at the excellent documentation in the HTTP 1.1 draft; see
References and Further Information.
In How Web Caches Work, we said that
validation is used by servers and caches to communicate when an
object has changed. By using it, caches avoid having to download
the entire object when they already have a copy locally, but
they're not sure if it's still fresh.
Validators are very important; if one isn't
present, and there isn't any freshness information (Expires or Cache-Control)
available, most caches will not store an object at all.
The most common validator is the time that the document last
changed, the Last-Modified time. When a cache has an
object stored that includes a Last-Modified header, it can use it
to ask the server if the object has changed since the last time it
was seen, with an If-Modified-Since request.
HTTP 1.1 introduced a new kind of validator called the ETag.
ETags are unique identifiers that are generated by the server and
changed every time the object does. Because the server controls how
the ETag is generated, caches can be surer that if the ETag matches
when they make an If-None-Match request, the object really is the
same.
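Seen from the server's side, validation with either kind of validator might look like the sketch below. It returns 304 (Not Modified) when the cache's copy is still good and 200 when the full object has to be sent again. The function name and the plain dictionary of request headers are assumptions made for this example; real servers also handle details it ignores, such as weak ETags and header names in different case.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping


def validation_status(request_headers: Mapping[str, str],
                      current_etag: str,
                      last_modified: str) -> int:
    """Return 304 if the cache's copy is still good, otherwise 200."""
    # ETag validation: the cache sends back the ETag(s) it holds.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if current_etag in candidates or "*" in candidates else 200

    # Last-Modified validation: has the object changed since that time?
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            unchanged = parsedate_to_datetime(last_modified) <= since
        except (TypeError, ValueError):
            return 200  # unusable date: send the full object
        return 304 if unchanged else 200

    # No validator in the request: this is an ordinary fetch.
    return 200


etag = '"3e86-410-3596fbbc"'
modified = "Mon, 29 Jun 1998 02:28:12 GMT"
print(validation_status({"If-None-Match": etag}, etag, modified))
```

When the answer is 304, only headers travel over the network, which is where validation saves bandwidth.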
Almost all caches use Last-Modified times in determining if an
object is fresh; as more HTTP/1.1 caches come online, ETag headers
will also be used.
Most modern Web servers will generate both ETag and
Last-Modified validators for static content automatically; you
won't have to do anything. However, they don't know enough about
dynamic content (like CGI, ASP or database sites) to generate them;
see Writing Cache-Aware Scripts.