Friday, 31 July 2015

mod_deflate in apache is breaking caching

I was planning a video of caching in Yii framework and had a quick look at one of my sites to see what happens by default.

I was surprised to see that although all the files had ETags and although they had no cache-control by default (which is what I expected), when the page was refreshed, all the items were re-requested and SOME of them got a 304 response, while others got 200.

The short answer is that mod_deflate breaks it (and there is a workaround below) but first, a bit more background if you don't understand.

Caching is good. Rather than send you the same files every time you visit my site, I make you keep copies of them in your browser. When you request the page, if resources are already present locally and haven't "expired", then you get the local copies and save BOTH the request/response to the web server and also the network bandwidth to send them from server to browser. This strength is also it's weakness. How long should I let my resources live for before they expire? Too short and I lose the advantage of caching. Too long and I might change something but the browser keeps using the old file, which hasn't been expired.

Because the answer to cache duration depends on the actual site, most web servers do very little or no caching by default. The 304, however is not a bad solution for uncached objects. When my browser visits the page, if it already has copies of the objects, it still goes back to the server and basically says either "give me this object if it has been modified since X" or otherwise "Give me this object if the etag I have doesn't match the etag of the latest version". The etag is more flexible since it allows you to change an object and retain its etag if it is functionally the same as the previous version and you could also replace a newer one with an older one and still get the same functionality. The ETag is just a quoted string and could be a hash or date or anything the server wants to do.

If the server looks at these requests and decides that the object hasn't changed, instead of sending the whole thing back, it just sends a 304 "Not modified" and the browser can use its cached version. This all works well but why is it not working consistently on Apache?


mod_deflate uses (usually) gzip to reduce the bandwidth required for the object being sent to the browser. What mod_deflate also does is to modify the etag by adding "-gzip" to the end of it. Why? I can't quite understand but I think it is to match the W3C spec for something but effectively it is trying to differentiate between the gzipped and the non-gzipped responses. I don't think this is correct (whatever the spec says) because the underlying object is the same so if the browser has a version that worked previously, all it is asking is whether the underlying object has changed - the transport is irrelevant.

So what is actually breaking is that the original object is, say, ETag 123456 and mod-deflate makes this 123456-gzip. The browser revisits the page and says, "Can I have this object if it doesn't match 123456-gzip" and the server looks and thinks, "The object being requested is 123456 so it DOESN'T match, so I'll send the correct one". The whole thing then repeats each time.

Some people suggest disabling ETags as a workaround but another suggested workaround is a bit more specific and involves rewriting the ETag in the "If-Not-Match" request header and removing the gzip bit. When the server then checks, the match will succeed and everything is happy!

The second line, I am unsure about. The original code, commented out in my example, seems to add "gzip" to every ETag that doesn't have it already. Commenting out the code and none of the ETags have gzip in them, even the ones that were gzipped. I think I prefer the second line commented out. I put this code into .htaccess in the web root and have to set AllowOverride to FileInfo in the Apache config. You also need to ensure that "sudo a2enmod headers" says that it is enabled or already enabled!

<IfModule mod_headers.c>
    RequestHeader  edit "If-None-Match" "^\"(.*)-gzip\"$" "\"$1\""
#   Header  edit "ETag" "^\"(.*)(?<!gzip)\"$" "\"$1-gzip\""

Post a Comment