Thursday, 20 November 2014

Azure CDN and Storage Not Suitable for Production

I've been spending some time trying to optimize loading of the pixelpin web site. There is all the usual nonsense with the "magic" MS scripts which appear automatically (although I found a handler to help with this), but then I looked at using a CDN to make things better.

What is a CDN?

A Content Delivery Network has two main benefits over serving content directly from the web server. Firstly, it reduces the number of requests that hit the web server, which improves performance there. It also allows the browser to make more parallel connections, since the CDN lives on a different domain from the web server. Secondly, the CDN caches objects at multiple locations around the world, and a request to a single URL uses geo-location information to serve the object from the physically closest CDN server.

As an example, imagine that static content makes up 50% of your page weight and your web server is based in London. What happens when someone in Sydney, Australia, accesses your site? All of the content has to traverse the globe and, at best, it will be slow. Now imagine you use a CDN that has a location in Australia. Only the dynamic content needs to come from London; the other 50% can arrive "locally" - all good.

Why I was considering it

Over a third of our page weight is in the form of 4 font files and these are obviously not going to be changed any time soon. It makes sense to move these to a CDN and I tried a single image as well for comparison. Since we have stuff hosted on Azure, it seemed to make sense to use Azure CDN (although it doesn't really matter and I might try Amazon for comparison).

How did I do it?

Azure allows you to create a CDN endpoint linked to either your storage account or to a folder under your site. I did the site one first but I can't remember now why this wouldn't work. I then created an endpoint that pointed to my storage account.

One thing that is not obvious is that there are two settings that are not exposed on creation and are not obviously present in the CDN page - but you can click on the endpoint to go into a settings page. The first is HTTPS, which is available but disabled by default. I needed this because my page is served over HTTPS. The second allows query strings, which lets you cache bust items that you need to reload. Again, this is disabled by default.
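As a rough illustration of cache busting with query strings, the sketch below appends a short content hash to an asset URL, so the URL changes whenever the file does. The function name, the "v" parameter and the example.invalid host are all made up for the example; nothing here is specific to Azure.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted_url(url: str, content: bytes, param: str = "v") -> str:
    """Return url with a short hash of content added as a query parameter."""
    # 12 hex characters is an arbitrary length chosen for readability.
    digest = hashlib.sha256(content).hexdigest()[:12]
    parts = urlsplit(url)
    # Keep other query parameters but drop any older version value.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    query.append((param, digest))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical asset URL and file contents.
    print(cache_busted_url("https://example.invalid/asset/font.woff", b"font bytes"))
```

This only helps if the CDN endpoint is set to treat different query strings as different cache entries, which is what the second setting above controls.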

Once this is done, you have to upload items to the public container you are using. Bear in mind that any public container in the storage account will be exposed. I created a single "asset" container, made it public and uploaded all my web fonts and the single png image. Note that you can't upload (for some reason) in the portal, so I used Azure Storage Explorer to upload the items. The version I used didn't correctly set the MIME type, so I had to edit the blobs afterwards and set them correctly.
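A small lookup like the one below is one way to decide the content type for each file before (or after) uploading. It is only a sketch: the mapping is an assumption based on the registered media types, and older setups often use other values for fonts (for example "application/font-woff"), so check what your target browsers accept.

```python
from pathlib import PurePosixPath

# Assumed extension-to-type mapping; adjust to whatever your browsers expect.
CONTENT_TYPES = {
    ".woff": "font/woff",
    ".woff2": "font/woff2",
    ".ttf": "font/ttf",
    ".eot": "application/vnd.ms-fontobject",
    ".svg": "image/svg+xml",
    ".png": "image/png",
}


def content_type_for(blob_name: str, default: str = "application/octet-stream") -> str:
    """Pick a content type from the blob name's extension (case-insensitive)."""
    suffix = PurePosixPath(blob_name).suffix.lower()
    return CONTENT_TYPES.get(suffix, default)


if __name__ == "__main__":
    # Hypothetical blob names.
    for name in ("asset/myfont.woff", "asset/logo.png", "asset/unknown.bin"):
        print(name, "->", content_type_for(name))
```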

I waited a while and tried the CDN endpoint with the image and all seemed to be good.

What went wrong?

I changed my site to link to the fonts on the CDN and deployed it to Azure. It didn't work. The first thing I noticed was that Firefox had downloaded two types of fonts (woff and ttf) but had not applied either. I didn't twig at first, but a quick question to the office and I learned that Firefox, IE and others will not, by default, load fonts from a different domain. The font response has to name the site's origin to basically say, "I allow this font to be used on this domain". This uses something called CORS, via a response header "Access-Control-Allow-Origin" that specifies the allowed origin. That sounded OK and it looks like Azure supports this, so I ran some code in the project (it is not possible via the portal, which is pretty rubbish) and supposedly this would add the header to all responses from blob storage.
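To make the rule concrete, here is a simplified sketch of the check a browser applies to a cross-origin font response. It ignores credentials and preflight requests, and the function name and example origins are invented for illustration.

```python
from typing import Mapping, Optional


def origin_allowed(page_origin: str, response_headers: Mapping[str, str]) -> bool:
    """Approximate the browser's Access-Control-Allow-Origin check."""
    allow: Optional[str] = None
    for name, value in response_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "access-control-allow-origin":
            allow = value.strip()
    if allow is None:
        # No header at all: the font is downloaded but not applied.
        return False
    # Either a wildcard or an exact match on scheme + host (+ port).
    return allow == "*" or allow == page_origin


if __name__ == "__main__":
    page = "https://www.example.invalid"  # hypothetical site origin
    print(origin_allowed(page, {"Access-Control-Allow-Origin": page}))
    print(origin_allowed(page, {"Content-Type": "font/woff"}))
```

This matches the symptom described: the font files arrive, but without a matching header the browser refuses to use them.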

It didn't work. The code ran but the response headers did not change - even after deleting and re-uploading the files. I read that the CDN strips out these headers but as it turns out, even accessing the item directly in blob storage still doesn't return the header.
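One thing that may be worth ruling out when checking for the header by hand: CORS response headers relate to the request's Origin header, and a server may only send them when the request carries one. A browser address bar or a plain download tool usually does not send it. The sketch below builds a request with an explicit Origin and reads back the header; the URL and origin are placeholders, and whether this changes the result for any particular storage setup is an assumption to test, not a known fix.

```python
import urllib.request
from typing import Optional


def build_cors_probe(url: str, origin: str) -> urllib.request.Request:
    """Build a GET request that carries an Origin header, as a browser would."""
    return urllib.request.Request(url, headers={"Origin": origin})


def probe_allow_origin(url: str, origin: str) -> Optional[str]:
    """Return the Access-Control-Allow-Origin value, or None if it is absent."""
    with urllib.request.urlopen(build_cors_probe(url, origin), timeout=10) as resp:
        return resp.headers.get("Access-Control-Allow-Origin")


if __name__ == "__main__":
    # Placeholder values; substitute a real blob or CDN URL and your site's origin.
    req = build_cors_probe(
        "https://example.invalid/asset/font.woff", "https://www.example.invalid"
    )
    print(req.get_method(), req.full_url, req.get_header("Origin"))
```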

OK, so that was fonts out the window but I noticed something else that wasn't working correctly. The caching.

The images seemed to cache correctly and refreshing from the CDN would generate a 304 (not modified), which was great. Trying this with fonts, however, never worked: the response was always 200 with the full body returned. Since the fonts were the largest of the items I needed to cache, if it didn't work for them, it was another show-stopper.

What I noticed was the strangeness of the ETag. This tag is a key that identifies a particular version of a resource, which the client can send back (in the If-None-Match request header) when asking for the resource again. If the server sees that the ETag the client holds matches the current one, it can send back a 304. The problem is that the ETag MUST be quoted according to the HTTP spec. The reason escapes me, and some browsers don't care, but others do. Why is this a problem? It is virtually impossible to tell whether it is quoted or not:

  1. In the case of Azure blob storage, the ETag is added automatically.
  2. You cannot modify it.
  3. In Azure Explorer, the ETag is displayed in the list view in quotes, but get the blob properties and it shows as unquoted (and can't be changed).
  4. Edit the blob properties in the Azure portal and it shows it as quoted (but can't be changed).
  5. Look at it in Firebug for Firefox and it shows quoted - but so do all the other response header values - not helpful!
  6. Look in Fiddler for the request and it is NOT quoted and the caching tab complains about it.

So despite it looking correct in the config (mostly), it is not returned with the quotes. Is that anything to do with why the caching isn't working? Not sure, but it doesn't help when trying to debug stuff.
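For reference, the sketch below shows what "quoted" means in practice and how a server can decide on a 304. It checks the raw header value against the entity-tag shape (an optional W/ prefix followed by a double-quoted string) and does a weak comparison against If-None-Match. The function names and ETag values are invented, and splitting on commas is a simplification that would mishandle a tag containing a comma.

```python
import re

# entity-tag shape: optional weak prefix, then a double-quoted opaque string.
_ETAG_RE = re.compile(r'(W/)?"[^"]*"')


def is_valid_etag(raw: str) -> bool:
    """True if the raw header value is a properly quoted entity-tag."""
    return _ETAG_RE.fullmatch(raw.strip()) is not None


def _opaque(tag: str) -> str:
    """Strip whitespace and the weak prefix, leaving the quoted part."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(if_none_match: str, current_etag: str) -> bool:
    """Weak comparison: True means the server may answer 304."""
    if if_none_match.strip() == "*":
        return True
    candidates = [_opaque(t) for t in if_none_match.split(",")]
    return _opaque(current_etag) in candidates


if __name__ == "__main__":
    # Made-up ETag values for illustration.
    print(is_valid_etag('"0x8D1D2F3A"'))   # quoted form
    print(is_valid_etag('0x8D1D2F3A'))     # bare form, as Fiddler reported
    print(not_modified('"0x8D1D2F3A"', '"0x8D1D2F3A"'))
```

Running the raw value from a tool such as Fiddler through a check like is_valid_etag removes the guesswork about which display is adding the quotes.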

Conclusion - tldr

CDN - possibly stripping out CORS headers - breaks fonts
CORS headers don't seem to work anyway, even directly from blob storage
Caching not working consistently (fine for images, never for fonts)
Not a production-ready setup and I won't use it for that reason.