Friday, 19 December 2014

Planning the deployment for your new scalable site

Many of us never get the chance to rewrite a site from scratch, and a lot of the time that is the right decision. Most changes are relatively small, and rewrites, especially of large sites, carry significant risk. If you do get the chance, as I have (since we are majorly redesigning the UX), it is worth spending some time thinking about the structure and eventual deployment locations of your site and its resources.

Scalable?

If your site is never likely to scale beyond a few hundred users, you might not worry too much about performance or architecture. However, some performance comes for free, and some decisions make the site easier to maintain. It is worth factoring these correctly into reusable code so that, when you create a new site, you can easily find them and copy them across. For instance, the default version of Response.Redirect in .Net does not terminate the current request or wipe the response buffer. It sends the whole normal page and includes the redirect header as well. This wastes bandwidth and in most cases is undesirable, so I have a version of Redirect in a utility class that wipes the response and sends only the 301 redirect header.
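The post does not show the .Net utility itself. As a rough analogue only, here is a minimal Python WSGI sketch, assuming a WSGI server. The helper name `redirect_only` is hypothetical. It sends a redirect status and `Location` header with an empty body, rather than rendering the page as well.

```python
def redirect_only(location: str, permanent: bool = True):
    """Return a WSGI app that sends only a redirect status and Location header."""
    status = "301 Moved Permanently" if permanent else "302 Found"

    def app(environ, start_response):
        # No page is rendered: the body is empty and Content-Length says so.
        start_response(status, [("Location", location), ("Content-Length", "0")])
        return []

    return app


# Usage with any WSGI server (e.g. wsgiref.simple_server.make_server):
# app = redirect_only("https://example.com/new-home")
```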

My site is potentially a multi-million-user site (although, as we are an authentication provider, users won't stay logged into our site), so I want to make some good decisions out of the box. I want to minimise load on the main web servers where possible. I also want to minimise database work, but I won't be changing those layers in this rewrite; that is already sorted.

The Trade Offs

One thing you quickly learn is that there are trade-offs. For instance, using Less CSS makes updating the styles and colours on your site much easier and more flexible. However, it either adds complexity to deploying the changes or, if done dynamically, adds potential disk, memory and/or CPU load to the web servers. If you have a web server farm, you might also have issues with caching mechanisms not being shared across instances.

Another example is offloading static resources to Content Delivery Networks (CDNs). This is generally good, since CDNs tend to have multiple locations, often closer to the client, but it adds some deployment complexity and some cache issues. There is also the danger of having multiple content domains, which require more connections from the client, and any one of which can stop your page from loading.

Where to Start

I recommend starting from an empty web project. It is tempting to start with a template, and that might be all right, but only if you know exactly what the template includes. It is not worth including anything you don't use, unless you expect to need it in the future and it is easier to include it now.

So in my case, I used Visual Studio to create an "Empty Web Project". This gives you no content and a single web.config. The only references are to system DLLs, which are present on all servers that have the .Net framework installed.

Source Control

Time for a cup of tea! Just kidding. One thing worth setting up right away, even with this basic project, is source control. I have just had to rebuild a laptop because some rubbish Azure tools mucked up my PowerShell setup (or I mucked it up while trying to fix it). It seems Windows can't repair this, and the web installer thinks everything is already installed. The same thing can happen on a site: you start making a change, thinking it's all great, and then run into trouble. Can you unpick your changes to go back? You can with source control: Git, Subversion, Perforce, TFS, whatever.

Do it now and get into two habits. Firstly, make regular check-ins/tags so you can easily work out what was changed and why. Secondly, work on isolated, short-duration tasks, doing only one at a time and then checking it in. I have made the mistake of working on six tasks at the same time, and if one fails testing, it can be a nightmare to undo just that one. All deployments should be from a tag (ideally via another machine) so that bugs can be reproduced locally with source that you KNOW matches what is deployed. Try to avoid file-copy deployments unless the files came via source control.

HTML and Server-side Scripting

I don't think there is much that can be said specifically about the basic HTML, since it is partly site-specific and partly language-specific. I already mentioned my Response.Redirect utility for .Net, but that is unlikely to be useful for any other languages, which will potentially have their own quirks.

Making HTML readable is definitely useful for maintenance, so indenting and the rest is important, but that is not really a setup issue. Also, if you go too crazy, you can introduce a large amount of whitespace. Your web server's GZip compression should reduce it significantly, but it is still an extreme to be avoided.
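To get a feel for how much GZip claws back, here is a small Python sketch using the standard `gzip` module. It compares the whitespace overhead of an indented page before and after compression. The markup produced by the hypothetical `sample_page` helper is made up, and real pages will compress differently.

```python
import gzip


def sample_page(indent: int, rows: int = 500) -> bytes:
    # Hypothetical list markup with `indent` spaces before every <li>.
    pad = " " * indent
    items = "".join(f'{pad}<li class="item">Item {i}</li>\n' for i in range(rows))
    return f"<ul>\n{items}</ul>\n".encode("utf-8")


indented = sample_page(16)
flat = sample_page(0)

raw_overhead = len(indented) - len(flat)  # 16 spaces x 500 rows = 8000 bytes
gzip_overhead = len(gzip.compress(indented)) - len(gzip.compress(flat))

print(f"whitespace overhead: {raw_overhead} bytes raw, {gzip_overhead} bytes gzipped")
```

Repetitive indentation compresses very well, which is why it is an extreme to avoid rather than a disaster.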

CSS (and Less)

Whether you are likely to need Less for CSS will depend partly on whether you are using a framework (like Bootstrap) that is based on Less. It also depends on whether the UX of your site is really important and might need tweaking. If it's a largely functional site, you might not care too much and will just use all the default styles and colours on your user interface and live with them. In that case, you could manually compile the Less into CSS (or download it ready-compiled) and then just treat it as static CSS.

Static CSS

Static CSS, like other static resources, involves a trade-off. On the one hand, you don't want too many separate resources, so you could bundle a whole load of CSS into one file, which requires only one download. The problem is that if you change any one of the files in that bundle, the whole bundle becomes invalid and needs downloading again. Using a CDN (see later) does not remove this question of whether or not to bundle.

Ideally, you should make a list of all the CSS you will be using (at least the stuff you have easy control over; some .Net stuff gets injected for you and is quite hard to modify!). Next to each entry, write the size of the file and whether it will be changed during the immediate life of your site. We aren't talking about upgrading jQuery to a new version but about day-to-day changes. You can generally bundle all the stuff that doesn't change into one file. Then, depending on the size of the files that do change, make one or more separate bundles. A large file that changes should probably be its own download, so that only its own changes require it to be downloaded again.
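The rule of thumb above can be sketched as a small planning function. This is an illustration under stated assumptions: the `CssFile` fields, the file names and the 50 KB `large_kb` threshold are all hypothetical, and real decisions will depend on your own measurements.

```python
from dataclasses import dataclass


@dataclass
class CssFile:
    name: str
    size_kb: float
    changes_often: bool


def plan_bundles(files, large_kb=50):
    """Group CSS files into bundles: stable files together, large changing files alone."""
    bundles = []
    stable = [f.name for f in files if not f.changes_often]
    if stable:
        bundles.append(stable)  # rarely changes, so one long-lived download
    small_changing = [f.name for f in files if f.changes_often and f.size_kb < large_kb]
    if small_changing:
        bundles.append(small_changing)  # cheap to re-download together
    for f in files:
        if f.changes_often and f.size_kb >= large_kb:
            bundles.append([f.name])  # big and volatile: its own download
    return bundles


files = [
    CssFile("bootstrap.css", 120, False),
    CssFile("site.css", 20, True),
    CssFile("theme.css", 80, True),
    CssFile("print.css", 5, False),
]
print(plan_bundles(files))
```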

Minification is useful but makes little difference with CSS, since it is largely whitespace that is removed. Removing unused rules is hard, because some rules might only be used on a few pages of the whole site and working that out is difficult. With frameworks like Bootstrap, again, there is so much secret magic that it is hard to know what can be removed and what can't. Their selective download feature can help, though: if you don't use modals, for instance, you don't have to download those sections.
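For illustration, a naive comment-and-whitespace stripper in Python shows what CSS minification mostly amounts to. The function name `minify_css` is hypothetical. It is deliberately simplistic: it does not understand strings, `url(...)` values or hacks that depend on comments, so use a proper minifier or bundling plugin in practice.

```python
import re


def minify_css(css: str) -> str:
    """Naively minify CSS by removing comments and redundant whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop /* comments */
    css = re.sub(r"\s+", " ", css)                   # collapse runs of whitespace
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)      # no spaces around { } ; ,
    css = re.sub(r":\s+", ":", css)                  # "color: red" -> "color:red"
    css = css.replace(";}", "}")                     # last semicolon is optional
    return css.strip()


sample = "/* header */\nbody {\n    color: red;\n    margin: 0;\n}\n\nh1, h2 {\n    font-weight: bold;\n}\n"
print(minify_css(sample))
```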

Less CSS

As already mentioned, Less is a very useful way to create CSS based on rules and common parameters (as well as other features). Suppose, for instance, you want to change a Bootstrap button colour. Rather than finding every single reference to it in the CSS (including the related colours that aren't an exact match!), you can simply change one parameter, regenerate and it all works.

Once your Less becomes CSS, it is treated like any other CSS except in one scenario: when you are generating it on the fly. This allows changes to be made in real time but can add load to the web server. The server might or might not be able to make effective use of caching to avoid compiling the Less on every request. This on-the-fly approach usually requires a plugin to handle things for you, but personally, I prefer to pre-compile the CSS and then use the normal CSS bundling mechanism to deploy it.
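If you do compile on the fly, you can reduce the load by compiling once per distinct source and reusing the result. A minimal Python sketch follows. `compile_fn` is a hypothetical stand-in for whatever Less compiler your plugin wraps, not a real library API.

```python
import hashlib


class CompiledCssCache:
    """Cache compiled CSS, keyed by a hash of the Less source."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # hypothetical: takes Less text, returns CSS text
        self._cache = {}

    def get_css(self, less_source: str) -> str:
        key = hashlib.sha256(less_source.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Only compile on a cache miss, i.e. when the source has changed.
            self._cache[key] = self._compile(less_source)
        return self._cache[key]
```

The cache lives in one process, so in a web farm each server still compiles its own copy. This is the sharing issue mentioned earlier. The cache also never evicts entries.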

Scripts

Scripts are similar to CSS files in some ways: they are static, and some are liable to change. Generally speaking, though, they are not created on the fly. They also benefit much more from minification, since not only can whitespace be removed, but local variable names can be shortened from their human-readable values to single letters. A good bundling plugin will handle bundling and minification for both scripts and CSS files.
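Bundling scripts is mostly concatenation, with one gotcha. A file that ends without a semicolon can merge with the start of the next file once they are joined. The minimal Python sketch below assumes the bundle is built from already-loaded source strings. It puts a newline and a lone `;` between files, which is a common bundler trick. The helper name `bundle_scripts` is hypothetical.

```python
def bundle_scripts(sources):
    """Concatenate JavaScript sources into one bundle.

    The newline ends any trailing // comment, and the lone ";" terminates
    a final statement that was written without a semicolon.
    """
    return "".join(src + "\n;\n" for src in sources)


bundle = bundle_scripts(["var a = 1", "(function () { console.log(a); })()"])
print(bundle)
```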

Caching and Cache Busting

Consider caching early on for all static resources (most of the basic pages are not cacheable, since they are dynamic). In general, set static resources to have cache expirations of 1 year where possible, and use cache busting to force re-downloads. An important tip here is to TEST the caching. There are a few gotchas, and it doesn't always do what you think. For instance, as well as sending the cache headers, your server should support the 304 Not Modified response when a client asks "has this item been modified since...?". Otherwise, every such check incurs the entire download again, and the client might as well have simply re-requested the item.
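The following Python sketch shows both parts together: a one-year `Cache-Control` header, plus a conditional-request check that answers 304 with no body. It uses an `ETag`/`If-None-Match` pair derived from the content, which is one of the two standard HTTP validators. The `If-Modified-Since`/`Last-Modified` pair mentioned above works the same way with dates. The function name `serve_static` and the dict-based headers are assumptions for illustration, not any particular framework's API.

```python
import hashlib

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds


def serve_static(content: bytes, request_headers: dict):
    """Return (status, headers, body) for a static resource.

    The ETag is derived from the content, so it changes whenever the file does.
    """
    etag = '"' + hashlib.md5(content).hexdigest() + '"'
    headers = {
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}",
        "ETag": etag,
    }
    # If-None-Match may hold a comma-separated list of ETags, or "*".
    client_tags = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in client_tags or "*" in client_tags:
        # 304 Not Modified: the client's cached copy is still valid, so send no body.
        return 304, headers, b""
    return 200, headers, content
```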

One of the issues in HTTP/1.x is that caching saves download time on subsequent page visits, but by definition the client would not normally re-request a cached item. So how can you update it when it has changed? The only way is to trick the browser into thinking it needs to get a different item. You could obviously change the name of the script/CSS/image resource, but that is clunky. More often, you add a querystring to the end of the URL with a value that changes when the content changes. Again, most bundling plugins will do this for you. Otherwise, a simple way is to append the MD5 hash of the file as its querystring, so that any change to the file is reflected in the URL. This allows long cache times (no greater than 1 year; apparently things get weird in some browsers with longer durations).
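Here is a minimal Python sketch of the MD5 querystring approach. The helper name `busted_url`, the `v` parameter name and the 8-character truncation are arbitrary choices, not a convention from any particular plugin.

```python
import hashlib


def busted_url(path: str, content: bytes) -> str:
    """Append a short MD5 fingerprint of the file's content to its URL."""
    digest = hashlib.md5(content).hexdigest()[:8]
    separator = "&" if "?" in path else "?"  # respect an existing querystring
    return f"{path}{separator}v={digest}"


print(busted_url("/css/site.css", b"a"))
```

Because the URL only changes when the bytes change, unchanged files keep their cached copies across deployments.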

Using a CDN

I've already mentioned CDNs. They are effectively alternative locations to serve your content from, with a few advantages and a few potential disadvantages.

The main intention of a CDN is to replicate content across multiple, potentially worldwide, locations. When a user requests a resource, the DNS works out the closest location and the resource is served from there. Naturally, this works best with content that does not change often. Otherwise, you will have different copies of a resource until all the locations replicate the change.

If you have a single private CDN location, this works well because you are in control of what is happening with it and should know whether you have a service level agreement covering its uptime. If you are using a public CDN for something like jQuery or Bootstrap, it saves you some money. However, there is always a danger that one of those CDNs could go down, and you might have no control over getting it back up. In reality, are these public CDNs likely to go down any more than your own CDN? I don't know.

If you use CDNs, try not to use too many different domains. Imagine you bring in 10 third-party frameworks from 10 different CDN domains. Your browser will need to make 10 connections, possibly using SSL, to these 10 domains, which could well add noticeable and unacceptable slowness to your site. One or two separate domains are probably OK. Beyond that, consider copying all the resources to your own CDN so that they all come from the same place. The costs are not usually massive, and if you are scaling at this level, you hopefully have some income to pay for it!
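A quick way to audit this is to list the distinct hosts a page pulls resources from. The Python sketch below uses the standard `urllib.parse.urlsplit`. The helper name `distinct_hosts` and the example URLs are made up. Relative URLs are skipped because they come from the page's own host.

```python
from urllib.parse import urlsplit


def distinct_hosts(resource_urls):
    """Return the set of hostnames that resources are loaded from.

    Each distinct host needs its own DNS lookup and connection(s),
    plus a TLS handshake when served over HTTPS.
    """
    hosts = set()
    for url in resource_urls:
        host = urlsplit(url).hostname  # None for relative URLs like "/css/site.css"
        if host:
            hosts.add(host)
    return hosts


urls = [
    "https://cdn1.example.com/jquery.min.js",
    "https://cdn2.example.net/bootstrap.min.css",
    "/css/site.css",
    "//static.example.org/logo.png",
    "https://cdn1.example.com/app.js",
]
print(sorted(distinct_hosts(urls)))
```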

Disadvantages include the cost of the CDN on top of the web servers you are already paying for, and the risks associated with third-party CDN domains. Other disadvantages only really occur when you pull in lots of resources from different CDN domains. These are the SSL burden already mentioned, the lack of control over those domains, and the fact that any one of them going down could break your site.

Clearly, there is also a process implication with using a CDN, since objects have to be uploaded to the CDN and replicated before they are used.

Conclusion

I will update this post (if I can find it!) once I have my basic shell ready. I already know which frameworks I need, so I should be able to plan what to serve from the web server, what from our own CDN and what from third-party CDNs. I should also have a bundling/minification plugin ready to go and my Less folders set up to generate my CSS, either on build or possibly manually.