Friday, 22 January 2016

Designing for a distributed scalable web application

Scalable Applications

At PixelPin, we produce an authentication mechanism that uses pictures instead of passwords. It is fairly lightweight since each login will only require a couple of page hits, sometimes only one although because of the potential for massive uptake across the globe, it has been designed from the outset to be scalable.

However, scalable is fine in a single data centre, as our income increases, we can afford more web servers in the cloud, as well as higher-spec web servers and a higher database service tier that could provide all of the performance we ever need for the UK and Ireland, it might even work for Western Europe which is not very far electronically from the Microsoft data centre in Ireland. What it won't do is work globally, although it might work acceptably for many people, it will never be great with the latency incurred when criss-crossing the globe.

Designing for scale therefore involves a number of decisions, compromises, costs and complexities that at some point you will need to decide on - hopefully before your site becomes really popular. No-one wants to be rushed into making a major redesign to cope with a sudden increase in the number of users.

The first stage involves reasonably low cost and simple measures and will allow a basic scalability of your web application to a reasonable level. Of course, each method described will depend on the exact balance of your application's CPU, network and database loading.

Stage 1

Stage 1 involves the basic understanding of how an application works in a scalable data centre. What this means, simply is that your application needs to work across more than one web server which means a couple of things if you want to stay sane and have a working application.

Firstly, you want the servers to be copies of each other, not each with their own configuration. This is because you want to scale upwards automatically by cloning web servers, not by manually configuring each one.

Secondly, you cannot store session data in-memory on the web server because even if you want to send your users to the same server each time (sticky sessions), these cannot usually be guaranteed and if a web server falls over, which they do, you want another to start up automatically and not drop a load of sessions in the process. This requires either a shared-session mechanism or otherwise designing the application to be stateless, which is possible but usually slightly more complicated to implement. Shared session is easy on Azure because they provide a mechanism that works in .Net invisibly and shares session across N instances (3 in our case), which provides resilience and sharing. You can share using file systems or the database but these are less optimal for performance reasons since the database is often a bottleneck and there are timing/consistency issues if you write anything asynchronously. You might be surprised to find out that there is no real open-source option for shared session, even things like memcache are not designed for session and has several failure modes that make it less than ideal for the job!

You should always make other basic optimisations, or at least consider them. Things like correct caching of assets (including cache busting), Content Delivery Networks, bundling and minification and just good discipline when creating pages to reduce the overall overhead for the network.

At the database level, you should make the database as dumb as possible and move any processing into a software layer that can be scaled up, the database being the hardest thing to scale. It is possible, with correct design, to defer scaling the database as long as possible or even not at all if your application is not database heavy.

Good use of memory caching can also speed things up and reduce CPU, disk access and network as well. Various people have said that hardware is cheap compared to people so upgrading a database server to use 2TB of RAM might avoid having to employ a database team to change the architecture! You should test and monitor cache usage however since you can easily use up memory this way and not necessary get many cache hits for the effort. On the other hand, you might be better moving the caching into a few areas that make a big difference or that are used most often.

Stage 2

So at stage 1, you can scale up in a single data centre and get reasonable performance across a reasonably large geographic area but if you need a global presence you have a couple of options. This is stage 2 time and will involve more cost and complexity since you will probably have to add some techniques that you haven't needed until now.

The first question is whether you can have a geo-presence by simply duplicating the systems across the continents and simply using DNS to send traffic to the "closest" data centre. Clearly, this involves no more work than before but has a severe limitation that you cannot share data directly between the systems, particularly user data for people who travel the globe. You can implement workarounds like having a single login location and then forwarding the user to their "home data centre" but this means that travellers experience poor performance sometimes just to make your life easier! Maybe that is acceptable if it is an edge case but if not, you will need some ability to connect the separate systems together!

There are a few basic guidelines about making geo-located systems work correctly and with the least amount of effort.

  1. All synchronous reading and writing should occur locally to the data centre because latency is not only poor globally but more importantly, it is volatile and might seem OK one day and not the next. Anything that needs to be sent to another data centre needs to be done asynchronously.
  2. Database replication is a hard problem. Design around it wherever possible and prefer to have a single master (authority) and multiple read-only replicas.
  3. Remove or reduce the need to write to the master database by separation of concerns, either into multiple master databases or preferably to local databases that do not require replication.
  4. If you are designing for this scale, you will need to consider how your system will fail and include facilities such as automatic failover. You can use buffers to help you when a destination system has gone down but these won't work forever so you need to consider an efficient disaster recovery process. Would you even know if your application went down? Have you ever tested it?
So at PixelPin, our architecture is currently a single database with failover in a single data centre. What we are going to do as we design for the future involves a couple of techniques, although some of the work we built in from the beginning makes this much easier.
  1. We will plan to have a single master database located in Ireland and several read-only replicas, probably using Azure geo-replication so we don't need to manage it. We can only have 4 slaves so this is slightly limiting but will buy us time for now.
  2. Any temporary data for sessions will no longer live in the database but will instead live in a local database, probably using Azure documentdb.
  3. Any logging/audit type data will no longer live in the database but will be passed via a local message queue to a system that will run on an in-house server that will pull this data back to base into an audit database and will never need to go into the main database.
A local queue is essential in this case because the message sending needs to avoid the latency of the global internet and therefore the call to the queue needs to be local. This means that the receiver of these messages will need to pull them from multiple queues but this way, the receiver is the system that experiences the delays and latencies and not the user.

Our existing CDN will be enabled for more global locations to help feed our static assets and the only question left is how we handle image downloads which could either always come from the origin server (since they are cached anyway) or they could be effectively replicated to other sites on-demand for situations where a user has travelled to another part of the world.

One step at a time!
Post a Comment