Friday, 20 July 2018

Prevent hotlinking of CDN items to harvesting sites with Azure


CDNs are great

They provide a relatively cheap way of hosting static content and usually make caching configurable both globally and per item, ensuring that maximum use is made of client caching and 304 responses. They also serve content from many sites (or caches) around the world, each often called a Point of Presence (PoP), which makes requests much faster globally than attempting to serve everything from a single origin server.

Reducing Application Burden

They also remove work from the application server, which mostly doesn't care about serving static stuff when it would rather use its CPU and disk for important application logic! This leaves maximum resource and minimum complexity for your application, which will be running on higher-cost and possibly load-balanced infrastructure.

This is both an advantage and a disadvantage.

Removing Application State from resources

The problem is that moving resources to a static, generally public, endpoint takes away the concept of state and, largely, any protection you have against people downloading your content. This is mostly OK from a pure privacy point of view, since the content is accessible to anyone who uses your application anyway and you can't ultimately hide it, but there are two issues that would be easiest to control via application logic and which are much harder when you use a CDN.

Large Content Costs

One tricky problem is large content. Imagine you have some help videos hosted as part of your application. The design assumes a single user will perhaps watch each one once and that's it - not too expensive, not too much bandwidth, and easily contained inside the application logic, which could restrict the number of downloads per user. What happens when this is moved to a CDN? Anyone can download the video as many times as they like, and that could add up to a lot of dollars, especially for a small company where $500 a month is a lot. Why would someone do that? Either because you have put a valuable resource on a public server and people simply like it and download it without you getting any value in return (as you might if you had ads on your application), or because a bad actor simply wants to make you pay loads of money by downloading your big content.

CDNs are not really designed to have application-style logic to prevent this kind of abuse. Any complexity or rate limiting would make the CDN slower and more costly to run and, in most cases, you would not be able to tell what is real traffic and what isn't. You cannot easily get around this risk without a specialist video hosting platform that can handle the abuse for you, but you CAN do something on the Premium Azure CDN endpoint using token authentication.

I have not set this up personally but the idea is simple. Content that should only be given to specific users is obtained as a URL with a token from the web application. The token is then used by the CDN to prove that the client has permission to obtain the item, on the basis that it holds something that can only have been signed by the web application. So what? What if someone copies the URL with the token? The token can embed certain properties that are checked against the request to ensure it is legitimate, such as the IP address of the valid user at that point in time (or country, URL, host etc. - see here).
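
For illustration only, here is a minimal sketch of what the web application side might look like. The class, the ITokenEncryptor wrapper and the claim names are all made up - the real token format and encryption tool come from the CDN vendor - but it shows the shape of the idea: short-lived, client-specific tokens appended to the CDN URL.

using System;

// Sketch only: hand the client a CDN URL containing a short-lived, encrypted token.
// ITokenEncryptor stands in for the vendor's token tool, which must use the same key
// that is configured on the CDN endpoint.
public interface ITokenEncryptor { string Encrypt(string claims); }

public class CdnUrlBuilder
{
    private readonly string _cdnBase;            // e.g. "https://mycdn.azureedge.net"
    private readonly ITokenEncryptor _encryptor;

    public CdnUrlBuilder(string cdnBase, ITokenEncryptor encryptor)
    {
        _cdnBase = cdnBase;
        _encryptor = encryptor;
    }

    public string BuildUrl(string path, string clientIp, TimeSpan lifetime)
    {
        // Bake an expiry and the requesting client's IP into the token so a copied
        // link stops working for other people and after the deadline.
        var expires = DateTimeOffset.UtcNow.Add(lifetime).ToUnixTimeSeconds();
        var claims = $"expire={expires}&clientip={clientIp}";
        var token = _encryptor.Encrypt(claims);
        return $"{_cdnBase}{path}?token={token}";
    }
}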

This solves this issue very easily but there is some setup required and you need to use the rules engine to apply the requirement to whichever paths/resources you want to.

Hotlinking

This can be a really annoying issue for anyone who generates valuable content such as images, forum posts etc. Since an image is often just a single URL, it is hard to stop someone who realises that your site has loads of cool images from simply embedding them in their own pages. People end up on their site because of your images and they get all the revenue as a result.

This is frighteningly common and very hard to prevent because the web is kind of designed to be public and if someone can see your image on your site, a simple link is all that it takes to steal. Even if you know someone is doing it, taking action in any meaningful way is hard, even if the perpetrator is in your own country.

So what can we do? We can use the Azure premium CDN once again to set up rules. You get to these by clicking the Manage button in the Azure portal, which takes you to a very simple-looking interface at cdn.windowsazure.com, inside which you need to select either the Application Delivery Network menu if you are fronting an application or the HTTP Large menu if you are using a normal CDN. Each of these has a Rules Engine option inside it.

Setting up a rule is described here but is fairly self-explanatory, if a little clunky looking!

Give your rule a useful name and then choose what to base the rule on from the list, which defaults (confusingly) to Always, meaning there is no logic - the rule simply applies to all requests. If you select e.g. Device instead, the options for that come up for you to edit and, fortunately, most are documented with little info icons. In this selection, you could choose Referrer if your content should only appear on your site and not anyone else's. Note that this wouldn't stop someone from creating a local site on their PC with the same domain as yours and downloading everything, but they would not be able to hotlink from a site with a different URL, and physically having to find, download and host the content somewhere else is harder, more expensive and exposes the attacker to a much more blatant crime than "I thought I could just link to the content because it was public".

If you are not sure what referrer origins to add, go to the Advanced HTTP Reports under the Analytics menu, choose HTTP Large Platform and click By Referrer on the left-hand side; this will list all referrers. You should include the ones you know are yours, which hopefully is fairly easy!

As always, do a step at a time and test, test, test.

Wednesday, 4 July 2018

Another dead SSD, lost files and lost hours...

So recently, my home desktop Windows 10 PC started refusing to boot. It got to the BIOS splash screen, twiddled for a while and then errored. Apparently, it couldn't "automatically repair" the system, so it then offered "advanced options".

Long story short, Windows' handling of hardware errors really sucks! They have made efforts to provide an automatic Recovery Environment, which is great, but navigating and understanding it is still hard, even for a programmer like me, and the information on the web for IT problems is even worse than it is for programming questions - millions of "helpful" articles from adware sites, old sites, official sites, unofficial sites and people touting their magic tool.

For me, a failure to boot might mean one of two things: hardware failure or corrupted system files. What I expected:

  1. Check Hardware option
  2. Check System Files option
What do you get instead?
  1. Reset system. Literally reinstall Windows and lose all of your apps in the process! A very inconvenient option and it is the first one you are offered. All the others are under advanced options.
  2. Use a device. Not sure what I am supposed to do with this. I can boot into another device using the boot menu, is this the same thing?
  3. Shutdown. Thanks
  4. Go back to a restore point. OK, this sounds promising but, again, if the problem is a simple file corruption, I should not be doing something so big. Anyway, despite the fact that restore points are supposed to be automatic, I had none listed from the 6 months that Windows had been installed and updated.
  5. A command prompt. Useful for advanced people but what do I do with that? Drive letters are all over the place, a virtual disk contains the contents of the recovery environment?
I finally found some help online and ran chkdsk, with no problems reported. This was a shame because, as I later discovered, it was the disk that was at fault. I also ran the system file checker, sfc /scannow, and again it reported no problems. Why aren't these just buttons in the recovery environment?

I then found these instructions about running the Windows instrumentation console and checking the disks; again, both reported OK. I suspect SSDs would only report a problem if the controller was knackered rather than the more likely cause of bad NAND memory.

I tried the Windows 10 memory test tool several times but it always hung at test 4, 27%, even after 10 hours! Again, what use is a tool that hangs?

I found another tool, whose name I can't remember, that I ran and it reported various combinations of "no errors", "access is denied" and "no windows installations found". Again, not particularly useful for most of the population; it should be wrapped in a utility that can be run graphically and give a result that says something like, "problems were found, possible causes include hardware failure or Windows corruption".

Of course, unless you are someone who completely understands the BIOS, the boot loader and the OS, you are relying on unreliable advice online, when Microsoft absolutely should have the go-to guide for running the various tools in a certain order to understand what is happening. I was also surprised how many people were recommending GUI tools that only run in Windows to fix startup issues!

I decided I only had one option: copy everything using xcopy from the main disk to the second disk, then reinstall Windows on the main disk and copy stuff back. What I didn't realise until later is that if xcopy encounters an error, it stops copying (unless you pass /C), so I didn't copy my documents at all!

I then ran the Windows 10 installer on the main disk and got some random error code - again, why can't the installer simply say: this shouldn't happen, you have a hardware problem? I tried again with the USB installer in another port and this time it just hung at a certain point - timeout anyone?

This is when I assumed it must be a bad SSD (not a bad assumption considering my experience with Kingston drives!) and had to work out how to move the data off the second drive so I could install Windows onto that instead. The second drive had mostly temp stuff and downloads so it was fine to lose that, and this is when I realised I did not have anything useful copied from the old disk anyway, so I wiped it and reinstalled Windows 10. Of course, I then had to wait about 2 hours for all the updates to run and get it back to the latest version of Windows 10. I also need to work out how to recover anything that I have now lost. I have deployed versions of two web apps, which are mostly up-to-date on other servers, so thankfully I can get those back. I have lost my VirtualBox VM which had the test databases on, so hopefully I can get those from the live versions too.

Top Tips so you will not write a blog post like this!

  1. As soon as you get a new PC or install Windows, make sure it is all up to date and take a System Restore point.
  2. Every time you install something new of any size, create a new restore point.
  3. Consider spreading things across disks within your PC. The Windows disk is both likely to fail first and also to fail most obviously. The more you have on other disks, the less you lose if you lose the Windows disk. If you have a problem with the other disk, at least you can load Windows and deal with the problems on the other disks - in most cases, you can copy most things off at the first sign of a problem on the other disk.
  4. Backup to a separate physical device - ideally using a decent tool like Acronis TrueImage but even Windows 10 backup is better than nothing. Make sure you include your documents. Most other stuff can be restored onto a clean Windows installation even if it is a pain to install Pinnacle Studio from scratch again! Network attached disks can be fairly cheap and easy to use and fast to backup to (I use a Western Digital MyCloud EX2, but there are plenty of others).
  5. Using Cloud Storage for backups or main files is slightly risky if they are not encrypted because you expose your personal data into a public space. You also risk deleting things on another device that will then delete on your main device.
  6. If you are writing code, make use of off-site repositories like github or bitbucket and get in the habit of frequent commits so you lose the least amount of data. If you can code your database schemas, these can also help you quickly restore databases if you lose your whole system.
  7. If you can, test the restoration of data from a backup at least once. A backup isn't a backup if you've never tested it. I lost an entire backup once for the simple reason I had used a strong password and had forgotten it and never tested it! If I had, I would have realised and simply created a new backup.
  8. Keep a simple list of any non-free software that you have installed, especially nowadays that you might not have physical media and keys. How can you get the installer and keys again? What happens if you bought an upgrade key for an older version? Can you install the new version without having to install the old one?
  9. Don't use Kingston SSDs!

Tuesday, 3 July 2018

Why is software hard?

How long have you got?

Software sounds easy but it's hard. Basic surgery sounds easy but our experience and the amount of training that Doctors have to receive has convinced us that it cannot be that easy. For software, we are starting to leave the earliest days - in the same way that medicine used to be practiced by butchers, quacks and loosely experienced people (chemists, physicists and mentalists!).

Software is hard because it is not a solution, it is a framework to produce a solution!

The first two reasons that software is hard are very simple. We do not understand the problem well enough and we do not know the optimal solution to the problem.

Understanding the problem

For example, we might be "building an application to help us track orders", which sounds simple - about as simple as saying, "I want to build a house in the Victorian style"! Great, but not enough detail to even think about starting. When you say "order", what is an order? How many are processed per day, worst case? How many people use this system? How important is this to your business in terms of acceptable downtime etc.? The people who should be able to answer these (generally, the Customer) might not be savvy enough to articulate or think about things in a broad enough way to give really distilled, valuable information to write good requirements. They might disagree with you just because they are the Customer. They might say something is OK but later realise it is not OK. So many problems and so many challenges, even at this stage.

Requirements Capture is crucial but often rushed or not done at all, leading to problems later. Also, many Customers are loath to pay for this part of the work even though it is no different in intent from having an architect draw plans for your house. Many Customers will expect an up-front price, which is just as hard in software as it is in house building. I can give you a ballpark price but, until you pay me to draw out what you want, I can't give you anything more specific than +200%/-50%.

Evaluating the solution

The second problem is, of course, related to the first. What is the optimum solution? Perhaps, more generally, what is an optimal solution? It doesn't have to be the best if it is good enough. Clearly, if you don't understand the problem, you cannot determine a good solution, but even if you do understand the problem, how can you decide what a good solution is? It's like somebody buying a car: a Ferrari and a Mini are both cars and can both get from one place to another, but their strengths and weaknesses are very different, and we cannot always clearly understand whether the customer is after cost, flexibility, scale etc. They often don't know themselves, or even say, "I want it to be fast, scalable, flexible and very cheap"! We would laugh a house-builder out of the room if they asked for that, and yet we are not good at educating customers about how things work.

Choosing the framework

Software frameworks are numerous, dissimilar and all have their pros and cons. Some cost money but are arguably better; others are open source and liable to the whim of the core team when they get updated. Some support certain patterns well; others support different architectures, platforms and extensibility. Some are designed to deliver a narrow range of functionality very easily; others are designed to do anything, as long as you code it.

A framework will often be chosen due to the experience of the development team and this makes sense in the same way a hospital will use procedures that make sense for the experiences of their staff even if there is something that is slightly better out there. There are almost no cases where something is so much more amazing than any of the other top 20 frameworks that you should choose it over what you already know.

But even here you are torn between choosing something that is largely aimed at a specific market like Wordpress or Moodle when you are worried that the customer will then expect much more functionality that simply cannot be added on later. A bit like buying a small concrete house with 2 bedrooms and then deciding you want it majorly remodelled and the garden extended later - not easy and very expensive. On the other hand, if you choose something more generic like Rails, Drupal or .Net, you are in danger of writing a lot more scaffolding up-front to enable changes later to be made more easily. It should be something that is communicated to the Customer but again, we aren't always on the same page here and the cost and time difference, at this stage, is rarely obvious enough to make the decision easy.

Security

Security is not easy. It is another multi-faceted concept and combines a number of controls across different layers of the development stack from hardware up to the application but also includes management, process and risk assessment, something that Customers might expect to be "Fort Knox level" but without paying for it or allowing the time it takes to execute and test the controls.

Also, security, like other parts of software development, is a devil for the "unknown unknowns". If I know what a pen-test is but can't do it, I can decide whether to pay someone else to do it or decide it isn't worth it. What about if I have never heard of padding oracle attacks or timing problems in SSL? Then I won't be asking the right questions to add the relevant controls. Of course, I can (and possibly should) pay a Specialist to do these jobs but it can be hard to know who knows what they are doing. There are plenty of Builders who build things that fall down, so how do I know this security specialist is genuine?

Metrics and Logging

Building a system that you can monitor is essential. What are you supposed to do if the customer complains something is not working or something has been hacked if you don't have good logging in place to see these things? Many people get hacked and don't even know but logging well is hard. Logging can generate huge amounts of data so how do we set a baseline to tell good from bad? One event in a million might be important but how can we see that? How do we know if it is a glitch or an attack? If it was an attack, could we do anything about it anyway?

Like security, monitoring is for different reasons and will be driven by different teams. Operations, security, development and marketing will all want to know certain things about the system and the work involved can be significant. Not just logging but visualising the data that comes out and being alerted to unusual activity.

Performance and Measurement

Performance is hard. How quickly should something run when the system is not under load? How much should it slow down when load increases? How many users can it support? Are there elements of the system that are unnecessarily slow or is it all a bit slow? How do we do things well during development so we don't need to try to rework things later? How do we avoid assumptions about performance? How do we measure performance regression? Do we?

Performance can become a problem quickly if you are fortunate enough to become popular, but the last thing you need is for a million people to visit your site and for your web server to fall over under the load. You should always design a scalable system unless there is a reason not to (like the number of users being a known, small number).

When your architecture is more than just an app and a database, measuring performance is one thing but what do you do with that information? Databases might be slower than Redis but can we utilise redis to reduce database use? If not, what else could we do? Can we just throw more hardware at it? If we used all the hardware our cloud provider had, would it scale to the whole world or would the bottleneck move? At what point will this happen? How hard will it be to deal with it now?

Abstraction

Abstraction is hard. There are plenty of OO experts who will tell you that you can abstract everything, but do you even need to do that? It is definitely easier sometimes just to have an if/else instead of abstracting to another object and method, but how do I strike the balance between abstraction making the code extendable and it just making everything harder to read and understand?

There is no single answer to this question; even developers won't agree among themselves!

Translation

Lots of apps only come in one language but moving to multiple ones is hard. How does your framework do translation and more importantly, how easy is it to change translations? What if you have a marketing person who wants to tweak everything all the time? Do the translations need re-doing? What happens if you have a very high volume of text on your site? Can you automate the translation more effectively?

The Solution

OK, so Software is hard. There are other elements to software that are also hard but what is the solution if this is just a reality?

There are different measures you can take to reduce the burden of these steps and at least leave you with a minimal piece of hard work instead of everything being hard!

Please, please, please have a process, even if it starts as 10 bullet points on a checklist. Each time you find you are experiencing the same problem, consider how your checklist can be expanded: new questions added, new checklists for developer work to ensure they "thought about translations" or "considered input validation". You will find loads online, but do something that is relevant for you and that will be enforced at your organisation.

Set a culture of working towards zero bugs instead of an acceptance that "stuff just has bugs in it". Any time a bug is seen, ask a very simple pair of questions: "why did that bug get injected?" and "what could we do next time to avoid it?" Most of the time there is a simple answer. Things are too repetitive or fragile or complex, or "I just forgot". Most of these can and should be addressed. Not all bugs are worth the same amount. Just because something is "just one bug" doesn't mean it isn't worth spending two weeks fixing or rewriting something, because that work will fix not only the bug you found but also the ones you didn't and the others that would have come later because of your design.

Automate things! Back in the day, there was very little automation and therefore development was very limited and very slow! Now, we have all kinds of automation tools and loads of them are free so there is no excuse not to automate CI builds, deployments and other kinds of checks. You should ideally have a "click and go" performance test environment as well.

Have somebody who is really good at abstract thinking to own the interface with your customers. They should ask the same questions as developers: "why is this confusion there?" "why is there a difference between the expected cost and the actual cost?" "What is a good way to document the requirements so everyone understands?"

Use services and solutions that already exist. If you are considering translations, for example, spend a day on the web looking for what is for sale, and what other people use, you will mostly find the answers out there. If you need to buy something, insist on a free trial or money-back guarantee if you find that the product isn't a good fit for some reason.

In a nutshell, if you treat your software team as a car that needs constant care, updates and tweaks, you should be able to work most of this out and try to "make sure it doesn't happen again". Then maybe each project won't be so hard, you can come in on time and on budget, and you can make your customers repeat customers. Maybe!

CryptographicException: Specified padding mode is not valid for this algorithm

Do you ever get this when decrypting AES data?

Me too! Let us understand what this means and why it happens.

tldr: Bad padding on the encrypted data or an incorrect key

The Advanced Encryption Standard came out of a competition that was won by Vincent Rijmen and Joan Daemen with their Rijndael cipher. AES is basically a set of 3 of the possible combinations of parameters available in Rijndael but, although you will see AES in lots of places, some libraries will instead refer to Rijndael, and you have to be a bit careful because it is easy to confuse block sizes and key sizes.

Block Size

Because of the way the maths of a block cipher works securely, all data to be encrypted has to be a multiple of the block size. For AES, the block size is 128 bits, i.e. 16 bytes, regardless of the key size (Rijndael itself also allows other block sizes, which is part of the potential confusion).

What happens if your data isn't a multiple of 16 bytes? The encryption cipher would error unless you enable automatic padding (you could do it manually, but you don't need to). There are different ways to do it, but the basic idea is to pad in a way that the decryption process can work out what is padding and what isn't, since the padding is just numbers like the plain 'text'!

In .Net, the default is PKCS7 padding, which involves adding bytes whose value equals the number of bytes of padding being added, e.g.

01
02 02
03 03 03
etc.

The important thing is that padding is carried out before encryption takes place.
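
To make that concrete, here is a small sketch (illustration only - .Net does this for you when the Padding property is PaddingMode.PKCS7) of how PKCS7 padding is applied before encryption, assuming the 16-byte AES block size:

using System;

static class Pkcs7
{
    // Pad up to the next multiple of the block size with bytes whose value equals
    // the number of bytes added. If the data is already block-aligned, a whole
    // extra block of padding is added so the decryptor is never ambiguous.
    public static byte[] Pad(byte[] data, int blockSize = 16)
    {
        int padLength = blockSize - (data.Length % blockSize);
        var padded = new byte[data.Length + padLength];
        Buffer.BlockCopy(data, 0, padded, 0, data.Length);
        for (int i = data.Length; i < padded.Length; i++)
            padded[i] = (byte)padLength;
        return padded;
    }
}

// e.g. 13 bytes of data gains 3 bytes of padding, each with the value 0x03.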

If the padding is somehow corrupted/removed etc. then you will see the CryptographicException, as expected, when you attempt decryption. You might also see it if you are attempting to decrypt data that was encrypted using a different block size - since AES uses the same 128-bit block size for every key size, this would only happen with data encrypted with a different cipher (or a non-standard Rijndael block size) and I assume is uncommon.

Decryption

So what happens when you decrypt? Assuming you use the correct key, the decrypted data will end with one 0x01 byte, two 0x02 bytes, three 0x03 bytes and so on, which tells the decryptor how much padding was added so it can be stripped before the plain text is returned.

What happens if you use an incorrect key? It depends!

If you have no padding, or the (wrongly) decrypted data happens to end in valid padding by chance, you will simply get garbage plain text. In any situation where the padding is invalid (for example, the final block does not end in a valid PKCS7 pattern), you will get the exception stating that the padding mode is not valid!

You should always have a known-working utility to hand so you can test any problem cipher text and quickly determine whether the data you are decrypting has been corrupted/changed, whether the problem is with the code you are using to decrypt it, or - the more likely scenario - whether you are using the wrong key.
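
A minimal sketch of such a utility in .Net (the class, method and parameter names are mine, and it assumes AES-CBC with PKCS7 padding and a text payload). A CryptographicException here usually means corrupted/truncated ciphertext or the wrong key/IV:

using System;
using System.IO;
using System.Security.Cryptography;

static class AesCheck
{
    public static string TryDecrypt(byte[] cipherText, byte[] key, byte[] iv)
    {
        try
        {
            using (var aes = Aes.Create())
            {
                aes.Key = key;
                aes.IV = iv;
                aes.Mode = CipherMode.CBC;
                aes.Padding = PaddingMode.PKCS7;
                using (var decryptor = aes.CreateDecryptor())
                using (var ms = new MemoryStream(cipherText))
                using (var cs = new CryptoStream(ms, decryptor, CryptoStreamMode.Read))
                using (var reader = new StreamReader(cs))
                {
                    // Note: a wrong key can still slip through if the padding happens
                    // to be valid by chance - in that case you just get garbage text back.
                    return reader.ReadToEnd();
                }
            }
        }
        catch (CryptographicException ex)
        {
            return $"Decryption failed ({ex.Message}) - suspect corrupted data or the wrong key.";
        }
    }
}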

Friday, 29 June 2018

Filebeat from App Services to Logstash, Elastic and Kibana - Update

Filebeat is working on Azure

So I am trying to fiddle around with Kibana and understand what I can visualise and what I can't now that I have Filebeat running on my 8 production instances of App Services, initially just sending IIS logs back to base.

I am happy that the ELK part works, although I do find Kibana hard to understand. It has a lot of pages and it is not obvious how to set up something as simple as two graphs, one showing all requests and one showing non-200 ones, without adding a filter button which sometimes stays and sometimes disappears when revisiting the page.

Anyway there is a bigger annoyance and it relates to the high number of non-200 requests the sites are receiving. The two main guilty parties are AlwaysOn and/or the Traffic Manager on Azure.

Always On

Always On is an app service option that is designed to stop IIS going to sleep, which stops some poor users from waiting 20 seconds for a site to start up again! The way it works, however, is woefully basic and is causing a problem.

All it does is ping / on port 80 every 30 seconds (I think), which in a simple case is fine, but in my case it is not! Firstly, my sites are HTTPS only, so it doesn't work. Secondly, some of my sites are behind Traffic Manager, so the ping will hit the TM and then get routed wherever, which would return 200 except it might not hit any of the instances that are actually asleep. Thirdly, it will not work with apps, like my WCF service, that don't have any resource at path /.

So what? Can't I just disable Always On and write my own? Nope. If I disable it, then the Filebeat Web Job will stop since MS in their wisdom have decided that you cannot run "continuous" web jobs if the site is not always on. If you want to run a job on all instances, you have to use "continuous", so I can't even try the Triggered version.

Workarounds

One option is to simply filter out the log noise. I could do this at the source or the destination end but, of course, it would be nice to see errors that might actually happen and not accidentally hide them in a poorly written filter. This might be my best bet for now!

I could modify the apps to have a port 80 response at /, even if that doesn't serve the app's main purpose, just to make Always On work; it could simply return an empty document or something. I might be able to do some clever stuff so that the http->https mechanism would still work for most clients and only the root path responds over plain HTTP. I could probably add middleware before the HSTS module to do this. Fortunately, we are about to retire one of these apps and replace the other, so I can just make sure this is built into the released system.
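
For what it's worth, here is a rough sketch of that middleware idea for an ASP.NET Core app, assuming the Always On probe really does arrive as a plain HTTP GET for / and that this is registered in Startup.Configure before UseHttpsRedirection()/UseHsts():

// Answer the Always On ping at / over plain HTTP with an empty 200,
// leaving the normal HTTPS redirect/HSTS behaviour intact for everything else.
app.Use(async (context, next) =>
{
    if (!context.Request.IsHttps && context.Request.Path.Value == "/")
    {
        context.Response.StatusCode = 200;   // empty body is enough to keep the instance warm
        return;                              // short-circuit: no redirect, no HSTS header
    }
    await next();
});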

Thirdly, I have reported the bug to Azure, so who knows? They might be able to resolve it soon.

Monday, 25 June 2018

You don't necessarily want "async all the way up"!

Introduction

.Net introduced async methods in version 4.5 of the framework and it provided some really easy-to-use functionality to get asynchronous methods without the pain of managing threads and semaphores etc.

Unfortunately, it can be poorly understood because people think of it too much like easy "multi-threading" and overuse it, or they don't use it because they think it is not thread-safe. Neither of these are correct. If understood properly, it can reduce latency on your app. Used incorrectly, you will make your app even slower and just move a bottleneck to somewhere else.

What is Asynchronous?

Asynchronous execution means that the thread you are calling something from is not blocked while waiting for the operation to take place. In general terms, it can involve spinning up another thread for the async operation to use and keeping your calling thread free to continue. There are only two ways this will make your application better. Firstly, if the wait is external to your application, e.g. a file or network call, the thread can be made to wait on an external event (usually from the kernel) and so is not taking up resources while it is waiting. Secondly, if you have a UI-based model, like Windows applications (and to some extent web applications in .Net), you allow the UI to remain responsive while something else is happening in the background, since only the UI thread is allowed to update the UI and, if it is blocked waiting for something else, the application appears to hang.

It should be obvious that if you start another thread but then park the calling thread waiting for the second thread, you are getting no benefit overall.

History

Back in .Net 1.x, asynchronous support was added in the form of the Asynchronous Programming Model (APM), which involved calling a BeginXXX method and passing a callback method to be executed when the operation had finished. The callback then called the matching EndXXX method to get the result.

This was only really a thin layer over the basic threading libraries and did not lead to readable code because you ended up with lots of callback methods (callback hell!) that then had to somehow keep the main operation moving as callbacks were fired, especially since they could come back in different orders.
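
As a reminder of what that looked like, here is a rough sketch (not taken from any particular library's docs) of reading a file with the APM pattern - the logic ends up split across the Begin call and the callback:

using System;
using System.IO;

class ApmExample
{
    public void ReadFileApm(string path)
    {
        var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                    FileShare.Read, 4096, useAsync: true);
        var buffer = new byte[4096];

        // The calling thread returns immediately; everything else lives in the callback.
        stream.BeginRead(buffer, 0, buffer.Length, asyncResult =>
        {
            int bytesRead = stream.EndRead(asyncResult);   // completes the operation
            Console.WriteLine($"Read {bytesRead} bytes");
            stream.Dispose();
            // ...any follow-on work has to be chained from here - hence "callback hell".
        }, null);
    }
}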

In .Net 2, this was improved with a more event-driven approach, the Event-based Asynchronous Pattern (EAP), where you subscribed to various events and then called a method to start the async operation. The events would signal back to the caller that something had finished/happened/errored. Although it was slightly cleaner and perhaps more generic, it still didn't really solve the problem of orchestrating the events and dealing with the sequencing of async tasks.
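
The EAP equivalent (again just a sketch) using the old WebClient class looks like this - wire up an event, then kick off the operation:

using System;
using System.Net;

class EapExample
{
    public void DownloadEap(Uri address)
    {
        var client = new WebClient();

        // Subscribe first, then start; the event fires when the download completes, errors or is cancelled.
        client.DownloadStringCompleted += (sender, e) =>
        {
            if (e.Error != null)
                Console.WriteLine($"Failed: {e.Error.Message}");
            else
                Console.WriteLine($"Got {e.Result.Length} characters");
            client.Dispose();
        };

        client.DownloadStringAsync(address);   // returns immediately
    }
}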

It should also be noted that in both of these scenarios, writing your own async code was also not the neatest thing in the world, although there were some helper classes you could extend to get some functionality. One of the worst ways of writing code is to have more supporting code than main code and these were both a little guilty of that approach.

Async in .Net 4.5

This brings us to the latest and greatest model: the Task-based Asynchronous Pattern (TAP), which finally makes async a first-class citizen and massively simplifies the task of both calling and implementing async code.

The basic style includes two new keywords, async and await, and is built around the System.Threading.Tasks.Task type. A method might look like this:

public async Task<ApiResponse> CallAPIAsync(ApiRequest request) {...}

Firstly, a few basics. If the method is marked async, it must return a Task, or a Task<> if there is a return value (async void exists for event handlers but is best avoided). You do NOT have to mark a method async just to return a Task - more on that later. Also, ending the method name with the word Async is not required but has become a convention, although that might change if everything starts becoming async.

If you want to call this method, you have a few options. None of these are specific to CallAPIAsync being marked "async" (the caller doesn't care); they all apply because it returns a Task<>.

CallAPIAsync(req);     // Will not block but not recommended since Task is lost
var myTask = CallAPIAsync(req1);    // Will not block
var returnValue = await CallAPIAsync(req2);    // Will 'block'
var returnValue2 = await myTask;     // Will 'block'

The bit that people struggle with is what actually happens when you call an async method and why not all Task methods are marked as async...

Calling an external async method

We will build up our example from the ground and assume we are writing a method that is calling an API, written by someone else, and assume that all the methods are async and even if they weren't, we don't want to wait for the operation to complete synchronously.

public async Task<ApiResponse> MyApiMethod(Request request)
{
    return await ApiLibraryCallAsync(request);
}

First things first. We have a few options here:

Use await

If we want to use await, we are basically saying that the call is long-running but we can't do anything else in this method until the response comes back. What happens when we call await? Under the covers, the rest of the method becomes a continuation that runs when the operation completes; the calling thread is released (NOT BLOCKED!) rather than sitting there waiting. The calling thread cannot continue with its current request, since it is awaiting, but it can go and pick up another request, button press etc.

In a desktop app, you do not usually need massive parallelism but you want the UI to be able to update while it is waiting for the response, including possibly handling another button press etc. while still waiting. In web applications, however, this freeing of the calling thread could massively increase the throughput of the application, especially if 1) the latency on the external calls is significant and 2) there are plenty of other, less slow requests that could be handled in the meantime. On the other hand, depending on the application, this model could queue up lots of external API calls, so you might simply move the bottleneck and not actually help overall.
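
For example, a web action along these lines (a sketch - the controller, endpoint and HttpClient usage are illustrative, not from the original example) hands its request thread back to the pool while the external call is in flight:

using System.Net.Http;
using System.Threading.Tasks;

public class QuotesController   // any MVC/Web API style controller
{
    private static readonly HttpClient Client = new HttpClient();

    // While GetStringAsync is in flight, no request thread is tied up here;
    // a thread is only needed again to run the code after the await.
    public async Task<string> GetQuote()
    {
        var json = await Client.GetStringAsync("https://api.example.com/quote"); // hypothetical endpoint
        return json.ToUpperInvariant();   // "work after the await" resumes here
    }
}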

Return the task

If the API call does nothing after it calls the external library, it doesn't have to assume that the call needs to block and could instead directly return the Task to the caller and allow the caller (or its caller etc) to await if it needs to:

public Task<ApiResponse> MyApiMethod(Request request)
{
    return ApiLibraryCallAsync(request);
}

This means that we should not mark the method async. If we do, and don't use await anywhere inside the method, Visual Studio will underline the method and tell us it will run synchronously. The method would still return a Task<> however and the calling code might do something like this:

var task = MyApiMethod(request);
DoSomethingElse();
var response = await task;

If we did this, the calling method would have to be async (because it uses await) but the point here is that MyApiMethod is not making assumptions about blocking. It could await if it had to do something else after the response came back.

Use Task.Run()

There is an overhead to using async (which shouldn't be a surprise). Imagine you update your API one day and all of the methods have changed from sync to async: you would have to modify loads of methods, some of them would become async, some would return Tasks, and you might end up with "async all the way up" and renaming loads of methods to end with "Async". As well as the code overhead, every method marked async has a hidden but necessary mechanism every time it is called: the compiler generates a state machine, Task objects are allocated, and the continuation has to be scheduled (and possibly marshalled back to the original context) when the awaited work finishes. Each occurrence is tiny but, if used heavily, this will add noticeable time to the overall system.

Now, this might be unavoidable, but you need to consider at what level the awaiting needs to take place, because you can use Task.Run() to bridge between synchronous and asynchronous code. If you have a call stack of, say, 5 levels from the initial request, and each of those is async and each of them awaits the method it is calling, you create 5 lots of tasks and state machines and multiple context switches before you have even reached any benefit.

What if the code at the bottom doesn't always call a long-running method? You would still normally have to make the entire stack async so you basically make every request slower to improve the response of 1 request every, say, 100 or 1000. So here are some scenarios where you should not use async methods but instead consider using Task.Run():

  1. You only provisionally call the long-running method or perhaps only every >1000 times. Perhaps it is only called after cache goes stale or if the app is restarted.
  2. You can never logically continue with the operation until the long-running operation has completed i.e. there is no reason to return the task to the caller. For example, if an encryption method must obtain a key from an external service, the encryption method cannot logically continue until the key is returned. Does it really make sense to signal to the caller that it can continue and get the result later? Imagine what the calling code would look like just to encrypt or decrypt something.
  3. You have no major desire to maximise the number of threads that can handle requests while your code is awaiting something. As long as the desktop UI thread can continue to run, you probably get little benefit from the extra complexity of async (although the performance penalty would also be lower due to fewer requests).
If you think these might be true - and you can always test your code to measure the performance difference - you should call the async method like this instead:

var result = Task.Run(() => MyApiMethod(request)).Result;

This offloads the call to the thread pool and blocks on the result; most importantly, because there is no await, the method this code lives in does NOT have to be async. Note that this is a synchronous block, however: the thread in the method calling this will block in the normal way and go to sleep until MyApiMethod has finished. You can call Wait() instead of reading Result if the method only returns a plain Task.

Conclusion

I realise I have only touched the surface of this subject but you should consider using async as a pattern when:
  1. The UI thread needs to be prevented from blocking by any "long-running" method e.g. over 0.5/1 second - at which point it would be noticeable/unacceptable.
  2. You want to maximise the request threads without them being blocked by long-running processes, such as in a web server that makes slow back-end calls to something but only when some of the other requests can be made while waiting as opposed to simply creating a longer queue of long-running requests!
  3. You want to get some easy-to-use parallelism where you might e.g. start a long-running process and be able to do other tasks before waiting for the result.
You should not necessarily use the async pattern when:
  1. Your "long-running" calls are not particularly slow, perhaps anything less than 200mS, at which point you might move the performance hit elsewhere. Most database calls and all memory-cache/redis type calls generally should NOT be async.
  2. You only use a low-level long running method occasionally, and it does not block the UI thread, use Task.Run() instead and take the hit. Consider another background update pattern if necessary to avoid hitting the occasional user.
  3. Your only use of calling async methods is at the lowest level and nothing else happens in parallel, in which case, Task.Run() will usually provide you the functionality without the trouble of putting async all the way up.
Test, test, test! If I have learned one thing about programming, it is that many things don't work as you assume they will.

I have surely missed many things, so please let me know in the comments if I need to change anything!

Wednesday, 20 June 2018

Dotnet core 2.x mocking HttpContext etc.

Unit Test or Integration Test?

Unit testing and integration testing are two very black-and-white concepts - on paper! A Unit Test for an MVC action should call the code as directly as possible, injecting any mocks into either the controller constructor and/or the action - easy right?

Not really! What if you access request, response etc? These should all be injected into the controller right? That would make sense and make it much easier to test writing of headers, reading of request parameters etc.

No, you need an integration test right? Integration testing means testing the application with things joined together: databases wired up, services in place and these can be automated too. Except that dotnet core doesn't (obviously) provide a way to inject any mocks. For integration testing, you shouldn't use mocks, which brings me back to the original problem.

When testing special actions like uploading files, multi-part forms etc. I need to access the context and in one case, the response since I am writing a file to the response directly. I also need to mock certain other services because I do not want to wire everything up, just to check that the things I think I am setting are actually being set!

How do we mock HttpContext etc. in our dotnet core unit tests?

Things to know first:

  • HttpContext is quite complicated!
  • Not all properties have setters, some have to be set indirectly
  • ControllerBase does not use the injected IHttpContextAccessor for its HttpContext property
  • They use this weird Features mechanism to attach data
  • Classes like DefaultHttpResponse have a reference to their parent object (the context), which creates a slight chicken-and-egg problem.
  • In my example, I use DI to get my controller instance but you could instead create one directly and pass the service mocks into the constructor yourself.
So here's what I did, using Moq for mocks and, in this case, just providing a concrete response object so that my action could set response headers and query a client IP address without falling over. You could easily extend this for request objects etc:


var httpContext = new Mock<HttpContext>();
httpContext.Setup(ct => ct.Connection.RemoteIpAddress)
           .Returns(new System.Net.IPAddress(0x2414188f));

var contextAccessor = new Mock<IHttpContextAccessor>();
contextAccessor.Setup(ca => ca.HttpContext)
               .Returns(httpContext.Object);

// DefaultHttpResponse reads its state from the Features collection on the context,
// so register a concrete IHttpResponseFeature before creating the response.
var features = new FeatureCollection();
features.Set<IHttpResponseFeature>(new HttpResponseFeature());
httpContext.Setup(ct => ct.Features)
           .Returns(features);

var response = new DefaultHttpResponse(httpContext.Object);
httpContext.Setup(ct => ct.Response)
           .Returns(response);

// ImagesController stands in for whichever controller is under test.
var controller = ActivatorUtilities.CreateInstance<ImagesController>(services);
controller.ControllerContext = new ControllerContext();
controller.ControllerContext.HttpContext = contextAccessor.Object.HttpContext;

var result = await controller.GetImage(new GetImageModel { accesstoken = VALID_ACCESS_TOKEN, imagename = VALID_IMAGE_NAME }) as FileStreamResult;
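
From there you can assert on both the action result and the concrete response object, for example (assuming xUnit; the content type and header name are just examples of whatever your action is supposed to set):

Assert.NotNull(result);                                      // the action returned a FileStreamResult
Assert.Equal("image/jpeg", result.ContentType);              // whatever content type your action sets
Assert.True(response.Headers.ContainsKey("Cache-Control"));  // a header your action writes to the mocked response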