Thursday, 10 December 2015

securityheaders.io, Public-Key-Pins, IIS and Azure

Testing Web Server Headers

securityheaders.io is a cool site for testing the hardening of your web server. I suspect that many sites would score fairly low and, although you could debate the relative value of each measure, why not aim for the top and make an attacker's life that much harder?

Well, our test site has just scored A and I hope to increase that to A+ once I have sorted out the Content Security Policy using my cool ASP.Net CSP builder!

Most of the headers are pretty easy. If you only use https, for instance, you might as well set Strict-Transport-Security, which tells the browser to allow access to the site only over https. The instruction is cached in the browser for the period you specify (and can be un-cached), and it protects an unwitting user from a downgrade, where a MITM talks to our site over https but to the victim's browser over http. Anyway, it's easy to add; I do it in Application_BeginRequest in my application.
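
A minimal sketch of that in Global.asax looks like the following (the max-age of one year is just an example value):

protected void Application_BeginRequest(object sender, EventArgs e)
{
    // Only add HSTS to https responses; 31536000 seconds = 1 year (example value)
    if (Request.IsSecureConnection)
    {
        Response.AddHeader("Strict-Transport-Security", "max-age=31536000");
    }
}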

Most of the others are pretty easy too and can be applied to most of our sites to prevent things like framing (clickjacking) and to add some amount of XSS protection. Those that need to be applied to the entire server, rather than just the app, have to be added either manually via the IIS interface or, in Azure, using a startup script and either the command line or PowerShell.

I have these lines in my startup.cmd:

%windir%\system32\inetsrv\APPCMD.EXE set config -section:system.webServer/httpProtocol /+"customHeaders.[name='X-Frame-Options',value='DENY']"
%windir%\system32\inetsrv\APPCMD.EXE set config -section:system.webServer/httpProtocol /-"customHeaders.[name='X-Powered-By']"

Adding the header that denies framing and removing the one that advertises to the client that I am running ASP.Net!

IMPORTANT NOTE: You MUST save your startup.cmd as ASCII, otherwise the Unicode byte-order mark will confuse the Windows command interpreter and appear as a weird character, which will break the first line of the script and possibly have knock-on effects depending on its content. In Visual Studio, click Save As and, on the Save button drop-down, choose "Save with Encoding" - I used US-ASCII as the encoding.

Public-Key-Pins

Public-Key-Pins is slightly more complex to set up because it requires taking signatures of our SSL certificates and giving them to the browser. In a similar way to Strict-Transport-Security, the browser will remember these signatures for the amount of time we specify and will require that the signature of the SSL certificate match one of those specified in the header. The article here explains the process of signature gathering better than I can, but I want to explain how you can do this on Azure using the startup task.

Backup Signatures

It is VERY important that you understand the purpose of backup signatures. Imagine that you have told the browser to remember your pins for, say, 6 months. Two months later, your site is hacked and an attacker obtains the private key to your SSL certificate, and with it the ability to decrypt communications between your customers and you. You revoke the SSL certificate and need to obtain another one, but you cannot use the same private key because it was compromised; instead you generate a new one, get a new SSL certificate and deploy it. The browser does not know any of this and will insist that the new certificate matches one of its pinned signatures, which it won't UNLESS you created, say, 2 extra CSRs up front and added their signatures to your pin list. Then, if the cert is ever compromised, you use one of those CSRs to get your new cert and its signature will already be valid. The linked article describes this; just make sure not to bypass it, otherwise you will have irate customers asking how to clear the pin cache on their browser to access your site!

IIS and Azure

It's pretty easy in IIS to add the header that you have created for your certificate signatures: you just open IIS Manager, choose "HTTP Response Headers" and add a new one with the correct name and value. In the case of Azure, though, you will want to automate this and apply it to the entire server, not just the app, so it applies to any resources that are not served by the application. This takes us back to our startup task.

Hopefully, you already have one of these to run various things whenever the role is started or restarted (because of this, make sure the script doesn't reboot unconditionally, otherwise you will cycle forever). I use a file called startup.cmd, which is set to always copy to output in Visual Studio. I then have these lines in my ServiceDefinition.csdef:

<Startup priority="-2">
   <Task commandLine="startup.cmd" executionContext="elevated" taskType="simple" />
</Startup>

This lives under the WebRole section and tells the system to run the task with elevated permissions; "simple" just means that the role waits for the task to complete before starting, rather than running it as a foreground or background task.

You can then put whatever you need into it, including calling other scripts if required. I do the following in mine:

  1. Add X-Frame-Options DENY
  2. Remove X-Powered-By
  3. Add my Public-Key-Pins
  4. Add the web IP security IIS feature so I can...
  5. Enable IP security section to be able to dynamically blacklist IPs
  6. Enable dynamic IP security to provide rate-limiting
  7. Replace the default Location header (containing internal IPs) with a named URL
  8. Setup the best-practice for SSL cipher order using a Powershell script
  9. Install a stripheaders module to remove default Server header from IIS

The code for adding the Public-Key-Pins header is standard AppCmd stuff, but NOTE that the double-quotes in the pin values need to be tripled (" becomes """) so that they escape properly on the command line. This is probably not required in PowerShell but I don't know.

%windir%\system32\inetsrv\APPCMD.EXE set config -section:system.webServer/httpProtocol /+"customHeaders.[name='Public-Key-Pins',value='pin-sha256="""wCJKYZwKJ8LcMVKrPHv+0cBua/ndTZ3aUeogjN6S0xI=""";max-age=2592000']"

I have removed two of the pins to make it simpler to read but you should get the idea! The max-age above is 1 month in seconds, which I will use for now.

The link recommends starting with a short max-age so that any problems can be fixed quickly and easily. You can also append "-Report-Only" to the header name so that the browser will log any problems but not block you; this is ideal for testing that you have the correct signatures and format for the header. Another reminder to use the backup CSR process so that you can easily reset your SSL cert if a problem arises.
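
If you want to trial it that way first, the same AppCmd pattern should work with the report-only header name (in practice you would also add a report-uri directive pointing at an endpoint of yours so there is somewhere for the reports to go):

%windir%\system32\inetsrv\APPCMD.EXE set config -section:system.webServer/httpProtocol /+"customHeaders.[name='Public-Key-Pins-Report-Only',value='pin-sha256="""wCJKYZwKJ8LcMVKrPHv+0cBua/ndTZ3aUeogjN6S0xI=""";max-age=2592000']"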

Wednesday, 2 December 2015

Web App Security - Are you doing it wrong?

Another day, another breach, another few million people are at best probably going to end up on a SPAM email list, at worst, they are going to have other accounts attacked with the leaked credentials. One day, we can hope that governments start appropriately punishing people for negligence in the web application security arena but don't hold your breath.

So are YOU doing it wrong?

If you are not using a modern popular framework as the foundation for your site, you are PROBABLY doing it wrong. There are hundreds of ways in which a site can be compromised, some subtle and some more obvious. Many of these vulnerabilities are squashed in popular modern frameworks or are at least harder to accidentally introduce. Pay attention to session, authentication and authorisation and make sure you know how the validation and encoding controls work.

If you don't know whether your site is susceptible to SQL (and other) injection attacks, you are DOING IT WRONG. If you do not have a fundamental protection against injection attacks, such as stored procedures or parameterized queries, YOU ARE DOING IT WRONG. Validation can help you, but it is not a strong enough defence because it is easy to forget validation in a single place, and that one omission can open up your whole site. Remember, it doesn't matter if you have 200 strong locked doors on your house; it only takes 1 open one to undermine them all.
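
For the avoidance of doubt, a parameterized query in ADO.NET looks something like this sketch (the table, column and variable names are made up for illustration, and it assumes System.Data.SqlClient):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT Id, Name FROM Users WHERE Email = @email", conn))
{
    // The user-supplied value travels as a parameter, never as part of the SQL text
    cmd.Parameters.AddWithValue("@email", userSuppliedEmail);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // use reader["Id"], reader["Name"] etc.
        }
    }
}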

If you are not validating all input from the user YOU ARE DOING IT WRONG. User input is always untrusted. Not only can attackers bypass any client-side validation or assumptions that you make about correct use, they can do a whole load of things you probably haven't heard of. Your own staff and trusted friends can also accidentally do something incorrectly and that can have bad consequences.

If you have to write things like "Sign in securely" on your buttons, YOU ARE DOING IT WRONG. Teach your users about real markers of security, don't try and convince them you are secure just because you say so. It is meaningless and more importantly, it doesn't teach real security. If you must boast, use commercial badges such as Site Seal or "This site was code reviewed by XYZ Inc.".

If your logout button doesn't immediately log people out YOU ARE DOING IT WRONG. Log out means log out, people are used to hitting it and walking away. Don't ask if people are sure or leave them on a "just before we log you out...." page.

If you allow people to use weak passwords, YOU ARE DOING IT WRONG. You might assume that they are only letting themselves down, but that is both untrue and arrogant. If their password is obtained from your site, an attacker could gain access to your system and attack other systems, and he can crack your password hashes much more quickly if he knows you allow weak passwords.

If you store passwords in a reversible format or plain text, or if you ever email people passwords, YOU ARE DOING IT WRONG. Sure, sending someone a password seems useful, but email is insecure, and if the system can reverse passwords there is no way to know whether an attacker has accessed that functionality and obtained passwords that are almost certainly used elsewhere. Give people a reset mechanism involving their email and/or security questions - the stuff you would check on the phone.
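
If you want a concrete picture, a reasonable approach is a salted, slow hash such as PBKDF2, which .Net exposes as Rfc2898DeriveBytes in System.Security.Cryptography (the salt size and iteration count below are just example values):

// Derive a 32-byte hash from the password with a random 16-byte salt and 10,000 iterations
using (var kdf = new Rfc2898DeriveBytes(password, 16, 10000))
{
    byte[] salt = kdf.Salt;        // generated for us because we passed a salt size
    byte[] hash = kdf.GetBytes(32);
    // store salt + iteration count + hash against the user record; never the password itself
}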

If you do not know the openness of your networks both internally and externally, YOU ARE DOING IT WRONG. An attacker might attack via the web application but might also send malware into your company. Do you know whether that malware would still be unable to access sensitive data, even if it was running with the permissions of a valid corporate user? Think of it like someone sitting at their PC in your company: what can they access? Would you even know? Could you block it easily if it was suspicious? Can you detect large data transfers?

If you store credit card numbers or CVV numbers, YOU ARE DOING IT WRONG. There are approved ways to do whatever you want; please don't think it is OK to side-step those, make up your own rules and assume that your network security is good enough. We have enough proof now that it probably isn't.

If you have to search for security answers on Google, YOU ARE DOING IT WRONG. Most of what you find on the web is very hard to verify for correctness, or perhaps it was best practice in 1995 but not any more. Get proper training, proper expertise and for goodness sake take it seriously. You would be frightened how many people still use MD5 for password hashes; this has been frowned upon for over 10 years. Be wary of programming books too, which are often woefully poor on security techniques.

If you are not familiar with the work of OWASP, YOU ARE DOING IT WRONG. OWASP pools all of the best information from the industry; please don't think that you don't need that. If you read it all and realise you know it all, great, but you will probably be surprised at certain attack vectors that you would never have imagined. Free code-review checklists, free testing utilities - what's not to like?

If your management are not on-board with security, YOU ARE DOING IT WRONG. I suggest you find another job because you've already lost!

If you don't have a go-to person for security related questions, YOU ARE DOING IT WRONG. We can't all be experts but someone needs to be - perhaps you have a contractor or consultant to do this, as long as it's someone. Pooling mutual ignorance is the downfall of many a company.

If you don't perform code reviews and penetration testing of your sites, YOU ARE DOING IT WRONG. Yes, they cost a couple of thousand dollars but they are an awful lot cheaper than damage limitation and probably cost less than one Developer's monthly salary.

If you realise that security is a big and complicated topic and you have the humility to seek out expert help, if you continue to learn over time, if you pay attention to causes of breaches in the news, if you get trained in web application security controls and if you follow convention instead of inventing your own "good ideas" then YOU ARE PROBABLY DOING IT RIGHT!

Friday, 20 November 2015

RDP not working over VPN

I have just had the joy of setting up a VPN, something which requires a master's degree! Once I got over my amazement that it worked, I then wanted to try Remote Desktop over the VPN, since that was the reason for it in the first place. These Remote Desktops work fine when plugged into the LAN.

I could RDP to one server but not another when connected via the VPN. This was useful, because it meant that RDP was generally being allowed (the VPN didn't also have a firewall blocking stuff) so it narrowed it down to the specific server I couldn't contact.

I looked at the rule in Windows Firewall with Advanced Security and spotted two things that might affect it. The first was edge traversal, a security control that is supposed to block unintended access by remote users who have tunnelled into the network and have a local IP address; I set that to allow edge traversal, since I thought it might be the problem. The second thing was that, under Scope, I had locked down access to remote IP addresses from the "Local Subnet". Although this was broadly correct, the VPN hands out IP addresses to clients in a different range from the internal network (I couldn't get it to work with the same range of IPs as the internal network), so these were being blocked!

All I did was add another IP address rule for my other range of IP addresses and hey presto, it worked!

Monday, 16 November 2015

The fallacy of technology

Back in 1863, a load of men with spades, horses and rope dug up the road between Paddington Station and Farringdon in London and built the first underground metro railway line in the world. There were no diggers, no laser-levels, no computer simulations and no trains to take all the dirt away. Fast forward to the present day and Crossrail is nearing completion in London covering much of the same alignment with additional branches to Docklands and Liverpool Street in a very large and complex east-west train line.

So what? Well, the per-mile cost of Crossrail is twice as much as the original line dug with spades, in inflation-adjusted terms! Twice as much, despite more than 150 years of supposed progress in technology. Sure, it's not quite the same thing and Crossrail has more safety hoops to jump through, but how does that happen? If we burned all the technology, could we have built Crossrail for half as much money?

I read a similarly depressing article about modernizing the signalling of the New York subway system. Sums of money like $200 million were being talked about for systems that were incomplete, only semi-functional and which didn't cover the network. $200 million? I'm pretty sure a company of 50 people could have manually installed all the equipment for a 10th of that money. We're not talking some futuristic technology, just some transponders and software to coordinate. This on top of the reality that train systems are already controlled worldwide by various systems that have already been developed. New York aren't doing anything differently.

I wonder whether the reality is that Technology always promises so much but in reality it doesn't make most things any more efficient. An email system is just a way of wasting 20 minutes writing down what could have been said in 30 seconds on the phone (remember those?).

Take a step back though. Why is it like that? Because we're people and we like to pretend that we know what is going on. Try telling a customer that it would be much more cost effective to change their business processes to match the way some off-the-shelf software already works and they'll tell you to clear off. They would rather change functionality in software or write something completely bespoke with all the costs - both up front and maintenance - even though the chances are it will need to change in 5 years anyway, at which point we do it all again.

I worked for a company that created a mortgage system for a large bank and it was pretty complex: lots of external services to talk to, lots of users to support. It took a few years and still cost less than £10 million. How do New York and a thousand other organisations spend such heinous amounts of money - very often getting very little in return (cancelled or curtailed projects, technology that is out of date as soon as it's released)?

The most important question, though, is "what on earth keeps going wrong?". I read about Obamacare and how the original system cost something like $50B and failed miserably. They grabbed a lot of clever people who sorted it out and ended up with a system costing $2B that worked. Where did that $48B go?

Are we lacking tools? Expertise? Management? Experience? Maybe we are still applying 200 year old project management principles to software where people think it is OK to change the requirements half-way through and maybe we need a completely new approach?

Or maybe, technology benefits are mostly a fallacy that provide additional possibilities but without necessarily improving what we already have?

Tuesday, 29 September 2015

Holder.js doesn't work (Oh yes it does)

Developer error, that's what it was.

Holder.js is a great utility for generating dynamic images of a given size (and colour etc.) that you can use for placeholders when building a web site, if you don't have the real images yet. It allows you to write that carousel code taking up the full size it should as well as fake thumbnails, shopping item images and lots more.

You install a file called holder.js (surprise, surprise) and then you simply create images whose src (or data-src) attribute looks something like:
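
<img data-src="holder.js/300x200?random=yes">

(The 300x200 here is just an illustrative size - use whatever dimensions you need.)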



The numbers are for the size and in this example, random=yes means that each time you run it, you get a different "theme" colour for the image - spice it up a bit.

You can see all the options here: https://github.com/imsky/holder

The thing is: the "holder.js" in the data-src attribute is NOT a path to the javascript file (that's what I thought it was and why it didn't work) it is purely a label that holder.js will find so it knows which images to generate and which to leave alone. Obviously the selector uses "starts with holder.js", which is why it doesn't work with a full path to it.

You have been warned!

Thursday, 20 August 2015

Interesting Problems - Jury Service

I was just reading an old friend lamenting jury service on Facebook since, apparently, you only get a few weeks' notice of performing the service and clearly this can be very hard to accommodate alongside other family or work commitments. You can defer service but then your card is marked, so to speak: you will be called again within twelve months and will have to give them your free dates for the next year to avoid missing again. It's a bit like sticking on 14 or risking a bust by deferring.

Surely, he suggested, they could do much better than this and provide longer notice for people to arrange cover. What about single-parent families with young children? What about people involved with important projects at work just as they are called to jury service?

Unfortunately not. There is no simple solution, as I will explain below, but that is not to say there couldn't be a better way. With so many unknowns, however, it can never be a perfect solution, only a best-fit solution that is likely to benefit perhaps most people but not everyone.

Prediction

Clearly, the way to solve this problem is to have a reliable system of prediction. If we knew far enough in advance that we had a Crown Court trial with a jury, surely we could reserve and inform enough jury members in advance to cover those cases. Well, in theory, yes. However, when do you know that a trial by jury is actually happening? Certain indictable offences will always go to Crown Court and, apart from certain exceptions, will always be heard by a jury, but they go via a Magistrates' Court, which decides that the offence is indictable. Other offences that can be tried either way start in a Magistrates' Court, and either the Court decides that the offence is serious enough to warrant a Crown Court case (where sentences can be tougher) or the defendant exercises the right to request a jury trial in Crown Court - although the government is trying to restrict this expensive practice where there is minimal legal requirement.

So do we know now? No. The first time that we know whether a jury is required is when the defendant first appears at their Crown Court hearing and enters a plea. If they plead guilty, no jury is required, only sentencing. If they plead not guilty then a jury will need to be ordered to fit the timetable for the case. Even in this case, there is some flexibility since depending on the nature of the case, the defence or prosecution might need several months to prepare. In other cases, they might be ready within a few weeks.

Either way, we cannot predict consistently more than a few weeks in advance as to which jury members we need.

Exclusions

I get the feeling that the current system of juries worked a lot better in a different time, when there were fewer single parents (and those there were almost certainly weren't on a voting register), when you had one breadwinner in each house, i.e. someone else to hold the fort while you were away, and, dare I say, fewer jury trials and more summary judgments. This quite rightly raises the question of whether the list of people excluded from jury service is still right. Currently it covers people in the justice system, people in the armed forces, the mentally ill and non-English speakers, as well as some random historical exclusions like lighthouse keepers and certain guilds!

Perhaps single parents should be exempted? Perhaps people who earn less than a certain amount be exempted because they cannot afford to pay for alternative arrangements? Perhaps the self-employed should be exempted?

The problem with this approach, as with many similar governmental areas, is that such a list is very expensive to maintain, hard to keep up to date and hard to enforce, and it is therefore very easy to game the system and reduce the number of potential jurors massively. There are thousands of people who do business as sole traders and do jobs for cash, paying other employees but not paying all of their tax, NI, insurance and other employee costs - there is no reason to think the same wouldn't apply to jury service lists.

Averaging or Maximising

There are another two ways we can approach this issue; the first is to predict on averages. If the Court needs, on average, say, 2400 jurors per month, then we could book in that many people. This is less than ideal because if the Court is busy that month, some people still get a last-minute request. If the Court is quieter, then spare jurors can be sent home - more on that next.

To avoid ever giving people short notice, a similar method to averaging is instead to take the maximum number of jurors that the Court could ever need. If it has, say, 4 court rooms and each could handle 2 cases per week, then 96 jurors could be booked and then sent home if not required. Clearly, this solves the issue of notice because you would always have at least as many jurors as you need present at the Court, but being sent home is not always as practical as it sounds.

Imagine you are self-employed and have turned down a job because of jury service, only to be cancelled or sent home. I'm guessing they wouldn't pay you for the week's work you've missed - you would be miffed. At least if you had done the jury service it would have had meaning. Likewise, maybe you had to pay a contractor to cover you at work during a busy period and then get told that you are not needed but are still £5000 worse off because of the cover. Although short notice is not ideal, long notice with cancellation is equally not ideal.

Another issue with maximising is the number of jurors involved. Sure, a Court might need 96 jurors per week, but in reality it is probably closer to 12. Since people are picked at random, choosing 96 people per week from the area might mean that you are called several times over the course of a few years - each time being sent home. Compare that to only calling as many as you need - it is short notice, but you know you will be used (excepting a last-minute guilty plea!) and most people I have spoken to have either never been called or have been called once.

I don't think the system is perfect but it seems to be probably best-fit for most people.

However

These are the kinds of problems that can make somebody rich, because if you can see a better solution, using AI, prediction, psychology, flexibility or whatever, then invent a system and suggest it to HM Courts Service. If it works, they will love you forever!

Friday, 31 July 2015

mod_deflate in apache is breaking caching

I was planning a video of caching in Yii framework and had a quick look at one of my sites to see what happens by default.

I was surprised to see that although all the files had ETags and although they had no cache-control by default (which is what I expected), when the page was refreshed, all the items were re-requested and SOME of them got a 304 response, while others got 200.

The short answer is that mod_deflate breaks it (and there is a workaround below) but first, a bit more background if you don't understand.

Caching is good. Rather than send you the same files every time you visit my site, I make you keep copies of them in your browser. When you request the page, if resources are already present locally and haven't "expired", then you get the local copies and save BOTH the request/response round trip to the web server and the network bandwidth to send them from server to browser. This strength is also its weakness. How long should I let my resources live before they expire? Too short and I lose the advantage of caching. Too long and I might change something but the browser keeps using the old file, which hasn't yet expired.

Because the answer to cache duration depends on the actual site, most web servers do very little or no caching by default. The 304, however, is not a bad solution for uncached objects. When my browser visits the page, if it already has copies of the objects, it still goes back to the server and basically says either "give me this object if it has been modified since X" or "give me this object if the ETag I have doesn't match the ETag of the latest version". The ETag is more flexible since it allows you to change an object and retain its ETag if it is functionally the same as the previous version, and you could also replace a newer one with an older one and still get the same behaviour. The ETag is just a quoted string and could be a hash or date or anything the server wants to use.

If the server looks at these requests and decides that the object hasn't changed, instead of sending the whole thing back, it just sends a 304 "Not modified" and the browser can use its cached version. This all works well but why is it not working consistently on Apache?

mod_deflate!

mod_deflate uses (usually) gzip to reduce the bandwidth required for the object being sent to the browser. What mod_deflate also does is modify the ETag by adding "-gzip" to the end of it. Why? I can't quite understand, but I think it is to follow the letter of the spec; effectively it is trying to differentiate between the gzipped and the non-gzipped responses. I don't think this is correct (whatever the spec says) because the underlying object is the same: if the browser has a version that worked previously, all it is asking is whether the underlying object has changed - the transport encoding is irrelevant.

So what is actually breaking is this: the original object has, say, ETag 123456 and mod_deflate makes this 123456-gzip. The browser revisits the page and says, "Can I have this object if it doesn't match 123456-gzip?", and the server looks and thinks, "The object being requested is 123456, so it DOESN'T match; I'll send the full one again". The whole thing then repeats on every visit.
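
In header terms, the failing round trip looks roughly like this:

Browser request:    If-None-Match: "123456-gzip"
Server's own ETag:  "123456"             -> no match, so a full 200 is sent again
Server response:    ETag: "123456-gzip"  -> and the same thing happens next time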

Some people suggest disabling ETags as a workaround, but another suggested workaround is a bit more specific and involves rewriting the ETag in the "If-None-Match" request header to remove the gzip suffix. When the server then checks, the match will succeed and everything is happy!

The second line I am unsure about. The original code, commented out in my example, seems to add "-gzip" to every ETag that doesn't already have it. With it commented out, none of the ETags have gzip in them, even the ones that were gzipped, and I think I prefer it that way. I put this code into .htaccess in the web root and had to set AllowOverride to FileInfo in the Apache config. You also need to ensure that "sudo a2enmod headers" reports the module as enabled or already enabled!

<IfModule mod_headers.c>
    RequestHeader  edit "If-None-Match" "^\"(.*)-gzip\"$" "\"$1\""
#   Header  edit "ETag" "^\"(.*)(?<!gzip)\"$" "\"$1-gzip\""
</IfModule>



Friday, 17 July 2015

Companies still get Customer Feedback sooooooo wrong

Most things in life are on a spectrum from the very good to the very bad, but customer feedback seems to fall at the bad end more often than the good. This is not just a web issue; in fact it crosses into call centres as well as in-person service, and some companies are paying a fortune because they get it so wrong!

Why do we want or need feedback? Well the first point, and one which many get wrong, is that we want feedback for different reasons. We might want feedback to improve services, we might want feedback because something is broken, because someone might have a really good idea or even because someone needs to complain. We might also need feedback for everyday things like cancellations, address changes or other Customer Service issues.

If you try and funnel all your feedback down the same channel, you will fail unless your company is tiny. But companies like Air Canada do exactly that. Want to contact them by "email"? Well, click the link on their site and you get the mother of all feedback forms. It would be OK if all of the fields were optional apart from perhaps email, so they can contact you back, but even that should not be required. If I want to report that a link is not working, my email address is irrelevant, and if you are not going to confirm the email address, I can type a fake one in anyway.

But no! You have to fill in your address and even passenger details for the flight. What flight? The flight you went on. But I haven't been on a flight, I'm having trouble booking. Oh.

You need to trust people a bit. Why make the passenger details mandatory? Why not say, "If your contact is about a particular flight, please enter the passenger details so we can find the records"? Why not have a drop-down list for your reason for contacting us? If you did that, you could filter the other fields and not show 50 pointless fields to a person who wants to "report a website issue".

This is one example and a fairly typical one but there are plenty of other ways in which we get feedback wrong.

Lots of big companies use automated telephone systems to "direct your call to the correct department" but for some reason, in EVERY example that I have used, the options are abstract or esoteric and do not obviously relate to everything I might want to do. "Press 1 if you need to change an address; 2 if you need to adjust a Direct Debit amount..." What if I want to do both? What is the problem with saying very clearly, "If your call is about your customer information like name and address, press 1; if it requires access to your financial records, please press 2; If you want to complain or provide feedback in general, please press 3"?

If you get this all wrong (which most people seem to), you generate a massive overhead in Customer Services. How many people are calling back because they got confused? Do you know? Are you employing twice as many people as you need because you are not doing it well?

Most importantly, do you ever think about things from a Customer point of view because I'm certain that many companies quite simply don't!

Let's use Air Canada again. I would bet money on the fact that most people visiting their site are economy passengers looking to get the best price on a ticket with perhaps the desire to have the quickest flight or at least a direct flight. Can you do this? Nope. You type in the dates and say that you're flexible (fine so far) then you see a grid of prices - hmmmm OK. I can choose the dates purely on the basis of price in this grid and then when I select the dates, I go to a list of flights. Oh - none of these are direct and take at least 4 hours longer than a direct flight so I need another day. I'll change my search results and choose another day, that's better, I can get a direct flight and click next. Oh, the return flight is also not direct so now I have to change my search results again. Now I've done that, the price grid now says that the cheapest flight I can get is double the original price! So the two direct flights cost double the flights that are longer, further and have a stopover. Why? No idea.

I tweeted Air Canada, usually the best way to get a quick response and they asked me to contact their reservation lines on the phone. So despite them paying however many millions for their site, I now have to use up some person's time on the phone to do what I should have been able to do on the web site! Waste, waste, waste.

I wanted to feed back this issue, but the contact form was so poorly designed that I didn't bother. This is the real danger in getting it wrong: you will push people away until eventually you will not even know why people stopped using your service, and you will go bust - just because the few of us who wanted to help with feedback couldn't do so easily enough!

I sometimes wonder whether this is another skill that falls between the cracks. Is it the Customer Service Manager's job to sort out the website? Probably not. Is it the technical department's job to make the user journey effective? Probably not.

Thursday, 16 July 2015

Why Web Design is such a pig

Introduction

I remember writing an article a while back about the argos.co.uk website and how very poorly it was designed. It kind of looks OK on paper - the colours are fairly consistent and the layouts are varying shades of average - but trying to actually interact with it was painful (it's slightly better now but still has some problems).

How can someone like Argos get it so wrong? Let's be honest, the site probably cost several hundred thousand pounds to produce, either in contractor or Argos time and salaries so it pains me that something is already so poor at time of release (as opposed to a site that starts to feel old over time).

The reason is mostly because design is a pig. There are sometimes technical issues but in my experience, most of the difficult things in the technical area are related to design decisions. There are various reasons why design is a pig, some of them easier to fix than others but they all add up to the mess that most of us experience.

1. Terminology

Terminology is really important in so many fields, including software development. We still use this idea of Design/Build, something that has probably been around since the stone-age, despite the fact that software is much more subtle than that.

Software is actually a combination of several different but inter-related practices. The look and feel of the software, the functionality of it, the user journey design and the technical decisions all add up to that single entity that exists in many an amateur's mind: "the product" or the "web site".

Why is this important? Because most web design companies do not have people who specifically look after each of those areas. Sure, they will have "designers" and they will have "developers" but whose job is the functional design? Who is supposed to ensure the user journey works or that we are not asking developers to do things that will cost far more in time than they are worth in usability?

Even the phrase "Designer" irks me in this world of software. What is a Designer? They design stuff? Like what? They definitely do the obvious stuff like choosing Lime Green and Sunbeam Yellow to form a colour pallette but since these designer jobs tend to attract artists, you end up with very nice graphical work but still a vast void between that and the code that needs to make that design happen. We should not be allowed to use phrases like Web Designer but should be more specific. Graphic Designer for Web or User Experience Designer or something.

2. Most design tools are not web friendly

What do most "designers" use for web designs? Photoshop! Of course, the de-facto leader of the pack, the only package that is cool enough to be allowed in Web Design but apart from some small add-ons for web, this package is totally not designed for web application design. It is a glorified photo-editor. What do you get sent from these tools? A load of HTML that can be the start of a web site? Nope. A load of Photoshop or PNG format drawings that a developer has to chop up into individual images.

Back in the day this was not ideal but it was OK, because a site was a site was a site; now, though, it is not acceptable to ignore designing for mobile. How do "designers" design for mobile? Many don't - it's the developer's job to work it out, and of course, because the developer decides, once it is finished a load of people will disagree with those decisions and make them change it - each change a potential bug, another load of time and money that either has to be coughed up by the customer or swallowed by the developer. Those who do design for mobile will often send a second set of drawings showing mobile layouts, but again, this can be unhelpful. The two layouts are actually the same site in most cases, so why should the developer waste time trying to work out how to make the site respond in the correct way to match two drawings that took an hour to draw in Photoshop?

Adobe do actually produce a tool called Edge Reflow, which looks interesting because it allows designers to consider many of these things earlier so if they don't work, it can be designed out now rather than after the developer has either hacked something together or made it work using loads of duplicate code. However, I have never seen one of these designs or what they look like as a design source for the developer, I would like to though!

3. Customers don't know anything

If you went to a doctor for some surgery, would you tell her which way she should cut into you or which part of the liver to remove? No, because she is a doctor and you are not. You might know some stuff about it but you still trust their decision and you know that they don't really care about your opinions.

One of the most annoying things in Web Design is customer input - that is "design" decisions given by the person paying you to build the site. Of course, they are the customer so they get to choose right? Actually, not really. If you let a customer walk all over you, then you probably have an issue with assurance. If they don't believe that you will do something well, they are likely to try and direct it too much. If they do trust you, then you can be nice and clear with them that they get top-level input into the general style of the web site but otherwise you will decide what looks good and what is usable because you are the experts.

We often get this wrong and it can be hard but you should be good enough at your job that if somebody is really insistent on controlling everything, you turn down their work. I have seen some shocking web sites and they have a link to some company "web design by..." and I think to myself that there is no way I would put my name on a site so badly implemented. I would not use that company purely on the basis of one terrible web site so be warned, that site that you take on because you think you need the work might also be the reason why you don't get any more!

4. Functional design is missing

This is pretty common on smaller web sites but who is designing, documenting and SIGNING OFF the functionality of the site? This relates to issue 3 because part of the problem is often that the customer does not actually know what should happen where. Why take on the job if that can't be agreed up front and signed off? Do you think it will come out in the wash because it won't! For any site other than the most basic shop stuff, it is madness to design a site with no functional design.

Employing a Business Analyst is not high on the list of priorities for a web design company, but why not? For their wages, which can be half that of a developer, you can employ somebody who is really good at simplification, spotting inconsistencies, nailing customers down to make decisions and producing something that is MUCH easier for a Developer to work from than some half-arsed drawings made on post-it notes. If you are a small company, this person could do other jobs if there isn't enough work, but actually functional design is a pretty involved job that reduces the cost of design changes, because they are made before the technology has been developed - the point at which change becomes risky.

5. No-one owns User Experience

I am shocked by how many web sites of all different qualities fail in the most fundamental way: A user needs to do whatever they need to do on your web site as easily as possible!

Your site can look amazing and achieve some slick functional goals but yet the user cannot find out how to journey through the site. This might be because of major bugs but is more likely because the customer and web company did not actually think about things from a user's point of view. Your users may have a wide range of technical abilities, ages, races or whatever might affect their ability to use your site. Just because the Developer could get round it doesn't mean that someone's Grandma will be able to.

User Experience and Design are linked because some things are potentially related to both, like the colour and size of buttons, but other things exist above the level of design. How do I lead my users through this web site? How will they know what to press? Where do they get taken after certain things happen, like adding items to the shopping basket? Where does it make sense for them to go?

Some of this is, of course, subjective, but there are still things that are just considered good practice - period. Navigation must be obvious and not too many levels deep. Button colours need to reflect the seriousness of the action - deleting an account should probably be red or orange as a warning; updating my details should be green because it is saying, "yes, you've finished". You can also carry out user tests if you are unsure; there are companies who provide this, or you can organise them yourself with local schools or old people's homes or whatever.

At the end of the day, you need someone with influence who can fight for the user experience and make sure that no other decisions impact this badly.

6. Design is not always documented

If your company is heavily biased towards Developers then it is likely that you don't like documentation. Graphic Designers are also very keen on making pretty drawings but not documenting their decision process.

A few years ago, when we first started designing our product at PixelPin, I wanted design decisions to be documented so that if a change was needed, we would know whether we were making things better or regressing. For example, we might have moved a button because user testing showed that the only way to get people to press it was to move it away from other buttons. If we know that and someone later wonders about moving the buttons, we can take a view on whether the original decision is still valid. We didn't do this and have paid the price in certain ways.

Let's be honest, designers are known for creativity, not for technical detail and documenting these things is probably not what graphic designers want to spend their time doing but actually, it is part of the process and an important part too. Branding agencies are better because they often have to justify their time and choices more clearly to their customers but as with all jobs, a critical part of being good at your job is knowing what you are doing well or not and changing something to make you do better.

Conclusion

Although we might not think we have the money to have everything the way we want it, there are two types of companies: those who strive towards the way things should be and those that don't bother trying. Guess which ones do best?

These are just some of the issues in moving from decision to product in the world of web design, and there are others. But if you actually work out in your own company who should be doing each of these things, and educate your customers that they are not just nice-to-haves or excuses to charge money, you make the process robust and the decisions transparent and agreed, and you produce a site that is attractive, usable and consistent.

What will you do?

ServerTooBusyException on Azure

A weird one with an obvious cause (once I worked it out). I had a staged and a live instance of a web service, with two URLs - one pointing to each. I wanted to make the staged web service live and did a swap in Azure. After doing this, the live web application stopped working and I got the exception above, as well as:

The HTTP service located at https://live.mycompany.co.uk/WebService2.svc/standard is unavailable.  This could be because the service is too busy or because no endpoint was found listening at the specified address. Please ensure that the address is correct and try accessing the service again later.

The weird thing was that when I swapped them back, the test web application talking to this same web service worked fine! The server was definitely too busy. I tried again just in case it was temporary but same error.

I panicked, but realised I could run the live web app on my local machine in debug mode and see what was going on. I pointed the live web app to the test web service using stage.mycompany.co.uk... and it worked fine. I then had an idea: what if I pointed it at live.mycompany.co.uk BUT used a hosts entry to point that URL at the test web service? That should be the most accurate reproduction of the live problem.

Hey presto: I got the same error, but this time I could drill down into the inner exception, which was actually a 503 error from IIS - Service Unavailable. The additional detail showed that the web app was looking for a web service with a specific host header (live.mycompany...), but since this was the test web service, it only had a host header for stage.mycompany..., so there was no web site to serve the content. Like many errors in IIS and .Net, there is some ambiguity, which is not helpful, but at least I had found the problem:

Solution

Edit the csdef file to set the correct hostHeader on the live site's binding element. I also edited a publish profile that still had the test values in it, although I don't know whether that was relevant.
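
For reference, the binding sits in the Sites section of ServiceDefinition.csdef and looks something like this (the site, endpoint and host names here are placeholders):

<Sites>
  <Site name="Web">
    <Bindings>
      <Binding name="HttpsIn" endpointName="HttpsIn" hostHeader="live.mycompany.co.uk" />
    </Bindings>
  </Site>
</Sites>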

Re-publish and voila. Relief!

Friday, 3 July 2015

The weird world of .Net Cookies and Forms Authentication

If you've ever had to do anything other than a default implementation of forms authentication in .Net, you might well have also come across some confusing bugs in your application. You might find the site still being logged in after logging out, seeming to be logged out when you just logged in and seeing auth cookies still present when you are not expecting them!

Actually, the system is quite simple but there are some things you need to understand, which will help you to debug your application and what you are doing wrong if it isn't working.

Cookies for Authentication

Firstly, you might have worked out that authentication is not really covered by HTTP. Although you can restrict access to a single resource using the Authorization header, HTTP is not designed to keep state between requests, which means you have to track people yourself. Why? Because HTTP was designed to be stateless and largely unrestricted: if I want a document, I just ask for it... the end. Of course, nowadays we have applications that function more like desktop applications and absolutely have to be able to track people across page requests so we know who they are, what they have put in their shopping cart, what their preferences are for the site, etc.

The answer is a simple HTTP concept called a cookie - a small piece of text which can be sent to a client from a server and which will be automatically sent back with each request to the same domain until either the cookie expires or, if it is a session cookie, the browser is closed.

In most cases, an authentication mechanism will authenticate the user and then put some kind of encrypted identifier into a cookie. Each time the user comes back to the site, this cookie is sent back, the site can decrypt the contents and see if the user is authenticated. If so, continue, if not, they are sent to the login page.

In fact, in .Net, what is actually put into the auth cookie is an encrypted and encoded packet which describes a System.Web.Security.FormsAuthenticationTicket. This ticket contains a number of fields including the user name, an issue and expiration date and the path used to store the cookie.

When a user logs in, this data is collected, put into an auth cookie and sent to the client. When the client goes to another page, the data needs to be decrypted, verified, and the expiry date checked to ensure the auth is still valid. If it IS, then the username is used to log in the current user and potentially to load any other data you want to load.
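
If you are handling the cookie yourself rather than relying on the built-in module, that check looks roughly like this sketch (Decrypt will throw if the value has been tampered with, so you would want a try/catch around it):

HttpCookie authCookie = Request.Cookies[FormsAuthentication.FormsCookieName];
if (authCookie != null)
{
    FormsAuthenticationTicket ticket = FormsAuthentication.Decrypt(authCookie.Value);
    if (ticket != null && !ticket.Expired)
    {
        // ticket.Name is the username stored at login; make it the current user
        HttpContext.Current.User = new GenericPrincipal(new FormsIdentity(ticket), new string[0]);
    }
}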

All good so far.

Expiry Dates

There is a problem that surfaces very easily and it relates to COOKIE expiry dates. If you set your forms authentication element in web.config to use, say, a 20-day expiry but then you create a cookie that only has a 10-day expiry, then guess what? The cookie will expire in 10 days and it will look like the auth hasn't correctly remembered the duration. If you use the built-in forms authentication, you shouldn't see this, but IF you write your own auth cookies, you MUST make sure that you set the cookie expiry to the same as the auth expiry in the FormsAuthenticationTicket to ensure they expire at the same time.
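
A minimal sketch of a hand-rolled persistent auth cookie where the two expiries are kept in step (the 20-day figure and the empty userData are just examples):

DateTime expires = DateTime.Now.AddDays(20);
var ticket = new FormsAuthenticationTicket(1, username, DateTime.Now, expires, true, String.Empty);
var cookie = new HttpCookie(FormsAuthentication.FormsCookieName, FormsAuthentication.Encrypt(ticket));
cookie.Expires = ticket.Expiration;   // same moment as the ticket, so both lapse together
cookie.HttpOnly = true;
cookie.Secure = true;                 // assuming an https-only site
Response.Cookies.Add(cookie);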

If you do not set an expiry date on the cookie then another problem can occur: Session cookies.

Session Cookies

A session cookie (not to be confused with the cookie that stores the session identifier) is a cookie whose expiry is not explicitly set. What happens if you do this? The cookie will live for as long as the browser is open. This is useful for security reasons since it helps avoid accidental issues with other people logging into your account at a later date. There are two problems with this. Firstly, with multi-tabbed browsers, you have no control over if or when the user will close the browser. They might not close it for weeks, in which case the auth ticket is still present and can be used - to avoid this, you should set the auth expiry to a sensible value in the FormsAuthenticationTicket so that even if the browser isn't closed, the auth will still expire after a suitable time.

The second problem is related to deleting cookies.

Deleting Cookies

Lots of people have problems deleting cookies, including the auth cookie. You might have called FormsAuthentication.SignOut() but for some reason, the cookie still seems to be there and when you go back to the site, you are still logged in. This problem is not specific to .Net but is a very easy problem to cause if you use session cookies for auth (either always or as an option).

You can't actually delete a cookie from the server-end. Why? I'm not sure, but the suggested mechanism is to send back another cookie with an expiry date in the past so that the browser should (and usually does) delete the cookie.

The problem you can get is that a browser can hold multiple cookies for the same site which differ by domain or type (session or persistent). What happens if you have a session cookie for auth and then you sign out? .Net will send back a cookie with an expiry in the past, but guess what? If a cookie has an expiry date set (even a past one), it is automatically a persistent cookie and will NOT replace the session cookie you are trying to delete. What actually happens is that the browser treats it as a different cookie, immediately expires the new cookie because of its expiry date, and it then looks to you like the auth cookie has not been deleted. In fact, it looks like nothing has worked correctly - and you are right!

How do you return an expired cookie for sessions then? You can't! All you can do is to send back an empty cookie with no expiry date set so that it overwrites the session cookie. You will then need to ensure that your system correctly handles a blank auth cookie.

If you allow session OR persistent cookies (perhaps via a "remember me" check box) then you need to handle both cases. One will be taken care of for you by FormsAuthentication.SignOut(); the other you need to code.

Some Code

The following code is a sign out function that calls SignOut() to effectively delete any persistent auth cookie and also sends back a blank session cookie to overwrite any session cookies. In this case, it also deletes the session cookie, although this is less important because in my case, the session can be abandoned in code and would effectively be deleted anyway.

internal static void SignOut(Page source)
{
    FormsAuthentication.SignOut();
    source.Session.Abandon();
    HttpCookie cookie1 = new HttpCookie(FormsAuthentication.FormsCookieName, "");
    cookie1.Domain = FormsAuthentication.CookieDomain;
    cookie1.HttpOnly = true;
    cookie1.Secure = true;
    source.Response.Cookies.Add(cookie1);
    HttpCookie cookie2 = new HttpCookie("ASP.NET_SessionId", "");
    cookie2.Expires = DateTime.Now.AddYears(-1);
    source.Response.Cookies.Add(cookie2);
    FormsAuthentication.RedirectToLoginPage();
}

Note: my version is a static method shared between all pages, and the RedirectToLoginPage call will put the current page into the URL as the "return url", which you might or might not want - otherwise just use source.Response.Redirect(..etc..). Also note that you must set the Secure and HttpOnly flags to match your auth cookie so that the blank cookie matches and overwrites the one that is already there.

Cookies for Session

Something else that people find confusing is the overlap between session and authentication. They are not the same and in most cases cannot be treated as consistent with each other. You can, and probably would, use session very early on in an application, before you even know who the user is. They might have selected a language, or even added items to a basket, and they haven't authenticated yet. Session is tracked in a similar way to authentication, using a cookie (ideally a session identifier which maps onto data stored internally on the server somewhere), but most of the time a user can have a session whether or not they are logged in.

So should these two systems ever interact? Possibly, but not necessarily. Some security articles suggest that if the user is still logged in, as indicated by a valid auth cookie, BUT the session is empty, implying the session has expired, then you should log the user out and force them to log back in. This is a mechanism to avoid the very easy problem of session and auth expiries that don't match. Why not make them match? Because session expiry extends every time the session is used, whereas auth doesn't by default. Is that always useful? No. If you allow the user to stay logged in for, say, 4 weeks, then their session will very likely be empty when they come back and you don't want to make them log in again. What you would need to do instead is ensure that any important session data is repopulated by the system when the user returns.
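
If you do want the "valid auth but empty session" check, a rough sketch in Global.asax might look like this (the event choice and the "LoggedInMarker" key are just one way of doing it - your login code would set the marker):

protected void Application_PreRequestHandlerExecute(object sender, EventArgs e)
{
    var session = HttpContext.Current.Session;   // null for handlers that have no session
    if (Request.IsAuthenticated && session != null && session["LoggedInMarker"] == null)
    {
        // Valid auth cookie but the session has gone - force a fresh login
        FormsAuthentication.SignOut();
        FormsAuthentication.RedirectToLoginPage();
    }
}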

For security reasons, though, it is recommended that when a user clicks "log out" you abandon the session as well as calling SignOut(). This ensures that no-one can hijack the data that might have been put into the session while the user was logged in, which they could otherwise do by going back to the same site on a shared computer and letting the system use the leftover session to populate a load of functionality that the second person shouldn't have access to.

If possible, do not store security or auth data in the session, but if you do, keep it to a minimum to reduce the impact of a session hijack.

Thursday, 11 June 2015

Adding assets folder to Android Studio project

I had ported a project from Eclipse to Android Studio and one of the things I needed was an assets folder. I keep encryption keys in it but there is no assets folder by default in Android Studio.

Solution: Right-click the module (e.g. app) and choose New > Folder > Assets Folder and it will put it into your project and into the correct place on the file system. If you want to find that place, right-click the folder after creation and choose "Show in Explorer" or whatever it says on other platforms.

Android assets not appearing in the folder

As you have probably worked out, in Android Studio, if you want to add an existing item, you can just copy it into the relevant directory and Android Studio will pick it up automatically into the project.

But not always. For some reason.

To work around it when that doesn't happen, I just hit Build -> Clean Project and it seemed to re-sync. I couldn't seem to force it any other way, but hey.....

Android app crashes with no stack trace after calling startActivityForResult or startActivity

I don't write loads of Android code and it has been about 6 months since I last did anything in anger, but I once again fired up Android Studio (HIGHLY recommended for Android development) and set about creating a new demo app to use with our PixelPin mobile app. I will write a few specific posts about things that I have seen.

The biggest issue was trying to use the app to call into an Activity I had added to the project. The debugger wouldn't report anything; the app just died when calling startActivityForResult.

The problem? Not adding the activity to the application manifest! I just about caught the error in the logcat output, but sadly my Samsung S4 mini spits out about 10 errors a second from all manner of services and apps, which is very poor in my opinion since logging is supposed to be switched off in production, partly for security and partly for usability.

I added it to the manifest, cleaned the project and it was all good! The manifest entry itself is tiny, as shown below.
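For completeness, the registration is just a child element of <application> in AndroidManifest.xml. A minimal sketch (the activity class name here is made up for illustration, not the real PixelPin demo activity):

<!-- Inside the <application> element of AndroidManifest.xml -->
<activity
    android:name=".DemoResultActivity"
    android:exported="false" />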

Wednesday, 10 June 2015

Firefox and XML Parsing Error: no element found

I kept getting this error from a newly deployed .Net web service when trying to access the SVC file.

The web service itself worked, which is unusual; it was just the .svc page, and therefore the WSDL, that didn't. However, I could still use svcutil to create the client classes, which was also weird.

tl;dr? I use a client certificate for authentication. If this is set to negotiatessl instead of requiressl in the web configuration, Firefox WON'T ask you to choose a client certificate. Instead, you get a 403 (Forbidden) response with no body, and Firefox in its wisdom decides that because it asked for XML, the response must also be XML, but a blank body is obviously invalid XML and you get the error.

When I ran Fiddler to check the request/response, it all worked correctly (obviously Fiddler found the correct client certificate and used it!), whereas Chrome was clever enough to work it out and ask you to choose a client certificate. Whether that was because of a previous call that cached the need for a client certificate, or by some other means, I don't know.

So it seems you can't test this in Firefox (without using Fiddler) and you need to use Chrome instead. Naughty Firefox.
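In IIS terms, those negotiate/require options map to the sslFlags on the system.webServer/security/access section. A minimal sketch of requiring the certificate in web.config (note that this section is normally locked at server level, so it may need unlocking in applicationHost.config before an application-level override will work):

<!-- Require SSL and a client certificate for this application -->
<system.webServer>
  <security>
    <access sslFlags="Ssl,SslRequireCert" />
  </security>
</system.webServer>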

Thursday, 4 June 2015

What you really must know before you think you should make web sites for a living

Let's be honest, it sounds easy. Write a few lines of PHP or knock up some WordPress with a theme, tinker around with it and sell the service to your customers for a few thousand dollars a pop. I've made a WordPress site in less than an hour, so even with some customisation I should comfortably finish in a week, so a few thousand dollars for that is good, right?

It would be great, except.....

It doesn't work like that for a whole load of reasons, and they are not reasons that you learn on one of those "learn to code in 30 days" courses that promise so much. All they really teach you is a few basics, but learning to lay bricks does not enable you to build a house; there are a load of other issues. Most of these come out when you are testing obscure combinations of things late in the day, when you thought you were nearly finished, and although you don't think they are very serious, the customer thinks they need to be fixed before they can possibly sign off the site.

There is also the difference between your assumption (that the customer knows exactly what they want before you start anything) and the reality: the customer has no idea about anything, including their own business processes, and they definitely have no expertise in anything web related. So usually what happens is that you think you have a whole lump of the site finished before the customer decides, "now that I can see it, I've realised that it won't work properly".

There are soooooo many examples of this happening in all sorts of trades but I see most of it in web designs and the massive gap between customer expectations and your own expectations.

So where do we need to rewind to? The first contact:

Customer: How much for a web site?
You (should say): There is no way on earth I can answer that question without knowing almost everything about your system
You (actually say): They usually cost around $2000 depending on how complex
The customer hears: It will cost $2000 unless some kind of natural disaster occurs.

Straight away, your expectations are already adrift because you want to be nice and helpful (engineers like being helpful) but you are not being helpful because you are creating a poor foundation. It would be great if developers could start acting like professionals. Could you imagine saying to an architect, "how much to design a house"? What would the architect say? So why don't we say that?

We should have a pre-packaged piece of text that describes what affects the price of a web site. We can point people to it on a web site or we could print it out and give it to them. Let them know that although web sites look simple, they are not. Small changes on the surface can have large knock-on effects in timescales, stability and most importantly: costs!

This is hard because some people will simply say, "a web site costs X dollars", and the customer thinks you are being awkward and evasive so they go somewhere else. But you already know that people say that, so you need to manage it: you need to tell people convincingly that anyone who offers a fixed price is either going to short-change them or will not allow them to make changes.

So, knowing that customers do not usually understand web technologies and sometimes don't even understand the business they are trying to help with their new web site, you have to manage that too. You should really employ a Business Analyst, who is both cheaper than an equivalent developer and also, hopefully, much more skilled at digging into what the business is trying to do. This might be a quick conversation, "I want an online shop", or it might be very complex and take several weeks (or longer). The good thing here is that even if things start getting complicated at that point, the customer can still pay you for your analyst time and walk away with something they can reuse later, perhaps after saving more money or once they've thought about what they want.

The really important thing that follows is change control. In smaller companies, we get a call from the customer who says, "I just noticed that button was a bit small, can we change it?". Of course we can, we are helpful, but what is the knock-on effect? How are we going to record or charge for this work? It seems small, but we still need to pre-empt what will really happen: the customer will ask for loads of small things without realising how quickly they add up, they will then be surprised at the bill we send them and get all funny, thinking we are ripping them off. It's a pain, but we need to manage change - it does happen. We might have included some hours in the bill for rework, agreed up-front. All changes are then requested, put into some technical detail, costed and then SENT BACK to the customer to authorise. Funnily enough, the person paying the bill is usually tighter than the people asking for the changes, so let them argue about whether the change is important now or never! The same list gets added to with every change and the billing amount increases each time, so the customer has visibility. We should do NOTHING until it has been authorised, however tempting, because otherwise we are working for free, and you will eat up enough of your free time fixing unexpected things as it is.

So then how do we finish? If they keep paying you to change things and are happy to defer the release of the site, that's their call but you have to decide before you start the project what your acceptance criteria are. At what point can you objectively say that you've finished? Again, you know that people have different opinions on things so work it out in advance, put it in a document and get it agreed. You might say that the site will work functionally on all the latest versions of browsers and IE from version 8 onwards (for example) meaning that the site is navigable and looks close to correct. You might exclude browser-specific layout issues that are minor and you might exclude any browsers that are not mainstream. If it's broken in them, it's up to the customer to pay to fix it if they think it's important enough, otherwise, it doesn't get done.

Of course, the job doesn't stop there. You have a working site and 2 days later, someone calls you up again and says that X,Y,Z isn't working. Perhaps the server disk is full, perhaps the database has fallen over but have you agreed up-front what service you are providing? You might say that you will investigate and find problems for free but fixing them will cost money except in certain cases where you have made some kind of serious error. Of course, the customer will want you to support it forever but that is probably not your main job so make sure you charge money for it (most industries give X months free cover and then you're on your own or you pay) and MAKE SURE you have agreed this all up-front so you can point to the document and say that replacing broken hard disks is not included in the price and you will charge X hours at Y dollars per hour.

This stuff seems boring but it will totally increase your profitability. The current popular way of working isn't providing cheap sites for people; it is providing inefficient, expensive sites that are a pain for the customers who order them and a pain for the developers who build them!

Learn these things, ponder them, consider the types of things that will go wrong and then put in place some kind of framework that makes it pleasant for the customer and efficient for you!

Validation of viewstate MAC failed on mobile only!

This error has probably been seen by most .Net developers at some point and the solutions seem easy, yet they weren't working on my latest site, which was showing this error only on mobiles!

ViewState is a way of storing control data in the page so that the server doesn't need to store everything for every single session. You can also store your own values in it, such as CSRF tokens and whatever else you want.

These values are all stored in a hidden form field and are signed by the server using (usually) HMACSHA1, or HMACSHA256 in .Net 4, and get posted back to the server. The server then uses its machine key to verify that the value posted back still matches the signature. If not, it assumes someone has tampered with the ViewState and shows the above error in all its glory!

So when does it fail validation without anyone actually tampering with it? When the machine has recycled/rebooted and re-generated its machine key, the signing process will obviously produce different values and the old ViewState will look wrong.

It can also happen in a web server farm since each web server will produce its own machine key and a request being posted back to a different server will fail validation. This is usually fairly consistent unless you are expecting sticky sessions where the same session gets linked to a single server but for some reason that hasn't happened.

The problem in my case was that I have a single server, and even when I recycled the app pool, it never seemed to fail from the desktop, so I assumed that the machine key was stable. In fact, from IIS 7.5 onwards, the machine key is supposed to be written to a special area that is accessible by all users, to avoid an earlier permission problem where an app pool tried to write keys to the registry.

Well, whatever was happening, the problem is that Chrome on mobile caches so heavily that even when it looks like it's refreshing, it isn't (even after typing the address into the address bar and pressing Enter, or loading it in another tab). It resubmits the same stale page with the old ViewState and that fails validation. The only way I managed to force the reload was to redirect all http to https on the server (something I was going to do anyway) and then it seemed to reload OK.
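For what it's worth, the redirect itself is only a few lines. A minimal sketch of the sort of thing I mean, in Global.asax (not the exact code from my site, and whether you want a permanent 301 redirect is your call):

protected void Application_BeginRequest(object sender, EventArgs e)
{
    // Push any plain-http request over to https.
    if (!Request.IsSecureConnection)
    {
        string httpsUrl = "https://" + Request.Url.Host + Request.RawUrl;
        Response.RedirectPermanent(httpsUrl, true);   // 301 and end the request
    }
}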

Fortunately, the solution is pretty easy in all cases: hard-code a key into the web config so that the site always uses the same signing key. This uses the machineKey element under system.web, and you can generate a key with IIS Manager's Machine Key feature or one of the many online generators; an example of the element is shown below.
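A sketch of what that element looks like; the key values below are obviously placeholders and you should generate (and protect) your own:

<!-- web.config: fixed keys so every recycle and deployment signs ViewState identically -->
<system.web>
  <machineKey validationKey="REPLACE-WITH-A-GENERATED-HEX-KEY"
              decryptionKey="REPLACE-WITH-A-GENERATED-HEX-KEY"
              validation="HMACSHA256"
              decryption="AES" />
</system.web>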

Once that is done, it should hopefully stop all instances of this occurring across deployments, but I am going to add a fairly short cache duration to the main page just to give it some extra help!

Thursday, 28 May 2015

Windows Firewall not allowing DNS queries through.

Arrrrggggghhhhhhh

tl;dr Restart the DNS service

I just spent 2 hours trying to find out what was wrong with my Firewall setup. There was nothing wrong with it.

I have a Windows Server 2012 Domain Controller which also runs DNS (but only for the Windows network; it lives inside a Linux network!). I had successfully connected one machine to the domain, but when I tried to do the same with the second, it couldn't find the domain.

The whole management story around AD and domains is horrifically complicated, not something for the faint-hearted, but I could work out a few things.

Firstly, I knew the DNS was working, including forwarders because it worked on the domain controller itself. I also knew that because the first machine had joined the domain, the domain was basically setup correctly.

I tried to compare the network setup between the machine that worked and the one that didn't, and there was nothing obvious. I started trying ping and nslookup (they get their results from different places so they can come out with different answers!) and, to make sure I wasn't being led up the garden path, I disabled the second network card onto the Linux network, leaving only one physical route via the host-only network and through the DC to the outside world.

I eventually worked out that with the firewall OFF on the DC, the DNS lookup worked correctly, but with it ON, it didn't. Easy, right? Some rule problem? I went through it all loads of times, double-checked rule settings and ports, Googled the correct rules and all sorts, but it still didn't work. I even tried switching on firewall logging in the Group Policy editor to see which rule was being hit, and it logged precisely nothing.

For some reason, after being thoroughly confused, I decided to try restarting the DNS service. Guess what? Everything started to work as expected and I could join the second machine to the domain!
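For reference, the restart itself is a one-liner from an elevated PowerShell prompt; the service installed by the DNS Server role is simply called DNS (assuming a default role installation):

# Restart the DNS Server service on the DC
Restart-Service -Name DNS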

WTF?

I have no idea what I had done to the DC to make its firewall basically block everything even though it wasn't supposed to. I also have no idea why restarting the service would make it work, but I have decided that I dislike this whole area. Look through the firewall and there are loads of services, many of which I have never heard of and which might or might not be required for something useful. Other services require a plethora of weird and unrelated ports to be opened, and on top of all that, all of the DC setup is carried out through a load of tree-view windows using old-fashioned language and millions of dialogs. There is no distinction between the settings you are likely to be interested in and the rest, and most of them have no way to restore defaults for those moments when, after trying 100 things, you eventually fix it and want to reset everything you tried in the process!

Friday, 22 May 2015

Cyber Security. It's really hard but it's also not that hard!

On December 20th 1995, American Airlines flight 965 crashed into a mountain near Buga, Colombia, while en route from Miami to Cali. The aircraft was fully functional and the crew both experienced and conscious, but a series of errors led to the deaths of 159 people - only 4 passengers survived.

The flight departed 2 hours late from Miami, first waiting for some connecting passengers and then, having missed its slot, waiting for a host of other flights to depart before another space opened up. In some ways, this was the foundation for everything that followed. Being in a rush is rarely a good thing in aviation, since it makes you skip things that you would normally do and, more importantly, it reduces the thinking time you have when unexpected conditions arise in the air. It is also serious because of FAA rules on working hours and breaks, which, although well-meaning, can have the effect of making people rush even more to avoid an embarrassing situation. For Flight 965, a further delay would have caused a knock-on delay to the subsequent departure from Cali, given the understandable but not ideal short turnaround times that airlines plan their schedules around.

The flight was largely uneventful until it arrived close to Cali and was planning its final approach. Cali, being under civil conflict at the time, had lost its radar system, so it had no visibility of approaching aircraft and relied instead on the pilots to report their position to the tower. The approach should have been largely textbook. It involved approaching a beacon called TULUA, after which another called ROZA, close to the airport, would be tracked; the aircraft would then pass the airport, turn and land from the south. The approach here, as at other airports, is critical because the airport is in a valley surrounded by 4000m high mountains and it was night time. (I think sometimes us developers feel like we are always flying at night!)

A second factor now comes into play. The air traffic controller was Colombian and spoke Spanish as his first language. There was no suggestion that his English was poor, but some confusion over the language used causes the Captain to make the first of a series of errors. The flight is told that they are "cleared" to ROZA and to report TULUA. The intention was that although they were still supposed to fly the approach as planned, they were clear all the way to ROZA (they didn't need any more permissions), but when they flew over the TULUA beacon en route, they would inform the tower so he knew where they were. The Captain heard, "ignore TULUA and go straight to ROZA", causing him, fatefully and incorrectly, to delete the TULUA beacon from his flight plan.

The tower also informs them that because the wind has died down, they can land directly from the north if they want to. Of course they want to; they are 2 hours late and that kind of thing saves some minutes and potentially averts further delays. They are, however, too high to make this a simple procedure and, against the background of being rushed, they deploy the speed brakes so that the aircraft can descend more quickly, and continue their approach.

Further confusion leads the two pilots to decide to put ROZA back into their Flight Management System (FMS) and fly to it as part of the ROZA 1 standard approach.

Entering a new waypoint is usually done before the engines have even started, from the comfort of the ground and without anything else distracting you. You can think more clearly about each decision and the FMS can warn you about any discontinuities - that is, waypoints that don't appear to connect logically and which are therefore not permitted. Do the same thing in a rush in the air and you are presented with a list of all the waypoints listed under the letter R (in this case for ROZA), and you would normally assume that the top one is the closest to your current position and therefore the correct one in this situation. It is not. It turns out that, for unknown reasons, ROZA cannot be found under R in the FMS on the 757 - it would be found only by its full name. The Captain doesn't know this and blindly selects the new waypoint, executes it (without checking with his First Officer as procedure requires) and the plane starts a strong left bank towards a waypoint that happens to be 100 miles away in the wrong direction, near Bogota.

Remember, this is nighttime and it is not always obvious when you are turning. They are also still descending and they are also still confused about what is going on.

They reach a point a couple of minutes later where they realise that something is not right with their heading, and they can't seem to agree how to reset their bearings and get back onto the normal approach. The Captain tries TULUA but can't find it on his radio, so instead tunes ROZA on his NAV radio rather than the FMS - which gives him a much less ambiguous heading. What they haven't done is keep an eye on their flight path, which has now descended so far that they are on the other side of a high mountain from Cali, but without visual references they don't know this until an alarm suddenly blasts the cockpit with a continuous "Terrain, terrain, pull up". They quickly throttle up and pull back as they have been trained to do, but it is too little, too late. The flight crashes into the side of the mountain. In a scary twist, investigators find that the speed brakes were still deployed from earlier and that, if they hadn't been, the pilots would probably have cleared the mountain!

Why have I told this story? Firstly, I find flight crash investigations incredibly interesting, but there are also clear parallels with the software industry and particularly Cyber Security.

What if I said to you that the incident above was unavoidable? What if I said that there was no practical way that the plane could have avoided crashing? I hope you would disagree. Of course it could have been avoided. In fact, 99.99999% of flights avoid this every day by following procedures, learning from other people's mistakes, working out where weaknesses lie and doing something to mitigate the risks that they carry.

How is this very different from Cyber Security? A breach is rarely caused by a single thing but by a series of events which, when added together, create the opportunity for an attacker to take advantage and for your site to come crashing down.

Sure, in some ways Cyber is very difficult because there are many different attack vectors, and many different types of attack vector. One person rarely has enough expertise to understand all of them (except in Hollywood movies) and it seems every day there is some new exploit, malware or weakness. There are also, sometimes, advanced persistent threats. These are not easy because they occur over long periods of time and exploit human factors as well as systems, but they are still things that can be quantified, risk assessed and mitigated. You can still create processes that help and don't hinder your security.

So what can we learn from flight 965 and how can Cyber be easier instead of harder?

You need knowledge. With the best systems in the world, if you do not have one person who understands the basic attack surface of each type of system that you expose to the web (hardware, software, remote desktop, web application etc.), you will not be able to have any security assurance. Even though external contractors can help you, you still must have someone in-house who understands broadly what's going on - a domain expert. In flight 965, this was the flight crew. You cannot take a non-pilot and expect them to fly a plane safely, even with everything that is known about the subject available to them.

Secondly, you need a good map. The pilots would not have been able even to attempt a night landing without their charts, including the beacons that mark waypoints to safely navigate the valley. Yet in our companies, most of us have, at best, a general idea of our systems and how they are connected and exposed (or not) to the outside and inside world. This is partly a software issue, since the only programs I am aware of that present this sort of data tend to be expensive networking tools from the likes of Cisco and HP, not the kind of software that most people can afford. There is also, sometimes, an assumption that things like network monitoring tools are non-productive and therefore a luxury. Why pay someone to maintain something that, if done properly, is never an issue? You might as well employ someone to keep an eye on your office carpets. Of course, this logic applies to many things that only betray their value when something goes wrong, and most of us are fortunate enough either never to get attacked or never to find out that we have been.

Thirdly, you need processes and procedures. Is someone allowed to spin up an FTP server on one of your web servers without any oversight, approval or risk assessment? Could you imagine if airline pilots were allowed to fly in their own way depending on what worked for them? "I always fly faster than planned to give myself some slack if I hit a delay", "I never follow that route exactly because I think it goes too close to those mountains". How are your networks wired? Do you have anything that ensures things only get connected because they have to be, not just because it is easier than buying another firewall or another web server? Do you have any code development checklists and approvals to ensure that someone - who might have all the right skills - hasn't forgotten something and opened up a hole?

The fact is, sometimes just one of these measures will be enough to stop a hack, in the same way that any one of the various links in the chain could have prevented flight 965 from crashing. Of course, it is better to have several measures - defence in depth - so that we can afford for one to break under certain conditions and rely on the others to help us. We do, however, need to know when the individual checks break so that we can see whether those measures are fit for purpose. If flight 965 had retracted the speed brakes and cleared the mountain, an investigation might still have decided that the processes for verifying flight plan changes, or even the language used between Air Traffic Control and aircraft, needed tightening up for this specific scenario.

Whatever happens, do something. Your worst processes are probably 100 times better than nothing at all, and if you have an improvement mentality, you start where you are, you learn from yourselves or others and you improve things over time.

So Cyber Security...it's not that hard!

Thursday, 21 May 2015

Sell like Steve Jobs

It is common in the startup world to compare yourself to others. Why did they succeed when we didn't? Why did they sell the company for £10M and we can't even sell something for £50?

We try to evaluate it and ask: is our team good enough? Are we in the correct location? Are we listening to customers and meeting their needs? Are we targeting the correct market?

I think lots of these come back to my previous post about the mistakes we make when we look too closely rather than taking a step back. Firstly, there is not necessarily a repeatable reason why some companies succeed when others don't. Some companies produce garbage but seem to stay in business; others have a great idea and it doesn't bite. You might call it serendipity, which is a posh way of saying luck.

Maybe you were in the right place to meet that key person, maybe you met your first large customer by chance at a conference, maybe you have some good money-men friends who made your company look good, not because it was but because that's what money men do.

So let's not worry about the whys and look at it in a different way. We need to sell like Steve Jobs. Imagine your product was made by Apple; there are certain things that you know would be true.

  1. You would be ruthless in your design. "Alright" wouldn't be an option and although you might not get everything in that you wanted, everything that was in there would be optimum at least as far as any of your customers are aware. This might require a ruthless system of employment that rewards success and fires failures - something that most of us don't like.
  2. Your user experience would be familiar and consistent across any other products that you have. People need to feel that they are buying into a bigger family, even if they are getting the cheap version of something. Remember the iPod? It was like a small iPhone.
  3. You would make sure that you know what people will buy. You might do this by various research or you might have a visionary who can just see it but once you have decided it, you will not deviate and you will never, ever, doubt the value of your product. Steve Jobs wouldn't be on a stage even hinting that the new MacBook might have been a mistake. You have put too much effort in at this stage to second-guess yourself.
  4. When you launch, you will not even think about suggesting that your product is sub-par or has features that will come later because everything that is in it has been done to perfection.
  5. You will not care about your customers' opinions on whether they would have made it differently, because you are the expert and what you have produced is quality. If they don't think they need it then they are either stupid (don't tell them that) or they haven't caught up yet with the next big thing. You will find plenty of doubters and critics but who cares? Don't pretend you can ever avoid that.

We all know that Steve Jobs was known as an arrogant man, but is there a difference between arrogance and extreme self-belief? He didn't need to please people because he knew what he was doing and did it well.

Interestingly, Apple have taken a few knocks in the past, which is weird for a company that most of us assume cannot put a foot wrong, but that's just the luck again.

So if you want success, you can do much, much worse than sell like Steve Jobs.

Why can't everybody back off?

I am not talking about people leaving me alone, I am talking about people being able to quantify a problem from the right distance. This affects software development but also affects many other decisions we make in life.

If you had a leak in a water pipe and someone offered you a bucket, you would use it temporarily but you would know (hopefully) that the only sensible long-term plan is to fix the leak. That is an example of looking from the right distance - the actual problem is the leaking pipe and not the water dripping onto the carpet.

When we look at other areas of life, however, we note very quickly that people seem unable to look from the correct distance. We get too close to something and then we either miss the bigger issue or we get too stuck in our particular view, and therefore become less and less able to make good long-term decisions.

In the UK, it is NOT unlawful to park on the pavement (sidewalk) unless you are causing an obstruction, a definition that is rarely invoked because it is too abstract. As a result, we have many pavements that are broken by vehicles parking on them, particularly heavy vehicles. So what do the councils do? They look at the symptom, broken pavements, and decide, quite rightly, that it is too expensive to try to keep them maintained. So what do they do? Either nothing, or they make some token repairs knowing full well that the pavement might be broken again within weeks. They argue, from the wrong distance, that it is an unwinnable position.

They are wrong.

When you look from the correct distance, you decide either that a) cars should be allowed to park on pavements, in which case pavements need to be designed and built to withstand the weight involved, or b) cars should not be allowed on pavements, in which case the law should be changed. (You might also decide to mix a and b in different areas.) The idea that pavements are not strong enough for cars but that cars are allowed to park on them is neither sensible nor logical; it is the outcome of an evaluation made too close to the symptom.

In the software world, many organisations have had, and continue to have, massive cost and time overruns on projects (not quite software, but the F-35 programme is an obscene example of this). Why does it happen? Because the people who order these systems foolishly believe that a) they need the system and b) the contractor is competent, therefore it will all be fine. Why is this foolish? Because we all know of many projects that failed miserably, and many of them were probably run by people much more competent than you and me, yet they still failed. The mistake is that the issue is not viewed pragmatically, from a distance that asks, "why did these projects actually fail?" I don't think that is a hard question to answer. If you ask most people, they will tell you what went wrong: unclear requirements; requirements that change because the project runs for years; lack of expertise from the people in charge of design or requirements; inventing "new tech" that is unknown (sometimes you don't even know if it will work); and pricing based on gross estimates.

It happens time and time again and the question is, "How do we act differently so that the outcome is different?" I don't know who said that madness is doing the same thing in the same way multiple times and expecting a different outcome.

Maybe the answer is that projects should never last more than 12 months. Maybe, if something is larger, it needs to be developed in stages, each of which is a deliverable in its own right. Why wait for the F-35 to build a new super helmet? Design and build one; if the F-35 dies, we use it on the next plane. New engines? Same. Maybe the answer is that the whole way a project team works needs to be reduced and simplified. Maybe a domain expert needs to be involved in every decision-making process rather than being brought in only at the design stage.

All of these decisions can be taken if we are able to recognise that we are too close and take a step backwards, and the same is true in software development and other singular job roles. Is what I am producing of a suitable quality? If not, I definitely need to recognise that, and then I need to ask what is going on. Am I trying to do too much? Am I lacking process or independent review? If I do it again, will it be better for what I've learned? One of the problems is that we don't teach people this; it is something some people know instinctively, something others have learned the hard way and something some people never get - but they still write code!

I don't know what it is about humans that we seem to always get this wrong. Are we too egotistical? Perhaps we care too much and want to make something work at all costs, even if it is taking too long and costing too much.

I just wish we would learn how to back off.

Wednesday, 13 May 2015

Encryption - so necessary but so dangerous?

The public are an alarmist lot. Despite the fact that most people spend their time online sending pointless messages to each other on Facebook or reading rude jokes, as soon as there is the possibility that GCHQ or the NSA can read their data, everyone is up in arms. How dare they read about my visit to a Birmingham shopping centre or my latest status update involving a large glass of beer.

So we end up with quite a wide deployment of SSL/TLS. Not such a bad idea. For most of us, especially those in business, the cost of an SSL cert, although much greater than it needs to be, is fairly cheap in the scheme of things. The server overhead is minimal and everyone's happy, right?

No. Of course not.

Enter the Systems Engineers.

You see, TLS is all OK and everything, but what happens after the TLS is terminated at some server somewhere? How is the data transmitted around the data centre or stored on disk? Your password is usually sent to the server to log in, so even though it isn't stored in a readable format, it could still be intercepted reasonably easily while you are logging in. Even if you have it all set up wonderfully well, I'm pretty sure the NSA can sign their own certificates and could presumably man-in-the-middle most sites without most people spotting it, so what can we do?

Apparently we should encrypt everything at rest and encrypt our comms end-to-end from the browser itself right through to some trusted other end.

So now we're good?

No. You see, theoretically, if someone has a computer the size of the moon and 50 years, they could potentially crack your Facebook traffic because it uses TLS 1.0. TLS 1.0? You still use that ancient, weak, crackable scheme? You might as well just send stuff in plain text. Or so some of the Systems Engineers would have us believe. The reality, of course, is that most of these theoretical weaknesses are so hard to exploit that they are only of interest to people who have things of real value to crack. No-one is going to spend a year trying to access my Facebook account - although they will spend that time trying to access Lockheed Martin or Boeing.

We are running to stand still, the perfectionists are dictating good-practice and we all get sucked up into it. Including me.

I needed to re-install OS X on my MacBook (although according to one blogger, I should never have to do that because it has never been necessary for him!) and I was REALLY careful to back up everything and to double-check that the backups at least appeared to be on the backup disk, a Western Digital NAS box. It all seemed good and, after taking a deep breath, I took the plunge and reinstalled Yosemite.

And then I accessed the Time Machine backup. Which I had encrypted, of course. I'm not a privacy junkie, but I have work code on my home laptop and, if the NAS were ever stolen, I would be more comfortable knowing that it's encrypted.

But I couldn't remember the password. I tried all the usual ones and nothing. I can't even get the "hint" because that's stored in Keychain and the MacBook has been reinstalled. Apparently there is no recovery process, because these systems are designed to be perfect.

And that brings me to my point. Why do we insist on perfect security? Our houses are not perfectly secure by a long shot and they are much more likely to be attacked than any of my software systems. My front door key can be easily copied, the lock could be bumped or snapped fairly easily, the windows could be shattered and an intruder could get in so easily, but somehow I live with that risk. In the computer world, though, we are told that this risk is not acceptable. We are not taught, certainly not by the vocal engineers, to risk assess what we are doing. Things like encryption key rotation are all very well, but are they really necessary or do they just increase the chance that we end up stuck with a lost key and an inaccessible system?

Wednesday, 6 May 2015

Why has Google lost the plot?

I am not a designer but, like most software developers, I know there are certain simple rules that you should adhere to when creating web applications. Many of them make sense visually, such as using a few complementary colours and having consistent font sizes; others are just practical, such as consistency with other web sites, obvious navigation and not overloading the user with too much information.

I would expect a company like Google to be top of their game at this, especially with their continuous release of new "Beta" applications, whether that is Google Docs or the new Contacts interface.

I am wrong. Google seem to have lost the plot. Let me give you some examples. I opened my GMail contacts to add a mobile number for someone. I didn't know if they were in my contacts or not, so let's start by typing into the search box:



Has this worked? It looks like it's still waiting, but this is multi-billion-dollar Google's attempt at a "no results found" page.

OK, so he doesn't exist, I delete the name and go back to the main contacts page.



I've blacked out the details but you get the idea. I need to add someone new. Remember what I said about obvious navigation? Where am I drawn to? Top left, where the menu is? Nope. Top right? The Google shared toolbar? Nope. Tabs in the middle, near the search box? No - it's that lone button in the bottom-right, by itself! The button which is not near any other navigation! Great. Let's click that.


Now you get this extremely bland form, which is mostly distracting since, in this case, I have only entered a name, which you can barely see amongst the noise. I also have a mobile number. Now where do I put that? At first I thought it wasn't on the form, but it is: underneath really useful things like nickname and job title. So you enter it and then what?


Where's the Done button or the X to close the dialog? Nope. You have to press the "Back" arrow. Of course, because "back" is not confusing at all. It couldn't possibly be taken to mean going back and abandoning the edit.

Now, I'm more than happy for certain trend-setters to push boundaries, and maybe this is Google's attempt at Material Design, which makes it all nice and cross-platform, but honestly? This is crap. It beggars belief that the number of people who work at Google cannot produce something much more consistent and instinctive after spending what must be a vast amount on developing and testing this new Contacts form. It honestly looks like something a design or development student would produce and then be told by their teacher to go and make it good!

Why, Google? Are you really the new Microsoft, existing in your own version of reality where you don't care what is good, just do whatever you feel like, and people have to use it because you have them hooked on GMail?