Monday, 25 November 2013

Public/Private keys, OpenSSL, RSA, PEM and DER

It's no wonder that many people stay away from security in their web apps. Trying to do something that sounds simple like "create a public/private key pair" ends up being 30 minutes of Googling. This is because the various types of keys and certificates have confusing variations, differences and similarities, and not all systems support all types. Add in things like PuTTY, which works great but has its own key format, and it is pain on a stick. Anyway, I already had openssl downloaded, so to create the "key pair", simply run:

openssl genrsa -out mycert.pem 2048

Now, the first confusion is that although you will find lots of people saying that this generates a public/private key pair, you will notice that it only outputs one file. This is because a public and a private key are not really two distinct entities; rather, the public key is a subset of the information contained in the private key - enough to encrypt data but not enough to decrypt it or to work out the private key. For this reason, you sort of have a key pair, but really it is just a single private key in a single file. Soooo. We probably want our public key to exist separately as well, to give to people who want to encrypt data for us, so we need another command to tell openssl to export the public key and, if required, to convert it from the base-64 encoded PEM format into the raw binary DER format that some systems prefer.

openssl rsa -in mycert.pem -outform der -out mycert.pub.der -pubout
openssl rsa -in mycert.pem -outform pem -out mycert.pub -pubout

This produces another file (or two) which contains only the public key part of the private key; in the first case above it is DER encoded rather than PEM, while the second example does the same thing but keeps the exported key in PEM format. This is enough for you to start your public-key encryption process, but when you search for help on the topic, be aware that as well as public/private keys you might also be looking at certificate requests (like you send when you want an SSL certificate), a certificate, a certificate chain, or some combination of the above. For this reason, there are various switches to openssl, so keep your eyes peeled before trying something out. On the other hand, things tend to fail quite quickly if you have done something wrong, like trying to convert an exported public key from one format to another - you apparently have to do this from the original private key instead!
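
If you want to check that the exported keys actually work, you can encrypt a small file with the public key and decrypt it again with the private key. This is just a quick sanity check, assuming the file names above and an OpenSSL build that includes the rsautl command (message.txt is any small test file):

openssl rsautl -encrypt -pubin -inkey mycert.pub -in message.txt -out message.enc
openssl rsautl -decrypt -inkey mycert.pem -in message.enc -out message.dec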

Monday, 18 November 2013

Why we cannot have perfect encryption and why we can't trust anybody

Something has only really dawned on me recently through the whole NSA spying issue: what they (and possibly others) can and can't know about me from the internet traffic I create and consume. Part of my job is writing software that is secure so people can trust it, so I am very interested in what I can achieve in terms of security in my own software and possibly in other products that I could produce. Let's be honest, there is much money to be made in good security products, so why not? This led me to my first question, which set off a whole chain of events.

1) How is my data secure from snooping?

Firstly, I use SSL for all communication links, but we know SSL fails in various areas. There are a couple of known exploits - the BEAST attack and the RC4 attacks are possibly the most famous, or at least the most famous of the ones that are possible even on newer implementations of TLS/SSL. This brings up another important issue: "Use TLS 1.2" they all say, but we cannot. Quite simply, too many end-users are still using older browsers that don't support it and, amazingly, even some new browsers do not support it out of the box: Browser support. If we are honest, many of these attacks are highly unlikely, but TLS/SSL is still potentially insecure.
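
That said, where you control both ends of a connection (one of your back-end services calling another, for example), you can at least force the newer protocol from code. A minimal .Net sketch, assuming .Net 4.5 where SecurityProtocolType.Tls12 is available:

// Force outgoing HTTPS connections from this process to offer TLS 1.2 only
System.Net.ServicePointManager.SecurityProtocol = System.Net.SecurityProtocolType.Tls12;

Public-facing sites don't have this luxury because, as above, you cannot dictate what the browsers support.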

Secondly, what about the ciphers I use, both in TLS/SSL and also for the data itself? Well, everyone knows AES256 is amazingly hard to crack (or do we?), but what about the ciphers used in SSL? Again, the very sloppy organic growth of SSL and the need to support the widest range of web servers and browsers/clients means a frightening number of cipher suites are possible during an SSL handshake: the web server supports a set, the browser supports a set and they negotiate which to use (my guess would be the first match regardless of any type of ordering). The only way to reduce these is to use newer versions of TLS and browsers but again, that is not possible for public sites that want to support everyone. Even more frightening is a classic man-in-the-middle attack that intercepts the cipher negotiation and tells both ends to use the weakest protocols available in order to make interception and decryption much easier.

We can't really trust SSL as it stands. And back to AES256: I only trust it because that is the common consensus. I don't personally know whether it is solid, whether large governments have ways to make the cracking easier, or whether specific implementations have back doors put into them by corporations, hackers, governments or anyone else. And open source? Great in some ways, but how many people are really watching all the commits and spotting exploits that have been inserted by someone or other?

2) If I assume my SSL and my AES256 are secure in themselves, how secure is the key?

Well, many people say that encryption should be secure even if everything about it except the key is known. It's a bit like letting people examine the door to the vault and assuming that any weaknesses will be pointed out (they could be seen but not pointed out!), but if someone has the key, it is game over. This is part of the motivation for cryptographic hashing, but that just opens other problems instead.

So how can we use our symmetric key? Well, there are two options: we generate one using a key generator, or we stretch a password to create the key dynamically (see the sketch below). Both of these, of course, move the key problem to a storage problem.
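
To make the second option concrete, here is a minimal .Net sketch of stretching a password into an AES256 key using the built-in PBKDF2 implementation (the iteration count is illustrative and the salt is assumed to be random bytes that you store alongside the ciphertext):

// Stretch a password into a 256-bit key suitable for AES256
public static byte[] DeriveKeyFromPassword(string password, byte[] salt)
{
    // 10,000 iterations is purely illustrative - the point is to make each guess expensive
    using (var kdf = new System.Security.Cryptography.Rfc2898DeriveBytes(password, salt, 10000))
    {
        return kdf.GetBytes(32);    // 32 bytes = 256 bits
    }
}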

Assuming the key generator and the password stretching algorithm are sound (again, most of us can never know this - it is easier to show something isn't true than to prove it is), the key itself, or the password used to create it, needs to be stored somewhere. A database? There are lots of ways in which that can be accessed. With cloud computing in particular, you are often sharing a database server, which means that someone (not you) has a super admin account and could access ANY data in your database. If you are using a cloud service, DO NOT store keys in a database.

We could store it on disk somewhere but again, if the storage is shared, other people potentially have access: another client (perhaps by mistake), a dodgy sys admin like Edward Snowden who (rightly or wrongly) abuses his access privileges, or indeed a company which accidentally or purposely allows governments to access the data for "legitimate reasons". The problem with keys is that they cannot all be encrypted; otherwise you need another key which cannot be encrypted - eventually something is stored somewhere "in the clear".

The question is, what can you use to secure your keys? Servers and many higher-end modern PCs have a hardware security module which theoretically allows secure storage of keys in a way that cannot be obtained by reading hard disks or whatever, but these are, of course, designed more to protect against stolen computers. Malware running in software is still able to use these keys in the same way the system itself needs to access them. Again, can we trust the corporations not to put back doors in? Can we trust them not to make a mistake in the design which allows access? Do we know that governments cannot remotely log in to our servers and get these keys?

In cloud computing systems, keys are stored in the infrastructure. Again, of course, we have no idea how they are stored and whether they are secure. If I use encryption in my Azure system and the key is the SSL certificate I uploaded to Azure (along with its private key password), do I really know that this is secure? Of course not. The system does need access to these keys to share them amongst web instances for TLS/SSL connections, but at the same time, anyone in the world could have gained access to my system and I wouldn't even know.

Conclusion
So what? Clearly, we are a long way from this Utopian idea that encryption can buy us the privacy some people so eagerly desire. I have nothing to hide in one sense but I also do not relish the idea of armies of complete strangers analyzing my digital footprint to find out about me.

Before everyone says we need to go back to the old days, remember that trust has always been an issue and I suspect always will be. Chelsea Manning was trusted with US secrets and blew the whistle; the same goes for Edward Snowden and, I'm sure, many others we will never know about, so the idea that the communication system/storage is the only risk is, in itself, flawed. As long as we need to communicate with someone else, there is always the chance that the other person is the security hole.

That being said, there are some things we can do to help protect ourselves but only to a point.
  1. Know where your security risks are and the level of risk. For instance, if you are relying on TLS/SSL to protect communications, that is reasonable, but record it somewhere so that if it were ever badly broken, you would immediately know where you are exposed. Sadly, I suspect most people have never done a basic risk analysis of their system.
  2. Stick with the most widely regarded patterns and modes of encryption and update regularly. For instance, why are people still using MD5 for password hashing? Also, understand why you choose these things - why is bcrypt probably better than SHA-256 for password hashing? (Because it is deliberately slow.)
  3. Never, never, never invent your own crypto-systems, ciphers, key derivation functions etc. unless they go through peer review. Just because MD5(data) + MD5(data) seems like it will be twice as strong doesn't mean it is! The people who write these things are either stupidly clever or have put in a LOT of time, effort and research to ensure they are solid.
  4. Think twice before using shared hosting servers if you are selling a security product. You can still use a data centre but if the machines are yours, you know what is on them and you know, to an extent, that they are not accessible by people who shouldn't be accessing them.
  5. Linux servers are more likely to be secure against back-door remote access than Windows, but I think if Windows really did have a back door, people would know by now with the various pen-test tools available.
  6. Rotate keys over time so that if a key is obtained, the chances of being able to decrypt older or newer data are reduced; also, the attacker would need to steal the key and the data at the same time.
  7. Be very aware of social attacks (possibly the most common attack vector) and the amount of access people have to your systems. If you are a startup, does the sales director really need access to your code just because they are on the same network? Just by doing very basic access controls, you could prevent many attacks.
  8. Don't base your entire business case on the fact you can trust any system. Even modern "browser based" encryption systems that promise the earth are mostly quite weak due to weaknesses in the design of JavaScript and browsers (for all the same reasons as before).

Wednesday, 13 November 2013

What every developer should know about password hashing before writing it!

Another leak here: Macrumors leak, and I start getting all annoyed again about how often online systems are NOT using best practice when it comes to password storage. It sounds like the system was hacked via account privilege escalation but it doesn't really matter. If developers do not have a proper understanding of password hashing, they should not be allowed anywhere near a password system, and they should certainly not be writing their own.

Sadly, much of the data on the web on such matters is inaccurate, overly opinionated or out-of-date, and this is not always easy to notice because most forums do not expire content related to technical information, even though they probably should. Anyway, although I'm sure there are plenty of other articles out there about password hashing, some written by people who know much more than I do, I want to write something by way of an introduction to password hashing, how it is used and why. Hopefully, when people understand, they will stop following bad practices, and even if we see breaches in the future, users will not need to be so worried about them.

I guess before we even start, web apps themselves should be secure and follow best-practice guidelines. For instance, using stored procedures in a certain way would mean the web app cannot access everyone's passwords even when hacked. Likewise, the connection should not be made using the sa user, which can do anything on the database anyway. These practices are beyond the scope of this article, but in my opinion another resource that every developer should be familiar with is https://www.owasp.org, who collect and maintain security guidance so other people do not need to do all their own research. They have a whole wealth of information on security and you should know where it is.

Passwords

Right, we all know what passwords are. We hopefully also know that most people use the same password on most of the web apps they are members of. That means that if only one site spills the beans, the other sites are vulnerable. Your web app is not an individual; it is part of a community and you should take that responsibility seriously. The best way you can manage passwords is to use a single-sign-on service like Google, Twitter or PixelPin so that the password issue, including how to store passwords securely, is in the hands of companies who specialise in it and haven't glued two pieces of paper together and written your password on it in crayons like some sites appear to!

Doing it your way

I know many of you are saying, "but I don't want to use a 3rd party for reason xyz". In most cases, I would disagree with you, but let's assume that you really need to reinvent the wheel and make users create a totally new account on your site.

Firstly, I hope that you can easily understand why you should never, never, never, never, never store user passwords in plain text. There are so many ways in which database contents can be leaked that this would be criminally negligent. A rogue worker, someone at a data center with machine access, a hacker, another customer who has shared access, a careless developer, a vulnerability in a framework: all of these are risks and a plain text password is plain wrong! What's that? You need to store passwords in plain text so you can send them to users who forget? Please do not do that! Email is insecure in many ways, including people reading over your shoulder, but also many connections are not encrypted and the password is there for anyone watching the network.

Encrypting passwords (using symmetric encryption like AES or DES - encryption that can be decrypted) is another issue that has surfaced recently due to the Adobe breach. The general received wisdom is that although the encryption itself might be very solid (hopefully it is not using something old and weak like DES), an attacker might well have access to the decryption key, which makes every single piece of data decryptable. In most cases, you should not use symmetric encryption for passwords; something called hashing is preferred. For one thing, doing hashing properly, combined with a strong password, makes cracking the password as good as impossible for anyone.

Encrypting Passwords

So you obviously understand that if a password is not to be stored in plain text, it needs to be 'encrypted' in some way. The three broad schools of encryption are called symmetric encryption, asymmetric encryption and cryptographic hashing. For the purposes of this discussion, the first two are the same in that encrypted data can be decrypted with a key and, as discussed above, this raises the danger of the key being accessed/stolen by an attacker, at which point none of the data is safe any more. Hashing is a little different in that the password is transformed and stored but cannot be decrypted, meaning that theoretically, even if an attacker stole the hash, they wouldn't know what the original password is. The application is also unable to decrypt it, so how can that be useful?

Hashing Passwords

The trick to using hashing is a property of a hashing algorithm (there are various algorithms available, which we will discuss later): if you hash the same data using the same algorithm, it will ALWAYS produce the same output. This output is a "hash", a series of bytes of a length that depends on the algorithm used, most often seen as a hex or base-64 encoded string, which makes it easy to read and transmit across channels like the web that are not binary friendly.

When a user creates an account, you store the hash of their password. When they log in, you hash the password they type and if it matches the hash in the database, they typed the correct password; otherwise they didn't. Theoretically, because the algorithm produces a fixed-size hash, more than one password might create the same hash (a "collision"), but this is so unbelievably unlikely that it is not considered a problem. In fact, if an algorithm is found to have too many collisions, it is discredited and not used any more.

As a basic example, if you hash the word "password" (without the quotes) using the common algorithm known as MD5, it will produce a hash that looks like this: 5f4dcc3b5aa765d61d8327deb882cf99. Even though I told you that this hash was produced from "password", there is no known way to directly compute "password" from this hash. It is considered a one-way function. This is a bit like multiplication in maths, where it is very easy to multiply two numbers to produce a result but much harder to work out what the factors were just from the result.
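
You can reproduce that hash yourself; a small .Net example (the hash is raw bytes, shown here hex-encoded):

using System;
using System.Security.Cryptography;
using System.Text;

class Md5Demo
{
    static void Main()
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes("password"));
            // Prints 5f4dcc3b5aa765d61d8327deb882cf99
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant());
        }
    }
}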

At its most basic level, hashing already adds some security because if someone read your database and saw that your password was stored as 5f4dcc3b5aa765d61d8327deb882cf99, they would not immediately know that your password was "password".

Basic Hashing Weaknesses

There is, however, a problem with just using a pure hash. The weakness is that the hashing algorithm will always produce the same result for the same input, so if I hash a load of common passwords (including "password") and store the hashes in a big lookup table, when I come across 5f4dcc3b5aa765d61d8327deb882cf99, I can look it up in my table and see that it was produced from "password".

Ineffective Improvements

Developers are a strange breed and sometimes think they understand things that they don't. For instance, somebody thinks that rather than using a common hash algorithm, they will do something strange like invent their own, either from scratch or based on other algorithms. In most instances this produces something that is no better than a standard algorithm and in some cases much weaker. The amount of time and work that has been put into attacking the common algorithms is what proves how strong they are. If your home-made algorithm has not been reviewed in the same way (which it won't be!) then there is no way you will write anything that is any good. Please don't ever invent your own mechanisms; they are not needed, since the correct way is very easy to do.

Another ineffective improvement is to add a fixed "salt" to every password before hashing it, the idea being that the salt is then added to the typed-in password at login and the hashes are compared in the same way as before. The thinking is that if I add a salt of, say, "thisismysalt" to the end of "password" before I hash it, I will no longer get 5f4dcc3b5aa765d61d8327deb882cf99 but 1d63491d7f52a91da41213205b422062 - in other words, when the attacker sees it in the database, it won't match their lookup table! Win? Nope, sadly not. If the attacker gets enough passwords, they can assume certain things, like the most commonly occurring hash probably relating to one of the top 10 most common passwords such as "password123", "letmein", "password" etc., in which case it will not take long to work out the system used for hashing and what salt is used: all the attacker has to do is start hashing various combinations of the top 10 passwords with data after them and perhaps before them. Some cracking systems can perform billions of such hashes per second, so we might only be talking about minutes, and as soon as one breaks and shows that the password was "passwordthisismysalt", it would not take an expert to realise how the passwords are constructed, at which point the attacker simply re-hashes their password list with "thisismysalt" on the end and job done!

Another popular method is to perform multiple hashes on the same data. Rather than running MD5 once, you run it in a loop, hashing the hash for, say, 1000 iterations. This adds some amount of time to the process, both for the attacker and for the system itself, but it still doesn't really help when the attacker has enough data. In the same way as with a fixed salt, they can try various combinations of hash iterations to find what they are looking for, since they can still assume that the most common hash probably relates to one of the top 10 most common passwords.
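
To be clear about what is meant (and purely to illustrate the pattern being criticised, not to recommend it), the iterated approach typically looks something like this .Net sketch:

// "Hash the hash" 1000 times - shown only to illustrate the technique described above
public static byte[] IteratedMd5(byte[] data, int iterations)
{
    using (var md5 = System.Security.Cryptography.MD5.Create())
    {
        byte[] result = md5.ComputeHash(data);
        for (int i = 1; i < iterations; i++)
        {
            result = md5.ComputeHash(result);    // feed each hash back in as the next input
        }
        return result;
    }
}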

Using Salt Properly

If you are going to add a salt to a hash, at a very minimum it should be a different salt per user. Ideally, it should be random, relatively long (i.e. hard to guess) and never re-used, either across users or for the same user (otherwise, if historical data was eventually cracked, the information could be used to attack newly hashed data). PHP now includes very easy-to-use password hashing functions and these should be used if they are available. They are shortcuts to using bcrypt directly, which makes things much easier for people who do not understand all the options. The defaults are good, but they can be strengthened over time. .Net has security classes that perform the same functionality, depending on what exactly you want.

The beauty of a "variable salt" is that you no longer have patterns of data in your database, you can no longer determine which of the passwords relates to "password" and which relates to "thisisaveryhardpasswordtocrack becAuseitislongandh@sweirdcharacter£init" this makes the work much harder even, as is the case with bcrypt, the salt is stored alongside the password hash.

Is Pepper Good

Another tactic that is often cited, but which should not be confused with the purpose of salt, is pepper. The idea is that pepper is a deterministic way of introducing additional data to the password before hashing, but it is not stored with the hash or salt in the database, so it is unknown to an attacker who only has the database data and not the source code. It could be something like the user id transformed in some way - reversed, upper-cased and then perhaps with a long fixed string added to the end - so that it is still different per user but does not have to be cryptographically random like salt should be.
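
Purely as an illustration of the idea (this is the hypothetical transform described above, not a recommendation - the constant and method names are made up), pepper might be applied like this in .Net before the salted hash is computed:

// Hypothetical pepper: per-user data derived from the user id plus a fixed string kept in code/config, never in the database
private const string FixedPepperSuffix = "a-long-fixed-string-kept-out-of-the-database";

public static string ApplyPepper(string password, string userId)
{
    char[] reversed = userId.ToCharArray();
    System.Array.Reverse(reversed);                                               // reverse the user id...
    string pepper = new string(reversed).ToUpperInvariant() + FixedPepperSuffix;  // ...upper-case it and append the fixed string
    return password + pepper;                                                     // this combined value is what then gets salted and hashed
}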

Defeating attackers with size and speed

There are, above these other techniques, two ways to defeat an attacker. The first uses the size of an algorithm's output to make it much less likely that an attacker can cover the required number of "guesses" before they make a match. For instance, MD5 only produces an output of 128 bits, which means that guessing every value of MD5 would take about 3.4 x 10^38 attempts, and an average hit would take half of that, 1.7 x 10^38. Currently, this sounds like an impossible task, but with some other data to hint at the answer and enough computing power (in excess of 7 billion guesses per second), these attacks become quite practical. Compare this with something like SHA-512, which produces 512 bits of output: the number of possible values is now about 1.3 x 10^154, massively more than MD5. SHA-512 is also slower to guess (circa 200M/second), so this is a good way to slow down an attacker.

The alternative, arguably cleverer, way is to slow down the process deliberately. Imagine if your hash algorithm took, say, 300 milliseconds to compute. This would not be noticeable when one user was logging into a system, but it would slow down an attacker, who could no longer try millions of guesses per second but just 3 per core per second! bcrypt, which is based on Blowfish, is designed exactly for that and in many ways is a great defence for passwords. The only major problem is that the slowness comes with a memory overhead which makes it unsuitable for low-memory devices, although even that is likely to become less of a problem over time.

Recommended Setup

Whatever I recommend is likely to end up being controversial but I might as well be brave! If you are using PHP, the following code is all you need to store a password and check it again afterwards:

// Note, requires PHP 5.5 Look at each function here http://us3.php.net/manual/en/ref.password.php to find equivalent code for earlier versions
function signup($username,$userpassword,etc...)
{
   // Do whatever checks are needed
   $hashedPassword = password_hash($userpassword, PASSWORD_DEFAULT);

   // Save $hashedPassword to database
}

function authenticate($username,$userpassword)
{
   // Get hashed password from database where username = username into $row
   if (password_verify($userpassword, $row->password))
   {
      // Success
   }
   else
   {
      // Failure
   }
}
One of the great things about these functions is that you can upgrade the "cost" or algorithm of the password hash as you go along. You can then test existing database entries with password_needs_rehash() to see whether they are out-of-date. If one is, check that the password entered matches and, if so, rehash the entered password and update the database!

.Net is a little different in that there is no built-in Blowfish implementation. You can do one of two things: bring in another library like BouncyCastle that provides bcrypt, or use PBKDF2 instead, which is designed to do a similar thing to bcrypt but is not so cost-intensive. You can achieve it like this:
//Disclaimer: I have not used this code so it might not work exactly out of the box. I use PBKDF2 to create encryption keys using code like this
public void signup(String username, String password)
{
   // Do whatever checks are needed
   // Generate random salt using your own function or something like System.Web.Security.Membership.GeneratePassword
   var rfc2898 = new Rfc2898DeriveBytes(password, System.Text.Encoding.UTF8.GetBytes(randomSalt));   // Salt must be passed as bytes; the default iteration count is 1000
   var hashedPassword = rfc2898.GetBytes(32);
   var combinedHash = randomSalt + "$" + Convert.ToBase64String(hashedPassword);    // Use dollar to make it easier to split later

   // Save the username and combinedHash in the database
}

public void authenticate(String username, String password)
{
   // Get hashed password from database where username = username
   // Split hashed password from database into dbpassword and dbsalt using the $ symbol
   var rfc2898 = new Rfc2898DeriveBytes(password, System.Text.Encoding.UTF8.GetBytes(dbsalt));
   var hashedPassword = rfc2898.GetBytes(32);        // This should match what is in the database if successful

   if ( Convert.ToBase64String(hashedPassword) == dbpassword )   // Compare as base-64 strings; == on byte arrays only compares references
   {
      // Success
   }
   else
   {
      // Failure
   }
}
Edit: Thanks to Duncan Smart for pointing out that MS already do what I was attempting above here:  http://msdn.microsoft.com/en-us/library/system.web.helpers.crypto.hashpassword and http://msdn.microsoft.com/en-us/library/system.web.helpers.crypto.verifyhashedpassword

Conclusion

If you follow the suggestions above, then what an attacker gets is a randomised, salted hash. They have few if any clues (they may or may not guess what hash algorithm you are using), in which case they would have to resort to some kind of brute force. Even if they had a known plaintext (one of the hacked hashes is for a password they know), they would have to spend some time working out what system is in use, and even if they eventually work out you are using bcrypt with 100 iterations and they know the salt from the database, they would have to construct a brute-force attack against each hash, one at a time, and they may or may not choose a hash that was generated from an easy password - all of which significantly slows the attacker down. It would not prevent them from cracking any passwords, but it would certainly make them think twice about whether the effort was worth it. If you added some pepper to the system and they did not have the code, they would probably not be able to crack any passwords at all.

AES Encryption/Decryption in Android using SpongyCastle

At the moment, Android has handicapped the BouncyCastle security library included with the platform so that it doesn't support some of the more common and useful encryption algorithms. I assume this was done for performance reasons, but possibly also because older handsets might not support some of the algorithms? Anyway, I don't care and I want AES256 encryption.

I found out that the writers of BouncyCastle have another project called SpongyCastle, which is the full library repackaged for Android and includes all the features I wanted. It is easy to download from here. Once you have the library installed, it works much the same way as BouncyCastle but with different namespaces. Below is an example of a module that includes encryption and decryption using AES256 with a randomly generated key, which is stored in private memory on the device, and the ability to use this key to encrypt or decrypt data which can be stored in other files.

Any comments are most welcome on whether I have got this correct for Android. Note I can't promise that I have used the most efficient or suitable cipher types, but this does work! Also note that the initialisation vector, which ensures that encrypting the same data multiple times does not produce the same ciphertext, is prepended to the encrypted data and stored with it; it must therefore be stripped off before decryption.

I am unsure as to the best way to read in data from a file of unknown length. The method I have used below is to make a preset-sized buffer (larger than the data I am storing) and then, once it is read in, use the length to create a new buffer and copy the data over. This ensures that the buffer length is correct; otherwise the decryption gets confused about the padding information.

package org.MyCompany.MyApp;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

import org.spongycastle.crypto.engines.AESFastEngine;
import org.spongycastle.crypto.modes.CBCBlockCipher;
import org.spongycastle.crypto.paddings.PaddedBufferedBlockCipher;
import org.spongycastle.crypto.params.KeyParameter;
import org.spongycastle.crypto.params.ParametersWithIV;
import org.spongycastle.util.Arrays;

import android.content.Context;

/**
 * Handles the security aspects of the app such as encryption and offline storage
 * References taken from http://android-developers.blogspot.co.uk/2013/02/using-cryptography-to-store-credentials.html 
 * @author Luke
 *
 */
public class SecurityModule 
{
    /**
     * Store the given serialized data onto the local device
     * @param myData The serialized data to encrypt and store
     * @return
     */
    public static Boolean StoreMyData(String myData)
    {
        SecretKey key = CreateOrRetrieveSecretKey();
        if ( key == null )
            return false;
        
        WriteData(encrypt(myData.getBytes(), key.getEncoded()), "mydata.ser");
        
        return true;
    }
    
    /**
     * Get the cryptographically securely stored data for this application
     * @return
     */
    public static String GetMyData()
    {
        SecretKey key = CreateOrRetrieveSecretKey();
        if ( key == null )
            return "";
        
        byte[] data;
        try 
        {
            data = ReadData("mydata.ser");
        } 
        catch (IOException e) 
        {
            e.printStackTrace();
            return "";
        }
        
        byte[] decrypted = decrypt(data, key.getEncoded());
        return new String(decrypted);
    }
    
    /**
     * Encrypt the given plaintext bytes using the given key
     * @param data The plaintext to encrypt
     * @param key The key to use for encryption
     * @return The encrypted bytes
     */
    private static byte[] encrypt(byte[] data, byte[] key) 
    {
        // 16 bytes is the IV size for AES256
        try
        {
            PaddedBufferedBlockCipher cipher = new PaddedBufferedBlockCipher(new CBCBlockCipher(new AESFastEngine()));
            // Random iv
            SecureRandom rng = new SecureRandom();
            byte[] ivBytes = new byte[16];
            rng.nextBytes(ivBytes);
            
            cipher.init(true, new ParametersWithIV(new KeyParameter(key), ivBytes));
            byte[] outBuf   = new byte[cipher.getOutputSize(data.length)];
        
            int processed = cipher.processBytes(data, 0, data.length, outBuf, 0);
            processed += cipher.doFinal(outBuf, processed);
            
            byte[] outBuf2 = new byte[processed + 16];        // Make room for iv
            System.arraycopy(ivBytes, 0, outBuf2, 0, 16);    // Add iv
            System.arraycopy(outBuf, 0, outBuf2, 16, processed);    // Then the encrypted data
            
            return outBuf2;
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        return null;
    }
    
    /**
     * Decrypt the given data with the given key
     * @param data The data to decrypt
     * @param key The key to decrypt with
     * @return The decrypted bytes
     */
    private static byte[] decrypt(byte[] data, byte[] key) 
    {
        // 16 bytes is the IV size for AES256
        try
        {
            PaddedBufferedBlockCipher cipher = new PaddedBufferedBlockCipher(new CBCBlockCipher(new AESFastEngine()));
            byte[] ivBytes = new byte[16];
            System.arraycopy(data, 0, ivBytes, 0, ivBytes.length); // Get iv from data
            byte[] dataonly = new byte[data.length - ivBytes.length];
            System.arraycopy(data, ivBytes.length, dataonly, 0, data.length    - ivBytes.length);
    
            cipher.init(false, new ParametersWithIV(new KeyParameter(key), ivBytes));
            byte[] decrypted = new byte[cipher.getOutputSize(dataonly.length)];
            int len = cipher.processBytes(dataonly, 0, dataonly.length, decrypted,0);
            len += cipher.doFinal(decrypted, len);
    
            return decrypted;
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        return null;
    }
    
    /**
     * Check for a currently saved key and if not present, create a new one
     * @return The newly or previously created key
     */
    private static SecretKey CreateOrRetrieveSecretKey()
    {
        try
        {
            byte[] keyBytes = ReadKey();
            SecretKey key;
            if ( keyBytes == null )
            {
                key = GenerateKey();
                WriteKey(key.getEncoded());
            }
            else
            {
                 key = new SecretKeySpec(keyBytes, 0, keyBytes.length, "AES");
            }
            return key;
        }
        catch( NoSuchAlgorithmException e )
        {
            e.printStackTrace();
        }
        return null;
    }
    
    /**
     * Generate a key suitable for AES256 encryption
     * @return The generated key
     * @throws NoSuchAlgorithmException
     */
    private static SecretKey GenerateKey() throws NoSuchAlgorithmException {
        // Generate a 256-bit key
        final int outputKeyLength = 256;

        // EDIT - do not need to create SecureRandom, this is done automatically by init() if one is not provided
        KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
        keyGenerator.init(outputKeyLength);
        SecretKey key = keyGenerator.generateKey();
        return key;
    }
    
    /**
     * Write the given data to private storage
     * @param data The data to store
     * @param filename The filename to store the data in
     */
    private static void WriteData(byte[] data, String filename)
    {
        FileOutputStream fOut = null;
        try {
            fOut = MyApp.getAppContext().openFileOutput(filename, Context.MODE_PRIVATE);
            fOut.write(data);
            fOut.flush();
            fOut.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Write the given encryption key to private storage using the hard-coded filename
     * @param key The key to write
     */
    private static void WriteKey(byte[] key)
    {
        WriteData(key, "myappkey");
    }
    
    /**
     * Read data from private storage using the given filename
     * @param filename The filename whose contents to read
     * @return The contents of the file or null
     * @throws IOException
     */
    private static byte[] ReadData(String filename) throws IOException
    {
        byte[] key = new byte[5096];
        Arrays.fill(key, (byte)0);
        FileInputStream fOut = null;
        try 
        {
            fOut = MyApp.getAppContext().openFileInput(filename);
            int length = fOut.read(key);
            byte[] key2 = new byte[length];
            System.arraycopy(key, 0, key2, 0, length);
            fOut.close();
            return key2;
        } 
        catch(FileNotFoundException e)
        {
            return null;
        } 
    }
    
    /**
     * Read the encryption key from private storage
     * @return
     */
    private static byte[] ReadKey()
    {
        try 
        {
            return ReadData("myappkey"); // Hard-coded filename representing the encryption key
        } 
        catch (IOException e) 
        {
            e.printStackTrace();
        }
        return null;
    }
}


Tuesday, 5 November 2013

GMail email relay from PHP in Azure worker role

I have a worker role that needs to connect to a web application regularly to force it to keep its database connection established (poor, I know!), but in this case the connection can take 10 seconds to establish and if you are the unlucky person who hits it at that point, you have to wait.

Anyway, I wanted to test how the Azure PHP worker role worked and check that it did what I thought it would do so I decided I would get it to send me an email in the worker loop to ensure it was all happy and running properly.

When you add an Azure worker role, it creates a whole load of weird files that run certain commands, set up paths etc., but the only one you need to care about is index.php, which is invoked by default when the worker role starts. It works, therefore, like .Net worker roles: you have some PHP that sits inside a while(true) loop (assuming you want it to loop) and which then calls whatever PHP code you need it to.

In my case, I was using swiftMailer to relay email via our Google accounts, and swiftMailer uses PHP's openssl extension (which was already enabled by default).

Anyway, this was all fine, but the emails weren't coming through, even though the role was running. I couldn't see why, because I had put the bit I expected to fail inside a try/catch to ensure that any errors would not make the deployment break!

I ended up enabling remote desktop for the role (see previous post!) and went straight into d:\windows\temp\php53_errors.log and saw the following error:

PHP Warning:  fsockopen(): SSL operation failed with code 1. OpenSSL Error messages:
error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol in E:\approot\swiftMailer\classes\Swift\Transport\StreamBuffer.php on line 233
PHP Warning:  fsockopen(): Failed to enable crypto in E:\approot\swiftMailer\classes\Swift\Transport\StreamBuffer.php on line 233
PHP Warning:  fsockopen(): unable to connect to ssl://smtp.gmail.com:587 (Unknown error) in E:\approot\swiftMailer\classes\Swift\Transport\StreamBuffer.php on line 233


Fortunately, I found a useful blog post describing a similar issue on another framework; it is related to how Google works and a few shortcomings in swiftMailer's SSL functionality. Long story short, you need to use the following config: $transport = Swift_SmtpTransport::newInstance('smtp.gmail.com', 465, 'ssl'); Note the use of port 465 (not 587) and the protocol 'ssl'.

So a complete example for a simple PHP mailer using swiftmailer is as below:

require_once('.\swiftMailer\swift_required.php');
        
$transport = Swift_SmtpTransport::newInstance('smtp.gmail.com', 465, 'ssl');
$transport->setUsername('accountUsername@gmail.com');
$transport->setPassword('yourPassword');
    
$mailer = Swift_Mailer::newInstance($transport);
    
// Create the message
$message = Swift_Message::newInstance();
$message->setSubject('This is the subject');
$message->setFrom(array('no-reply@server.com' => 'No reply'));
$message->setTo(array('someone@somewhere.com' => 'john'));
$message->setBody("This is the actual content");
$mailer->send($message);

Cannot remote desktop to Azure Worker Role

Spoiler: The certificate was missing!

Well, I couldn't before, but I can now! I have a PHP Azure cloud service. It is not quite as integrated into Visual Studio as .Net ones (not surprisingly), so you have to do certain operations manually in the configuration files. One of these is Remote Desktop access, which is a bit messy to do by hand, but then I noticed that you can enable it in the Azure portal after deployment (which is nice).

I enabled it, chose the certificate to use for the RDP connection and carried on. I RDP'd into the site but accidentally logged into one of the web roles when I was intending to look into problems with the worker role. I tried instead to RDP into the worker role, but this time I got, "The user name or password is incorrect". After trying 5 times to make sure that I had typed the long random password in correctly, I was confused. I only set the credentials in one place and the RDP had correctly picked up the username I had chosen, so what gives?

It took me a while, but then I remembered choosing the certificate when setting up RDP and, in my case, the worker role did not have that certificate loaded, since the certificates were only used for https on the web roles. I added the certificates to the worker role, re-deployed and it all worked fine. As with many things, it makes perfect sense once you fix it: the certificate is used to encrypt the credentials and, if it is not present, rather than giving a useful error, the role encrypts the data incorrectly (or not at all) and then fails authentication at the other end.

Friday, 25 October 2013

The Myth of Code Portability and "lack of vendor lock-in"

This is a Friday-style essay looking at something that is mentioned time and time again in articles like "how we moved from .Net to Scala" or "breaking free from the Microsoft stack". What I see time and time again is that people seem to prefer one language or framework over another simply because it is open source/non-commercial. Naturally, open source can mean many things, but the basic argument is that locking into a single vendor is a mistake that cannot be undone later. I don't agree with this, for various reasons described below.

The reality of virtually every system I have ever worked on or heard about is that either it only lives for perhaps 5-10 years, or it reaches a point where it is left alone and not maintained unless absolutely necessary. From the customer's point of view, this is fine because once it basically works, they want it left alone. By and large, if that customer decided, 10 years later, that they wanted a load more functionality, they would almost certainly get a completely new system based on completely different components and also, very possibly, from a new supplier (because most customers get fed up with suppliers over time!).

So let us consider "product lock in" against this background. Scenario 1: Supplier creates a product based on a full Microsoft Stack, which gets installed at a customer site.

Issue 1: Maintenance.
No problems here; the .Net stack hasn't fundamentally changed in the past 15 years. C# looks pretty similar, albeit with some new functionality in later releases. Updating server OSs to new versions of Windows hasn't caused many headaches as far as I know; in fact, getting software and OS from the same vendor is good in terms of compatibility testing and support from that supplier. Consider what would happen if your open source product didn't work properly on Red Hat any more - you might be able to fix it, but contributing to open source projects is not as easy as it sounds on paper, and that assumes you have the skills to even know how to fix it - you probably don't!

Issue 2: Major update. Again, no real issues here. This will be a new system that may or may not be Microsoft-based. If this is big enough, the customer is probably comfortable with the idea of replacing servers or hosting or OSs to suit whatever technology you decide to use for the new system.

Issue 3: A change is required (perhaps major) to a piece of your software - a module or service perhaps. Well, on the one hand, if you use the MS stack, it is likely that either it all works well or, if it doesn't, it is a problem with the implementation, in which case a technology change is not the answer, just a rewrite of that part of the code. On the other hand, many important customers are not comfortable with major changes anyway, and the supplier certainly wouldn't want to do this unless they were paid for it. Using the MS stack doesn't actually preclude any of these things; the vendor lock-in is not really in play and, after all (God forbid), you can call PHP web services and MySQL databases from .Net if that was your core problem.

So what is the lock-in that people are concerned about? I think people are often talking about money rather than lock-in. I don't like the idea that because my software runs on Windows, MS have some kind of financial grip on me. But do they really? If you pay your licences up-front for Windows Server, how often do you have to upgrade and how much does it cost in the scheme of things? People still run 10-year-old servers and, even if licences are £500 a pop, compare that to the rest of the IT costs in an organisation and it really is peanuts.

What would using Linux, PHP, MySQL etc. actually give you? Really? Free updates - amazing; they are free from MS too except for major upgrades (which are hardly deadly expensive). A database engine that is still not as mature as SQL Server - again, if that's what you want, fine, but I am happy to pay a few hundred quid more for something that won't easily spill my user details over my web app. The same goes for .Net and Visual Studio with Azure. The cheapest? Nope, but in my opinion well worth the money.

If you want to use PHP - great. MySQL? Fine. Scala - whatever floats your boat. But the decision, in my opinion, should be very heavily weighted towards the skillsets you have or can easily come by, and not some idealistic opinion about the latest and greatest languages, whether they are open source or not. If your software is tested correctly, you should already know whether the language/framework/database is suitable for the scale of work that the application needs to do.

Much of the debate is centred around endemic and often thinly veiled hatred of corporations. It is easy to hate a company that makes lots of money, but in most cases they have lots of money because they produce something that people want. Some of the assumptions about the motivations of these companies are also out-dated (and might not have been true in the first place) - again, it is easy to assume that the only thing a large company cares about is money, but I think MS (I don't know much about Apple, Oracle etc.) have made large improvements over the past 5 years which many haters still don't even know about.

The moral? Don't change your system just to be "free", wait for it to need an upgrade and make changes then, based on what you need for your system at that point in time.

Invalid service path! Cannot locate ServiceDefinition.csdef in current folder or parent folders

This was really confusing me: I got this error attempting to add a PHP worker role to my cloud service and the ServiceDefinition.csdef was DEFINITELY in the correct directory!

Anyway, I had to download the source code for the Azure cmdlets and spent an hour working out how to get it to debug and how to use the modified source to log more verbose information (anyone else find that verbose is not very verbose?).

...anyway, this error means one of two things. Firstly, it means what it looks like it means. Secondly, once it finds the ServiceDefinition.csdef file and knows the correct directory, it then parses ServiceConfiguration.Cloud.cscfg, reads all the roles that exist and then looks for their directories. If it doesn't find all the role directories, it assumes the csdef is not correct and, rather than recursing like it does when it first looks for the csdef file, it just fails with the same error text.

In my case, this was caused because I had added a non-PHP worker role (assuming it would be C# - it wasn't!), deleted the folder and then ran Add-AzurePHPWorkerRole, at which point I got the error. It would be nice to have a more specific error here; otherwise, there is no point in displaying what it does!

I edited the cscfg files and the csdef file, removed the old role sections and ran the cmdlet again and it was all good!

Dropbox folder keeps appearing, can't delete it - solved

This was annoying and slightly confusing. Some folders would delete OK (and sync online) but others would re-appear. This was on Windows 7.

Solution: Right-click and "View on dropbox.com" which opens your browser. It turns out, in my case, there were two files in the folder which, for whatever reason, hadn't sync'd to my computer and made the folder look empty. I deleted them using the browser interface and was then able to delete the folders permanently from my computer.

Tuesday, 22 October 2013

Finally - Azure with client certificates!

Introduction

The basic setup is that I have a web application which is public and a web service which is private. Since they are both hosted on Azure, I thought I would use client certificates as a way to ensure only the web application can access the web service. I had untold problems with it and got fed up with the odd IIS-type errors about mismatches and authentication modes, so I kind of gave up and hosted the web service on an Azure virtual machine so I could configure it all myself.

Anyway, there were a few problems with the raw virtual machine approach. It is harder to scale up (although I think it is possible with Azure VMs), it costs slightly more to run, and although it was quick to deploy using Subversion directly onto the box, I had so many problems trying to update the web site without causing the web application to stop communicating with it properly. I decided enough was enough and tried to set up the client certificates in a cloud service again.

Client Certificates

You probably already know what these are if you are reading this article, but if you don't, the basic idea is to use an X509 (SSL/TLS) type certificate to prove to the server that you are an authorised client. Using this mechanism is a good way of protecting your public-facing but private services/applications from prying eyes. Note that although you can possibly use these without https, I would not recommend it.

You can self-sign certificates, but that opens a can of worms regarding how to check whether the certificate is genuine and having to override various errors. In my case, I am using SSL certificates from a root authority, which are pretty cheap really for the extra security.

Server End

Code

The web service is pretty much a standard WCF web service but you have to be a little careful with namespaces and stuff (that is not related to client certificates but I might as well include it all to make sure it works). I HIGHLY recommend testing the basic web service without too much security just to make sure it works. The security settings can drive you mad but your problem might be related to something much simpler.

So... the service interface itself looks like this:

namespace com.mycompany.WebServices
{
    [ServiceContract(Namespace = "http://mycompany.co.uk/com.mycompany.WebServices.CC")]
    public interface IMyWebService
    { //etc...

Naturally, the exact namespaces are not too important (and don't need to use https in the name) but best to keep the standard format. The implementation is this:

namespace com.mycompany.WebServices
{
    [ServiceBehavior(Namespace = "http://mycompany.co.uk/com.mycompany.WebServices.CC")]
    public class MyWebService : System.Web.Services.WebService, IMyWebService
    { //etc...

This is pretty standard stuff and although you can often live without the namespaces, it can make it really difficult if you start referencing more than one web service from your client with type conflicts etc.

Configuration

There are obviously lots of little bits; many of these are specific to my implementation (buffer sizes etc.) but again, I will include it all so that you get an example that definitely works. The configuration for the service looks like this:

<system.serviceModel>
    <services>
      <service behaviorConfiguration="defaultBinding" name="com.mycompany.WebServices.MyWebService">
        <endpoint address="standard" binding="wsHttpBinding" bindingConfiguration="standardConfig"
          name="standard" bindingName="standard" bindingNamespace="http://mycompany.co.uk/com.mycompany.WebServices.CC"
          contract="com.mycompany.WebServices.IMyWebService" />
     </service>
    </services>
    <bindings>
    <customBinding>
        <binding name="wsdlBinding">
          <textMessageEncoding messageVersion="None" />
          <httpsTransport requireClientCertificate="true" />
        </binding>
      </customBinding>
      <wsHttpBinding>
        <binding name="standardConfig" maxBufferPoolSize="10485760" maxReceivedMessageSize="10485760">
          <readerQuotas maxDepth="32" maxStringContentLength="10485760" maxArrayLength="10485760" maxBytesPerRead="10485760" maxNameTableCharCount="10485760" />
          <security mode="Transport">
            <transport clientCredentialType="Certificate" />
            <message clientCredentialType="None" />
          </security>
        </binding>
      </wsHttpBinding>
    </bindings>
    <behaviors>
      <serviceBehaviors>
        <behavior name="defaultBinding">
          <serviceMetadata httpGetEnabled="false" httpsGetEnabled="true" httpsGetBinding="customBinding" httpsGetBindingConfiguration="wsdlBinding"/>
          <serviceDebug includeExceptionDetailInFaults="true"/>
          <serviceCredentials>
            <clientCertificate>
              <certificate findValue="6FE3C33463GF87GFF94DC52479E72BFF60A1C34B" x509FindType="FindByThumbprint" storeLocation="LocalMachine" storeName="My"/>
            </clientCertificate>
            <serviceCertificate findValue="AB656DEE53E920801D23A5CC90250CB11F88D62E" storeLocation="LocalMachine" storeName="My" x509FindType="FindByThumbprint"/>
          </serviceCredentials>
        </behavior>
      </serviceBehaviors>
    </behaviors>
    <serviceHostingEnvironment aspNetCompatibilityEnabled="true" multipleSiteBindingsEnabled="true"/>
  </system.serviceModel>


There are not too many things to worry about here, but note that I have a service certificate for the https and another certificate which is what the clients use to connect and authenticate. Both of these are added into the Azure configuration as per normal certificates (both of them in LocalMachine/My). I have also included the intermediate certificate from my certificate provider to provide the full trust chain, and this is also added in LocalMachine/My. I have not needed to do this for the root certificate, which is obviously already included on Azure. All of these certificates need to be uploaded to the Azure management interface so they are available to the project during deployment.

I then have to unlock the access section of the web config to force the use of SSL and require a client certificate. You can create a startup task file as described here: http://msdn.microsoft.com/en-us/library/windowsazure/gg456327.aspx and inside that file you need to run "%windir%\System32\inetsrv\appcmd.exe unlock config /section:system.webServer/security/access" without the quotes.
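
For reference, the wiring in ServiceDefinition.csdef looks roughly like this - a sketch assuming a startup.cmd in the role root that contains the appcmd line above (the role name is made up):

<WebRole name="MyWebRole">
  <Startup>
    <!-- startup.cmd runs the appcmd unlock command; elevated so it is allowed to change the IIS configuration -->
    <Task commandLine="startup.cmd" executionContext="elevated" taskType="simple" />
  </Startup>
</WebRole>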

Then inside your web config, add the following section inside system.webServer:

<security>
   <access sslFlags="Ssl,SslNegotiateCert"/>
</security>

Client End

The client needs some similar configuration, but note that using "Add Service Reference" inside Visual Studio did not work as expected, so I used SvcUtil.exe instead (it is part of the Windows SDK if you need to download it). Running SvcUtil.exe against the WSDL like this:

SvcUtil.exe https://mycompany.co.uk/MyWebService.svc?wsdl

produced the relevant client code (a .cs file) and an example configuration to use for the client. I already had the client configuration set up so I didn't use the latter.

I had to include the .cs file from SvcUtil into my project and then create the client configuration in my web.config in much the same way as it looks at the web service end:

<system.serviceModel>
    <bindings>
      <wsHttpBinding>
        <binding name="standard">
          <security mode="Transport">
            <transport clientCredentialType="Certificate" />
            <message clientCredentialType="None" />
          </security>
        </binding>
      </wsHttpBinding>
      <webHttpBinding>
        <binding name="webHttpBinding" />
      </webHttpBinding>
    </bindings>
    <client>
      <endpoint address="https://mycompany.co.uk/MyWebService.svc/standard"
        behaviorConfiguration="CCBehaviour" binding="wsHttpBinding" 
        bindingConfiguration="standard" contract="IMyWebService" name="standard" />
    </client>
    <behaviors>
      <endpointBehaviors>
        <behavior name="CCBehaviour">
          <clientCredentials>
            <clientCertificate findValue="6FE3C33463GF87GFF94DC52479E72BFF60A1C34B" storeLocation="LocalMachine" storeName="My" x509FindType="FindByThumbprint" />
          </clientCredentials>
        </behavior>
      </endpointBehaviors>
    </behaviors>
  </system.serviceModel>


As before, the certificate referenced in the clientCertificate element was added to the client's Azure configuration in the LocalMachine/My store and also uploaded to the Azure control panel.
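
Using the service from code is then just a case of newing up the proxy that SvcUtil generated, pointing it at the named endpoint from the config. A rough sketch (the proxy class name is whatever SvcUtil produced for your contract and the operation called here is made up):

using System;

static void CallService()
{
    // "standard" matches the endpoint name in the <client> section above, so the proxy
    // picks up the address, binding and certificate behaviour from config.
    var client = new MyWebServiceClient("standard");
    try
    {
        var result = client.DoSomething("hello");   // hypothetical operation on IMyWebService
        Console.WriteLine(result);
        client.Close();
    }
    catch
    {
        client.Abort();   // a faulted WCF client should be aborted rather than closed
        throw;
    }
}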

Conclusion

It all seemed easy once I had it working, but I'm not sure where I went wrong before. I'm always suspicious of caching mechanisms, and of my classic mistake of breaking something just before fixing something else, which just keeps pushing the problem somewhere else! Hopefully this will help you get started with client certificates, a great way to secure communications with a web service.

Thursday, 17 October 2013

I really can't stand Java

Let me be up-front. I ONLY use Java because it is the only way I can write apps for Android. There are other promising frameworks that allow you to use other languages, but I am betting that most of the low-level functionality I need is not available under those frameworks; some of it is barely there in the Android Java libraries!

My big gripe, though, is that using Java is such a struggle and I am a senior developer with over 10 years of commercial experience in all kinds of languages and frameworks. I am not a big fan of PHP but I can see where it works well. I wouldn't write most of my code in C but for some things it has an elegant beauty. Java on the other hand seems to have the worst of most worlds rolled into one.

  1. The language itself has weird shortcomings that could have been fixed but haven't, and which are a real pain for those of us who are used to C#. Having to declare or catch every exception is nonsense. On paper it sounds good but, in reality, it means that exceptions are taken for granted and everything gets put into large "catch all" blocks, rather than the intention of making people treat errors deliberately. If I am parsing a string into a number and I have already validated the input, why do I have to bloat my code with error handling for NumberFormatException?
  2. Trying to compare strings with == compiles OK but doesn't do what virtually every other language would do - a comparison of the two strings' contents. In Java, it compares object references which, in my experience, is almost never what you want and is just asking to cause errors - subtle ones! The same goes for other basic types and you can't even overload operators to overcome this.
  3. Lots of the core libraries seem unnecessarily verbose. Compare the default HttpURLConnection usage with the Apache HttpClient. No comparison.
  4. Various inconsistent naming and casing for classes throughout the core libraries.
  5. C# has introduced var for an implicitly typed variable. If you use var something = new Class(), you know that var equates to Class and this makes declarations so much shorter and clearer. In Java, no such luck; some type names, especially with generics, end up making a single constructor call take up a whole line.
  6. Performance is basically terrible for most scenarios - another difference between the theory (write once, run anywhere) and the reality (virtual platforms suck). At least .Net can be compiled to a more usable and much faster byte code. The fact that it can perform acceptably well is not an excuse. I can make anything run fast with massive optimisation or lots of hardware, but these things should be like that by default.
  7. Libraries, jars, source paths, class paths etc. are a NIGHTMARE in Java. It's like extern "C" in the old days which said "you should find this at runtime but if not, blow up". Why? If I have the library added to a project, shouldn't it all just work? This has done my nut in trying to get a custom SSL manager working which compiles fine and goes bang at runtime. This is further worsened in the Android world since Google have handicapped some of the basic Java libraries for various (presumably) security and performance reasons.
  8. Using the file system directories to mimic the package names is completely nuts and doesn't (in my opinion) really give you anything other than headaches. Adding jars (glorified zip files) doesn't really change this. If anything, jars make it even more confusing since you have to browse through them to find some of your classes.
  9. Separation of docs, classes and libraries is just stupid. Another problem that rears its head when you are trying to browse docs or code when debugging.
  10. Trying to understand the toolchain is also very hard for newbies. JDKs, JREs, J2EE - nice abbreviations for geeks but not useful for people trying to enter the world of Java.
  11. It doesn't achieve much of any of its design goals. Its cross-platform story is crappy, its embedded use is almost non-existent since it is so heavyweight, and all of the failed goals came at the cost of a poorly designed language that is hard to use.
  12. There are too many third-party libraries that are considered standard - very worrying for a framework provider. Apache web client instead of the Java one, Bouncy Castle instead of the encryption classes in javax.security and even a cut-down version for Android that looks the same but will fail (sometimes at runtime) for this reason.
  13. One class per file (except inner classes) is another pointless-ism. An IDE is perfectly capable of parsing all files under a directory and finding where all the classes live. If I have a bunch of small classes, do I really need these massive directories full of files when one file called messages.java would make more sense?
I would think that many of these things could actually be fixed without any major problems. Stop requiring exceptions to be thrown or caught (but offer the option if you really want it). Perhaps allow code to be compiled to native (since in many cases people don't want to run the code on lots of different platforms), and be more deliberate about taking good third-party code into the main set of packages so that people who want fairly pure programs can rely on these features and not have to ship loads of random libraries that may or may not be maintained.

I think that in its purest sense - the basic language - Java is not that bad; it mostly looks like C# and C++, but all the additional annoyances make it worse than the alternatives. C# has been evolving and has provided features that are very useful - although optional. Using var for types you don't care about (since you might just dump them in a data table, for instance), LINQ for set-based data manipulation which makes some functionality so much briefer (a foreach loop on one line that reads really well), and libraries which are not just maintained in one place, for consistency, but which have also integrated work that others have done such as DotNetOpenAuth.

To me, the ONLY reason I use Java is because I have to. I think in every way, C# and probably other languages are superior (most other languages are not exactly equivalent so this is harder to judge).

If Java died, I wouldn't cry!

Tuesday, 15 October 2013

SSL/TLS for Developers or IT Admins

TLS, do I care?

BEAST attacks, dodgy RC4 algorithms, government spying etc. etc. etc. SSL (or TLS as it should now be known) is quickly becoming big news. It is one of those technologies that, in the past, we were happy enough to let just work, but now we are told it is very easy to mis-configure and, like many technologies, it must be renewed over time as older mechanisms become vulnerable to attack.

Do I Even Need It?

This is a fair question but for many situations the answer is a simple "yes". Setting up TLS is not very difficult and certificates are cheap (and self-signing is free!). Unless the data you are sending and receiving is completely public already and no authentication is taking place, you should probably use it.

Traditional views about the burden of SSL are largely irrelevant on today's machines, except for the most fragile units like mobile phones, and I read the other day that Google's average CPU load went up by a mere 1% when SSL was enabled for all Gmail accounts.

So What Is It?

If you want to encrypt your data, you are trying to prevent anyone who doesn't have the "key" from knowing what the data is in its "plain text" form. This is fine if you are sending encrypted data between two pre-established points, which can already have agreed on a key, but for many situations where there are many users, most or all of them unknown to the web application, it is both impractical and insecure to distribute the key to all of these people in advance. What TLS does is provide a mechanism that allows a client to agree a key with the server and then use this key for the entire session. This key will not be the same as the key for any other client and will not be the same one used if the client comes back another time to connect to the server.

If you are interested, Wikipedia has a whole load of documents describing how this takes place, starting here: http://en.wikipedia.org/wiki/Transport_Layer_Security. Here we will just cover how the basics work.

Secrecy and validity

The first important question is how the client establishes that it is indeed talking to who it thinks it is. It does not want to agree encryption keys with a spy server pretending to be the real server. For this, we have the whole Certificate Authority scheme which is a hierarchy of Trusted certificate issuers who basically use a secure system to assure you that if a certificate claims to be from domain xyz.com, then the server really is xyz.com because theoretically, no-one else would be able to sign a certificate claiming to be from xyz.com. Sadly, this is not actually true, since any trusted issuer can issue a certificate for any domain, which is great for competition but not so great for security. If someone steals a signing key from one of these companies (or someone bad works there) then fraudulent certificates can be generated. Currently, the only way to prevent this is to do some kind of pinning of certificates to the issuer that is known to have issued them. We won't discuss this further since the proposed solutions are not widely adopted.

Once the client sees a certificate claiming to be from xyz.com and decides it can trust this certificate, it can then use the public key contained in the certificate to encrypt some data (using a relatively slow method called asymmetric encryption) and send it to the server. The server can only decrypt this with its private key. Even if someone stole the certificate, without the private key they could not determine what this data is. This data is then used to encrypt some other data which is sent back to the client to prove that the server really did receive and decrypt the data from the client - the circle is complete.
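
If it helps to see that idea in code, here is a rough .Net sketch of the encrypt-with-the-public-key, decrypt-with-the-private-key principle (this is just the principle, not the actual TLS handshake messages):

using System;
using System.Security.Cryptography;
using System.Text;

static void AsymmetricSketch()
{
    using (var serverKey = new RSACryptoServiceProvider(2048))
    {
        // The public half could be handed to anyone, e.g. inside a certificate...
        RSAParameters publicOnly = serverKey.ExportParameters(false);

        byte[] secret = Encoding.UTF8.GetBytes("some secret from the client");

        byte[] encrypted;
        using (var clientSide = new RSACryptoServiceProvider())
        {
            clientSide.ImportParameters(publicOnly);
            encrypted = clientSide.Encrypt(secret, true);   // true = OAEP padding
        }

        // ...but only the holder of the private key can recover the plain text.
        byte[] decrypted = serverKey.Decrypt(encrypted, true);
        Console.WriteLine(Encoding.UTF8.GetString(decrypted));
    }
}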

As part of the handshake, the client also sends the server a list of the protocol versions and cipher suites it supports, in order of preference, and the server chooses one of these to decide the details of the rest of the handshake and the methods used to protect the session. Usually it will choose the first one in the list that it supports.

Once the session keys have been generated and established at both ends, the client and server can start using symmetric encryption, which is much faster - the overhead is not noticeable except at very high server loads.

At the end of the session, one or both parties can signal to the other that the connection needs to be torn down, at which point, the machines can clean up their encryption keys.
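
If you want to poke at this from .Net code, SslStream surfaces a lot of the negotiation. A minimal sketch (the host name is a placeholder) that makes a connection, restricts the protocols offered and reports what was agreed:

using System;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;

static void InspectTlsConnection()
{
    const string host = "example.com";   // placeholder - substitute your own server

    using (var tcp = new TcpClient(host, 443))
    using (var ssl = new SslStream(tcp.GetStream()))
    {
        // Only offer TLS 1.1 and 1.2 rather than whatever the OS default happens to be.
        ssl.AuthenticateAsClient(host, null, SslProtocols.Tls11 | SslProtocols.Tls12, true);

        Console.WriteLine("Protocol: " + ssl.SslProtocol);
        Console.WriteLine("Cipher:   " + ssl.CipherAlgorithm + " (" + ssl.CipherStrength + " bit)");
    }
}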

Setting It Up

Setting up TLS is pretty easy, certainly on Apache, nginx and IIS. In IIS, for instance, you can request a certificate within the GUI, which produces a request file. This file is sent to a certificate issuer who will give you a certificate from as little as $10 (free from some places if you are not-for-profit), which you then add back into IIS with the click of a mouse. At this point, you can enable SSL for a given website, add the binding to port 443 (or whichever) and then tell it which certificate to use.

Apache and nginx are pretty similar but mostly done in the configuration file, pointing to the locations of the certificate files.

If you only did that, you would get a system that works, that gives you a green padlock and a warm fuzzy feeling that you are nice and secure - so what's the problem?

Configuration and Weaknesses

There are some problems with these default setups. Some of this is understandable, other parts are suspicious, but all of it should be checked, and as a developer (or an IT admin) you should be familiar with the latest issues found in SSL and the various algorithms therein. You should also be aware that some tools are very useful for helping you but others might give false positives or false negatives, so try to cover as much as possible and record what you know.

There is a difference between locking down systems where you have control over both ends (a mobile app connecting to your web server) and websites that you want to reach as many people as possible. TLS provides a big headache when it comes to older browsers and we will see why.

The main areas of concern are the relative weaknesses of TLS algorithms (sometimes traded off against speed) and the failure of some libraries to correctly handle the SSL handshake and the verification of certificates. A very useful tool can be found here: https://www.ssllabs.com/ssltest/ which will connect to the specified server and check for known issues, spelling them out in useful detail. You are unlikely to score in the 90s across the board unless you control the server, and even then it usually comes at the cost of older browsers not being able to connect (including some fairly recent versions of Firefox and IE!).

The History Weaknesses

History provides weaknesses in certain ways. As you might imagine, although TLS version 1.2 is reasonably well understood, many browsers either do not support it at all, or more confusingly do not have it enabled by default. For this reason, your servers must still support TLSv1 (and 1.1) and this, although reasonably secure, is certainly not ideal.

History has also shown that certain algorithms are no longer secure enough, perhaps just because they are old (and key lengths need to increase to make them harder to brute force) or because time has revealed weaknesses which can be exploited. Although most servers allow you to configure which algorithms you support, usually the ordering is not definable since the client tells the server which algorithms it supports in order of preference. These also need to be balanced against the fact that you may need to support older algorithms for older browsers. By supporting older algorithms, even if the order puts the best ones first, it is sometimes possible for an attacker to force the client and server to downgrade to a weak algorithm which can more easily be attacked. The general instruction here is to put the strong ones first (ECDHE and DHE), follow them with some middle ones that are reasonably strong, and not include the weakest versions at all - ones that use things like MD5 hashes. (There are various articles on Google about which to use.)

The third way in which history hurts is that the longer these protocols have been exposed to the wild west of the internet, the longer attackers have had to try and exploit them. Certain modes like cipher-block chaining (CBC) and the RC4 cipher have known or suspected weaknesses which mean that, although the attacks might be unlikely or hard, these options are worth avoiding.

The Trade-off Weakness

Another type of weakness relates to the fact that some algorithms are much slower than others for given key sizes. This means that although a client might want a strong algorithm, a server is more likely to prefer one that is faster, even if a little less secure. The advice here is to start with the strong ciphers and worry about the performance when it becomes a problem. Load-test the systems to compare different ciphers before making the systems weaker.

Client Weaknesses

Client weaknesses fall into two camps: browsers, the most common way of connecting to web applications, and net-aware client applications, which I will assume are written in-house, although the same issues apply either way.

For a browser, your hands are tied. Unless you are one of the open-source browser developers, you have limited control over what is enabled/disabled, what cipher suites are supported, which are preferred and how well they are implemented. The plus side is that the weaknesses are usually well known and well documented. Another issue with browsers is that you have no control over older versions that still exist and from which you might well want people to still connect to your system. Forcing people to update browsers is not easy, even if they are direct customers. The advice: know the limits of each browser, have an idea of which older versions you won't support (based on usage stats) and keep aware of the latest changes. Once a good percentage of browsers support, for instance, TLS 1.1, you might disable TLS 1.0. Also, try to capture browser stats for your own sites. General browser stats are handy but it is better to know what your users connect with so you can decide how many will be affected by tightening anything up.

Client apps (and net-aware mobile apps) suffer the same potential problems as browsers but these should be more within your control. The following section discusses certain things that you should check before you trust these apps to communicate securely.

Client App Weaknesses

Like most developers, I found a library for my Android app (the Apache HttpClient library), plugged it in and away I went. There are, however, a whole load of assumptions I have made about this library which I have not currently checked but definitely should (a .Net sketch of making some of these checks explicit follows the list):
  1. Does the library check that the host name on the SSL certificate matches the host name I am using the client to connect to?
  2. Do I get an error if the certificate is self-signed?
  3. Do I get an error if the chain of trust cannot be established on the SSL certificate?
  4. If I revoke a certificate, will the library ever detect this and block the connection? If so when?
  5. What cipher suites does the library send to the server and in what order? How can I limit this, especially if I only talk to my own servers?
  6. What protocol version does it use by default and which does it support? SSL? TLS 1.0? TLS 1.2?
  7. Does the library correctly implement security features such as blocking client renegotiation?
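
These questions apply whatever HTTP stack you use. On the .Net side, for example, the host name, trust chain and self-signed checks (questions 1 to 3) can be made explicit by hooking the certificate validation callback; a rough sketch:

using System.Net;
using System.Net.Security;

static void ConfigureCertificateChecks()
{
    // By default the framework does its own checks; setting the callback lets you
    // log failures or tighten the policy. Never just "return true" here.
    ServicePointManager.ServerCertificateValidationCallback =
        (sender, certificate, chain, sslPolicyErrors) =>
        {
            // Host name mismatches, self-signed certificates and broken chains
            // all surface here as policy errors.
            return sslPolicyErrors == SslPolicyErrors.None;
        };
}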

Conclusions

TLS is a minefield, and although it is quite a large field, fortunately the information is all out there on the net to make informed decisions. Sites like Stack Overflow and Server Fault are frequented by clever people. The Qualys site is fantastic for the latest best practice, and the people who run the lab are very active on the forums there.

Spend some time learning about how TLS hangs together and remember that, most of the time, something is better than nothing. Start with the first steps, then test and fix as time goes by so that your system becomes hard enough to crack that attackers will try somewhere else instead!

Monday, 14 October 2013

CodeIgniter: You have specified an invalid database connection group.

We are currently going through a range of frameworks trying to create and test plugins for our authentication system at pixelpin.co.uk. Out of habit, we have been creating a database and user for each of the frameworks and plumbing them into the framework. CodeIgniter was not happy.

My colleague was working through a tutorial, trying to display a list of news items, and bringing up the page caused a few errors including the one above. This was because in the database.php config file, the value of $active_group was set to 'test' even though the database connection settings were all set against 'default' (there were no settings for 'test').

With that fixed, however, the page still did not work; fixing the first problem had only uncovered something else. Using a set of "echo 'Hello'; die();" lines throughout the code, we worked out that a line in the controller that was supposed to return the query results (as an array) was failing.

We tried to connect to the database manually with the user we had created for the framework and realised that somehow the permissions were not set up correctly. This might have been because we recreated the database but not the user and although the user still looked to have permissions to the DB, in fact, it didn't.

We re-ran the permission statements and it all worked. We had assumed that CodeIgniter was using the database by default since the site was working but, clearly, it only uses the database for developer-created tables that might or might not be required, as opposed to things like WordPress which are data-driven right from the off.

Moral: Development is messy and hard!

Using FastCGI instead of ModPHP

(Updated for Apache 2.4)

On a test server, I wouldn't normally care in too much detail about things like PHP modules. Apache works out of the box with mod-php, so why bother?

One thing that can be annoying is that by default, the user www-data owns the web root so if I need to write any files to it, I have to use sudo, at which point they become owned by root and not accessible by the web server. Every file write is followed by sudo chown.... to change ownership to www-data.

A way round this is to use fast-cgi which allows the files to be run as their owner and which means I can write files into the web root without using sudo and without running chown after I make every addition. It also makes it much easier to use FTP/WinSCP to copy files to the server when connected as a user other than www-data (i.e. every time).

So how do we change Apache to run fast-cgi instead of mod-php?

Install fastcgi

sudo apt-get install libapache2-mod-fastcgi

Note that if this is not found, you might have to un-comment the multiverse repository in /etc/apt/sources.list and run apt-get update. Once installed, create a file named php-fastcgi.conf inside /etc/apache2/conf.d (<= Apache 2.2) or in /etc/apache2/conf-available (>= Apache 2.4) and put the following contents into it:

<IfModule mod_fastcgi.c>
DirectoryIndex index.php index.html
AddHandler php5-fcgi .php
Action php5-fcgi /php5-fcgi
Alias /php5-fcgi /usr/lib/cgi-bin/php5-fcgi
FastCgiExternalServer /usr/lib/cgi-bin/php5-fcgi -socket /var/run/php5-fpm.sock -idle-timeout 900 -pass-header Authorization

<Directory /usr/lib/cgi-bin>
Options ExecCGI FollowSymLinks
SetHandler fastcgi-script
Order allow,deny
allow from all
</Directory>
</IfModule>

Apache 2.4 only: In the above config, replace the older config lines "Order allow,deny" and "allow from all" with the newer config "Require all granted". Once this has been done, you should enable the use of this new config file by typing: sudo a2enconf php-fastcgi.conf

Then ensure that the actions module is enabled:
sudo a2enmod actions
and finally restart apache
sudo service apache2 restart

Disable mod-php

The installation will automatically enable fastcgi but you need to disable mod-php:

sudo a2dismod php5

Install FPM

The FastCGI Process Manager (FPM) is more than a "nice to have" here - the php5-fpm.sock socket referenced in the Apache config above is created by it - so install it too:

sudo apt-get install php5-fpm 


Now edit the file /etc/php5/fpm/pool.d/www.conf and set the following values (the [www] pool section will already exist):

[www]
user = <your username>
group = <your username>
listen = /var/run/php5-fpm.sock
listen.owner = <your username>
listen.group = www-data
listen.mode = 0660
pm = dynamic
pm.max_children = 10
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 3
chdir = /
php_admin_value[error_log] = /var/log/fpm-php.www.log
php_admin_flag[log_errors] = on


Now restart php5-fpm and apache2:

sudo service php5-fpm restart && sudo service apache2 restart

Change Directory Permissions

Once this has all been done, you need to set the correct permissions on the files in the web root.

sudo chown yourusername:yourusername -R /path/to/webroot
sudo chown yourusername:www-data /path/to/webroot
sudo chmod 710 /path/to/webroot

Thursday, 10 October 2013

How Startups should design solutions to scale

Scale: the dirty S word for companies that lack experience designing such systems. It can strike fear into the hearts of many and it can be got wrong at either extreme. You can either not consider scale enough, in which case you might fail very early (or at least have a lot of rework to do later), or at the other extreme you can be so set on scale that you waste time and money trying to build a perfectly scalable system. I believe from experience that the ideal is somewhere between these extremes and I want to lay out some principles below about how to decide on your design when it comes to scalability.

  1. The first and overriding principle is that you should be aiming to scale but not too much in one go. If you were buying a new house, you might get one with some extra bedrooms for any children you might be planning on but you wouldn't usually buy a house with 8 spare bedrooms in case you end up with 8 children. Why? Because firstly, you don't really know whether you will end up with 8 children and more importantly, there is an expense with buying too much scaling room and after all, you can upgrade your house later if you need to. I think this works in software. Having some breathing room and knowing that you can cater for the next 6 months to a year of hoped-for growth is great but you cannot predict the future.
  2. Technology moves on. You don't know what technology might readily suit your system in a year or 2 years time. NoSQL databases, new languages, special hardware, new caching systems, all can have a massive effect on your system performance whereas if you try and build in 50 years of scalability, you will base it on today's technology and spend millions building your house which will look out of date in 5 years time!
  3. You don't know how much your system will need to scale. We are producing a system that could ultimately be used by millions of people around the world but if I plan the system around that, I will be paying for a lot of redundancy that will not be needed either for a few years or perhaps ever. By imagining a very good case at 6 or 12 months, I can plan for, say, 100,000 users and base my design on that. I don't need to squeeze every millisecond out of my database queries or multi-thread every single part of my system. At the moment, I don't even use a memory cache because on hosted servers memory is expensive and we wouldn't be able to cache very much of any use anyway.
  4. If you succeed, you will rebuild. As I read in an article the other day, Twitter, Facebook and Google have all had to re-factor their technology to suit their scale. Languages have changed, back-ends have changed, parts have been moved around to try and make the bottlenecks occur at places that are easy to scale out, like web servers. None of these people could realistically have built their original systems in the languages they now use. This might be because the new tech didn't exist back then, but it might be that the overhead of the development work required just wouldn't have provided payback when the user base was small; ironically, it might have caused them to be failures instead of successes.
  5. Your design will change! We have a system with relatively few pages, few use cases and not many routes through but we have already changed our design in about 4 major ways inside a year. This has had knock-on effects on the parts of the system that are doing work but if I had spent ages designing a super scalable system in the early days, I might already have had to tear that down and start again with the new system.
  6. If you end up being successful, you can afford to rework it later. Rather than assuming you need to get the Rolls Royce before you are viable, buy an Audi and prove that you are a good driver. Once you succeed, take on more developers and start to improve things that need improving.
  7. Development cycles are much shorter than they used to be. Our system is relatively simple but if I had to pull out SQL Server and put in MySQL, it wouldn't actually take very long, perhaps a few days or weeks. We shouldn't fear rework and replacement systems - this is part of what we employ developers for.
  8. Try and identify areas whose performance will decrease linearly and others that might have an avalanche effect - monitor all of these. A web server will roughly slow down in proportion to the number of connections made, which relates broadly to the number of users. At the point that the performance becomes unacceptable, I can usually add another web server and this is usually easy enough. Other parts of the system are potentially more error prone. What happens if you exceed your service provider's bandwidth allowance? Do you get throttled and suffer a massive drop in performance, caused by that one small request over and above the limit? You need to know about these hard limits because if the performance drops massively, people might start to leave your service.
  9. Learn what is easy to scale and what isn't. I recommend all web apps are designed to work in a server farm. This is automatic with many cloud services (PaaS), but even if you have to create the farm yourself with two web servers and a load balancer, it then allows you to increase web connections very easily. Databases are hard to scale so keep the database as slick and quick as possible to avoid this issue early on. Try not to perform any CPU-intensive operations on the database server. There are ways to split and shard databases but these are best avoided since there are all kinds of dragons there.
  10. Don't worry. Stick with what you know, employ people for the bits you don't know and learn from your mistakes. It is more important that your company deals with issues in a timely fashion than it is to never make mistakes. Learning from your mistakes should be done by asking why a mistake happened and what can be done to avoid or reduce it happening again (test cycles, checklists, 3rd-party verification, whatever...)

Tuesday, 8 October 2013

Drupal Hybridauth Social Login plugin stuck in loop when logging in

We have been trying to create various PHP plugins for the PixelPin single sign-on solution and one of these was for Drupal. I assumed it would be easy since we had already written the PixelPin provider for HybridAuth for the WordPress social login plugin, and the Drupal module uses the same library.

We altered the Drupal plugin and added the PixelPin files, yet when trying to log in with PixelPin, the site got stuck in a redirect loop and didn't seem to log any errors apart from random ones appearing on the front page saying, "An error has occurred".

It took a while and lots of debugging code to realise that I had misunderstood the configuration of the providers. In the file hybridauth.admin.inc, all providers start with a secret, a key and an id. Since we don't use application ids, I added PixelPin to the array on line 444 which unsets the id - I was left with a key and a secret. However, the HybridAuth library requires OAuth2 providers to use id and secret, not key and secret. If these are not set, an exception is thrown but this is somehow swallowed by the framework and leads to the redirect loop.

I changed it to remove the "key" instead of the "id" from the config for PixelPin and it was all OK again!

Monday, 7 October 2013

Calling .Net code from SQL Server

Introduction

How do you call .Net code from SQL Server and why would you want to? There are various reasons why you might want to but they all come down to a simple answer, "doing something that is easy to do in .Net but driven from a database".

In my case, I want a trigger on one database so that, if certain changes are made, it logs them and then calls a web service to update a dependent system (that does not run on the same SQL server). Obviously the trigger is easy in SQL but logging and calling web services is much easier in .Net - it is also easier to debug from Visual Studio.

This is how to do it....

Create a Visual Studio Project

Firstly, create yourself a database project in Visual Studio. I believe some of these have changed names but in Visual Studio 2012, there is only one database project called, "SQL Server Database Project". I think the older versions had several projects with example files in them, in which case, choose the "CLR User Defined Function" project.

Once this is created, you might or might not have any code but if not, choose "Add New Item" on the project and look under SQL CLR C# for the item called "SQL CLR C# User Defined Function". Give it a name and add it.

Once you see this file, it looks very similar to normal C# but with a special attribute (Microsoft.SqlServer.Server.SqlFunction) that will let it be called from SQL Server. You will also notice that the types live in the System.Data.SqlTypes namespace which ensures they are correctly marshalled between .Net and SQL Server. Otherwise, it is all pretty normal stuff.
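
To give an idea of the shape of it, here is a minimal sketch of such a function; the name and logic are made up, but the attribute and SqlTypes usage are the important parts:

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Hypothetical scalar function - note the SqlTypes parameters and return value.
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString MyFunc(SqlString param1, SqlInt32 param2)
    {
        if (param1.IsNull || param2.IsNull)
        {
            return SqlString.Null;
        }

        return new SqlString(param1.Value + " / " + param2.Value);
    }
}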

Set the Project Properties

Right-click the project in the solution explorer and choose "properties". Here, you can set the names for your assembly (if different from the project name) and also change the target framework to 3.5 if it needs to work on SQL Server 2005/8.

You can, and should, also set the assembly properties so you can more easily keep track of your code. Pressing the Assembly Information button creates an AssemblyInfo file.

If your assembly does anything outside of itself, like file IO or network access, it will need permission to do so. You specify this by setting the Permission Level (details are here). If you have chosen anything other than "SAFE", you will need to sign your library. Do this by pressing the Signing button and choosing to sign the assembly; if you do not have a strong-name key already, you can create one in the dialog.

Add Login and User

A CLR function needs to be owned by a database user (remember this project will be deployed as a database) so you will need to add a "user without login" to your project and then set this name in the project properties against "Assembly owner" on the first page. If you try to create a login in the project, it will fail deployment later.

Build Project

Build the project; you should get no errors. You might optionally add additional functions, tables etc. first. The build should produce a .dacpac file as well as the assembly dll.

Prepare the SQL Server

SQL Server will not allow the CLR assembly to be installed or run by default.

Firstly, you will need to enable CLR integration for the SQL server. Run the following query against the master database:

sp_configure 'clr enabled', 1
GO
reconfigure
GO

Note this does not require a restart.

Secondly, you need to create a login linked to the key that you used to sign your assembly. The easiest way is to create an asymmetric key from the assembly file like this:

CREATE ASYMMETRIC KEY MyKeyName
    FROM EXECUTABLE FILE = 'C:\Users\Luke\Documents\Visual Studio 2012\Projects\MyProject\bin\Release\MyAssembly.dll'  

No password is required in this statement.

If you get an error here, it might be because the directory your assembly lives in does not give the SQL Server service account permission to read its contents, in which case just give "Users" permission to read e.g. ...\Myproject\Bin\Release.

Next, create a server login (I used the user interface) and point it to the key you just created in the "mapped to asymmetric key" dropdown list. Once you create this login, you need to give the login external access permission like this:

USE master
GO
GRANT EXTERNAL ACCESS ASSEMBLY TO [MyLogin]

Import the Data-Tier Project

Right-click the Databases node in your server explorer and choose "Deploy Data-tier Application". The options are quite easy to understand: point it at your dacpac file and press go. What happens during this import is that the server will determine whether it is happy to give your CLR code the permissions that were specified in the properties. For instance, if the code requires external access, it will use the assembly signing key to associate the code with a login (the one you just created) which is linked to the same key. This is how the server establishes the trust relationship, since only a system admin can create server logins.

Try it out

It works like any other database function. For instance: SELECT DataTierApp.dbo.MyFunc(Param1, Param2)
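
And if you would rather call it from .Net code than a query window, it behaves like any other scalar function as far as ADO.NET is concerned. A rough sketch (the connection string and parameter values are placeholders):

using System;
using System.Data.SqlClient;

static void CallClrFunction()
{
    // Placeholder connection string - point it at the database the dacpac was deployed to.
    const string connectionString = "Server=.;Database=DataTierApp;Integrated Security=true";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT dbo.MyFunc(@p1, @p2)", connection))
    {
        command.Parameters.AddWithValue("@p1", "some value");
        command.Parameters.AddWithValue("@p2", 42);

        connection.Open();
        Console.WriteLine(command.ExecuteScalar());
    }
}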