Friday, 18 December 2009

Hacked Email Accounts, Phishing etc

A colleague of mine was concerned today because she noticed funny stuff happening related to her Hotmail account. She was not tech savvy but knew something was wrong because she thinks her emails have been deleted from her inbox and also she had a reply from a friend thanking her for an email she did not send. I decided I would write a guide to email safety for people who have little or no technical knowledge.
The first important point is that email is generally extremely insecure. Although it is possible to use techniques mechanisms to make it better, in general email is sent in plain readable text over public networks. On top of that, the way in which email is sent is archaic and easy to spoof.
For instance, when you receive an email that says: from: markysmith@hotmail.com, you assume that the email is from that person. You assign an amount of trust to the information in it depending on how well you know Marky Smith so you might click on links to various sites, "check this out" or go to a product page at a shop. The problem is that you have no guarantee that the email is from the real Marky Smith even if it is the correct email address.
Firstly, the from address can be set to anything so I can send an email and change the from address to "bill.gates@microsoft.com". You must not trust the from address by itself.
Secondly, people often obscure information in emails to make them look trustworthy. They might pull actual images from, say, a bank website and create an email that looks like it comes from a bank. If it is sent to a million people then chances are some of those will believe it to be genuine. The sender might use web links that look real such as ebbay.com so that a quick look and all appears to be correct. Also, it is possible to write a link that looks like www.microsoft.com but make it direct you to a totally different site (the idea is to hide horrible looking links and make them friendly like "click here" but it can be used the other way round).
Thirdly, these various problems can be used by people who write viruses, especially ones that can read address books, and they can send email pretending to be from person a to person b while actually coming from the computer of person c so that Mr Evil can lure you into a sale or a site that will infect your PC.
What can you do?
1) Make sure you have an up-to-date virus scanner if you use Windows or Mac. I cannot stress enough that your PC can cause massive harm if it connects to the internet and gets infected by something. Harm to you, obtaining your personal data or launching some sort of attack against someone else. This will prevent many situations where you might accidentally get a virus.
2) Use a site like opendns or software like Net Nanny to prevent accidental or deliberate navigation to dodgy web sites, many of which could harbour viruses with the promise of "free nude piccies". You might need someone to set it up for you but talk to your local PC shop (not PC World!) about what options are available.
3) Never trust emails. No official body ever needs to send links in emails but if they do, try and navigate to it by opening a browser and going to the home page directly (don't click the link). Only click email links if you are expecting the email, such as just having registered for something and having to confirm it. Use a browser like Firefox (free download) which will warn you if the link you are clicking is not the same as the text displayed (i.e. it is dodgy). If you send emails with links in to your friends, get in the habit of typing something in the email that proves you are who you say you. Phrases like "click here" could be genuine but could easily be bogus whereas "Hi Sallykins saw this vides, reminded me of our night in town" is more likely to be genuine.
4) Educate your friends and family. Even youngsters are not getting taught very good web etticate at school. The more people who wise up, the harder it will be for people to get their way.
5) Change passwords every few months and try and keep them different. It would be better to write them down somewhere at home suitably obscured (I sometimes hide them in addresses in address books) than to never change them for fear of forgetting. Also use strong passwords with numbers and characters, you can make simple words doing this, such as @lbatr0ss

Be safe!

Tuesday, 20 October 2009

Password handling and hashing

A lot of people take a very simplistic or optimistic view on security. You assume that no-one will ever hack your system (perhaps there is no obvious reason to do so), you trust everyone who works for you or you simply don't really think about it. One of the most serious problems is the handling of passwords. Here are some facts that you might not know about password best practice.
  1. People often use the same passwords for more than one site so if you allow access to human-readable passwords from your site, it migth well compromise something of higher value like ebay, hotmail, banking etc.
  2. You must use defence in-depth. Just because you think that your database is secure doesn't mean you can stored passwords in plain text. Using this approach, if somebody does hack in, they then have an open door to do whatever they want.
  3. Although it is hard to hack a hashed password, it is not impossible. (hashing is obscuring it in a way that makes it very difficult to find the original value). A hacker only needs to find a word with the same hash regardless of whether it is the actual password. Some people run machines 24-7 building a list of word-hash pairs so it can reverse lookup hashes. For this reason, you should salt your hashes. This means if somebody chooses a password of, say, "PrettyBoy", before hashing it, you change it in some known but secret way like add the characters "123" to the end (or preferably something more obscure) before hashing. Suppose then the hash of "PrettyBoyab145" is ABC123, even if a hacker discovers this (which would require knowing which hashing algorithm you use) and does a reverse lookup for the password, he finds "PrettyBoy123", types it and the password fails because checking the password "PrettyBoy123" adds the salt "123" to get "PrettyBoy123123" which when hashed (e.g. XYZ456) and compared to the stored value of ABC123 will not match. Of course with a simple salt, the hacker might guess the actual password but if you add a number to the bytes of the password then this will be much harder to decipher since reverse lookups against standard hashing algorithms will probably produce garbage which won't look like a real password.
  4. You NEVER need to store plain text passwords. If someone forgets the password, use a reset mechanism. Ask them a question or two, use their registered email address and allow them to set a new password but make sure the system cannot even theretically allow a hacker to call the reset function against someone elses account. e.g. the stored proc that resets the password should look something like: procResetPassword(varchar @account, varchar @email, varchar @answer1, varchar @answer2, varchar @newpassword, varchar @newpasswordconfirm). A hacker cannot know all of these, if he does then he can simply use the normal "forgot password" page and wouldn't need to hack in the first place.
  5. You need to stay up to date with the latest knowledge concerning hashing algorithms. As technology gets faster, older algorithms tend to become insecure because it takes less time to reverse them by brute force. You don't need to update the minute somebody suggests a weakness but once a year checking and upgrading can be useful. Bear in mind, this will affect existing passwords so you would either need to know which hash to use for a given user or have to get all users to reset their passwords.
  6. Even having a good backend system does not prevent somebody snooping on the transmission between the web page and back end when by default information is sent plain text. You should consider redirection to HTTPS (SSL) for login, even if you then redirect back to standard HTTP. This way, there is encryption between client and back end.
  7. Even if the transmission is secure and you have a secure backend, a session is generally only a cookie so it is usually easy for somebody to jump straight onto a machine used by somebody else and gain access to the site. Cookies might also be stored and reused later. Ensure you understand sessions. Always provide an accessible logout button which wipes the session and try and use existing libraries instead of rolling your own. Allow others' mistakes to improve your code.
Play safe!

Wednesday, 5 August 2009

Creating QT Widget in 4.5

I have started writing an app in QT for controlling my keyboards via MIDI. QT is the obvious choice because it works really well on Linux, can be ported across platform (although I am using a Linux MIDI library) and the tools and help that come with it are superb. Anyway had a little fun creating a widget.
I wanted a widget for a double list box with the two buttons to move items from the left into the right list and vice-versa. Assuming that I might well use it again, I wanted to make it visible to the designer so I found an article on the Trolltech website which pretty much describes the process: Here but I had a few fun experiences along the way since I am not massively familiar with QT and the QTCreator IDE. So I created a new project using "New" and chose a widget as the project type. This doesn't set things up exactly as required (it inherits from some styleplugin thing and the output path is a style path) but it does some of the work like setting the "debug_and_release" compiler switch and "library" as the output type.
You need (assuming one custom widget in the plugin) a class for your actual widget which is literally as you would write any other widget - inherits from QWidget and houses whatever contents it needs. The other is the plugin class which you pretty much copy from the example and rename for whatever you want.
The first problem was seeing the target.path and CONFIG variables mentioned. I couldn't find these and it wouldn't build then I found out that the .pro file is actually text and can be opened into the IDE where some of these variables are. It is confusing because there are some project properties under the projects button but hey-ho.
The next problem I had was related to accessing the designer directory which would allow my plugin to be available to all other projects in the QTDesigner. This is what was set in target.path but by default the permissions are for root access only so this is what I did:
Navigate to the directory above designer (in my case /usr/var/qt4) and then run "sudo chmod 777 designer" which allows full access into the designer directory for everyone. I tried read/write only but then when it tried to "stat" a file it failed so had to give it the works.
It now all builds fine, just make sure you follow the tutorial.

Thursday, 18 June 2009

Securing your web apps

Web apps are so commonplace today that we totally take them for granted. If you write and maintain sites, how much do you think about their security? How much do you know about what dangers and vulnerabilities exist in your software/web server/database server and in what way can you protect yourselves?

Why protect?



  1. You don't want people to gain access to private/sensitive information. Once it is out, it is out!

  2. If your site is infected with malware, it will affect your reputation and can result in your site being blocked to everyone (including customers!)

  3. You don't want people damaging or deleting your work in what might be a totally malicious attack. People with no backup policy have lost immense amounts of information which is mostly related to someone's time and therefore money.

  4. Just because your site is on an intranet doesn't mean someone won't get through your firewall one day (for whatever reason) and also if your app was to go public, you DON'T want to retro fit security

  5. You might think you trust everyone in your company but this is a false assumption and bad security can expose things accidentally as well as deliberately and there is possibly a lot of data to expose that you would rather stayed hidden.
  6. It is good discipline to get you used to thinking security

  7. If people get passwords from your users and something like an email address, there is a good chance they will be able to log into common sites like EBay, Facebook and Hotmail since people regularly use the same passwords for more than one site.

  8. By more people securing their sites, malware has less effect and will reduce the current efficiency of people trying to spread chaos/SPAM via infected machines.


How do we approach security?



  1. A weeks security course would be money well spent

  2. Assume that all data from outside your control zone could be malformed. The browser is not the only way for information to be passed to your web app so any client code/javascript/error checking can be totally bypassed.

  3. Rely on a chain of protection at an appropriate standard in every level of your app. A key checker on the web page is OK but doesn't stop bad code being injected so you also need sanity checking in the code-behind files and in your business objects and in your database. Don't think that one layer will prevent all attacks

  4. Use stored procedures, NEVER use SQL directly to query databases from your business logic since you open up so many avenues for SQL injection. Parameters in stored procedures escape any text that might contain SQL code so it can not do anything useful in the database.

  5. Never reflect untrusted user input back to the web page without sanity checking it. It is easy to inject cross site scripting code into an unchecked input field which then gets parsed by the browser on reflection as valid HTML which can be used for all manner of bad things. Using regular expressions takes some learning but they are fast and very powerful. You might, for instance, not allow usernames to contain anything other than a-z, A-Z, 0-9 and the @ symbol.

  6. Do not go outside of standard software and techniques unless you are properly qualified and able to do so. Standard software gets a lot of exposure and bugs are usually seen and fixed quickly.

  7. ALWAYS hash passwords so they can never be seen even after a successful hack. You do NOT need to be able to tell your users what their password was. If they forget it, have a system to reset it/set it to something else in a secure way.

  8. Encrypt ALL data that has either intrinsic value (bank account details) or that cannot be recovered/changed esily if lost like names, dates of birth, mothers maiden names etc. Again you cannot guarantee that a hack is impossible so better let people steal a load of data they can't read. Have a secure system for protecting encryption keys.

  9. Become involved in owasp, a not-for-profit organisation that does a lot of work in the area of web security and who have many resources and articles about safe coding and safe processes.

  10. Always use intrusion detection, setup to balance the security with the extra server load/cost. For instance, always lock out accounts that have 3 invalid logins and which require a phone call or secure reset method. This prevents brute-forcing of password.


Don't be blaze about it, you don't want to be the next person to lose 100,000 patient records!

Tuesday, 16 June 2009

Good OO Practice

OO is more like art than science. There is usually more than one way to achieve certain functionality but here are some general suggestions for good code.

  1. Encapsulate stuff. In other words, make fields private and other functions private, protected or public - make the security as restrictive as you can. The more secure, the less that the function can be called which reduces code use and the chance for problems as well as making it easier to know how to test it. It also prevents people calling functions that seem to be right when they are supposed to be called internally to a class. "Internal" is useful for keeping stuff visible inside a single assembly (dll). Think about the testing required for a dll with 100 public functions rather than one with 10!

  2. Only pass into functions what you need to use. Do not pass a parent object and then extract some property of it, if it needs the child, pass the child in. This again makes the intent of the function clearer and allows irrelevent changes (such as changing the parent type) to not affect functions that don't care.

  3. Try and make functions either call a list of other functions OR do something technical and low-level. The functions that call functions are easy to understand and check, the others should be small and easy to unit test. Mixing them makes them hard to understand.

  4. Try and work out whether it is better to replace structured constructs like switch statements and if/then/else with classes. Putting logic into structure is so much more robust and easier to check. Consider a statemachine in a switch, what is easier, to ensure that all the other case statements change the state variable correctly or in an OO state machine to have a class named MovingState that has specific transitions and guards coded into one function in one class? Again encapsulation is your friend. You can't affect anyone elses variables or functionality because you cannot see them.

  5. Try and consider a class as either data, control or boundary and don't mix them. Data classes are persistent (usually) and have functions related only to their data. Control classes can be created and destroyed as they are used and manipulate data or send information to the system boundary. Boundary classes translate from one system to another (including the user interface, ports, the network etc) and can be considered pluggable so that, for instance, a web service can be plugged in where a user interface was previously since they can both send data that is conceptually the same even if one has no actual user interface.

  6. Use the safest types for the data you have. If you are doing money stuff, a decimal is almost always preferred to a double or float because the maths functions work properly. If your data will be integer, use integers for similar reasons. If you have pairs of data that are related, either use something like a Map/Hashtable to link them or otherwise create a type to encapsulate both, don't assume that you can keep any related things up to date with each other without a setter function or constructor as the only means to do so.

Wednesday, 10 June 2009

Calling Web Services from Reporting Services

I wanted to do this because I didn't like the idea that we needed SQL procs in a database for reports that have to do the same thing as business logic dlls did for the web app pages. This duplication not only made things harder (being done twice) but allowed the possibility of breaking one when the other was changed and the fact that SQL is so different from code that one was often much easier to change than the other. I found out that you can call web service functions from reports so that the same code can populate reports AND web pages. It is not massively easy to work it out and my example uses Reporting Services 2005 and .Net 2.0. It also uses the Reporting Services XML Data Provider for the dataset (since the web service returns xml in the form of soap).

  1. Create your web service as per a normal web service. In order to be generic and not need changing or growing for every different sort of function I would need to call, I made some generic functions that use reflection to call whatever function I want. You can probably work out what I did if I show you the code for the generic function:

    [WebMethod()]
    public ArrayList ExecuteArrayMethod(String typeName, String method, object[] ParamArray)
    {
    ArrayList returnObject = null;
    Type type = AccessType(typeName);

    try
    {
    if (type != null && type.GetMethod(method).GetCustomAttributes(typeof(ExportedMethod), false).Length > 0)
    {
    returnObject = (ArrayList)type.InvokeMember(method, BindingFlags.Default | BindingFlags.InvokeMethod, null, null, ParamArray);
    }
    }
    catch (Exception)
    {
    throw;
    }
    return returnObject;
    }

    There are a couple of points of note. The ExportedMethod attribute was something I added to control what functions were allowed to be exported to the reports and what couldn't, it is optional. Also, the arraylist allows collections of any type to be exported via the function. I have created a dataset flavour which I haven't tested yet and which I don't think I'll need.

  2. Obviously the functions that actually generate your report data must be callable from the parameters you have available in your web service. If the function needs a load more information like connection strings, arguments etc, then either pass them in via the web service function or if they are private, create a parent function that can obtain these values and pass them to the worker function. The function you call from the web service must be static (I think!).

  3. You will need to declare the types of any exported classes (i.e. types put into the ArrayList) at the top above the web service class definition even though you never need to reference them by name in the report. This forces the webservice to include the property data in the xml schema used by callers of the functions. You do this:
    [XmlInclude(typeof(ClassName1)), XmlInclude(typeof(ClassName2))]

  4. You must ensure that these exported classes have default constructors and public getters AND setters for all class properties that you want to access from the report. This allows the system to serialize and de-serialize the objects across the network. The service will build fine but you will get an error when you try and execute the function in reporting services.
  5. Build and deploy the web service to a url somewhere, such as http://myhost/webservices/myservice.asmx

  6. Navigate to the asmx file in your browser and click on the function that you will be calling from your report. You should get an invoke section at the top which you can use to ensure that your code is returning the expected data. Do not go straight to the report without checking this, the report syntax is hard enough without the underlying data actually not working! Also note the RESPONSE syntax under the soap 1.2 section (the second of the two brown boxes). It will contain some soap bits (ignore these) and then a response tag, a result tag and then data depending on what you are returning. These are the important bits.

  7. Open up your reporting services project (or create a new one) and add a datasource of type XML. In the connection string field, put the path to the asmx web service including the http and the asmx filename itself. Under credentials, you will need to put whatever is needed to call functions on this web service. In my case, it is a private intranet so I use windows credentials.

  8. Create/edit a report and add a new dataset that uses this XML datasource, set the type to text and this is where the fun begins. You need to use a query syntax similar to XPath but not quite (search the web for details, there are various MSDN articles about it). This names the method, the parameters and the expected return values (and optional types). I will show mine so hopefully it makes more sense:

    <Query>
    <Method Name="ExecuteArrayMethod" Namespace="Namespace.Of.The.Webservice">
    <Parameters>
    <Parameter Name="typeName">
    <Parameter Name="method">
    <Parameter Name="QuoteReference">
    </Parameters>
    </Method>
    <ElementPath IgnoreNamespaces="true">
    ExecuteArrayMethodResponse{}/ExecuteArrayMethodResult{}/anyType{Location,Description,Schedule,Item_Code,Quantity(Integer),OriginalSale(Decimal),RevisedSale(Decimal)}
    </ElementPath>
    </Query>

    Note that the anyType is the type that is returned in my soap response because I used an arraylist, it might be something else depending on your data. I have also typed some of the return values so that they can be treated as numbers rather than the string defaults. This XPath relates to the following web service xml response:

    <soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
    <soap12:Body>
    <ExecuteArrayMethodResponse xmlns="Namespace.Of.The.Webservice">
    <ExecuteArrayMethodResult>
    <anyType />
    <anyType />
    </ExecuteArrayMethodResult>
    </ExecuteArrayMethodResponse>
    </soap12:Body>
    </soap12:Envelope>


  9. Every time you save the xml, it will parse it and give you an error if there is a syntax problem, it does not understand line breaks so you might get a line1, character 300 error or similar, unfortunately, you have to try and build up the element path if your syntax is wrong and hope for the best.

  10. If your web method requires parameters (as mine does), click on the ellipses next to the dataset (...) and click the parameters tab. You need to define any parameters passed to the method here so either a) use static values (in my case the method parameter is static for any given report) or to use a previously created report parameter if it will be different for each instance of the report (like my QuoteReference is).

  11. Exit the edit pages and click the ! button to call the method. You might be prompted for parameters and you should see your expected table of data or you might get an error. If you get data but it is incorrect, check your element path. If you get no data, your element path might be wrong or there might be no data being returned, remove the element path and leave it blank to see if there is actually data present, if there is, your element path is still wrong. if you get an error, click the little red X icon and make the dialog bigger, the errors are often hidden but usually useful "no permission", "could not instantitate type" etc

  12. Once you get data in the right format, your dataset on the left should now show the field names which can be used on the report in the normal way.

  13. One of the big benefits for us is that the workload is shifted from the database server which also seems to be chugging away to the web server which is fairly lightly loaded since the work is now mostly done in a web service not a view/proc.

  14. When you deploy your report to another server, you might get other errors and these are sometimes related to the permissions with which the report server is attempting to access your web service with. When you go to the web service in your browser, you are usually logging in as you and you might have access, the report server however logs in to the webservice as the report server (and not the person running the report). I got round this by modifying the datasource on the live server to use specific login credentials rather than windows credentials.

Tuesday, 9 June 2009

Why have a software process?

There is always lots of hype about software processes for quality/productivity etc but like a lot of people, I was often unconvinced about their value. At the end of the day I enjoy coding, not writing documents and so I would pay lip service to the process of design and specifications but only the minimum amount of work and then dive into code. I was attracted to the idea of things like eXtreme programming which promised quality without the overhead of design and specification but sadly this is a lot of gas imho, quality is not just related to the lines of code but about a whole lot more which is why the process is important. Now that I have less process and less design and specification to work to than my previous employer, I find myself uncomfortable and pine for documents to solidify the quality of my work. So why have a process including design, specification etc?
1) The earlier on the process, the easier/quicker/cheaper it is to fix things. For example, I had a spec review which revealed that a whole chunk of functionality was not required. A minute making a decision saved me potentially days of code (however good it was)
2) It requires that your customers (even internal ones) think about what they want. It is easy to spend 10 seconds saying, "We want a page to do x" with all the cost associated with it, whereas for them to spend a few hours defining it means they will think about whether it is really worth it and allows holes in their assumptions to come out.
3) It enables everyone to have buy-in and sign-off on functionality including everyone agreeing formulas/workflows etc and also allowing budget holders to question the value of the work compared to the proposed benefits.
4) It reduces the likelihood of rework at a later date where the original system implements assumed functionality and then the customer realises it is wrong and needs it to be changed. EVERY change is a risk and takes time/costs money. These risks are in productivity and reputation and why bother reworking? Sometimes it is unavoidable but reduction is better than nothing.
5) It allows systems to be seen in a larger context by people who care about more than one area of the system. With one spec next to another on the desk, a manager can ask questions related to the interaction or consistency of systems/sub-systems with each other. For instance, they might notice duplicated functionality and allow something to be removed in one sub-system since it will be present in another.
6) It permits a quality system to be implemented. If you notice problems in code that are actually related to design weaknesses, how can these be fed back into the quality cycle unless you have design and specification stages to modify?
7) You can have code of 100% reliability/quality from the programmers point of view but that lives in a function/class/system that is not actually of any business use. This is why quality and process CANNOT rely solely on a code view of quality such as eXtreme programming and other shortcut processes.

If you don't have a system, start one. The managers will often need convincing that the extra time is worthwhile but actually if implemented properly it will save time not increase it although there no doubt will be some short term overhead as you learn the way things work.

Thursday, 28 May 2009

Viewstate, postbacks and pains in the neck

It took me a long time to get my head round what was going on asp.net. Sometimes controls did not get their correct values and things didn't always get bound. Here is a simple guide that hopefully will help you know what is going on.
1) The web is not connection based by default. When you go to a web page, you request a page, you are given it (usually) and that is the end. Things like session etc have been cobbled on the top of this system to try and keep track of things but the server never knows for sure if you are still connected which causes all kinds of problems with cookies and secure sites and loggin out etc. If you remember this connectionless pattern, this will help you understand.
2) What happens when you FIRST request an aspx page? The server recognises that the request is a "GET" HTTP request so sets the property IsPostBack to false. It calls various event handlers in a prescribed order (you can find these on msdn) most importantly Page_Load which you would use to set the page up. You probably managed this.
3) What happens when you change something on the page after it is served? This depends on what control you change. By default buttons cause a post back but data fields like text boxes and radio buttons do NOT unless you set their autopostback attribute to true.
4) When you finally cause the postback, a POST request is passed to the server along with the values of any fields on the page (which is why you need to put all the aspx controls inside a form). The server sees that it is POST and assume this is a "post back". It unpacks all the form values into member variables and it also uses something called viewstate to remember any data that is not a current value for a web control. For instance, a text box has no viewstate because the only data that needs to be remembered between posts is the text in it, this is already passed in the form data with the post. Controls like TreeViews however want to remember whether they were expanded etc and because this is not part of their 'current value' it has to be specially stored in viewstate. The viewstate is a hidden field with a load of ascii characters which the system automatically packs and unpacks for you (although you can disable the viewstate to save bandwidth or customise it).
5) If you write a custom control then you might have to save data into the viewstate so that it is 'remembered' between posts (but only if it needs remembering otherwise you are wasting bandwidth). You do this with the Viewstate property which is an array of objects.
6) The area that confuses most people is when working with a data source and a postback needing to update something in a database and then change the screen somehow. Firstly you need to know that Page_Load is called BEFORE your event handler. If you need to access data in the event handler, it must be set up first in Page_Load. Secondly, you will need to change your screen data in code otherwise it will remember its last values.
7) There are some times when you do not want to update from your datasource during postbacks. This is for two reasons. Firstly, because controls remember their values, once they are setup, if they are not going to change while the page is being used, there is no point keeping updating them, secondly you might want a load of controls to come up as disabled when you first open the page and then to become enabled when you click a button. However if you disable them in Page_Load, they will disable every time you cause a postback (such as clicking the button to unlock them!). Use the IsPostBack property to only do things on the first time in Page_Load.

Friday, 22 May 2009

Mixing Forms and Windows Authentication

My friend works on an ASP.net web application at work and uses windows authentication to access the various pages with database backed roles to provide authorisation. He then asked me how to further lock down certain parts of it to require a password. This would be so that certain orders could only be viewed by people who knew the password even if they were generally allowed access to the page.
Of course, the general authentication schemes are designed for a per-page basis and it seemed like rolling your own flavour was quite involved as well as using something like ESAPI from OWASP. He didn't want to spend weeks developing it so I had a play around.
I initially investigated mixing windows and forms authentication so that for the most part you would use windows but for a more protected page, it would re-direct you to a login. I thought this would be best because a lot of authentication stuff is already built in to the ASP libraries. However I shortly realised my mistake. The problem is that the authentication is the first thing to occur when the page request is made. At this point, you do not know whether the page is one that needs a password or not so you have to allow it to authenticate. If you then read the database and know you need a login, you can then force a redirect to the login page but it gets really messy because then you have to keep track of authentication state so it allows the user in the first part but then blocks and then if a correct password is entered, it needs to allow you in until you close the browser. After many hours, I opted for a simple redirect to a login page which gets passed the return url in the querystring and a logout button in the order that flushes the cache and then goes to a neutral page instead. Simple is good.

Tuesday, 19 May 2009

Building Test Scripts

How do you know your code is reliable? That it can handle all situations in a known and/or designed way. For a simple test application, you might be happy for exceptions to be thrown which stop the application from running and which allow you to identify what went wrong but what about serious business or mission critical applications? It is not enough to hope they are OK. Do you think you are a good programmer? You cannot rely on that to make your code robust because you simply cannot guarantee that you will not miss something. Another pair or eyes is helpful but again cannot provide enough reliability.
One of my current favourite ideas is to build unit tests in software that instanatiate your objects and call functions on them to test the expected results. This might sound trivial or ineffective but it is surprising how effective they are since they make you think about what the function should do. You will refactor if you cannot think of a simple test because you won't want to write a 500 line unit test function. The trick is to consider what should and could happen and how the system should cope.
A classic scenario is data input fields. Your user needs to type, i.e. a price for an item which will be saved and then used. How many times should this data be checked in the system? You should assume that all external data is potentially tainted so anything coming into the system from the user OR the database should be checked and dealt with.
The most simple case, you might trust the user to type the correct thing in with no real checking. They accidentally type -100 instead of 100. What happens now? In most systems, the data would be strictly valid (but incorrect) and will cause all calculations to be broken. Things like total = cost x quantity will compute without crashing but will generate an incorrect value. OK, you get clever and ensure the user can only type 0-9 and a '.' character. So what happens now when they type 100.12.23? Your system might not notice but this time something will crash (most likely when the number is parsed into a numeric data type). You should actually use regular expressions for most validation so you can be very specific about valid data and can give the regex a grilling with a unit test to make sure it allows all valid numbers and disallows anything incorrect.
Here are some other things you might need to remember:
1) Number ranges: Are you allowed negative numbers? Numbers with decimal places? Do you need to retain leading zeros? Are the values re-displayed (if applicable) in the same format they were typed in? If they are currency, what happens if someone types the currency symbol into the input field? Do you allow commas/periods to separate numbers into thousands? Do your numbers need to consider the user locale and display differently? Will they need rounding (especially if you have divided them by something)? If so, when are they rounded? What happens if you need to compare these rounded figures with other figures? Will they equate to each other or will you need to check merely that they are within e.g. 0.001 of each other? What happens if the user types a massive number that is too large for your number type? Do you restrict this in some way?
2) Strings: What characters are allowed? Will you encode or remove illegal characters or tell the user that they are not allowed? How long can the string be? Do you know what the database will permit? How are you going to ensure that a user will not type in a string that is too long for the database? Do you catch the potential database errors that might result? How do you avoid people injecting SQL into user input? Do you need to upper case or lower case anything? Do you need to spell check anything?
3) Aggregates: There are plenty of chances for error in situations where you are summing a number of items or performing some other calculation. Do you know that each item that should be part of the calculation *is* part of it? If you are updating a page, does it always update for every item that can be changed by the user? Does the sum need rounding? Should the numbers be rounded before or after they are added? Are there any round-trip issues where a number is perhaps multipled, rounded and then changed when attempting to back-calculate the unit price?

There are many things to think about so take your time and let your managers know that there is always a trade-off between quality and the amount of time given for design and testing.

Tuesday, 12 May 2009

Structure for Reliability

I got stung again today by a function of mine which was a one liner of logic but for which I had not considered a particular scenario. I hate these because they should be really easy to make correct but no doubt most of our code is littered with them. We can write Unit Tests but that can seem a bit extreme for every single function we write (or is it?) but even those do not gaurantee that we consider all the implications and permutations of the function. The function basically did something like
return (Number == 0 && Locked) || (Orders[Number].IsComplete);

Because the number can be either related to a quote or an order we need to check the quote being locked or the order being complete. The subtlety I had missed is what happens when you pass it a number greater than 0 which is not actually an order and what happens if you pass in 0 but the quote is not locked. In both cases, the number is passed to the indexer for orders which then throws an exception. This can be fine but how can you ensure the potential exception is considered and caught or make sure the function is not called for invalid values?
You can resolve most of the issues with the following:

  1. Make the function as private as you can, this way only the class it lives in or possibly subclasses has to be concerned with its correct use. Don't get in the habit of making all functions public just to avoid thinking about it.

  2. Consider whether it is correct to put a guard in the function generally that might only call the logic if a precondition is true such as the order exists etc. This could either be around the logic or part of the logic itself.

  3. Check the logic carefully. In the above example, I should have checked the Number being greater than one in the second set of brackets since orders start from 1 and if the quote is not locked, the second clause would be evaluated for 0 which would then throw.

  4. Ask whether the functions can be pushed further down into the system or refactored to hide them from logic problems. For instance, instead of passing in an int which then has to work out what it is, pass in an object of some interface or super class that you can then call a function on, this way you have no knowledge of the inner workings or logic of the numbering scheme and no chance to muck it up!

Wednesday, 6 May 2009

Should object A reference B or B reference A?

A commmon problem in software design is when you have two entities, one of which describes general information about another and which is shared between sub-objects. For instance, a motorbike and the motorbikeinfo. A motorbike might have registration plate number and colour whereas a motorbikeinfo might have manufacturer, engine capacity and top speed. Should a motorbikeinfo 'have' a number of motorbikes? Should a motorbike 'have' a motorbikeinfo?
There are problems with both approaches. If you take the first, how can you display a list of motorbikes with their info as well? You obtain the list of motorbikes and then have to get the information from an object that is not referenced by the item. The problem with the second approach is that you then share references to an info item between different motorbikes and then what happens if you delete all motorbikes? The info should still exist but where? You then get into a whole area of hassle with weak references and sharing references all over the place or even the dreaded circular reference which causes memory leaks.
There is another solution, the pattern that sounded the most un-useful in the Gang of Four patterns book, the "Facade". With this pattern, we hide the structure of objects behind another object.
With a facade, we can create a class called e.g. MotorbikeItem and this class has a reference to a motorbike and a reference to a motorbikeinfo, it can provide access to all items in both of these as required and it allows the consumer of the object to see a single entity. It allows for restructure of the data classes but most importantly, it avoids all the tangle references, the facade can obtain the indivdual reference from anywhere such as collections, databases etc and it can handle managerial things like deleting the motorbikeinfo if no motorbikes exist any more. Have fun.

Friday, 1 May 2009

HtmlTextWriter breaks lines

Not exactly sure if this is my fault but I was seeing real strange HTML generated broken across lines even in the middle of attribute values. The code was generated by a custom control using an HtmltextWriter to write a large <ul> list. There was nothing apparently wrong with the code generation but for some reason, the HtmlTextWriter was breaking the text up into 1024 character lines and inserting line breaks wherever the string was positioned and I'm sure it was breaking things despite IE's forgiving rendering engine. You would at least expect the writer to recognise tags and break it there but oh no.
In the end I made sure there were line breaks in the string already before writing to the HtmlTextWriter and it seemed happy with that.

Rubbish IE expression bug

Internet Explorer has some none-standard css extensions that you can use to do things that you can't do in css (at least not in their rubbish implementation of it), for instance, you can use
height : expression((mydiv.height > 54)? '54px' : 'auto');
which all sounds fun and useful but beware if your element does not have a height set on it because what can happen is that when the page loads, the browser will ask mydiv what it's height is and the element will say, "I don't know, it isn't set, ask my parent". The parent is then asked what it's height is and it says, "I don't know, it isn't set, ask the children" and continue ad-infinitum. This sort of circular reference should be checked but it isn't and it hangs IE7 and IE8. Theoretically, you can set the height to 0 on the element in question (if it isn't already set) and the problem goes away but I haven't tested it.
This took me HOURS to track down - grrrrrrr
It was reported years ago and still not fixed. For some reason it only happened in some instances and not others, it also randomly affected some users and not others, some times and not others!!!

Wednesday, 29 April 2009

Beware the COLLATE clause

Not sure what the details of this are exactly but I run a test system at work which is basically copies of live database running on my test server for when I test code changes. I usually copy the live backup and restore it to my machine.
Anyway, I notice a particular view was running really slowly on my test server compared to live (even though it should be identical!?) and we are talking 38 seconds instead of 2 here!
I split the view and tried the various parts of it and found a sub view taking all the time. I then opened that and found a particular other view was fine as a standalone select:
SELECT * from ProblemView WHERE...

and the rest of the view was OK but when joined, it took ages. I couldn't really see why but then I noticed the join had a collate clause. Collate is related to the way in which text fields are ordered and depends on languages, sometimes you have to tell it how to collate otherwise the order might be wrong.
Anyway, I got rid of the collate clause and the view became 1 second again instead of 38. Oooh, I thought, a bug in the view, I went into the live site and tried to make the same change thinking it might massively speed it up and got the error:
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AS"

Obviously the collation somewhere was different and obviously getting it wrong made the view MASSIVELY slow even though the data is quite small. Hopefully you might find this helpful.

Monday, 27 April 2009

plasma-widget-network-manager NO

Had a right ball ache tonight with my mums computer. I installed Kubuntu 8.10 a while back and she is happily using it despite being a Windows user. This was great. If mum could get on with it (albeit with occasional support calls) then anyone could.
The problem was I had the unsupported updates enabled so I could get a more stable version of KDE 4 since 4.1 was pretty rough. I was also annoyed that I couldn't get knetworkmanager to run automatically like it did on my laptop running kde3. Small annoyance but anyway I upgraded to the latest KDE 4 stuff through the unsupported repo and then when 9.04 (jaunty jackelope) came out, I thought this was my chance to solidify the remaining dodgy interface issues and leave my mum with something that will just be cool. Problem was, after upgrading to 9.04, the wireless stopped working. It is one of those things that is unforgivable for software which is supposed to be release ready that something that worked fine stops working. Of course, because it is the wireless (isn't it always) you can't download any updates or search forums for answers - grrr.
My first guess was to backup all mums files to an external hard disk and then do a clean install rather than the possibly dodgy upgrade. This of course took a while but to no avail, didn't work in exactly the same way, the wireless card driver was fine, lshw showed the card with a kernel driver and iwconfig showed a single wireless interface so no double-driver issue. The clue was the fact that I could see the networks in the list but when I clicked onto them, I would get a connect dialog, fill in all the details but then nothing!
Went to the forums and found out that sometimes network-manager required the password in hex instead of ascii so used the conversion code but to no avail. Also then read it doesn't support wpa properly and double-checked but my router was using wep. In the end, I ground my teeth and decided to uninstall plasma-widget-network-manager and go back to knetworkmanager which had worked before. Next mistake, ran the uninstall and install in the same session and of course as soon as the plasma stuff had uninstalled, I couldn't connect to the net to install knetworkmanager, for some reason I couldn't get it off the CD.
Reinstall AGAIN, another 30 minutes.
This time, I installed knetworkmanager and simply right-clicked the plasma network icon and told it to go whence it had came. I rebooted just to make sure it didn't come back up and it didn't (although knetworkmanager still doesn't always come up).
This left me fuming, this is the reason why people go with Windows. It is inferior, bloated, slow, expensive and insecure but for some reason (the details of which are irrelevant to most people) it just seems to work. The fact that the release has regressed is a big pity and I hope some people's backsides are being kicked for it!!

Wednesday, 22 April 2009

Why another security breach?

Just reading on the news that several UK and US government computers have been compromised by a botnet worm that can be used remotely to carry out various other IT attacks. However many of computers were actually compromised and how many were significant and not just some rubbish desktop in a constituency office I do not know but what I do know is that security can be achieved but people are simply either unable or willing to do so.
For instance, any things like flash drives etc can be used outside of work and can end up with viruses on them. Simple, do not permit any important PC in high risk situations to accept USB flash drives - period. If people need to work on stuff on another PC, they need to log in remotely or find some other way to avoid insecure transport.
Dodgy browser? Simple. Remote Internet Explorer and use something else. Setup the security settings and lock them down (although to be fair, sometimes new ways are found to circumvent what were previously acceptable measures). Better still, run an operating system like Linux, Unix or possibly Macs that have a decent security model. Not only will most viruses not even run on Linux but if they do, they cannot do anything to the system without 'root' access and likewise, they cannot spread across the network to other Linux machines for the same reason. The risk is massive and yet these bods are still using lorry loads of Windows Machines.
Dodgy web sites? Here is a simple solution, use a web site blocking tool like WebSense that will probably block 99% of all malware sites automatically so the chances of stumbling onto one are much reduced.
Better still, use a whole range of measures, everyone of them reducing the attack surface and putting these hacker people out of business.
There aren't many people with the power to carry out such security work so if you are one of them, please do it. I don't want to spend millions via the banks and insurance companies to finance some criminal somewhere in the world.

Monday, 20 April 2009

Microsoft take heed

When I see things that are very wrong in MS products, i wonder how this can be. Much smaller companies produce very high quality software (maybe because they are small?) whereas it's the basics that get me.
If you accidentally hover over the print button in an Outlook email, the system stalls for about 2 or 3 seconds, presumably while it is having a look for printers or something. Why? Why stop me working because of a hover? Is it their way of trying to buy seconds before the button press rather than afterwards?
Why is your software model so flawed that updates and fixes are large and take ages to install? I installed Access 2007 (only) and Microsoft Update then offered me something like 150Mb of fixes!! (how big is office anyway?) they took longer to download than Linux would do for 20 times as many fixes. Also Linux installations after updates can be done usually in less than 30 seconds whereas Windows ones go on for minutes.
Another horror story, I reinstalled a copy of Windows XP that I bought about a year ago and went to Microsoft update and there was 500Mb of updates for it. 500Mb!! It took 2 hours to install after it spent several hours downloading.
Come on MS, these are basics. You had a chance with Vista to right many wrongs and you mucked up big time. Are you surprised that loads of people have started using Linux?
IE8. A step forward in web standards apparently. It scores a paltry 18 out of a 100 in the acid test web site compare to the next worst 'other' browser which is about 75 and Opera 2 for Windows can manage 100. Please Microsoft, get good people and write good software otherwise you will die!!!

Wednesday, 8 April 2009

Problem with Windows Path setting

I installed some software yesterday in XP and needed to run a utility from anywhere on the PC. I put the path to it into the PATH environment variable (Control Panel, System, Advanced, Environment Settings) but yet when I tried to run it, I got, "'nmake' is not recognized as an internal or external command, operable program or batch file.". They could simply have said "nmake not found" but good old bloatmasters Windows!
Anyway, it didn't make any sense, I had tried all the usual, definitely got the correct path, definitely runs OK in the actual program directory and I finally found out what was wrong.
When I pasted the URL into the Path environment variable, I had somehow copied a space before the C: so it looked like: "Something Else; C:\Program Files..." The space doesn't bother Windows but it also breaks the path lookup for the directory after the space. Removed it and it was fine. Great!! Only about 30 year old code still has bugs in it.

Monday, 23 March 2009

Event Handlers with Dynamic Controls

Have you ever had to create a page dynamically? You can use datagrids and such-like and bind directly to datasets of some description but what if you are doing it manually and still want your dynamic controls to call event handlers? You might have found that when you press your button or whatever, the event handler is not called.
What you need to know is that the server will map a control id to the event handler after the page is loaded. If you are generating dynamic controls, you need to re-create them with the same IDs in the Page_Load before the event handler is called. This way, although you have effectively garbage-collected the old controls with their event handlers, because of the mapping of name to event, it will behave correctly using the link from the new control that you need to set up.
This means that even if you need to change the layout of the dynamic controls after the event handler runs, you must first create them exactly as before in Page_Load, and only then, after the event handler has been called, modify them accordingly.
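Something like this minimal sketch shows the idea (the control and handler names are made up, and "holder" is assumed to be a PlaceHolder declared in the markup):

using System;
using System.Web.UI.WebControls;

public partial class MyPage : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Re-create the dynamic control on EVERY request, with the same ID,
        // so ASP.NET can route the postback event to the handler.
        Button btn = new Button();
        btn.ID = "dynamicButton1";               // the same ID every time
        btn.Text = "Click me";
        btn.Click += new EventHandler(DynamicButton_Click);
        holder.Controls.Add(btn);
    }

    protected void DynamicButton_Click(object sender, EventArgs e)
    {
        // This fires on postback because the control was re-created with the
        // same ID before event dispatch; layout changes are safe from here on.
    }
}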

Compiler error

Ever got this one?:
"An error occurred during the compilation of a resource required to service this request. Please review the following specific error details and modify your source code appropriately"
"The compiler failed with error code 1"

Beautiful and helpful all at the same time. Anyway, in my case this was caused by the Inherits attribute at the top of the aspx file giving the correct class name but the wrong namespace. For some reason this is not a build error; you just get this wonderful message instead. The detail of the error says "....._ascx.FrameworkInitialize()': no suitable method found to override", which is equally unhelpful.
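To illustrate with made-up names, the kind of mismatch that triggers it looks something like this:

// Code-behind (hypothetical names): the class lives in the MyApp.Controls namespace.
namespace MyApp.Controls
{
    public partial class PriceList : System.Web.UI.UserControl
    {
    }
}

// The .ascx directive must give the namespace-qualified name:
//   <%@ Control Language="C#" Inherits="MyApp.Controls.PriceList" %>
// Writing Inherits="PriceList" (right class, wrong or missing namespace) is not
// caught at build time but fails at runtime with the errors above.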

Tuesday, 17 March 2009

Dodgy Joins

When do you use the ON statement in a join and when do you use the WHERE clause?
Many newbies fall over on this and scratch their heads. I have fallen foul of it many times but have now learnt what I've done wrong.
Take the following tables:
itemprices
Price: decimal (19,5)
QuotationReference: varchar(128) (foreign key references quotes)
ItemName: varchar(128) (foreign key references items)

items
ItemName: varchar(128)
DefaultPrice: decimal(19,5)

The first table (itemprices) holds any calculated prices that have been applied to particular quotations. Per quotation, it has anywhere between 0 and (select count(*) from items) rows. In other words, for quotation X, the first table might have between 0 and, let's say, 100 rows (if there are 100 rows in the items table).
Now, I want to get the price for each of the 100 items in the second table for a particular quote but I want to use the Price from the first table (if it exists) in preference over the DefaultPrice in the second.
The SQL might be:
SELECT items.ItemName, coalesce(Price,DefaultPrice) as Price
FROM items LEFT OUTER JOIN itemprices ON items.ItemName = itemprices.ItemName
WHERE QuotationReference = @QuoteRef

It looks correct, doesn't it? You use a LEFT OUTER JOIN since there might not be an entry for every item in the prices table, and then you filter out all the rows that do not belong to your quotation. You will find that it does not work. Can you see what might happen?
The join is incomplete. Joining on the item name alone relates your items to price rows from any quotation, and the WHERE clause then discards everything that doesn't belong to your quotation, including the NULL rows the outer join produced for unmatched items. The result is that any item without a price row for your quotation drops out of the results entirely, instead of falling back to its DefaultPrice. For example, if itemprices only holds rows for quotation Y, every item joins to a Y row and the WHERE throws them all away. If every item happens to have an entry for your quotation, the query appears to work, so you might not notice straight away.
The reason the join is incorrect is that logically you do not want to relate rows that simply have the same name; they also need to have the same quotation reference. A join should only link rows that are related, so that even with no WHERE clause the data would still be logically correct and consistent. In our case, the join should be:
SELECT items.ItemName, coalesce(Price,DefaultPrice) as Price
FROM items LEFT OUTER JOIN itemprices ON items.ItemName = itemprices.ItemName
AND itemprices.QuotationReference = @QuoteRef

If you were joining two tables that both had quotation references, the ON clause would use the column names rather than the parameter, but either way works. Now, if you have no entry for your quotation in itemprices (whether or not other quotations have entries), the join simply finds no match: you get NULL for the itemprices columns and the select returns the DefaultPrice instead, which is exactly what we want. You would have to tailor it slightly for a view, since a view has no parameters to join on. In that case you would start from the quotes table so that you can return information for all quotes:
SELECT quotes.Reference,items.ItemName,coalesce(Price,DefaultPrice) as Price 
FROM quotes
CROSS JOIN items -- ensure we have all items for all quotes
LEFT OUTER JOIN itemprices ON items.ItemName = itemprices.ItemName
AND itemprices.QuotationReference = quotes.Reference

The Triangle Problem

A lot of people who do not work with software wonder why software has defects in it. To the end-user, something like Microsoft Word looks simple enough, why is it that in a certain scenario it locks up or crashes?
It would be easy to say that people who write software understand why and are able to avoid these situations with the correct processes, but sadly this is not usually true at all. Even very able and competent engineers make mistakes and, of course, we expect the less competent engineers to make even more.
I came across the following example in a computer magazine once and apparently it is a classic testing question:
q) You need to write a function that takes 3 inputs and determines whether they form an equilateral, an isosceles or a scalene triangle, or displays an error if the inputs cannot form a triangle.
Spend a few moments and see how many tests you could think of to determine the type of triangle.
Ideally, the functionality for this would be specified in a very robust way by somebody who is both a good spec writer and has done the necessary research to ensure all bases are covered! Back in the real world, you would be very lucky to get a spec for this level of functionality and even if you did, you would still have to interpret it.
When I first heard the problem I immediately thought that any three values would form a triangle, thinking of the phrase "any three points make a triangle" (which of course they don't if they are on a straight line).
After some hints from a colleague I started thinking about inputs that are negative or zero, but I still forgot about what happens if one side is much longer than the others so the triangle can't close. I didn't think about what happens if the 3 inputs are not specified correctly, such as 23E04, or mistyped, like 23..56, or given to too much precision, or whether the inputs are guaranteed to be within the limits of the specification. I think I managed about 6 tests out of a potential 15, and I thought I was pretty clued up!
Being an engineer, I then got really concerned that this is the real-life experience of many software engineers. When you write code, you are assumed to be able to guess what error checking etc is required and to implement it. Of course, most of the error combinations would probably never surface, but what if they do, in a way that is hidden from the end-user? Something gets corrupted and crashes?
The real question is: what can I do about it? Well, you can have a committee to discuss what tests you need to perform, but this is unlikely to reveal much more than you would find by yourself (big quiz teams don't always win first prize!), so what else can you do?
You can write unit tests as you go along. You can test functions using test cases that give you very good overall coverage of the code. You can spend more time on functions that contain logic rather than functions that simply call a string of other functions sequentially. You can keep the test cases as a reminder for when you are testing newer code. You can share tests with other developers so that, over time, you gradually increase the types of error condition you test for. It is an underused mechanism for reliable code.
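For a flavour of it, here is a rough sketch of the triangle function with the checks I originally missed built in (just my illustration, not a model answer):

using System;

class Triangles
{
    // Classify three side lengths, validating the inputs first.
    static string Classify(string a, string b, string c)
    {
        decimal x, y, z;
        if (!decimal.TryParse(a, out x) || !decimal.TryParse(b, out y) || !decimal.TryParse(c, out z))
            return "Error: input is not a valid number";     // catches 23..56, 23E04 etc
        if (x <= 0 || y <= 0 || z <= 0)
            return "Error: sides must be positive";
        if (x + y <= z || y + z <= x || x + z <= y)
            return "Error: sides cannot form a triangle";    // includes the straight-line case
        if (x == y && y == z) return "Equilateral";
        if (x == y || y == z || x == z) return "Isosceles";
        return "Scalene";
    }

    static void Main()
    {
        Console.WriteLine(Classify("3", "4", "5"));       // Scalene
        Console.WriteLine(Classify("1", "2", "10"));      // Error: cannot close
        Console.WriteLine(Classify("23..56", "2", "2"));  // Error: not a number
    }
}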
You can develop shared controls. For instance, you could design a web control that only allows valid numbers to be typed in, so that your functions are never passed malformed data and don't fall over. You can even create simple classes, like string wrappers, that test data early, before it has a chance to upset the belly of your code.
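A trivial sketch of such a control might look like this (NumericTextBox is a made-up name, not a framework class):

using System.Web.UI.WebControls;

// Hypothetical shared control: a TextBox whose Value property only ever yields
// a well-formed number (or null), so downstream code never sees malformed data.
public class NumericTextBox : TextBox
{
    public decimal? Value
    {
        get
        {
            decimal result;
            if (decimal.TryParse(Text, out result))
                return result;
            return null;
        }
    }
}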
At the end of the day, the issues need to be analysed and something needs to be put in place to keep the closed-loop quality control doing its job properly. You need to learn whether your counter-measures are working, and either modify them or drop them if they are not. You need to listen to other people and what they use. You need to take your craft seriously.

Thursday, 12 March 2009

INNER JOINS, OUTER JOINS and CROSS JOINS

SQL joins: they seem easy to begin with, but as soon as you think you understand them, something happens that leaves you scratching your head. This is a brief but hopefully very useful introduction to the five main joins.
We will begin by assuming we have a database with two tables, used to track people borrowing books from a library. We will ignore all the irrelevant columns we might actually have and keep it simple:


Table 1 (Members)
Name: varchar(250)
ID: int (identity, autoinc, primary key)


Table 2 (Bookings)
Book: int (foreign key to books table)
Member: int (foreign key to members table)
DateOut: DateTime


So when a user takes a book out, the book's ID number, the member's ID number and the date are put into the Bookings table. Now we want to use the data in the tables. Suppose we want to know who has which books checked out; we need information from more than one table (we will forget the book name for now and assume we only want the date, the book ID number and the person's name). We need to join them together.
What are we actually doing? We are relating information from the rows of one table to the rows of another. In some cases there will be one row in each that matches, but in most cases, as here, there can be many. For instance, a single person can borrow, say, 5 books, so 1 row in Members might match 5 rows in Bookings. It is also worth noting that a row in one table might match zero rows in the other table (somebody with no books on loan), which we will come back to later.

INNER JOIN


An inner join simply says that a row will only be returned from our select statement if there is at least one matching entry in both tables. For each pair of rows that match on the join criteria (the columns we compare), a row is returned in the results. If a row in one table cannot be matched to a row in the other, it is not returned.
select Name, Book, DateOut from Members INNER JOIN Bookings on Members.ID = Bookings.Member

If John has a book taken out (and therefore e.g. an entry for book 11 on 20th Jan), the previous select would include: John, 11, 20th Jan. If however there was a member called Luke who had no books on loan, his name would NOT show in the results because his row in members would not JOIN any rows in the bookings table.

It is very common that we actually want to list everybody in the system, whether they have any books on loan or not. We simply use...

LEFT OUTER JOIN


A left outer join says that we will return rows from the left-hand table even if there are no matching entries in the right-hand table. If we run exactly the same select as before but with a LEFT OUTER JOIN:
select Name, Book, DateOut from Members LEFT OUTER JOIN Bookings on Members.ID = Bookings.Member

We would get "Luke, NULL, NULL" as well as "John, 11, 20th Jan" in the results. The NULLs mean there is nothing there: not zero, not blank, nothing, i.e. NULL. We can use the coalesce function if we would rather return something useful instead of NULL (note the values need converting to strings first, otherwise the '' would be cast to the column's own type):
select Name, coalesce(convert(varchar(20), Book), ''), coalesce(convert(varchar(20), DateOut), '') from Members etc..

RIGHT OUTER JOIN


Don't bother. It is the same as the LEFT flavour but the other way round and to be honest, it is usually better to rewrite your select so it effectively reads from left to right and only uses left outer joins.

FULL OUTER JOIN


A full outer join returns all rows from both sides whether or not there are matching rows, and where they do match they are returned together. It is not generally considered good practice: are two tables even related if a row can exist on either side with no counterpart in the other?

CROSS JOIN


Sometimes, it is not possible to join tables based on any relationship. For instance, suppose we want to work out the prices to replace certain types of books for a particular person. There is a table of 10 book categories with base prices, but each has to be multiplied by a certain value that exists in the Members table (perhaps how rich they are!!). You have nothing to join on.
Table 1 (Members)
Name: varchar(250)
ID: int (identity, autoinc, primary key)
PriceFudgeFactor: decimal(9,5)


Table 2 (BookReplacements)
BookType: int (foreign key to book categories table)
BasePrice: Decimal(19,5)


What do you do? Well, a CROSS JOIN matches each row in the left-hand table to every row in the right-hand table, with no linked columns. In our case the result would be a list where the number of rows equals the number of members times 10 (the number of categories). What you would probably then do, using a sub-query, is the calculation using the base price from one side and the factor from the other:

SELECT Name, PriceFudgeFactor * BasePrice as TotalPrice FROM
(
SELECT Name, PriceFudgeFactor, BasePrice FROM Members CROSS JOIN BookReplacements
) AS Prices

Note there is no "ON" clause after the join since it matches all rows with all other rows (the derived table also needs an alias, Prices here). For what it is worth, I have used INNER and LEFT OUTER joins frequently, CROSS JOINs occasionally and never RIGHT or FULL OUTER joins, so that should give you some idea of which ones you are likely to need.

Monday, 2 February 2009

SIAM Carlisle Hints

If you have found this post, you know that SIAM make some DOS-looking but very addictive railway traffic simulators. I first bought Barnstaple, which was very enjoyable, and then I was looking at modelling Reading so I bought that too. I then decided to buy a hard one that would keep me entertained for longer, so I plumped for Carlisle, a severely difficult simulation that, it would be fair to say, requires at least 10 attempts before you are familiar enough with it to avoid losing many points. Unfortunately, it is harder to gain points than to lose them and a single late train can slice your points in half, so now I have completed it, let me give you some beginner's advice:

  1. There are turntables, water and coal on screens 0 (turntable on 2) and 1. Use them for locos that arrive at Petteril Bridge and London Road respectively. Although watering and coaling are slow on screen 0, there are few scenarios where you will not have time to complete them.

  2. Start with Monday, because it is by far the easiest to start with. There are few trains initially, and the ones already on the simulation are either already coupled or have their locos to hand in the station.

  3. It will take a while to get used to the screens because there are so many but most action takes place along the LMS and Scottish lines on screens 3, 5, 6, 7 and 9. Screen 4 is dormant for the first few hours, screens 0, 1 and 2 tend to be 95% freight, the passenger trains passing through.

  4. Try and spend your spare time in the early hours turning all 12A locos to face south (only occasionally does a freight loco point north as it goes round Rome to Currock yard) and I put all the Black 5s together in the running sheds in Upperby depot to make it take less time to pull a specific loco from the end of perhaps 5 other locos. All visiting locos and Scots, Patriots and Coronations I put in the Roundhouse. Water and coal all locos - you don't want to be rushing these things 20 minutes before a train is due. Many trains change locos at Citadel and it is easy to get caught with an imminent departure lacking the correct loco.

  5. Likewise, in Kingmoor depot: apart from perhaps one Jubilee, one 8F and one Crab, turn all the other Jubilees, 8Fs, Crabs and Black 5s for 68A, and all Scottish locos (65/66 etc), to face north. Occasionally a large 68A loco goes south from Durranhill. Leave the 4Fs alone since most will go south from Durranhill, but you might want one or two for piloting from the Down Yard northbound.

  6. Turn all locos at Durranhill to face south. There is sometimes a 68A Jubilee there, so quickly check whether it is needed on an early service from Petteril Bridge to the north; otherwise turn it as well. It will save some space to do the locos in order and to turn and couple up any where the trains are complete ("wagons for.." rather than "part of.."). The locos at Durranhill are mostly all gone by about 6am until the evening. Be careful, since several 68A locos need to get from Kingmoor depot to Durranhill and can take 10 minutes or longer to get there, plus turning time if required.

  7. The 7:20 to Newcastle stock is parked in Durranhill. Don't wait for the timetable before moving it to the station, since it is a bit of a hassle to pull it out and then couple up to the rear to take it there.

  8. Remove all incoming locos at Petteril Bridge and replace them with tank locos for the journeys across Carlisle unless the loco is scheduled to take the train beyond Carlisle. This gives extra time to water/turn/coal them and also avoids too many light-engine trips across Rome junction.

  9. If you get goods arriving at Petteril Bridge for the goods yard, put them into DL5 or 6 and bring a loco from the shed on 0. It saves having to play around with the stock to pull and then push it into the shed.

  10. Try and keep at least 3 tank engines on screens 1 and 2 for the trip freights that arrive. At certain times, particularly at London Road, 4 or more services might arrive at the same time, and if they are late (as is common with the freight) you will not have time to find locos and move them there. There is also an occasional forward working from London Road to the north, which requires either moving the correct loco there from Kingmoor or taking the train with a tank to Kingmoor Down yard and changing there. If you can, it is easier to put the correct loco on in the first place and avoid the hassle of changing it.

  11. Because it takes a long time to get from Upperby (screen 3) to Lancaster, when you have a light loco, it is best to let it leave around 15 minutes before booked time which will mean it gets to the boundary (and you get your points) at the right time.

  12. Upperby goods depot can get busy and some arriving freights stop there to detach. Often the incoming loco can be used to move goods to Crown street and you can couple a tank loco for the trip working (usually to Viaduct yard).

  13. Be careful to remember to collect goods from the cattle depot and Crown Street: if you use the yellow trains as an indicator of what needs coupling up, you might not realise that a portion is already there somewhere.

  14. Try and leave 3RD empty for arrivals from the carriage sidings, since moving trains from the yard to 3RD can take a long time and cause the yardmaster at Upperby to get angry!

  15. Watering at Currock (screen 4) takes ages. Don't bother. Run the loco to Upperby and do it there. It can be fiddly here because many services go north as well as south and this can involve many moves. I try and keep a handful of tank locos there both to take trip freights and also to help move things around.

  16. The station generally is a real bottleneck, so do not leave things in platforms longer than necessary. Remember that only platforms 1 and 3 have north-end watering columns, and only 3 and 4 south-end ones. Move post to the TS sidings, otherwise it will block a platform for an hour or so. If a service leaves more than an hour after arriving, or the station is busy, move it with the incoming loco to a siding. Try and move trains that originate in Carlisle to a bay platform to leave the through-platforms clear.

  17. The quickest way to change southbound locos is to move the new loco to S6 from SSB, SSC or P4, using the Newcastle line as the headshunt. Bring the southbound train into P3, then uncouple the old loco and move it to S5. As soon as it is clear of the scissors crossing, which is very soon, move the new loco onto P3 and couple it, then send the old loco back via P1 or a siding to Kingmoor (since it is usually a Scottish or Kingmoor loco).

  18. Screen 7 has some interesting moves between the depot and the main and freight lines. Be careful since the routes taken might not be what you think and you can lose points for conflicting routes. You can move locos and trains into E2 to reverse into the down yard from the upyard. This is good for instance for the 1:55 to Canal yard, part of which is in the down yard. You can then pull out southbound onto the southbound freight line. You must go to Canal yard from the up/down yards via Rome Street where you uncouple and runround.

  19. There are lots of trip workings from the up yard, so again make sure you keep about 4 tank locos to hand (3F and J39); generally you can take the ones that arrive at the down yard from elsewhere in Carlisle and reverse them on the main line into the up yard. You can fit two locos in a siding. You will not need many pilot locos from the Down yard, so you don't need to keep any there.

  20. Be very careful of the time it takes slow freights to travel from KDS to RDH (and from RSH to KUH) on screen 9. Even if you are putting trains in the freight loop at Floriston, it can take 10 minutes to cover that section and it is easy to delay the passenger trains (and lose points).

  21. On screen 8, do NOT send freight services to Silloth until their scheduled time since it takes a long time for them to clear the single line to Silloth and if you send them in front of a passenger service - even by 30 minutes - you can delay the passenger train severely and lose many points.

  22. There are some tricky shunting operations in Canal yard so try and keep the arrival sidings CA1 to 4 and CY6 and 7 empty because careful use of these will make what could be a nightmare, reasonably hassle free. For instance, the 6:00, 6:06 and 6:16 trip services are made up of 2 portions so if you couple up locos to the first part, when the second parts all arrive on the same service, you can detach and pull forwards and then back onto the portion with the 6:00 and then repeat for the others. Also, move any departures out of the goods shed and to departure (or arrival) dead-end sidings to keep them clear of other goods.

  23. Don't send the 9:45 from London road to Dalston Road or the 9:45 to Number 12 (Rome Street) until about the right time. These services will block the line in both directions between Canal Yard and Rome Street and then they turn round and go back. This is OK, just make sure there are no other services that need to use the line before them.

  24. Before sending a freight away, particularly from screens 0 (SC), 3(LC) and 9(CR/GSW), make sure that there are no imminent passenger trains since these services can take some time to clear the area. Services via NB, TV and MC clear quite quickly and are less busy generally.

  25. When sending the 9:10 and 9:15 light-locos to the north, put one in FSH and one in FDH at about 8:30, after the previous services have gone north. The next departure will be the 9:00 to Glasgow from Citadel. At about 9:01 (I get problems if I send locos exactly 10 minutes beforehand), send the one in FDH to CR and then, if you remember in time, send the FSH one at 9:06, which is just enough time to keep out of the way of the 9:00. If you are a bit late, let the 9:00 to Glasgow pass and then send the 9:15 light-loco.

  26. Probably the hardest move is the 3:11 and 3:16 postals. The 3:11 arrives, usually late, and needs watering as well as attaching another portion. The incoming service detaches part of the 3:16, which needs to meet a portion already in the station. The loco for the 3:11 remains coupled and the loco for the 3:16 has been detached from an earlier service. They leave very close together in time, and often by the time you have watered the incoming service, it then delays the 3:16. This is what you should do: put the 3:11 section from Citadel into P1. When the 3:11 arrives, put it into P3 but DO NOT water it yet. When it divides, move it to P1 for watering, at which point couple the Citadel section of the 3:16 onto the detached portion in P3. This will couple up and be ready fairly quickly. The postals can leave up to 5 minutes early but the 3:16 is slower, so if the 3:16 is ready by about 3:11 and the 3:11 has only just started watering, let the 3:16 depart early, but depending on how close it is, move it into the loop at Floriston since the 3:11 will probably catch up. If you think the 3:11 will be ready by about 3:11 or soon afterwards, do not let the 3:16 depart early; the 3:16 should be fine as long as the 3:11 leaves about 3 minutes earlier. Remember that if the 3:11 was late you can let it leave late, but the 3:16 originates at Carlisle and is expected to leave on time.