Thursday, 28 May 2009

Viewstate, postbacks and pains in the neck

It took me a long time to get my head round what was going on. Sometimes controls did not get their correct values and things didn't always get bound. Here is a simple guide that will hopefully help you understand what is happening.
1) The web is not connection based by default. When you go to a web page, you request a page, you are given it (usually) and that is the end. Things like sessions have been cobbled on top of this system to try to keep track of things, but the server never knows for sure whether you are still connected, which causes all kinds of problems with cookies, secure sites, logging out and so on. Remembering this connectionless pattern will help you understand the rest.
2) What happens when you FIRST request an aspx page? The server recognises that the request is an HTTP "GET" request, so it sets the IsPostBack property to false. It calls various event handlers in a prescribed order (you can find these on MSDN), most importantly Page_Load, which you would use to set the page up. You probably managed this.
3) What happens when you change something on the page after it is served? This depends on which control you change. By default, buttons cause a postback but data fields like text boxes and radio buttons do NOT, unless you set their AutoPostBack attribute to true.
4) When you finally cause the postback, a POST request is passed to the server along with the values of any fields on the page (which is why you need to put all the aspx controls inside a form). The server sees that it is a POST and assumes this is a "post back". It unpacks all the form values into member variables and it also uses something called viewstate to remember any data that is not the current value of a web control. For instance, a text box does not need viewstate for its text, because the text is already passed in the form data with the post. Controls like TreeViews, however, want to remember whether they were expanded and so on, and because this is not part of their 'current value' it has to be specially stored in viewstate. The viewstate is a hidden field containing a long Base64-encoded string which the system automatically packs and unpacks for you (although you can disable the viewstate to save bandwidth, or customise it).
5) If you write a custom control then you might have to save data into the viewstate so that it is 'remembered' between posts (but only if it needs remembering, otherwise you are wasting bandwidth). You do this with the ViewState property, which is a dictionary of objects keyed by name.
6) The area that confuses most people is working with a data source when a postback needs to update something in a database and then change the screen somehow. Firstly, you need to know that Page_Load is called BEFORE your event handler. If you need to access data in the event handler, it must be set up first in Page_Load. Secondly, you will need to change your screen data in code, otherwise it will remember its last values.
7) There are times when you do not want to update from your data source during postbacks, for two reasons. Firstly, because controls remember their values, there is no point in repeatedly updating them once they are set up if they are not going to change while the page is in use. Secondly, you might want a load of controls to come up as disabled when you first open the page and then become enabled when you click a button. However, if you disable them in Page_Load, they will be disabled again every time you cause a postback (such as clicking the button to unlock them!). Use the IsPostBack property so that Page_Load only does these things the first time.
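The ordering in points 6 and 7 can be sketched with a toy model. This is plain Python written only for illustration, not ASP.NET code: handle_request, run_session, the "unlock" and "save" events and the inputs_enabled flag are all hypothetical names, and the saved_state dictionary merely stands in for whatever the controls remember between posts.

```python
def handle_request(method, saved_state, event=None, check_is_post_back=True):
    """Toy model of one request; the names are hypothetical, not ASP.NET APIs."""
    state = dict(saved_state)          # controls get their remembered values back
    is_post_back = (method == "POST")  # GET = first visit, POST = postback

    # "Page_Load" step: always runs before any event handler.
    if not (check_is_post_back and is_post_back):
        state["inputs_enabled"] = False  # first-time setup: start locked

    # Event handler step: runs after Page_Load, and only on a postback.
    if is_post_back and event == "unlock":
        state["inputs_enabled"] = True
    return state


def run_session(check_is_post_back):
    """First visit, click 'unlock', then cause one more postback ('save')."""
    state = handle_request("GET", {}, check_is_post_back=check_is_post_back)
    state = handle_request("POST", state, "unlock", check_is_post_back)
    state = handle_request("POST", state, "save", check_is_post_back)
    return state["inputs_enabled"]
```

In this model run_session(True) leaves the inputs enabled after the final postback, whereas run_session(False) locks them again because the setup code runs on every request.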

Friday, 22 May 2009

Mixing Forms and Windows Authentication

My friend works on a web application at work and uses Windows authentication to access the various pages, with database-backed roles to provide authorisation. He then asked me how to further lock down certain parts of it to require a password. This would be so that certain orders could only be viewed by people who knew the password, even if they were generally allowed access to the page.
Of course, the general authentication schemes are designed to work on a per-page basis, and rolling your own flavour seemed quite involved, as did using something like ESAPI from OWASP. He didn't want to spend weeks developing it, so I had a play around.
I initially investigated mixing Windows and forms authentication so that for the most part you would use Windows authentication but, for a more protected page, you would be redirected to a login. I thought this would be best because a lot of authentication functionality is already built in to the ASP libraries. However, I shortly realised my mistake. The problem is that authentication is the first thing to occur when the page request is made. At this point, you do not know whether the page is one that needs a password or not, so you have to allow it to authenticate. If you then read the database and find that you need a login, you can force a redirect to the login page, but it gets really messy: you have to keep track of authentication state so that the user is allowed in for the first part, is then blocked and, once a correct password is entered, is allowed in until the browser is closed. After many hours, I opted for a simple redirect to a login page, which gets passed the return URL in the querystring, and a logout button in the order that flushes the cache and then goes to a neutral page instead. Simple is good.

Tuesday, 19 May 2009

Building Test Scripts

How do you know your code is reliable? How do you know it can handle all situations in a known and/or designed way? For a simple test application, you might be happy for exceptions to be thrown which stop the application from running and which allow you to identify what went wrong, but what about serious business or mission-critical applications? It is not enough to hope they are OK. Do you think you are a good programmer? You cannot rely on that to make your code robust because you simply cannot guarantee that you will not miss something. Another pair of eyes is helpful but again cannot provide enough reliability.
One of my current favourite ideas is to build unit tests in software that instantiate your objects and call functions on them to test the expected results. This might sound trivial or ineffective, but it is surprising how effective they are, since they make you think about what the function should do. You will refactor if you cannot think of a simple test, because you won't want to write a 500-line unit test function. The trick is to consider what should and could happen and how the system should cope.
A classic scenario is data input fields. Your user needs to type, for example, a price for an item which will be saved and then used. How many times should this data be checked in the system? You should assume that all external data is potentially tainted, so anything coming into the system from the user OR the database should be checked and dealt with.
In the simplest case, you might trust the user to type the correct thing in with no real checking. They accidentally type -100 instead of 100. What happens now? In most systems, the data would be strictly valid (but incorrect) and would break every calculation that uses it. Things like total = cost x quantity will compute without crashing but will generate an incorrect value. OK, you get clever and ensure the user can only type 0-9 and a '.' character. So what happens now when they type 100.12.23? Your system might not notice at first, but this time something will crash (most likely when the number is parsed into a numeric data type). You should actually use regular expressions for most validation so that you can be very specific about valid data, and you can give the regex a grilling with a unit test to make sure it allows all valid numbers and disallows anything incorrect.
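As a minimal sketch of that idea (in Python for brevity, though the same pattern applies in any language with a regex library), the following assumes a valid price is a non-negative number with at most two decimal places. That rule, PRICE_PATTERN and parse_price are my assumptions for illustration, not a rule from any real system.

```python
import re
from decimal import Decimal

# Assumed rule: one or more digits, optionally followed by '.' and 1-2 digits.
PRICE_PATTERN = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def parse_price(text):
    """Return the price as a Decimal, or raise ValueError if the text is invalid."""
    # fullmatch requires the WHOLE string to match, so "100.12.23" is rejected.
    if not PRICE_PATTERN.fullmatch(text):
        raise ValueError(f"not a valid price: {text!r}")
    return Decimal(text)
```

A unit test can then give the pattern its grilling: "100" and "100.12" should parse, while "-100", "100.12.23", "100." and the empty string should all raise ValueError.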
Here are some other things you might need to remember:
1) Number ranges: Are you allowed negative numbers? Numbers with decimal places? Do you need to retain leading zeros? Are the values re-displayed (if applicable) in the same format they were typed in? If they are currency, what happens if someone types the currency symbol into the input field? Do you allow commas/periods to separate numbers into thousands? Do your numbers need to consider the user locale and display differently? Will they need rounding (especially if you have divided them by something)? If so, when are they rounded? What happens if you need to compare these rounded figures with other figures? Will they equate to each other or will you need to check merely that they are within e.g. 0.001 of each other? What happens if the user types a massive number that is too large for your number type? Do you restrict this in some way?
2) Strings: What characters are allowed? Will you encode or remove illegal characters or tell the user that they are not allowed? How long can the string be? Do you know what the database will permit? How are you going to ensure that a user will not type in a string that is too long for the database? Do you catch the potential database errors that might result? How do you avoid people injecting SQL into user input? Do you need to upper case or lower case anything? Do you need to spell check anything?
3) Aggregates: There are plenty of chances for error in situations where you are summing a number of items or performing some other calculation. Do you know that each item that should be part of the calculation *is* part of it? If you are updating a page, does it always update for every item that can be changed by the user? Does the sum need rounding? Should the numbers be rounded before or after they are added? Are there any round-trip issues where a number is perhaps multiplied, rounded and then changed when attempting to back-calculate the unit price?
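The rounding questions in points 1 and 3 are easy to demonstrate. The sketch below uses Python's decimal module with made-up figures and assumes half-up rounding to two decimal places; round_money is a hypothetical helper, and your own system may need a different rounding rule.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_money(value):
    """Round a Decimal to 2 places, half-up (an assumed rounding rule)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Round-trip problem: divide a total, round, then multiply back.
total = Decimal("10.00")
unit_price = round_money(total / 3)             # 3.33
back_calculated = round_money(unit_price * 3)   # 9.99, not 10.00

# Rounding before adding and rounding after adding can disagree.
prices = [Decimal("0.335")] * 3
rounded_then_summed = sum(round_money(p) for p in prices)   # 0.34 * 3 = 1.02
summed_then_rounded = round_money(sum(prices))              # 1.005 -> 1.01

# Binary floating point values often need a tolerance instead of ==.
exactly_equal = (0.1 + 0.2 == 0.3)                           # False
close_enough = math.isclose(0.1 + 0.2, 0.3, abs_tol=0.001)   # True
```

Each of these pairs is worth its own unit test, since the "right" answer is a business decision rather than something the code can work out for itself.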

There are many things to think about so take your time and let your managers know that there is always a trade-off between quality and the amount of time given for design and testing.

Tuesday, 12 May 2009

Structure for Reliability

I got stung again today by a function of mine which was a one-liner of logic but for which I had not considered a particular scenario. I hate these because they should be really easy to get right, but no doubt most of our code is littered with them. We can write unit tests, which can seem a bit extreme for every single function we write (or is it?), but even those do not guarantee that we consider all the implications and permutations of the function. The function basically did something like
return (Number == 0 && Locked) || (Orders[Number].IsComplete);

Because the number can be either related to a quote or an order we need to check the quote being locked or the order being complete. The subtlety I had missed is what happens when you pass it a number greater than 0 which is not actually an order and what happens if you pass in 0 but the quote is not locked. In both cases, the number is passed to the indexer for orders which then throws an exception. This can be fine but how can you ensure the potential exception is considered and caught or make sure the function is not called for invalid values?
You can resolve most of the issues with the following:

  1. Make the function as private as you can; this way only the class it lives in, or possibly its subclasses, has to be concerned with its correct use. Don't get in the habit of making all functions public just to avoid thinking about it.

  2. Consider whether the function should contain a guard that only runs the logic if a precondition is true, such as the order existing. This could either wrap the logic or be part of the logic itself.

  3. Check the logic carefully. In the above example, I should have checked that Number is greater than zero in the second set of brackets, since orders start from 1 and, if the quote is not locked, the second clause is evaluated for 0, which then throws.

  4. Ask whether the functions can be pushed further down into the system or refactored to hide them from logic problems. For instance, instead of passing in an int and making the function work out what it represents, pass in an object of some interface or super class that you can then call a function on. This way you have no knowledge of the inner workings or logic of the numbering scheme and no chance to muck it up!
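Points 2 and 3 can be sketched as follows. This is an illustrative Python rewrite, not the original code: OrderBook, is_closed and the dictionary of orders are hypothetical names, and raising ValueError for an unknown order number is just one possible design choice (returning False would be another).

```python
class OrderBook:
    """Number 0 means the quote; numbers from 1 upwards are orders."""

    def __init__(self, quote_locked, completed_by_order_number):
        self._quote_locked = quote_locked
        # Maps order number -> whether that order is complete.
        self._orders = dict(completed_by_order_number)

    def is_closed(self, number):
        # Handle the quote on its own, so 0 never reaches the order lookup.
        if number == 0:
            return self._quote_locked
        # Guard: fail with a clear message rather than an accidental lookup error.
        if number not in self._orders:
            raise ValueError(f"no order numbered {number}")
        return self._orders[number]
```

With this shape, passing 0 for an unlocked quote simply returns False, and an unknown order number fails in a deliberate, documented way that callers can test for.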

Wednesday, 6 May 2009

Should object A reference B or B reference A?

A common problem in software design is when you have two entities, one of which describes general information about the other and is shared between sub-objects. For instance, a motorbike and the motorbikeinfo. A motorbike might have a registration plate number and colour, whereas a motorbikeinfo might have manufacturer, engine capacity and top speed. Should a motorbikeinfo 'have' a number of motorbikes? Should a motorbike 'have' a motorbikeinfo?
There are problems with both approaches. If you take the first, how can you display a list of motorbikes with their info as well? You obtain the list of motorbikes and then have to get the information from an object that is not referenced by the item. The problem with the second approach is that you then share references to an info item between different motorbikes, so what happens if you delete all the motorbikes? The info should still exist, but where? You then get into a whole area of hassle with weak references and sharing references all over the place, or even the dreaded circular reference, which can cause memory leaks.
There is another solution: the pattern that sounded the least useful in the Gang of Four patterns book, the "Facade". With this pattern, we hide the structure of objects behind another object.
With a facade, we can create a class called e.g. MotorbikeItem which has a reference to a motorbike and a reference to a motorbikeinfo. It can provide access to all items in both of these as required, and it allows the consumer of the object to see a single entity. It allows the data classes to be restructured but, most importantly, it avoids all the tangled references. The facade can obtain the individual references from anywhere, such as collections or databases, and it can handle managerial things like deleting the motorbikeinfo if no motorbikes exist any more. Have fun.
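A minimal sketch of such a facade, in Python for brevity. The field names, the model lookup key and the build_items helper are assumptions made for illustration; neither data class holds a reference to the other.

```python
from dataclasses import dataclass


@dataclass
class Motorbike:
    registration: str
    colour: str
    model: str  # hypothetical key used to look up the shared info


@dataclass
class MotorbikeInfo:
    manufacturer: str
    engine_capacity_cc: int
    top_speed_mph: int


class MotorbikeItem:
    """Facade: presents one bike plus its shared info as a single entity."""

    def __init__(self, bike, info):
        self._bike = bike
        self._info = info

    @property
    def registration(self):
        return self._bike.registration

    @property
    def colour(self):
        return self._bike.colour

    @property
    def manufacturer(self):
        return self._info.manufacturer

    @property
    def top_speed_mph(self):
        return self._info.top_speed_mph


def build_items(bikes, info_by_model):
    """Pair each bike with its info; the lookup could equally be a database."""
    return [MotorbikeItem(bike, info_by_model[bike.model]) for bike in bikes]
```

A list screen can then work with MotorbikeItem objects alone, and the way the two underlying classes are stored or related can change without the consumer noticing.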

Friday, 1 May 2009

HtmlTextWriter breaks lines

Not exactly sure if this is my fault, but I was seeing really strange generated HTML, broken across lines even in the middle of attribute values. The code was generated by a custom control using an HtmlTextWriter to write a large <ul> list. There was nothing apparently wrong with the code generation but, for some reason, the HtmlTextWriter was breaking the text up into 1024-character lines and inserting line breaks wherever that happened to fall in the string, and I'm sure it was breaking things despite IE's forgiving rendering engine. You would at least expect the writer to recognise tags and break there, but oh no.
In the end I made sure there were line breaks in the string already before writing to the HtmlTextWriter and it seemed happy with that.

Rubbish IE expression bug

Internet Explorer has some non-standard CSS extensions that you can use to do things that you can't do in CSS (at least not in their rubbish implementation of it). For instance, you can use
height : expression((mydiv.height > 54)? '54px' : 'auto');
which all sounds fun and useful, but beware if your element does not have a height set on it. What can happen is that when the page loads, the browser asks mydiv what its height is and the element says, "I don't know, it isn't set, ask my parent". The parent is then asked what its height is and it says, "I don't know, it isn't set, ask the children", and so on ad infinitum. This sort of circular reference should be checked for but it isn't, and it hangs IE7 and IE8. Theoretically, you can set the height to 0 on the element in question (if it isn't already set) and the problem goes away, but I haven't tested it.
This took me HOURS to track down - grrrrrrr
It was reported years ago and still not fixed. For some reason it only happened in some instances and not others, it also randomly affected some users and not others, some times and not others!!!