Tuesday, 20 August 2019

The problem with being a code-driven organisation

Something I have noticed at Microsoft recently is the drive to release tonnes of new features in Azure and elsewhere while the documentation is lacking or lagging, sometimes by many months. This is not acceptable from a business point of view, but it is common in code-driven organisations where success is measured by the number of lines of code deployed each day.

The DevOps philosophy hasn't helped, because with all the drive towards automation and release rate, documentation is barely mentioned. Sometimes you get a "things should be self-documenting", but that is evidently never true. Even MSDN documentation produced from code is woefully inadequate without remarks and examples, all of which have to be thought up and added manually.

Microsoft have moved lots of projects to GitHub, including their main documentation, but even raising issues against simple things like changing an explanation slightly or adding a single comment can take weeks or months (in one case, I waited months only to be told that the repo had moved and I had to submit the issue again). All of this evidence points to an organisation (and I am guilty of the same thing) that does not take good documentation seriously.

There is no way that a change to the UI of an Azure blade should be released until the instructions have been checked and updated accordingly, but it is happening all the time. I even found some documents about adding App Insights to JS where the instructions don't actually match the snippet that I have, so I have raised a ticket to try and clarify, but I suspect I will get the usual "we will add it to the list".

If you are a company that sells B2C, you need a serious set of documentation that is easily updated, especially for clarifications and for helping people on support tickets. You need to ensure that features requiring docs changes are marked accordingly and not released (or even started) until the docs are ready.

Now to work out how to do that...

Sunday, 28 July 2019

What I need to know about ransomware

Ransomware is in the news again in South Africa. City Power in Johannesburg have become victims and have lost access to their systems as a result. This is already causing problems for people who cannot purchase pre-paid electricity and is a stark reminder that just because someone runs critical infrastructure does not mean that they do everything properly.

Ransomware is a type of malware or "virus" that encrypts whatever it can on your system and demands a payment to provide the decryption key. Although the key is sometimes provided, firstly you don't know that you will ever get your files back, and secondly, by paying you would be encouraging the criminals to continue their work.

Firstly, you should avoid the ransomware risk in the first place. Be very strict about the use of non-approved USB sticks and the use of work systems for personal purposes, and train people about the risks of malware dressed up as legitimate or entertaining email messages or IM chats.

Secondly, you must always have your systems backed up. If you lost all your systems to ransomware, you should easily be able to recover them. Backups are cheap and relatively easy, so no excuses!

Thirdly, do not use a third party to unlock your files for a fee; they will very possibly pay the attacker a smaller amount and pocket the difference. Any legitimate company will prove they can decrypt the ransomware, very often for free or for very little.

Fourthly, install antivirus software and make sure it stays updated. It can't always catch the very latest malware, but it doesn't take long for the programs to update after a new virus is spotted and added to the dictionary.

Fifthly, your systems absolutely need to be segmented. Why would an office machine have access to other important systems, like a database over a network share? At most, an app would have a database connection, but that wouldn't allow ransomware to do anything. Segmentation is easy and is built into most modern network switches.

Lastly, play out what would happen if your systems were all encrypted by malware. Literally, what would you do and how would you do it? How long would it take? A little effort up front could save you not just embarrassment but possibly the future of your company.

Friday, 19 July 2019

Cannot consume scoped service from singleton

So this is an error message logged in the event viewer from a dotnet core app that I am building as a PoC. The problem is that although I understand the error message, I didn't understand why it was occurring, although I have managed to get rid of it!

Hopefully you understand that when you register services in an IoC container, you set a lifetime that dictates how long a service instance is used before it is disposed. Although you want to create as few instances as possible, having everything as a singleton would mean making everything completely thread-safe and potentially creating a bottleneck in the system. On the other hand, making everything transient would mean you are probably creating far too many objects, and you lose the ability to set a property on an object and know that other objects in the same scope can see the change. Scoped is usually the best balance: you define a lifetime (like that of an HTTP request) within which a single instance is shared.
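As a quick sketch of what the three registrations look like in dotnet core (the service types here are hypothetical, purely for illustration):

    using Microsoft.Extensions.DependencyInjection;

    // Inside Startup.ConfigureServices; the types are made up for the example.
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddSingleton<IClock, SystemClock>();         // one instance for the app's lifetime
        services.AddScoped<IUnitOfWork, UnitOfWork>();        // one instance per scope, e.g. per HTTP request
        services.AddTransient<IEmailBuilder, EmailBuilder>(); // a new instance every time it is resolved
    }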

One of the potential problems is that you cannot inject an object with a shorter lifetime than the current object into its constructor (other than transients, which are sometimes allowed). This is to avoid, for example, a singleton keeping hold of a scoped object, which is probably not what you intended. In dev mode, dotnet throws a dirty big exception and gives you a reasonably useful error message, which might make sense but didn't for me:

Application startup exception: System.InvalidOperationException: Cannot consume scoped service 'System.Collections.Generic.IEnumerable`1[Microsoft.Extensions.Options.IConfigureOptions`1[Microsoft.AspNetCore.Authentication.AuthenticationOptions]]' from singleton 'Microsoft.Extensions.Options.IOptions`1[Microsoft.AspNetCore.Authentication.AuthenticationOptions]'.
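As a hedged aside, here is a minimal reproduction of the kind of registration that produces this sort of message (all type names are hypothetical, not from my app). ASP.NET Core turns on scope validation in the Development environment, and it fires as soon as a singleton tries to consume a scoped service:

    using System;
    using Microsoft.Extensions.DependencyInjection;

    public interface IUnitOfWork { }              // hypothetical scoped dependency
    public class UnitOfWork : IUnitOfWork { }

    public class ReportCache                      // hypothetical singleton
    {
        public ReportCache(IUnitOfWork uow) { }   // captive dependency: scoped inside a singleton
    }

    public static class Demo
    {
        public static void Main()
        {
            var services = new ServiceCollection();
            services.AddScoped<IUnitOfWork, UnitOfWork>();
            services.AddSingleton<ReportCache>();

            // ValidateScopes is what the host switches on for you in Development.
            var provider = services.BuildServiceProvider(
                new ServiceProviderOptions { ValidateScopes = true });

            // Throws InvalidOperationException: Cannot consume scoped service
            // 'IUnitOfWork' from singleton 'ReportCache'.
            provider.GetRequiredService<ReportCache>();
        }
    }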

The reason this was confusing is:
  1. It only happened when I introduced some IoC registrations for types in some of our shared libraries
  2. These libraries worked fine in another (not dotnet core) application
  3. These libraries and my app do not reference AuthenticationOptions anywhere
  4. If I changed the lifetime of the imported types, it broke my MVC routing!
Why the message? One of the gotchas with IoC containers is that some errors cause other, unrelated errors. For example, if the container can't find a constructor on type A that is required for type B, the error might be thrown for type B, which might not make sense unless you read into the details. In my case, the app didn't throw when run from Visual Studio (otherwise I might have seen the exception); instead it logged part of the error in the event viewer and I had to guess the rest.

The solution in my case was to be more specific in my registrations. I was using Scrutor to register everything from an assembly of repositories (which works in the other app), but there is one class that cannot be directly created by the IoC container and which has an extension method to construct it. Because that class was registered properly in the other app but swept up by the blanket scan in mine, at runtime the container presumably tried to resolve some string parameters for its constructor and fell over strangely! By being more specific and only scanning types that are IRepository, it started working again.
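For reference, the narrower scan looked roughly like this (a sketch; UserRepository and IRepository stand in for our actual types):

    using Microsoft.Extensions.DependencyInjection;
    // Scrutor adds the Scan extension method to IServiceCollection.

    services.Scan(scan => scan
        .FromAssemblyOf<UserRepository>()                           // any type in the repositories assembly
        .AddClasses(classes => classes.AssignableTo<IRepository>()) // only repository implementations
        .AsImplementedInterfaces()
        .WithScopedLifetime());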

Wednesday, 17 July 2019

Considerations when starting a web app from scratch

We all get excited (hopefully) about starting from scratch with the latest version of our favourite framework, maybe for a brand new app, maybe for a port or rewrite of an existing one. But if we are not careful, we will rush in, build some poor foundations and end up with the same mess we wanted to leave behind when we decided on a rewrite.

Of the several rewrites I have done, I think the best overall advice is: get a small part of everything working in a Proof of Concept (PoC). This allows you to choose the best patterns, try out some alternatives for each part of the system, wire up the various systems that are often left until later (and which are then a pain to wire up), and look at the result to see whether various maintenance activities will be easy enough to perform.

Below I have listed some of the considerations.

Framework/language

This is often already decided by our expertise. If we write .Net, we'll probably choose that, and I personally think that's fine. .Net Core and Java might be miles faster than PHP in certain online speed tests, but in most cases the combination of code, hosting and expertise will have more impact on the final performance than the pure choice of language or framework.

The trickier issue is whether to replace something like .Net+Razor with .Net+ReactJS, which does give some nice advantages, but the same rules apply to the decision: expertise first, and then try a number of slices of your system and see whether you understand the ReactJS way more easily than the Razor way.

CSS/SASS/etc

If you are not using some kind of CSS compiler then you should be. Hand-cranking CSS is a pain, but there are still decisions to make, most of which will probably be dictated by your choice of UI framework. LESS and SASS/SCSS are the most common, but again, stick with expertise, since the differences are probably not great for most of our applications, although read up on the niche cases and decide for yourself.

Once you have chosen a platform, you will need to decide how to produce the CSS files themselves. You can pre-compile them and put them in the webroot somewhere, you can pre-compile them and send them to a CDN, or you can wire up a CSS pre-compiler so that any call to the resource hits the file system, builds things if needed and returns the correct version. Each of these has advantages and disadvantages, but you should consider how somebody is going to update a CSS file after deployment. Are you happy to do a full deployment to achieve this? If not, you need automatically generated cache-busting strings, usually derived from MD5 hashes of the file, appended to the resource paths.
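A minimal sketch of that hash-based cache-busting idea (the helper is my own invention, not from any framework):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    public static class CacheBuster
    {
        // Turns "/css/site.css" into something like "/css/site.css?v=9a0364b9".
        // A real app would cache the hash rather than re-read the file on every request.
        public static string Versioned(string urlPath, string physicalPath)
        {
            using (var md5 = MD5.Create())
            using (var stream = File.OpenRead(physicalPath))
            {
                var hash = md5.ComputeHash(stream);
                var version = BitConverter.ToString(hash, 0, 4).Replace("-", "").ToLowerInvariant();
                return urlPath + "?v=" + version;
            }
        }
    }

ASP.NET Core's asp-append-version tag helper does something similar out of the box, so check what your framework already offers before rolling your own.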

Bundling

Another no-brainer is bundling resources into fewer HTTP requests. If you have 10 scripts, you can create a single bundled JS file that loads with one request. You will again have to balance the reduction in HTTP requests against the other extreme: a single very large bundle that most pages won't use and which changes every time anything inside any script changes. We used to have a common JS file and then page-specific ones, so two requests was still OK but we didn't bloat pages unnecessarily.

Again, this can be pre-compiled (using gulp, for example) and then put onto the web server or a CDN, or it can be generated dynamically.

Minification

Similar to the above, minifying files is easy and can save quite a lot on transmission sizes. As above, they can be pre-compiled and used locally or on a CDN.

Internationalisation

Unless you know that you will never translate the site, which most of us probably can't say, we need to try out some alternatives, even if we don't bother translating everything now (a good system should be easy enough to use even if there are no translations yet). Ask yourself not just how the app will work but where the work is being done. Where is the data being transmitted to and from, and how will you manage updates to translations? Database translations are dynamically updatable but might cause a performance hit; static ones are faster but might require a full deployment to update. Maybe you don't update text that frequently, maybe you do.
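If you go the static route in dotnet core, the built-in localisation support looks roughly like this (a sketch; the controller and resource key are hypothetical):

    using Microsoft.Extensions.DependencyInjection;
    using Microsoft.Extensions.Localization;

    // In ConfigureServices: point the framework at your .resx files.
    services.AddLocalization(options => options.ResourcesPath = "Resources");

    // Then inject a localizer wherever you need translated strings.
    public class HomeController
    {
        private readonly IStringLocalizer<HomeController> _localizer;

        public HomeController(IStringLocalizer<HomeController> localizer)
        {
            _localizer = localizer;
        }

        public string Greeting()
        {
            // Falls back to the key itself if no translation exists yet,
            // which is why an untranslated app still works.
            return _localizer["WelcomeMessage"];
        }
    }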

I recommend looking at Mozilla's Fluent translation framework, which has some really clever ideas in it, like how to specify that English might only have singular and plural forms while other languages have other multiples.

Caching

Unless you absolutely know that no more than 20 people will ever use your app at once (and sometimes even if it is fewer than that), caching is critical for moving work into RAM, where it is fast, and away from the database, which is expensive and hard to scale. Getting the pattern right, however, is not always easy. You want the code to read fluently and not have random places where caches are used. You should also be careful about assuming that just because a caching interface is built in, it is exactly what you want or need (yes .Net, I'm looking at you!).
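A sketch of the cache-aside pattern using the built-in IMemoryCache (the product types are hypothetical); wrapping the cache in one service keeps the calling code fluent instead of scattering cache checks everywhere:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Caching.Memory;

    public class Product { }
    public interface IProductRepository { Task<Product> LoadAsync(int id); }

    public class CachedProductService
    {
        private readonly IMemoryCache _cache;
        private readonly IProductRepository _repository;

        public CachedProductService(IMemoryCache cache, IProductRepository repository)
        {
            _cache = cache;
            _repository = repository;
        }

        public Task<Product> GetAsync(int id)
        {
            // Serve from RAM when we can; only fall through to the database on a miss.
            return _cache.GetOrCreateAsync("product:" + id, entry =>
            {
                entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
                return _repository.LoadAsync(id);
            });
        }
    }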

Authentication

Most frameworks have authentication functionality built in, but it is not all great and the configuration is not always easy. As well as basic form authentication that uses username/password, think about OAuth2/OpenID Connect for single sign-on, 2-factor functionality for security, and a system that allows modern and correct storage of password hashes (not MD5 - I will cut you!).
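As a hedged sketch, wiring OpenID Connect single sign-on on top of cookie authentication in dotnet core looks something like this (the authority and client values are placeholders):

    using Microsoft.AspNetCore.Authentication.Cookies;
    using Microsoft.AspNetCore.Authentication.OpenIdConnect;
    using Microsoft.Extensions.DependencyInjection;

    services.AddAuthentication(options =>
    {
        options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
        options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme;
    })
    .AddCookie()
    .AddOpenIdConnect(options =>
    {
        options.Authority = "https://login.example.com"; // placeholder identity provider
        options.ClientId = "my-client-id";               // placeholder
        options.ClientSecret = "my-client-secret";       // placeholder, keep out of source control
        options.ResponseType = "code";
        options.SaveTokens = true;
    });

For local passwords, ASP.NET Core Identity's default hasher (PBKDF2) is a sane modern choice, so there is no need to roll your own.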

The basic HTML layout

Although we theoretically separate content from presentation, the reality is that each UI framework requires a certain set of elements in order to work. Most dialogs are not a single div but contain nested divs for the title, body and footer. Since we might need to change these from what we started with, we need to keep the markup in re-usable units so that if we have to add another nested div for a new framework, firstly we only change it in one place, and secondly, we don't break everything in the process.

We also need to liberally apply classes or ids to decorate sets of elements so that if we need to pick out, for example, menu buttons, we can do so easily in CSS rather than hunting through files! Following the lead of the frameworks is easy enough: look at how Bootstrap (or another framework) decorates the elements in a menu so they are nicely selectable in the style sheet.

Horizontal scaling

If you are not building rubbish, it's almost certain that you will need to scale your web servers horizontally. You need to think about sharing session content using a database or another shared system, even if you use sticky sessions. You also need to avoid using the local file system for anything that needs storing permanently. Use cloud storage instead, which can be shared between all your servers and, more importantly, doesn't get lost when a server dies (and they do die).
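A sketch of moving session state out of process so any server can handle any request (the Redis connection string is a placeholder):

    using System;
    using Microsoft.Extensions.DependencyInjection;

    // Back the distributed cache (and therefore session) with Redis so it is
    // shared between servers and survives any single server dying.
    services.AddStackExchangeRedisCache(options =>
    {
        options.Configuration = "redis.internal:6379"; // placeholder
    });

    services.AddSession(options =>
    {
        options.IdleTimeout = TimeSpan.FromMinutes(20);
        options.Cookie.HttpOnly = true;
    });

    // And in Configure(): app.UseSession();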

Automation and Devops

You should also design your app to deploy in one step, ideally in a cookie-cutter way. To get true elastic scaling, you do not want to carry out 100 deployment steps, and you especially do not want to do anything manual other than changing the number of replicas you need. You need to think about how appsettings will get injected, either by the deployment tool or by the system itself.
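A sketch of layering configuration so that the same build deploys anywhere and the environment injects the differences (the variable prefix is my own choice):

    using System;
    using Microsoft.Extensions.Configuration;

    var environmentName = Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT") ?? "Production";

    // Later sources override earlier ones, so environment variables set by the
    // deployment tool win over the values baked into appsettings.json.
    var configuration = new ConfigurationBuilder()
        .AddJsonFile("appsettings.json", optional: false)
        .AddJsonFile("appsettings." + environmentName + ".json", optional: true)
        .AddEnvironmentVariables(prefix: "MYAPP_") // hypothetical prefix
        .Build();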

Using git with putty and pageant not working!

I had git running OK, but it was using OpenSSH, which is fine but provides a pain barrier when using PuTTY to generate keys (they all need to be exported to OpenSSH format and put into the .ssh folder). I tried switching to Pageant and plink, but it wasn't working. It said I needed to add the server key to the cache, but typing "y" didn't do anything, and Ctrl-C then revealed the error: "Could not read from remote repository".

Unfortunately, this was the first time I had connected to Bitbucket, so I wasn't 100% sure I had done everything correctly, but thanks to this post: https://blog.craigtp.co.uk/Post/2015/07/28/SSH_with_PuTTY,_Pageant_and_Plink_from_the_Windows_Command_Line I realised that I had to run plink directly the first time so that the server key was cached correctly.

A bit poor, for sure; if plink shows the message, it should be able to accept the answer, but at least it works now!

Monday, 15 July 2019

Enterprise vs Startup and the battle of Developer Types!

Many years ago, when the internet was very much Web 1.0, somebody asked a friend who "worked in web design" to create us a web site. It was pretty awful. It looked bad and really didn't portray us very well. When challenged as to the poor quality, the response was along the lines of, "tell me what's wrong and I will fix it".

The answer to that? "If I need to tell you what's wrong, why am I paying for your expertise?"

Sadly, this culture still exists, and it is mainly in the corporate world. The corporate world is largely about power and control: managers whose value is vague at best and who are nervous that unless they boss some people around and make their team large, they will be exposed and will disappear in the next round of restructures. People, therefore, are not encouraged to be motivated and enterprising in their roles; they are implicitly, and sometimes explicitly, told to do what they are told and nothing more. I even read about someone who was paid $5000 for a single web page because the customer was so controlling that they couldn't sign off his work and needed him to go to meetings to satisfy all of these "stakeholders".

Flip over to the world of the Startup - historically a small company trying to grow big but now more of a culture than a size. In the startup world I absolutely cannot have people sitting down waiting to be told what to do - I need them to own their role, to be hungry for the bigger picture, to take the barest of priorities and make something amazing to solve the problem. Compare the following types of person:

Person 1: Do you need a database? What type of database do you want? How should I lay out the tables? Do you need failover and clustering? And so on.

Person 2: I know you need a database. I've looked at a few options and think that Cassandra is the best fit. You get clustering out of the box and can pay for support if needed. Here are some examples of how it works and a small demo for our app.

Guess which person fits best in which culture?

The thing is, the sound of the startup is appealing: freedom! You mean I can have a say over what I do? I can choose tooling and bring in new ideas? It sounds amazing, but surprisingly many people struggle at a startup for the simple reason that they don't know how to think! They do not ponder the next best thing while sitting on a train. They don't wonder why React is better than Angular (or vice-versa). They don't think about what tools are available to performance-test a web app, or whether Azure beats AWS on compatibility but AWS wins on price.

George Orwell once said that the greatest tyranny is freedom: the tyranny of making choices and being responsible for them; of not having someone to spoon-feed me, to tell me what to do in the morning when I get to work; of losing the ability to put everything down at 5pm and go home to my normal life.

Even at the recruitment stage, we see the first signs of the lazy enterprise attitude, and of the motivation of those people who are a little better. We get sent a CV. It's terrible. Not terrible because they don't have the right skills, but we have examples where someone's 10-year job position somehow only amounts to 4 bullet points. Another where the description was just "worked on innovative application..". Cover letters that read plain weird and don't have any sense of motivation or self-starting.

Now that remote working is more common and information is more widely available than before, recruiters are going to start losing business, because they add little value. We can recruit directly and pick out those who really care about what they are doing and who spend time writing good CVs and impressive covering letters.

Friday, 12 July 2019

Powershell script not running from Task Scheduler

Apparently, I'm not the only one who has wondered why a script run manually works fine, while the same script run from Task Scheduler seems to run but doesn't actually do anything.

Firstly, Task Scheduler doesn't seem to care too much if the task fails. It records the whole event as "Action Started" and "Action Completed". This is confusing: if you assume Task Scheduler would report your error but it doesn't, then it must have run the script, right? Nope.

Anyway, forget getting much help from Task Scheduler, although if you look closely in the History tab of the task and find "Action Completed", you will see a return code logged in the history item. If the error code is 4294770688, it might simply be that you need to pass arguments in double quotes, not single quotes!

How do you debug this whole thing?

Firstly, it does work; it is just a matter of setting it up correctly. Start with a simple PowerShell script that creates a file, like New-Item "C:\temp\test.txt", and nothing else. Once you get this to run correctly, any other problems are related to your script.

Secondly, although running it locally might look like it works correctly, when I did this I was not calling it exactly as Task Scheduler was. When testing locally, I was simply calling powershell.exe .\myscript.ps1, whereas Task Scheduler was doing something more like powershell.exe -File 'c:\full\path\to\myscript'. If I had run it exactly as Task Scheduler did, I would probably have seen the error and worked it out!

Thirdly, try not to assume the current directory when running your script. Include the full path to it, and make sure it isn't inside a user directory if you are not running the script as that user, otherwise access might be denied.

Fourthly, some people have talked about telling PowerShell to bypass its execution policy (-ExecutionPolicy bypass), but this depends on what is set up on the machine globally. If the script runs locally, it should run from Task Scheduler without any changes to the arguments.

Fifthly, make sure the user you are running the task as has permission to do what the PowerShell script needs to do. If at all possible, log in as that user and run the script to make sure. Running as system accounts is OK, but don't assume that they will automatically get super permissions to do everything.

Sixthly, any calls to things like GetUserDirectory() or GetTempDirectory() will either return different values for different users or potentially won't work at all if the account is logged off. Again, assume nothing: start with something simple and work your way up to finding out what is going on.

Enjoy!