Thursday, 5 December 2019

DMARC, SPF and DKIM - ouch!

I have been looking at this recently since, as a bulk mail sender, we get regular complaints about how our customers' emails are not delivered or are thrown into the SPAM folder. Most people have no idea how complicated SMTP is, or about the constant, very convoluted battle between senders, receivers and spammers.

What is the problem to fix?

SMTP as a basic protocol is simply text. Anyone can send anything and pretend to be anyone else. I can send an email with a "from" address of billgates@microsoft.com and the receiver is none the wiser. This is a problem because spammers or phishers can pretend to be anyone, leading the victim to click something they should not click.
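
To see why, here is a rough sketch of what a forged SMTP conversation could look like (host names and addresses are made up); the basic protocol just takes the sender's word for it:

telnet mail.receiver.example 25
HELO attacker.example
MAIL FROM:<billgates@microsoft.com>
RCPT TO:<victim@receiver.example>
DATA
From: Bill Gates <billgates@microsoft.com>
Subject: Please click this important link
...message body with a dodgy link...
.
QUIT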

Note that the TLS-secured versions of SMTP do not fix this problem since they only provide encryption in transit, not proof of origin.

What is SPF?

Sender Policy Framework was a simple but misguided attempt to answer a simple question: which ip addresses are allowed to send emails as "me"?

It is easy enough to understand. You create a TXT record in your DNS for sender.com that lists IPs (or external references to other lists of IPs) which say who can send email "from" someone@sender.com. The receiving email server simply queries the DNS and attempts to match the sending IP to one in the list. If it matches, great, but if not, you have an SPF fail.
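
As a sketch (the IP and includes here are made up and will depend on your providers), an SPF record is just a TXT entry on the sending domain:

sender.com.   IN   TXT   "v=spf1 ip4:203.0.113.10 include:spf.mailjet.com include:amazonses.com -all"

The -all at the end asks receivers to fail anything not in the list; ~all is the softer "soft fail" variant.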

Why doesn't SPF work properly? There are two problems really.

The first is simply that when using one or more mail relay services like Mailjet or Amazon SES, your domain has to list all of these as "permitted senders", which even when using external references can cause a bit of a headache. SPF has a limit of 10 DNS lookups per check, so with records that include other records you can quickly run out of space. We had one customer who (for reasons I don't understand) used some tool that expanded the SPF includes into specific IPs and created linked records in their DNS. They had already hit the 10 lookup limit when they needed to add 15 of our mail servers to their SPF lists.

The second problem is that SPF ties the sending IP address to the DNS record, so what happens if you, for example, send an email to Gmail and it forwards it to Yahoo? Yahoo sees Gmail's IP address as the sender and this would fail SPF.

There is a workaround: in the good old way of the internet, someone invents another protocol to patch the broken one instead of re-inventing it! It's called Sender Rewriting Scheme (SRS) and it basically says that the forwarder has to replace the "from" address with a temporary address at its own domain, which will pass SPF. It does, of course, assume that the forwarder validated SPF against the origin server; if that failed, should they forward the email or not?
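
As a rough illustration (the exact format varies by implementation), SRS rewrites the envelope sender so that SPF is checked against the forwarder's own domain:

Original envelope sender:    alice@sender.com
Rewritten by the forwarder:  SRS0=HHH=TT=sender.com=alice@forwarder.com

The HHH and TT parts are a hash and timestamp that let the forwarder validate and route any bounces back to the original sender.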

Messy business! Also, there are still a number of mail relays that do not support SRS and which basically break SPF.

SPF only really protects the sender's domain. It doesn't protect people from receiving spam, and in many ways has very limited value since we know that spam and phishing don't have to come from real domains to be effective.

What is DKIM?

DKIM solves a slightly different problem. How do I know my legitimate email wasn't tampered with en route? How do I know that someone hasn't hacked my ISP and started adding phishing links into emails via a "virus scanner" or some low-level network tool?

I don't know what motivated this, whether a real or a perceived threat, but DKIM uses cryptography to sign a message. The nice thing about signing is that it doesn't require encryption in transit. Anyone could theoretically see the contents of the message, but it is very hard (practically impossible) to modify the email content without the signature validation failing.

The sender usually signs the message body and the "from" field (plus other selected headers) with a private key and embeds the result into the message as a header. The receiving server fetches the public key for the domain from another DNS entry and verifies the signature to confirm that the message body and "from" field have not been touched.
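
As a made-up, truncated example, the embedded header looks something like this; the d= and s= tags tell the receiver which domain and selector to look up, and the public key lives in a TXT record at selector._domainkey.domain:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sender.com; s=mail;
    h=from:to:subject:date; bh=<hash of the body>; b=<signature>

mail._domainkey.sender.com.   IN   TXT   "v=DKIM1; k=rsa; p=<public key>"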

So far, so good. If we forward email to another server, we can handle this because the forwarder can add another dkim signature for their domain (which they need if they have used SRS) but otherwise can pass the original message unmolested.

So does this fix the original problem? In a word, no. DKIM only ties the domain of a "from" address and the message content to the signature. It provides no other protection or non-repudiation. If I am a spammer, although I would not be able to sign an email from microsoft.com and pass DKIM, I could sign my own email from dodgy.com and it would pass the DKIM check. A common trick is for spammers and phishers to use domains with look-alike characters (sometimes Unicode homoglyphs) that visually resemble a real domain, like paypa1.com (notice the digit in place of the last letter of PayPal!), or otherwise something like paypal-security.com, which might be owned by an attacker and not PayPal. Neither of these tricks is prevented by SPF or DKIM.

Enter DMARC

What do we do with our two largely broken protocols? We invent another, of course, to fix all the problems (without actually fixing anything!).

Domain-based Message Authentication, Reporting and Conformance is not much more than a bit of glue and some reporting functionality. It doesn't really offer anything useful above DKIM and SPF, just another thing to try to use and get wrong.

DMARC allows you to publish a policy via DNS that tells receivers how seriously to treat messages that fail SPF and/or DKIM. You can be strict, you can allow one or the other to succeed, or you can just have the information reported to a published email address so you can analyse how people are trying to abuse your system.
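
Again this is just a TXT record, this time at _dmarc.yourdomain; a made-up example:

_dmarc.sender.com.   IN   TXT   "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@sender.com; pct=100"

p= can be none (report only), quarantine or reject, and rua= is the address that aggregate reports get sent to.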

We already know that SPF is easily broken by forwarders, and that even the mighty DKIM doesn't actually protect very much and can also be broken by various mail relays that "have to" change the contents of the message that has been signed - for example, adding a "scanned by Avast" line to the bottom.

The problem with DMARC is that you can't be strict, because there are too many known cases that just don't work and you will have legitimate email rejected. Most senders do not want this. Setting it to relaxed or report-only just generates noise: we know that there are loads of cases that will fail, so enabling the reporting just sends you information that you already know about and can't fix, like "forwarding sometimes breaks SPF".

Even if we got a report saying that an ip belonging to dodgy.com was trying to send emails as smartsurvey.co.uk, we can't do anything about that in most cases. The IP address might not be traceable, we might not have any legal route to approach someone registered overseas etc.

In the end, DMARC seems like a big waste of time.

What if everyone supported this stuff?

Well, this is the big problem with the internet. Even if you brought in a protocol that solved all of these issues (and none of these do!), not only would you have to wait for people to upgrade, there are likely to be thousands of servers on the internet that are effectively abandoned and will never be upgraded. It would also require that all custom builds of things like Postfix and Sendmail were updated to use any new functionality - you simply can't do that.

You end up with a strange problem: you have to not trust the mail from servers you already trust, like Gmail and Yahoo, since they will support the protocols that you use, and trust mail from untrusted servers since they do not use the protocols that can establish trust!

There could potentially be a new version of SMTP or something similar (it would have to be similar to be adopted) that retains the original message body and "from" address signed with DKIM, where we wouldn't care which mail server sent it to us because they couldn't change it anyway. Any intermediate servers should not need to add different "from" addresses, they should just forward the message. If they need to add body content, it should be done in a different boundary of the message so it is clear which part of the message is safe and which isn't. Mail readers could prohibit interaction with the untrusted section. Otherwise the intermediate would have to sign the new message with its own dkim, although then the end-reader has to be more complicated to highlight that "this was from the original sender" and "this was from GMail who has forwarded it" etc.

Anyway, as I said in the title - ouch!

Wednesday, 27 November 2019

FakeItEasy "not working" on a really easy setup

We have used FakeItEasy for a while and it basically works pretty well and reads fluently, but recently someone asked me to help them work out why setting their fakes to strict was breaking their tests.

The code was along the lines of this:
public void TestMethod()
{
  A.CallTo(() => testMock.FetchBody(bodyId)).Returns(new MessageBody());
  A.CallTo(() => testMock.UpdateStatus(bodyId, Status.Failed));

  TestTarget.CallMethod();

  A.CallTo(() => testMock.UpdateStatus(bodyId, Status.Failed)).MustHaveHappened();
}

but...

When set to non-strict, it was all fine. Set to strict, we got the error: FakeItEasy.ExpectationException: Call to non configured method "UpdateStatus(123,Failed)" of strict fake...

Hmm...

FetchBody worked and if I made the fake non-strict, not only did it work but the call to MustHaveHappened() also worked so the setup was definitely correct!

It was a subtle error (have you seen it yet?): if a method does not return anything, you need to append DoesNothing() to the call configuration to actually register it on the fake, otherwise it's a bit like not finishing your sentence.
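
So the fix, using the same (hypothetical) fake and method names as above, is just one extra call:

A.CallTo(() => testMock.UpdateStatus(bodyId, Status.Failed)).DoesNothing();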

Since you cannot have a compiler error to signal that you have called a method but not used its return value, we saw nothing but the assertion error.

Thankfully, you can catch these using the very helpful FakeItEasy.Analyzer.CSharp nuget package, which uses the Visual Studio analyzer functionality to highlight issues including the one we created wrongly above!

Monday, 11 November 2019

Error or CrashLoopBackOff when deploying dotnet core container to Azure Kubernetes Service

This was really confusing because I was sure that the new service I was deploying to AKS was basically the same as one that was already working but when deploying, my pods were showing the following (with kubectl get pods):

0/1     CrashLoopBackOff     8     2m30

Where the 0/1 is the ready number and 8 is the number of restarts.

If you call kubectl describe pod/<pod-name> you get stuff that isn't too helpful, except it says the container is terminated with exit code 140 - hmmm.

I then realised that you could call kubectl logs -p <pod-name> and got something MUCH more helpful:

Error: An assembly specified in the application dependencies manifest (Microservices.Messaging.deps.json) was not found: package: 'Dapper.Contrib', version: '2.0.30' path: 'lib/netstandard2.0/Dapper.Contrib.dll'

This is when I realised that I had copied the new (broken) Dockerfile from a service that runs unit tests and does not include any third-party packages (and is currently unused!), whereas the service that is actually working does not run unit tests but does have third-party packages.

What I had done, when adding in the unit test step, was remove dotnet publish, thinking it was not doing anything special. Of course, what it does is package all referenced packages into the target directory ready for deployment. I added this back in and all was good with the world again.
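
For reference, the missing piece was just the usual publish step in the build stage of the Dockerfile, something along these lines (the project name here is illustrative):

RUN dotnet publish "Microservices.Messaging.csproj" -c Release -o /app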

Wednesday, 30 October 2019

SQL Server Backups from Linux to Azure Blob Storage

So I have a few specifics here that might make this harder than it would otherwise be:

1) SQL Server running on Linux (Ubuntu) in an Azure VM
2) Backing up to Azure blob storage
3) Using the amazing Hallengren helper scripts for backups
4) Scheduling these from SQL Agent jobs (the ones that get created when you run the Hallengren install script)

So here is how to make it work.

Azure Storage

Nothing too funky here, just create a normal storage v2 account and make it public (the container will be private). I don't know if the problems I had previously on private storage accounts were because they were private or because of something else that I have now fixed.

Create a Shared Access Signature for your SQL Server to access the account with. These are nice because they are time-limited and restrict the scope of what the token can perform. I only allowed blob access and removed the delete permission. You can optionally lock this down to an IP address. I gave mine 1 year, but remember that you need a regular job to rotate these, and if the account access key is changed, they will stop working.

Copy the SAS WITHOUT the leading ? character.

Create a private container to store your backups in, in the Blob service section of the blade.

SQL Server Prep

Install the Hallengren scripts. This will create stored procs and some SQL Agent jobs (if you haven't already, you might need to enable SQL Agent on Linux). By default, they are installed into master.

Create a credential for use with an SAS by following this. Note that the following code will ONLY work for SAS. If you are using the access key, see the alternate instructions in the linked article.

IF NOT EXISTS
(SELECT * FROM sys.credentials
WHERE name = 'https://<storageaccountname>.blob.core.windows.net/<containername>')
CREATE CREDENTIAL [https://<storageaccountname>.blob.core.windows.net/<containername>]
   WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
   SECRET = '<SAS token without the leading ? character>';


Note that the identity must equal "SHARED ACCESS SIGNATURE", you cannot change that part. Remember to paste the SAS token WITHOUT the leading ? character.

Calling the Backup Script

The SQL Agent jobs created by the Hallengren script are designed for local backup to a directory. Instead, you should change the SQL that is called from the job to look like the following:

EXECUTE [dbo].[DatabaseBackup]
  @Databases = 'USER_DATABASES',
  @URL = 'https://name.blob.core.windows.net/container',
  @BackupType = 'FULL',
  @Compress = 'N',
  @Verify = 'Y'

But obviously check that the individual settings are correct for your job (full, differential, log, etc.). Note that there is an issue with @Compress in that it only works for certain versions of SQL Server and it was not supported on mine. You can always try @Compress = 'Y' and you will simply get an error if it is not supported.

Do NOT use the @Credential parameter when using SAS, it will automatically find the correct credential from the URL.

When you run the script manually, you should see some output including any error messages and the actual command that is generated by the script. This should be enough to work out any problems with the script and there are certain limits like max database size and other options that are not supported when backing up to blob storage.

Tuesday, 29 October 2019

Dotnet core functional tests running in Docker

Sometimes you might see these called integration tests, but either way, if you are using Docker to its fullest then you don't want to install dotnet on your build servers just to run functional tests against a Docker container with your app installed - and fortunately, you don't have to.

The Test project

You can create whatever type of test project you want; I use NUnit, together with the amazing WebApplicationFactory class from the Microsoft.AspNetCore.Mvc.Testing nuget package, which allows you to run the app inside an in-memory test server. For basic integration tests or calls to APIs, this is all you really need.

WebApplicationFactory is simply a generic class, so you will need to specify the Startup type from your app under test, but otherwise you simply create a new one and then call CreateClient() on it to return an HttpClient. Note that the factory and client are disposable, so it is best to create them in OneTimeSetUp and dispose of them explicitly in OneTimeTearDown, although you can also create them per test (which is slower but might be cleaner).

Other than that, you simply call client.GetAsync("/api/version") or whatever and get a response that you can test. You can test Razor pages and all sorts, but I am only testing an API currently. Note that functional tests are not there to test all the variations of logic that you should try to cover in unit tests (which are generally much faster) but instead to test high-level journeys for both correctness and speed!
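
A minimal sketch of what that looks like with NUnit (Startup is the startup class of the app under test, and /api/version is just an example endpoint):

using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using NUnit.Framework;

[TestFixture]
public class VersionEndpointTests
{
    private WebApplicationFactory<Startup> _factory;
    private HttpClient _client;

    [OneTimeSetUp]
    public void CreateServer()
    {
        // Hosts the app in an in-memory test server and gives us a client pointed at it
        _factory = new WebApplicationFactory<Startup>();
        _client = _factory.CreateClient();
    }

    [OneTimeTearDown]
    public void DisposeServer()
    {
        _client.Dispose();
        _factory.Dispose();
    }

    [Test]
    public async Task Version_endpoint_returns_success()
    {
        var response = await _client.GetAsync("/api/version");
        Assert.AreEqual(HttpStatusCode.OK, response.StatusCode);
    }
}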

The Docker File

There are a couple of problems with the Dockerfile produced by Visual Studio. Firstly, you should make sure that it is at a level above any folders it will need access to. For example, my test projects are siblings of my projects under test, so the Dockerfile needs to be moved up a level from the project. It is OK to have multiple Dockerfiles in the same directory, but you will need to modify the build process accordingly! I think it also affects Visual Studio's ability to do Docker debugging, so you might want a duplicate in the project folder without the test changes!

My file looks like this (note I am running on ubuntu): EDIT 11/11/2019 - added publish step to Dockerfile

FROM mcr.microsoft.com/dotnet/core/aspnet:2.2-bionic AS base
WORKDIR /app
EXPOSE 80

FROM mcr.microsoft.com/dotnet/core/sdk:2.2-bionic AS build
WORKDIR /src
COPY ["Microservices.Templates/Microservices.Templates.csproj", "Microservices.Templates/"]
RUN dotnet restore "Microservices.Templates/Microservices.Templates.csproj"
COPY . .
WORKDIR "/src/Microservices.Templates"
RUN dotnet publish "Microservices.Templates.csproj" -c Release -o /app

FROM build AS testrunner
WORKDIR /src
RUN dotnet restore "Tests/Functional/Functional.Templates.Tests/Functional.Templates.Tests.csproj"
RUN dotnet build
ENTRYPOINT ["dotnet", "test", "--logger:trx"]


FROM build AS publish

WORKDIR "/src/Microservices/Templates"
RUN dotnet publish "Microservices.Templates.csproj", -c Release -o /app

FROM base AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "Microservices.Templates.dll"]


The ordering is slightly weird since some versions of dotnet seem to put the base and the build in different orders but anyway...

The testrunner stage is the important part; up until then it is just a normal build/publish. The test stage has a name, "testrunner", that we will invoke on the build server, and all we then do is restore and build the test project and specify that when this container is run, it will run dotnet test with the logger format set accordingly.

The cool thing with Docker builds is that subsequent builds will reuse any layers that are already built so when we build the deployment after testing, none of the previous layers will need rebuilding!

The Build Process

The build process is fairly simple: first we build the Dockerfile using --target testrunner to ensure it only builds as far as that stage. We also tag it with "test" so we can easily target it in the test step.

Executing the tests starts with creating a directory for test results and then running our container with a volume mapping to the test results directory so that we can keep the results after the container exits (docker run --rm -v "$(pwd)"/TestResults:/app/tests/TestResults microservices.templates:test).
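
Roughly, those two steps look like this on the build server (the image name and Dockerfile location are whatever you use):

# Build only as far as the test stage and tag it for the test step
docker build --target testrunner -t microservices.templates:test .

# Run the tests, mapping the results folder out of the container so the results survive
mkdir -p TestResults
docker run --rm -v "$(pwd)"/TestResults:/app/tests/TestResults microservices.templates:test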

If this step fails then the build will fail with errors in the console (I haven't yet wired it up to Team City proper), otherwise we are good to continue.

In the final build step, we simply build without specifying a target, which builds every layer, but thanks to the cache my small API build only took 2 seconds!

Summary

The documentation is a bit all over the place for each of these but once you have it, you should fairly easily understand what is going on. Enjoy.

Wednesday, 23 October 2019

The request signature we calculated does not match the signature you provided

This is a very common error people receive when trying to use the Amazon SES SDK to send email. You follow all the instructions to build the most simple test application and when it comes to sending, you receive the following error:

The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

This is confusing. You didn't provide a signature; you used the SDK and presumably it provided a signature. It sounds like there are a number of ways to cause this error, including having whitespace at the beginning or end of certain fields like email addresses (the server will trim these before calculating the signature), but a much more worrying broken use case, even after 10 years, is simply:

Your secret key has non-alphanumeric characters in it!

Yes, that's right. Although it is a base-64 type generated string, and although it was provided to you by Amazon in the IAM portal, if you have any + or / symbols, something happens somewhere with URL encoding and the signature process fails.

The only workaround is to keep re-generating the access key/secret pair until you get one with just alphanumerics then it works.

This is shocking for a company who famously employs the "best" engineers to produce their code. The truth is that they either don't care or don't have an agile enough development process which might be because they have too much legacy code and don't know the effect of any changes. I can't see why they can't just use alpha-numerics for all secret keys. They are long and random enough not to be guessable without + and /.

Tuesday, 8 October 2019

Problem when cloning Octopus Kubernetes deployment project

I am currently trying to find out if this is a bug, but I had cloned a working Kubernetes project (with one deploy step) in Octopus Deploy and then changed the step to use the names of a different pod. However, afterwards, when deploying either project, it would deploy and then delete the other project's deployment, whichever of the two that happened to be.

I always thought K8S identified deployments by name, but the names are dynamically generated by Octopus in this case, so that didn't (obviously) explain why these projects thought they were the same thing. I wondered if Octopus was keeping track of the history and explicitly deleting deployments.

The only clue currently is that the Octopus Step Id annotation was the same for both projects, even though all the other names, labels, annotations etc. were not.

What I did was delete the cloned step and re-create it manually by copying back in the new project's config. Once I had done this, the Step Id was different but, more importantly, deploying one project did not cause the other one to be deleted!