Thursday, 9 March 2017

Could not establish trust relationship for the SSL/TLS secure channel with authority

This was a surprising and annoying error that we hit on Microsoft Azure when our web app was calling a WCF web service. It only happened intermittently.

Fortunately, I knew that certain things worked, which made it easier to narrow down the problem.

I knew the web service worked and I knew I could connect to it with a valid SSL certificate chain. The only variable was that I was using Azure Traffic Manager to balance load between Japan and Ireland. Normally, the web app in each region would be sent to its local web service, but in the unlikely event that an entire web service died, Traffic Manager could send the request to the other data centre.

Every now and then, I would see the error in the title of this post.

The underlying issue is that when you make an SSL connection to a load balancer, the load balancer (usually) terminates the SSL connection, so if your request gets sent to a different web server behind it, your connection stays up OK.

HOWEVER, if your request gets sent to another load balancer, which would happen if Traffic Manager decided your previous one was unavailable, then the SSL connection cannot be resumed on the new load balancer and you get the error above which, as is often the case, contains a misleading message.

You wouldn't notice this effect in a browser, since browsers automatically retry the connection if it drops, in which case they re-establish the connection to the new load balancer and carry on. The call from the web app to the web service uses a proxy class generated by SvcUtil.exe, and this doesn't have any built-in functionality for re-establishing dropped SSL connections. Instead, it throws the exception and fails.
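The retry behaviour that browsers provide can be added by hand around the call. The following is only a minimal sketch of the idea in Python rather than the .NET proxy used in our app; `call_with_retry` and its parameters are hypothetical names invented for illustration, and `operation` is assumed to open a fresh connection each time it is called.

```python
import ssl
import time


def call_with_retry(operation, attempts=3, delay_seconds=0.5,
                    retry_on=(ssl.SSLError, ConnectionError)):
    """Call operation(), retrying when the secure channel fails.

    operation is assumed to build a NEW connection on every call, so a
    retry can land on whichever load balancer is currently in rotation.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as error:
            last_error = error
            # Pause before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)

    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

A wrapper like this is only safe for calls that can be repeated without side effects, since the first attempt may have reached the service before the connection dropped.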

There is a project here that provides some error handling for web service clients. I haven't tried it, but it looks like it might get around the problem.

I have worked around the problem by disabling Traffic Manager for the app to web service call so that it is always local. This opens up a small risk if one web service dies, but it should be OK for now.

Friday, 3 March 2017

Azure Traffic Manager shows degraded status for App Services https

I was surprised to see that the endpoints that Azure Traffic Manager was monitoring were showing as degraded.

I looked into it, and Google said that Traffic Manager checks for a 200 response from the site (and it won't follow 3xx responses). But where was it calling?

I thought that the problem might be the http->https redirect I had on the site, so I needed Traffic Manager to call the https endpoint and not the http one. But when you click on the endpoint and press Edit, it doesn't show the endpoint.

What you need to do INSTEAD is click Configure on the Traffic Manager itself and set the monitoring protocol, port and path in there.

Note that I am using the favicon in the path. The reason for this is that if I hit the default endpoint (/) it might cause a redirect to another page. Favicon is a nice static known resource that should always return 200. You could, of course, point it to anything else.
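To see what the monitor is likely to see, you can probe the path yourself without following redirects. This is a rough sketch, not Traffic Manager's actual implementation: `probe_status` and `monitor_state` are hypothetical helpers, `/favicon.ico` is an assumed path, and the rule "200 is healthy, anything else is degraded" is taken from the behaviour described above.

```python
import http.client


def probe_status(host, path="/favicon.ico", use_tls=True, timeout=10):
    """Return the raw HTTP status for one GET request.

    http.client does not follow redirects, so a 301/302 is returned
    as-is, which mirrors a monitor that won't follow 3xx responses.
    """
    connection_class = (http.client.HTTPSConnection if use_tls
                        else http.client.HTTPConnection)
    connection = connection_class(host, timeout=timeout)
    try:
        connection.request("GET", path)
        return connection.getresponse().status
    finally:
        connection.close()


def monitor_state(status):
    """Classify a status the way the post describes: only 200 is healthy."""
    return "Online" if status == 200 else "Degraded"
```

For example, `monitor_state(probe_status("example.com", use_tls=False))` would show whether the plain http endpoint answers directly or with a redirect; `example.com` is a placeholder for your own host name.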