Thursday, 4 January 2018

ElasticSearch, LogStash and Kibana (ELK) on Azure App Services

Edited: 28th June

ELK is a nice logging stack that provides a log sanitizer called Logstash, a NoSQL-style database called ElasticSearch and a graphing tool called Kibana. It is Open Source and most of the tooling is free to use. For many small companies, this model works well until they have enough money for paid support or larger hosted systems.

There are a couple of problems, however, when trying to extend its use to App Services:
  1. A lot of .Net tutorials are out of date
  2. Azure has changed a lot over the past few years in terms of what is and isn't possible
  3. It is easier if you only want to use a logging framework e.g. log4net but harder if you want to grab IIS logs directly
  4. Lots of solutions involve external storage, which adds cost, especially if you need a dedicated server just to parse the storage for log files.
  5. Lots of the solutions don't look like they would scale well
  6. There are some old and some dodgy looking plugins for things like log4net but these are specific to explicit logging and don't work for things like IIS or other system logs. It is also hard to know if there are any performance problems with these.
This is my attempt at documenting how I have nearly got this all working using the latest version of LogStash (6.1.1) and ElasticSearch.

I will update this article as I carry out more steps!

Elastic Search

Elastic Search is fairly easy to install using the instructions here (use these to ensure you get the latest versions). These are for Ubuntu/Debian 16.04 and, as per good practice, you should try to use a dedicated server (I'm using a VM on our office server). This should have at least 16GB of RAM and, initially, it should be set up with 8GB for Java and 8GB for Lucene according to this article. I haven't bothered with that for now, but it will be critical in larger production environments.

The instructions talk about tweaks you can make to elasticsearch.yml. I edited the following:

  • cluster.name: something_useful
  • node.name: node-1   # Could be whatever but for now I am just using 1 node
  • index.number_of_shards: 1   # In my case, 1 shard and no replicas
  • index.number_of_replicas: 0
(Edit: the two index.* settings above can no longer be set in elasticsearch.yml in newer versions of ES, so I have removed them from my file.)

I did initially change the network.host setting (to my interface IP address - it is localhost by default) to ensure I could access the database from outside of the VM. I won't go into details here but clearly you need to decide what needs to be able to route to this new box, what DNS needs setting up, tunnels, port-forwards etc.

Using the default port, I first tried curl -X GET 'http://localhost:9200' to check that it worked correctly (it did!) and then ran the same command from another machine on the network to ensure I had set up the VM networking correctly (as always, do things step-by-step to make it easier to debug problems).

I then started thinking about security and decided to proxy Elastic Search behind nginx for a number of reasons.

nginx proxy

There are proxies other than nginx but it is lightweight, modern and well-supported so it made a good choice for my Ubuntu server. I installed it from the Ubuntu repos and then modified the config to provide the following advantages:

  • Basic authentication, which isn't supported by ElasticSearch without their paid x-pack tooling. This allows me to only allow access to users with the correct password
  • SSL termination. Not set up yet, but if nginx is on another server, it can be dedicated to terminating SSL and proxying onwards to Elastic Search over plain http
  • Re-use of connections for clients that don't support persistent connections
  • Locking down of various parts of the elasticsearch API using location tags.
In my case, I followed the instructions here and used the bits I wanted and modified the default configuration (since I won't be running other sites on this box):

upstream elasticsearch {
    server 127.0.0.1:9200;
    keepalive 15;
}

server {
    listen 80 default_server;
    listen [::]:80 default_server;

    # Look at the apache2-utils package to use htpasswd to generate this file
    auth_basic "Protected";
    auth_basic_user_file /etc/nginx/passwds;

    # The ^ sits outside the group so that it anchors both alternatives
    location ~* ^/(production|_bulk) {
        proxy_pass    http://elasticsearch;
        proxy_http_version 1.1;
        proxy_set_header Connection "Keep-Alive";
        proxy_set_header Proxy-Connection "Keep-Alive";
        proxy_redirect off;
    }

    # Block anything not specifically allowed
    location / {
        return 403;
    }
}

Note that I haven't enabled SSL yet, and there is another block to allow the root location, which is used by LogStash. Since I didn't know how to combine this with the _bulk regex, I just duplicated the block and used location = /.
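
When several paths share one regex location, the position of the ^ anchor matters: in a pattern shaped like ^(/a)|(/b), only the first alternative is tied to the start of the URI. The sketch below uses Python's re module as a stand-in for nginx's regex matching. That is an assumption: the two engines should agree on patterns this simple, but none of this has been run through nginx itself. The names LOOSE, STRICT, WITH_ROOT and allowed are made up for illustration, and WITH_ROOT is only one hypothetical way of folding the exact root path into the same expression.

```python
import re

# The anchor binds only to the first alternative, so "/_bulk" may appear anywhere.
LOOSE = re.compile(r"^(/production)|(/_bulk)", re.IGNORECASE)
# The anchor applies to both alternatives.
STRICT = re.compile(r"^/(production|_bulk)", re.IGNORECASE)
# Hypothetical: also accept exactly "/" (the root path that Logstash polls).
WITH_ROOT = re.compile(r"^/($|production|_bulk)", re.IGNORECASE)


def allowed(pattern, uri):
    """Return True if the URI would fall into the proxied location."""
    return pattern.search(uri) is not None


# Print how each pattern treats a handful of request paths.
for uri in ["/production/_search", "/_bulk", "/secret/_bulk", "/", "/_cat/indices"]:
    print(uri, allowed(LOOSE, uri), allowed(STRICT, uri), allowed(WITH_ROOT, uri))
```

The path /secret/_bulk is the interesting case: it gets through the loose pattern but not the strict one.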

Once again, I reloaded nginx and checked that I could now access elasticsearch from outside of the server on port 80 and NOT any more on port 9200. I will switch the Elastic Search server firewall on at some point to only allow 80 and 22 but for now, the server is internal so not critical.

Posting to Elastic Search

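Before cracking on with LogStash, I decided to ensure that I could post a message to ES from outside of the server. I used curl on my local Windows box (in Git Bash, if you want to know!) with the command from the previous installation instructions: curl -X POST 'http://myserverurl.local/production/helloworld/1' -d '{ "message": "Hello World!" }'

Note that ES 6.0 and later reject the request unless you also pass -H 'Content-Type: application/json', and with the nginx basic auth above enabled you need -u user:password as well.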
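
For anyone who would rather script this check than type curl, here is a minimal sketch of the same POST using only the Python standard library. The function names build_index_request and send are made up for illustration, and the user and password arguments are placeholders for whatever is in the nginx password file. It sets the JSON content type explicitly, which ES 6.0 and later insist on.

```python
import base64
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document,
                        user=None, password=None):
    """Build (but do not send) a request that stores one JSON document."""
    url = f"{base_url.rstrip('/')}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    # ES 6.0 and later reject bodies without an explicit JSON content type.
    request.add_header("Content-Type", "application/json")
    if user is not None:
        # Basic auth is checked by the nginx proxy, not by ES itself.
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def send(request):
    """Send the request and return (status code, response body)."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Calling send(build_index_request("http://myserverurl.local", "production", "helloworld", 1, {"message": "Hello World!"})) should be roughly equivalent to the curl command, assuming the server is reachable.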

This worked and proved the basic mechanism was all good.

Logstash

Logstash is the application that can read from a variety of sources and then format messages into a format that works with ElasticSearch. In fact, you can use it without ElasticSearch but the two are designed to work well together so, rather than re-invent the wheel, I decided the first step was to set this up on my Windows machine and start by outputting IIS log data to ElasticSearch (since this is what I will eventually do).

I found an article here but unfortunately it was based on an older version of LS, so I have listed my instructions below, which need to be used in addition to the instructions linked.

  1. I didn't bother putting all the server logs into a single file since in my ultimate usage, there will be only one site whose logs I need. I simply ensured a single local site was outputting to a specific location and ensured all the boxes were ticked.
  2. I ensured my JRE 1.8 was up to date (it was already installed)
  3. I added a JAVA_HOME environment variable pointing to the JRE root directory
  4. Once Logstash is downloaded DON'T put it into program files, since the batch file has a problem with path spaces. I put it instead into C:\Elastic\Logstash-6.1.1
  5. The memory tweaks are possibly not needed. They are not in logstash.bat but in config\jvm.options and are set to 1G by default
  6. Note that the conf file you need to create must NOT go into the config directory but a separate directory (called conf if you want). This is because logstash will cat all the files in the specified directory together to form a master config file.
  7. I copied the example config which was mostly correct but I did need to make some changes since the supplied example does NOT work with the latest versions of LS.
    1. The type at the top is used for the index name in ElasticSearch so choose something you are happy with
    2. The path obviously needs to match where your logs are located
    3. The match line was correct!
    4. The syntax of event["name"] is deprecated and causes an Error. You need to change these to event.get("name") or event.set("name", value).
    5. I removed the dns block which attempts a reverse lookup of the ip address. I don't imagine this is particularly useful or efficient except in a corporate environment.
  8. The format of the elasticsearch output block is incorrect
    1. Remove embedded. It is invalid
    2. host should now be hosts and is like ["http://yoururl:8000"] (it takes a number of combinations of format but is basically a URL)
    3. port and protocol should go in the hosts entry, they cannot be specified outside
    4. Set index to use the correct format. You can use an autoindex style so that a new index gets created for e.g. each month in this example. This makes searches on a month faster at the cost of large indexes, whereas doing e.g. per day would be slower for month queries but would cause smaller indexes which would be better for lots of daily queries.
    5. If you are using basic auth, you need to set user and password appropriately.
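
To make the index-granularity trade-off in step 8.4 concrete, here is a rough sketch that lists which indexes a date range would touch under a monthly pattern (like %{+YYYY.MM}) versus a daily one (like %{+YYYY.MM.dd}). The function name index_names and the "iis" prefix are made up for illustration. It only models the naming and says nothing about how ES actually performs with either layout.

```python
from datetime import date, timedelta


def index_names(prefix, start, end, daily=False):
    """Return the sorted index names holding events from start to end inclusive."""
    pattern = "%Y.%m.%d" if daily else "%Y.%m"
    names = set()
    day = start
    while day <= end:
        names.add(f"{prefix}-{day.strftime(pattern)}")
        day += timedelta(days=1)
    return sorted(names)


# A query over January 2018 touches one monthly index or many daily ones.
january = (date(2018, 1, 1), date(2018, 1, 31))
print(len(index_names("iis", *january)))
print(len(index_names("iis", *january, daily=True)))
```
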
I didn't use the Windows service approach since I will be installing it on App Services but to test it, I first used the stdout output and commented out elasticsearch. When you run the batch file, you need to run .\Logstash.bat -f ../conf from the command line (in the bin folder). The argument agent is now illegal (and not needed).

The actual output might take a while and depends on whether your site has logged anything. IIS logs are fairly low priority and you might have to hit the site a number of times to cause something to happen in the command window (you should see json documents displayed).

If you get an error, the first error line should have the details in it; the subsequent ones are likely to contain less useful information such as the call stack.

Logstash to ElasticSearch

I then decided to comment out stdout and put elasticsearch back in. This didn't work initially: Logstash calls the root of the ES API to get a heartbeat and I hadn't allowed this in my nginx config.

Once I added that, I then realised it was calling /_bulk/ instead of /production/ so I also added that in. After restarting nginx, Logstash automatically got going again and added the records into elastic search.

I tested this by running this in a browser: http://myserverurl/production/_search?pretty=true just to ensure that there were records returned.

Installing Kibana

We actually use Mosaik to aggregate dashboards so I am hoping that I can output what I produce in Kibana to Mosaik but it won't be the end of the world if not.

Kibana is produced by Elastic and so it plays nice with the data in ES and can be used to produce easy graphs like counts of certain events, frequency over time etc.

Following good practice again, I have created a separate VM for Kibana. This allows for isolation and allows both Kibana and Elastic Search to be independently scaled at a later date as the company grows. I did this on the same VM host so I have chosen to use a host-only network for Kibana to talk to Elastic Search. This allows me to expose Elastic Search to the host-only network and localhost but NOT to the public facing interface.

Kibana is installed from the repository, like ElasticSearch.

Once installed, I attached the VM to the same host-only network as the ElasticSearch server so that it could have host-only communication. This means I edited my elasticsearch.yml file to include my host-only IP address in the array of network.host so that I can contact Elastic Search directly from the Kibana box.

I didn't change many settings in /etc/kibana/kibana.yml:

server.host: "0.0.0.0"  # This server is not visible outside of the company so no need to tie down web access
server.name: "whatever it is - used by the software for display purposes"
elasticsearch.url: "http://172.16.0.46:9200"     # 172.. is the host-only ip address of the ES server

I am not too bothered about SSL internally on the host-only network but I might eventually link to elasticsearch via nginx and therefore need to set the basic auth (which I can then use to lock down the endpoints that are allowed for Kibana differently from those that are used by external logstashes).

I had installed an old version of elastic search from earlier instructions, so I ended up purging the lot and installing again, since I was seeing all manner of errors caused by a messy upgrade. I therefore ran my local desktop LogStash to populate the ES database again before attempting to use Kibana.

Basic Kibana Usage

I don't have space to go into massive details - mostly because I don't know enough - but once Kibana is installed, you should be able to go to http://yourkibanaserver.com:5601/ and if everything has gone OK, you should see a neat home page. If not, you will see a grid with any warnings or errors (which was where I noticed the old version of ES and had to fix it!).

Since I had data, I clicked the button on the top right to create an index pattern. An index is a dataset in ES terms and the pattern needs to match the index that you named in LogStash when inserting the data into ES. You can have separate indexes for separate data like IIS logs, app logs etc. and I assume you can join these indexes, although I would expect joins to be expensive compared to queries in a single index.

In my case, I have a single index called production so I typed that into Kibana and it instantly told me it had matched an index in ES (which was nice!)

I then decided to try some visualisations. This was really easy (but watch out for the time range filter in the top-right of the screen!). Click Visualize in the menu and choose a graph type. I started with Metric, which is a basic way of counting stuff. Choose your index name and it should default to showing a count of the items in that index. If you change the time range in the top-right of the screen, the number will change immediately. If you want to change what is being counted, click the blue arrow next to the Metric name on the left, which will reveal a number of aggregation properties. You can also add filters at the top, for instance to show only requests that equal /api/request1, and these can be individually applied and switched off (but don't accidentally delete them!). You can save your graph in the top right and it will be added to the list on the Visualize page.

I then tried a vertical bar chart and chose the same index. Again, it defaults to a useful Y axis (count) and then you have to tell it what to use for the X axis. You choose this from the buckets list: if you pick Date Histogram and the relevant timestamp field in the data, Auto Interval works quite well. Then hit the play button under the index name in the top-left of the designer to apply the changes. If you change the time range, the X axis will automatically re-scale, which is really useful.

Next, I want to use WebJobs to pull IIS logs from my App Services.

WebJobs

WebJobs are a new way in Azure for applications to run startup or background tasks. I am going to write one to run LogStash to sit and watch IIS logs and attempt to push them to my corporate ES database and see whether they start coming in...

I have realised that LogStash itself is way too big and bloated to run on an App Service. The files themselves are over 100MB (since they include JRuby) and require Java to run, which might not be a massive problem. So.....we can use FileBeat instead. It is much smaller than LogStash and runs without external modules as an exe (30MB). There are various modules that might not be required but the bulk of the size is the exe so it looks like the ticket....BUT....

....Filebeat has nowhere near the functionality that LogStash does to parse and modify log entries. This is fine if your log data is already rich enough but IIS log lines, for instance, carry no context of their own: LogStash can add it and FileBeat cannot. What I need to do, therefore, is create a web job that uses FileBeat simply to watch the log files and forward them to a LogStash server. That server can also run in our office server environment, where it parses the log lines, converts them into decorated objects and passes them directly to ElasticSearch.
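
To show what "decorating" means here, the sketch below turns one raw, space-separated IIS line into a named structure, which is roughly the job that the grok and date filters do on the Logstash side. It is plain Python rather than Logstash config, and the column list in FIELDS is a hypothetical, shortened one: a real log declares its own columns in its #Fields: header. The sample line is made up too.

```python
from datetime import datetime, timezone

# Hypothetical column order for illustration only.
FIELDS = ["date", "time", "cs-method", "cs-uri-stem", "sc-status", "time-taken"]


def decorate_iis_line(line, fields=FIELDS):
    """Turn one space-separated IIS log line into a dictionary of named values."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(values)}")
    event = dict(zip(fields, values))
    # IIS writes W3C log times in UTC; merge date and time into one timestamp.
    stamp = datetime.strptime(
        event.pop("date") + " " + event.pop("time"), "%Y-%m-%d %H:%M:%S"
    )
    event["@timestamp"] = stamp.replace(tzinfo=timezone.utc).isoformat()
    # Numeric columns are more useful as numbers than as strings.
    for key in ("sc-status", "time-taken"):
        if key in event:
            event[key] = int(event[key])
    return event


print(decorate_iis_line("2018-01-04 10:15:30 GET /api/request1 200 15"))
```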

Installing and Testing Filebeat locally

To continue with best-practice, I downloaded a zipfile for Windows with Filebeat in it and extracted it onto my local machine. I then edited the default filebeat.yml config file (this is what filebeat looks for by default so might as well use this one!).

I decided to test it reading some real Azure IIS logs (where we don't have control over what is logged - I don't think) and write them to the known-working elastic search API. I downloaded a couple of IIS logs from an Azure App Service and put them into c:\temp and then edited filebeat.yml to set the following:

  • set enabled to true for the log prospector
  • set the log paths to include c:\temp\*.log (note paths is an array and needs the - for entries)
  • set exclude_lines to ['^#'] to exclude IIS comments
  • set index.number_of_shards to 1. Not sure if this is relevant but I only have 1!
  • set up the output for elasticsearch to include my (now https) API endpoint, and also set protocol to https and the username and password for basic auth. Note that basic auth only makes sense over https, otherwise anyone on the network can read it!
In preparation for the web job (which, according to this, tries to run something called run.bat/exe etc.), I created a run.cmd in the same directory as filebeat.exe. run.cmd has to be in the root of the zip file I will upload to Azure to run Filebeat for me. Inside run.cmd, I simply have .\filebeat.exe (note that it will look for a default config file in the same directory; this can be overridden if needed).
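
As a quick illustration of what the exclude_lines setting above does, this sketch applies the same ^# regular expression to a few sample lines. It is plain Python, not Filebeat itself, and the sample lines and the names EXCLUDE_LINES and keep_line are made up for illustration.

```python
import re

# The same expression as in filebeat.yml: drop lines that start with "#".
EXCLUDE_LINES = [re.compile(r"^#")]


def keep_line(line, patterns=EXCLUDE_LINES):
    """Return True if no exclude pattern matches the line."""
    return not any(pattern.search(line) for pattern in patterns)


SAMPLE = [
    "#Software: Microsoft Internet Information Services 8.0",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2018-01-04 10:15:30 GET /api/request1 200",
]

# Only the real request line survives the filter.
forwarded = [line for line in SAMPLE if keep_line(line)]
print(forwarded)
```
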

There are now two jobs I can do in either order. First, I need to set up LogStash on my central server and get Filebeat to pass logs via LogStash, which will put them into a useful format before storing them in Elastic Search. Secondly, I need to test that App Services will run my cmd executable from the zip file and send the logs successfully over the internet to my database.

I'm going to set up logstash first because that is internal to my network and safer to play with.

Setup Logstash Centrally

As mentioned, filebeat is lightweight enough for Azure but doesn't include the "grok" of logstash to transform log entries into decorated objects for Elastic Search. Rather than create another cloud server, I have instead created a central one at the office and have the multiple instances of FileBeat pass their messages to the single server for processing. Eventually, I guess, I can scale out the number of logstash instances if needed.

It generally makes sense to keep these things on separate servers. Sharing servers means that you might not see some problems now that you will see if you ever do want to separate them out. You could use Docker if you want slightly more lightweight servers but these programs run on Java (urgh) and therefore need plenty of beef to run correctly.

Installing logstash was as easy as installing the other components using apt-get via the elastic repository. There are obviously lots of things you can configure in logstash but the basics are quite easy. My central logstash config looks something like this:

input {
  beats {
    port => "5044"
  }
}

filter {
  # The code here from the example above under "Logstash"
}

output {
  elasticsearch {
    hosts => ["http://172.16.0.20:9200"]
    index => "%{[log_type]}-%{+YYYY.MM}"
  }
}

The main differences compared to the above example are that I have now added a field to the FileBeat config called log_type, which allows me to index things like IIS and app logs separately by giving them different names. I also connect to elasticsearch via a host-only network so I don't need to waste CPU using https. I still have elasticsearch available via nginx on port 443 but for this kind of direct access, I might as well go directly to ElasticSearch and avoid nginx.

NOTE: The index names must be lowercase, otherwise the export to elasticsearch will fail!
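
A small sketch of how the index name above is put together, with a guard for the lowercase rule. It mimics the %{[log_type]}-%{+YYYY.MM} pattern in plain Python; the function name monthly_index is made up, and the real expansion happens inside Logstash using the event's timestamp.

```python
from datetime import datetime


def monthly_index(log_type, timestamp):
    """Build a name like 'iis-2018.01', refusing anything that is not lowercase."""
    if log_type != log_type.lower():
        raise ValueError(f"index names must be lowercase, got {log_type!r}")
    return f"{log_type}-{timestamp:%Y.%m}"


print(monthly_index("iis", datetime(2018, 1, 4, 10, 15, 30)))
```

A log_type of "IIS" raises a ValueError here, which mirrors the failed export described in the note.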

The Filebeat config to add the new variable is:

- type: log
  enabled: true
  paths:
    - D:\home\LogFiles\http\RawLogs\*.log
  exclude_lines: ['^#']
  fields: 
    log_type: iis
  fields_under_root: true

Note the use of "fields_under_root", which allows you to access the field at the top level of the Logstash code, as above.
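
The difference that fields_under_root makes to the shape of each event can be sketched like this. It is plain Python modelling the behaviour as I understand it, not Filebeat's own code, and add_custom_fields is a made-up name. Without the flag, the custom values sit under a fields key, so the Logstash reference would be [fields][log_type] instead of [log_type].

```python
def add_custom_fields(event, fields, fields_under_root=False):
    """Return a copy of the event with the custom fields attached."""
    result = dict(event)
    if fields_under_root:
        # Custom fields land at the top level and overwrite clashing names.
        result.update(fields)
    else:
        # Custom fields are grouped under a "fields" sub-dictionary.
        result["fields"] = dict(fields)
    return result


event = {"message": "2018-01-04 10:15:30 GET /api/request1 200"}
print(add_custom_fields(event, {"log_type": "iis"}))
print(add_custom_fields(event, {"log_type": "iis"}, fields_under_root=True))
```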

I also set up nginx as a proxy again for logstash, which allows me to separate the ssl proxy later on and allows me not to worry about SSL in both my logstash receive channel and also in the output to elasticsearch.

