Elasticsearch 1.5: benchmarking your queries

With the release of Elasticsearch 1.5.0 (probably) we will get the ability to benchmark our queries and see which parts of them are not fast enough for our use case. This gives us great insight into what is being executed, how it is executed, and where the potential bottlenecks are. Although it requires setting up a node for benchmarking, I suspect this may come in handy for Elasticsearch users struggling with performance. Let’s have a look at this functionality to see what we can expect and how we can use it.

Setting up

The benchmark mode can’t be run on just any node in the cluster. The reasons may vary, but in general you don’t want to overload your production cluster with benchmarking requests. Because of that, in order to run benchmarks you need to start Elasticsearch with the node.bench property set to true, for example like this:

bin/elasticsearch --node.bench true

The other possibility is to set the node.bench property to true (node.bench: true) in the elasticsearch.yml file of the nodes that we want to use for benchmarking.

Running benchmarks

Running a benchmark is as simple as sending a proper request to the _bench REST endpoint – for example like this:

curl -XPUT 'localhost:9200/_bench/?pretty' -d '{
 "name": "firstTest",
 "competitors": [ {
  "name": "post_filter",
  "requests": [ {
   "post_filter": {
    "term": {
     "link": "Toyota Corolla"
    } 
   }
  }]
 },
 {
  "name": "filtered",
  "requests": [ {
   "query": {
    "filtered": {
     "query": {
      "match_all": {}
     },
     "filter": {
      "term": {
       "link": "Toyota Corolla"
      }
     } 
    }
   }
  }]
 }]
}'
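The same request body can also be assembled programmatically, which may help once there are more than two competitors. The following is only a minimal Python sketch under stated assumptions: the helper names (term_filter, build_benchmark, run_benchmark) are hypothetical, the base URL is the one from the curl example, and run_benchmark assumes that a node started with node.bench set to true is listening there. Only the body is built and printed here; nothing is sent until run_benchmark is called.

```python
import json
import urllib.request


def term_filter(field, value):
    """Return a term filter clause like the one used in the curl example."""
    return {"term": {field: value}}


def build_benchmark(name, competitors):
    """Assemble a _bench request body from {competitor_name: [search bodies]}."""
    return {
        "name": name,
        "competitors": [
            {"name": comp_name, "requests": requests}
            for comp_name, requests in competitors.items()
        ],
    }


def run_benchmark(body, base_url="http://localhost:9200"):
    """PUT the body to the _bench endpoint and return the decoded JSON response.

    Assumption: a benchmark-enabled node (node.bench: true) listens at base_url.
    """
    request = urllib.request.Request(
        base_url + "/_bench/?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Rebuild the two competitors from the curl example.
link_filter = term_filter("link", "Toyota Corolla")
body = build_benchmark("firstTest", {
    "post_filter": [{"post_filter": link_filter}],
    "filtered": [{
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": link_filter,
            }
        }
    }],
})
print(json.dumps(body, indent=1))
```

Calling run_benchmark(body) against a benchmark-enabled node should then return the same kind of response as the curl call.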

The above request defines two competitors to be run and measured: one using a post_filter and one using a filtered query. An example response could look as follows:

{
 "status": "COMPLETE",
 "errors": [],
 "competitors": {
  "filtered": {
   "summary": {
    "nodes": [
     "Free Spirit"
    ],
    "total_iterations": 5,
    "completed_iterations": 5,
    "total_queries": 5000,
    "concurrency": 5,
    "multiplier": 1000,
    "avg_warmup_time": 6,
    "statistics": {
     "min": 1,
     "max": 5,
     "mean": 1.9590000000000019,
     "qps": 510.4645227156713,
     "std_dev": 0.6143244085137575,
     "millis_per_hit": 0.0009694501018329939,
     "percentile_10": 1,
     "percentile_25": 2,
     "percentile_50": 2,
     "percentile_75": 2,
     "percentile_90": 3,
     "percentile_99": 4
    }
   }
  },
  "post_filter": {
   "summary": {
    "nodes": [
     "Free Spirit"
    ],
    "total_iterations": 5,
    "completed_iterations": 5,
    "total_queries": 5000,
    "concurrency": 5,
    "multiplier": 1000,
    "avg_warmup_time": 74,
    "statistics": {
     "min": 66,
     "max": 217,
     "mean": 120.88000000000022,
     "qps": 8.272667107875579,
     "std_dev": 18.487886855778815,
     "millis_per_hit": 0.05085254582484725,
     "percentile_10": 98,
     "percentile_25": 109.26595744680851,
     "percentile_50": 120.32258064516128,
     "percentile_75": 131.3181818181818,
     "percentile_90": 143,
     "percentile_99": 171.01000000000022
    }
   }
  }
 }
}

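As you can see, Elasticsearch returned statistics for each competitor defined in the competitors property of the request. Since the requests property of a competitor is an array, a single competitor can group several queries that are measured together. In the example response the filtered query is clearly faster: its mean is about 1.96 compared to about 120.88 for post_filter, and it reaches about 510 queries per second compared to about 8.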
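With more competitors, reading the raw response by eye gets tedious, so the comparison can be scripted. The following is a minimal Python sketch, assuming the response has the shape shown above; rank_competitors is a hypothetical helper, and the embedded JSON is a trimmed copy of the example response that keeps only the fields used here.

```python
import json

# Trimmed copy of the example response: only the fields used below are kept.
response_text = """
{
 "status": "COMPLETE",
 "errors": [],
 "competitors": {
  "filtered": {
   "summary": {
    "statistics": {
     "mean": 1.9590000000000019,
     "qps": 510.4645227156713,
     "percentile_99": 4
    }
   }
  },
  "post_filter": {
   "summary": {
    "statistics": {
     "mean": 120.88000000000022,
     "qps": 8.272667107875579,
     "percentile_99": 171.01000000000022
    }
   }
  }
 }
}
"""


def rank_competitors(response):
    """Return (name, statistics) pairs sorted from lowest to highest mean."""
    stats = {
        name: competitor["summary"]["statistics"]
        for name, competitor in response["competitors"].items()
    }
    return sorted(stats.items(), key=lambda item: item[1]["mean"])


response = json.loads(response_text)
ranking = rank_competitors(response)
fastest_name, fastest_stats = ranking[0]

for name, stats in ranking:
    # Express each mean as a multiple of the fastest competitor's mean.
    ratio = stats["mean"] / fastest_stats["mean"]
    print(
        f"{name}: mean={stats['mean']:.2f} qps={stats['qps']:.1f} "
        f"p99={stats['percentile_99']:.2f} ({ratio:.1f}x the fastest mean)"
    )
```

Sorting by mean is just one choice; depending on the use case, ranking by percentile_99 or qps may say more about how a query behaves under load.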

Summary

Of course this blog post is only a quick look at the upcoming Elasticsearch Benchmarking API, but it shows the potential of the new API. Once the API is finalized and released, we will think about writing a more comprehensive example of how to use it, what parameters are available and what we can expect from it.
