ElasticSearch 0.90 – Search Shards API

Send to Kindle

searchElasticSearch is designed to work with indices that are built of multiple shards and replicas and you probably have such indices in your cluster. Sometimes it may be handy to see which shard will the query be exectued at. Before ElasticSearch 0.90 you could run a query and check the stats to see that, but now we can use the Search Shards API. Let’s look on how we can use this API by using a simple example and queries that does and doesn’t use routing.

Simplest cluster

In order to test how the API works let’s create the simplest possible cluster using ElasticSearch 0.90.3 – a single node with a single index on it created using the following command:

curl -XPOST 'localhost:9200/test'

And that’s all, it should be enough to see, what we can expect from Search Shards API. Just as a reminder – the above command will create an index called test, which will be built of 5 primary shards and one replica (default behavior, which means that each shard will have one replica). Of course we have only a single shard, so our replicas won’t be assigned and thus only replicas will be operational.

Search Shards API without routing

As you probably know, when ElasticSearch executes a query, by default it will be executed in two phases: scatter and gathering. During the scatter phase, the query will be forwared to all the relevant shards, and during gathering phase the results are gathered from the shards. How those phases are executed depends on the search type used, but now we will concentrate on the default ElasticSearch behavior. In theory the simplest match_all query should be executed against all shards – so let’s use the Search Shards API (_search_shards endpoint) to check that. We do that by running the following command:

curl -XGET 'localhost:9200/test/_search_shards?pretty' -d '{"query":"match_all":{}}'

The response to the above command would look as follows:

{
  "ok" : true,
  "nodes" : {
    "N0iP_bH3QriX4NpqsqSUAg" : {
      "name" : "Oracle",
      "transport_address" : "inet[/192.168.1.19:9300]"
    }
  },
  "shards" : [ [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "N0iP_bH3QriX4NpqsqSUAg",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "test"
  } ], [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "N0iP_bH3QriX4NpqsqSUAg",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "test"
  } ], [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "N0iP_bH3QriX4NpqsqSUAg",
    "relocating_node" : null,
    "shard" : 4,
    "index" : "test"
  } ], [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "N0iP_bH3QriX4NpqsqSUAg",
    "relocating_node" : null,
    "shard" : 3,
    "index" : "test"
  } ], [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "N0iP_bH3QriX4NpqsqSUAg",
    "relocating_node" : null,
    "shard" : 2,
    "index" : "test"
  } ] ]
}

As you can see in the response we’ve got information about our nodes (in our case it is a single node that is present in the cluster) and all the shards that will be used during query execution. We can see shard state information, if it is a primary or not, which node it is allocated to, if the shard is being relocated and to which node, the shard number and the index it belongs to. Just as we would expect, we’ve got all the shards for our index returned, which means that the query will be executed against the mentioned shards.

Search Shards API with routing

And now let’s check what the Search Shards API response will look like, when we add routing. In theory, our query should only hit a single shard. In order to check that let’s use the following command:

curl -XGET 'localhost:9200/test/_search_shards?pretty&routing=a' -d '{"query":"match_all":{}}'

The response to the above command would look as follows:

{
  "ok" : true,
  "nodes" : {
    "N0iP_bH3QriX4NpqsqSUAg" : {
      "name" : "Oracle",
      "transport_address" : "inet[/192.168.1.19:9300]"
    }
  },
  "shards" : [ [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "N0iP_bH3QriX4NpqsqSUAg",
    "relocating_node" : null,
    "shard" : 0,
    "index" : "test"
  } ] ]
}

You can see, that this time the query will be only executed against a single shard of the test index – shard with the number of 0. If we would change the routing parameter value to b instead of a, the response would be as follows:

{
  "ok" : true,
  "nodes" : {
    "N0iP_bH3QriX4NpqsqSUAg" : {
      "name" : "Oracle",
      "transport_address" : "inet[/192.168.1.19:9300]"
    }
  },
  "shards" : [ [ {
    "state" : "STARTED",
    "primary" : true,
    "node" : "N0iP_bH3QriX4NpqsqSUAg",
    "relocating_node" : null,
    "shard" : 1,
    "index" : "test"
  } ] ]
}

You can see that we are hitting a different shard now – the one with the number of 1.

Summary

As you can see, the described API can be useful when debugging queries problems or checking which shards and nodes will be used by our queries. This can be very useful, for example when diagnosing performance problems.

Tagged , , ,

Leave a Reply