ElasticSearch 0.90 – Sorting on Multi-Valued Fields

With the release of ElasticSearch 0.90, users of this search engine will be able to sort on multi-valued fields and choose which value from the field is used for sorting. That’s right: until now you had to design a field especially for sorting, use scripts, or use multi-fields. Now you don’t have to do anything like that. Let’s look at how we can use this new functionality.

Let’s Start

Sorting on multi-valued fields was not a good idea until now. ElasticSearch didn’t know which of the values present in the index to choose or how to order the resulting documents, which could result in an error like the one shown in the section of the book dedicated to sorting. There were some ways of handling multiple values in a field: using a multi field with one of its fields set to not_analyzed, using scripts and losing performance, or creating separate fields that were used only for sorting. With the release of ElasticSearch 0.90.0.Beta1 we can add the sort_mode property to the sort section of our query to tell ElasticSearch which value in the field should be taken into consideration when sorting our search results. Let’s see how it works.

Sample Mappings

The mappings we will use in the example are as follows:

{
 "mappings" : {
  "book" : {
   "properties" : {                
    "id" : { "type" : "long", "store" : "yes" },
    "title" : { "type" : "string", "store" : "yes", "index" : "analyzed" },
    "price" : { "type" : "float", "store" : "yes" }             
   }
  }
 }
}

Those mappings will be stored in the book.json file and we will use the following command to create the books index:

$ curl -XPOST 'localhost:9200/books' -d @book.json

Data Used in the Example

Our example data will be stored in the data.json file and will look like this:

{ "index": {"_index": "books", "_type": "book", "_id": "1"}}
{ "title": "All Quiet on the Western Front", "price": [29.99, 39.99]}
{ "index": {"_index": "books", "_type": "book", "_id": "2"}}
{ "title": "Catch-22", "price": [19.99, 21.99]}
{ "index": {"_index": "books", "_type": "book", "_id": "3"}}
{ "title": "The Complete Sherlock Holmes", "price": [19.99, 29.99]}
{ "index": {"_index": "books", "_type": "book", "_id": "4"}}
{ "title": "Crime and Punishment", "price": [19.99, 49.99]}

In order to index the above batch file to ElasticSearch we will use the following command:

$ curl -s -XPOST 'localhost:9200/_bulk' --data-binary @data.json
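
The bulk file alternates two kinds of lines: an action line that says where the document goes, followed by the document itself, each on its own line. If you would rather generate such a file than type it by hand, a minimal Python sketch could look like the one below. The `bulk_body` function and the `books` list are names made up for this example; the titles and prices are the ones shown above.

```python
import json

# Sample documents from the post: (_id, title, prices).
books = [
    ("1", "All Quiet on the Western Front", [29.99, 39.99]),
    ("2", "Catch-22", [19.99, 21.99]),
    ("3", "The Complete Sherlock Holmes", [19.99, 29.99]),
    ("4", "Crime and Punishment", [19.99, 49.99]),
]


def bulk_body(docs, index="books", doc_type="book"):
    """Build a bulk request body: one action line, then one source line, per document."""
    lines = []
    for doc_id, title, prices in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        source = {"title": title, "price": prices}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # Every line of a bulk body, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(books)
print(body, end="")
```

Writing `body` to data.json gives a file that can be sent with the curl command above.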

Sorting in ElasticSearch 0.90.0.Beta1

Now, let’s sort our documents on the multi-valued price field using the new sort_mode property. The query looks like this:

{
 "query" : {
  "match_all" : {}
 },
 "sort" : [
  { 
   "price" : {
    "order" : "asc", 
    "sort_mode" : "max"
   }
  }
 ]
}

The query asks for all the documents and sorts them on the basis of the maximum value that is present in the price field. The response returned by ElasticSearch is as follows:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : null,
    "hits" : [ {
      "_index" : "books",
      "_type" : "book",
      "_id" : "2",
      "_score" : null, "_source" : { "title": "Catch-22", "price": [19.99, 21.99]},
      "sort" : [ 21.99 ]
    }, {
      "_index" : "books",
      "_type" : "book",
      "_id" : "3",
      "_score" : null, "_source" : { "title": "The Complete Sherlock Holmes", "price": [19.99, 29.99]},
      "sort" : [ 29.99 ]
    }, {
      "_index" : "books",
      "_type" : "book",
      "_id" : "1",
      "_score" : null, "_source" : { "title": "All Quiet on the Western Front", "price": [29.99, 39.99]},
      "sort" : [ 39.99 ]
    }, {
      "_index" : "books",
      "_type" : "book",
      "_id" : "4",
      "_score" : null, "_source" : { "title": "Crime and Punishment", "price": [19.99, 49.99]},
      "sort" : [ 49.99 ]
    } ]
  }
}

As you can see, the documents were sorted without any error and in the desired way: in ascending order of the highest price of each book.
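
To see why the hits come back in the order 2, 3, 1, 4, the selection can be mimicked outside ElasticSearch. The following Python sketch only illustrates the idea and says nothing about how ElasticSearch implements sorting internally. The `books` dictionary and the `sort_by_max` helper are names invented for this example; the prices come from data.json.

```python
# Illustration only: mimics "order": "asc" with "sort_mode": "max"
# on the sample data. This is not ElasticSearch code.

# Prices from data.json, keyed by document _id.
books = {
    "1": [29.99, 39.99],
    "2": [19.99, 21.99],
    "3": [19.99, 29.99],
    "4": [19.99, 49.99],
}


def sort_by_max(docs):
    """Return (doc_id, sort_value) pairs ordered by each document's highest value, ascending."""
    keyed = [(doc_id, max(prices)) for doc_id, prices in docs.items()]
    return sorted(keyed, key=lambda pair: pair[1])


for doc_id, sort_value in sort_by_max(books):
    print(doc_id, sort_value)
```

The printed ids and values line up with the _id and sort entries in the response above.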

Sort Mode Options

Before I call this post done, I would like to write a few words about the new sort_mode parameter. It tells ElasticSearch which value of a multi-valued field should be used for sorting. We have the following options:

  • min – the lowest value present in the field will be used for sorting,
  • max – the highest value present in the field will be used for sorting,
  • avg – the average of all values present in the field will be used for sorting,
  • sum – the sum of all the values present in the field will be used for sorting.

Now, the nice thing about sort_mode is that it has a sensible default. When you sort on a multi-valued field without specifying it, ElasticSearch automatically uses the min or the max value, depending on the sort order: with desc order the max value is used by default, and with asc order the min value is used.
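
The four modes and the order-dependent default can be summarised in a few lines of Python. This is a sketch of the behaviour described in this section, not ElasticSearch's implementation; `MODES` and `sort_value` are hypothetical names used only here.

```python
# Illustration only: how each sort_mode reduces a multi-valued field
# to the single value used for sorting. Not ElasticSearch code.

MODES = {
    "min": min,
    "max": max,
    "avg": lambda values: sum(values) / len(values),
    "sum": sum,
}


def sort_value(values, order="asc", sort_mode=None):
    """Pick the value a document is sorted by.

    When sort_mode is omitted, the default from the post applies:
    min for ascending order, max for descending order.
    """
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    if sort_mode is None:
        sort_mode = "min" if order == "asc" else "max"
    return MODES[sort_mode](values)


prices = [19.99, 49.99]  # "Crime and Punishment" from data.json

for mode in ("min", "max", "avg", "sum"):
    # Rounded for display because avg and sum are floating-point results.
    print(mode, round(sort_value(prices, sort_mode=mode), 2))

print("asc default:", sort_value(prices, order="asc"))
print("desc default:", sort_value(prices, order="desc"))
```

Note that with the default mode and asc order, three of the four sample books share the same sort value (19.99), so their relative order is not determined by the price alone.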

2 thoughts on “ElasticSearch 0.90 – Sorting on Multi-Valued Fields”

  1. Ramesh says:

    Use “mode” instead of “sort_mode”.

    "sort" : [
     {
      "price" : {
       "order" : "asc",
       "mode" : "max"
      }
     }
    ]
