ElasticSearch 0.90 – Using Rescore

Sometimes it is handy to change the ordering of documents already returned by a query. The reasons vary. One of them is performance: calculating the target ordering may be very costly, so we would like to do it only on a subset of the documents returned by the original query. At first glance, rescore opens up many opportunities for business cases. This short article checks how useful the function really is.

What is rescore?

Rescore in ElasticSearch is the process of recalculating the score for a defined number of documents returned by the query. This means that ElasticSearch takes the first n documents for a given query and calculates their score again using the provided rescore definition.

Example Data

Our example data is identical to the data used in the article about the suggester:

{"index": {"_index": "library", "_type": "book", "_id": "1"}}
{ "title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true, "section" : 3}
{ "index": {"_index": "library", "_type": "book", "_id": "2"}}
{ "title": "Catch-22","author": "Joseph Heller","year": 1961,"characters": ["John Yossarian", "Captain Aardvark", "Chaplain Tappman", "Colonel Cathcart", "Doctor Daneeka"],"tags": ["novel"],"copies": 6, "available" : false, "section" : 1}
{ "index": {"_index": "library", "_type": "book", "_id": "3"}}
{ "title": "The Complete Sherlock Holmes","author": "Arthur Conan Doyle","year": 1936,"characters": ["Sherlock Holmes","Dr. Watson", "G. Lestrade"],"tags": [],"copies": 0, "available" : false, "section" : 12}
{ "index": {"_index": "library", "_type": "book", "_id": "4"}}
{ "title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true}

Query

Let’s use a simple query that looks like this:

{
  "fields" : ["title", "available"],
  "query" : {
    "match_all" : {}
  }
}

It returns all the documents from the index. Every document returned by the query will have a score equal to 1.0. This is enough to show how rescore affects our result set. One more thing about the query: we specify which fields we want returned for each document, namely title and available.

Structure of the rescore query

The example query with rescore looks like this:

{
  "fields" : ["title", "available"],
  "query" : {
    "match_all" : {}
  },
  "rescore" : {
    "query" : {
      "rescore_query" : {
        "custom_score" : {
          "query" : {
            "match_all" : {}
          },
          "script" : "doc['year'].value"
        }
      }
    }
  }
}

In the above example, the rescore object contains a query object. In this version of ElasticSearch, query is the only option, but future versions may bring other ways to affect the resulting score. In our case we use a simple query that returns all documents and gives every document a score equal to the value of its year field (please, don’t even ask about the business sense of this query ;)).

If we save this query in the query.json file and send it using the following command:

curl 'localhost:9200/library/book/_search?pretty' -d @query.json

we should see the following documents (I omit the structure of the response):

"_score" : 1962.0,
"title" : "Catch-22",
"available" : false
--
"_score" : 1937.0,
"title" : "The Complete Sherlock Holmes",
"available" : false
--
"_score" : 1930.0,
"title" : "All Quiet on the Western Front",
"available" : true
--
"_score" : 1887.0,
"title" : "Crime and Punishment",
"available" : true

As we can see, ElasticSearch found all the documents from the original query. Now look at the scores of the documents. ElasticSearch took the first N documents and applied the second query to them. As a result, the score of each of those documents is the sum of its scores from the first and the second query.
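The sum is easy to check by hand. The short Python sketch below is not ElasticSearch code; it only recomputes the scores listed above from the example data, assuming every document gets 1.0 from match_all and the value of its year field from the rescore query.

```python
# Years copied from the example data.
books = {
    "All Quiet on the Western Front": 1929,
    "Catch-22": 1961,
    "The Complete Sherlock Holmes": 1936,
    "Crime and Punishment": 1886,
}

ORIGINAL_SCORE = 1.0  # match_all scores every document 1.0


def final_score(year):
    # original query score + rescore query score (the year field)
    return ORIGINAL_SCORE + float(year)


# Order the books the way the response does: highest final score first.
ranked = sorted(books.items(), key=lambda item: final_score(item[1]), reverse=True)
for title, year in ranked:
    print(final_score(year), title)
```

The printed order and values should match the response fragment shown earlier, with Catch-22 at 1962.0 on top.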

Now let’s see how to tune this behaviour and what parameters are available.

Rescore parameters

In the query under the rescore object we may use the following parameters:

  • window_size (defaults to the sum of the from and size parameters) – the N mentioned above, that is, the number of documents taken for rescoring on every shard.
  • query_weight (defaults to 1) – the score of the original query is multiplied by this value before the score generated by the rescore query is added.
  • rescore_query_weight (defaults to 1) – the score of the rescore query is multiplied by this value before it is added to the score generated by the original query.
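To see where these parameters go in a request, here is a minimal Python sketch that builds the request body as a dict and prints the JSON you could save as query.json. The values 50, 2.0 and 0.5 are arbitrary illustrations, not recommendations. The placement (window_size directly under rescore, the two weights next to rescore_query) is how we understand the rescore syntax of this ElasticSearch version, so please verify it against the documentation for the version you run.

```python
import json

# Request body as a Python dict; json.dumps turns it into the query.json content.
body = {
    "fields": ["title", "available"],
    "query": {"match_all": {}},
    "rescore": {
        "window_size": 50,  # illustrative: documents rescored on every shard
        "query": {
            "rescore_query": {
                "custom_score": {
                    "query": {"match_all": {}},
                    "script": "doc['year'].value",
                }
            },
            "query_weight": 2.0,          # illustrative: multiplies the original score
            "rescore_query_weight": 0.5,  # illustrative: multiplies the rescore score
        },
    },
}

print(json.dumps(body, indent=2))
```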

To sum up: the target score for the document is equal to:

original_query_score * query_weight + rescore_query_score * rescore_query_weight
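The formula can be written down as a tiny Python function and tried on Catch-22 from our example (original score 1.0, rescore score 1961.0). The function name combined_score and the weights 2.0 and 0.5 are made up for this illustration; nothing here calls ElasticSearch.

```python
def combined_score(original_query_score, rescore_query_score,
                   query_weight=1.0, rescore_query_weight=1.0):
    # Direct transcription of the formula above.
    return (original_query_score * query_weight
            + rescore_query_score * rescore_query_weight)


print(combined_score(1.0, 1961.0))            # default weights -> 1962.0
print(combined_score(1.0, 1961.0, 2.0, 0.5))  # illustrative weights -> 982.5
```

With the default weights we get back the 1962.0 seen in the response; with the illustrative weights the year contributes only half of its value.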

At the end

Sometimes we want to show results in which the ordering of the first documents on the page is affected by additional rules. Unfortunately, this cannot be achieved with the rescore functionality. The first idea points to the window_size parameter, but this parameter is in fact not connected with the first documents on the result list; it sets the number of documents taken for rescoring on every shard. In addition, window_size cannot be less than the page size (if it is, ElasticSearch silently uses the page size). Also, one very important thing: rescoring cannot be combined with sorting, because sorting is done after the changes introduced by rescoring.
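The per-shard nature of window_size is easier to see on a toy model. The Python sketch below is only a simplified simulation under loud assumptions: the split of our four books across two shards is hypothetical, the order of tied documents inside a shard is simply the order in the list, and the page size rule mentioned above is not modelled (think of a request whose page size is 1 as well). It is not how ElasticSearch is implemented.

```python
def rescore_on_shard(hits, window_size):
    """hits: (title, original_score, year) tuples, best original score first."""
    result = []
    for position, (title, score, year) in enumerate(hits):
        if position < window_size:
            score += year  # rescore query from the article: the year field
        result.append((title, score))
    return result


# Hypothetical placement of the example books on two shards.
shard_a = [("All Quiet on the Western Front", 1.0, 1929), ("Catch-22", 1.0, 1961)]
shard_b = [("The Complete Sherlock Holmes", 1.0, 1936), ("Crime and Punishment", 1.0, 1886)]

# Each shard rescores its own first window_size hits, then the results are merged.
merged = rescore_on_shard(shard_a, 1) + rescore_on_shard(shard_b, 1)
merged.sort(key=lambda hit: hit[1], reverse=True)
for title, score in merged:
    print(score, title)
```

In this model, window_size equal to 1 still rescores two documents (one per shard), and Catch-22, which would be first if everything were rescored, stays at 1.0 because it was not inside its shard's window.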

The above-mentioned limitations, together with the lack of support for several different rescorings (for example, one rescore definition for the first three positions on the result list and a second one for the following five), limit the usefulness of this functionality and should be kept in mind when using it.

But remember that this is a very new feature and we will probably see more of it in the future :)
