ElasticSearch 0.90 – Using suggester

When we wrote the ElasticSearch Server book, the latest available version of this search server was 0.20 (to be honest, it was 0.19 and we used an unreleased version). A few days after the book was published, version 0.90.0.Beta1 became available with cool new features. One of the most important additions was the suggestion query API. Let’s look at how it works.

What is a suggester?

A suggester is often called “Did you mean” functionality: the system tries to show alternatives to the user’s input, correcting errors and typos.

Example Data

As example data we will use the following information about classic books:

{"index": {"_index": "library", "_type": "book", "_id": "1"}}
{ "title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true, "section" : 3}
{ "index": {"_index": "library", "_type": "book", "_id": "2"}}
{ "title": "Catch-22","author": "Joseph Heller","year": 1961,"characters": ["John Yossarian", "Captain Aardvark", "Chaplain Tappman", "Colonel Cathcart", "Doctor Daneeka"],"tags": ["novel"],"copies": 6, "available" : false, "section" : 1}
{ "index": {"_index": "library", "_type": "book", "_id": "3"}}
{ "title": "The Complete Sherlock Holmes","author": "Arthur Conan Doyle","year": 1936,"characters": ["Sherlock Holmes","Dr. Watson", "G. Lestrade"],"tags": [],"copies": 0, "available" : false, "section" : 12}
{ "index": {"_index": "library", "_type": "book", "_id": "4"}}
{ "title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true}

We put this data in the documents.json file and index it into ElasticSearch using the following command:

$ curl -XPUT localhost:9200/_bulk --data-binary @documents.json
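
If the book records live in a program rather than in a hand-written file, the bulk file can be generated. The following is a minimal Python sketch, not part of the original setup: the helper name to_bulk and the trimmed sample records are assumptions made for illustration. It produces the same layout as the file above, one action line followed by one source line per document.

```python
import json

# Two sample records shaped like the article's data (fields trimmed for brevity).
books = [
    {"_id": "1", "title": "All Quiet on the Western Front",
     "author": "Erich Maria Remarque", "year": 1929},
    {"_id": "2", "title": "Catch-22", "author": "Joseph Heller", "year": 1961},
]


def to_bulk(docs, index="library", doc_type="book"):
    """Serialize docs as bulk lines: one action line, then one source line."""
    lines = []
    for doc in docs:
        # The id goes into the action line, not into the document source.
        source = {key: value for key, value in doc.items() if key != "_id"}
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source, ensure_ascii=False))
    # Every line of a bulk body, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


payload = to_bulk(books)
print(payload, end="")
```

Writing payload to documents.json with UTF-8 encoding gives a file that can be sent with the curl command above.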

Query

This is the simplest query – it fetches all of the documents from the index:

{
 "query" : {
  "match_all" : {}
 }
}

If we store it in the query.json file, we can send it to ElasticSearch using the following command:

$ curl 'localhost:9200/library/book/_search?pretty' -d @query.json

Structure of the suggest query

Note: This part describes the suggester as available in the development version of 0.90Beta2. The difference from the released 0.90Beta1 version is the naming of this suggester: if you try this example with 0.90Beta1, just change the “term” suggestion type to “fuzzy”.

Let’s modify our previous query so that it also includes information generated by the suggester. The modified query could look like the following one:

{
  "query" : {
    "match_all" : {}
  },
  "suggest" : {
    "check1" : {
      "text" : "crume",
      "term" : {
        "field" : "title"
      }
    }
  }
}

The query part is not important here. In fact, it has no influence on the suggest results, so it can be omitted. If we don’t need the query part, it is better to set the search type to “count”, which tells ElasticSearch not to prepare and process query results and can save us some query execution time. To send our modified query with the count search type, we would run a command like this:

$ curl 'localhost:9200/library/book/_search?pretty&search_type=count' -d @query.json
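
The same request can be prepared from a script. Below is a minimal Python sketch using only the standard library. The function name build_suggest_request and the base_url default are assumptions for illustration. The body leaves out the query part entirely, which the text above says is allowed. The request is only built here; the sending step is left commented out because it needs a running server.

```python
import json
import urllib.request


def build_suggest_request(text, field, base_url="http://localhost:9200"):
    """Build (but do not send) a suggest-only request with search_type=count."""
    body = {"suggest": {"check1": {"text": text, "term": {"field": field}}}}
    url = base_url + "/library/book/_search?pretty&search_type=count"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # curl -d also sends the body with POST
    )


request = build_suggest_request("crume", "title")
print(request.full_url)
print(request.data.decode("utf-8"))

# To actually send it, with a server listening on localhost:9200:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```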

The suggest part of the query can contain any number of suggestion requests. In our example there is only one: “check1”. Each of them contains the text we would like to correct and the configuration of the suggester (we will talk about it in just a second). We can also define a default text directly inside the “suggest” section, which lets us avoid repeating it in every suggestion request. For example:

  "suggest" : {
    "text" : "crume",
    "check1" : {      
      "term" : {
        "field" : "title"
      }
    }
  }
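
The shared text becomes more useful once there are several named suggestions. Here is a minimal Python sketch of building such a section. The helper suggest_section and the second suggestion “check2” on the author field are illustrative assumptions, not part of the example above.

```python
import json


def suggest_section(text, fields_by_name):
    """Build a "suggest" section: one shared text, one term suggester per name."""
    section = {"text": text}
    for name, field in fields_by_name.items():
        # No per-suggestion "text": each entry falls back to the shared one.
        section[name] = {"term": {"field": field}}
    return section


# "check2" and the choice of the "author" field are hypothetical.
body = {"suggest": suggest_section("crume", {"check1": "title", "check2": "author"})}
print(json.dumps(body, indent=2))
```

Because the default text sits next to the named suggestions, a suggestion in this sketch must not itself be named “text”.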

If we send the query with the single “check1” suggestion to the server, the reply contains an additional section:

"suggest" : {
  "check1" : [ {
    "text" : "crume",
    "offset" : 0,
    "length" : 5,
    "options" : [ {
      "text" : "crime",
      "score" : 0.8,
      "freq" : 1 
     } ] 
   } ] 
 }

As we can see, ElasticSearch tried to correct the word “crume”. For our data, the proposed correction is “crime”, which is exactly what we wanted.
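
An application usually wants only the best correction for each word. The sketch below is a minimal Python example of reading the reply shown above. The function name best_corrections is an assumption, and the reply is pasted in as a string (wrapped in braces to make it valid JSON) rather than fetched from a server.

```python
import json

# The "suggest" part of the reply shown above, wrapped in braces to form valid JSON.
reply = json.loads("""
{
  "suggest": {
    "check1": [ {
      "text": "crume", "offset": 0, "length": 5,
      "options": [ { "text": "crime", "score": 0.8, "freq": 1 } ]
    } ]
  }
}
""")


def best_corrections(reply, name):
    """Map each input word to its highest-scoring option; skip words with none."""
    corrections = {}
    for entry in reply["suggest"][name]:
        if entry["options"]:
            best = max(entry["options"], key=lambda option: option["score"])
            corrections[entry["text"]] = best["text"]
    return corrections


print(best_corrections(reply, "check1"))  # {'crume': 'crime'}
```

An entry with an empty options list is skipped, so a word with no proposed correction does not appear in the result.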

Now let’s look more closely at the configuration options of the suggester API.

Term Suggester

The term suggester fetches suggestions for a given word. The suggestions are based on edit distance, so it is well suited to catching user typos. The configuration of this type of suggester includes:

  • field – the field from the index that should be scanned for possible suggestions
  • size – the maximum number of suggestions for a given word
  • sort – how to sort multiple suggestions for a given word:
    • score (default) – sorting is done by the score of the given term
    • frequency – sorting is done by the frequency of the term in the field
  • suggest_mode – controls when suggestions are returned. By default, ElasticSearch returns suggestions only if the word from the text field does not exist in the index. Thanks to this, we can suggest new words to the user only when the input is probably incorrect. The suggest_mode parameter accepts the following values:
    • missing – the default behavior described above
    • popular – a suggestion is returned only when the suggested word is more popular than the word entered by the user (that is, the suggested word is more frequent in the index)
    • always – ElasticSearch always returns suggestions, provided that at least one suggestion was found
  • max_edits – (default: 2) the maximum number of edits by which a suggestion may differ from the given word. Note that only the values 1 and 2 are supported right now.
  • min_prefix – (default: 1) the number of leading characters that must be the same in the input and in the corrected form of the word. Raising this value can improve performance; it is based on the assumption that typos rarely occur at the beginning of a word.
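
The options above can be combined in a single suggestion request. The following is a minimal Python sketch that builds one and checks the values against the list above. The helper term_suggester and the chosen values are assumptions for illustration, and the option names are used exactly as listed here for 0.90Beta2, so other versions may name them differently.

```python
import json

ALLOWED_SORT = {"score", "frequency"}
ALLOWED_MODES = {"missing", "popular", "always"}


def term_suggester(field, size=None, sort=None, suggest_mode=None,
                   max_edits=None, min_prefix=None):
    """Build a "term" suggester config, checking values against the list above."""
    if sort is not None and sort not in ALLOWED_SORT:
        raise ValueError("sort must be one of %s" % sorted(ALLOWED_SORT))
    if suggest_mode is not None and suggest_mode not in ALLOWED_MODES:
        raise ValueError("suggest_mode must be one of %s" % sorted(ALLOWED_MODES))
    if max_edits is not None and max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    options = {
        "field": field,
        "size": size,
        "sort": sort,
        "suggest_mode": suggest_mode,
        "max_edits": max_edits,
        "min_prefix": min_prefix,
    }
    # Leave out anything not set, so the server-side defaults apply.
    return {key: value for key, value in options.items() if value is not None}


body = {
    "suggest": {
        "check1": {
            "text": "crume",
            "term": term_suggester("title", size=3, sort="frequency",
                                   suggest_mode="always", max_edits=1),
        }
    }
}
print(json.dumps(body, indent=2))
```

Checking the values on the client side is a design choice of this sketch; it catches a mistyped mode or an unsupported max_edits before the request is sent.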

0.90Beta2 brings a second suggester type: the phrase suggester. But that is a topic for the next note.

5 thoughts on “ElasticSearch 0.90 – Using suggester”

  1. Petar says:

    What about the Java API for this suggester?

  2. Bax says:

    TermSuggestionBuilder termSuggestionBuilder = new TermSuggestionBuilder("name");
    termSuggestionBuilder.text("someTerm");
    termSuggestionBuilder.field("field name");
    TermSuggestion termSuggestion = node.client().prepareSuggest().addSuggestion(termSuggestionBuilder).execute().actionGet().getSuggest().getSuggestion("name");

  3. Rafał Kuć says:

    Thanks a lot for the example, Bax. Petar, sorry for not responding; your comment got lost in the TODO list :(

  4. Bax says:

    What’s the reason for specifying only the ‘field’ option ?
    A field can exist only in the context of a type, so one may want to specify neither or both (the type and the field).
