Mastering ElasticSearch – Search analytics

The Mastering ElasticSearch book, published in December 2013, included a chapter dedicated to improving the user search experience. However, one of the topics didn’t make it into the book, so we wanted to share that section here on the blog. We hope that you’ll find it useful.

Introduction to search analytics

When tuning query relevance, you should watch how users react to the changes you make to your queries. There are a few general things to pay attention to, and we will try to mention the most important ones, at least from our point of view. No matter whether you use dedicated search analytics software, a custom in-house tool, or inspect what your users’ queries return by hand (which can actually be hard), remember to introduce a single change at a time and see how your users’ behaviour changes. This lets you keep the changes that work and discard the ones that go in the wrong direction. Basically, search analytics helps you understand your users: your point of view can be completely different from theirs, and understanding user needs is one of the keys to success.

Things to look at

In general, there are a few things to look at when judging how well your queries represent what the user is actually looking for and how relevant the returned results are, at least from the user’s point of view. Those things are:

Click through rate

Click through rate, also known as CTR, is a way to measure the initial response to your queries (the term comes mostly from advertising). By looking at the query and at which document the user actually chose, you can see how well the query did its job in terms of relevance and serving the content that users are looking for. A low CTR for a query usually means poor relevance. You can read what CTR is and how it is calculated in the domain of advertising on Wikipedia: http://en.wikipedia.org/wiki/Click-through_rate.
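
To make this concrete, here is a minimal sketch of a per-query CTR calculation in Python. It assumes you already log one record per search together with a flag saying whether the user clicked any result; the log format and the function name are our own illustration, not part of any analytics product.

```python
from collections import defaultdict


def click_through_rate(search_events):
    """Return {query: CTR}, where CTR = searches with a click / all searches."""
    searches = defaultdict(int)
    clicked = defaultdict(int)
    for query, was_clicked in search_events:
        searches[query] += 1
        if was_clicked:
            clicked[query] += 1
    return {query: clicked[query] / searches[query] for query in searches}


# Hypothetical log entries: (query, did the user click any result?)
events = [
    ("harry potter", True),
    ("harry potter", True),
    ("harry potter", False),
    ("dan brown", False),
]
ctr = click_through_rate(events)
```

Queries with a CTR close to zero, like dan brown in this made-up log, are good candidates for a closer look at their relevance.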

Paging

How many pages your users have to go through, on average, to get to the document they were searching for. We discuss it in greater detail in the The less paging the better (in most cases) section of this chapter.
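
As a rough sketch, the average paging depth per query could be computed like this in Python. We assume a log of the deepest results page each user reached for a query (pages counted from 1); both the log format and the function name are hypothetical.

```python
from statistics import mean


def average_page_depth(page_views):
    """Return {query: average deepest page reached} from (query, page) pairs."""
    pages_by_query = {}
    for query, deepest_page in page_views:
        pages_by_query.setdefault(query, []).append(deepest_page)
    return {query: mean(pages) for query, pages in pages_by_query.items()}


# Hypothetical log entries: (query, deepest results page the user opened)
views = [("harry potter", 1), ("harry potter", 3), ("cookbook", 2)]
depth = average_page_depth(views)
```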

Query latency

Maximum, minimum and average query latency. Although latency is a performance factor, the average and maximum query latency show you how long a typical user needs to wait for the search results to be displayed. If your users have to wait too long for their results, they may just cancel the request and go somewhere else.
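
A small Python sketch of these three numbers, assuming you collect query latencies in milliseconds (the sample values below are made up):

```python
from statistics import mean


def latency_summary(latencies_ms):
    """Return the minimum, maximum and average of a list of latencies."""
    return {
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "avg": mean(latencies_ms),
    }


# Hypothetical latencies of four queries, in milliseconds
summary = latency_summary([12, 40, 35, 913])
```

Looking at the maximum next to the average is useful because a single slow query, like the last one in this sample, can hide behind an average that still looks acceptable.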

Click position and MRR

The click position shows us which document the user clicked on in the search results for a query. The higher the clicked document was on the results list, the better our query is. In an ideal world we would love our users to get the desired document as the first one on the results list. However, in most cases this is not possible for various reasons, such as queries that are too broad, queries that are not perfectly relevant, users searching for more than a single document, and so on. This is exactly what Mean Reciprocal Rank (MRR) can help you with: it is a statistic used to evaluate the relevance of your queries. In the ideal world we’ve mentioned, the MRR would equal 1 for all your users’ queries (which would mean that the document the user was searching for was returned first on the results list for every query). You can see how it is calculated by looking at the following Wikipedia article: http://en.wikipedia.org/wiki/Mean_reciprocal_rank. Some of the analytics software products mentioned at the end of the chapter support calculating MRR for your queries.
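
The calculation itself is short. Below is a minimal Python sketch that takes, for each query, the 1-based position of the document the user clicked and averages the reciprocals. Treating a query without any click as contributing 0 is our assumption here; it is a common convention, but your analytics tool may handle such queries differently.

```python
def mean_reciprocal_rank(click_positions):
    """Average of 1/position over all queries.

    click_positions holds the 1-based rank of the clicked document for each
    query, or None when the user did not click anything (counted as 0).
    """
    if not click_positions:
        return 0.0
    total = sum(1.0 / pos for pos in click_positions if pos is not None)
    return total / len(click_positions)


# Hypothetical click positions for four queries; the last one had no click
mrr = mean_reciprocal_rank([1, 3, 2, None])
```

For these four queries the reciprocal ranks are 1, 1/3, 1/2 and 0, so the MRR is their sum divided by 4.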

Top documents

The documents that users click on most frequently. Knowing which documents are popular can help you understand what your users are looking for and adjust your application or business to user needs. Seeing which documents are the most popular can also help you adjust query relevance by putting those results first.
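
If you log the identifier of every clicked document, a top documents list is a simple counting exercise. Here is a sketch in Python; the document identifiers are made up for illustration.

```python
from collections import Counter


def top_documents(clicked_doc_ids, n=3):
    """Return the n most frequently clicked documents with their click counts."""
    return Counter(clicked_doc_ids).most_common(n)


# Hypothetical click log: one document identifier per click
clicks = ["book-7", "book-2", "book-7", "book-9", "book-7", "book-2"]
top = top_documents(clicks, n=2)
```

The same counting approach works for a log of query strings if you want to see which queries are run most often.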

Top queries

The most frequent queries that users run. Knowing this can be valid when thinking about what results should be

Number of returned results

You should monitor the number of results returned by your queries in order to avoid empty pages. The reason for this is described in the Avoiding empty pages section of this chapter.
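
One simple way to watch for this is to track the share of queries that returned nothing, together with the queries themselves. The following Python sketch assumes a log of (query, number of hits) pairs; the format and the sample data are hypothetical.

```python
def zero_result_queries(result_counts):
    """Return (share of queries with no hits, list of those queries)."""
    empty = [query for query, hits in result_counts if hits == 0]
    rate = len(empty) / len(result_counts) if result_counts else 0.0
    return rate, empty


# Hypothetical log entries: (query, number of hits returned)
log = [("harry potter", 42), ("dan brwon", 0), ("inferno", 5), ("xyzzy", 0)]
rate, empty = zero_result_queries(log)
```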

Small summary

The statistics mentioned above are important, but please remember that they are not the only numbers that can be gathered. If you look at any of the analytics software mentioned at the end of the chapter, you’ll see a dozen different statistics, charts and analysis possibilities you can use to better understand your users.
Please remember that search analytics is a very broad topic and it is definitely out of the scope of the book. However, when talking about query relevance tuning it is crucial to discuss search analytics, even if only briefly.

Track your users

One of the general rules is to look at how your users behave in your application. By doing that you will be able to identify problems with your search or with the interface in general. You are the one who knows how your business works, what you sell and what your users are looking for. It all depends on what your application does. If you run an e-commerce site, you want to see what happens after a user enters a query: which document was clicked, whether the item was added to the basket, whether the order was finalized and so on. If you run a site that sells photographs, you can look at the most popular authors, similar tags that users tend to follow and so on. Again, it all depends on your use case. Even though each use case is slightly or totally different, it all comes down to a single point: monitoring what your users do and acting on that data is valuable.
Also remember that search relevance is not all that matters. You should keep the usability of your application in mind. In many cases the problem is not search relevance but the way the application is built: users get lost and can’t find what they are looking for even though the documents are at the top of the results.

Avoiding empty pages

One of the general things you want to avoid is showing the user no search results at all. Think about the following example: you run an e-commerce site where you sell books. Your application logic doesn’t allow showing books that are not currently available, and instead you show an empty page. Now imagine a user who searches for a new Dan Brown book and ends up with no search results at all, because the book is currently not available. What will such a user do? They will go and search for the book on another site. Now imagine that we showed the result anyway, saying that the book is not available at the moment, but that the estimated delivery is about one week and the book can be ordered. Of course not all users would order it, but some would, and you wouldn’t lose a customer.
The example we’ve shown is not the only reason for empty pages. Sometimes you just don’t have the product or data the user is searching for, or maybe the user made a spelling mistake or the query is simply not a good one. Try to help such users and avoid empty pages, for example by using a suggester to correct spelling mistakes. If you care about performance, you can do that only for queries that return a small number of results or no results at all. However, don’t go too far the other way and show just anything only to avoid empty pages; that can also make users turn away from you.
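
The idea of asking for spelling suggestions only when a query returns few or no results can be sketched as follows in Python. The request bodies follow the shape of the Elasticsearch term suggester as we understand it, so please check them against the version you run. The title field, the min_hits threshold and the run_request callable (anything that sends a body to your search endpoint and returns the parsed JSON response) are assumptions made for this illustration.

```python
def build_search_body(text, field="title"):
    """Plain match query on a single field (the field name is an assumption)."""
    return {"query": {"match": {field: text}}}


def build_suggest_body(text, field="title"):
    """Term suggester request; "spelling" is just a label we chose."""
    return {"suggest": {"spelling": {"text": text, "term": {"field": field}}}}


def search_with_fallback(text, run_request, min_hits=3):
    """Run the query and ask for spelling suggestions only if hits are scarce."""
    response = run_request(build_search_body(text))
    hits = response["hits"]["hits"]
    suggestions = []
    if len(hits) < min_hits:
        suggest_response = run_request(build_suggest_body(text))
        for entry in suggest_response["suggest"]["spelling"]:
            suggestions.extend(option["text"] for option in entry["options"])
    return hits, suggestions


def fake_run_request(body):
    """Stand-in for a real HTTP call so that the sketch runs on its own."""
    if "query" in body:
        return {"hits": {"hits": []}}
    return {"suggest": {"spelling": [
        {"text": "brwon", "options": [{"text": "brown"}]},
    ]}}


hits, suggestions = search_with_fallback("brwon", fake_run_request)
```

Because the second request is only sent when the first one comes back nearly empty, queries that already return enough results pay no extra cost.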

The less paging the better (in most cases)

Again, the rule that the less paging is done the better is a very general one. It is not true if your application is mostly about scrolling through results. For example, your use case can be such that users enter a single word into the search box and then narrow down the results using faceting or browse through the pages. However, in most cases it is not a good sign when your users need to page, because it can mean that the query results are not as good as they should be. Again, let’s take a simple example: our hypothetical e-commerce bookstore. If your user runs the query harry potter, you would like to show them the book with that title first, probably along with some others that are currently promoted or frequently searched for by users. Of course you may want to show other books, like the ones with that phrase in the book description or in user comments, but they should come after the initial, most relevant results. If the less relevant results come first, users will have to go through pages of results to find what they are looking for. Now you see what information paging can give you: in most cases it is one of the factors that will help you understand the quality of your queries and how the changes you make affect your users.

Autocomplete will help you

Although it is not directly connected to search analytics, we wanted to mention autocomplete. Look at what your users do, how they behave, what your click through rate is and how deep your users page. If you don’t expect deep paging and your business doesn’t rely on it, yet your users still page, that can mean that your query results are not satisfying them. You improved your queries, but it only helped a bit? Maybe you should introduce autocomplete to your application. With autocomplete you can guide your customers by displaying product suggestions, keyword suggestions, or both at the same time. You can add thumbnails to suggestions to visualise the document you are suggesting. Some use cases report up to a 10% revenue increase after introducing autocomplete functionality to their applications.
However, a thing to remember when implementing autocomplete is not to overdo it. You usually don’t need autocomplete on the body of the document; use its name instead. Highlight the matches so that users can see why a suggestion is shown. If you have products that can be shown, show their images in the autocomplete box.
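
As an illustration of matching on names only and highlighting the matched part, here is a small in-memory sketch in Python. It is not how a search engine implements suggestions; the list of names, the word-start matching rule and the use of <b> tags for highlighting are all assumptions made to keep the example short.

```python
import re


def autocomplete(prefix, names, limit=5):
    """Suggest names that contain a word starting with prefix.

    The matched part is wrapped in <b>...</b> so the user can see why
    the suggestion is shown. Matching ignores case.
    """
    if not prefix:
        return []
    pattern = re.compile(r"\b" + re.escape(prefix), re.IGNORECASE)
    suggestions = []
    for name in names:
        match = pattern.search(name)
        if match:
            start, end = match.span()
            suggestions.append(
                name[:start] + "<b>" + name[start:end] + "</b>" + name[end:]
            )
            if len(suggestions) == limit:
                break
    return suggestions


# Hypothetical product names; only names are searched, never document bodies
names = [
    "Harry Potter and the Chamber of Secrets",
    "The Potter's Wheel",
    "Inferno",
]
suggested = autocomplete("pot", names)
```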

Search analytics software

Finally, at the end of this chapter we wanted to mention some of the software that can help you with search analytics. There are open source tools available as well as closed source and commercial solutions. We would like to briefly show you one from each of the mentioned groups, so you can see where to start if you want to avoid building your own analytics software.

Piwik

Open source search analytics software that you can download and install without having to buy anything. It supports most of the things you’ll want at the start and more. You can see visits over time, observe visitors in real time, see the keywords used for search and much more. It also supports tracking your users’ behaviour through its JavaScript tracking API. For more information please visit http://piwik.org/ and try the demo located there.

Google Analytics

Google Analytics is analytics software for your web application. It can track where your users come from, conversion rates, traffic sources and social media sharing. Of course it also lets you use different reports, add charts to dashboards and so on. To use it, you need to create an account and include a JavaScript tracking code in your web application. For more information please go to http://www.google.com/analytics/.

Site search analytics from Sematext Group Inc.

Search Analytics from Sematext Group Inc. is a cloud-based solution that gives you insight into the statistics that are tightly coupled with search and the search experience. After creating an account and adding a JavaScript tracking code to your application, you can observe the rate and volume of your queries, the click through rate (which documents users click on, on average), paging (how deep your users page in your application), mean reciprocal rank (http://en.wikipedia.org/wiki/Mean_reciprocal_rank) and the average term count of your queries. In addition, Search Analytics supports top documents statistics, top queries, sorting and so on. To use it, you just need to create an account at http://sematext.com/search-analytics/index.html or try the live demo.
