Precisely, what are precision and recall?

Precision and recall are the first things that come to mind when we talk about information retrieval.

Whenever we develop an IR engine or tune an existing one, we want to know how good our search results are, or how much they have improved. This is where precision and recall come into play.

Whenever we query the IR system, it retrieves x documents from a corpus of z documents in total, of which y are relevant to the query. Out of these x retrieved documents, some number a will be relevant.

Precision can be defined as a/x and recall is a/y.

Hence, we can define precision as the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved.

For example, suppose the index has 20 documents for music and 10 for movies. A query for some music returns 10 documents, of which 5 are music and 5 are movies. Hence, for this query the precision is 5/10 = 1/2, i.e. 50%, and the recall is 5/20 = 1/4, i.e. 25%.

In a nutshell, we can say that precision is a measure of quality while recall is a measure of quantity. So, high recall means that an algorithm returned most of the relevant results and high precision means that an algorithm returned more relevant results than irrelevant.

Precision = relevant (intersect) retrieved / retrieved
Recall = relevant (intersect) retrieved / relevant
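
The two formulas above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `precision_recall` and the document ids are made up for the example, and it reuses the music/movies numbers from earlier (20 relevant music documents, 10 retrieved, 5 of them relevant).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the engine returned
    relevant:  ids of all documents that are relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant (intersect) retrieved

    # Guard against empty sets to avoid dividing by zero.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, mirroring the example in the post.
music = {f"music-{i}" for i in range(20)}  # the 20 relevant documents
retrieved = [f"music-{i}" for i in range(5)] + [f"movie-{i}" for i in range(5)]

precision, recall = precision_recall(retrieved, music)
print(precision, recall)  # 0.5 0.25
```

Using sets keeps the intersection step explicit, so the code reads the same way as the formulas.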

 

In the next blog, we will try to dive deeper into the concept.

 


Solr vs Lucene

In this post, we are not trying to compare Solr and Lucene, since they are not separate technologies; rather, we are trying to identify when to use which. I would recommend Solr in 90% of cases, or even more, as it is essentially Lucene wrapped as a server. Below is the list of additional features that Solr provides on top of Lucene:

  • Processing requests over HTTP
  • Caching mechanism
  • Admin interface
  • Configuration in XML files, with the notion of fieldType
  • DisMax query
  • Spell check & suggest
  • More like this
  • Distributed & cloud features
  • DataImportHandler & other handlers for extracting data
The features above make Solr the preferred choice. So when should you use Lucene? In most cases you would not. But if available memory is limited, as on mobile devices, or you need to write a lot of low-level code to tune or add your own logic, Lucene would be your choice.
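
To give a feel for the "requests over HTTP" and DisMax features, here is a rough Python sketch that only builds a Solr select URL and does not send it. The helper name `build_select_url`, the core name `music`, and the field names are assumptions made up for this example; the host and port assume a default local install.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, text, fields):
    """Build a DisMax query URL for Solr's /select handler (illustrative only)."""
    params = {
        "q": text,                # the user's search text
        "defType": "dismax",      # use the DisMax query parser
        "qf": " ".join(fields),   # fields to search in
        "wt": "json",             # ask for a JSON response
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical core and fields; adjust to your own schema.
url = build_select_url("http://localhost:8983", "music", "jazz piano", ["title", "artist"])
print(url)
# http://localhost:8983/solr/music/select?q=jazz+piano&defType=dismax&qf=title+artist&wt=json
```

With plain Lucene there is no such endpoint; you would call the library directly from your own code.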