Precision and recall are the first things that come to mind when we talk about information retrieval.
Whenever we develop an IR engine or tune an existing one, we want to know how good our search results are, or how much they have improved. This is where precision and recall come into play.
Whenever we query the IR system, it retrieves x documents from a corpus of z documents, of which y are relevant to the query. Of these x retrieved documents, some number a will be relevant.
Precision can be defined as a/x and recall as a/y.
Hence, we can define precision as the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved.
For example, suppose the index has 20 documents about music and 10 about movies. A query for some music returns 10 documents, of which 5 are music and 5 are movies. Hence, the precision is 5/10 = 1/2, i.e. 50%, and the recall is 5/20 = 1/4, i.e. 25%, for the query.
In a nutshell, we can say that precision is a measure of quality while recall is a measure of quantity. So, high recall means that an algorithm returned most of the relevant results and high precision means that an algorithm returned more relevant results than irrelevant.
Precision = |relevant ∩ retrieved| / |retrieved|
Recall = |relevant ∩ retrieved| / |relevant|
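To make the two formulas concrete, here is a minimal Python sketch that treats the retrieved and relevant documents as sets and reproduces the music/movies example. The function name `precision_recall` and the document ids are made up for illustration, and returning 0.0 for an empty set is an assumption of this sketch, not a standard convention.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    # Relevant documents that were actually retrieved.
    hits = retrieved & relevant
    # Assumption: report 0.0 instead of dividing by zero on an empty set.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# The index holds 20 music documents and 10 movie documents (hypothetical ids).
music = {f"music-{i}" for i in range(20)}
movies = {f"movie-{i}" for i in range(10)}

# The query returns 10 documents: 5 music and 5 movies.
retrieved = {f"music-{i}" for i in range(5)} | {f"movie-{i}" for i in range(5)}

precision, recall = precision_recall(retrieved, relevant=music)
print(precision, recall)  # 0.5 0.25
```

Using sets keeps the code close to the formulas: the intersection is the numerator in both cases, and only the denominator changes.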
In the next post, we will dive deeper into the concept.
In this post, we are not trying to compare Solr and Lucene, as they are not separate technologies; rather, we are trying to identify when to use which. I would recommend Solr in 90% of cases, or even more, as it is nothing but a server built around Lucene. Below is the list of additional features that Solr provides on top of Lucene:
- Processing requests over HTTP
- Caching mechanism
- Admin interface
- Configuration in XML files, with the notion of fieldType
- DisMax query
- Spell check & suggest
- More like this
- Distributed & cloud features
- DataImportHandler & other handlers for extracting data
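
As a rough illustration of the first and fifth items, the sketch below builds the URL for a DisMax query against Solr's HTTP `select` handler. The host, the core name `music`, the field names `title` and `artist`, and the helper `build_dismax_url` are all assumptions made for this example; only the URL is built, and no request is sent.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, core, user_query, query_fields, rows=10):
    """Build a Solr select URL that uses the DisMax query parser."""
    params = {
        "q": user_query,                  # the raw user query
        "defType": "dismax",              # ask Solr to use the DisMax parser
        "qf": " ".join(query_fields),     # fields the query is matched against
        "rows": rows,                     # number of results to return
        "wt": "json",                     # response format
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance, core, and fields.
url = build_dismax_url(
    "http://localhost:8983/solr", "music", "jazz piano", ["title", "artist"]
)
print(url)
```

With plain Lucene, the equivalent search would be written in Java against the library directly; with Solr, any client that can issue an HTTP request can run it.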