Precisely, what are precision and recall?

Precision and recall are the first things that come to mind when we talk about information retrieval (IR).

Whenever we develop an IR engine or tune an existing one, we want to know how good our search results are, or how much they have improved. This is where precision and recall come into play.

Whenever we query the IR system, it retrieves some number x of the z documents in the corpus. Suppose y of those z documents are actually relevant to the query. Of the x retrieved documents, some number a will be relevant.

Precision is then defined as a/x and recall as a/y.

Hence, we can define precision as the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved.

For example, suppose the index has 20 documents about music and 10 about movies. A query for some music returns 10 documents, of which 5 are music and 5 are movies. Hence, for this query the precision is 5/10 = 1/2, i.e. 50%, and the recall is 5/20 = 1/4, i.e. 25%.
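The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `precision_recall` and its argument names are made up for illustration and simply mirror the a, x and y used above.

```python
from fractions import Fraction


def precision_recall(a, x, y):
    """Return (precision, recall) as exact fractions.

    a: relevant documents that were retrieved
    x: total documents retrieved
    y: total relevant documents in the corpus
    """
    if x == 0 or y == 0:
        # Both ratios are undefined when nothing is retrieved
        # or nothing in the corpus is relevant.
        raise ValueError("x and y must both be greater than zero")
    return Fraction(a, x), Fraction(a, y)


# The music query: 5 relevant hits, 10 results returned, 20 music documents.
precision, recall = precision_recall(a=5, x=10, y=20)
print(precision, recall)  # 1/2 1/4
```

Using `Fraction` keeps the results exact, so they can be compared directly with the 1/2 and 1/4 worked out by hand.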

In a nutshell, we can say that precision is a measure of quality while recall is a measure of quantity. High recall means that an algorithm returned most of the relevant results, and high precision means that most of the results it returned are relevant.

Precision = |relevant ∩ retrieved| / |retrieved|
Recall = |relevant ∩ retrieved| / |relevant|
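These two formulas map almost directly onto set operations. The sketch below assumes documents are identified by hypothetical string IDs such as "music-0". The helper name `precision_recall_from_sets` is also invented for illustration, and returning 0.0 when a set is empty is just one possible convention, not part of the definition.

```python
def precision_recall_from_sets(relevant, retrieved):
    """Compute (precision, recall) from two sets of document IDs."""
    hits = relevant & retrieved  # relevant (intersect) retrieved
    # Assumption: report 0.0 instead of dividing by zero on an empty set.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical IDs: 20 music documents are relevant to the query.
relevant = {f"music-{i}" for i in range(20)}
# The engine returns 5 music documents and 5 movie documents.
retrieved = {f"music-{i}" for i in range(5)} | {f"movie-{i}" for i in range(5)}

precision, recall = precision_recall_from_sets(relevant, retrieved)
print(precision, recall)  # 0.5 0.25
```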

 

In the next post, we will dive deeper into the concept.