Stemming vs Lemmatization

How are stemming and lemmatization different?

Stemming works on a single word at a time, without any knowledge of the surrounding context. Stemmers are easier to implement and faster to run.

The lemma of a word can change with its context (for example, its part of speech), so lemmatizers are harder to implement. ‘Running’ has ‘run’ as its lemma as well as its stem. ‘Better’ has ‘good’ as its lemma, but not as its stem.
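To make the difference concrete, here is a minimal sketch in Python. It assumes a toy suffix-stripping stemmer (much simpler than real algorithms such as the Porter stemmer) and a tiny hand-written lemma dictionary keyed by part of speech. `toy_stem`, `toy_lemmatize`, `SUFFIXES` and `LEMMAS` are hypothetical names for illustration, not any library's API. The stemmer sees only the spelling of a word, while the lemmatizer also uses its part of speech, which is why only the lemmatizer can map ‘better’ to ‘good’.

```python
# Hypothetical toy suffix list for illustration only; real stemmers
# (e.g. Porter) use much larger, ordered rule sets.
SUFFIXES = ("ing", "ed", "s")

# Hypothetical mini-dictionary mapping (word, part of speech) -> lemma.
# A real lemmatizer would consult a full lexicon such as WordNet.
LEMMAS = {
    ("running", "verb"): "run",
    ("ran", "verb"): "run",
    ("better", "adj"): "good",
    ("better", "verb"): "better",  # as in "to better oneself"
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, looking only at the word's spelling."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Require at least three remaining characters so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Collapse a doubled final consonant ("runn" -> "run"),
            # except l, s and z ("fall" stays "fall").
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return word


def toy_lemmatize(word: str, pos: str) -> str:
    """Look the word up together with its part of speech; fall back to the word."""
    return LEMMAS.get((word.lower(), pos), word.lower())


if __name__ == "__main__":
    for word, pos in [("running", "verb"), ("better", "adj"),
                      ("better", "verb"), ("ran", "verb")]:
        print(f"{word}/{pos}: stem={toy_stem(word)!r}, lemma={toy_lemmatize(word, pos)!r}")
```

Running the script prints `stem='run', lemma='run'` for ‘running’, but for ‘better’ the stem stays `'better'` while the lemma is `'good'` as an adjective and `'better'` as a verb. The stem never depends on context; the lemma does.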

Solr vs Lucene

In this post, we are not trying to compare Solr and Lucene, as they are not really different technologies; rather, we are trying to identify when to use which. In 90% of cases, or even more, I would recommend Solr as the preferred choice, as it is essentially Lucene packaged as a server. Below is the list of additional features that Solr provides on top of Lucene:

  • Processing requests over HTTP
  • Caching mechanism
  • Admin interface
  • Configuration in XML files, with the notion of a fieldType
  • DisMax query parser
  • Spell checking & suggestions
  • More Like This
  • Distributed & cloud features
  • DataImportHandler & other handlers for extracting data
The features above make Solr the preferred choice. That raises the question: when should you use Lucene? In most cases, you would not. But if available memory is limited, as on mobile devices, or if you need to write a lot of low-level code to tune Lucene or add your own logic, Lucene would be your choice.
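As an illustration of two of the Solr features listed above (requests over HTTP and DisMax queries), the sketch below builds a Solr query URL; it does not send the request. It assumes a Solr instance at `http://localhost:8983` with a hypothetical core named `articles` and hypothetical fields `title` and `body`; the helper name `solr_dismax_url` is also hypothetical. The parameters `q`, `defType`, `qf` and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_dismax_url(base_url: str, core: str, user_query: str,
                    fields: dict[str, float]) -> str:
    """Build a Solr /select URL that runs a DisMax query over boosted fields."""
    # qf lists the fields to search, each followed by ^boost.
    qf = " ".join(f"{name}^{boost:g}" for name, boost in fields.items())
    params = {
        "q": user_query,      # raw user input, passed to the DisMax parser
        "defType": "dismax",  # select the DisMax query parser
        "qf": qf,             # fields to search and their boosts
        "wt": "json",         # ask for a JSON response
    }
    # urlencode escapes spaces as "+" and "^" as "%5E".
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_dismax_url("http://localhost:8983", "articles",
                          "stemming lemmatization", {"title": 2.0, "body": 1.0}))
```

With Solr, this single HTTP request is the whole client-side integration. Plain Lucene has no HTTP layer of its own, so you would have to write the query handling and any server code yourself.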