How are stemming and lemmatization different?
Stemming works on a single word, without any knowledge of the surrounding context. Stemmers are easier to implement and faster to run.
The lemma of a word depends on its context, so lemmatizers are harder to implement. ‘Running’ has ‘run’ as both its lemma and its stem. ‘Better’ has ‘good’ as its lemma, but not as its stem.
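To make the difference concrete, here is a minimal sketch in Python. It is a toy illustration, not a real algorithm: the suffix list, the lookup table, and the function names `toy_stem` and `toy_lemma` are all assumptions made up for this example. The point is only that the stemmer looks at the word alone, while the lemmatizer needs extra information (here, a part-of-speech tag) and a dictionary.

```python
# Toy illustration only: these are not production stemming or
# lemmatization algorithms, and all names here are hypothetical.

SUFFIXES = ("ing", "ed", "s")  # deliberately tiny rule set

# Hypothetical dictionary keyed by (word, part of speech).
LEMMAS = {
    ("running", "verb"): "run",
    ("better", "adj"): "good",
    ("better", "verb"): "better",
}


def toy_stem(word):
    """Strip one known suffix using rules only; no context is consulted."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo a doubled final consonant, e.g. "runn" -> "run".
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


def toy_lemma(word, pos):
    """Look the word up together with its part of speech (the 'context')."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)


stem_running = toy_stem("running")
stem_better = toy_stem("better")
lemma_better_adj = toy_lemma("better", "adj")
lemma_better_verb = toy_lemma("better", "verb")
```

With these toy rules, ‘running’ stems to ‘run’, while ‘better’ is left unchanged by the stemmer and only maps to ‘good’ when the lemmatizer is told it is an adjective.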
In this post, we are not trying to compare Solr and Lucene, since they are not separate technologies; instead, we are trying to identify when to use which. I would recommend Solr in 90% of cases, or even more, as it is essentially Lucene wrapped in a server. Below is the list of additional features that Solr provides on top of Lucene:
- Processing requests over HTTP
- Caching mechanism
- Admin interface
- Configuration in XML files, with the notion of fieldType
- DisMax query
- Spell check & suggest
- More like this
- Distributed & cloud features
- DataImportHandler & other handlers for extracting data
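As a rough sketch of what "requests over HTTP" and "DisMax query" mean in practice, the Python snippet below builds the URL for a Solr select request that uses the DisMax query parser. The host, the core name `articles`, the field names, and the helper `build_dismax_url` are assumptions for illustration; only the URL is constructed, and no request is sent.

```python
from urllib.parse import urlencode


def build_dismax_url(base_url, core, user_query, query_fields):
    """Build a Solr /select URL that asks for the DisMax query parser."""
    params = {
        "q": user_query,                  # raw user input
        "defType": "dismax",              # select the DisMax parser
        "qf": " ".join(query_fields),     # fields to search, with optional boosts
        "wt": "json",                     # response format
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host, core, and fields.
url = build_dismax_url(
    "http://localhost:8983",
    "articles",
    "running shoes",
    ["title^2", "body"],
)
```

With plain Lucene, the equivalent search would be written against the library's Java API inside your own application; with Solr, any client that can issue an HTTP request can run the query.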