One of the most common questions I have been asked recently is what the new features in Solr 4.0 are. There are plenty of posts on the web that provide this information, and the Solr wiki also explains it quite well.
I will try to throw some light on the features that I find really cool, their practical uses, and which of them we are trying to leverage in our current projects. I will also define each feature briefly to set the ground.
- Pseudo-fields: These provide the ability to alias a field or to add metadata to the returned documents. We are using them to return the confidence score of each matched document
- New spell checker implementation (DirectSolrSpellChecker): It works directly on the main index, so no separate index needs to be built or maintained for spell checking
- Function query enhancements: Conditional function queries are now allowed. We had a scenario where we were boosting documents on the basis of download count. Some documents had no download count available, and they were badly affected. Now the boost can be made conditional, so that only documents that have a download count, or whose download count exceeds a specified value, are boosted
- Atomic update: Provides the flexibility to update only the fields of a document that have been modified. Prior versions required us to send the complete document even if a single field had been modified. Note: Internally, it’s still implemented as a delete and an add, and is not a database-style in-place update
- New relevance ranking models such as BM25 and language models have been introduced. Analysis needs to be done to check whether one of them works better for us than the current vector space model (VSM)
- Indexed terms are no longer UTF-16 char sequences, instead terms can be any binary value encoded as byte arrays. By default, text terms are now encoded as UTF-8 bytes
- A transaction log ensures that even uncommitted documents are never lost
- FuzzyQueries are 100x faster
- SolrCloud: I will refrain from using this until 4.1 is released
- NoSQL features: As of now, I prefer to use Solr for search. For NoSQL, I will stick with the NoSQL DB of my choice
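
To make the conditional boosting and atomic update points a bit more concrete, here is a rough Python sketch that only builds the request parameters and the JSON body, without talking to a Solr server. The field name `download_count`, the document id `doc1`, the `+10` smoothing constant and the helper function names are all assumptions made up for illustration; they are not taken from our actual schema. The boost uses Solr's `if` and `exists` functions so that documents without a download count get a neutral multiplier of 1, and the update uses the `set` modifier of atomic updates.

```python
import json
from urllib.parse import urlencode


def conditional_boost_params(user_query, count_field="download_count"):
    """Build edismax query parameters with a conditional boost.

    Assumption: `count_field` is a numeric field in the schema (hypothetical name).
    Documents that have the field are multiplied by log10(count + 10), which is
    at least 1 for non-negative counts; documents without it get a neutral 1.
    """
    boost = f"if(exists({count_field}),log(sum({count_field},10)),1)"
    return {
        "q": user_query,
        "defType": "edismax",
        "boost": boost,       # multiplicative boost applied by edismax
        "fl": "id,score",     # return the relevance score with each document
        "wt": "json",
    }


def atomic_update_payload(doc_id, changes):
    """Build a JSON atomic update body that sets only the changed fields.

    Assumption: the unique key field is called `id` (hypothetical schema).
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}  # "set" replaces the stored value of that field
    return json.dumps([doc])


params = conditional_boost_params("solr tutorial")
query_string = urlencode(params)  # append to the select handler URL of your core
payload = atomic_update_payload("doc1", {"download_count": 42})

print(query_string)
print(payload)
```

The query string would be appended to the select URL of your own core, and the payload posted to its update handler as JSON. Keep in mind that atomic updates rely on the other fields of the document being stored, since Solr rebuilds the whole document internally.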