Solr Atomic Update

One of the promising features of Solr 4.0 is atomic updates. With previous releases of Solr, to update a document you had to send all of its fields, even those that had not changed. If you sent only the fields that had changed, the values of the other fields were lost. Why does it behave so? Because Lucene does not modify a document in place: it deletes the old document and then adds the new one.

Many a time, you build a Lucene index by reading data from different sources or from different tables in a DB. Forming a complete Solr document is a costly operation in many cases, say when you are assembling it from different graph DBs. Solr 4.0 sets you free! Just send the field to be updated (along with a few additional parameters) together with the unique key field, and you are done. Internally, Solr looks up the document by its unique ID, modifies it, and adds it back to the index. But you no longer have to do that yourself in your client application.
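To make the "look up, modify, add back" cycle concrete, here is a toy in-memory model in Python. It is only a sketch of the idea, not Solr's actual implementation: the `index` dict, the sample document, and the `atomic_update` function are all made up for illustration.

```python
# Toy model of an atomic update. NOT Solr's real code: the index is a plain
# dict keyed by unique id, and atomic_update is a hypothetical helper.
index = {
    "1": {"id": "1", "name": "Asha", "profession": "developer", "skills": ["java"]},
}


def atomic_update(index, doc_id, changes):
    """Apply {field: (operation, value)} changes to the stored document."""
    doc = dict(index[doc_id])  # 1. look up the stored document by unique id
    for field, (operation, value) in changes.items():
        if operation == "set":      # 2a. replace the field value
            doc[field] = value
        elif operation == "add":    # 2b. append to a multi-valued field
            doc[field] = list(doc.get(field, [])) + [value]
        else:
            raise ValueError("unsupported operation: " + operation)
    index[doc_id] = doc  # 3. put the whole document back (delete + add)
    return doc


updated = atomic_update(
    index, "1", {"profession": ("set", "consultant"), "skills": ("add", "windows")}
)
print(updated)
```

The point to notice is that the untouched field (`name` here) survives, because the whole stored document is read before it is written back.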

To update a field, add an update attribute to the field tag, with set as its value. Here is an example:

<add>
<doc>
<field name="id">1</field>
<field name="profession" update="set">consultant</field>
</doc>
</add>
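If you would rather generate this message from a client program than type it by hand, a minimal sketch in Python could look like the following. The function name `build_set_update` and the `id_field` parameter are my own; only the standard library is used. The resulting string would then be posted to your core's update handler, whose URL depends on your installation.

```python
import xml.etree.ElementTree as ET


def build_set_update(doc_id, field, value, id_field="id"):
    """Return an <add> message that sets one field of one document.

    build_set_update and id_field are hypothetical names for this sketch.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # The unique key tells Solr which document to update.
    ET.SubElement(doc, "field", {"name": id_field}).text = str(doc_id)
    # update="set" marks this field as an atomic "replace the value" change.
    ET.SubElement(doc, "field", {"name": field, "update": "set"}).text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_set_update(1, "profession", "consultant")
print(message)
```

Building the XML with ElementTree rather than string concatenation also takes care of escaping characters such as & and < in field values.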

This works really well for a single-valued field. But if you want to update the value of a multi-valued field, it is better to send the whole document, since as of now you can make it work only with a workaround.

I tried setting the values of a multi-valued field, and this is what happened.

<add>
<doc>
<field name="id">1</field>
<field name="skills" update="set">mac</field>
<field name="skills" update="set">linux</field>
</doc>
</add>

The document added to Solr was as follows:

<doc>
<str name="id">1</str>
<arr name="skills">
<str>{set=mac}</str>
<str>{set=linux}</str>
</arr>
</doc>

This is really not what I was looking for: the update instruction itself was stored as the field value.

I happened to find a workaround to set the values of a multi-valued field properly. The trick is to update some other field with the value you are sure it already has, or to keep a dummy field and update it with a null value. Along with that, pass the values for the multi-valued field the way you do while adding a new document. Here is an example:

<add>
<doc>
<field name="id">1</field>
<field name="profession" update="set">consultant</field>
<field name="skills">java</field>
<field name="skills">J2EE</field>
<field name="skills">Unix</field>
</doc>
</add>

With this I was able to achieve what I wanted.

If you only want to add a value to the existing set of values in a multi-valued field, that is simple:

<add>
<doc>
<field name="id">1</field>
<field name="skills" update="add">windows</field>
</doc>
</add>
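For completeness, Solr's JSON update format expresses the same modifier as a small nested object instead of an attribute. I have only tried the XML route described in this post, so treat the following Python sketch as an untested illustration; `build_json_update` is a name I made up.

```python
import json


def build_json_update(doc_id, field, operation, value, id_field="id"):
    """Return a JSON atomic-update payload for a single document.

    Hypothetical helper: the modifier (for example "add" or "set") becomes
    the key of a nested object holding the new value.
    """
    document = {id_field: str(doc_id), field: {operation: value}}
    return json.dumps([document])


payload = build_json_update(1, "skills", "add", "windows")
print(payload)  # [{"id": "1", "skills": {"add": "windows"}}]
```

Such a payload would be sent to the update handler with a Content-Type of application/json rather than text/xml.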

If you want to reset/remove the value of a field in a document, pass the additional attribute null="true" as follows:

<field name="name" update="set" null="true"></field>
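The line above shows only the field element; in a real request it still has to sit inside an add/doc wrapper next to the unique key. A small Python sketch that produces the full message (the name `build_reset_update` and the `id_field` parameter are hypothetical, standard library only):

```python
import xml.etree.ElementTree as ET


def build_reset_update(doc_id, field, id_field="id"):
    """Return an <add> message that clears one field of one document.

    build_reset_update and id_field are made-up names for this sketch.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", {"name": id_field}).text = str(doc_id)
    # An empty field with update="set" and null="true" asks Solr to drop the value.
    ET.SubElement(doc, "field", {"name": field, "update": "set", "null": "true"})
    return ET.tostring(add, encoding="unicode")


reset_message = build_reset_update(1, "name")
print(reset_message)
```

ElementTree writes the empty field in self-closing form, which is equivalent XML to the open/close pair shown above.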
