Index XML documents to Solr

The two primary operations on Solr are indexing and searching. When it comes to indexing, Solr can index documents from different sources such as databases, XML, and CSV files. In this blog, we focus on indexing XML. XML documents can be indexed to Solr as follows:

  1. Over HTTP: to index documents in Solr, the XML should be created in the following format:
    <add>
     <doc>
       <field name="field1">value to be indexed</field>
       <field name="field2">value to be indexed</field>
     </doc>
     <doc>
       <field name="field1">value to be indexed</field>
       <field name="field2">value to be indexed</field>
     </doc>
    </add>

Note: field1 and field2 must correspond to field names defined in schema.xml. Ensure that values for all required fields are present in the XML.
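
If you generate these messages from code rather than by hand, a minimal sketch using Python's standard xml.etree.ElementTree could look like the following. The helper name build_add_xml is hypothetical, and the field names are just the placeholders from the sample above; they still have to match schema.xml. A side benefit is that ElementTree escapes characters such as & and < in values, which hand-written XML easily gets wrong.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message from a list of {field_name: value} dicts.

    build_add_xml is a hypothetical helper for illustration; the keys of
    each dict must match field names defined in schema.xml.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


print(build_add_xml([
    {"field1": "value to be indexed", "field2": "value to be indexed"},
    {"field1": "Tom & Jerry", "field2": "a < b"},
]))
```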

Documents can be indexed using either the GET or the POST method. Use GET only when adding a few documents, since the whole XML has to fit into the URL.

To add a document using the GET method, send a request like the following:

http://localhost:8080/solr/umdb_mapping/update?stream.body=<add><doc><field name="song">Love the way you cry</field><field name="album">Rihanna</field></doc></add>
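
A browser will usually percent-encode the XML in the address bar for you, but when building this GET URL from code the stream.body value has to be URL-encoded explicitly. A minimal Python sketch, assuming the host, port and core from the example URL above; build_get_update_url is a hypothetical helper name. Depending on your Solr version, stream.body may be disabled by default (see the requestParsers settings in solrconfig.xml), so treat this as a sketch rather than a guaranteed recipe.

```python
from urllib.parse import urlencode

# Host, port and core taken from the example URL; adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8080/solr/umdb_mapping/update"


def build_get_update_url(xml_body, commit=False):
    """Return a GET URL that passes xml_body to Solr via stream.body.

    build_get_update_url is a hypothetical helper for illustration.
    """
    params = {"stream.body": xml_body}
    if commit:
        params["commit"] = "true"
    # urlencode percent-encodes <, >, quotes and spaces so the URL is valid
    return SOLR_UPDATE_URL + "?" + urlencode(params)


doc = ('<add><doc><field name="song">Love the way you cry</field>'
       '<field name="album">Rihanna</field></doc></add>')
print(build_get_update_url(doc, commit=True))
```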

To POST the document using curl, use:

curl "http://<host>:<port>/solr/<core-if-applicable>/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="song">Love the way you cry</field><field name="album">Rihanna</field></doc></add>'
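
The same POST can be issued from Python's standard library. This sketch only prepares the request (sending requires a running Solr instance); the URL reuses the localhost:8080/umdb_mapping values from the GET example and is an assumption about your setup, and build_post_request is a hypothetical helper name.

```python
import urllib.request


def build_post_request(update_url, xml_body, commit=True):
    """Prepare (but do not send) a POST of an XML update message to Solr.

    Mirrors the curl command: Content-Type text/xml, optional commit=true.
    build_post_request is a hypothetical helper for illustration.
    """
    url = update_url + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_post_request(
    "http://localhost:8080/solr/umdb_mapping/update",
    '<add><doc><field name="song">Love the way you cry</field>'
    '<field name="album">Rihanna</field></doc></add>',
)
# With Solr running, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```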

If the XML is in a file (here assumed to be in the current directory) rather than inline, pass the file name prefixed with @ as follows:

curl "http://<host>:<port>/solr/<core-if-applicable>/update?commit=true" -H "Content-Type: text/xml" --data-binary '@solr_xml_sample.txt'
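
For reference, a rough Python equivalent of curl's --data-binary '@file', which sends the file's bytes unchanged as the request body. The helper name build_file_post_request and the example URL are assumptions for illustration; the request is only prepared here, not sent.

```python
import urllib.request
from pathlib import Path


def build_file_post_request(update_url, xml_path):
    """Prepare a POST whose body is the raw bytes of an XML file.

    build_file_post_request is a hypothetical helper for illustration.
    """
    body = Path(xml_path).read_bytes()  # sent as-is, like --data-binary
    return urllib.request.Request(
        update_url + "?commit=true",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


# Usage, assuming solr_xml_sample.txt exists and Solr is running:
# req = build_file_post_request(
#     "http://localhost:8080/solr/umdb_mapping/update", "solr_xml_sample.txt")
# urllib.request.urlopen(req).close()
```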

Added documents are not committed by default (unless auto-commit is configured in solrconfig.xml), so they are not yet visible to searches. Either send a separate commit request such as <commit/>, or include the parameter commit=true in the add request.
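
A separate commit is handy when loading several batches: post every batch without commit=true, then commit once at the end. A minimal sketch of that pattern, assuming no auto-commit is configured; the helper name build_batched_requests and the URL are illustrative, and the requests are only prepared, not sent.

```python
import urllib.request


def build_batched_requests(update_url, xml_batches):
    """Prepare one POST per <add> batch, followed by a single <commit/>.

    None of the add requests carry commit=true, so (without auto-commit)
    the documents become searchable only after the final commit.
    build_batched_requests is a hypothetical helper for illustration.
    """
    def post(body):
        return urllib.request.Request(
            update_url,
            data=body.encode("utf-8"),
            headers={"Content-Type": "text/xml"},
            method="POST",
        )

    batch_requests = [post(batch) for batch in xml_batches]
    batch_requests.append(post("<commit/>"))  # one explicit commit at the end
    return batch_requests


reqs = build_batched_requests(
    "http://localhost:8080/solr/umdb_mapping/update",
    ['<add><doc><field name="song">Song A</field></doc></add>',
     '<add><doc><field name="song">Song B</field></doc></add>'],
)
# With Solr running: for r in reqs: urllib.request.urlopen(r).close()
```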

Ever wondered how to add values to a multiValued Solr field? There is no special tag to learn: simply repeat the field tag with the same name, once per value, as follows:

<add>
 <doc>
   <field name="field1">value to be indexed</field>
   <field name="field2">value to be indexed</field>
   <field name="field2">value to be indexed2</field>
 </doc>
</add>
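
When generating such documents from code, a list value maps naturally onto repeated field tags. A minimal Python sketch; build_doc_with_multivalued is a hypothetical helper, and it assumes field2 is declared with multiValued="true" in schema.xml (Solr rejects multiple values for a single-valued field).

```python
import xml.etree.ElementTree as ET


def build_doc_with_multivalued(fields):
    """Build an <add> message for one document.

    A list value is written as one <field> element per item, which is how
    Solr receives multiple values for a multiValued field.
    build_doc_with_multivalued is a hypothetical helper for illustration.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(doc, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


print(build_doc_with_multivalued({
    "field1": "value to be indexed",
    "field2": ["value to be indexed", "value to be indexed2"],
}))
```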

2. Using DataImport: this will be covered in an upcoming post.

Please refer to the Solr Wiki for details on other optional attributes and on other XML operations such as update and delete.