The following material is intended for developers evaluating Apache Solr for enterprise search or website search applications. Solr, much like Elasticsearch, is built on the Apache Lucene libraries and lets you customize how search works.
The Solr schema is a configuration file that tells Solr how to index documents. Documents may contain structured data, such as database records for an online store, or unstructured data, as used in full-text search applications like search engines.
The Solr schema is stored in a file called managed-schema when the user elects to make modifications using the Solr Schema API, or in schema.xml for more advanced users who modify the schema by hand.
Fields in Solr are related to the documents themselves and the information being searched for. Each Field is assigned a Field Type which provides rules for how Fields of that type should be processed.
An example for both Field Types and Fields might look like this (including the XML and schema tags).
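A minimal sketch of such a snippet, assuming a Field Type named text_general and a title Field that uses it. The schema name and version values and the analyzer chain are illustrative assumptions, not a copy of any particular shipped schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch: the schema name and version values are assumptions. -->
<schema name="example" version="1.6">

  <!-- Field Type: name and class are the two required properties. -->
  <fieldType name="text_general" class="solr.TextField">
    <!-- Assumed analyzer chain: split text into tokens, then lowercase them. -->
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Field: points to the Field Type above through its type attribute. -->
  <field name="title" type="text_general" indexed="true" stored="true"/>

</schema>
```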
The schema file is typically hundreds of lines long. The snippet referred to above shows first a simple Field Type, which processes every Field declared with type="text_general", and then the title Field pulled from the indexed document, which is assigned this Field Type.
Where you see name="text_general" and class="solr.TextField" in the fieldType tag, these are examples of Field Type properties. In fact, these are the two required properties for any Field Type.
The table below provides a list and description of all 7 general properties that can be included in the fieldType tag.
Below is a list of 7 Field Type properties provided by Solr.
|Property|Description|
|name|Each Field points to a Field Type using the type="name" attribute. The Field Type is then used to process text in that Field according to the rules of the Field Type during both indexing and querying.|
|class|The class points to the code that processes data for that Field Type. It is located at|
|positionIncrementGap|If a document can have multiple values for a Field, that Field is considered multiValued. This property sets the distance between multiple values and is helpful for fine-tuning phrase (multiple word) settings so searches don't return false phrase matches that span two values.|
|autoGeneratePhraseQueries|For text fields, true will automatically create phrase queries for adjacent terms. For example, treating "new york" as a phrase when those two words are adjacent instead of only matching "new" and "york". With false, phrases must be enclosed in double quotes to be treated as phrases.|
|enableGraphQueries|Use the default of true when filters are graph-aware, like the Synonym Graph Filter. Use false for filters that can match documents when tokens are missing, like the Shingle Filter.|
|docValuesFormat|An advanced setting for when a custom docValues format is established in solrconfig.xml.|
|postingsFormat|An advanced setting for a custom postings format as set up in solrconfig.xml.|
The following is an example of name and class set up for the binary Field Type.
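A minimal sketch, assuming the binary Field Type is backed by the solr.BinaryField class, as it is in Solr's sample schemas. Only the two required properties are set.

```xml
<!-- name: the label Fields refer to with type="binary". -->
<!-- class: the code that processes the data (solr.BinaryField is assumed here). -->
<fieldType name="binary" class="solr.BinaryField"/>
```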
The following is an example of a positionIncrementGap property for a text field.
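A minimal sketch, assuming a gap of 100 positions and a hypothetical multiValued Field named tags. The gap value and the analyzer are illustrative choices.

```xml
<!-- positionIncrementGap="100" places 100 empty positions between
     successive values of a multiValued Field of this type. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical Field holding several values per document. -->
<field name="tags" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

With values such as "red shoes" and "blue hat" stored in the same Field, the gap keeps an exact phrase search for "shoes blue" from matching across the boundary between the two values.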
The following is an example of an autoGeneratePhraseQueries property for a text field.
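A minimal sketch, assuming a hypothetical Field Type named text_phrases. The name and the analyzer chain are illustrative, and only the autoGeneratePhraseQueries attribute is the point of the example.

```xml
<!-- Hypothetical Field Type name. With autoGeneratePhraseQueries="true",
     phrase queries are created automatically for adjacent terms. -->
<fieldType name="text_phrases" class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```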
FactorPad offers Apache Solr Search content in both tutorials and reference.