Solr Field Properties - Syntax and Examples | Lucene and Solr Reference

Solr Field Properties and Their Defaults

Beginner

The following reference is intended for developers evaluating Apache Solr for enterprise search or website search applications. Both Apache Solr and Elasticsearch are built on the Apache Lucene search library.

Apache Solr Reference

1. About Solr Field Properties

The Solr schema is a configuration file that tells Solr how to index documents and which Fields can be searched and returned in search results. Documents may contain structured data, such as the product records of an online store held in a database, or unstructured data, such as the text used in full-text search applications like search engines.

The Solr schema is stored in a file named managed-schema when it is modified through the Solr Schema API, or in schema.xml when more advanced users edit the schema by hand.
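Which of the two files Solr reads is chosen by the schemaFactory element in solrconfig.xml. The fragment below is a sketch of the two options as they appear in Solr 7-era configsets; later releases may use a different default file name, so check the configset that ships with your version.

```xml
<!-- solrconfig.xml: managed schema, editable through the Schema API -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

<!-- Alternative: classic schema read from a hand-edited schema.xml
     (use this element instead of the one above, not alongside it) -->
<!-- <schemaFactory class="ClassicIndexSchemaFactory"/> -->
```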

Fields in Solr describe the content of the documents and hold the information that users search for. Each Field is assigned a Field Type, which provides rules for how Fields of that type should be processed during indexing and search.

The version="1.6" attribute on the schema element, together with each Field Type class, determines the default values of Field properties; 1.6 is the schema version used by the default configset in Solr 7.

The easiest way to think about defaults is that each Field Type class sets them. These defaults are listed in the tables below, and they can be overridden on the fieldType element (affecting every Field of that type) or on an individual field element.
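A minimal sketch of the two override levels follows. The string Field Type mirrors the one in the Solr 7 _default configset; the internal_note Field is a hypothetical name used only for illustration.

```xml
<!-- Field Type level: every Field of type "string" gets these values
     unless the Field itself says otherwise -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<!-- Field level (hypothetical Field): inherits sortMissingLast="true"
     from its type, but overrides docValues for this one Field -->
<field name="internal_note" type="string" indexed="true" stored="true" docValues="false"/>
```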

2. Syntax for Solr Fields and Field Types

An example for both Field Types and Fields might look like this (including the XML and schema tags).

<?xml version="1.0" encoding="UTF-8"?>
<schema name="default-config" version="1.6">
  ...
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true"/>
  ...
  <field name="title" type="text_general" indexed="true" stored="true"/>
  ...
</schema>

The schema file is typically hundreds of lines long. The snippet above first defines a Field Type named text_general, then a title Field that is assigned that Field Type through type="text_general". Every Field that names text_general as its type is processed according to that Field Type's rules.

The indexed="true" and stored="true" attributes on the field tag are examples of Field properties. indexed controls whether the Field's content is added to the index so that it can be searched, and stored controls whether the original value is kept so that it can be returned in search results.

The tables below describe 19 properties that can be set on a field tag to override the defaults of its Field Type.

3. Options for the Different Field Properties in Apache Solr

Below are 19 Field properties and their defaults, split into two tables. The first table covers the properties most commonly used by beginners. All properties take a value of true or false except name, type and default.

Most common Field properties

These 8 common Field properties control whether a Field can be searched and retrieved during search and, much like column constraints in a database, whether it is required and whether it can hold multiple values.

Remember, part of the goal is to keep the index small; these settings let you customize the index and turn on only the features you need.

Field Property	Description	Default
`name` (required)	The name of the Field.	--
`type` (required)	The name of a Field Type defined in the same schema; the Field Type controls the behavior of every Field of that type.	--
`default`	A value (default="value") used to populate the Field when no value is supplied at index time.	none
`indexed`	Only when true can the Field be searched or sorted in queries to retrieve matching documents.	true
`stored`	Only when true can the Field's original value be retrieved in query results.	true
`required`	When true, Solr rejects any document that has no value for this Field. This is common for id Fields and structured data.	false
`multiValued`	When true, a document may contain multiple values for this Field, similar to a one-to-many relationship in a database.	false
`docValues`	When true, the Field's values are also written to a column-oriented structure called DocValues, which is efficient for sorting, faceting (groupings) and grouping. A standard inverted index is not well suited to these operations. DocValues add to the size of the index, so if the Field is not used for sorting, faceting or grouping, the setting can stay false. DocValues are only available for some Field Types (for example string, numeric, date and boolean types, but not tokenized text Fields).	false
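To see several of these properties together, here is a sketch of four Field definitions. The Field names are hypothetical, and the string, boolean and text_general types are assumed to come from the Solr _default configset.

```xml
<!-- required="true": documents without a sku are rejected -->
<field name="sku" type="string" indexed="true" stored="true" required="true"/>

<!-- default="true": filled in when a document arrives without a value -->
<field name="in_stock" type="boolean" indexed="true" stored="true" default="true"/>

<!-- multiValued="true": one document may carry several tags (one-to-many) -->
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Searchable but never returned in results: stored="false" -->
<field name="keywords" type="text_general" indexed="true" stored="false"/>
```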

Field names should start with a letter. Names with both leading and trailing underscores are reserved for special fields such as _version_, _text_ and _root_, which, together with id, are pre-declared in the _default configset.
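For reference, the pre-declared fields look roughly like this in the Solr 7 _default configset. Treat this as a sketch: exact attributes can differ between Solr releases, and plong, string and text_general are Field Types defined elsewhere in that configset.

```xml
<!-- Unique key: required and single-valued -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<!-- Used internally by Solr for optimistic concurrency and updates -->
<field name="_version_" type="plong" indexed="false" stored="false"/>
<!-- Links child documents to their parent in nested documents -->
<field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>
<!-- Catch-all search field, typically filled by copyField rules -->
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>id</uniqueKey>
```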

Field properties for more advanced implementations

The following table of 11 properties covers finer points of index construction. These settings affect the size of the index and how well Solr finds and ranks documents during search.

Field Property	Description	Default
`sortMissingFirst`	When true and documents are sorted on this Field, documents with no value in the Field appear first, regardless of sort direction. Works for string, boolean, date and numeric types only.	false
`sortMissingLast`	When true and documents are sorted on this Field, documents with no value in the Field appear last, regardless of sort direction. Works for string, boolean, date and numeric types only.	false
`omitNorms`	When true, disables length normalization for the Field. Defaults to true for non-analyzed Field Types such as BinaryField, BoolField, IntPointField and StrField, and to false for text Fields.	true for non-text, false for text
`omitTermFreqAndPositions`	When true, omits the term frequency, position and payload information that tokenized text normally carries and that is used in document ranking and phrase queries. Defaults to true for non-text Fields and false for text Fields.	true for non-text, false for text
`omitPositions`	Like omitTermFreqAndPositions, but keeps term frequency information and omits only the position information of tokens.	varies by Field Type
`termVectors`	Stores a term vector (the terms of each document and their frequencies) for the Field. Helpful for MoreLikeThis, where document similarity is required, and for some highlighters.	false
`termPositions`	Stores the position of each token in the term vector. Requires termVectors="true".	false
`termOffsets`	Stores the character offsets of each token in the term vector, which can speed up highlighting. Requires termVectors="true".	false
`termPayloads`	Stores token payloads in the term vector. Requires termVectors="true".	false
`useDocValuesAsStored`	If the Field has docValues enabled and stored="false", setting this to true lets the Field be returned as if it were stored when "*" is used in the fl search parameter.	true
`large`	Requires stored="true" and multiValued="false". Large Fields are loaded lazily and are kept out of the document cache when their values are very large, which reduces memory use and improves performance.	false
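A short sketch of two of the advanced properties in use. The string Field Type matches the Solr _default configset; summary is a hypothetical Field name.

```xml
<!-- Documents with no value in a Field of this type sort after all
     others, whether the sort is ascending or descending -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<!-- Hypothetical text Field: keeps norms (length normalization) for
     ranking, but drops positions to save space, so phrase queries
     against this Field will not work -->
<field name="summary" type="text_general" indexed="true" stored="true"
       omitNorms="false" omitPositions="true"/>
```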

4. Examples of Common Solr Field Properties

Example 1 - Set up a Field as a unique key

In this case, a Field is given the two properties that make it suitable as a unique key: it must be present in every document, and it may hold only one value.
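A minimal sketch of such a Field, assuming the string Field Type from the _default configset. The uniqueKey element tells Solr which Field identifies each document.

```xml
<!-- required="true": every document must supply an id
     multiValued="false": each document has exactly one id -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<!-- Declares the Field Solr uses to identify, and overwrite, documents -->
<uniqueKey>id</uniqueKey>
```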

Example 2 - Create an indexed and searchable Field

In this case, a Field is included in the index, and searches can be performed within the Field.
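A sketch of a searchable Field; description is a hypothetical name and text_general is assumed to be defined as in the _default configset.

```xml
<!-- indexed="true": the tokens of this Field go into the inverted index,
     so a query such as q=description:solr can match it.
     stored="false": the text itself is not returned in results -->
<field name="description" type="text_general" indexed="true" stored="false"/>
```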

Example 3 - A Field that can be retrieved in queries

In this case, we set up a Field that can be returned in queries.
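A sketch of a retrievable Field; author is a hypothetical name, and the string type is assumed from the _default configset.

```xml
<!-- stored="true": the original value is kept and can be returned,
     for example with fl=id,author.
     indexed="false": the Field cannot be searched -->
<field name="author" type="string" indexed="false" stored="true"/>
```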

Alternatively, a Field with docValues="true" can be returned even when stored="false", because useDocValuesAsStored defaults to true.
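A sketch of the docValues alternative; price is a hypothetical Field and pfloat is assumed to be the FloatPointField type from the _default configset.

```xml
<!-- Not stored, but docValues plus useDocValuesAsStored (true by default
     in schema version 1.6) let the value be returned when fl=* -->
<field name="price" type="pfloat" indexed="true" stored="false" docValues="true"/>
```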

Example 4 - A Field used to perform sorting

In the following case a Field can be used to sort documents.

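The Field must be single-valued. docValues="true" is advised for integer and floating point Field Types (and required for Point-based types such as IntPointField), together with omitNorms="true", which is already the default for non-text types because length normalization plays no part in sorting.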
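A sketch of two sortable Fields, assuming the pint and pfloat Point types from the _default configset; rating and weight are hypothetical names. Either could be used as, for example, sort=rating desc.

```xml
<!-- Single-valued integer Field with docValues, usable for sorting -->
<field name="rating" type="pint" indexed="true" stored="true"
       docValues="true" multiValued="false" omitNorms="true"/>

<!-- Single-valued floating point Field, configured the same way -->
<field name="weight" type="pfloat" indexed="true" stored="true"
       docValues="true" multiValued="false" omitNorms="true"/>
```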

Example 5 - To perform highlighting on a Field

In the following case Fields can be returned with highlighting.

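Here the Field must be stored and use a Field Type with a tokenizer, such as a TextField type. termVectors is not required, but termPositions and termOffsets only take effect when termVectors is true, and the FastVector highlighter needs all three.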
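A sketch of a highlightable Field; body is a hypothetical name and text_general is assumed from the _default configset. The term vector options are optional and are included here only for the FastVector highlighter.

```xml
<!-- Stored and tokenized, so the highlighter can mark up matching terms.
     termVectors, termPositions and termOffsets are optional for the
     default highlighter but all three are needed by FastVector -->
<field name="body" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```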

Example 6 - To perform Field faceting

In the following case a Field can be used for Field faceting.

docValues="true" is advised for faceting but not required; an indexed Field can also be faceted.
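A sketch of a facet Field; category is a hypothetical name and string is assumed from the _default configset. A request such as facet=true&facet.field=category would then return a count for each distinct value.

```xml
<!-- Untokenized string with docValues: each distinct value becomes
     one facet bucket -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
```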
