/ factorpad.com / tech / solr / reference / solr-post.html
Beginner
Many organizations use a custom search application such as Apache Solr to index documents for enterprise search. In that setting documents are stored privately, and internal security is less of a focus because the end users are typically trusted employees. Even so, the post tool can be restricted to specific users with a password.
In public-facing search applications, such as website search tools, security becomes a much greater concern. Many developers have relied on Google Custom Search or Google Site Search to simplify the process. Whether you build a custom search application on a managed search offering, Apache Solr or Elasticsearch, directory locations in a production environment will differ from those presented here, which describe a test environment.
Regardless of your application, adding documents to the Solr index is an essential and sometimes complicated step. Many firms prefer to build their own posting method with other tools, including the SolrJ client API, custom Data Import Handlers, Solr Cell, Apache Tika and Apache Nutch.
Here we cover the basic operations of the bin/post script. It is also referred to as the solr post tool, solr post script, solr post command and sometimes just solr index.
The bin/post command lets you post a single document, post directories of documents, and perform a basic web crawl.
The bin/post command is a Linux shell script that calls a Java Archive (JAR) file containing the SimplePostTool. It works on Linux and macOS systems, but on Windows the JAR file has to be run directly (see below).
The bin/post command itself has 16 options.
The syntax for running bin/post is as follows.
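The synopsis in the comment below is the general form printed by the script's own help in Solr 7; check bin/post -h on your version. The wrapper around it is only a sketch that makes the argument order explicit: the function name solr_post and the POST_CMD variable are illustrative and not part of Solr.

```shell
# General form, as printed by the help text of bin/post in Solr 7:
#   bin/post -c <collection> [OPTIONS] <files|directories|urls|-d ["...",...]>

# Hypothetical wrapper that spells out the argument order.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
solr_post() {
  collection="$1"   # core or collection name, handed to -c
  shift             # the rest: options, then files, directories or URLs
  "${POST_CMD:-bin/post}" -c "$collection" "$@"
}
```

From the Solr installation directory, `solr_post films example/films/films.json` would then expand to `bin/post -c films example/films/films.json`.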
This syntax assumes your current working directory is the top-level Solr installation directory, referred to here as $SOLR_HOME, which for a local version 7 installation in standalone mode would be ~/solr-7.0.0/. In a production environment the directory locations may differ.
So the path to the solr post script is ~/solr-7.0.0/bin/post. Alternatively, the post script can be run as ./post from within the bin directory.
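Because the location differs between installations, it can help to confirm where the script lives before running it. The sketch below assumes the ~/solr-7.0.0 location used on this page; SOLR_DIR and find_post_script are illustrative names, not something Solr defines.

```shell
# SOLR_DIR is an illustrative variable; the default matches the Solr 7.0.0 path used on this page.
SOLR_DIR="${SOLR_DIR:-$HOME/solr-7.0.0}"

# Print the full path of the post script, or fail if it is not where we expect it.
find_post_script() {
  if [ -f "$SOLR_DIR/bin/post" ]; then
    printf '%s\n' "$SOLR_DIR/bin/post"
  else
    printf 'no post script under %s/bin\n' "$SOLR_DIR" >&2
    return 1
  fi
}
```

Once the path is confirmed, either `cd "$SOLR_DIR" && bin/post -h` or `cd "$SOLR_DIR/bin" && ./post -h` runs the same script.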
If you run Solr on Windows, the solr post tool is run by pointing Java at the JAR file directly. One copy can be found at example\exampledocs\post.jar. For help, run java -jar example\exampledocs\post.jar -help from the installation directory. Please see the documentation for Windows, as the rest of this page refers to usage in Linux-type environments.
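The JAR can be called the same way from a Linux-type shell, which is a convenient way to see what the Windows invocation looks like. This is a sketch only: it assumes the SimplePostTool convention of passing settings as Java system properties (-Dc for the core or collection), and post_jar and JAVA_CMD are illustrative names. On Windows the path separators become backslashes.

```shell
# Hypothetical helper that calls the SimplePostTool JAR directly instead of bin/post.
# JAVA_CMD is an assumed override for the Java launcher; it defaults to java.
post_jar() {
  collection="$1"   # core or collection name, passed as the -Dc system property
  shift             # remaining arguments: the files to post
  "${JAVA_CMD:-java}" -Dc="$collection" -jar example/exampledocs/post.jar "$@"
}
```

For example, `post_jar films example/films/films.xml` run from the installation directory would post one XML file to the films core.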
The bin/post command can take 16 options (parameters). Two are required: the core or collection name, given with the -c option, and the location of the files to post.
Option | Purpose | Default |
---|---|---|
-c <name> | The name of the core or collection. | None, but required. |
<files\|directories\|urls ["...",...]> | The location of the files to be indexed. | None, but required. |
-d ["...",...] | Data to post directly, given as string arguments or, when none follow, read from standard input. | None |
-url <update URL> | The base Solr update URL; overrides the collection, host and port. | None |
-host <host> | To point to a different hostname. | localhost |
-p <port> or -port <port> | To point to a different port number. | 8983 |
-commit yes\|no | Whether to commit after the post. | yes |
-u <user:password> or -user <user:password> | To post when user credentials are required. | None |
-recursive <depth> | For web crawls, the number of levels below the starting URL to follow. | 1 |
-delay <seconds> | The delay in seconds between each HTTP request for web crawls. | 10 |
-delay <seconds> | The delay in seconds between each post when crawling directories. | 0 |
-type <content/type> | The content type of data posted as string arguments or standard input. | application/xml |
-filetypes <type>[,<type>,...] | The types of documents to be indexed. | xml, json, jsonl, csv, pdf, doc, docx, ppt, pptx, xls, xlsx, odt, odp, ods, ott, otp, ots, rtf, htm, html, txt, log |
-params "<key>=<value>[&<key>=<value>...]" | Extra parameters passed through to the Solr update request; values must be URL-encoded. | None |
-out yes\|no | Whether to print Solr's response to the console. | no |
-format solr | To send application/json content as Solr commands to /update instead of /update/json/docs. | None |
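Several of these options can be combined in one call. The sketch below is illustrative rather than a recommended configuration: the credentials solruser:secret are made up, commitWithin=5000 is just one example of a pass-through update parameter, and post_with_options and POST_CMD are hypothetical names.

```shell
# Hypothetical example combining connection, authentication and update options.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
post_with_options() {
  # -host/-p    : Solr server to contact (the defaults are shown explicitly)
  # -u          : made-up credentials, for servers that require authentication
  # -commit no  : skip the explicit commit; commitWithin asks Solr to commit within 5 seconds
  # -out yes    : print Solr's response to the console
  "${POST_CMD:-bin/post}" -c films \
    -host localhost -p 8983 \
    -u "solruser:secret" \
    -commit no \
    -params "commitWithin=5000" \
    -out yes \
    example/films/films.json
}
```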
The following command posts one JSON file in the example/films directory to the films core and exits.
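A minimal sketch of that command, assuming Solr is running locally, a core named films already exists, and the working directory is the Solr installation directory. The file name films.json is the one shipped in the Solr example/films directory; the function name and the POST_CMD override are illustrative.

```shell
# Post one JSON file to the films core.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
post_films_json() {
  "${POST_CMD:-bin/post}" -c films example/films/films.json
}
```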
The following command posts one CSV file in the example/films directory to the films core, prints a response to the console and exits.
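A sketch of that command under the same assumptions (Solr running locally, an existing films core, run from the installation directory). The -out yes option is what sends Solr's response to the console; the function name and POST_CMD are illustrative.

```shell
# Post one CSV file to the films core and print Solr's response.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
post_films_csv() {
  "${POST_CMD:-bin/post}" -c films -out yes example/films/films.csv
}
```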
The following command posts one XML file in the example/films directory to the films core on the non-default port 8984 and exits.
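A sketch of that command, assuming a Solr instance is listening on port 8984 and has a films core. The -p option overrides the default port of 8983; the function name and POST_CMD are illustrative.

```shell
# Post one XML file to the films core on port 8984.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
post_films_xml_alt_port() {
  "${POST_CMD:-bin/post}" -c films -p 8984 example/films/films.xml
}
```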
The following command posts a directory of files with mixed formats in the example/exampledocs directory to the techproducts collection and exits.
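A sketch of that command, assuming a techproducts collection already exists. Passing a directory instead of a file makes the tool post each file whose extension is in the -filetypes list; the function name and POST_CMD are illustrative.

```shell
# Post every supported file in example/exampledocs to the techproducts collection.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
post_exampledocs_dir() {
  "${POST_CMD:-bin/post}" -c techproducts example/exampledocs/
}
```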
The following command posts HTML files from the remote example.com/docs directory and two levels below it, pausing 20 seconds between requests and obeying the directory permissions in the website's robots.txt file.
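A sketch of that crawl. The text does not name the target collection, so the function takes it as an argument; the function name and POST_CMD are illustrative. The URL is written with an explicit http:// scheme so the script treats it as a web location rather than a local path.

```shell
# Crawl http://example.com/docs two levels deep, waiting 20 seconds between requests.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
crawl_example_docs() {
  collection="${1:?usage: crawl_example_docs <collection>}"   # target collection is the caller's choice
  "${POST_CMD:-bin/post}" -c "$collection" -recursive 2 -delay 20 http://example.com/docs
}
```

For instance, `crawl_example_docs webdocs` would index the crawl into a collection called webdocs, a made-up name used only for this example.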
Any of the following three options prints help information and usage for the post tool, similar to a man page: bin/post -h, bin/post -help, or bin/post -usage.
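As a small sketch, the three spellings can sit behind one helper. The function name post_help and the POST_CMD override are illustrative, not part of Solr.

```shell
# Print the post tool's usage text; -h, -help and -usage are interchangeable.
# POST_CMD is an assumed override for the script location; it defaults to bin/post.
post_help() {
  flag="${1:--h}"   # default to -h when no flag is given
  "${POST_CMD:-bin/post}" "$flag"
}
```

The output is long, so piping it through a pager, as in `post_help | less`, makes it easier to read.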