Index Schema configuration
The schema is the foundation of your index. It defines the structure of your data and the fields that can be searched. On Searchcraft Cloud, you create and manage the schema via the Vektron UI. If you are using the self-hosted version of Searchcraft, you need to create the schema via the REST API.
Index Properties
name
- The name of the index. Should be URL-friendly (i.e. no spaces or special characters).
fields
- Object containing the field definitions. Field objects may contain the properties listed below.
search_fields
- An array of default fields to search against when a specific field is not specified. Should match what is included in weight_multipliers. These must be text fields for the index to make use of fuzzy matching and typo tolerance.
weight_multipliers
- A map of field names to weight multipliers. The weight gives more or less importance to specific fields when running a search query. A number greater than 1.0 gives more importance to a field, and a number less than 1.0 reduces its importance. The baseline is 1.0. More information on weight multipliers can be found in the weight multipliers guide.
language
- The two-letter language code for the index. This is used for language-specific stemming and stop word filtering.
enable_language_stemming
- (optional) true|false boolean setting. Whether or not to enable a language-specific stemming algorithm. Requires that a language code is set for the index and that the language is supported.
exclude_stop_words
- (optional) true|false boolean setting. Whether to strip stop words when performing a search. On Searchcraft Cloud this is enabled by default. For self-hosted you can enable this by setting exclude_stop_words: true in your schema. If you enable it without specifying a language code, the en dictionary is loaded by default.
auto_commit_delay
- (optional) integer value. The number of seconds to wait after the last ingestion request before automatically committing a batch of documents. The Searchcraft API waits this amount of time and, if it has not received another ingestion request, commits the batch. This is useful if you don't want to explicitly call the commit endpoint after successfully POSTing documents; however, it does cede a level of control.
time_decay_field
- (optional) string value. Setting this option enables an exponential temporal decay function on document relevancy scoring. This must match the name of a date field that exists in your schema, typically a date_published field. This field is used to calculate the number of days since publication, which is used in the decay factor of the time decay function. Note that the chosen field must be marked as fast and indexed in your schema.
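As a rough illustration of how the index properties above fit together, the sketch below builds a schema as a Python dictionary and checks the constraints described in this section. The index name, the field names, and the check_schema helper are hypothetical, and the exact request body expected by the REST API is not covered here, so treat the layout as an assumption and confirm it against the REST API documentation.

```python
# Hypothetical schema; the index name and field names are illustrative only.
schema = {
    "name": "news_articles",  # URL-friendly: no spaces or special characters
    "language": "en",
    "enable_language_stemming": True,
    "exclude_stop_words": True,
    "auto_commit_delay": 5,  # seconds
    "time_decay_field": "date_published",
    "search_fields": ["title", "body"],
    "weight_multipliers": {"title": 2.0, "body": 0.7},  # baseline is 1.0
    "fields": {
        "title": {"type": "text", "required": True, "indexed": True, "stored": True},
        "body": {"type": "text", "indexed": True, "stored": True},
        "date_published": {
            "type": "datetime",
            "indexed": True,
            "stored": True,
            "fast": True,
        },
    },
}


def check_schema(schema):
    """Return a list of problems, based only on the rules stated in this section."""
    problems = []
    fields = schema.get("fields", {})
    search_fields = schema.get("search_fields", [])
    weights = schema.get("weight_multipliers", {})

    # search_fields should match what is included in weight_multipliers.
    if weights and set(search_fields) != set(weights):
        problems.append("search_fields and weight_multipliers do not match")

    # Default search fields must be text fields for fuzzy matching to apply.
    for name in search_fields:
        if fields.get(name, {}).get("type") != "text":
            problems.append(f"search field {name!r} is not a text field")

    # Stemming requires a language code on the index.
    if schema.get("enable_language_stemming") and not schema.get("language"):
        problems.append("enable_language_stemming is set without a language")

    # time_decay_field must name a date field marked as fast and indexed.
    decay = schema.get("time_decay_field")
    if decay is not None:
        spec = fields.get(decay)
        if spec is None or spec.get("type") != "datetime":
            problems.append(f"time_decay_field {decay!r} is not a datetime field")
        elif not (spec.get("fast") and spec.get("indexed")):
            problems.append(f"time_decay_field {decay!r} must be fast and indexed")

    return problems


print(check_schema(schema))  # prints []
```

Running a check like this before sending the schema can catch mismatches early, but it is only a client-side convenience and does not replace whatever validation the Searchcraft API performs.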
Field Properties
type
- The field type.
required
- Whether the field is required. If a document is missing a required field, the request is rejected.
indexed
- Whether the field should be indexed. If you are not searching on this field, you can disable indexing to save space. For fields such as text fields you can still do normal mode exact matches against the field using fieldname:value without indexing the field; you just can't do things such as range queries. One reason to use non-indexed fields is to have them available for search result display without having them impact relevancy. This is common for fields that contain data like page URLs, image URLs, etc.
stored
- Whether the value is returned in search result documents or just used during ingestion.
fast
- Whether or not the field should be used for fast scoring. Not used on text fields or facet fields. There is a trade-off of speed vs. accuracy when using fast fields. For those familiar with Lucene, this is similar to doc_values. For number fields you typically want this enabled unless you have a specific reason to disable it.
multi
- Whether or not the field is a multi-valued field. In the document JSON the expected value is an array of values, e.g. "tags": ["tag1", "tag2"].
See the Field Types page for details on which properties are available for each field type.
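To make the required and multi properties more concrete, here is a minimal sketch of a client-side pre-check that a document satisfies them before it is sent for ingestion. The field definitions and the validate_document helper are hypothetical, and the server's actual validation and error messages may differ.

```python
# Hypothetical field definitions using the properties described above.
fields = {
    "title": {"type": "text", "required": True, "indexed": True, "stored": True},
    "tags": {"type": "text", "multi": True, "indexed": True, "stored": True},
    # Stored for display in results, but not indexed so it does not affect relevancy.
    "image_url": {"type": "text", "indexed": False, "stored": True},
}


def validate_document(fields, doc):
    """Return a list of errors for missing required fields and non-array multi values."""
    errors = []
    for name, spec in fields.items():
        if name not in doc:
            if spec.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        # A multi-valued field expects an array of values in the document JSON.
        if spec.get("multi") and not isinstance(doc[name], list):
            errors.append(f"field {name} is multi-valued and expects an array")
    return errors


good = {"title": "Hello", "tags": ["tag1", "tag2"], "image_url": "/img/a.png"}
bad = {"tags": "tag1"}

print(validate_document(fields, good))  # prints []
print(validate_document(fields, bad))
```

The second call reports two errors: the missing title and the tags value that is a single string rather than an array.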
Field types
text
- A text field.
datetime
- A datetime field. Example: "2024-07-16T00:25:39Z". Could also be a Unix timestamp.
bool
- A boolean field. true or false without quotes.
f64
- A 64-bit floating point field. Example: 3.4
u64
- A 64-bit unsigned integer field. Example: 9
facet
- A facet field. Expects a format of “/section/subsection”. Think of this as a taxonomical category that can be walked down.
These are detailed in depth on the Field Types page.
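The field types above can be illustrated with a small sketch that checks whether a Python value is plausible for a given type and shows how a facet path can be walked level by level. Both helpers are hypothetical and are not part of Searchcraft. Accepting integers for f64 and parsing datetime strings as ISO 8601 are assumptions made for this example.

```python
from datetime import datetime


def facet_levels(path):
    """List each level of a facet path, from the top category down."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def matches_type(field_type, value):
    """Return True if value looks valid for the given field type."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "text":
        return isinstance(value, str)
    if field_type == "bool":
        return isinstance(value, bool)  # true or false without quotes
    if field_type == "f64":
        return is_number  # assumption: integers are accepted as floats
    if field_type == "u64":
        return is_number and isinstance(value, int) and 0 <= value < 2**64
    if field_type == "facet":
        return isinstance(value, str) and value.startswith("/")
    if field_type == "datetime":
        if is_number:
            return True  # treated as a Unix timestamp
        if isinstance(value, str):
            try:
                # Replace the trailing Z so older Python versions can parse it.
                datetime.fromisoformat(value.replace("Z", "+00:00"))
                return True
            except ValueError:
                return False
        return False
    raise ValueError(f"unknown field type: {field_type}")


print(matches_type("datetime", "2024-07-16T00:25:39Z"))  # prints True
print(matches_type("u64", -1))  # prints False
print(facet_levels("/section/subsection"))
# prints ['/section', '/section/subsection']
```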
API Endpoints
See the REST API documentation for more information on the API endpoints.