Index Schema configuration

The schema is the foundation of your index. It defines the structure of your data and the fields that can be searched. On Searchcraft Cloud you will create and manage the schema via the Vektron UI. If you are using the self-hosted version of Searchcraft you will need to create the schema via the REST API.

Index Properties

name - The name of the index. Should be URL-friendly (i.e. no spaces or special characters).
fields - Object containing the field definitions. Field objects may contain the properties listed below.
search_fields - An array of default fields to search against when a specific field is not specified. Should match what is included in weight_multipliers. These must be text fields for the index to make use of fuzzy matching and typo-tolerance.
weight_multipliers - A map of field names to weight multipliers. The weight gives more or less importance to specific fields when running a search query. Using a number greater than 1.0 gives more importance to a field and less than 1.0 reduces the importance of a field. The baseline is 1.0. More information on weight multipliers can be found in the weight multipliers guide.
language - The two-letter language code for the index. This is used for language-specific stemming and stop word filtering.
enable_language_stemming - (optional) true|false boolean setting. Whether to enable a language-specific stemming algorithm. Requires that a language code is set for the index and that the language is supported.
exclude_stop_words - (optional) true|false boolean setting. Whether to strip stop words when performing a search. On Searchcraft Cloud this is enabled by default. For self-hosted you can enable this by setting exclude_stop_words: true in your schema. If it is enabled and no language code is specified, the en dictionary is loaded by default.
auto_commit_delay - (optional) integer value. The number of seconds to wait after the last ingestion request before automatically committing a batch of documents. The Searchcraft API waits this amount of time and, if it has not received another ingestion request, commits the batch. This is useful if you don’t want to call the commit endpoint explicitly after successfully POSTing documents. However, it does cede some control over when commits happen.
time_decay_field - (optional) string value. Setting this option enables an exponential temporal decay function on document relevancy scoring. The value must match the name of a date field that exists in your schema, typically a date_published field. This field is used to calculate the number of days since publication, which feeds the decay factor of the time decay function. Note that the chosen field must be marked as fast and indexed in your schema.
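As a rough illustration of how these index properties relate to each other, here is a minimal Python sketch. The index name, field names, weights, and delay value are made up for the example, and check_index_config is a hypothetical helper, not part of Searchcraft. It only encodes the constraints stated above: search_fields should match weight_multipliers, search fields should be text fields, and time_decay_field should point at a fast, indexed date field.

```python
# Hypothetical index definition using only the properties described above.
# All names and values here are illustrative assumptions.
schema = {
    "name": "books",
    "language": "en",
    "enable_language_stemming": True,
    "exclude_stop_words": True,
    "auto_commit_delay": 5,
    "time_decay_field": "date_published",
    "search_fields": ["title", "description"],
    "weight_multipliers": {"title": 2.0, "description": 0.5},
    "fields": {
        "title": {"type": "text", "required": True, "indexed": True, "stored": True},
        "description": {"type": "text", "indexed": True, "stored": True},
        "date_published": {"type": "datetime", "indexed": True, "stored": True, "fast": True},
    },
}


def check_index_config(schema):
    """Return a list of problems found in the index-level settings."""
    problems = []
    fields = schema.get("fields", {})
    search_fields = schema.get("search_fields", [])

    # Default search fields should exist and be text fields.
    for name in search_fields:
        if name not in fields:
            problems.append(f"search field '{name}' is not defined in fields")
        elif fields[name].get("type") != "text":
            problems.append(f"search field '{name}' is not a text field")

    # search_fields should match the keys of weight_multipliers.
    if set(search_fields) != set(schema.get("weight_multipliers", {})):
        problems.append("search_fields and weight_multipliers list different fields")

    # time_decay_field must name a date field marked as fast and indexed.
    decay = schema.get("time_decay_field")
    if decay is not None:
        field = fields.get(decay, {})
        if field.get("type") != "datetime":
            problems.append(f"time_decay_field '{decay}' is not a datetime field")
        elif not (field.get("fast") and field.get("indexed")):
            problems.append(f"time_decay_field '{decay}' must be fast and indexed")
    return problems


print(check_index_config(schema))  # prints []
```

Running a check like this before sending a schema to the API can catch mismatches early, though the API itself remains the authority on what is accepted.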

Field Properties

type - The field type.
required - Whether the field is required. If a document is missing a required field the request will be rejected.
indexed - Whether the field should be indexed. If you are not searching on this field you can disable indexing to save space. For fields such as text fields you can still do normal mode exact matches against the field using fieldname:value without indexing the field; you just can’t do things such as range queries. One reason to use non-indexed fields is to have them available for search result display without having them impact relevancy. This is common for fields that contain data like page URLs, image URLs, etc.
stored - Whether or not the value is returned in search result documents or just used during ingestion.
fast - Whether or not the field should be used for fast scoring. Fast fields in Searchcraft are a way to store and retrieve structured data efficiently, especially for filtering and aggregations. Think of them as an optimized way to store numeric or categorical values so they can be accessed quickly without scanning the entire index.

For example, if you have an e-commerce bookstore site with fields release_date, rating, and author_id, you might want to do the following:
1. Filtering: “Find books published after 2015.” or “Find all books written by a specific author.” Since release_date and author_id are fast fields, Searchcraft can efficiently scan them without checking every document.
2. Sorting: “Show top-rated books first.” With rating as a fast field, sorting is much faster because Searchcraft doesn’t need to extract the value from each document dynamically.

Fast fields store this data in a columnar format (like a spreadsheet where each column is optimized for fast lookups) instead of keeping it mixed with the full text. This makes queries like filtering and sorting much faster compared to searching through the raw text index. The fast property is not used on text fields or facet fields. For those familiar with Lucene, this is similar to doc_values. If you want to be able to perform range or comparison queries, this needs to be enabled. For number fields you typically want this enabled unless you have a specific reason to disable it.
multi - Whether or not the field is a multi-valued field. In the document JSON the expected value is an array of values, e.g. "tags": ["tag1", "tag2"].
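To make the columnar idea behind fast fields more concrete, here is a toy Python sketch of the bookstore example. It is not how Searchcraft is implemented internally. It only shows why keeping one array per field makes filtering and sorting cheap. The sample documents are invented, and release_date is simplified to a year for readability.

```python
# Toy documents using the field names from the bookstore example.
# release_date is simplified to a year; values are invented for illustration.
documents = [
    {"title": "A", "release_date": 2012, "rating": 4.1, "author_id": 7},
    {"title": "B", "release_date": 2018, "rating": 4.8, "author_id": 3},
    {"title": "C", "release_date": 2021, "rating": 3.9, "author_id": 7},
]

# Columnar layout: one list per fast field; position i belongs to document i.
columns = {
    name: [doc[name] for doc in documents]
    for name in ("release_date", "rating", "author_id")
}


def filter_greater_than(column, threshold):
    """Return ids of documents whose value in this column exceeds threshold."""
    return [doc_id for doc_id, value in enumerate(column) if value > threshold]


def sort_descending(column, doc_ids):
    """Order document ids by their value in this column, highest first."""
    return sorted(doc_ids, key=lambda doc_id: column[doc_id], reverse=True)


# "Find books published after 2015", then "show top-rated books first".
recent = filter_greater_than(columns["release_date"], 2015)
top_rated = sort_descending(columns["rating"], recent)
print([documents[i]["title"] for i in top_rated])  # prints ['B', 'C']
```

Both operations touch only the columns they need and never read the full documents, which is the property the spreadsheet analogy describes.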

See the Field Types page for details on which properties are available for each field type.

Field Types

text - A text field.
datetime - A datetime field. Example: "2024-07-16T00:25:39Z". A Unix timestamp is also accepted.
bool - A boolean field. true or false without quotes.
f64 - A 64-bit floating point field. Example: 3.4
u64 - A 64-bit unsigned integer field. Example: 9
facet - A facet field. Expects a format of “/section/subsection”. Think of this as a taxonomical category that can be walked down.

These are detailed in depth on the Field Types page.
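The following Python sketch shows one way a client could pre-check a document against its field definitions before ingestion, using the field types and the required and multi properties described on this page. The helper names, the sample fields, and the exact acceptance rules (for example, accepting integers for f64 and only the timestamp format shown above for datetime strings) are assumptions made for illustration. The Searchcraft API may accept or reject values differently.

```python
from datetime import datetime


def matches_type(value, field_type):
    """Check one value against a field type (rules here are assumptions)."""
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if field_type == "text":
        return isinstance(value, str)
    if field_type == "bool":
        return isinstance(value, bool)
    if field_type == "f64":
        return is_int or isinstance(value, float)
    if field_type == "u64":
        return is_int and 0 <= value < 2**64
    if field_type == "facet":
        return isinstance(value, str) and value.startswith("/")
    if field_type == "datetime":
        if is_int:
            return True  # treated as a Unix timestamp
        if isinstance(value, str):
            try:
                datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
                return True
            except ValueError:
                return False
    return False


def validate_document(document, fields):
    """Return a list of errors for a document given its field definitions."""
    errors = []
    for name, spec in fields.items():
        if name not in document:
            if spec.get("required"):
                errors.append(f"missing required field '{name}'")
            continue
        value = document[name]
        if spec.get("multi"):
            if not isinstance(value, list):
                errors.append(f"field '{name}' is multi-valued and expects an array")
                continue
            values = value
        else:
            values = [value]
        for item in values:
            if not matches_type(item, spec["type"]):
                errors.append(f"field '{name}' has a value that is not a valid {spec['type']}")
    return errors


# Hypothetical field definitions and a matching document.
fields = {
    "title": {"type": "text", "required": True},
    "date_published": {"type": "datetime"},
    "in_stock": {"type": "bool"},
    "rating": {"type": "f64"},
    "page_count": {"type": "u64"},
    "category": {"type": "facet"},
    "tags": {"type": "text", "multi": True},
}
document = {
    "title": "Example",
    "date_published": "2024-07-16T00:25:39Z",
    "in_stock": True,
    "rating": 3.4,
    "page_count": 9,
    "category": "/section/subsection",
    "tags": ["tag1", "tag2"],
}
print(validate_document(document, fields))  # prints []
```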

API Endpoints

See the REST API documentation for more information on the API endpoints.