
Content Ingestion Overview

Content ingestion is how your content gets into Searchcraft. Content can be ingested through either a push or a pull mechanism. Search engines typically use a push mechanism, because it keeps your index up to date as soon as content changes. For applications that are updated through a CMS, a hook tied to the publish event is typically what triggers a content push into your search index.
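
As a rough illustration of the publish-hook pattern described above, the sketch below maps a CMS publish event to a one-document JSON payload and hands it to a push function. The event keys (post_id, headline, content) and the function names are hypothetical and not part of Searchcraft or any particular CMS; the document fields id, title and body follow the sample schema shown later on this page.

```python
import json


def document_from_publish_event(event):
    """Map a hypothetical CMS publish event to a flat search document."""
    return {
        "id": str(event["post_id"]),   # hypothetical event key
        "title": event["headline"],    # hypothetical event key
        "body": event["content"],      # hypothetical event key
    }


def on_publish(event, push):
    """Build a JSON array payload for the published item and pass it to `push`.

    `push` is any callable that accepts the payload string, for example a
    function that POSTs it to your index's documents endpoint.
    """
    payload = json.dumps([document_from_publish_event(event)])
    push(payload)
    return payload
```

Passing the sender in as an argument keeps the hook easy to test, since a list's append method can stand in for the real HTTP call.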

Prerequisites

Your index schema must be configured before you can ingest content.

Pushing content into your search index

You may use one of our pre-built integrations, upload directly via Vektron, or call the REST API yourself.

The REST API example below assumes your index schema is defined as follows:

{
  "override_if_exists": false,
  "index": {
    "auto_commit_delay": 1,
    "name": "my_first_index",
    "language": "en",
    "search_fields": [
      "title",
      "body"
    ],
    "fields": {
      "id": {
        "type": "text",
        "required": true,
        "stored": true,
        "indexed": false
      },
      "title": {
        "type": "text",
        "required": true,
        "stored": true
      },
      "body": {
        "type": "text",
        "required": true,
        "stored": true
      }
    },
    "weight_multipliers": {
      "title": 2,
      "body": 0.7
    }
  }
}

With that schema, the following request adds a sample document to your index:

curl -X POST -H "Content-Type: application/json" -H "Authorization: your-ingest-key" --data '[{"id": "1", "title": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", "body": "Maecenas sed mauris commodo ligula porttitor euismod a vitae nunc. Nam placerat consequat arcu, ut consectetur nisi feugiat eget. Nam in tellus vel ligula cursus sollicitudin non id ex. Praesent sollicitudin ultrices tempor."}]' https://your-sc-server.search.searchcraft.io/index/1_my_first_index/documents

If auto_commit_delay is set to 1, as in the schema above, the document will be committed to the index immediately. Otherwise, you will need to follow up with a commit request to write the document to the index:

curl -X POST -H "Content-Type: application/json" -H "Authorization: your-ingest-key" https://your-sc-server.search.searchcraft.io/index/1_my_first_index/commit
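
For reference, the two curl calls above could be expressed in Python roughly as follows. This is a minimal sketch using only the standard library; the host, index name and key are the placeholders from the curl examples, and the helper names are hypothetical. The functions only build the requests and do not send anything.

```python
import json
import urllib.request

# Placeholders copied from the curl examples above.
BASE_URL = "https://your-sc-server.search.searchcraft.io"
INDEX = "1_my_first_index"
INGEST_KEY = "your-ingest-key"


def build_ingest_request(documents):
    """Build the POST request that adds a list of documents to the index."""
    return urllib.request.Request(
        f"{BASE_URL}/index/{INDEX}/documents",
        data=json.dumps(documents).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": INGEST_KEY,
        },
        method="POST",
    )


def build_commit_request():
    """Build the follow-up POST request that commits pending documents."""
    return urllib.request.Request(
        f"{BASE_URL}/index/{INDEX}/commit",
        headers={
            "Content-Type": "application/json",
            "Authorization": INGEST_KEY,
        },
        method="POST",
    )
```

To actually send a request, pass it to urllib.request.urlopen, for example urllib.request.urlopen(build_ingest_request(docs)), followed by the commit request when a manual commit is needed.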

Pulling content via a scheduled crawl

Searchcraft can also map a data feed to your index and crawl it on a schedule so that content is added automatically. This capability may be configured in your application settings within Vektron.

Recommendations

Document Size

Ideally, you should remove any unnecessary fields from your documents. The only fields that you need to store in Searchcraft are those that may be used for search terms, for search filters, or for display within a search result item. The more fields you have in your documents, the more data needs to be transferred and processed, which can slow down your search experience.
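
One way to apply this before ingestion is to filter each document down to an allow-list of fields. The sketch below assumes the allow-list matches the sample schema on this page (id, title, body); the function and constant names are hypothetical.

```python
# Fields to keep; assumed here to match the sample schema on this page.
KEEP_FIELDS = frozenset({"id", "title", "body"})


def prune_document(doc, keep=KEEP_FIELDS):
    """Return a copy of `doc` containing only the fields listed in `keep`."""
    return {name: value for name, value in doc.items() if name in keep}
```

For example, a CMS record that also carries internal notes or revision history would be reduced to just its id, title and body before being sent.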

Ingestion Payloads

Ingestion can be a heavy process. For the initial population of your index, we recommend sending large batches of content in a single request rather than many small requests. For Searchcraft Cloud customers, JSON ingest payloads should ideally be kept under 150MB in size. For self-hosted customers, the limit depends on the amount of RAM available on the server.
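
A large initial load can be split into a few big payloads that each stay under a size ceiling. The following is a minimal sketch, not an official client: the function name is hypothetical, and it assumes that 150MB means 150,000,000 bytes and that the limit applies to the encoded JSON body.

```python
import json

# Assumption: "150MB" is read as decimal megabytes of encoded JSON.
MAX_PAYLOAD_BYTES = 150 * 1000 * 1000


def batch_documents(documents, max_bytes=MAX_PAYLOAD_BYTES):
    """Yield JSON array payloads (bytes), each no larger than `max_bytes`."""
    batch = []
    size = 2  # the enclosing "[" and "]"
    for doc in documents:
        encoded = json.dumps(doc).encode("utf-8")
        if len(encoded) + 2 > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        extra = len(encoded) + (1 if batch else 0)  # 1 byte for the comma
        if batch and size + extra > max_bytes:
            # Current batch is full: emit it and start a new one.
            yield b"[" + b",".join(batch) + b"]"
            batch, size = [], 2
            extra = len(encoded)
        batch.append(encoded)
        size += extra
    if batch:
        yield b"[" + b",".join(batch) + b"]"
```

Each yielded payload can be used directly as the body of a documents request, so the number of requests stays small while no single request exceeds the ceiling.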