Searchcraft Core
Searchcraft Core is the stand-alone back-end API, without the Searchcraft Cloud Vektron UI and its associated microservices. It allows developers to run the API locally, or companies to run Searchcraft Core as part of their on-premise infrastructure. For production use, contact Searchcraft for licensing and support agreements.
Running via Docker
Download either the latest image tag or a specific version from Docker Hub.

```shell
docker pull searchcraftinc/searchcraft-core:latest
docker run --name searchcraft -p 8000:8000 searchcraftinc/searchcraft-core:latest
```

Add `-d` to run the container in detached mode.
The image is a multi-architecture image for arm64 and amd64; when you docker pull on the destination system, the appropriate platform is selected automatically. Full environment variable configuration options are listed in the documentation. The default port is 8000, but you can change that, and any other setting, by passing environment variables to the container.
```shell
docker run --name searchcraft -p 80:80 -e SEARCHCRAFT_PORT=80 searchcraftinc/searchcraft-core:latest
```

Running the CLI directly
If you require running the Searchcraft Core binary directly rather than via Docker, reach out.

```shell
./searchcraft <arguments>
```
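As a sketch, several of the arguments documented below can be combined in a single invocation (the port, log level, and backup path shown here are illustrative values, not defaults you must use):

```shell
# Start Searchcraft Core on port 9000 with debug logging,
# taking an automatic backup every 24 hours into a custom directory.
./searchcraft \
  --port 9000 \
  --log-level debug \
  --backup-interval 24 \
  --backup-directory /var/lib/searchcraft/backups
```

Every flag can equivalently be supplied through its environment variable (for example `SEARCHCRAFT_PORT=9000`), which is usually more convenient under Docker.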
Arguments
--log-level <LOG_LEVEL> The log level filter; any logs above this level won't be displayed.
Options: trace, debug, info, warn, error
Default: info
[env: SEARCHCRAFT_LOG_LEVEL=]
--disable-request-logs If set to true, disables the request logging middleware
[env: SEARCHCRAFT_DISABLE_REQUEST_LOGS=]
--disable-log-timestamps If set to true, removes the timestamps from the log output
[env: SEARCHCRAFT_DISABLE_LOG_TIMESTAMPS=]
--disable-log-color Disables ANSI colors in the log output
[env: SEARCHCRAFT_DISABLE_LOG_COLOR=]
--host <HOST> The host to bind to (default: '0.0.0.0')
[env: SEARCHCRAFT_HOST=] [default: 0.0.0.0]
-p, --port <PORT> The port to bind the server to
[env: SEARCHCRAFT_PORT=] [default: 8000]
--authorization-code <AUTHORIZATION_CODE> The system-level auth key.
If specified, this will enable auth mode and require a bearer token on every endpoint.
This key is used to make tokens with given permissions.
Careful with this, in the wrong hands a starship could be made to lower its shields.
[env: SEARCHCRAFT_AUTHORIZATION_CODE]
-t, --runtime-threads <RUNTIME_THREADS> The number of threads to use for the tokio runtime.
If this is not set, the number of logical cores on the machine is used.
[env: SEARCHCRAFT_THREADS=]
--restore-backup <RESTORE_BACKUP> Load a backup and use its data.
This expects `./index` to either not be present or to be empty.
This is a separate sub-command and Searchcraft will shut down after loading the backup.
--backup-interval <BACKUP_INTERVAL> The interval time in hours to take an automatic backup.
This is resource intensive and should typically be set to at least 24 hours.
The backup will be saved in the directory provided by `--backup-directory`.
[env: SEARCHCRAFT_BACKUP_INTERVAL=]
--create-backup Generates a backup containing the contents of this Searchcraft instance.
The backup will be saved in the directory provided by `--backup-directory`.
This is a separate sub-command and Searchcraft will shut down after creating the backup.
--backup-directory <BACKUP_DIRECTORY> The filesystem directory where backups will be stored. Defaults to "./backups"
[env: SEARCHCRAFT_BACKUP_DIRECTORY=] [default: ./backups]
--backup-retention <BACKUP_RETENTION> Maximum number of days to retain backup files
[env: SEARCHCRAFT_BACKUP_RETENTION=] [default: 7]
-m, --max-result-limit <MAX_RESULT_LIMIT> Max limit for the number of results returned by a search query. This is different from the default query limit used when no limit parameter is provided; it is the absolute highest limit for the number of results returned on this instance. The maximum value is 65_535, but you should typically use a lower value. If a query request sends a limit parameter higher than this value, the limit will be reduced to this value
[env: SEARCHCRAFT_MAX_RESULT_LIMIT=] [default: 200]
--clickhouse-db-url <CLICKHOUSE_DB_URL> The database url of ClickHouse.
This is automatically available to you with Searchcraft Cloud; for self-hosted deployments you will need to set up your own ClickHouse cluster to use this functionality.
[env: SEARCHCRAFT_CLICKHOUSE_DB_URL=]
--clickhouse-db-name <CLICKHOUSE_DB_NAME> The name of the ClickHouse database
[env: SEARCHCRAFT_CLICKHOUSE_DB_NAME=] [default: searchcraft_measure]
--cache-enabled Enable shared search-result and summary caching
[env: SEARCHCRAFT_CACHE_ENABLED=]
[default: true]
--cache-ttl-seconds <CACHE_TTL_SECONDS> Cache TTL (time-to-live) in seconds for the query cache. Requires cache_enabled=true
[env: SEARCHCRAFT_CACHE_TTL_SECONDS=] [default: 300]
--cache-max-memory-mb <CACHE_MAX_MEMORY_MB> Maximum memory usage for the query cache in megabytes. Requires cache_enabled=true
[env: SEARCHCRAFT_CACHE_MAX_MEMORY_MB=] [default: 512]
-h, --help Print help (see a summary with '-h')
-V, --version Print version

Cache Configuration
Searchcraft Core includes a shared query cache for standard search requests and AI-generated search summaries.
The cache is enabled by default and can be configured with the following environment variables:
SEARCHCRAFT_CACHE_ENABLED
SEARCHCRAFT_CACHE_TTL_SECONDS
SEARCHCRAFT_CACHE_MAX_MEMORY_MB
Cache entries are tied to the current index revision so cached responses are not reused after an index reload changes the underlying search state.
Example Docker configuration
```shell
docker run --name searchcraft -p 8000:8000 \
  -e SEARCHCRAFT_CACHE_ENABLED=true \
  -e SEARCHCRAFT_CACHE_TTL_SECONDS=300 \
  -e SEARCHCRAFT_CACHE_MAX_MEMORY_MB=512 \
  searchcraftinc/searchcraft-core:latest
```

ClickHouse (Optional)
Searchcraft Core can optionally connect to a ClickHouse database to capture analytics events such as searches, document interactions, and AI summary usage. Running ClickHouse is completely optional: if you do not need analytics, you do not need to stand up a ClickHouse cluster, and you can leave the ClickHouse arguments unset.
When using Searchcraft Cloud a ClickHouse cluster is provisioned and connected for you automatically. For self-hosted deployments you are responsible for provisioning and connecting your own cluster.
To enable analytics, point Searchcraft Core at your ClickHouse cluster:
SEARCHCRAFT_CLICKHOUSE_DB_URL: the ClickHouse connection URL.
SEARCHCRAFT_CLICKHOUSE_DB_NAME: the database name (defaults to searchcraft_measure).
Example Docker configuration
```shell
docker run --name searchcraft -p 8000:8000 \
  -e SEARCHCRAFT_CLICKHOUSE_DB_URL=http://clickhouse.internal:8123 \
  -e SEARCHCRAFT_CLICKHOUSE_DB_NAME=searchcraft_measure \
  searchcraftinc/searchcraft-core:latest
```

Database Schema
Searchcraft Core expects the following schema in your ClickHouse cluster. Create the database and event_table before starting Searchcraft Core with ClickHouse enabled.
```sql
CREATE DATABASE IF NOT EXISTS searchcraft_measure;

-- Use the new database
USE searchcraft_measure;

-- Base Event table where all events are inserted
CREATE TABLE IF NOT EXISTS event_table (
    date_recorded DateTime DEFAULT now(),
    organization_id LowCardinality(String),
    application_id LowCardinality(String),
    index_name LowCardinality(String),
    federation_name LowCardinality(String),
    event_name LowCardinality(String),
    event_id UUID DEFAULT generateUUIDv4(),
    user_id String,
    user_type LowCardinality(String) DEFAULT 'anonymous',
    country LowCardinality(String),
    city String,
    device_id String,
    latitude Float64,
    longitude Float64,
    client_ip String,
    locale LowCardinality(String),
    os LowCardinality(String),
    region LowCardinality(String),
    sdk_name LowCardinality(String),
    sdk_version LowCardinality(String),
    platform LowCardinality(String),
    search_term String,
    search_kind LowCardinality(String),
    ai_provider LowCardinality(String) DEFAULT '',
    number_of_documents UInt32,
    external_document_id String,
    document_position UInt32,
    session_id String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/event_table', '{replica}')
PARTITION BY toYYYYMM(date_recorded)
ORDER BY (organization_id, application_id, index_name, date_recorded)
TTL date_recorded + INTERVAL 60 MINUTE TO DISK 'default';
```

For detail on the individual measure event properties recorded to this table, see Measure Events.
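One way to apply the schema, sketched with the standard clickhouse-client tool: this assumes the CREATE statements above are saved locally as `schema.sql` and that your cluster is reachable at a hostname like `clickhouse.internal` (both are assumptions for illustration).

```shell
# Apply the schema (schema.sql holds the CREATE statements above).
clickhouse-client --host clickhouse.internal --multiquery < schema.sql

# Verify the table exists before starting Searchcraft Core
# with SEARCHCRAFT_CLICKHOUSE_DB_URL set.
clickhouse-client --host clickhouse.internal \
  --query "EXISTS TABLE searchcraft_measure.event_table"
```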
Security
If you are running Searchcraft Core in an environment that is open to the internet, you must enable key-based authentication on your instance. This is enabled by starting Searchcraft Core with a token for the admin key via --authorization-code. Ingestion keys (used to write documents) should ONLY be used for backend-to-backend communication and are not safe to expose in front-end code such as a React SPA. Keys with read-level permissions should be used for front-end code. See Access Keys for more information.
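A minimal sketch of enabling auth mode under Docker, using the documented SEARCHCRAFT_AUTHORIZATION_CODE environment variable. The way the code is generated and the curl endpoint path are illustrative assumptions, not documented routes:

```shell
# Enable auth mode by supplying a system-level authorization code
# (here generated randomly; manage it like any other secret).
docker run --name searchcraft -p 8000:8000 \
  -e SEARCHCRAFT_AUTHORIZATION_CODE="$(openssl rand -hex 32)" \
  searchcraftinc/searchcraft-core:latest

# With auth mode enabled, every endpoint requires a bearer token.
# NOTE: <endpoint> is a placeholder, not an actual route.
curl -H "Authorization: Bearer <your-token>" http://localhost:8000/<endpoint>
```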
Ready to commit?
While Searchcraft Core is fully featured, you may want the additional capabilities of Searchcraft Cloud, which include our easy-to-use Vektron UI for relevancy tuning and analytics, as well as fully managed hosting. If you want to use Searchcraft Core in production, contact us for support plan pricing.