
Searchcraft Core

Searchcraft Core is the standalone back-end core API, without the Searchcraft Cloud Vektron UI and its associated micro-services. It allows developers to run the API locally, or companies to run Searchcraft Core as part of their on-premise infrastructure. For production use, contact Searchcraft for licensing and support agreements.

Download either the latest image tag or a specific version from Docker Hub.

Terminal window
docker pull searchcraftinc/searchcraft-core:latest
docker run --name searchcraft -p 8000:8000 searchcraftinc/searchcraft-core:latest

Add -it -d to run the container in detached mode.

The image is a multi-architecture image for arm64 and amd64; when you docker pull on the destination system, the appropriate platform is selected automatically. Full environment variable configuration options are listed in the documentation. The default port is 8000, but you can change it, and any other setting, by passing environment variables to the container.

Terminal window
docker run --name searchcraft -p 80:80 -e SEARCHCRAFT_PORT=80 searchcraftinc/searchcraft-core:latest
If you need to run the Searchcraft Core binary directly rather than via Docker, reach out.

Terminal window
./searchcraft <arguments>
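As a sketch, the documented flags below can be passed directly to the binary; the values here are purely illustrative:

```shell
# Illustrative invocation — flag values are examples, not recommendations
./searchcraft --port 8080 --log-level debug --backup-directory ./backups
```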

--log-level <LOG_LEVEL>
The log level filter; any logs more verbose than this level won't be displayed.
Options: trace, debug, info, warn, error
Default: info
[env: SEARCHCRAFT_LOG_LEVEL=]
--disable-request-logs
If set to true, disables the request logging middleware
[env: SEARCHCRAFT_DISABLE_REQUEST_LOGS=]
--disable-log-timestamps
If set to true, removes the timestamps from the log output
[env: SEARCHCRAFT_DISABLE_LOG_TIMESTAMPS=]
--disable-log-color
Disables ansi colors in the log output
[env: SEARCHCRAFT_DISABLE_LOG_COLOR=]
--host <HOST>
The host to bind to (default: '0.0.0.0')
[env: SEARCHCRAFT_HOST=]
[default: 0.0.0.0]
-p, --port <PORT>
The port to bind the server to
[env: SEARCHCRAFT_PORT=]
[default: 8000]
--authorization-code <AUTHORIZATION_CODE>
The system level auth key.
If specified this will enable auth mode and require a token bearer on every endpoint.
This key is used to make tokens with given permissions.
Careful with this, in the wrong hands a starship could be made to lower its shields.
[env: SEARCHCRAFT_AUTHORIZATION_CODE=]
-t, --runtime-threads <RUNTIME_THREADS>
The number of threads to use for the tokio runtime.
If this is not set, the number of logical cores on the machine is used.
[env: SEARCHCRAFT_THREADS=]
--restore-backup <RESTORE_BACKUP>
Load a backup and use its data.
This expects `./index` either not to be present or to be empty.
This is a separate sub-command and Searchcraft will shut down after loading the backup.
--backup-interval <BACKUP_INTERVAL>
The interval, in hours, at which to take an automatic backup.
This is resource intensive and should typically be set to at least 24 hours.
The backup will be saved in the directory provided by `--backup-directory`.
[env: SEARCHCRAFT_BACKUP_INTERVAL=]
--create-backup
Generates a backup containing the contents of this Searchcraft instance.
The backup will be saved in the directory provided by `--backup-directory`.
This is a separate sub-command and Searchcraft will shut down after creating the backup.
--backup-directory <BACKUP_DIRECTORY>
The filesystem directory where backups will be stored. Defaults to "./backups"
[env: SEARCHCRAFT_BACKUP_DIRECTORY=]
[default: ./backups]
--backup-retention <BACKUP_RETENTION>
Maximum number of days to retain backup files
[env: SEARCHCRAFT_BACKUP_RETENTION=]
[default: 7]
-m, --max-result-limit <MAX_RESULT_LIMIT>
Maximum limit for the number of results returned by a search query. This is different from the default query limit used when no limit parameter is provided; it is the absolute highest number of results this instance will return. The maximum value is 65,535, but you should typically use a lower value. If a query request sends a limit parameter higher than this value, the limit will be reduced to this value.
[env: SEARCHCRAFT_MAX_RESULT_LIMIT=]
[default: 200]
--clickhouse-db-url <CLICKHOUSE_DB_URL>
The database URL of ClickHouse.
This is automatically available to you with Searchcraft Cloud; for self-hosted deployments you will need to set up your own ClickHouse cluster to use this functionality.
[env: SEARCHCRAFT_CLICKHOUSE_DB_URL=]
--clickhouse-db-name <CLICKHOUSE_DB_NAME>
The name of the ClickHouse database
[env: SEARCHCRAFT_CLICKHOUSE_DB_NAME=]
[default: searchcraft_measure]
--cache-enabled
Enable shared search-result and summary caching
[env: SEARCHCRAFT_CACHE_ENABLED=]
[default: true]
--cache-ttl-seconds <CACHE_TTL_SECONDS>
Cache TTL (time-to-live) in seconds for the query cache. Requires cache_enabled=true
[env: SEARCHCRAFT_CACHE_TTL_SECONDS=]
[default: 300]
--cache-max-memory-mb <CACHE_MAX_MEMORY_MB>
Maximum memory usage for the query cache in megabytes. Requires cache_enabled=true
[env: SEARCHCRAFT_CACHE_MAX_MEMORY_MB=]
[default: 512]
-h, --help
Print help (see a summary with '-h')
-V, --version
Print version
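The interaction between `--max-result-limit` and a query's `limit` parameter described above can be sketched as a simple clamp. Note that the default query limit used here (10) is an assumed placeholder, not a value documented on this page:

```python
def clamp_limit(requested_limit, default_limit=10, max_result_limit=200):
    """Sketch of the effective result limit for a query.

    If no limit parameter is provided, fall back to the default query
    limit (default_limit=10 is an assumption for illustration); otherwise
    cap the request at the instance-wide --max-result-limit.
    """
    if requested_limit is None:
        return default_limit
    return min(requested_limit, max_result_limit)
```

For example, a request with `limit=500` against the default `--max-result-limit` of 200 would be served at most 200 results.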

Searchcraft Core includes a shared query cache for standard search requests and AI-generated search summaries.

The cache is enabled by default and can be configured with the following environment variables:

  • SEARCHCRAFT_CACHE_ENABLED
  • SEARCHCRAFT_CACHE_TTL_SECONDS
  • SEARCHCRAFT_CACHE_MAX_MEMORY_MB

Cache entries are tied to the current index revision so cached responses are not reused after an index reload changes the underlying search state.
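The revision-aware TTL behavior described above can be sketched in a few lines. This is an illustrative model only, not Searchcraft's actual implementation; the class and method names are hypothetical:

```python
import time


class RevisionedQueryCache:
    """Sketch of a TTL cache keyed by (index revision, query).

    Bumping the revision after an index reload makes all older
    entries unreachable, so stale results are never served.
    """

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # (revision, query) -> (expires_at, result)

    def put(self, revision, query, result, now=None):
        now = time.monotonic() if now is None else now
        self.store[(revision, query)] = (now + self.ttl, result)

    def get(self, revision, query, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get((revision, query))
        if entry is None:
            return None
        expires_at, result = entry
        if now >= expires_at:
            # Expired: evict and treat as a miss
            del self.store[(revision, query)]
            return None
        return result
```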

Terminal window
docker run --name searchcraft -p 8000:8000 \
-e SEARCHCRAFT_CACHE_ENABLED=true \
-e SEARCHCRAFT_CACHE_TTL_SECONDS=300 \
-e SEARCHCRAFT_CACHE_MAX_MEMORY_MB=512 \
searchcraftinc/searchcraft-core:latest

Searchcraft Core can optionally connect to a ClickHouse database to capture analytics events such as searches, document interactions, and AI summary usage. Running ClickHouse is completely optional — if you do not need analytics, you do not need to stand up a ClickHouse cluster, and you can leave the ClickHouse arguments unset.

When using Searchcraft Cloud a ClickHouse cluster is provisioned and connected for you automatically. For self-hosted deployments you are responsible for provisioning and connecting your own cluster.

To enable analytics, point Searchcraft Core at your ClickHouse cluster:

  • SEARCHCRAFT_CLICKHOUSE_DB_URL — the ClickHouse connection URL.
  • SEARCHCRAFT_CLICKHOUSE_DB_NAME — the database name (defaults to searchcraft_measure).
Terminal window
docker run --name searchcraft -p 8000:8000 \
-e SEARCHCRAFT_CLICKHOUSE_DB_URL=http://clickhouse.internal:8123 \
-e SEARCHCRAFT_CLICKHOUSE_DB_NAME=searchcraft_measure \
searchcraftinc/searchcraft-core:latest

Searchcraft Core expects the following schema in your ClickHouse cluster. Create the database and event_table before starting Searchcraft Core with ClickHouse enabled.

CREATE DATABASE IF NOT EXISTS searchcraft_measure;
-- Use the new database
USE searchcraft_measure;
-- Base event table where all events are inserted
CREATE TABLE IF NOT EXISTS event_table (
    date_recorded DateTime DEFAULT now(),
    organization_id LowCardinality(String),
    application_id LowCardinality(String),
    index_name LowCardinality(String),
    federation_name LowCardinality(String),
    event_name LowCardinality(String),
    event_id UUID DEFAULT generateUUIDv4(),
    user_id String,
    user_type LowCardinality(String) DEFAULT 'anonymous',
    country LowCardinality(String),
    city String,
    device_id String,
    latitude Float64,
    longitude Float64,
    client_ip String,
    locale LowCardinality(String),
    os LowCardinality(String),
    region LowCardinality(String),
    sdk_name LowCardinality(String),
    sdk_version LowCardinality(String),
    platform LowCardinality(String),
    search_term String,
    search_kind LowCardinality(String),
    ai_provider LowCardinality(String) DEFAULT '',
    number_of_documents UInt32,
    external_document_id String,
    document_position UInt32,
    session_id String
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/event_table', '{replica}')
PARTITION BY toYYYYMM(date_recorded)
ORDER BY (organization_id, application_id, index_name, date_recorded)
TTL date_recorded + INTERVAL 60 MINUTE TO DISK 'default'

For details on the individual measure event properties recorded to this table, see Measure Events.
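Once events are flowing, the table can be queried directly in ClickHouse. The following is a sketch only; the `event_name` value `'search'` is an assumption, so check Measure Events for the actual event names recorded:

```sql
-- Count search events per day (event name 'search' is assumed)
SELECT
    toDate(date_recorded) AS day,
    count() AS searches
FROM searchcraft_measure.event_table
WHERE event_name = 'search'
GROUP BY day
ORDER BY day;
```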

If you are running Searchcraft Core in an environment that is open to the internet, you must enable key-based authentication on your instance. This is enabled by starting Searchcraft Core with a token for the admin key via --authorization-code. Ingestion keys (used to write documents) should ONLY be used for backend-to-backend communication and are not safe to expose in front-end code such as a React SPA. Keys with read-level permissions should be used for front-end code. See Access Keys for more information.
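As a sketch, auth mode can be enabled by passing the documented SEARCHCRAFT_AUTHORIZATION_CODE environment variable; the key value and the `<read-key>` and `<endpoint>` placeholders below are illustrative, not real values:

```shell
# Start with auth mode enabled (the key value here is a placeholder)
docker run --name searchcraft -p 8000:8000 \
  -e SEARCHCRAFT_AUTHORIZATION_CODE=my-admin-key \
  searchcraftinc/searchcraft-core:latest

# Every request must then carry a bearer token; use a read-level key
# in front-end code, never an ingestion key
curl -H "Authorization: Bearer <read-key>" http://localhost:8000/<endpoint>
```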

While Searchcraft Core is fully featured, you may want the additional features of Searchcraft Cloud, which includes our easy-to-use Vektron UI for relevancy tuning and analytics, as well as fully managed hosting. For those wanting to use Searchcraft Core in production, contact us for support plan pricing.