Generation

Permission required: generation

Fire LLM generations programmatically.

spindle.generate.raw(input)

Direct generation: you supply the messages yourself and pick the provider and model, either directly or through a specific connection profile.

const result = await spindle.generate.raw({
  messages: [
    { role: 'user', content: 'Summarize this text: ...' },
  ],
  parameters: { temperature: 0.3, max_tokens: 200 },
  connection_id: 'optional-connection-id',
})
// result: { content: string, finish_reason: string, usage: { ... } }

spindle.generate.quiet(input)

Uses the user's active connection profile and preset parameters.

const result = await spindle.generate.quiet({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
})

spindle.generate.batch(input)

Run multiple generation requests in a single call, sequentially by default or in parallel with concurrent: true.

const results = await spindle.generate.batch({
  requests: [
    { messages: [...], provider: 'openai', model: 'gpt-4o' },
    { messages: [...], provider: 'openai', model: 'gpt-4o' },
  ],
  concurrent: true,
})
// results: Array<{ index, success, content?, error? }>
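
Each result carries the index of the request it belongs to, so a minimal sketch for consuming the array (using the shape shown above) looks like this:

for (const r of results) {
  if (r.success) {
    spindle.log.info(`Request ${r.index} succeeded (${r.content?.length ?? 0} chars)`)
  } else {
    spindle.log.error(`Request ${r.index} failed: ${r.error}`)
  }
}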

GenerationRequestDTO

Field Type Description
messages LlmMessageDTO[] The message array to send
parameters Record<string, unknown> Optional LLM parameters (temperature, max_tokens, etc.)
connection_id string Optional. Use a specific connection profile (see Connection Profiles below)
signal AbortSignal Optional. Cancel the in-flight LLM request when the signal fires (see Cancellation below)

Cancellation

Every generation method (raw, quiet, batch, rawStream, quietStream) accepts an optional AbortSignal. When the signal fires, the upstream LLM HTTP request is torn down and the call rejects with a standard DOMException whose .name === "AbortError".

The signal is consumed inside the extension worker and never crosses the wire. When abort fires, the worker posts an internal cancel_generation message to the host, which calls controller.abort() on the AbortController it created for the upstream provider call.

const controller = new AbortController()
const timer = setTimeout(() => controller.abort(), 5_000)

try {
  const result = await spindle.generate.raw({
    messages: [{ role: 'user', content: 'Write a long essay…' }],
    signal: controller.signal,
  })
  // result: { content, finish_reason, usage }
} catch (err) {
  if (err.name === 'AbortError') {
    spindle.log.info('Generation cancelled')
  } else {
    throw err
  }
} finally {
  clearTimeout(timer)
}

Compose with AbortSignal.timeout() and AbortSignal.any() for richer cancellation semantics:

const userController = new AbortController()
const signal = AbortSignal.any([
  userController.signal,
  AbortSignal.timeout(30_000),
])

await spindle.generate.quiet({ messages, signal })

For batch, the same signal is threaded into every sub-request. Aborting mid-flight cancels the in-flight call and prevents any not-yet-started sequential calls from beginning. With concurrent: true, every parallel call sees the abort.
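
For example, a single controller can bound the whole batch. This is a sketch that assumes the signal is passed on the batch input itself, alongside requests:

const controller = new AbortController()
setTimeout(() => controller.abort(), 30_000) // give the whole batch 30 seconds

const results = await spindle.generate.batch({
  requests: [
    { messages: [{ role: 'user', content: 'Task A' }], provider: 'openai', model: 'gpt-4o' },
    { messages: [{ role: 'user', content: 'Task B' }], provider: 'openai', model: 'gpt-4o' },
  ],
  concurrent: true,
  signal: controller.signal, // threaded into every sub-request
})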


Streaming

Stream tokens incrementally as the LLM emits them, instead of waiting for the full response. rawStream and quietStream mirror their non-streaming counterparts but return an AsyncGenerator<StreamChunkDTO> that you can iterate with for await.

The generator yields one or more token / reasoning chunks and exactly one terminal done chunk carrying the aggregated response. If the call fails or is aborted, the generator throws instead of yielding done.

spindle.generate.rawStream(input)

let acc = ''
for await (const chunk of spindle.generate.rawStream({
  provider: 'openai',
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Tell me a story.' }],
})) {
  if (chunk.type === 'token') {
    acc += chunk.token
    process.stdout.write(chunk.token)
  } else if (chunk.type === 'reasoning') {
    spindle.log.info(`[thinking] ${chunk.token}`)
  } else if (chunk.type === 'done') {
    spindle.log.info(`Final usage: ${chunk.usage?.total_tokens} tokens`)
    spindle.log.info(`finish_reason: ${chunk.finish_reason}`)
  }
}

spindle.generate.quietStream(input)

Same semantics as rawStream, but uses the user's active connection profile and preset parameters (no provider/model required).

for await (const chunk of spindle.generate.quietStream({
  messages: [{ role: 'user', content: 'Hello!' }],
})) {
  if (chunk.type === 'token') process.stdout.write(chunk.token)
}

Cancelling a stream

rawStream / quietStream accept the same AbortSignal as the non-streaming methods. The generator throws AbortError on abort. You can also break out of the for await loop early — the generator's cleanup posts a cancel message to tear down the upstream request.

const controller = new AbortController()
setTimeout(() => controller.abort(), 2_000)

try {
  for await (const chunk of spindle.generate.rawStream({
    provider: 'openai',
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Long answer…' }],
    signal: controller.signal,
  })) {
    if (chunk.type === 'token') process.stdout.write(chunk.token)
  }
} catch (err) {
  if (err.name === 'AbortError') spindle.log.info('Stream cancelled')
  else throw err
}

// Or break early — same effect, no AbortController needed:
for await (const chunk of spindle.generate.quietStream({ messages })) {
  if (chunk.type === 'token' && shouldStop()) break // host receives cancel
}

StreamChunkDTO

A discriminated union with three variants:

type Fields Description
"token" token: string Incremental content token.
"reasoning" token: string Incremental chain-of-thought token (provider-dependent).
"done" content: string, reasoning?: string, finish_reason: string, tool_calls?: ToolCallDTO[], usage?: { prompt_tokens, completion_tokens, total_tokens } Terminal chunk — emitted exactly once on success. Carries the aggregated response so you don't need to accumulate manually if you don't want to.

No batchStream

Batch is just a wrapper around N raw calls. If you want parallel streamed responses, start a rawStream for each request and consume each iterator in its own async task, then await them together (for example with Promise.all), as sketched below.
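
A minimal sketch (the collect helper is illustrative, not part of the API):

async function collect(input) {
  let text = ''
  for await (const chunk of spindle.generate.rawStream(input)) {
    if (chunk.type === 'token') text += chunk.token
  }
  return text
}

const [a, b] = await Promise.all([
  collect({ provider: 'openai', model: 'gpt-4o', messages: [{ role: 'user', content: 'Task A' }] }),
  collect({ provider: 'openai', model: 'gpt-4o', messages: [{ role: 'user', content: 'Task B' }] }),
])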


Dry Run (Prompt Assembly)

Run the full prompt assembly pipeline — macros, world info, context filters, memory retrieval, token counting — without actually calling the LLM. Useful for prompt debugging, token budget analysis, and previewing what the model will see.

spindle.generate.dryRun(input, userId?)

const result = await spindle.generate.dryRun({
  chatId: 'chat-id',
}, userId) // userId required for operator-scoped extensions

spindle.log.info(`Provider: ${result.provider}, Model: ${result.model}`)
spindle.log.info(`Assembled ${result.messages.length} messages`)
spindle.log.info(`Breakdown: ${result.breakdown.length} blocks`)

if (result.tokenCount) {
  spindle.log.info(`Total tokens: ${result.tokenCount.total_tokens}`)
}

if (result.worldInfoStats) {
  spindle.log.info(`WI entries activated: ${result.worldInfoStats.activatedAfterBudget}`)
}

if (result.memoryStats?.enabled) {
  spindle.log.info(`Memory chunks retrieved: ${result.memoryStats.chunksRetrieved}`)
}

You can optionally override the connection, persona, preset, or generation type:

const result = await spindle.generate.dryRun({
  chatId: 'chat-id',
  connectionId: 'specific-connection',   // default: user's default connection
  personaId: 'specific-persona',         // default: user's active/default persona
  presetId: 'specific-preset',           // default: connection's linked preset
  generationType: 'continue',            // default: 'normal'
  parameters: { temperature: 0.8 },      // override sampler params
}, userId)

DryRunRequestDTO

Field Type Description
chatId string Required. The chat to assemble the prompt for.
connectionId string Optional. Use a specific connection profile.
personaId string Optional. Use a specific persona.
presetId string Optional. Use a specific preset.
generationType string Optional. One of "normal", "continue", "regenerate", "swipe", "impersonate".
parameters Record<string, unknown> Optional. Override sampler parameters.

dryRun also accepts a second argument:

Argument Type Description
userId string Required for operator-scoped extensions. The user ID to scope the dry run to. For user-scoped extensions, this is inferred automatically and can be omitted.

DryRunResultDTO

Field Type Description
messages LlmMessageDTO[] The fully assembled message array that would be sent to the LLM.
breakdown AssemblyBreakdownEntryDTO[] Ordered list of prompt blocks showing how the prompt was built.
parameters Record<string, unknown> Final merged sampler parameters.
model string The model that would be used.
provider string The provider that would be used.
tokenCount DryRunTokenCountDTO Optional. Per-block token counts (if a tokenizer is available).
worldInfoStats ActivationStatsDTO Optional. World info activation statistics.
memoryStats MemoryStatsDTO Optional. Long-term memory retrieval statistics.

AssemblyBreakdownEntryDTO

Each entry represents one block in the assembled prompt:

Field Type Description
type string Block type: "block", "chat_history", "world_info", "authors_note", "utility", "long_term_memory", "separator", "append", "sidecar", "extension".
name string Human-readable block name.
role string Message role ("system", "user", "assistant").
content string The resolved text content.
blockId string Preset block ID (if from a preset block).
extensionId string Present for interceptor-injected breakdown blocks. Resolved from the installed extension manifest.
extensionName string Human-readable extension attribution for interceptor-injected breakdown blocks.

When an interceptor returns breakdown: [{ messageIndex, name? }], the host turns those referenced messages into type: "extension" breakdown entries. This means retrieval or prompt-engineering extensions can expose their injected context in both dry-run results and persisted prompt breakdown snapshots without having to parse or diff the final prompt themselves.
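
For example, a dry run can surface which blocks were injected by extensions. A sketch using the fields above:

const result = await spindle.generate.dryRun({ chatId: 'chat-id' }, userId)

for (const entry of result.breakdown) {
  if (entry.type === 'extension') {
    spindle.log.info(`${entry.extensionName ?? entry.extensionId}: ${entry.name} (${entry.content.length} chars)`)
  }
}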

ActivationStatsDTO

Field Type Description
totalCandidates number Total WI entries considered.
activatedBeforeBudget number Entries that matched before budget enforcement.
activatedAfterBudget number Entries included after budget enforcement.
evictedByBudget number Entries dropped due to budget limits.
evictedByMinPriority number Entries dropped due to minimum priority threshold.
estimatedTokens number Approximate total WI tokens (chars/4).
recursionPassesUsed number Number of keyword-chaining recursion passes.

MemoryStatsDTO

Field Type Description
enabled boolean Whether long-term memory is active.
chunksRetrieved number Number of memory chunks included.
chunksAvailable number Total chunks in the vector store.
chunksPending number Chunks awaiting vectorization.
injectionMethod string How memories were injected: "macro", "fallback", or "disabled".
queryPreview string Truncated query text used for vector search.
settingsSource string Whether settings came from "global" or "per_chat" overrides.

Tip

Dry run mirrors the exact assembly pipeline used during real generation (macros, world info, context filters, memory) but skips the council execution and LLM call. It's the fastest way to debug prompt construction.


Structured Output

Some providers support native structured output, ensuring the LLM response conforms to a JSON schema. Pass provider-specific parameters via the parameters field.

Google Gemini

Use responseMimeType and responseSchema to request structured JSON output:

const result = await spindle.generate.raw({
  messages: [
    { role: 'user', content: 'Extract the character name and age from: "Alice is 25 years old."' },
  ],
  parameters: {
    responseMimeType: 'application/json',
    responseSchema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        age: { type: 'integer' },
      },
      required: ['name', 'age'],
    },
  },
  connection_id: 'my-gemini-connection',
})
// result.content: '{"name": "Alice", "age": 25}'

responseJsonSchema is accepted as an alias for responseSchema.
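
Since the structured payload comes back as a JSON string in result.content, parse it before use (a minimal sketch):

let data
try {
  data = JSON.parse(result.content)
} catch {
  spindle.log.error('Model did not return valid JSON')
}
// data: { name: 'Alice', age: 25 }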

OpenAI-compatible

Use the standard response_format parameter:

const result = await spindle.generate.raw({
  messages: [
    { role: 'user', content: 'Extract the character name and age.' },
  ],
  parameters: {
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'character_info',
        schema: {
          type: 'object',
          properties: {
            name: { type: 'string' },
            age: { type: 'integer' },
          },
          required: ['name', 'age'],
        },
      },
    },
  },
  connection_id: 'my-openai-connection',
})

Anthropic

Anthropic uses tool definitions for structured output. Define a tool with the desired output schema and set tool_choice to force it:

const result = await spindle.generate.raw({
  messages: [
    { role: 'user', content: 'Extract the character name and age.' },
  ],
  parameters: {
    tools: [{
      name: 'extract_info',
      description: 'Extract structured character information',
      input_schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          age: { type: 'integer' },
        },
        required: ['name', 'age'],
      },
    }],
    tool_choice: { type: 'tool', name: 'extract_info' },
  },
  connection_id: 'my-anthropic-connection',
})

Tip

Provider-specific parameters are passed through to the underlying API. Any parameter not explicitly handled by Lumiverse is forwarded directly, so you can use provider-specific features even if they aren't documented here.


Stream Observation

Observe an in-flight LLM generation in real time. observe() subscribes to all generation lifecycle events for a specific chat, accumulates streamed content and reasoning tokens automatically, and exposes them through a simple callback API.

spindle.generate.observe(chatId)

Returns a GenerationObserver that filters events to the given chat.

const observer = spindle.generate.observe('chat-uuid')

observer.onStart((info) => {
  spindle.log.info(`Generation started: ${info.model}`)
})

observer.onToken((token) => {
  // Called for every streamed token (content and reasoning)
  if (token.type === 'reasoning') {
    spindle.log.info(`[thinking] ${token.token}`)
  }
})

observer.onEnd((result) => {
  if (result.error) {
    spindle.log.error(`Generation failed: ${result.error}`)
  } else {
    spindle.log.info(`Done — ${observer.content.length} chars`)
  }
  observer.dispose()
})

observer.onStop((result) => {
  spindle.log.info(`Stopped early — partial: ${observer.content.length} chars`)
  observer.dispose()
})

At any point during streaming you can read the accumulated state:

observer.content    // all content tokens concatenated
observer.reasoning  // all reasoning tokens concatenated
observer.generationId  // active generation ID, or null if idle

Always call dispose()

The observer subscribes to four event channels internally. Call observer.dispose() when you no longer need it to unsubscribe and free resources.

GenerationObserver

Property / Method Type Description
onStart(handler) (info: GenerationStartedPayloadDTO) => void Called when a generation begins on this chat
onToken(handler) (token: StreamTokenPayloadDTO) => void Called for each streamed token
onEnd(handler) (result: GenerationEndedPayloadDTO) => void Called when the generation completes or errors
onStop(handler) (result: GenerationStoppedPayloadDTO) => void Called when the user stops the generation
content string (readonly) Accumulated content tokens
reasoning string (readonly) Accumulated reasoning/CoT tokens
generationId string \| null (readonly) Active generation ID
dispose() () => void Unsubscribe from all events

GenerationStartedPayloadDTO

Field Type Description
generationId string Unique generation ID
chatId string Chat this generation belongs to
model string Model being used
targetMessageId string Optional. ID of the message being generated/regenerated
characterId string Optional. Target character ID
characterName string Optional. Target character name

StreamTokenPayloadDTO

Field Type Description
generationId string Generation this token belongs to
chatId string Chat ID
token string The text chunk
seq number Monotonic sequence number (for deduplication)
type "reasoning" Optional. Present for chain-of-thought tokens

GenerationEndedPayloadDTO

Field Type Description
generationId string Generation ID
chatId string Chat ID
messageId string ID of the saved message (absent on error)
content string Final generated content (absent on error)
error string Error message (absent on success)

GenerationStoppedPayloadDTO

Field Type Description
generationId string Generation ID
chatId string Chat ID
content string Partial content accumulated before the stop

Raw event subscription

If you need lower-level control (e.g. observing multiple chats, or only specific events), you can subscribe to the generation events directly. These are fully typed when using lumiverse-spindle-types:

const unsub = spindle.on('STREAM_TOKEN_RECEIVED', (payload) => {
  // payload is typed as StreamTokenPayloadDTO
  console.log(payload.token, payload.seq)
})

// Clean up when done
unsub()

Available generation events: GENERATION_STARTED, STREAM_TOKEN_RECEIVED, GENERATION_ENDED, GENERATION_STOPPED.


Connection Profiles

Extensions with the generation permission can discover and inspect the user's connection profiles. This lets you present a UI for selecting which LLM provider/model to use, or programmatically pick the right connection for your use case.

Connection profiles are returned as safe ConnectionProfileDTO objects — API keys are never exposed (only a has_api_key boolean).

spindle.connections.list(userId?)

List all connection profiles available to the user.

const connections = await spindle.connections.list()
// connections: Array<{ id, name, provider, model, is_default, has_api_key, ... }>

const defaultConn = connections.find(c => c.is_default)
if (defaultConn) {
  const result = await spindle.generate.quiet({
    messages: [{ role: 'user', content: 'Hello' }],
    connection_id: defaultConn.id,
  })
}

spindle.connections.get(connectionId, userId?)

Get a single connection profile by ID. Returns null if not found.

const conn = await spindle.connections.get('some-connection-id')
if (conn) {
  spindle.log.info(`Using ${conn.provider} / ${conn.model}`)
}

ConnectionProfileDTO

Field Type Description
id string Unique connection profile ID
name string Human-readable display name
provider string LLM provider identifier (e.g. "openai", "anthropic")
api_url string Custom API URL (empty string for default)
model string Selected model identifier
preset_id string \| null Associated generation preset
is_default boolean Whether this is the user's default connection
has_api_key boolean Whether an API key is configured (key itself is never exposed)
metadata Record<string, unknown> Provider-specific metadata
created_at number Unix timestamp
updated_at number Unix timestamp

Note

For user-scoped extensions, the userId parameter is automatically inferred from the extension owner. For operator-scoped extensions, pass userId to scope the query to a specific user.
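
For example, an operator-scoped extension passes the target user's ID explicitly, while a user-scoped one can omit it:

// Operator-scoped: act on behalf of a specific user.
const theirConnections = await spindle.connections.list(userId)

// User-scoped: the owner is inferred automatically.
const myConnections = await spindle.connections.list()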