Context Compaction

Context compaction automatically manages a conversation's context window so that sessions do not fail as they approach the model's token limit. When a conversation grows large, older messages are summarised, while recent messages and critical artefacts (code blocks, file paths, error messages) are preserved verbatim.

How it works

  1. Monitoring — The platform tracks token usage as messages accumulate in a thread
  2. Triggering — When usage exceeds the threshold (default 80% of the model's context limit), compaction begins
  3. Summarisation — Older messages are summarised by an LLM, keeping:
    • The most recent messages verbatim (default: last 10)
    • Code blocks and file paths
    • Error messages and stack traces
  4. Fallback — If summarisation fails, the system falls back to truncation (removing the oldest messages)
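The first three steps can be sketched roughly as follows. This is a minimal illustration, not BasePeak's implementation: `compact_thread`, `token_count`, and `summarise` are hypothetical names, messages are modelled as plain strings, and the summariser is assumed to be responsible for keeping code blocks, file paths, and error messages intact.

```python
from typing import Callable, List


def compact_thread(
    messages: List[str],
    token_count: Callable[[str], int],    # hypothetical tokenizer hook
    summarise: Callable[[List[str]], str],  # hypothetical LLM summariser
    trigger_tokens: int,
    verbatim_window: int = 10,
) -> List[str]:
    """Return a compacted copy of `messages` (illustrative sketch only)."""
    # Step 1, monitoring: measure the thread's current token usage.
    used = sum(token_count(m) for m in messages)

    # Step 2, triggering: do nothing while usage is at or below the trigger.
    if used <= trigger_tokens:
        return list(messages)

    # Step 3, summarisation: keep the most recent messages verbatim
    # and replace everything older with a single summary message.
    split = max(len(messages) - verbatim_window, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return list(messages)  # nothing old enough to summarise
    return [summarise(older)] + recent
```

With 15 messages and the default window of 10, the five oldest messages are replaced by one summary and the remaining ten are returned unchanged.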

What you see

When compaction is running, a Context Compaction progress indicator appears in the chat UI, similar to a tool-call display. Compaction is otherwise transparent — conversations continue without interruption.

Default settings

| Setting | Default | Description |
| --- | --- | --- |
| Trigger threshold | 80% | Compaction starts when the thread reaches 80% of the model's context limit |
| Verbatim window | 10 messages | The most recent 10 messages are always kept in full |
| Target after compaction | 65% | After compaction, the thread is reduced to approximately 65% of the context limit |

Understanding the threshold

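The threshold is applied after a response buffer is subtracted. The platform reserves approximately 8% of the context for the model's response, so the threshold percentage applies to the remaining 92% rather than to the full context limit. With the default settings this works out to about 73.6% of the raw limit. The effective trigger point is: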

effective_trigger = context_limit × (1 - 0.08) × threshold_percent

For a 128K-token model with the default 80% threshold:

  • Effective limit: 128,000 × 0.92 = 117,760 tokens
  • Trigger point: 117,760 × 0.80 = 94,208 tokens
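The same arithmetic can be written as a small helper for checking other context sizes. This is only a sketch of the formula above; the function name and parameter names are made up for illustration, and the 8% buffer is the approximate figure quoted in this section.

```python
def effective_trigger(
    context_limit: int,
    threshold: float = 0.80,        # default trigger threshold
    response_buffer: float = 0.08,  # approximate share reserved for the response
) -> int:
    """Token count at which compaction would start, per the formula above."""
    effective_limit = context_limit * (1 - response_buffer)
    # Round to the nearest whole token to absorb floating-point noise.
    return round(effective_limit * threshold)


print(effective_trigger(128_000))  # 94208, matching the worked example
```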

Graceful degradation

If summarisation fails (for example, because the configured model is temporarily unavailable):

  1. The system retries once after a short delay
  2. It then falls back to truncating the oldest messages
  3. The fallback is logged internally

Chat sessions continue functioning even when compaction degrades to truncation mode.
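The retry-then-truncate behaviour described in this section might look something like the sketch below. It is an assumption-laden illustration rather than the platform's code: `compact_with_fallback`, `summarise`, `token_count`, `target_tokens`, and the one-second default delay are all hypothetical, and the standard `logging` module stands in for whatever internal logging the platform uses.

```python
import logging
import time
from typing import Callable, List

logger = logging.getLogger("compaction_sketch")  # hypothetical logger name


def compact_with_fallback(
    older: List[str],
    recent: List[str],
    summarise: Callable[[List[str]], str],  # hypothetical LLM summariser
    token_count: Callable[[str], int],      # hypothetical tokenizer hook
    target_tokens: int,
    retry_delay: float = 1.0,               # assumed value for the "short delay"
) -> List[str]:
    """Summarise `older`; retry once, then fall back to truncation."""
    for attempt in range(2):  # the first attempt plus a single retry
        try:
            return [summarise(older)] + recent
        except Exception as exc:
            logger.warning("summarisation attempt %d failed: %s", attempt + 1, exc)
            if attempt == 0:
                time.sleep(retry_delay)

    # Fallback: drop the oldest messages until the thread fits the target.
    kept = older + recent
    while len(kept) > 1 and sum(token_count(m) for m in kept) > target_tokens:
        kept.pop(0)
    logger.warning("fell back to truncation, kept %d messages", len(kept))
    return kept
```

Either path returns a usable message list, which is why the chat session can continue even when the summariser is unavailable.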

Note: Context compaction settings are managed at the platform level by BasePeak. Contact BasePeak support if you need the defaults adjusted for your workspace.