Synthesis & hypotheses
Beyond storing facts, knomit actively maintains and grows the corpus. All of it
runs through one engine, driven by either the MCP tools (knomit_review,
knomit_hypothesize) or the HTTP synthesis-run endpoints. The work is
work-stealing: it borrows
cycles from the calling model, one work item at a time, rather than running a
separate headless service.
The pipeline
A session presents one work item at a time; the model responds, the next item is served, until the phase completes. Sessions track three independent axes:
- status (lifecycle): active · completed · abandoned
- phase (workflow): work · reflect · done (advanced by atomic CAS transitions)
- effort (the discovery dial): normal · medium · high
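The three axes and the compare-and-swap phase transition can be sketched roughly as follows. This is a minimal illustration, not knomit's implementation: the `Session` class, its field names, and the `cas_phase` method are hypothetical, and a lock stands in for whatever atomic primitive the real store uses.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Session:
    # Three independent axes; the value sets come from the list above.
    status: str = "active"   # active | completed | abandoned
    phase: str = "work"      # work | reflect | done
    effort: str = "normal"   # normal | medium | high
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def cas_phase(self, expected: str, new: str) -> bool:
        """Set phase to `new` only if it still equals `expected`.

        Returns False when another caller has already advanced the phase,
        so two concurrent callers cannot both apply the same transition.
        """
        with self._lock:
            if self.phase != expected:
                return False
            self.phase = new
            return True


session = Session(effort="high")
first = session.cas_phase("work", "reflect")   # succeeds
second = session.cas_phase("work", "reflect")  # stale expectation, rejected
```

Because the axes are independent, advancing the phase in this sketch leaves `status` and `effort` untouched.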
knomit_review — prune · distill · reflect
| Stage | What it does |
|---|---|
| Prune (dedup) | Detects near-duplicate facts and merges them. Tiebreak: a non-hypothesis always wins; then higher confidence; then more sources. Domains and entities are unioned. |
| Distill (synthesis) | Clusters related facts and distills them into a higher-order synthesis fact. Evidence weight uses SumProductNorm = Σ(c·s) / (Σ(c·s)+1); hypothesis sources are excluded from the weight. |
| Reflect (methodology) | Reflects on hypothesis→outcome transitions to record reasoning lessons as methodology facts. Reinforcement appends the methodology’s path to the transition fact’s refs — git is the only source of truth, no side-channel counter. New-methodology proposals are hard-capped (KNOMIT_REFLECT_PROPOSE_CAP, default 1) and gated by a novelty/cosine floor (KNOMIT_REFLECT_NOVELTY_THRESHOLD). |
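The Prune tiebreak from the table can be expressed as a lexicographic comparison. The sketch below is illustrative only: the `Fact` class and its fields are hypothetical, and keeping the winner's sources unchanged is an assumption, since the table only states that domains and entities are unioned.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fact:
    # Hypothetical shape; the real fact schema is not shown in this page.
    path: str
    is_hypothesis: bool
    confidence: float
    sources: frozenset
    domains: frozenset = frozenset()
    entities: frozenset = frozenset()


def merge_duplicates(a: Fact, b: Fact) -> Fact:
    """Merge two near-duplicate facts using the documented tiebreak order."""

    def rank(f: Fact):
        # 1) non-hypothesis wins, 2) higher confidence, 3) more sources.
        return (not f.is_hypothesis, f.confidence, len(f.sources))

    winner = max(a, b, key=rank)
    loser = b if winner is a else a
    # Domains and entities are unioned; everything else comes from the winner.
    return replace(
        winner,
        domains=winner.domains | loser.domains,
        entities=winner.entities | loser.entities,
    )


hyp = Fact("h.md", True, 0.9, frozenset({"s1", "s2"}),
           frozenset({"db"}), frozenset({"pg"}))
obs = Fact("o.md", False, 0.4, frozenset({"s3"}),
           frozenset({"ops"}), frozenset({"pg", "redis"}))
merged = merge_duplicates(hyp, obs)
```

Here the observation wins despite its lower confidence, because the non-hypothesis rule is checked first.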
knomit_review does not generate new hypotheses.
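The Distill evidence weight, SumProductNorm = Σ(c·s) / (Σ(c·s)+1), can be worked through with a short sketch. The page does not define c and s; the code assumes c is a source fact's confidence and s is a second per-source factor such as a similarity or strength score. The function name and tuple layout are hypothetical.

```python
def sum_product_norm(sources):
    """Evidence weight for a synthesis fact.

    `sources` is an iterable of (c, s, is_hypothesis) tuples. Hypothesis
    sources are skipped, as the Distill row states. What c and s stand for
    is an assumption of this sketch.
    """
    total = sum(c * s for c, s, is_hypothesis in sources if not is_hypothesis)
    return total / (total + 1.0)


weight = sum_product_norm([
    (0.9, 0.8, False),  # contributes 0.72
    (0.5, 0.6, False),  # contributes 0.30
    (1.0, 1.0, True),   # hypothesis, excluded from the weight
])
# Sum is 1.02, so the weight is 1.02 / 2.02, roughly 0.505.
```

With non-negative inputs the weight stays in [0, 1): no evidence gives 0, and each additional source raises it with diminishing returns.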
RAPTOR — multi-depth distillation
Distillation is recursive: synthesis facts can themselves be clustered and distilled again at greater depth, RAPTOR-style. This runs through the same work-item queue; the item’s priority orders the depth, so deeper summaries build on shallower ones without a separate scheduler.
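One way to picture priority-ordered depth is a single heap keyed on depth. This is a sketch under assumptions: the `WorkQueue` class is hypothetical, and it assumes a lower priority value means a shallower item that is served first, which the page does not spell out.

```python
import heapq
import itertools


class WorkQueue:
    """Single queue in which priority doubles as distillation depth."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps insertion order within a depth

    def push(self, depth: int, cluster_id: str) -> None:
        heapq.heappush(self._heap, (depth, next(self._seq), cluster_id))

    def pop(self):
        depth, _, cluster_id = heapq.heappop(self._heap)
        return depth, cluster_id

    def __len__(self) -> int:
        return len(self._heap)


queue = WorkQueue()
queue.push(2, "summary-of-summaries")
queue.push(1, "cluster-a")
queue.push(1, "cluster-b")
order = [queue.pop() for _ in range(len(queue))]
```

All depth-1 items drain before the depth-2 item is served, so the deeper summary can draw on the shallower ones without a second scheduler.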
knomit_hypothesize — generate predictions
Walks synthesis facts on the agent branch and, per item, lets the model decide
whether to write a falsifiable hypothesis fact (skipping is the expected outcome
for most). It is a distinct, user-initiated operation — never an auto-follow-up
to review. On a later dedup collision with a confirmed observation, the
hypothesis is retracted and the observation links to it via refs.
Methodology in the loop
When the model reasons, knomit can surface relevant past methodology.
RelevantMethodology ranks methodology facts by a composite score
(0.6·vector + 0.4·tag_overlap, floored at KNOMIT_METHODOLOGY_MIN_SCORE,
default 0.15). Methodology facts are identified by type=methodology, never by
path. The server never auto-cites — it injects candidate methodology into the
prompt and the model decides.
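The composite ranking can be illustrated with a small sketch. Several details are assumptions rather than documented behavior: tag overlap is computed here as Jaccard similarity, the floor is treated as a cutoff that drops low-scoring candidates, and the candidate dictionary keys and function names are hypothetical.

```python
def tag_overlap(query_tags, fact_tags) -> float:
    """Jaccard overlap of two tag sets (an assumed definition)."""
    q, f = set(query_tags), set(fact_tags)
    if not q or not f:
        return 0.0
    return len(q & f) / len(q | f)


def relevant_methodology(candidates, query_tags, min_score=0.15):
    """Rank methodology facts by 0.6 * vector + 0.4 * tag_overlap."""
    scored = []
    for fact in candidates:
        # Identified by type, never by path.
        if fact["type"] != "methodology":
            continue
        overlap = tag_overlap(query_tags, fact["tags"])
        score = 0.6 * fact["vector_sim"] + 0.4 * overlap
        if score >= min_score:  # plays the role of KNOMIT_METHODOLOGY_MIN_SCORE
            scored.append((score, fact["path"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


candidates = [
    {"path": "m/a", "type": "methodology", "vector_sim": 0.5,
     "tags": ["dedup", "git"]},
    {"path": "m/b", "type": "methodology", "vector_sim": 0.1,
     "tags": ["latency"]},
    {"path": "notes/methodology/c", "type": "observation", "vector_sim": 0.9,
     "tags": ["dedup"]},
]
ranked = relevant_methodology(candidates, ["dedup"])
```

In this example only `m/a` survives: `m/b` scores 0.06 and falls under the floor, and the third candidate is skipped because its type is not methodology even though its path contains the word. The ranked list would then be offered to the model as candidates rather than cited automatically.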
LLM provider
Synthesis and distillation use an LLM, configured by [llm] (default provider
gemini, model gemini-2.5-flash). Embeddings are a separate, local model — see
Embeddings and Configuration.