Flagship case study

A CI failure analysis system that remembers prior incidents and routes the next fix faster.

ForgeBeyond parses failed runs, separates the primary failure from downstream noise, checks failure memory, and writes a PR/MR-native explanation with confidence and next action.


Working product shape

Problem: Recurring CI failures force engineers to reconstruct context from logs, git blame, Slack, and memory.
System: Deterministic parsers and taxonomy first; bounded AI and memory only when evidence supports it.
Output: PR/MR comment with root cause, owner hint, confidence, prior incident recall, and next action.

Inputs

The engine starts with artifacts, not vibes.

JUnit XML, pytest output, TAP, raw build logs, workflow YAML, git diff, CODEOWNERS, prior FMOs
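As one illustration of starting from artifacts, the sketch below pulls failed and errored test cases out of a JUnit XML report using only the Python standard library. The sample report, the `failed_cases` helper, and the returned field names are hypothetical and are not taken from ForgeBeyond itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit report used only for this example.
SAMPLE_JUNIT = """<testsuite name="api" tests="3" failures="1" errors="1">
  <testcase classname="tests.test_auth" name="test_login" time="0.01"/>
  <testcase classname="tests.test_auth" name="test_token" time="0.02">
    <failure message="assert 401 == 200">stack trace omitted</failure>
  </testcase>
  <testcase classname="tests.test_db" name="test_connect" time="0.00">
    <error message="ConnectionRefusedError">stack trace omitted</error>
  </testcase>
</testsuite>"""


def failed_cases(junit_xml: str) -> list:
    """Return one dict per test case that has a <failure> or <error> child."""
    root = ET.fromstring(junit_xml)
    results = []
    # iter() handles both a bare <testsuite> root and a <testsuites> wrapper.
    for case in root.iter("testcase"):
        for kind in ("failure", "error"):
            node = case.find(kind)
            if node is not None:
                results.append({
                    "test": f"{case.get('classname')}::{case.get('name')}",
                    "kind": kind,
                    "message": node.get("message", ""),
                })
                break  # record each test case at most once
    return results


cases = failed_cases(SAMPLE_JUNIT)
```

Passing test cases are skipped, so downstream steps only see evidence of what actually broke.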

Outputs

The result lands where engineers already work.

failure class, root-cause hypothesis, confidence score, primary anchor, secondary noise, likely owner, prior fix context, next action
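The output fields listed above could be carried in a small record and rendered as a plain-text comment. This is a minimal sketch: the `FailureReport` class, the `render_comment` function, the comment layout, and every example value are assumptions for illustration, not the product's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class FailureReport:
    """Hypothetical container mirroring the listed output fields."""
    failure_class: str
    root_cause_hypothesis: str
    confidence_score: float
    primary_anchor: str
    secondary_noise: list = field(default_factory=list)
    likely_owner: str = "unknown"
    prior_fix_context: str = "none recalled"
    next_action: str = "investigate"


def render_comment(report: FailureReport) -> str:
    """Render the report as plain text suitable for a PR/MR comment body."""
    lines = [
        f"Failure class: {report.failure_class}",
        f"Root-cause hypothesis: {report.root_cause_hypothesis}",
        f"Confidence: {report.confidence_score:.2f}",
        f"Primary anchor: {report.primary_anchor}",
        f"Secondary noise: {len(report.secondary_noise)} downstream failure(s)",
        f"Likely owner: {report.likely_owner}",
        f"Prior fix context: {report.prior_fix_context}",
        f"Next action: {report.next_action}",
    ]
    return "\n".join(lines)


# Made-up example values.
example = FailureReport(
    failure_class="dependency_drift",
    root_cause_hypothesis="lockfile and manifest disagree on a pinned version",
    confidence_score=0.82,
    primary_anchor="tests/test_api.py::test_client_init",
    secondary_noise=["tests/test_api.py::test_retry"],
    likely_owner="@platform-team",
)
comment = render_comment(example)
```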

Eval posture

The memory story is tested against cases designed to break it.

The docs preserve misses, reruns, and ambiguity. Current honest claim: memory recall is stable on repeated contract cases and cuts tokens materially; superiority claims wait until evals beat both no-memory and generic baselines.

contract-echo repeat trials: 3 trials; memory retrieved 3/3 repeated cases each time.

contract-echo token use: 532.5 avg, vs 1696.4 avg in no-memory mode.

synthetic families: 28 cases covering recurrence, dependency drift, wrapper noise, and cross-repo breakage.

release rule: fail closed; no “memory moat” claim unless evals beat no-memory and baseline.
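A fail-closed release rule can be sketched as a gate that returns true only when every required eval score is present and the memory run strictly beats both comparisons. The function names and the "higher score is better" convention are assumptions. The second helper works out the relative token reduction implied by the two averages reported above.

```python
from typing import Optional


def may_claim_memory_advantage(
    memory: Optional[float],
    no_memory: Optional[float],
    baseline: Optional[float],
) -> bool:
    """Fail closed: a missing score or a tie blocks the claim.

    Assumes higher scores are better; that convention is not from the source.
    """
    if memory is None or no_memory is None or baseline is None:
        return False
    return memory > no_memory and memory > baseline


def token_reduction(with_memory_avg: float, no_memory_avg: float) -> float:
    """Fraction of tokens saved relative to the no-memory average."""
    if no_memory_avg <= 0:
        raise ValueError("no_memory_avg must be positive")
    return 1.0 - with_memory_avg / no_memory_avg


# Averages taken from the contract-echo figures above.
reduction = token_reduction(532.5, 1696.4)
```

With those averages, the reduction comes to roughly 69 percent. The gate stays independent of that number: saving tokens does not by itself permit a superiority claim.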

Architecture

Four boundaries keep the product credible.

Rules first

Known failure patterns classify deterministically from evidence packets before any model call.
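A rules-first classifier can be sketched as an ordered list of regular expressions that is checked before any model is consulted. The rule names, patterns, and priority order here are hypothetical examples, not the product's taxonomy. Returning `None` signals that no deterministic rule matched and a later stage would have to decide.

```python
import re
from typing import Optional

# Hypothetical rules in priority order: earlier entries outrank later ones,
# so an import failure wins over the assertion errors it tends to cause.
RULES = [
    ("dependency_missing", re.compile(r"ModuleNotFoundError: No module named")),
    ("network_unreachable", re.compile(r"Connection refused")),
    ("test_assertion", re.compile(r"AssertionError")),
]


def classify(log_lines: list) -> Optional[tuple]:
    """Return (failure_class, matching_line) for the highest-priority rule hit."""
    for failure_class, pattern in RULES:
        for line in log_lines:
            if pattern.search(line):
                return failure_class, line.strip()
    return None  # no rule matched; defer to a later, non-deterministic stage


log = [
    "E   AssertionError: client is None",
    "E   ModuleNotFoundError: No module named 'requests'",
]
result = classify(log)
```

Because the rules are scanned in priority order rather than log order, the missing module is reported even though the assertion error appears first in the log.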

Memory as context

Failure Memory Objects supply prior incidents and fixes, but do not override present-run evidence.
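One way to keep memory subordinate to present-run evidence is to let a recalled incident add fix context only when it agrees with the class derived from the current run. In this minimal sketch the `FailureMemory` and `Result` types and the matching rule (exact class equality) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailureMemory:
    """Hypothetical, simplified stand-in for a Failure Memory Object."""
    failure_class: str
    fix_summary: str


@dataclass(frozen=True)
class Result:
    failure_class: str
    prior_fix: Optional[str]


def attach_memory(present_class: str, memories: list) -> Result:
    """Attach a prior fix only if a memory agrees with present-run evidence.

    The failure class always comes from the current run, never from memory.
    """
    for memory in memories:
        if memory.failure_class == present_class:
            return Result(present_class, memory.fix_summary)
    return Result(present_class, None)


memories = [
    FailureMemory("network_unreachable", "retry with mirror registry"),
    FailureMemory("dependency_missing", "regenerate lockfile"),
]
agreeing = attach_memory("dependency_missing", memories)
conflicting = attach_memory("test_assertion", memories)
```

A conflicting memory is dropped rather than allowed to rewrite the classification, so the worst case is a result with no recalled context.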

Confidence is visible

Every result carries evidence strength, pattern match, signal completeness, and classification clarity.
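The four confidence components named above can be kept as separate fields so that each stays visible alongside any summary number. The source does not say how the components are combined. The unweighted mean in `overall` and the 0 to 1 range are assumptions chosen only to keep the sketch simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confidence:
    evidence_strength: float
    pattern_match: float
    signal_completeness: float
    classification_clarity: float

    def __post_init__(self) -> None:
        # Assumed convention: every component lies in [0, 1].
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def overall(self) -> float:
        """Unweighted mean of the four components (an assumption)."""
        return sum(vars(self).values()) / 4


conf = Confidence(
    evidence_strength=0.9,
    pattern_match=0.8,
    signal_completeness=1.0,
    classification_clarity=0.7,
)
summary = round(conf.overall(), 2)
```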

Public corpus is sanitized

The open-source path stores normalized memories, provenance, and fix summaries rather than raw private logs.
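Normalizing a log line before it is stored might look like the sketch below, which replaces a few obviously sensitive token shapes with placeholders. The three patterns (email addresses, IPv4 addresses, long hex strings) and the placeholder names are hypothetical. A real sanitizer would need a much broader and reviewed rule set.

```python
import re

# Hypothetical redaction rules, applied in order.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<email>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),
    (re.compile(r"\b[0-9a-fA-F]{32,}\b"), "<hex>"),
]


def sanitize(line: str) -> str:
    """Replace sensitive-looking substrings with stable placeholders."""
    for pattern, placeholder in REDACTIONS:
        line = pattern.sub(placeholder, line)
    return line


raw = "auth failed for dev@example.com from 10.0.0.12 token=0123456789abcdef0123456789abcdef"
clean = sanitize(raw)
```

Stable placeholders keep the shape of the message intact, so two incidents that differ only in addresses or tokens normalize to the same text.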