From Quality Gates to AI Bloat Remediation

TL;DR: The SpecFact code-review module did not start as an AI-bloat detector. It started as a quality gate: Ruff, Radon, type checks, architecture checks, contracts, and TDD evidence. The recent 0.47.x release waves extend that foundation into advisory ai_bloat findings, deterministic simplification metadata, and guided remediation signals that AI IDE LLMs can use without blindly deleting meaningful code.

Status at time of writing: This story is grounded in the archived and active OpenSpec changes in nold-ai/specfact-cli-modules. The current registry entry for nold-ai/specfact-code-review is 0.47.27. The AI-bloat work is intentionally framed as "bloat detection tuned for the shapes AI code commonly produces", not as AI-authorship classification.

The Important Part Came First: Gates

It is tempting to tell this story as "SpecFact now fights AI bloat." That is true, but it misses why the current approach is useful.

The code-review module did not begin with an LLM prompt. It began with quality gates.

The early archived changes in March built the boring, necessary substrate: code-review-02-ruff-radon-runners translated Ruff and Radon output into governed ReviewFinding records. code-review-04-contract-test-runners added icontract AST checks, CrossHair fast passes, and a TDD gate for matching tests and coverage. Later changes wired specfact code review run, added ledger and rules workflows, and made scope selection deterministic with --scope changed, --scope full, and repeatable --path filters.

That sequence matters. Before SpecFact could tell an AI assistant how to clean code, it needed to know how to review code repeatably.
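As a rough illustration of that first step, here is a minimal sketch of translating linter output into governed records. The `ReviewFinding` dataclass and the `findings_from_ruff` helper are hypothetical stand-ins, not the module's actual schema or code. The input shape follows the JSON that `ruff check --output-format json` emits (`code`, `message`, `filename`, `location.row`); the default severity is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewFinding:
    """Hypothetical governed record; the real SpecFact schema may differ."""

    tool: str
    rule: str
    severity: str
    file: str
    line: int
    message: str


def findings_from_ruff(ruff_json: str, severity: str = "warning") -> list[ReviewFinding]:
    """Map Ruff JSON diagnostics onto the hypothetical record above."""
    return [
        ReviewFinding(
            tool="ruff",
            # Ruff reports a null code for syntax errors, so fall back to a label.
            rule=item.get("code") or "unknown",
            severity=severity,
            file=item["filename"],
            line=item["location"]["row"],
            message=item["message"],
        )
        for item in json.loads(ruff_json)
    ]


# A single diagnostic in the shape Ruff emits (trimmed to the fields used here).
SAMPLE = json.dumps(
    [
        {
            "code": "F401",
            "message": "`os` imported but unused",
            "filename": "app.py",
            "location": {"row": 1, "column": 8},
        }
    ]
)
findings = findings_from_ruff(SAMPLE)
```

The point of the normalization is that every downstream consumer (CI, ledger, prompts) sees one record shape regardless of which tool produced the signal.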

Release line	What changed	Why it mattered
`0.41.x`	Ruff, Radon, basedpyright, pylint, contracts, CrossHair, TDD gate	Turned review into executable evidence, not opinion.
`0.42.x`	Reward ledger and generated house-rules skill	Started feeding repeated findings back into future AI sessions.
`0.43.0`	Full `specfact code review run` command and CLI contracts	Made the review workflow runnable, testable, and scriptable.
`0.44.0`	Changed/full scope modes and path filters	Made review target selection deterministic enough for CI and agents.

The first product lesson was simple: AI remediation cannot be safer than the signal it consumes. If the findings are noisy, unactionable, or scoped unpredictably, an IDE LLM will amplify the mess.

Then Clean Code Became a First-Class Policy

The archived clean-code-02-expanded-review-module change was the pivot from generic static analysis to an opinionated clean-code model.

That change expanded the review surface across seven clean-code principles: naming, KISS, YAGNI, DRY, SOLID, PR checklist discipline, and the existing contract/TDD baseline. It shipped the specfact/clean-code-principles policy pack, extended Radon from cyclomatic complexity into staged KISS metrics such as line count, nesting depth, and parameter count, and compressed the charter into the canonical skills/specfact-code-review/SKILL.md surface.
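To make the staged KISS metrics concrete, here is a minimal sketch that computes line count, nesting depth, and parameter count per function with Python's standard `ast` module. It is not how the module or Radon computes them: the helper names and the choice of which nodes count as nesting are assumptions for illustration.

```python
import ast

# Control-flow nodes that count as one nesting level in this sketch.
NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)


def nesting_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the deepest level of nested control-flow blocks under node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, nesting_depth(child, child_depth))
    return deepest


def kiss_metrics(source: str) -> dict[str, dict[str, int]]:
    """Collect rough KISS metrics for every function in the source text."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            metrics[node.name] = {
                "line_count": node.end_lineno - node.lineno + 1,
                "nesting_depth": nesting_depth(node),
                # *args and **kwargs are deliberately not counted here.
                "parameter_count": len(args.posonlyargs)
                + len(args.args)
                + len(args.kwonlyargs),
            }
    return metrics


SAMPLE = '''
def pick(items, limit, flag=False):
    out = []
    for item in items:
        if item > limit:
            if flag:
                out.append(item)
    return out
'''

metrics = kiss_metrics(SAMPLE)
```

For the sample function, this sketch reports 7 lines, a nesting depth of 3, and 3 parameters. A policy pack would then compare such numbers against staged thresholds.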

This is where the module stopped being "a wrapper around tools" and became a governance surface. Ruff or Radon can say "this is complex." SpecFact can say "this violates the clean-code policy pack your AI assistant is supposed to follow."

Design constraint: The clean-code pack did not invent a second severity schema. It reused governed findings, categories, policy packs, and CLI contracts. That kept the output consumable by CI, docs, skills, prompts, and future agents.

External Repos Forced the Module to Grow Up

The April bug-finding and sidecar hardening work added an important reality check. Validation against external Python repos exposed gaps that a repo-local clean-code gate would not show:

Repos without icontract generated too much missing-contract noise.
CrossHair needed a longer bug-hunt mode to find useful counterexamples.
Semgrep needed a separate bug-focused ruleset, not just clean-code checks.
Installed sidecar virtualenvs could pollute codebase extraction if not excluded.
Teams needed --mode shadow for signal collection before blocking merges.

That is the second product lesson: if a review module is going to help real teams, it must distinguish enforcement from learning. Shadow mode, severity floors, focus facets, graceful missing-tool findings, and bug-hunt mode all came from that pressure.

Those mechanics later became essential for AI-bloat remediation. You cannot safely ask an LLM to clean code if every advisory is treated like a merge blocker.

The AI-Bloat Turn

The code-review-ai-bloat-detection change made the next move explicit: some code shapes are repeatedly amplified by AI-assisted generation, and conventional gates do not name them clearly enough.

Examples include manual append loops where a comprehension or stdlib call would be clearer, passthrough lambdas, identity try/except blocks, one-call wrappers, speculative Optional[...] = None parameters that are never handled, duplicate terminal guards, long low-branch functions, and redundant intermediate variables.
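A few of these shapes are easier to recognize side by side. The functions below are illustrative examples written for this post, not output from the detector, and all function names are made up. Each verbose version behaves the same as its simplified counterpart.

```python
# Shape: manual append loop where a comprehension is clearer.
def squares_verbose(values):
    result = []
    for value in values:
        result.append(value * value)
    return result


def squares(values):
    return [value * value for value in values]


# Shape: passthrough lambda that only forwards to another callable.
def lengths_verbose(words):
    return list(map(lambda word: len(word), words))


def lengths(words):
    return list(map(len, words))


# Shape: identity try/except that catches and immediately re-raises.
def parse_port_verbose(text):
    try:
        return int(text)
    except ValueError:
        raise


def parse_port(text):
    return int(text)


# Shape: redundant intermediate variable.
def total_verbose(prices):
    total = sum(prices)
    return total


def total(prices):
    return sum(prices)
```

None of these rewrites is risky in isolation, which is exactly why they accumulate unnoticed. The harder cases, such as one-call wrappers that may mark an API boundary, are where judgment is required.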

SpecFact added a new ai_bloat category, a Semgrep rule pack, an AST runner, and the specfact/ai-bloat-patterns policy pack. The crucial choice: these findings are severity=info, advisory-only, and score-neutral. They identify simplification candidates. They do not claim to prove that AI wrote the code.

The rollout evidence is useful because it shows the scale of the problem without overstating it. A local detector run over specfact-cli-modules and specfact-project produced 144 advisory candidates. A later full-repository review recorded 115 ai_bloat info findings alongside existing legacy findings. No automatic rewrite was applied from those numbers alone.

That restraint is the point. The module captures bloat-like shapes, classifies them as advisory signals, and writes them to .specfact/code-review.json. The remediation step is separate.

From Findings to Remediation Signals

The next active change, code-review-11-simplification-feedback-loop, is where the output became useful to an IDE LLM.

A line-level finding such as "manual loop could be simplified" is helpful to a senior developer, but too thin for an automated assistant. The simplification feedback loop adds structured metadata: confidence, rewrite hint, canonical pattern, estimated deletion impact, intent_key, and related locations. The report schema advances additively to 1.1 when that metadata is present.

It also adds --focus simplify, a queue that pulls in ai_bloat plus high-confidence DRY and KISS findings. The updated /specfact.08-simplify prompt groups candidates by intent, file/domain, and rule, then requires explicit per-change confirmation before edits.

specfact code review run \
  --scope changed \
  --focus simplify \
  --json \
  --out .specfact/code-review-simplify.json
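To show why the extra metadata helps, here is a minimal sketch of grouping simplify candidates by intent, file, and rule. The `SimplifyCandidate` dataclass, its field names, the rule names, and the sample values are all assumptions that mirror the prose above. They are not the actual layout of the JSON report.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SimplifyCandidate:
    """Hypothetical shape; field names follow the prose, not the real schema."""

    rule: str
    file: str
    line: int
    intent_key: str
    confidence: float
    rewrite_hint: str
    estimated_deleted_lines: int = 0
    related_locations: list[str] = field(default_factory=list)


def group_candidates(candidates):
    """Group candidates by (intent_key, file, rule) so they are reviewed together."""
    groups = defaultdict(list)
    for candidate in candidates:
        key = (candidate.intent_key, candidate.file, candidate.rule)
        groups[key].append(candidate)
    return dict(groups)


# Made-up candidates; rule names and paths are illustrative only.
candidates = [
    SimplifyCandidate("manual-append-loop", "pkg/load.py", 12, "collect-rows", 0.9,
                      "use a list comprehension", estimated_deleted_lines=3),
    SimplifyCandidate("manual-append-loop", "pkg/load.py", 40, "collect-rows", 0.8,
                      "use a list comprehension", estimated_deleted_lines=3,
                      related_locations=["pkg/load.py:12"]),
    SimplifyCandidate("passthrough-lambda", "pkg/cli.py", 7, "format-output", 0.95,
                      "pass the callable directly", estimated_deleted_lines=0),
]

groups = group_candidates(candidates)
```

Grouping first means an assistant can propose one explained change per intent and ask for confirmation on each, instead of walking the report line by line.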

This is the third product lesson: AI coding assistants need more than "what is wrong." They need bounded instructions for what can change, where related code exists, and when not to act.

What the Shadow Runs Showed

After the guided simplify workflow shipped, we ran shadow-mode validation across three small cohorts: public GitHub repositories tagged around vibe coding and AI-assisted development, quality-oriented SDD / Spec Kit / OpenSpec repositories with enough Python code to scan, and our own SpecFact CLI repositories. We intentionally used shadow mode, and we are not naming the first public cohort here: the goal was not to call out project owners, but to test whether the new category finds a materially different signal than the older clean-code checks.

The result was clear enough to be useful, with one important nuance. In the vibe-coding-adjacent sample, more than 90% of simplify-focused findings fell into the new ai_bloat category. In several quality-oriented SDD repositories, the same pattern still appeared. In our own SpecFact repos, the share was just under 90%. But one mature SDD/code-quality repo was a clear outlier: traditional DRY findings dominated there.

Validation slice	`ai_bloat` share	Interpretation
Vibe-coding-adjacent repositories	> 90%	The new category captured the overwhelming majority of the simplify signal.
Most quality-oriented SDD repositories tested	Often > 90%	SDD helps structure intent, but does not automatically remove local implementation bloat.
SpecFact CLI repositories	~89%	The same pattern appears in our own code, but most findings require judgment or tests.
One mature SDD/code-quality outlier	~48%	Traditional DRY signals dominated, showing the pattern is strong but not universal.

The takeaway is not "90% of findings can be auto-fixed." That would be wishful thinking. The better takeaway is that ai_bloat creates a high-signal remediation queue for an IDE LLM. The assistant can apply stricter clean-code patterns to safe mechanical cases, but findings marked needs_tests, design_judgment, or preserve still need tests, review, or explicit non-action.

That distinction is exactly why the recent releases matter. SpecFact can now identify a distinct cleanup class, separate it from traditional DRY/KISS signals, and then classify it before the LLM edits code. In practice, the review surface has moved from "quality gate found a problem" to "here is the cleanup queue an IDE assistant can reason about and verify."

The Current Step: Guided Simplification

The latest change, code-review-12-guided-simplification-enforcement, tightens that loop into an agent-readable decision model.

Simplification findings now carry guidance fields such as recommended action, rationale, clean-code principle, safety checks, preserve reason, action status, and before/after evidence. Findings are classified into four buckets:

safe_mechanical: deterministic cleanup that can be applied or enforced.
needs_tests: likely cleanup, but only after characterization or regression tests.
design_judgment: requires a developer decision because the shape may encode architecture or intent.
preserve: keep it because it represents a public API boundary, compatibility shim, contract, extension point, or domain intent.

That classification changes the safety profile. --focus simplify --mode enforce fails only on unresolved safe_mechanical findings. --focus simplify --fix applies only deterministic safe-mechanical rewrites and records what was applied, still recommended, kept, skipped, or failed. Everything else remains evidence for a human or agent walkthrough.

The release evidence shows why this matters. Guided fixture checks produced a schema 1.2 report with three findings: one safe_mechanical, one needs_tests, and one preserve. Enforce mode failed only because the safe-mechanical item was unresolved. With --fix, the deterministic rewrite applied, review reran, and the report passed while preserving non-mechanical guidance.

Agent contract: The workflow is deliberately conservative. The bundled prompt and skill prohibit autonomous batch edits, adapt explanation depth for vibe-coder, junior, senior/pro, and headless-agent modes, and treat findings without guidance_kind as unguided advisories rather than auto-fix input.
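The decision rule can be sketched in a few lines. This is an illustration of the behavior described above, not the module's implementation: the dict keys `guidance_kind` and `action_status` follow the prose, while the status values, rule names, and helper functions are assumptions.

```python
# Assumption: "applied" is the status that marks a safe-mechanical item resolved.
RESOLVED_STATUSES = {"applied"}


def enforce_blockers(findings):
    """Return the findings that would fail enforce mode in this sketch."""
    return [
        finding
        for finding in findings
        # Findings without guidance_kind are unguided advisories, never blockers.
        if finding.get("guidance_kind") == "safe_mechanical"
        and finding.get("action_status") not in RESOLVED_STATUSES
    ]


def enforce_passes(findings):
    """Enforce mode passes only when no safe-mechanical item is unresolved."""
    return not enforce_blockers(findings)


# Mirrors the fixture scenario: one finding per bucket, plus one unguided advisory.
report = [
    {"rule": "manual-append-loop", "guidance_kind": "safe_mechanical",
     "action_status": "recommended"},
    {"rule": "one-call-wrapper", "guidance_kind": "needs_tests",
     "action_status": "recommended"},
    {"rule": "compat-shim", "guidance_kind": "preserve", "action_status": "kept"},
    {"rule": "legacy-advisory"},
]

before_fix = enforce_passes(report)

# Simulate a deterministic rewrite being applied to the safe-mechanical item.
fixed = [
    dict(finding, action_status="applied")
    if finding.get("guidance_kind") == "safe_mechanical"
    else finding
    for finding in report
]
after_fix = enforce_passes(fixed)
```

In this sketch the report fails before the fix and passes after it, while the `needs_tests`, `preserve`, and unguided findings stay in the report untouched as evidence for a human or agent walkthrough.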

Why This Starts to Get Traction

The traction is not that SpecFact found a new label called "AI bloat." Labels are cheap.

The traction is that the module now has a chain from quality evidence to remediation:

Capture: run deterministic review over changed, full, or path-filtered scope.
Classify: emit governed findings across static analysis, contracts, TDD, clean-code principles, bug-finding, and ai_bloat.
Separate gates from advice: use severity, shadow/enforce mode, score-neutral advisories, and focus queues.
Add remediation metadata: provide rewrite hints, intent grouping, related locations, confidence, and deletion estimates.
Guide LLM action: classify cleanup as safe mechanical, needs tests, design judgment, or preserve.
Verify again: rerun review and tests after accepted changes.

That chain is what AI IDE LLMs have been missing. They are good at editing once asked. They are less reliable at deciding which simplifications are safe, which wrappers encode a boundary, and which cleanup should wait for tests. SpecFact's job is to turn review evidence into a narrow, reviewable action surface.

What This Means for Clean Code in AI-Assisted Teams

Clean code work used to be framed as taste: shorter functions, better names, fewer wrappers, less duplication. AI-assisted development changes the economics. Small bloat patterns can appear quickly, repeatedly, and across a repository before a reviewer notices the trend.

The answer is not to ban AI-generated code or ask reviewers to police every local simplification manually. The better answer is to make the quality model executable:

Use gates for real blockers.
Use advisory findings for cleanup candidates.
Use clean-code policy packs for shared vocabulary.
Use structured remediation metadata for IDE agents.
Require tests or developer judgment where code encodes intent.

That is the evolution of the SpecFact code-review module so far: quality gates first, clean-code policy next, AI-bloat detection as an advisory category, and now guided simplification signals that let AI tools help without being trusted blindly.

Try the Current Flow

# Install the review bundle
specfact module install nold-ai/specfact-code-review

# Collect a simplify queue for changed Python files
specfact code review run \
  --scope changed \
  --focus simplify \
  --json \
  --out .specfact/code-review-simplify.json

# Print AI-facing instructions for an IDE assistant
specfact code review run --instructions

# Enforce only unresolved safe-mechanical simplifications
specfact code review run --focus simplify --mode enforce

# Apply deterministic safe-mechanical rewrites only
specfact code review run --focus simplify --fix
