TL;DR: SpecFact CLI reverse engineers legacy Python code into executable contracts using AST analysis (not LLM guessing). In this technical deep-dive, we walk through a real example: analyzing a 3-year-old Django app with no documentation, extracting 19 features and 49 stories in seconds, then adding runtime contracts to prevent regressions during modernization.

The Challenge

You've inherited a legacy Python codebase. It has:

  • No documentation — no docstrings, no README, no API docs
  • No type hints — Python 2.7 style code, or Python 3 without annotations
  • Business logic buried in views — Django views with 200+ lines of mixed concerns
  • No tests — or tests that haven't been updated in years
  • 15 undocumented API endpoints — you don't know what they do

Traditional approach: Spend 2-3 weeks manually documenting, writing specs, creating API docs. Then hope nothing breaks during modernization.

SpecFact approach: Reverse engineer the codebase in under 10 minutes, extract features automatically, then add runtime contracts to prevent regressions.


Step-by-Step: Reverse Engineering with SpecFact

Step 1: Install and Initialize

# Run SpecFact CLI via uvx to confirm it is available (uvx downloads it on first use)
uvx specfact-cli --version

# Navigate to your legacy codebase
cd legacy-django-app

Step 2: Reverse Engineer the Codebase

💡 Recommended: Use the slash command in your AI IDE (Cursor, VS Code + Copilot) for LLM-enriched analysis. The AI adds semantic understanding, business context, and "why" reasoning to the extracted features.

# Option A: AI IDE Mode (Recommended)
# Step 1: Install slash commands
specfact init --ide cursor  # or --ide vscode

# Step 2: In your AI IDE, use the slash command:
/specfact.01-import --repo .

# Option B: CLI-Only Mode (Quick Start)
specfact import from-code legacy-api --repo . --confidence 0.5

What happens under the hood:

  1. AST Analysis: Parses Python files, extracts functions, classes, dependencies
  2. Semgrep Pattern Detection: Identifies common patterns (API endpoints, data models, etc.)
  3. Dependency Graph Building: Maps relationships between modules
  4. Feature Extraction: Groups related code into features and stories

Output:

🔍 Analyzing Python files...
✓ Found 19 features
✓ Detected themes: API, Data Processing, Authentication
✓ Total stories: 49

✓ Analysis complete!
Project bundle written to: .specfact/projects/legacy-api/

Time taken: ~3-5 seconds for 19 Python files (varies by codebase size and complexity)
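
The first stage listed under "What happens under the hood", AST analysis, can be illustrated with Python's built-in ast module. This is a minimal sketch, not SpecFact's actual implementation: the sample source, the extract_definitions helper, and the shape of the returned dictionary are assumptions made for illustration.

```python
import ast

# Hypothetical legacy source, inlined so the example is self-contained.
SOURCE = '''
import json
from payments import create_payment

class PaymentView:
    def post(self, request):
        return create_payment(request.user_id, request.amount)

def process_payment(request):
    return json.dumps({"status": "success"})
'''

FUNCTION_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def extract_definitions(source: str) -> dict:
    """Collect top-level functions, classes (with their methods), and imports."""
    tree = ast.parse(source)
    found = {"functions": [], "classes": {}, "imports": []}
    for node in tree.body:
        if isinstance(node, FUNCTION_NODES):
            found["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            # Record only the methods defined directly in the class body.
            found["classes"][node.name] = [
                child.name for child in node.body if isinstance(child, FUNCTION_NODES)
            ]
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; store an empty string then.
            found["imports"].append(node.module or "")
    return found


definitions = extract_definitions(SOURCE)
print(definitions)
```

Because the parser reads the syntax tree directly, the same input always yields the same definitions, which is the property the article contrasts with LLM-based extraction.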

Step 3: Review Extracted Features

SpecFact automatically extracted 19 features from the codebase. A sample of four:

Feature              Stories  Confidence  What It Does
Payment Processing   5        0.9         Handles payment creation, validation, and webhooks
User Authentication  3        0.8         Login, logout, session management
Data Export          4        0.7         CSV/JSON export functionality
Report Generation    6        0.9         Generates PDF reports from data

Total: 49 user stories auto-generated with Fibonacci story points.


Before/After Example

Before: Undocumented Legacy Code

# views.py - Legacy Django view (no documentation)
def process_payment(request):
    user = get_user(request.user_id)
    payment = create_payment(user.id, request.amount)
    send_notification(user.email, payment.id)
    return JsonResponse({'status': 'success'})

Problems:

  • No docstring explaining what it does
  • No type hints
  • No error handling visible
  • No contract enforcement
  • Unknown edge cases

After: Reverse Engineered + Contracts

SpecFact extracts the feature, then you can add contracts:

# Generated plan bundle (.specfact/projects/legacy-api/plan.yaml)
features:
  - id: FEATURE-PAYMENT-PROCESSING
    name: Payment Processing
    stories:
      - id: STORY-001
        title: Process payment request
        description: Creates payment, sends notification
        acceptance_criteria:
          - User must be authenticated
          - Amount must be positive
          - Notification must be sent

Then add runtime contracts:

# Enhanced with contracts
import json

import icontract
from beartype import beartype
from django.http import HttpRequest, JsonResponse

# Assumes user_id and amount have been attached to the request, as in the legacy view
@icontract.require(lambda request: request.user_id is not None)
@icontract.require(lambda request: request.amount > 0)
# JsonResponse stores its payload as serialized bytes, so decode it before checking
@icontract.ensure(lambda result: json.loads(result.content)['status'] == 'success')
@beartype
def process_payment(request: HttpRequest) -> JsonResponse:
    """Process payment request.
    
    Creates a payment for the authenticated user and sends a notification.
    
    Args:
        request: HTTP request with user_id and amount
        
    Returns:
        JsonResponse with status
    """
    user = get_user(request.user_id)
    payment = create_payment(user.id, request.amount)
    send_notification(user.email, payment.id)
    return JsonResponse({'status': 'success'})

Benefits:

  • Documented: Docstring explains purpose
  • Type-safe: beartype enforces types at runtime
  • Contract-enforced: Preconditions and postconditions validated
  • Invalid inputs rejected: A violated precondition raises icontract.ViolationError before the function body runs
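
To make the mechanism behind these benefits concrete, here is a standard-library-only sketch of how precondition and postcondition decorators behave. It is a simplified stand-in for icontract, not its implementation: the require and ensure helpers, the ContractViolation exception, and the PaymentRequest class are hypothetical names introduced for this example.

```python
import functools
from dataclasses import dataclass
from typing import Optional


class ContractViolation(Exception):
    """Raised when a precondition or postcondition fails (hypothetical name)."""


def require(condition, message):
    """Check condition(*args, **kwargs) before the wrapped function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not condition(*args, **kwargs):
                raise ContractViolation(f"precondition failed: {message}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def ensure(condition, message):
    """Check condition(result) after the wrapped function returns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not condition(result):
                raise ContractViolation(f"postcondition failed: {message}")
            return result
        return wrapper
    return decorator


@dataclass
class PaymentRequest:
    """Hypothetical stand-in for the Django request used in the article."""
    user_id: Optional[int]
    amount: float


@require(lambda request: request.user_id is not None, "user must be authenticated")
@require(lambda request: request.amount > 0, "amount must be positive")
@ensure(lambda result: result["status"] == "success", "status must be success")
def process_payment(request: PaymentRequest) -> dict:
    # The real view would create the payment and send a notification here.
    return {"status": "success"}


print(process_payment(PaymentRequest(user_id=1, amount=25.0)))
try:
    process_payment(PaymentRequest(user_id=1, amount=-5.0))
except ContractViolation as exc:
    print(exc)
```

The invalid call never reaches the function body, so a regression that starts passing bad amounts fails at the boundary instead of deep inside payment logic.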

Edge Case Discovery with CrossHair

One of SpecFact's unique capabilities is symbolic execution using CrossHair, which explores code paths with the help of an SMT solver. This can surface edge cases that LLM-based review misses.

Example: Division by Zero

A developer was refactoring a data validation function. The code looked correct:

def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor  # Looks safe, right?

LLM analysis: "Code looks correct. Default divisor is 1, so no division by zero."

CrossHair symbolic execution: Found counterexample proving division by zero is possible:

🔍 CrossHair Exploration: Found counterexample
   File: validator.py:5
   Function: validate_and_calculate
   Issue: Division by zero when divisor=0
   Counterexample: {'value': 10, 'divisor': 0}
   Severity: HIGH
   Fix: Add divisor != 0 check

Why this matters: LLMs rely on probabilistic pattern matching. CrossHair searches execution paths with an SMT solver and reports a concrete input that triggers the failure, so each counterexample can be reproduced directly rather than taken on trust.
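
The idea of a counterexample can be shown without CrossHair. The sketch below uses a bounded brute-force search over a few hand-picked inputs, which is a much cruder technique than symbolic execution, but it reports the same kind of result: a concrete input that makes the function fail. The find_counterexample helper and the candidate values are assumptions chosen for illustration.

```python
import itertools


def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor


def find_counterexample(func, candidates):
    """Return (input, exception) for the first input on which func raises, else None."""
    for data in candidates:
        try:
            func(data)
        except Exception as exc:
            return data, exc
    return None


# Small input domain: each key is either absent (None) or one of a few integers.
choices = [None, -1, 0, 1, 10]
candidates = []
for value, divisor in itertools.product(choices, repeat=2):
    data = {}
    if value is not None:
        data["value"] = value
    if divisor is not None:
        data["divisor"] = divisor
    candidates.append(data)

result = find_counterexample(validate_and_calculate, candidates)
print(result)
```

A brute-force search only covers the inputs it is given. A symbolic executor instead reasons about the conditions under which each path fails, which is why it can find inputs nobody thought to list.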

The Fix

# Fixed with contract
import icontract

@icontract.require(lambda data: data.get("divisor", 1) != 0)
def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor  # ✅ Contract ensures divisor != 0

Gap Analysis: Finding Missing Tests and Documentation

After reverse engineering, SpecFact can identify gaps:

# Compare code vs. plan to find gaps
specfact plan compare --code-vs-plan

Output:

🔍 Comparing code vs. plan...

Deviations Found: 24 total
  🔴 HIGH: 2 (Missing features from plan)
  🟡 MEDIUM: 19 (Extra implementations found in code)
  🔵 LOW: 3 (Metadata mismatches)

🔴 HIGH Severity Issues:
  - FEATURE-PAYMENT-PROCESSING: Missing test coverage
  - FEATURE-USER-AUTH: No contract enforcement

What this tells you:

  • Which features have no tests
  • Which functions lack contract enforcement
  • Where documentation is missing
  • What needs attention during modernization
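
Conceptually, a code-versus-plan comparison is a set difference over feature identifiers. The following sketch assumes that model and borrows the severity labels from the output above; the Deviation class, the compare_code_vs_plan function, and the FEATURE-REFUNDS and FEATURE-DATA-EXPORT identifiers are hypothetical and do not describe SpecFact's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deviation:
    severity: str
    feature_id: str
    reason: str


def compare_code_vs_plan(plan_features: set, code_features: set) -> list:
    """Classify differences between planned and implemented feature IDs."""
    deviations = []
    # Planned but not found in code: the plan promises something the code lacks.
    for feature_id in sorted(plan_features - code_features):
        deviations.append(Deviation("HIGH", feature_id, "Missing feature from plan"))
    # Found in code but not planned: undocumented behavior.
    for feature_id in sorted(code_features - plan_features):
        deviations.append(
            Deviation("MEDIUM", feature_id, "Extra implementation found in code")
        )
    return deviations


plan = {"FEATURE-PAYMENT-PROCESSING", "FEATURE-USER-AUTH", "FEATURE-REFUNDS"}
code = {"FEATURE-PAYMENT-PROCESSING", "FEATURE-USER-AUTH", "FEATURE-DATA-EXPORT"}

for deviation in compare_code_vs_plan(plan, code):
    print(f"{deviation.severity}: {deviation.feature_id} ({deviation.reason})")
```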

Results

Using SpecFact for reverse engineering legacy code:

Metric                 Before                        After
Documentation time     2-3 weeks                     10 minutes
Features discovered    Manual (incomplete)           19 features, 49 stories (automatic)
Edge cases found       Manual testing (missed some)  CrossHair symbolic execution (mathematical proof)
Regression prevention  Code review (human error)     Runtime contracts (automatic)
Developer onboarding   2-3 weeks                     1 day (with documented codebase)

Real-World Example

A team modernizing a 3-year-old Django app:

  • Before SpecFact: 2 weeks to document 15 API endpoints manually
  • With SpecFact: 10 minutes to reverse engineer → 19 features extracted
  • During modernization: Prevented 4 production bugs via runtime contracts
  • Result: 87% time saved, zero production regressions

Technical Deep-Dive: How AST Analysis Works

SpecFact uses AST (Abstract Syntax Tree) analysis to extract code structure. Unlike LLM-based approaches, it reads that structure directly from the parsed source rather than inferring it.

AST Analysis Process

  1. Parse Python files: Uses Python's ast module to parse source code
  2. Extract definitions: Functions, classes, methods, imports
  3. Build dependency graph: Maps relationships between modules
  4. Identify patterns: Uses Semgrep to detect common patterns (API endpoints, data models, etc.)
  5. Group into features: Clusters related code into features and stories
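
Step 3, building the dependency graph, can be sketched with the same ast module by collecting each module's imports and keeping only those that point at other modules in the project. This is an illustrative approximation under stated assumptions: the MODULES mapping, the helper names, and the choice to ignore relative imports are not taken from SpecFact.

```python
import ast

# Hypothetical project: module name -> source text.
MODULES = {
    "views": "import payments\nfrom notifications import send_notification\n",
    "payments": "import models\n",
    "notifications": "import smtplib\n",
    "models": "",
}


def imported_modules(source: str) -> set:
    """Return the top-level names of all modules imported anywhere in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped here.
            names.add(node.module.split(".")[0])
    return names


def build_dependency_graph(modules: dict) -> dict:
    """Map each module to the project-local modules it imports."""
    local = set(modules)
    return {
        name: sorted(imported_modules(source) & local)
        for name, source in modules.items()
    }


graph = build_dependency_graph(MODULES)
print(graph)
```

Modules that share many edges in such a graph are natural candidates for being grouped into the same feature, which is one plausible reading of step 5.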

Why AST Analysis vs. LLM?

Approach      Method                          Accuracy                           Speed
LLM-based     Probabilistic pattern matching  ~70-80% (may miss details)         Slower (API calls)
AST analysis  Deterministic code parsing      ~95%+ (extracts actual structure)  Faster (local processing)

Key advantage: AST analysis extracts actual code structure, not guesses. It's deterministic, fast, and works offline.


Try It Yourself

Option A: AI IDE Mode (Recommended)

For best results, use slash commands in your AI IDE (Cursor, VS Code + Copilot):

# Step 1: Install slash commands
specfact init --ide cursor  # or --ide vscode

# Step 2: In your AI IDE, use the slash command:
/specfact.01-import --repo .

# The slash command runs CLI + LLM enrichment for better feature detection

Option B: CLI-Only Mode (Quick Start)

For quick validation or CI/CD, run the CLI directly:

# Step 1: Reverse engineer legacy code
specfact import from-code legacy-api --repo .

# Step 2: Analyze contracts
specfact analyze contracts --bundle legacy-api

# Step 3: Find gaps
specfact plan compare --code-vs-plan

# Step 4: Add contracts to critical paths
specfact generate contracts-prompt src/views.py --bundle legacy-api

💡 Best Practice: Start with CLI-only mode to understand the workflow, then switch to AI IDE mode for LLM-enriched analysis and better feature detection.


Conclusion

Reverse engineering legacy Python code doesn't have to take weeks. SpecFact CLI uses:

  • AST analysis to extract actual code structure (not LLM guessing)
  • Contract generation to create executable specs
  • Symbolic execution (CrossHair) to find edge cases mathematically
  • Gap analysis to identify missing tests and documentation

Result: Legacy app fully documented in < 10 minutes, with runtime contracts preventing regressions during modernization.

Try it on your legacy codebase:

# Install slash commands
specfact init --ide cursor

# Then use in your AI IDE:
/specfact.01-import --repo .