code2spec: How SpecFact Reverse Engineers Python Legacy Code

TL;DR: SpecFact CLI reverse engineers legacy Python code into executable contracts using AST analysis (not LLM guessing). In this technical deep-dive, we walk through a real example: analyzing a 3-year-old Django app with no documentation, extracting 19 features and 49 stories in seconds, then adding runtime contracts to prevent regressions during modernization.

Verified Versions: This tutorial was tested and verified with the following version:

SpecFact CLI: 0.22.1

Commands and workflows in this guide are confirmed to work with this version. Check your version with specfact --version. For compatibility with other versions, refer to the SpecFact documentation.

The Challenge

You've inherited a legacy Python codebase. It has:

No documentation — no docstrings, no README, no API docs
No type hints — Python 2.7 style code, or Python 3 without annotations
Business logic buried in views — Django views with 200+ lines of mixed concerns
No tests — or tests that haven't been updated in years
15 undocumented API endpoints — you don't know what they do

Traditional approach: Spend 2-3 weeks manually documenting, writing specs, creating API docs. Then hope nothing breaks during modernization.

SpecFact approach: Reverse engineer the codebase in under 10 minutes, extract features automatically, then add runtime contracts to prevent regressions.

Step-by-Step: Reverse Engineering with SpecFact

Step 1: Install and Initialize

# Run SpecFact CLI via uvx (no permanent install needed) and check the version
uvx specfact-cli --version

# Navigate to your legacy codebase
cd legacy-django-app

Step 2: Reverse Engineer the Codebase

💡 Recommended: Use the slash command in your AI IDE (Cursor, VS Code + Copilot) for LLM-enriched analysis. The AI adds semantic understanding, business context, and "why" reasoning to the extracted features.

# Option A: AI IDE Mode (Recommended)
# Step 1: Install slash commands
specfact init --ide cursor  # or --ide vscode

# Step 2: In your AI IDE, use the slash command:
/specfact.01-import --repo .

# Option B: CLI-Only Mode (Quick Start)
specfact import from-code legacy-api --repo . --confidence 0.5

What happens under the hood:

AST Analysis: Parses Python files, extracts functions, classes, dependencies
Semgrep Pattern Detection: Identifies common patterns (API endpoints, data models, etc.)
Dependency Graph Building: Maps relationships between modules
Feature Extraction: Groups related code into features and stories
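To make the first step concrete, here is a minimal sketch of AST-based extraction using only Python's standard `ast` module. It is not SpecFact's implementation: `summarize_module` and the dictionary it returns are hypothetical, chosen to show the kind of structure (functions, classes, methods, imports) that can be read directly from source without importing or executing it.

```python
import ast


def summarize_module(source: str) -> dict:
    """Collect top-level functions, classes (with their methods), and imports.

    Hypothetical helper for illustration; not part of SpecFact's API.
    """
    tree = ast.parse(source)  # parses only; nothing is imported or executed
    summary = {"functions": [], "classes": {}, "imports": []}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            # Record method names defined directly in the class body
            summary["classes"][node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep leading dots for relative imports, e.g. ".models"
            summary["imports"].append("." * node.level + (node.module or ""))
    return summary


legacy_views = '''
from django.http import JsonResponse

def process_payment(request):
    user = get_user(request.user_id)
    return JsonResponse({"status": "success"})

class PaymentView:
    def post(self, request):
        return process_payment(request)
'''

print(summarize_module(legacy_views))
# → {'functions': ['process_payment'], 'classes': {'PaymentView': ['post']}, 'imports': ['django.http']}
```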

Output:

🔍 Analyzing Python files...
✓ Found 19 features
✓ Detected themes: API, Data Processing, Authentication
✓ Total stories: 49

✓ Analysis complete!
Project bundle written to: .specfact/projects/legacy-api/

Time taken: ~3-5 seconds for 19 Python files (varies by codebase size and complexity)

Step 3: Review Extracted Features

SpecFact automatically extracted 19 features from the codebase. A sample of them:

Feature	Stories	Confidence	What It Does
Payment Processing	5	0.9	Handles payment creation, validation, and webhooks
User Authentication	3	0.8	Login, logout, session management
Data Export	4	0.7	CSV/JSON export functionality
Report Generation	6	0.9	Generates PDF reports from data

Total: 49 user stories auto-generated with Fibonacci story points.
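The article does not say how SpecFact derives its effort estimates. As a hedged illustration of the "Fibonacci" part only, here is a sketch that snaps a raw estimate (however it was computed) to the nearest point on a common agile Fibonacci scale. The scale, the tie-breaking rule, and `to_story_points` are assumptions, not SpecFact's behavior.

```python
from bisect import bisect_left

# Common agile Fibonacci scale; the exact scale SpecFact uses is an assumption.
FIBONACCI_POINTS = [1, 2, 3, 5, 8, 13, 21]


def to_story_points(raw_estimate: float) -> int:
    """Snap a raw effort estimate to the nearest Fibonacci story point.

    Ties round up to the larger point, a conservative choice.
    Hypothetical helper; not SpecFact's API.
    """
    if raw_estimate <= FIBONACCI_POINTS[0]:
        return FIBONACCI_POINTS[0]
    if raw_estimate >= FIBONACCI_POINTS[-1]:
        return FIBONACCI_POINTS[-1]
    # Index of the first scale value >= raw_estimate
    i = bisect_left(FIBONACCI_POINTS, raw_estimate)
    lower, upper = FIBONACCI_POINTS[i - 1], FIBONACCI_POINTS[i]
    return lower if raw_estimate - lower < upper - raw_estimate else upper
```

For example, a story with a raw estimate of 6 maps to 5 points, while 7 maps to 8. Snapping to a coarse scale is deliberate: it avoids implying more precision than an automated estimate can offer.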


Before/After Example

Before: Undocumented Legacy Code

# views.py - Legacy Django view (no documentation)
def process_payment(request):
    user = get_user(request.user_id)
    payment = create_payment(user.id, request.amount)
    send_notification(user.email, payment.id)
    return JsonResponse({'status': 'success'})

Problems:

No docstring explaining what it does
No type hints
No error handling visible
No contract enforcement
Unknown edge cases

After: Reverse Engineered + Contracts

SpecFact extracts the feature, then you can add contracts:

# Generated plan bundle (.specfact/projects/legacy-api/plan.yaml)
features:
  - id: FEATURE-PAYMENT-PROCESSING
    name: Payment Processing
    stories:
      - id: STORY-001
        title: Process payment request
        description: Creates payment, sends notification
        acceptance_criteria:
          - User must be authenticated
          - Amount must be positive
          - Notification must be sent

Then add runtime contracts:

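# Enhanced with contracts
import json

import icontract
from beartype import beartype
from django.http import HttpRequest, JsonResponse

# As in the legacy view, user_id and amount are assumed to be attached to the request upstream
@icontract.require(lambda request: request.user_id is not None)
@icontract.require(lambda request: request.amount > 0)
# JsonResponse is not a dict, so decode its JSON body before checking "status"
@icontract.ensure(lambda result: json.loads(result.content)["status"] == "success")
@beartype
def process_payment(request: HttpRequest) -> JsonResponse:
    """Process payment request.

    Creates a payment for the authenticated user and sends a notification.

    Args:
        request: HTTP request with user_id and amount

    Returns:
        JsonResponse with status
    """
    # get_user, create_payment, and send_notification are helpers from the legacy codebase
    user = get_user(request.user_id)
    payment = create_payment(user.id, request.amount)
    send_notification(user.email, payment.id)
    return JsonResponse({'status': 'success'})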

Benefits:

✅ Documented: Docstring explains purpose
✅ Type-safe: beartype enforces types at runtime
✅ Contract-enforced: Preconditions and postconditions validated
✅ Invalid inputs rejected: Preconditions fail fast when an input violates a contract
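To see what the decorators do without installing icontract, beartype, or Django, here is a minimal stdlib-only stand-in. The `require`/`ensure` decorators and `ContractViolation` are hypothetical simplifications written for this illustration; the real icontract library raises its own `ViolationError`, produces richer error messages, and does much more. Like icontract, the sketch passes arguments to each condition lambda by parameter name.

```python
import functools
import inspect
from typing import Any, Callable, Optional


class ContractViolation(Exception):
    """Raised when a precondition or postcondition fails (hypothetical)."""


def require(condition: Callable[..., bool]):
    """Precondition: call `condition` with the arguments whose names it declares."""
    wanted = list(inspect.signature(condition).parameters)

    def decorator(func):
        sig = inspect.signature(func)  # follows __wrapped__ when decorators stack

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            if not condition(**{name: bound.arguments[name] for name in wanted}):
                raise ContractViolation(f"precondition failed in {func.__name__}")
            return func(*args, **kwargs)

        return wrapper

    return decorator


def ensure(condition: Callable[[Any], bool]):
    """Postcondition: call `condition` with the function's return value."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not condition(result):
                raise ContractViolation(f"postcondition failed in {func.__name__}")
            return result

        return wrapper

    return decorator


# Hypothetical, Django-free stand-in for the payment view shown above.
@require(lambda user_id: user_id is not None)
@require(lambda amount: amount > 0)
@ensure(lambda result: result["status"] == "success")
def process_payment(user_id: Optional[int], amount: float) -> dict:
    return {"status": "success", "user_id": user_id, "amount": amount}


print(process_payment(user_id=42, amount=19.99)["status"])  # success
try:
    process_payment(user_id=42, amount=-5)
except ContractViolation as err:
    print(err)  # precondition failed in process_payment
```

The invalid call never reaches the function body: the violation surfaces at the boundary with a message naming the function, instead of as a confusing failure deeper in the payment logic.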

Edge Case Discovery with CrossHair

One of SpecFact's distinctive capabilities is symbolic execution with CrossHair, a Python analysis tool that uses the Z3 SMT solver to explore execution paths. It can find edge cases that LLM-based review misses.

Example: Division by Zero

A developer was refactoring a data validation function. The code looked correct:

def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor  # Looks safe, right?

LLM analysis: "Code looks correct. Default divisor is 1, so no division by zero."

CrossHair symbolic execution: Found counterexample proving division by zero is possible:

🔍 CrossHair Exploration: Found counterexample
   File: validator.py:5
   Function: validate_and_calculate
   Issue: Division by zero when divisor=0
   Counterexample: {'value': 10, 'divisor': 0}
   Severity: HIGH
   Fix: Add divisor != 0 check

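Why this matters: LLMs rely on probabilistic pattern matching, so a plausible-looking default can convince them the code is safe. CrossHair explores the function's execution paths symbolically and, when it finds a failure, reports a concrete input that reproduces it. The result is a verifiable counterexample, not a guess.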
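CrossHair reasons symbolically over the whole input space. The sketch below is a much cruder stand-in that simply enumerates a small, hand-picked grid of inputs and reports the first one that raises. It is not how CrossHair works and can miss any bug outside the grid, but it shows what a counterexample is: a concrete input that reproduces the failure. `find_counterexample` and the candidate grid are hypothetical.

```python
import itertools
from typing import Any, Callable, Iterable, Optional, Tuple


def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor


def find_counterexample(
    func: Callable[[dict], Any], candidates: Iterable[dict]
) -> Optional[Tuple[dict, str]]:
    """Return the first candidate input that makes `func` raise, plus the exception type."""
    for data in candidates:
        try:
            func(data)
        except Exception as exc:  # any crash counts as a counterexample here
            return data, type(exc).__name__
    return None


# Hand-picked grid of concrete inputs; symbolic execution does not need such a list.
values = [10, 0, -3]
divisors = [1, 2, 0, -1]
grid = ({"value": v, "divisor": d} for v, d in itertools.product(values, divisors))

print(find_counterexample(validate_and_calculate, grid))
# → ({'value': 10, 'divisor': 0}, 'ZeroDivisionError')
```

The trade-off is the point: enumeration only checks the inputs someone thought to list, which is exactly the blind spot symbolic execution is meant to remove.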

The Fix

# Fixed with contract
import icontract

@icontract.require(lambda data: data.get("divisor", 1) != 0)
def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor  # ✅ Contract ensures divisor != 0

Gap Analysis: Finding Missing Tests and Documentation

After reverse engineering, SpecFact can identify gaps:

# Compare code vs. plan to find gaps
specfact plan compare --code-vs-plan

Output:

🔍 Comparing code vs. plan...

Deviations Found: 24 total
  🔴 HIGH: 2 (Missing features from plan)
  🟡 MEDIUM: 19 (Extra implementations found in code)
  🔵 LOW: 3 (Metadata mismatches)

🔴 HIGH Severity Issues:
  - FEATURE-PAYMENT-PROCESSING: Missing test coverage
  - FEATURE-USER-AUTH: No contract enforcement

What this tells you:

Which features have no tests
Which functions lack contract enforcement
Where documentation is missing
What needs attention during modernization
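Conceptually, a code-vs-plan comparison is a diff between two feature inventories. The sketch below is hypothetical: the `Deviation` type, the input shapes, and the severity rules are assumptions loosely modeled on the sample output above, not SpecFact's data model. It flags plan features not found in code and features lacking tests or contracts as HIGH, and code features missing from the plan as MEDIUM.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Deviation:
    severity: str
    feature_id: str
    message: str


def compare_code_vs_plan(plan: Dict[str, dict], code: Dict[str, dict]) -> List[Deviation]:
    """Diff two feature inventories keyed by feature id (hypothetical shapes).

    Each `code` value holds facts such as {"has_tests": bool, "has_contracts": bool}.
    """
    deviations = []
    for fid in sorted(plan.keys() - code.keys()):
        deviations.append(Deviation("HIGH", fid, "in plan but not found in code"))
    for fid in sorted(code.keys() - plan.keys()):
        deviations.append(Deviation("MEDIUM", fid, "implemented in code but missing from plan"))
    for fid in sorted(plan.keys() & code.keys()):
        facts = code[fid]
        if not facts.get("has_tests", False):
            deviations.append(Deviation("HIGH", fid, "missing test coverage"))
        if not facts.get("has_contracts", False):
            deviations.append(Deviation("HIGH", fid, "no contract enforcement"))
    return deviations


plan = {"FEATURE-PAYMENT-PROCESSING": {}, "FEATURE-USER-AUTH": {}}
code = {
    "FEATURE-PAYMENT-PROCESSING": {"has_tests": False, "has_contracts": True},
    "FEATURE-USER-AUTH": {"has_tests": True, "has_contracts": False},
    "FEATURE-DATA-EXPORT": {"has_tests": True, "has_contracts": True},
}
for d in compare_code_vs_plan(plan, code):
    print(d.severity, d.feature_id, d.message)
# MEDIUM FEATURE-DATA-EXPORT implemented in code but missing from plan
# HIGH FEATURE-PAYMENT-PROCESSING missing test coverage
# HIGH FEATURE-USER-AUTH no contract enforcement
```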

Results

Using SpecFact for reverse engineering legacy code:

Metric	Before	After
Documentation time	2-3 weeks	10 minutes
Features discovered	Manual (incomplete)	19 features, 49 stories (automatic)
Edge cases found	Manual testing (missed some)	CrossHair symbolic execution (concrete counterexamples)
Regression prevention	Code review (human error)	Runtime contracts (automatic)
Developer onboarding	2-3 weeks	1 day (with documented codebase)

Real-World Example

A team modernizing a 3-year-old Django app:

Before SpecFact: 2 weeks to document 15 API endpoints manually
With SpecFact: 10 minutes to reverse engineer → 19 features extracted
During modernization: Prevented 4 production bugs via runtime contracts
Result: 87% time saved, zero production regressions

Technical Deep-Dive: How AST Analysis Works

SpecFact uses AST (Abstract Syntax Tree) analysis to extract code structure. This is fundamentally different from LLM-based approaches:

AST Analysis Process

Parse Python files: Uses Python's ast module to parse source code
Extract definitions: Functions, classes, methods, imports
Build dependency graph: Maps relationships between modules
Identify patterns: Uses Semgrep to detect common patterns (API endpoints, data models, etc.)
Group into features: Clusters related code into features and stories
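Steps 3 and 5 can be illustrated together with the standard library: build an import graph between in-project modules from their ASTs, then treat each connected group of modules as a candidate feature. The connected-components heuristic and the `build_import_graph`/`group_modules` names are assumptions for illustration; the article does not describe SpecFact's actual clustering algorithm.

```python
import ast
from typing import Dict, List, Set


def build_import_graph(modules: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each module to the in-project modules it imports.

    Only absolute imports whose target is itself a module name in `modules` count.
    """
    graph: Dict[str, Set[str]] = {name: set() for name in modules}
    for name, source in modules.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                targets = [node.module]
            else:
                continue
            graph[name].update(t for t in targets if t in modules)
    return graph


def group_modules(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group modules into connected components, ignoring edge direction."""
    undirected: Dict[str, Set[str]] = {name: set() for name in graph}
    for src, dests in graph.items():
        for dst in dests:
            undirected[src].add(dst)
            undirected[dst].add(src)
    seen: Set[str] = set()
    groups = []
    for start in sorted(undirected):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for nxt in undirected[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component))
    return groups


modules = {
    "payments.views": "from payments.models import Payment\nimport notifications\n",
    "payments.models": "import decimal\n",
    "notifications": "",
    "reports.pdf": "import reports.queries\n",
    "reports.queries": "",
}
graph = build_import_graph(modules)
print(group_modules(graph))
# → [['notifications', 'payments.models', 'payments.views'], ['reports.pdf', 'reports.queries']]
```

Ignoring edge direction is a simplifying choice: a view and the model it imports usually belong to the same feature regardless of which one depends on the other.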

Why AST Analysis vs. LLM?

Approach	Method	Accuracy	Speed
LLM-based	Probabilistic pattern matching	~70-80% (may miss details)	Slower (API calls)
AST analysis	Deterministic code parsing	~95%+ (extracts actual structure)	Faster (local processing)

Key advantage: AST analysis extracts actual code structure, not guesses. It's deterministic, fast, and works offline.

Try It Yourself

Option A: AI IDE Mode (Recommended)

For best results, use slash commands in your AI IDE (Cursor, VS Code + Copilot):

# Step 1: Install slash commands
specfact init --ide cursor  # or --ide vscode

# Step 2: In your AI IDE, use the slash command:
/specfact.01-import --repo .

# The slash command runs CLI + LLM enrichment for better feature detection

Option B: CLI-Only Mode (Quick Start)

For quick validation or CI/CD, run the CLI directly:

# Step 1: Reverse engineer legacy code
specfact import from-code legacy-api --repo .

# Step 2: Analyze contracts
specfact analyze contracts --bundle legacy-api

# Step 3: Find gaps
specfact plan compare --code-vs-plan

# Step 4: Add contracts to critical paths
specfact generate contracts-prompt src/views.py --bundle legacy-api --apply all-contracts

💡 Best Practice: Start with CLI-only mode to understand the workflow, then switch to AI IDE mode for LLM-enriched analysis and better feature detection.

Conclusion

Reverse engineering legacy Python code doesn't have to take weeks. SpecFact CLI uses:

✅ AST analysis to extract actual code structure (not LLM guessing)
✅ Contract generation to create executable specs
✅ Symbolic execution (CrossHair) to find edge cases mathematically
✅ Gap analysis to identify missing tests and documentation

Result: a legacy app's features and stories extracted and documented in under 10 minutes, with runtime contracts preventing regressions during modernization.

Try it on your legacy codebase:

# Install slash commands
specfact init --ide cursor

# Then use in your AI IDE:
/specfact.01-import --repo .

