TL;DR: SpecFact CLI reverse engineers legacy Python code into executable contracts using AST analysis (not LLM guessing). In this technical deep-dive, we walk through a real example: analyzing a 3-year-old Django app with no documentation, extracting 19 features and 49 stories in seconds, then adding runtime contracts to prevent regressions during modernization.
The Challenge
You've inherited a legacy Python codebase. It has:
- No documentation — no docstrings, no README, no API docs
- No type hints — Python 2.7 style code, or Python 3 without annotations
- Business logic buried in views — Django views with 200+ lines of mixed concerns
- No tests — or tests that haven't been updated in years
- 15 undocumented API endpoints — you don't know what they do
Traditional approach: Spend 2-3 weeks manually documenting, writing specs, creating API docs. Then hope nothing breaks during modernization.
SpecFact approach: Reverse engineer the codebase in under 10 minutes, extract features automatically, then add runtime contracts to prevent regressions.
Step-by-Step: Reverse Engineering with SpecFact
Step 1: Install and Initialize
# Run SpecFact CLI through uvx and confirm it is available
uvx specfact-cli --version
# Navigate to your legacy codebase
cd legacy-django-app
Step 2: Reverse Engineer the Codebase
💡 Recommended: Use the slash command in your AI IDE (Cursor, VS Code + Copilot) for LLM-enriched analysis. The AI adds semantic understanding, business context, and "why" reasoning to the extracted features.
# Option A: AI IDE Mode (Recommended)
# Step 1: Install slash commands
specfact init --ide cursor # or --ide vscode
# Step 2: In your AI IDE, use the slash command:
/specfact.01-import --repo .
# Option B: CLI-Only Mode (Quick Start)
specfact import from-code legacy-api --repo . --confidence 0.5
What happens under the hood:
- AST Analysis: Parses Python files, extracts functions, classes, dependencies
- Semgrep Pattern Detection: Identifies common patterns (API endpoints, data models, etc.)
- Dependency Graph Building: Maps relationships between modules
- Feature Extraction: Groups related code into features and stories
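To make the dependency-graph step concrete, here is a minimal sketch using only Python's standard `ast` module. It is not SpecFact's implementation: the function name `build_import_graph` and the input shape (a dict of module name to source text) are assumptions made for illustration.

```python
import ast


def build_import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the set of modules it imports.

    `sources` maps a module name to its source text (hypothetical input shape).
    """
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for module_name, source in sources.items():
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import a, b" lists every imported module in node.names
                graph[module_name].update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # "from a import b" records the source module "a"
                graph[module_name].add(node.module)
    return graph


# Two tiny in-memory modules standing in for files on disk.
sample = {
    "views": "from payments import create_payment\nimport notifications\n",
    "payments": "import models\n",
}
graph = build_import_graph(sample)
print(graph)
```

Because the graph comes from parsed syntax rather than text matching, commented-out imports and import-like strings are ignored.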
Output:
🔍 Analyzing Python files...
✓ Found 19 features
✓ Detected themes: API, Data Processing, Authentication
✓ Total stories: 49
✓ Analysis complete!
Project bundle written to: .specfact/projects/legacy-api/
Time taken: ~3-5 seconds for 19 Python files (varies by codebase size and complexity)
Step 3: Review Extracted Features
SpecFact automatically extracted 19 features from the codebase. Four of them are shown here:
| Feature | Stories | Confidence | What It Does |
|---|---|---|---|
| Payment Processing | 5 | 0.9 | Handles payment creation, validation, and webhooks |
| User Authentication | 3 | 0.8 | Login, logout, session management |
| Data Export | 4 | 0.7 | CSV/JSON export functionality |
| Report Generation | 6 | 0.9 | Generates PDF reports from data |
Total: 49 user stories auto-generated with Fibonacci story points.
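The Confidence column relates to the `--confidence 0.5` flag from Step 2. Assuming the flag acts as a minimum-confidence cutoff (an assumption; the exact semantics are not spelled out here), the filtering could look roughly like this. The `Feature` dataclass and `filter_by_confidence` are hypothetical names, and the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    stories: int
    confidence: float


# Values copied from the feature table in this section.
FEATURES = [
    Feature("Payment Processing", 5, 0.9),
    Feature("User Authentication", 3, 0.8),
    Feature("Data Export", 4, 0.7),
    Feature("Report Generation", 6, 0.9),
]


def filter_by_confidence(features: list[Feature], threshold: float) -> list[Feature]:
    """Keep features whose confidence is at or above the threshold (assumed semantics)."""
    return [feature for feature in features if feature.confidence >= threshold]


high_confidence = filter_by_confidence(FEATURES, 0.8)
names = [feature.name for feature in high_confidence]
print(names)
```

Raising the threshold trades coverage for precision: fewer features survive, but the ones that do are the ones the analysis is most sure about.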
Before/After Example
Before: Undocumented Legacy Code
# views.py - Legacy Django view (no documentation)
def process_payment(request):
    user = get_user(request.user_id)
    payment = create_payment(user.id, request.amount)
    send_notification(user.email, payment.id)
    return JsonResponse({'status': 'success'})
Problems:
- No docstring explaining what it does
- No type hints
- No error handling visible
- No contract enforcement
- Unknown edge cases
After: Reverse Engineered + Contracts
SpecFact extracts the feature, then you can add contracts:
# Generated plan bundle (.specfact/projects/legacy-api/plan.yaml)
features:
  - id: FEATURE-PAYMENT-PROCESSING
    name: Payment Processing
    stories:
      - id: STORY-001
        title: Process payment request
        description: Creates payment, sends notification
        acceptance_criteria:
          - User must be authenticated
          - Amount must be positive
          - Notification must be sent
Then add runtime contracts:
# Enhanced with contracts
import icontract
from beartype import beartype
from django.http import HttpRequest, JsonResponse

@icontract.require(lambda request: request.user_id is not None)
@icontract.require(lambda request: request.amount > 0)
# Indexing a Django response looks up headers, not the JSON body,
# so the postcondition checks the HTTP status code instead.
@icontract.ensure(lambda result: result.status_code == 200)
@beartype
def process_payment(request: HttpRequest) -> JsonResponse:
    """Process payment request.

    Creates a payment for the authenticated user and sends a notification.

    Args:
        request: HTTP request with user_id and amount

    Returns:
        JsonResponse with status
    """
    user = get_user(request.user_id)
    payment = create_payment(user.id, request.amount)
    send_notification(user.email, payment.id)
    return JsonResponse({'status': 'success'})
Benefits:
- ✅ Documented: Docstring explains purpose
- ✅ Type-safe: beartype enforces types at runtime
- ✅ Contract-enforced: Preconditions and postconditions validated
- ✅ Invalid inputs rejected: Contracts fail fast on bad input instead of letting it propagate
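To see what a precondition does at call time without installing anything, here is a stdlib-only stand-in. It is deliberately not icontract: `require` and `ContractViolation` are hypothetical names, and the request object is a plain `SimpleNamespace` rather than a Django `HttpRequest`.

```python
import functools
from types import SimpleNamespace


class ContractViolation(Exception):
    """Raised when a precondition fails (hypothetical name, not from icontract)."""


def require(condition, description):
    """Stdlib stand-in for a precondition decorator; not icontract's implementation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(request):
            # Check the precondition before the wrapped function runs.
            if not condition(request):
                raise ContractViolation(description)
            return func(request)
        return wrapper
    return decorator


@require(lambda request: request.user_id is not None, "user_id must be set")
@require(lambda request: request.amount > 0, "amount must be positive")
def process_payment(request):
    # Stand-in body: the real view would create a payment and notify the user.
    return {"status": "success"}


ok = process_payment(SimpleNamespace(user_id=42, amount=10))

try:
    process_payment(SimpleNamespace(user_id=42, amount=-5))
    rejected = None
except ContractViolation as exc:
    rejected = str(exc)

print(ok, rejected)
```

The point of the sketch is the failure mode: a negative amount never reaches the payment logic, and the error message names the violated rule.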
Edge Case Discovery with CrossHair
One of SpecFact's distinctive capabilities is symbolic execution using CrossHair, a tool that explores code paths with an SMT solver. This can surface edge cases that LLM-based review misses.
Example: Division by Zero
A developer was refactoring a data validation function. The code looked correct:
def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor  # Looks safe, right?
LLM analysis: "Code looks correct. Default divisor is 1, so no division by zero."
CrossHair symbolic execution: Found counterexample proving division by zero is possible:
🔍 CrossHair Exploration: Found counterexample
File: validator.py:5
Function: validate_and_calculate
Issue: Division by zero when divisor=0
Counterexample: {'value': 10, 'divisor': 0}
Severity: HIGH
Fix: Add divisor != 0 check
Why this matters: LLMs rely on probabilistic pattern matching. CrossHair uses an SMT solver to derive concrete counterexamples, so each finding is a reproducible input rather than a probabilistic guess.
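The idea of a counterexample can be illustrated with a much cruder technique than symbolic execution. The sketch below enumerates a small, hand-picked input domain and reports the first input that crashes the function. This is plain brute-force search, not what CrossHair does (CrossHair reasons about paths with a solver instead of enumerating inputs), and `find_counterexample` is a hypothetical helper.

```python
from itertools import product


def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor


def find_counterexample(func, candidates):
    """Return the first candidate input that makes func raise ZeroDivisionError, or None."""
    for data in candidates:
        try:
            func(data)
        except ZeroDivisionError:
            return data
    return None


# Small hand-picked domain (an assumption for illustration only).
values = [10]
divisors = [1, -1, 0]
candidates = [{"value": v, "divisor": d} for v, d in product(values, divisors)]

counterexample = find_counterexample(validate_and_calculate, candidates)
print(counterexample)
```

Enumeration only works when someone thinks to include `0` in the domain. A solver-backed tool works from the code's own branches and operations, so it does not depend on that guess.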
The Fix
# Fixed with contract
import icontract

@icontract.require(lambda data: data.get("divisor", 1) != 0)
def validate_and_calculate(data: dict) -> float:
    value = data.get("value", 0)
    divisor = data.get("divisor", 1)
    return value / divisor  # ✅ Contract ensures divisor != 0
Gap Analysis: Finding Missing Tests and Documentation
After reverse engineering, SpecFact can identify gaps:
# Compare code vs. plan to find gaps
specfact plan compare --code-vs-plan
Output:
🔍 Comparing code vs. plan...
Deviations Found: 24 total
🔴 HIGH: 2 (Missing features from plan)
🟡 MEDIUM: 19 (Extra implementations found in code)
🔵 LOW: 3 (Metadata mismatches)
🔴 HIGH Severity Issues:
- FEATURE-PAYMENT-PROCESSING: Missing test coverage
- FEATURE-USER-AUTH: No contract enforcement
What this tells you:
- Which features have no tests
- Which functions lack contract enforcement
- Where documentation is missing
- What needs attention during modernization
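At its core, a code-vs-plan comparison is a set difference. The sketch below is a simplified illustration, not SpecFact's algorithm: `compare_plan_to_code` is a hypothetical function, the severity mapping mirrors the labels in the sample output, and `FEATURE-DATA-EXPORT` and `FEATURE-REPORT-GENERATION` are made-up IDs in the same style as the two that appear above.

```python
def compare_plan_to_code(
    plan_features: set[str], code_features: set[str]
) -> dict[str, list[str]]:
    """Classify deviations between planned features and features found in code."""
    return {
        # In the plan but not found in the code: missing features.
        "HIGH": sorted(plan_features - code_features),
        # Found in the code but absent from the plan: extra implementations.
        "MEDIUM": sorted(code_features - plan_features),
    }


# Feature IDs are illustrative; only the first two appear in the article.
plan = {"FEATURE-PAYMENT-PROCESSING", "FEATURE-USER-AUTH", "FEATURE-DATA-EXPORT"}
code = {"FEATURE-PAYMENT-PROCESSING", "FEATURE-USER-AUTH", "FEATURE-REPORT-GENERATION"}

deviations = compare_plan_to_code(plan, code)
print(deviations)
```

Sorting the results keeps the report stable between runs, which matters if the comparison is used as a CI gate.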
Results
Using SpecFact for reverse engineering legacy code:
| Metric | Before | After |
|---|---|---|
| Documentation time | 2-3 weeks | 10 minutes |
| Features discovered | Manual (incomplete) | 19 features, 49 stories (automatic) |
| Edge cases found | Manual testing (missed some) | CrossHair symbolic execution (mathematical proof) |
| Regression prevention | Code review (human error) | Runtime contracts (automatic) |
| Developer onboarding | 2-3 weeks | 1 day (with documented codebase) |
Real-World Example
A team modernizing a 3-year-old Django app:
- Before SpecFact: 2 weeks to document 15 API endpoints manually
- With SpecFact: 10 minutes to reverse engineer → 19 features extracted
- During modernization: Prevented 4 production bugs via runtime contracts
- Result: 87% time saved, zero production regressions
Technical Deep-Dive: How AST Analysis Works
SpecFact uses AST (Abstract Syntax Tree) analysis to extract code structure. This is fundamentally different from LLM-based approaches:
AST Analysis Process
- Parse Python files: Uses Python's ast module to parse source code
- Extract definitions: Functions, classes, methods, imports
- Build dependency graph: Maps relationships between modules
- Identify patterns: Uses Semgrep to detect common patterns (API endpoints, data models, etc.)
- Group into features: Clusters related code into features and stories
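The first two steps of this process can be sketched with the standard library alone. The example below parses a small source string and collects top-level functions, classes, methods, and imports. It is an illustration under stated assumptions, not SpecFact's code: `extract_definitions` and the sample `SOURCE` are hypothetical.

```python
import ast

# A tiny stand-in for a legacy module.
SOURCE = '''
import json

class PaymentService:
    def charge(self, amount):
        return amount

def process_payment(request):
    return {"status": "success"}
'''


def extract_definitions(source: str) -> dict[str, list[str]]:
    """Collect names of top-level functions, classes, their methods, and imports."""
    tree = ast.parse(source)
    found: dict[str, list[str]] = {
        "functions": [], "classes": [], "methods": [], "imports": []
    }
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    found["methods"].append(f"{node.name}.{item.name}")
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found["imports"].append(node.module)
    return found


definitions = extract_definitions(SOURCE)
print(definitions)
```

Running this twice on the same file always yields the same result, which is the property the comparison with LLM-based extraction rests on.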
Why AST Analysis vs. LLM?
| Approach | Method | Accuracy | Speed |
|---|---|---|---|
| LLM-based | Probabilistic pattern matching | ~70-80% (may miss details) | Slower (API calls) |
| AST analysis | Deterministic code parsing | ~95%+ (extracts actual structure) | Faster (local processing) |
Key advantage: AST analysis extracts actual code structure, not guesses. It's deterministic, fast, and works offline.
Try It Yourself
Option A: AI IDE Mode (Recommended)
For best results, use slash commands in your AI IDE (Cursor, VS Code + Copilot):
# Step 1: Install slash commands
specfact init --ide cursor # or --ide vscode
# Step 2: In your AI IDE, use the slash command:
/specfact.01-import --repo .
# The slash command runs CLI + LLM enrichment for better feature detection
Option B: CLI-Only Mode (Quick Start)
For quick validation or CI/CD, run the CLI directly:
# Step 1: Reverse engineer legacy code
specfact import from-code legacy-api --repo .
# Step 2: Analyze contracts
specfact analyze contracts --bundle legacy-api
# Step 3: Find gaps
specfact plan compare --code-vs-plan
# Step 4: Add contracts to critical paths
specfact generate contracts-prompt src/views.py --bundle legacy-api
💡 Best Practice: Start with CLI-only mode to understand the workflow, then switch to AI IDE mode for LLM-enriched analysis and better feature detection.
Conclusion
Reverse engineering legacy Python code doesn't have to take weeks. SpecFact CLI uses:
- ✅ AST analysis to extract actual code structure (not LLM guessing)
- ✅ Contract generation to create executable specs
- ✅ Symbolic execution (CrossHair) to find edge cases mathematically
- ✅ Gap analysis to identify missing tests and documentation
Result: Legacy app fully documented in < 10 minutes, with runtime contracts preventing regressions during modernization.
Try it on your legacy codebase:
# Install slash commands
specfact init --ide cursor
# Then use in your AI IDE:
/specfact.01-import --repo .