CodeSnapAI compresses massive codebases into ultra-compact semantic snapshots that preserve over 95% of debugging-critical information, resolving the “context explosion vs. information loss” paradox and enabling AI-assisted development at unprecedented scale.

turtacn/CodeSnapAI

CodeSnapAI

AI-Powered Semantic Code Analysis & Intelligent Governance Platform


English | 简体中文 (Simplified Chinese) | 总体设计 (Overall Design)


🎯 Mission Statement

CodeSnapAI addresses the "context explosion vs. information loss" paradox in modern software engineering. It compresses large codebases into compact semantic snapshots while preserving 95%+ of debugging-critical information, so that AI-assisted development remains practical on projects too large to fit in an LLM context window.

Core Innovation: Transform 5MB+ codebases into <200KB semantic representations that LLMs can actually understand and act upon.


💡 Why CodeSnapAI?

Industry Pain Points

Modern software development faces four critical bottlenecks:

| Challenge | Current State | CodeSnapAI Solution |
|---|---|---|
| Context Overload | Large codebases contain millions of details, overwhelming AI debuggers and human developers | Intelligent semantic compression with risk-weighted preservation |
| Semantic Loss | Traditional code summarization loses critical dependency relationships and error patterns | Multi-dimensional semantic tagging system maintaining architectural integrity |
| Governance Fragmentation | Complexity detection tools (SonarQube, Codacy) report issues but require manual remediation | Automated end-to-end workflow: scan → AI-generated patches → validation → deployment |
| Multi-Language Chaos | Each language requires separate toolchains and analysis frameworks | Unified semantic abstraction layer across Go, Java, C/C++, Rust, Python |

Competitive Advantages

🚀 20:1 Compression Ratio - Industry-leading semantic snapshot technology
🎯 95%+ Information Retention - Preserves all debugging-critical relationships
🔄 Closed-Loop Automation - From issue detection to validated patch deployment
🌐 Universal Language Support - Unified analysis across 5+ major languages
⚡ Sub-30s Analysis - Process 100K LOC projects in under 30 seconds
🔓 Open Source & Extensible - Plugin architecture for custom rules and languages


✨ Key Features

1. Multi-Language Semantic Analyzer

  • Unified AST Parsing: Leverages tree-sitter for Go, Java, C/C++, Rust, Python, Shell
  • Deep Semantic Extraction:
    • Function signatures, call graphs, dependency trees
    • Complexity metrics (cyclomatic, cognitive, nesting depth)
    • Error handling patterns (panic/error wrapping/exceptions)
    • Concurrency primitives (goroutines, async/await, channels)
    • Database/network operation markers
  • Incremental Analysis: File-level hashing for efficient change detection
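
The incremental analysis idea can be illustrated with a minimal sketch, assuming SHA-256 content hashes and a hash table kept from the previous run. The function names `hash_file` and `changed_files` are hypothetical and are not part of the CodeSnapAI API.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(
    root: Path, previous: dict[str, str], pattern: str = "*.py"
) -> tuple[list[str], dict[str, str]]:
    """Compare current file hashes with those from a previous run.

    Returns (paths that are new or modified, the fresh hash table).
    """
    current = {
        str(p.relative_to(root)): hash_file(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    return changed, current
```

Only the files returned in the first element would need to be re-parsed; the second element is what a caller would persist for the next run.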

2. Intelligent Snapshot Generator

  • Advanced Compression Strategies:
    • Package-level aggregation with representative sampling
    • Critical path extraction (high-call-count functions prioritized)
    • Semantic clustering by functional tags
    • Risk-weighted pruning (high-risk modules preserved verbatim)
  • Multiple Output Formats: YAML (human-readable), JSON (API), Binary (performance)
  • Rich Metadata: Project structure, dependency graphs, risk heatmaps, git context
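
As a rough illustration of critical path extraction combined with risk-weighted pruning, the sketch below ranks functions by inbound call count and always keeps functions marked high-risk. The function `select_functions` and its inputs are assumptions made for this example, not the generator's real interface.

```python
from collections import Counter


def select_functions(call_edges, high_risk, budget):
    """Pick the functions to keep in full detail in a snapshot.

    call_edges: iterable of (caller, callee) pairs.
    high_risk:  set of function names that must always be kept.
    budget:     target number of functions to keep in total.
    """
    inbound = Counter(callee for _, callee in call_edges)
    # Risk-weighted pruning: high-risk functions are never dropped,
    # even if that alone exceeds the budget.
    kept = sorted(high_risk)
    # Critical path extraction: fill the rest by descending call count,
    # breaking ties alphabetically for a deterministic result.
    for name, _count in sorted(inbound.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(kept) >= budget:
            break
        if name not in high_risk:
            kept.append(name)
    return kept
```

Functions that do not make the cut would be the candidates for package-level aggregation or sampling.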

3. Risk Scoring Engine

  • Multi-Dimensional Risk Model:
    • Complexity score (weighted McCabe + Cognitive Complexity)
    • Error pattern analysis (unsafe operations, missing handlers)
    • Test coverage penalties for critical paths
    • Transitive dependency vulnerability propagation
    • Change frequency from git history (instability indicators)
  • Configurable Thresholds: Custom scoring rules per project type
  • Actionable Reports: Drill-down capabilities with root cause analysis
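
One plausible way to combine these dimensions is a weighted sum. The sketch below assumes each signal has already been normalized to the range 0 to 1 and reuses the weights from the example `.codesage.yaml` profile in the Configuration section; the function `risk_score` and the normalization step are illustrative assumptions, not the engine's documented formula.

```python
# Weights taken from the example profile in the Configuration section.
DEFAULT_WEIGHTS = {
    "complexity": 0.30,
    "error_pattern": 0.25,
    "test_coverage": 0.20,
    "dependency": 0.15,
    "change_frequency": 0.10,
}


def risk_score(signals: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine normalized signals (each in [0, 1]) into a 0-100 score."""
    total_weight = sum(weights.values())
    weighted = 0.0
    for name, weight in weights.items():
        # Missing signals count as 0; out-of-range values are clamped.
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        weighted += weight * value
    # Dividing by the total keeps the result in range even if the
    # configured weights do not sum exactly to 1.
    return round(100 * weighted / total_weight, 1)
```

Under these assumptions, a module whose only elevated signal is maximal complexity would score 30.0 out of 100.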

4. AI Governance Orchestrator

  • Automated Issue Detection:
    • Cyclomatic complexity > 10 (configurable)
    • Cognitive complexity > 15
    • Nesting depth > 4
    • Function length > 50 LOC
    • Parameter count > 5
    • Code duplication > 3%
  • LLM-Powered Refactoring:
    • Context-enriched prompt generation
    • Structured JSON output validation
    • Multi-turn conversation support
  • Patch Management Pipeline:
    • Syntax validation via language parsers
    • Automated test execution (pre/post patching)
    • Git-based rollback mechanism
    • Optional approval workflows
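
To make the threshold checks concrete, here is a minimal sketch for Python source only, built on the standard `ast` module rather than tree-sitter. The complexity count is a simplified McCabe approximation, and `cyclomatic_complexity` and `find_violations` are hypothetical helpers, not CodeSnapAI functions.

```python
import ast

# Node types that add one decision point each (a simplified McCabe count).
_BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.IfExp, ast.comprehension)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            score += len(node.values) - 1
    return score


def find_violations(source: str, max_cyclomatic: int = 10, max_params: int = 5):
    """Return (function name, metric, value) for each exceeded threshold."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = cyclomatic_complexity(node)
            args = node.args
            params = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
            if complexity > max_cyclomatic:
                violations.append((node.name, "cyclomatic", complexity))
            if params > max_params:
                violations.append((node.name, "parameters", params))
    return violations
```

The default limits mirror the thresholds listed above. Nested functions are counted toward their enclosing function as well, which a production analyzer would likely handle separately.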

5. Interactive Debugging Assistant

  • Natural Language Queries:
    • "Why did TestUserLogin fail?" → Full call chain localization
    • "Show high-risk modules" → Ranked list with justifications
    • "Explain function ProcessPayment" → Semantic summary + dependencies
  • Debugger Integration: Compatible with pdb, gdb, lldb, delve
  • Real-Time Navigation: Semantic search across codebase

🎉 Latest Updates (Phase 1: Core Analyzer Stabilization)

✅ Production-Ready Analyzers

  • Python Parser: Fixed nested async function extraction, Python 3.10+ match statement support, enhanced error recovery
  • Go Parser: Added Go 1.18+ generics support with type constraints, improved struct tag parsing
  • Java Parser: Enhanced annotation parsing for nested annotations, record class support, lambda expression filtering

🧪 Comprehensive Testing

  • 97.5% Test Coverage: 100+ real-world code samples with ground truth validation
  • Performance Optimized: Analyze 1000 LOC in <500ms (40% faster than previous version)
  • Error Recovery: Robust partial AST parsing on syntax errors

🔧 Enhanced Features

  • Semantic Extraction: >95% accuracy against hand-annotated ground truth
  • CI Integration: Automated GitHub Actions workflow with coverage reporting
  • Type Safety: Full Pydantic model validation for all AST nodes

🚀 Getting Started

Prerequisites

  • Python 3.10 or higher
  • Git (for repository analysis features)

Installation

Via pip (Recommended)

pip install codesage

From Source

git clone https://github.com/turtacn/CodeSnapAI.git
cd CodeSnapAI
poetry install

Quick Start (CLI)

  1. Initialize Configuration:
    poetry run codesage config init --interactive
  2. Analyze Your Code:
    # Auto-detect languages (Python, Go, Java, Shell)
    poetry run codesage scan ./your-project --language auto
  3. Create a Snapshot:
    poetry run codesage snapshot create ./your-project

Docker Usage

You can run CodeSnapAI using Docker without installing dependencies locally.

# Build the image
docker build -t codesage .

# Run a scan
docker run -v "$(pwd)":/workspace codesage scan .

Quick Start

1. Generate Semantic Snapshot

# Analyze a Go microservice project
codesage snapshot ./my-go-service -o snapshot.yaml

# Output: snapshot.yaml (compressed semantic representation)

2. Analyze Architecture

codesage analyze snapshot.yaml

# Output example:
# Project: my-go-service (Go 1.21)
# Total Functions: 342
# High-Risk Modules: 12 (see details below)
# Top Complexity Hotspots:
#   - handlers/auth.go::ValidateToken (Cyclomatic: 18, Cognitive: 24)
#   - services/payment.go::ProcessRefund (Cyclomatic: 15, Cognitive: 21)

3. Debug Test Failures

codesage debug snapshot.yaml TestUserRegistration

# Output:
# Test Failure Localization:
# Root Cause: handlers/user.go::RegisterUser, Line 45
# Call Chain: RegisterUser → ValidateEmail → CheckDuplicates
# Risk Factors: Missing error handling for database timeout (Line 52)
# Suggested Fix: Wrap db.Query with context.WithTimeout
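
A call chain like the one in the output above can be recovered from a call graph with a breadth-first search. This is a minimal sketch assuming the graph is available as a plain dictionary; `find_call_chain` is a hypothetical helper and `SaveUser` is an invented node used only to show branching.

```python
from collections import deque


def find_call_chain(call_graph, start, target):
    """Breadth-first search for the shortest call chain from start to target.

    call_graph maps each function name to the list of functions it calls.
    Returns the chain as a list of names, or None if target is unreachable.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], []):
            if callee not in seen:  # guard against recursion cycles
                seen.add(callee)
                queue.append(path + [callee])
    return None


graph = {
    "TestUserRegistration": ["RegisterUser"],
    "RegisterUser": ["ValidateEmail", "SaveUser"],  # SaveUser is hypothetical
    "ValidateEmail": ["CheckDuplicates"],
}
chain = find_call_chain(graph, "RegisterUser", "CheckDuplicates")
print(" → ".join(chain))
```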

4. Complexity Governance Workflow

# Scan for complexity violations
codesage scan ./my-go-service --threshold cyclomatic=10 cognitive=15

# Auto-generate refactoring with LLM
codesage govern scan_results.json --llm claude-3-5-sonnet --apply

# Output:
# Detected 8 violations
# Generated 8 refactoring patches
# Validation: 7/8 passed tests (1 requires manual review)
# Applied patches to: handlers/auth.go, services/payment.go, ...
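
The syntax-validation gate in this pipeline can be sketched for Python files using the standard `ast` module as a stand-in for a language parser. The helper `apply_patch_safely` is an assumption for illustration; a real pipeline would also run the test suite and rely on Git for rollback, as described in the feature list.

```python
import ast
from pathlib import Path


def apply_patch_safely(path: Path, patched_source: str) -> bool:
    """Write patched_source to path only if it parses as valid Python.

    Returns True when the file was updated, False when the patch was rejected.
    """
    try:
        ast.parse(patched_source, filename=str(path))
    except SyntaxError:
        # Reject the patch: never write code that does not parse.
        return False
    path.write_text(patched_source, encoding="utf-8")
    return True
```

Rejecting before writing means a malformed LLM-generated patch never touches the working tree, so no rollback is needed for that failure mode.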

Web Console

CodeSnapAI includes a web-based console for visualizing analysis results, reports, and governance plans.

Launch the Console:

codesage web-console

This will start a local web server (default: http://127.0.0.1:8080) where you can browse the project dashboard, file details, and governance tasks.


Using in CI

You can use the codesage report command to generate reports and enforce CI policies.

# Generate reports
codesage report \
  --input /path/to/snapshot.yaml \
  --out-json /path/to/report.json \
  --out-md /path/to/report.md \
  --out-junit /path/to/report.junit.xml

# Enforce CI policy
codesage report \
  --input /path/to/snapshot.yaml \
  --ci-policy-strict
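
If the CI job is driven from a Python script, the same invocation can be assembled programmatically. The sketch below uses only the flags shown above; the helper names are hypothetical, and treating a non-zero exit code as a policy failure is an assumption, not documented behavior.

```python
import subprocess


def build_report_command(snapshot, json_out=None, md_out=None, junit_out=None, strict=False):
    """Assemble the argument list for `codesage report`."""
    command = ["codesage", "report", "--input", snapshot]
    for flag, value in (
        ("--out-json", json_out),
        ("--out-md", md_out),
        ("--out-junit", junit_out),
    ):
        if value:
            command += [flag, value]
    if strict:
        command.append("--ci-policy-strict")
    return command


def run_report(**kwargs) -> int:
    """Run the report and return its exit code.

    Assumption: a non-zero exit code signals a failed CI policy.
    """
    return subprocess.run(build_report_command(**kwargs), check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.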

📊 Usage Examples

Example 1: CI/CD Integration

You can easily integrate CodeSnapAI into your GitHub Actions workflow using our official action.

# .github/workflows/codesnap_audit.yml
name: CodeSnapAI Security Audit
on: [pull_request]

jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      checks: write
    steps:
      - uses: actions/checkout@v4
      - name: Run CodeSnapAI
        uses: turtacn/CodeSnapAI@main # Replace with tagged version in production
        with:
          target: "."
          language: "python"
          fail_on_high: "true"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

Example 2: Python Library Usage

from codesage import SemanticAnalyzer, SnapshotGenerator, RiskScorer

# Initialize analyzer
analyzer = SemanticAnalyzer(language='go')
analysis = analyzer.analyze_directory('./my-service')

# Generate snapshot
generator = SnapshotGenerator(compression_ratio=20)
snapshot = generator.create(analysis)
snapshot.save('snapshot.yaml')

# Risk scoring
scorer = RiskScorer()
risks = scorer.score(analysis)
print(f"High-risk modules: {len(risks.high_risk)}")

for module in risks.high_risk:
    print(f"  {module.path}: {module.score}/100")
    print(f"    Reasons: {', '.join(module.risk_factors)}")

Example 3: Custom Language Plugin

from codesage import PluginRegistry
from codesage.plugins import LanguagePlugin


class KotlinPlugin(LanguagePlugin):
    """Example plugin that adds Kotlin support."""

    def get_tree_sitter_grammar(self):
        # Name of the tree-sitter grammar to load for this language
        return 'tree-sitter-kotlin'

    def extract_semantic_tags(self, node):
        # Custom semantic extraction logic: tag coroutine declarations
        if node.type == 'coroutine_declaration':
            return ['async', 'concurrency']
        return []


# Register the plugin under the language key 'kotlin'
PluginRegistry.register('kotlin', KotlinPlugin())

🎬 Demo Scenarios

Scenario 1: Real-Time Complexity Monitoring

# Watch mode for continuous analysis
# Quote the expression so the shell does not treat ">" as output redirection
codesage watch ./src --alert-on "complexity>15"

# Terminal output (with color-coded alerts):
# ⚠️  ALERT: handlers/auth.go::ValidateToken
#    Cognitive Complexity increased: 12 → 17 (+5)
#    Recommendation: Extract validation logic to separate function

GIF Demo: docs/demos/complexity-monitoring.gif
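
The alert in this scenario boils down to comparing metrics between two analysis runs. A minimal sketch, assuming each run yields a mapping from "file::function" to a complexity value (the function `complexity_alerts` and that data shape are hypothetical):

```python
def complexity_alerts(previous, current, threshold=15):
    """Report functions whose complexity rose and now exceeds the threshold.

    previous/current map "file::function" to a complexity value.
    """
    alerts = []
    for name, new_value in sorted(current.items()):
        old_value = previous.get(name, 0)
        if new_value > threshold and new_value > old_value:
            alerts.append(f"{name}: {old_value} -> {new_value} (+{new_value - old_value})")
    return alerts


before = {"handlers/auth.go::ValidateToken": 12}
after = {"handlers/auth.go::ValidateToken": 17}
print(complexity_alerts(before, after))
```

Functions that were already above the threshold and did not change are deliberately not reported again, which keeps a watch loop from repeating the same alert on every save.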

Scenario 2: AI-Assisted Refactoring

# Interactive refactoring session
codesage refactor ./services/payment.go --interactive

# LLM Conversation:
# 🤖 I've identified 3 complexity issues. Let's start with ProcessRefund:
#    Current Cyclomatic Complexity: 18
#    Suggested approach: Extract retry logic and error handling
#    
# 👤 Focus on the retry logic first
# 🤖 Generated patch: [shows diff]
#    Tests: ✅ All 12 tests pass
#    Apply this change? (y/n)

GIF Demo: docs/demos/interactive-refactoring.gif

Scenario 3: Multi-Repository Dashboard

# Analyze multiple projects
codesage dashboard --repos "service-a,service-b,service-c" --port 8080

# Opens web UI showing:
# - Cross-project complexity trends
# - Shared high-risk patterns
# - Dependency vulnerability heatmap

GIF Demo: docs/demos/multi-repo-dashboard.gif


🛠️ Configuration

Project Profile (.codesage.yaml)

version: "1.0"

# Language settings
languages:
  - go
  - python

# Compression settings
snapshot:
  compression_ratio: 20
  preserve_patterns:
    - ".*_test.go$"  # Keep all test files
    - "main.go$"     # Keep entry points

# Complexity thresholds
thresholds:
  cyclomatic_complexity: 10
  cognitive_complexity: 15
  nesting_depth: 4
  function_length: 50
  parameter_count: 5
  duplication_rate: 0.03

# Risk scoring weights
risk_scoring:
  complexity_weight: 0.3
  error_pattern_weight: 0.25
  test_coverage_weight: 0.2
  dependency_weight: 0.15
  change_frequency_weight: 0.1

# LLM integration
llm:
  provider: anthropic  # or openai, local
  model: claude-3-5-sonnet-20241022
  temperature: 0.2
  max_tokens: 4096
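
A profile like this can be sanity-checked before use. The sketch below copies the weights and thresholds from the example above into plain dictionaries and checks two invariants: that the risk weights sum to 1 and that every threshold is positive. Both invariants and the function `validate_profile` are assumptions for illustration; the README does not state that CodeSnapAI enforces them.

```python
import math

# Values copied from the example profile above.
RISK_WEIGHTS = {
    "complexity_weight": 0.3,
    "error_pattern_weight": 0.25,
    "test_coverage_weight": 0.2,
    "dependency_weight": 0.15,
    "change_frequency_weight": 0.1,
}
THRESHOLDS = {
    "cyclomatic_complexity": 10,
    "cognitive_complexity": 15,
    "nesting_depth": 4,
    "function_length": 50,
    "parameter_count": 5,
    "duplication_rate": 0.03,
}


def validate_profile(weights, thresholds):
    """Return a list of problems; an empty list means the profile looks sane."""
    problems = []
    total = sum(weights.values())
    # Compare with a tolerance because the weights are floats.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"risk weights sum to {total:.2f}, expected 1.00")
    for name, value in thresholds.items():
        if value <= 0:
            problems.append(f"{name} must be positive, got {value}")
    return problems
```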

📚 Documentation


🤝 Contributing

We welcome contributions from the community! CodeSnapAI is built on the principle that better code analysis tools benefit everyone.

How to Contribute

  1. Fork the Repository

    git clone https://github.com/turtacn/CodeSnapAI.git
    cd CodeSnapAI
  2. Create a Feature Branch

    git checkout -b feature/your-amazing-feature
  3. Make Your Changes

  4. Run Tests

    pytest tests/ --cov=codesage
  5. Submit a Pull Request

Contribution Areas

  • 🌐 Language Support: Add parsers for new languages (Scala, Swift, etc.)
  • 📊 Metrics: Implement new complexity or quality metrics
  • 🤖 LLM Integrations: Add support for new AI models
  • 📝 Documentation: Improve guides and examples
  • 🐛 Bug Fixes: Help us squash bugs

See CONTRIBUTING.md for detailed guidelines.


📄 License

CodeSnapAI is released under the Apache License 2.0.

Copyright 2024 CodeSnapAI Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

🙏 Acknowledgments

CodeSnapAI builds upon the excellent work of:

Special thanks to all contributors who make this project possible.


📞 Support & Community


References

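  1. AI debugging assistants: Tools such as ChatDBG and Debug-gym already integrate AI with traditional debuggers (pdb/gdb/lldb), supporting interactive debugging and root-cause analysis.

  2. Code complexity tools: Commercial tools such as Codacy, SonarQube, and NDepend provide multi-dimensional analysis, including cyclomatic complexity and cognitive complexity.

  3. General-purpose AI code assistants: Workik, GitHub Copilot, and similar tools provide context-aware error detection and fix suggestions.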


Built with ❤️ by the open-source community
