Skip to content

A tool for automatically tracking and archiving GitHub repositories. It periodically checks for updates, compresses older versions, and ensures that even deleted or archived repositories are preserved locally.

Notifications You must be signed in to change notification settings

Technical-1/Git-Archiver

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GitHub Repo Saver

Automatically clone, update, and archive GitHub repositories with version control and compression.

Features

  • Repository Management - Add, update, delete, and track GitHub repositories
  • Automatic Updates - Hourly checks with 24-hour update intervals
  • Versioned Archives - Creates compressed tar.xz archives for each update
  • Incremental Archives - Only archives changed files to save disk space (70-90% reduction)
  • Status Detection - Detects archived/deleted repositories via GitHub API
  • GraphQL Batch API - Efficiently batches GitHub API requests (up to 100 repos per query)
  • Multiple Interfaces - Web app, desktop GUI, and command-line interface
  • Search & Filter - Real-time search and status filtering (web app)
  • Export/Import - Export repository lists to JSON, import from JSON
  • Statistics Dashboard - View totals, disk usage, and archive counts
  • Real-time Updates - Live log streaming via Server-Sent Events (web app)

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
  2. Choose your interface:

    • CLI python src/github_repo_saver_cli.py add https://github.com/user/repo
    • Web App: python src/github_repo_saver_web.py then open http://localhost:5001
    • Desktop GUI: python src/github_repo_saver_gui.py

Usage

Command Line Interface (Recommended for Automation)

Full-featured CLI for automation and scripting:

# Add a repository
python src/github_repo_saver_cli.py add https://github.com/user/repo

# Add repositories from file (one URL per line)
python src/github_repo_saver_cli.py add-bulk repos.txt

# Update all repositories
python src/github_repo_saver_cli.py update

# Update a specific repository
python src/github_repo_saver_cli.py update https://github.com/user/repo

# List all repositories
python src/github_repo_saver_cli.py list

# List repositories by status
python src/github_repo_saver_cli.py list --status archived

# Delete a repository
python src/github_repo_saver_cli.py delete https://github.com/user/repo

# Refresh repository statuses
python src/github_repo_saver_cli.py refresh-statuses

# Show statistics
python src/github_repo_saver_cli.py stats

Web Application

Modern, responsive web interface accessible from any browser:

  1. Run the server:

    python src/github_repo_saver_web.py
  2. Open your browser: Navigate to http://localhost:5001

  3. Features:

    • Add repositories (single or bulk)
    • Search and filter repositories
    • Sortable table columns
    • View archives and README files
    • Export/import repository lists
    • Real-time statistics dashboard
    • Live activity log

Desktop GUI (Legacy)

Native desktop application using PyQt5:

  1. Run the GUI:

    python src/github_repo_saver_gui.py
  2. Features:

    • Add repositories via "Add Repo" or "Bulk Upload"
    • View repository status, descriptions, and timestamps
    • Open folders, view archives, and read README files
    • Update individual or all repositories
    • Delete repositories from the list

Project Structure

Git-Archiver-1/
├── src/                    # Main Python source code
│   ├── repo_manager.py     # Backend logic (core functionality)
│   ├── github_repo_saver_gui.py    # Desktop GUI (PyQt5)
│   ├── github_repo_saver_web.py    # Web application (Flask)
│   └── github_repo_saver_cli.py    # Command-line interface
├── templates/              # Flask HTML templates
│   └── index.html
├── static/                 # Flask static files
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── app.js
├── config/                 # Configuration files (auto-generated)
│   ├── cloned_repos.json   # Repository database
│   └── auto_update_config.json  # Auto-update settings
├── data/                   # Cloned repositories (auto-generated)
│   └── [repo-name].git/
│       ├── [repo files]
│       └── versions/       # Archived versions
├── requirements.txt        # Python dependencies
└── README.md              # This file

How It Works

  1. Add Repositories - Provide GitHub repository URLs (one or many)
  2. Automatic Cloning - Repositories are cloned to the data/ folder
  3. Daily Checks - The app checks hourly for updates (updates if 24h passed)
  4. Versioned Archives - When updates are detected, compressed archives are created
  5. Status Tracking - Monitors repository status (active/archived/deleted) via GitHub API

Performance Optimizations

  • GitHub API Caching - Thread-safe cache with 5-minute TTL reduces API calls by 80-90%
  • GraphQL Batch API - Batch fetches up to 100 repositories per query, reducing API calls by 90%+
  • Incremental Archives - Only archives changed files, reducing archive sizes by 70-90%
  • Smart Git Pulls - Checks for updates before pulling, skipping unnecessary operations
  • Shallow Clones - Uses --depth 1 by default for 5-10x faster initial clones
  • Async Archive Creation - Archives are created in background threads, keeping the UI responsive
  • Thread-Safe Operations - All file operations are protected to prevent data corruption

Performance: Bulk updates of 100 repos take ~5-8 minutes (60% faster than before). GraphQL batching reduces API rate limit issues significantly.

Roadmap / Future Tasks

Feel free to contribute additional features!

About

A tool for automatically tracking and archiving GitHub repositories. It periodically checks for updates, compresses older versions, and ensures that even deleted or archived repositories are preserved locally.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published