overmindtech/poc-data-gathering

Terraform Resource Stats

A collection of scripts for gathering and analyzing GitHub repository metrics and statistics, with a focus on Terraform-related repositories and resource management.

Overview

This repository contains utility scripts for collecting, analyzing, and reporting on GitHub repository data. These scripts are designed to help with POC (Proof of Concept) baseline analysis, resource statistics, and repository health metrics.

Prerequisites

Required Tools

  • GitHub CLI (gh): Required for the gather-poc-data.sh script

  • jq: JSON processor for data manipulation

    • macOS: brew install jq
    • Linux: sudo apt-get install jq or sudo yum install jq
  • bash: Version 4.0 or higher

GitHub Access (for gather-poc-data.sh only)

  • Valid GitHub authentication via gh auth login
  • Access to the repositories you want to analyze
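The prerequisites above can be checked before a script does any real work. The following is a minimal sketch, not code from this repository. The function names require_cmd, require_bash4 and check_prereqs are hypothetical. The login check assumes that gh auth status signals a missing or invalid login through a non-zero exit code.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for the prerequisites listed above.

# Succeeds if the named command is on PATH; otherwise prints an error.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Error: required command '$1' not found on PATH" >&2
    return 1
  fi
}

# Succeeds if the running bash is version 4.0 or newer.
require_bash4() {
  if (( BASH_VERSINFO[0] < 4 )); then
    echo "Error: bash 4.0+ required, found ${BASH_VERSION}" >&2
    return 1
  fi
}

# Checks what gather-poc-data.sh needs: bash 4+, jq, gh, and a gh login.
check_prereqs() {
  require_bash4 || return 1
  require_cmd jq || return 1
  require_cmd gh || return 1
  # Assumption: gh auth status exits non-zero when no valid login is found.
  if ! gh auth status >/dev/null 2>&1; then
    echo "Error: not authenticated; run 'gh auth login'" >&2
    return 1
  fi
}
```

A script could call check_prereqs || exit 1 immediately after parsing its arguments.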

Scripts

gather-poc-data.sh

Collects GitHub repository metrics for POC baseline analysis using the GitHub CLI and the GitHub GraphQL API.

Description

This script gathers the following data from a specified GitHub repository:

  • Contributors: List of repository contributors with their statistics
  • Pull Requests: PR data including state, timing, authors, and reviews
  • Reviews: Review analysis including reviewer statistics and PR review metrics
  • Trends: Historical trends including monthly volume, wait times, review times, and reviewer concentration

Usage

./scripts/gather-poc-data.sh --org ORGANIZATION --repo REPOSITORY [OPTIONS]

Required Arguments

  • --org ORGANIZATION: GitHub organization name
  • --repo REPOSITORY: Repository name within the organization

Optional Arguments

  • --months NUMBER: Number of months to look back (default: 12)
  • --output-dir PATH: Base output directory (default: ./poc-data)
  • -h, --help: Display help message
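The --months window must be converted into a cutoff date. Date arithmetic differs between Linux (GNU date) and macOS (BSD date). The sketch below shows one way to compute the cutoff and compare a PR's creation timestamp against it. months_ago and created_since are hypothetical names, and gather-poc-data.sh may compute its cutoff differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the --months look-back window.

# Prints the UTC date N months ago as YYYY-MM-DD. GNU date (Linux) accepts
# relative dates via -d; BSD date (macOS) adjusts the date with -v.
months_ago() {
  local months="$1"
  if [[ ! "$months" =~ ^[0-9]+$ ]]; then
    echo "Error: --months must be a non-negative integer, got '$months'" >&2
    return 1
  fi
  if date --version >/dev/null 2>&1; then
    date -u -d "${months} months ago" +%Y-%m-%d
  else
    date -u -v-"${months}"m +%Y-%m-%d
  fi
}

# Succeeds if an ISO 8601 timestamp (e.g. a PR's createdAt) falls on or after
# the cutoff date. YYYYMMDD is compared as an integer to avoid locale-dependent
# string collation.
created_since() {
  local created_day="${1:0:10}" cutoff="$2"
  (( ${created_day//-/} >= ${cutoff//-/} ))
}
```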

Examples

# Basic usage with defaults (12 months, ./poc-data output)
./scripts/gather-poc-data.sh --org myorg --repo myrepo

# Custom time range and output directory
./scripts/gather-poc-data.sh --org myorg --repo myrepo --months 6 --output-dir ./data

# Analyze a specific repository
./scripts/gather-poc-data.sh --org exampleorg --repo examplerepo --months 24

Output Structure

The script creates the following directory structure:

<output-dir>/
├── contributors/
│   └── <org>-<repo>-contributors.json
├── prs/
│   └── <org>-<repo>-prs.json
├── reviews/
│   └── <org>-<repo>-reviews.json
└── trends/
    └── <org>-<repo>-trends.json
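The naming convention above can be expressed as a small helper. This is an illustrative sketch only. output_path and prepare_output_dirs are hypothetical names and are not taken from gather-poc-data.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the layout above:
#   <output-dir>/<category>/<org>-<repo>-<category>.json

# Prints the output file path for one category.
output_path() {
  local output_dir="${1%/}" category="$2" org="$3" repo="$4"
  case "$category" in
    contributors | prs | reviews | trends) ;;
    *)
      echo "Error: unknown category '$category'" >&2
      return 1
      ;;
  esac
  printf '%s/%s/%s-%s-%s.json\n' "$output_dir" "$category" "$org" "$repo" "$category"
}

# Creates all four category directories so later writes have a parent directory.
prepare_output_dirs() {
  local output_dir="${1%/}" category
  for category in contributors prs reviews trends; do
    mkdir -p "${output_dir}/${category}"
  done
}
```

For example, output_path ./poc-data prs myorg myrepo prints ./poc-data/prs/myorg-myrepo-prs.json.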

Each JSON file includes:

  • Metadata: Collection timestamp, parameters, and execution time
  • Data: The collected metrics and analysis results

Output Files

  1. Contributors (contributors/<org>-<repo>-contributors.json)

    • List of all contributors with their contribution statistics
  2. Pull Requests (prs/<org>-<repo>-prs.json)

    • PR details including number, title, state, timestamps
    • Author information
    • Review counts and details
    • Calculated metrics: wait time, review time, reviewer count
  3. Reviews (reviews/<org>-<repo>-reviews.json)

    • Review analysis with reviewer statistics
    • PR review metrics including wait times and review times
    • Reviewer activity breakdown
  4. Trends (trends/<org>-<repo>-trends.json)

    • Monthly PR volume trends
    • Wait time statistics (median, p95, mean, min, max)
    • Review time statistics (median, p95, mean, min, max)
    • Top reviewers and reviewer concentration metrics
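The summary statistics listed for the trends file can be computed from a list of durations using sort and awk. This sketch makes several assumptions. The function name summarize is hypothetical. The input is assumed to be one number per line, for example wait times in hours. p95 uses the nearest-rank method, which may not match how gather-poc-data.sh computes percentiles.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: summary statistics for numbers on stdin, one per line.
summarize() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no data" > "/dev/stderr"; exit 1 }
      # Median: the middle value, or the mean of the two middle values.
      if (NR % 2) { median = v[(NR + 1) / 2] } else { median = (v[NR / 2] + v[NR / 2 + 1]) / 2 }
      # p95 by nearest rank: the value at position ceil(0.95 * N).
      rank = int(0.95 * NR)
      if (rank < 0.95 * NR) { rank++ }
      printf "median=%g p95=%g mean=%g min=%g max=%g\n", median, v[rank], sum / NR, v[1], v[NR]
    }'
}
```

For example, printf '%s\n' 10 9 8 7 6 5 4 3 2 1 | summarize prints median=5.5 p95=10 mean=5.5 min=1 max=10.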

Features

  • Pagination: Automatically handles GraphQL pagination for large datasets
  • Rate Limiting: Detects and handles GitHub API rate limits with automatic retry
  • Date Filtering: Efficiently filters PRs by creation date
  • Cross-Platform: Works on both macOS and Linux
  • Error Handling: Comprehensive error checking and validation
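Automatic retry on rate limits can be approximated with a generic exponential backoff wrapper. The sketch below is hypothetical: retry_with_backoff is not a function from this repository. It retries on any non-zero exit status and does not read GitHub's rate-limit reset time, which a production script may prefer to do.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a command, retrying with exponential backoff.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "Error: command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    # Double the wait after each failure.
    delay=$(( delay * 2 ))
  done
}
```

For example, retry_with_backoff 5 30 gh api graphql -f query='query { viewer { login } }' makes up to five attempts, waiting 30, 60, 120 and 240 seconds between them.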

collect-resource-types.sh

Collects all Terraform resource types used in a repository and counts how often each appears. It includes user-written modules but excludes external modules (those downloaded into the .terraform/ directory). Results are written to a plain text file.

Description

This script scans Terraform files (.tf) in a repository and extracts all resource type declarations. It counts how often each resource type is used and writes the results to a plain text file. Only user code is analyzed. By default, the .terraform and .git directories are excluded, so external modules are skipped while local module directories (such as modules/) are still scanned. Use --exclude-dirs to change this list.
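The scan can be approximated with find, sed and sort. This is a sketch, not the script's actual implementation. count_resource_types is a hypothetical name, and only single-line resource "TYPE" "NAME" declarations are recognized. The second argument takes the same space-separated list as --exclude-dirs and defaults to .terraform .git.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the scan; the real script's parsing may differ.
# Usage: count_resource_types ROOT ["DIR1 DIR2 ..."]
count_resource_types() {
  local root="$1"
  local -a excludes=() prune=()
  local dir
  # Split a space-separated list, as passed to --exclude-dirs.
  read -r -a excludes <<< "${2:-.terraform .git}"
  # Build the find expression: -name .terraform -o -name .git ...
  for dir in "${excludes[@]}"; do
    if (( ${#prune[@]} > 0 )); then prune+=(-o); fi
    prune+=(-name "$dir")
  done
  # Prune excluded directories, extract TYPE from each declaration, then count
  # and sort by frequency (highest first, ties alphabetical).
  find "$root" -type d \( "${prune[@]}" \) -prune -o -type f -name '*.tf' \
    -exec sed -n -E 's/^[[:space:]]*resource[[:space:]]+"([^"]+)".*/\1/p' {} + |
    sort | uniq -c | awk '{ print $1, $2 }' | sort -k1,1nr -k2,2
}
```

It prints one COUNT TYPE pair per line, most frequent first.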

Usage

./scripts/collect-resource-types.sh --path PATH [OPTIONS]

Required Arguments

  • --path PATH: Local path to Terraform repository

Optional Arguments

  • --output-dir PATH: Output directory for the text file (default: ./resource-stats)
  • --exclude-dirs DIRS: Space-separated list of directories to exclude (default: .terraform .git)
  • -h, --help: Display help message

Examples

# Analyze local repository
./scripts/collect-resource-types.sh --path /path/to/terraform/repo

# Analyze current directory
./scripts/collect-resource-types.sh --path .

# Custom output directory and exclusions
./scripts/collect-resource-types.sh --path ./my-terraform --output-dir ./stats --exclude-dirs ".terraform modules vendor"

Output Structure

The script creates a single text file:

<output-dir>/
└── <repo-name>-resource-types.txt

Output Format

The text file contains:

  • Header: Repository information and collection metadata
  • Parameters: Configuration used (excluded directories)
  • Statistics: Summary of the analysis
    • Files scanned
    • Total resources found
    • Unique resource types
    • Analysis duration
  • Resource Type Counts: Sorted list of resource types with their frequency counts

Example Output

Terraform Resource Type Statistics
===================================

Repository: examplerepo
Repository Path: /path/to/repo
Collection Timestamp: 2024-01-15 10:30:00 UTC

Parameters:
  Excluded Directories: .terraform .git

Statistics:
  Files Scanned: 42
  Total Resources: 156
  Unique Resource Types: 23
  Analysis Duration: 2s

Resource Type Counts:
---------------------
  aws_instance                                         45
  aws_s3_bucket                                        12
  aws_iam_role                                          8
  ...
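The statistics and counts sections of the report can be rendered from COUNT TYPE lines with awk. In this sketch, format_report is a hypothetical name. The column widths only approximate the example above, and header fields such as the timestamp, files scanned, and analysis duration are omitted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: render "COUNT TYPE" lines from stdin (most frequent
# first) as the Statistics and Resource Type Counts sections.
format_report() {
  awk '
    { types[NR] = $2; counts[NR] = $1; total += $1 }
    END {
      print "Statistics:"
      printf "  Total Resources: %d\n", total
      printf "  Unique Resource Types: %d\n\n", NR
      print "Resource Type Counts:"
      print "---------------------"
      for (i = 1; i <= NR; i++) printf "  %-50s %5d\n", types[i], counts[i]
    }'
}
```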

Features

  • Smart Module Handling: Includes user-written modules but excludes external modules (from .terraform/ directory)
  • Local Analysis: Works directly with local Terraform repositories
  • Custom Exclusions: Configurable directory exclusions
  • Efficient Scanning: Fast file scanning and resource extraction
  • Text Output: Human-readable plain text output with statistics and counts

Adding New Scripts

When adding new scripts to this repository:

  1. Place scripts in scripts/ directory

    • Use descriptive, kebab-case names (e.g., analyze-commits.sh)
  2. Follow the existing script structure:

    • Include a shebang: #!/usr/bin/env bash
    • Use set -o errexit, set -o nounset, set -o pipefail
    • Include a help function with -h, --help support
    • Add proper error handling and validation
  3. Document in this README:

    • Add a new section under "Scripts"
    • Include description, usage, arguments, examples, and output structure
    • Follow the same format as existing script documentation
  4. Make scripts executable:

    chmod +x scripts/your-script.sh
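Below is a minimal skeleton that follows these conventions. The script name analyze-commits.sh, its --path and --output-dir options, and the die and parse_args helpers are hypothetical illustrations. They are not copied from the existing scripts.

```shell
#!/usr/bin/env bash
# Skeleton for a new script, e.g. scripts/analyze-commits.sh (hypothetical).
set -o errexit
set -o nounset
set -o pipefail

# Defaults for the hypothetical options below.
REPO_PATH="."
OUTPUT_DIR="./commit-stats"

usage() {
  cat <<EOF
Usage: $(basename "$0") [OPTIONS]

Optional Arguments:
  --path PATH         Local path to the repository (default: .)
  --output-dir PATH   Output directory (default: ./commit-stats)
  -h, --help          Display this help message
EOF
}

die() {
  echo "Error: $*" >&2
  exit 1
}

parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --path)
        [[ $# -ge 2 ]] || die "--path requires a value"
        REPO_PATH="$2"
        shift 2
        ;;
      --output-dir)
        [[ $# -ge 2 ]] || die "--output-dir requires a value"
        OUTPUT_DIR="$2"
        shift 2
        ;;
      -h | --help)
        usage
        exit 0
        ;;
      *)
        die "Unknown argument: $1"
        ;;
    esac
  done
  # Validate inputs before doing any work.
  [[ -d "$REPO_PATH" ]] || die "Not a directory: $REPO_PATH"
}

main() {
  parse_args "$@"
  echo "Analyzing ${REPO_PATH} (output: ${OUTPUT_DIR})"
}

main "$@"
```

Running ./scripts/analyze-commits.sh --path . would print Analyzing . (output: ./commit-stats).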

Contributing

When contributing new scripts or improvements:

  • Follow bash best practices and style guidelines
  • Include error handling and input validation
  • Add appropriate comments and documentation
  • Test scripts on both macOS and Linux when possible
  • Update this README with script documentation

License

[Add your license here]

About

Data gathering scripts for Overmind POC prep
