@rongweiji

@rongweiji rongweiji commented Jun 23, 2025

============update===============

Description of the AI-generated code:

What the AI did (code):

3 new files are AI-generated:

  1. ProcessTap/RealtimeAudioMonitor.swift - Real-time audio monitoring
  2. ProcessTap/VoiceActivityDetector.swift - Voice activity detection algorithms
  3. Views/RealtimeVADView.swift - VAD visualization UI

1 modified file:

  • Views/ProcessSelectionView.swift - Enhanced with VAD integration

Key Differences from Original:

Structure Changes:

  • Views moved to organized Views/ folder
  • Two additional files in the ProcessTap/ directory

What the AI did (PR description):

Based on the implemented functionality, I asked an AI (GPT) to revise the text of this description. The original description was my own.

AI tools: Cursor

What Human did:

  • Reviewed the code structure and the implementation design.
  • Ran UI integration tests on my own device.
  • Reorganized the code files and submitted the commit and PR.
  • Replied to the PR review and fixed the reported issue (invalid file references).

Reflections on this PR and AI

Recently, I used the Cursor and Claude tools to help implement a new feature in my real-time live-caption app. Although these AI assistants aren’t as strong with Swift as they are with web development or other fields, I found this repository to be the best place to integrate the CoreAudioTap API. After successfully building and testing my real-time buffer processing separately, I decided to merge it into this repo. Impressively, the AI tools recognized the implementation from my other project and reproduced it here almost seamlessly, making this PR a great place to start.

============= original ==============

Thank you for your demo and repo. It's the best demo of this new API.

📘 Description

Adds real-time audio monitoring and voice activity detection (VAD) to AudioCap using Core Audio APIs on macOS 14.4+, and enhances the UI and code structure for live audio analysis. Because a Core Audio tap delivers audio buffers as they are produced, it naturally supports real-time buffer processing.

✨ Features

  • RealtimeAudioMonitor: Non-blocking audio stream monitoring with dedicated tap and thread-safe processing.
  • VoiceActivityDetector: RMS-based VAD with configurable threshold, multi-channel support, and Accelerate optimization.
  • RealtimeVADView: Live UI with RMS meter, speech indicators, and frame history.
  • Refactored structure: Modular code separation for monitor, VAD, and UI.
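The RMS-based detection described above can be sketched in a few lines. This is a minimal, hypothetical version, not the PR's `VoiceActivityDetector`: the names `rms` and `SimpleVAD` and the 0.01 threshold are assumptions. It assumes deinterleaved Float32 channels, computes each channel's RMS, and flags speech when the loudest channel reaches the threshold. On macOS, the loop in `rms` could be replaced by Accelerate's `vDSP.rootMeanSquare(_:)`; the plain loop keeps the sketch portable.

```swift
// Hypothetical sketch of RMS-based VAD over deinterleaved Float32 channels.
// Not the PR's actual VoiceActivityDetector.

/// Root-mean-square of one channel's samples (0 for an empty buffer).
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    var sumOfSquares: Float = 0
    for s in samples { sumOfSquares += s * s }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

struct SimpleVAD {
    /// Assumed linear RMS threshold; tune it for your input levels.
    var threshold: Float = 0.01

    /// Multi-channel handling: uses the loudest channel's RMS as the frame level.
    func analyze(channels: [[Float]]) -> (level: Float, isSpeech: Bool) {
        let level = channels.map(rms).max() ?? 0
        return (level, level >= threshold)
    }
}

// Usage: one silent stereo frame, one frame with a constant 0.5 signal on the left channel.
let vad = SimpleVAD(threshold: 0.01)
let silence: [[Float]] = [Array(repeating: 0, count: 512), Array(repeating: 0, count: 512)]
let tone: [[Float]] = [Array(repeating: 0.5, count: 512), Array(repeating: 0, count: 512)]
print(vad.analyze(channels: silence).isSpeech) // false
print(vad.analyze(channels: tone).isSpeech)    // true
```

Taking the maximum across channels means speech on any single channel triggers detection; averaging the per-channel levels would be a reasonable alternative if one noisy channel should not dominate.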

✅ Contribution Checklist

  • Added new monitoring and VAD components
  • Updated UI with real-time feedback
  • Preserved existing recording features
  • Updated README.md with examples

Checklist

  • Forked the repository and created a new branch
  • Demo video included
  • Ensured the update is minimal and clear
  • Explained the motivation and impact of the change in this PR

📸 Demo

RealtimeBufferDemo.mp4

Environment

  • Xcode version: 16.4 (16F6)
  • Device: MacBook Air 13" M3
  • macOS version: 15.5 (24F74)

- Introduced `RealtimeAudioMonitor` and `VoiceActivityDetector` classes for real-time audio analysis.
- Added UI components for process selection, recording, and real-time VAD monitoring.
- Updated README to include a demo video link.
- Implemented `RecordingIndicator` and `RecordingView` for better user feedback during recording sessions.
- Created `RealtimeVADView` to display voice activity detection status and statistics.
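The thread-safe handoff between the audio callback and the UI, including the frame history and statistics that `RealtimeVADView` displays, could look roughly like this. `FrameHistory` is a hypothetical type, not the PR's implementation: it keeps the last `capacity` speech/non-speech flags behind an `NSLock`, so the audio thread can append while the UI thread takes snapshots.

```swift
import Foundation

/// Hypothetical fixed-capacity history of per-frame VAD results.
/// Appended from the audio callback thread, read from the UI thread.
final class FrameHistory {
    private let lock = NSLock()
    private var storage: [Bool] = []
    private let capacity: Int

    init(capacity: Int) { self.capacity = max(1, capacity) }

    /// Records one frame's result, dropping the oldest entries beyond `capacity`.
    func append(_ isSpeech: Bool) {
        lock.lock(); defer { lock.unlock() }
        storage.append(isSpeech)
        if storage.count > capacity { storage.removeFirst(storage.count - capacity) }
    }

    /// Copy of the current history, oldest first.
    func snapshot() -> [Bool] {
        lock.lock(); defer { lock.unlock() }
        return storage
    }

    /// Fraction of stored frames flagged as speech (0 when empty).
    func speechRatio() -> Double {
        let frames = snapshot()
        guard !frames.isEmpty else { return 0 }
        return Double(frames.filter { $0 }.count) / Double(frames.count)
    }
}

// Usage: capacity 4, five frames appended, so the oldest one is dropped.
let history = FrameHistory(capacity: 4)
for flag in [true, false, true, true, true] { history.append(flag) }
print(history.snapshot())    // [false, true, true, true]
print(history.speechRatio()) // 0.75
```

A lock on the real-time audio thread is not strictly real-time safe (it can block, and `append` can allocate); a preallocated lock-free ring buffer avoids that, but the short critical section keeps this sketch simple.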
Add the video directory
Add video
@thisislvca

Dang! Great work, looks amazing!

@Yogesh-Dubey-Ayesavi

Hey @insidegui @wayne-xyz — I’ve been working on a Swift CLI tool to capture system audio and save it to a file. The audio file is being created, but it ends up empty (no audio data). I’ve already granted the CLI tool the necessary permissions to capture both the screen and system audio via System Settings.

You can check it out at https://gist.github.com/Yogesh-Dubey-Ayesavi/61d5f9e302c96b1521a8a37ea9588fe7

@rongweiji
Author

Hi Yogesh, are you tapping a specific process or the system-wide output audio?

@Yogesh-Dubey-Ayesavi

Hey @wayne-xyz, I'm only tapping specific processes; you can see the gist.

import Foundation

// `listAudioProcesses()` and `Recorder` are defined elsewhere in the gist.
func main() {
    print("Audio Capture Tool")
    print("------------------")

    // List available audio processes
    let processes = listAudioProcesses()

    if processes.isEmpty {
        print("No audio processes found. Exiting.")
        exit(1)
    }

    // User selection (1-based index into `processes`)
    print("\nEnter the number of the process to capture audio from: ", terminator: "")
    fflush(stdout) // show the prompt before blocking on readLine()

    guard let input = readLine(),
          let selection = Int(input.trimmingCharacters(in: .whitespacesAndNewlines)),
          selection > 0, selection <= processes.count else {
        print("Invalid selection. Exiting.")
        exit(1)
    }

    // Get selected process
    let selectedProcess = processes[selection - 1]
    print("Selected: \(selectedProcess.name)")

    // Output directory in the user's home folder
    let homeDirectory = FileManager.default.homeDirectoryForCurrentUser
    let outputDirectory = homeDirectory.appendingPathComponent("AudioCapture", isDirectory: true)

    // Output file
    let outputFile = outputDirectory.appendingPathComponent("\(selectedProcess.name)-audio.wav")

    do {
        // Make sure the output directory exists before the recorder opens the file
        try FileManager.default.createDirectory(at: outputDirectory,
                                                withIntermediateDirectories: true)

        // Create recorder
        let recorder = Recorder(process: selectedProcess, fileURL: outputFile)

        print("Press Enter to start recording...")
        _ = readLine()

        try recorder.start()

        print("Recording... Press Enter to stop.")
        _ = readLine()

        recorder.stop()

        print("Audio saved to: \(outputFile.path)")
    } catch {
        print("Error: \(error)")
        exit(1)
    }
}

// Top-level entry point: without this call, main() never runs.
main()

@insidegui
Owner

@wayne-xyz Thank you for the contribution.

I need to clarify something: did an LLM generate this entire pull request? I ask because I'm seeing clear signs of AI-generated code in the changed files, and the pull request comment itself looks entirely AI-generated.

While I'm not against using LLMs for coding (I do it myself quite often), I would appreciate folks being transparent when they open a pull request. It helps during the review, and if you share the tools/models used it can help others to learn about them.

Additionally, your branch has invalid file references in the Xcode project, so I can't build it:

[Xcode screenshot, 2025-06-27 09:14:28]

@rongweiji
Author

Hi @insidegui,
Thank you for your review. I completely understand your concern and respect this repo and your style. I have updated the PR description and fixed the invalid file references. Thank you again for this repo. If you have any other concerns or comments, please feel free to reach out.

@rongweiji
Author

@Yogesh-Dubey-Ayesavi Hi, sorry for the late response. Did you add any extra buffer processing? To debug, try logging the buffer information at the root, where the tap delivers buffers. I had a similar issue caused by conversions between Int32 and Int16 across the processing chain. Print the buffer data step by step to find the stage where the audio stops coming through.
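The step-by-step check suggested above can be sketched as a few helpers: log the peak amplitude at each stage, and make sample-format conversions explicit. These functions are hypothetical, not from the PR or the gist, and they assume normalized Float32 samples in -1.0...1.0. A peak of 0 at the tap callback suggests no audio is reaching the tap, while a nonzero peak that disappears after a later stage points at that stage's format handling.

```swift
// Hypothetical debugging helpers; not from the PR or the gist.

/// Peak absolute amplitude of normalized Float32 samples.
func peak(_ samples: [Float]) -> Float {
    samples.reduce(0) { max($0, abs($1)) }
}

/// Logs frame count and peak so each pipeline stage can be compared.
func logStage(_ name: String, _ samples: [Float]) {
    print("\(name): \(samples.count) frames, peak \(peak(samples))")
}

/// Converts normalized Float32 samples to Int16 PCM, clamping out-of-range values.
func floatToInt16(_ samples: [Float]) -> [Int16] {
    samples.map { (sample: Float) -> Int16 in
        let clamped = max(-1.0, min(1.0, sample))
        return Int16((clamped * Float(Int16.max)).rounded())
    }
}

/// Converts 32-bit integer PCM to 16-bit by keeping the top 16 bits.
func int32ToInt16(_ samples: [Int32]) -> [Int16] {
    samples.map { Int16($0 >> 16) }
}

// Usage with made-up samples: log before and after a conversion.
let captured: [Float] = [0.0, 0.25, -0.5, 1.2]
logStage("tap callback", captured)
let pcm16 = floatToInt16(captured)
print("after Int16 conversion: peak \(pcm16.map { abs(Int($0)) }.max() ?? 0)")
```

Converting with explicit scaling and clamping avoids silently reinterpreting one sample format as another, which is the kind of Int32/Int16 mismatch described in the comment.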
