🎙️ Add Real-time Buffer Monitoring and Voice Activity Detection (VAD) based on the Core Audio taps #1
Conversation
- Introduced `RealtimeAudioMonitor` and `VoiceActivityDetector` classes for real-time audio analysis.
- Added UI components for process selection, recording, and real-time VAD monitoring.
- Updated README to include a demo video link.
- Implemented `RecordingIndicator` and `RecordingView` for better user feedback during recording sessions.
- Created `RealtimeVADView` to display voice activity detection status and statistics.
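For context, the core of a `VoiceActivityDetector` like the one this PR introduces is typically an energy threshold with a short debounce. The sketch below is a hypothetical, simplified version (the actual PR implementation may differ); the threshold and frame-count values are illustrative assumptions, not the PR's tuning.

```swift
import Foundation

// Hypothetical sketch of an energy-based voice activity detector,
// similar in spirit to the `VoiceActivityDetector` this PR introduces.
// Threshold and debounce values are illustrative assumptions.
struct EnergyVAD {
    /// RMS level (0...1) above which a frame counts as speech.
    var threshold: Float = 0.02
    /// Consecutive active frames required before reporting speech.
    var minActiveFrames: Int = 3

    private var activeRun = 0
    private(set) var isSpeech = false

    /// Feed one buffer of mono Float32 samples; returns the updated decision.
    mutating func process(_ samples: [Float]) -> Bool {
        guard !samples.isEmpty else { return isSpeech }
        // Root-mean-square energy of the frame.
        let sumSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
        let rms = (sumSquares / Float(samples.count)).squareRoot()
        if rms >= threshold {
            activeRun += 1
        } else {
            activeRun = 0
        }
        isSpeech = activeRun >= minActiveFrames
        return isSpeech
    }
}
```

In a tap callback you would call `process(_:)` once per buffer and drive the UI (e.g. `RealtimeVADView`) from `isSpeech`.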
add the video dirc
add video
Dang! Great work, looks amazing!
Hey @insidegui @wayne-xyz — I’ve been working on a Swift CLI tool to capture system audio and save it to a file. The audio file is being created, but it ends up empty (no audio data). I’ve already granted the CLI tool the necessary permissions to capture both the screen and system audio via System Settings. You can check it out at https://gist.github.com/Yogesh-Dubey-Ayesavi/61d5f9e302c96b1521a8a37ea9588fe7
Hi Yogesh, did you tap a specific process, or the system-wide output audio?
Hey @wayne-xyz, I am tapping specific processes only; you can see it in the gist:

```swift
func main() {
    print("Audio Capture Tool")
    print("------------------")

    // List available audio processes
    let processes = listAudioProcesses()
    if processes.isEmpty {
        print("No audio processes found. Exiting.")
        exit(1)
    }

    // User selection
    print("\nEnter the number of the process to capture audio from: ", terminator: "")
    guard let input = readLine(),
          let selection = Int(input.trimmingCharacters(in: .whitespacesAndNewlines)),
          selection > 0, selection <= processes.count else {
        print("Invalid selection. Exiting.")
        exit(1)
    }

    // Get selected process
    let selectedProcess = processes[selection - 1]
    print("Selected: \(selectedProcess.name)")

    // Create output directory in user's home
    let homeDirectory = FileManager.default.homeDirectoryForCurrentUser
    let outputDirectory = homeDirectory.appendingPathComponent("AudioCapture", isDirectory: true)

    // Output file
    let outputFile = outputDirectory.appendingPathComponent("\(selectedProcess.name)-audio.wav")

    // Create recorder
    do {
        let recorder = Recorder(process: selectedProcess, fileURL: outputFile)
        print("Press Enter to start recording...")
        _ = readLine()
        try recorder.start()
        print("Recording... Press Enter to stop.")
        _ = readLine()
        recorder.stop()
        print("Audio saved to: \(outputFile.path)")
    } catch {
        print("Error: \(error)")
        exit(1)
    }
}
```
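Since the reported symptom is a file that exists but contains no audio, one quick diagnostic is to inspect the WAV header and see whether the `data` chunk has a nonzero size. The sketch below is a minimal, assumption-laden RIFF walker (canonical little-endian WAV layout only, not a full parser); the function name is illustrative.

```swift
import Foundation

// Minimal sketch: walk a RIFF/WAVE byte array's chunks and return the size
// of the `data` chunk, to tell whether any samples were actually written.
// Assumes a canonical little-endian WAV layout; not a full RIFF parser.
func wavDataChunkSize(_ bytes: [UInt8]) -> UInt32? {
    // "RIFF" <size> "WAVE" header is 12 bytes.
    guard bytes.count >= 12,
          bytes[0...3].elementsEqual("RIFF".utf8),
          bytes[8...11].elementsEqual("WAVE".utf8) else { return nil }
    var offset = 12
    while offset + 8 <= bytes.count {
        let id = String(decoding: bytes[offset..<offset + 4], as: UTF8.self)
        // Chunk size is a 32-bit little-endian integer after the 4-byte ID.
        let size = UInt32(bytes[offset + 4])
            | UInt32(bytes[offset + 5]) << 8
            | UInt32(bytes[offset + 6]) << 16
            | UInt32(bytes[offset + 7]) << 24
        if id == "data" { return size }
        // Chunks are word-aligned: odd sizes are padded with one byte.
        offset += 8 + Int(size) + (Int(size) % 2)
    }
    return nil
}
```

A `data` chunk size of 0 would confirm the recorder never flushed any frames, pointing at the tap or format-conversion stage rather than file permissions.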
@wayne-xyz Thank you for the contribution. I need to clarify something: did an LLM generate this entire pull request? I ask because I'm seeing clear signs of AI-generated code in the changed files, and the pull request comment itself looks entirely AI-generated. While I'm not against using LLMs for coding (I do it myself quite often), I would appreciate folks being transparent when they open a pull request. It helps during the review, and if you share the tools/models used it can help others to learn about them. Additionally, your branch has invalid file references in the Xcode project, so I can't build it:
Hi, @insidegui
@Yogesh-Dubey-Ayesavi Hi, sorry for the late response. Did you do any extra buffer processing? You could add some debugging, like printing log info at the root buffer tap. I had a similar issue caused by Int32/Int16 conversions through the processing chain. Print the buffer data step by step to check which stage isn't working.
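The Int16/Int32 mismatch mentioned above is a common cause of silent or garbage output. As an illustration of the suggested step-by-step check, the sketch below reinterprets raw little-endian Int16 PCM bytes as Float32 samples and reports peak/RMS, so each stage of the pipeline can be sanity-checked. Function names are illustrative; adapt them to whatever your tap callback actually hands you.

```swift
import Foundation

// Decode raw little-endian Int16 PCM bytes into Float32 samples in -1...1.
// Assembling each sample byte-by-byte avoids alignment assumptions and makes
// the endianness explicit (a frequent source of "empty-sounding" buffers).
func int16SamplesToFloat(_ raw: [UInt8]) -> [Float] {
    stride(from: 0, to: raw.count - 1, by: 2).map { i in
        let lo = UInt16(raw[i])
        let hi = UInt16(raw[i + 1])
        let sample = Int16(bitPattern: hi << 8 | lo)
        // Divide by 32768, not 32767, so Int16.min maps exactly to -1.0.
        return Float(sample) / 32768.0
    }
}

// Peak and RMS of one buffer: print these at each pipeline stage and a
// stage that suddenly reports (0, 0) is where the audio is being lost.
func describeBuffer(_ samples: [Float]) -> (peak: Float, rms: Float) {
    guard !samples.isEmpty else { return (0, 0) }
    let peak = samples.map(abs).max() ?? 0
    let rms = (samples.reduce(0) { $0 + $1 * $1 } / Float(samples.count)).squareRoot()
    return (peak, rms)
}
```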

=============== update ===============
AI-generated code description:
What AI did (code):
3 new files are AI-generated:
1 modified file:
Key differences from original:
Structure changes:
What AI did (PR content revised):
Based on the function implementation, I asked AI (GPT) to revise the text-based content. The original description is my own.
AI tools: Cursor
What the human did:
Reflection on this PR and AI
Recently, I used the Cursor and Claude tools to help implement a new feature in my real-time live-caption app. Although these AI assistants aren’t as strong with Swift as they are with web development or other fields, I found this repository to be the best place to integrate the CoreAudioTap API. After successfully building and testing my real-time buffer processing separately, I decided to merge it into this repo. Impressively, the AI tools recognized the implementation from my other project and reproduced it here almost seamlessly, making this PR a great place to start.
============= original ==============
Thank you for your demo and repo. It's the greatest demo of the new API.
📘 Description
Adds real-time audio monitoring and voice activity detection (VAD) to AudioCap using CoreAudio APIs on macOS 14.4+, and enhances the UI and code structure for live audio analysis, since the Core Audio tap naturally supports real-time buffer processing.
✨ Features
✅ Contribution Checklist
- Updated README.md with examples
📸 Demo
RealtimeBufferDemo.mp4
Environment