🎙️ Add Real-time Buffer Monitoring and Voice Activity Detection (VAD) based on the Core Audio taps #1
Conversation
- Introduced `RealtimeAudioMonitor` and `VoiceActivityDetector` classes for real-time audio analysis.
- Added UI components for process selection, recording, and real-time VAD monitoring.
- Updated README to include a demo video link.
- Implemented `RecordingIndicator` and `RecordingView` for better user feedback during recording sessions.
- Created `RealtimeVADView` to display voice activity detection status and statistics.
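For context, the core of a `VoiceActivityDetector` like the one this PR introduces is typically an energy threshold with a short debounce. The sketch below is a hypothetical, simplified version (the actual PR implementation may differ); the threshold and frame-count values are illustrative assumptions, not the PR's tuning.

```swift
import Foundation

// Hypothetical sketch of an energy-based voice activity detector,
// similar in spirit to the `VoiceActivityDetector` this PR introduces.
// Threshold and debounce values are illustrative assumptions.
struct EnergyVAD {
    /// RMS level (0...1) above which a frame counts as speech.
    var threshold: Float = 0.02
    /// Consecutive active frames required before reporting speech.
    var minActiveFrames: Int = 3

    private var activeRun = 0
    private(set) var isSpeech = false

    /// Feed one buffer of mono Float32 samples; returns the updated decision.
    mutating func process(_ samples: [Float]) -> Bool {
        guard !samples.isEmpty else { return isSpeech }
        // Root-mean-square energy of the frame.
        let sumSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
        let rms = (sumSquares / Float(samples.count)).squareRoot()
        if rms >= threshold {
            activeRun += 1
        } else {
            activeRun = 0
        }
        isSpeech = activeRun >= minActiveFrames
        return isSpeech
    }
}
```

In a tap callback you would call `process(_:)` once per buffer and drive the UI (e.g. `RealtimeVADView`) from `isSpeech`.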
add the video dirc
add video
Dang! Great work, looks amazing!
Hey @insidegui @wayne-xyz — I’ve been working on a Swift CLI tool to capture system audio and save it to a file. The audio file is being created, but it ends up empty (no audio data). I’ve already granted the CLI tool the necessary permissions to capture both the screen and system audio via System Settings. You can check it out at https://gist.github.com/Yogesh-Dubey-Ayesavi/61d5f9e302c96b1521a8a37ea9588fe7
Hi Yogesh, did you tap a specific process, or the system-wide output audio?
Hey @wayne-xyz, I am tapping specific processes only; you can see it in the gist:

```swift
func main() {
    print("Audio Capture Tool")
    print("------------------")

    // List available audio processes
    let processes = listAudioProcesses()
    if processes.isEmpty {
        print("No audio processes found. Exiting.")
        exit(1)
    }

    // User selection
    print("\nEnter the number of the process to capture audio from: ", terminator: "")
    guard let input = readLine(),
          let selection = Int(input.trimmingCharacters(in: .whitespacesAndNewlines)),
          selection > 0, selection <= processes.count else {
        print("Invalid selection. Exiting.")
        exit(1)
    }

    // Get selected process
    let selectedProcess = processes[selection - 1]
    print("Selected: \(selectedProcess.name)")

    // Create output directory in user's home
    let homeDirectory = FileManager.default.homeDirectoryForCurrentUser
    let outputDirectory = homeDirectory.appendingPathComponent("AudioCapture", isDirectory: true)

    // Output file
    let outputFile = outputDirectory.appendingPathComponent("\(selectedProcess.name)-audio.wav")

    // Create recorder
    do {
        let recorder = Recorder(process: selectedProcess, fileURL: outputFile)
        print("Press Enter to start recording...")
        _ = readLine()
        try recorder.start()
        print("Recording... Press Enter to stop.")
        _ = readLine()
        recorder.stop()
        print("Audio saved to: \(outputFile.path)")
    } catch {
        print("Error: \(error)")
        exit(1)
    }
}
```
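Since the reported symptom is a file that exists but contains no audio, one quick diagnostic is to inspect the WAV header and see whether the `data` chunk has a nonzero size. The sketch below is a minimal, assumption-laden RIFF walker (canonical little-endian WAV layout only, not a full parser); the function name is illustrative.

```swift
import Foundation

// Minimal sketch: walk a RIFF/WAVE byte array's chunks and return the size
// of the `data` chunk, to tell whether any samples were actually written.
// Assumes a canonical little-endian WAV layout; not a full RIFF parser.
func wavDataChunkSize(_ bytes: [UInt8]) -> UInt32? {
    // "RIFF" <size> "WAVE" header is 12 bytes.
    guard bytes.count >= 12,
          bytes[0...3].elementsEqual("RIFF".utf8),
          bytes[8...11].elementsEqual("WAVE".utf8) else { return nil }
    var offset = 12
    while offset + 8 <= bytes.count {
        let id = String(decoding: bytes[offset..<offset + 4], as: UTF8.self)
        // Chunk size is a 32-bit little-endian integer after the 4-byte ID.
        let size = UInt32(bytes[offset + 4])
            | UInt32(bytes[offset + 5]) << 8
            | UInt32(bytes[offset + 6]) << 16
            | UInt32(bytes[offset + 7]) << 24
        if id == "data" { return size }
        // Chunks are word-aligned: odd sizes are padded with one byte.
        offset += 8 + Int(size) + (Int(size) % 2)
    }
    return nil
}
```

A `data` chunk size of 0 would confirm the recorder never flushed any frames, pointing at the tap or format-conversion stage rather than file permissions.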
@wayne-xyz Thank you for the contribution. I need to clarify something: did an LLM generate this entire pull request? I ask because I'm seeing clear signs of AI-generated code in the changed files, and the pull request comment itself looks entirely AI-generated. While I'm not against using LLMs for coding (I do it myself quite often), I would appreciate folks being transparent when they open a pull request. It helps during the review, and if you share the tools/models used it can help others to learn about them. Additionally, your branch has invalid file references in the Xcode project, so I can't build it:
Hi, @insidegui
@Yogesh-Dubey-Ayesavi Hi, sorry for the late response. Did you do any extra buffer processing? You could add some debugging, like printing log info at the root buffer tap. I had a similar issue caused by Int32/Int16 conversions through the processing chain. Print the buffer data step by step to check which stage isn't working.
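The Int16/Int32 mismatch mentioned above is a common cause of silent or garbage output. As an illustration of the suggested step-by-step check, the sketch below reinterprets raw little-endian Int16 PCM bytes as Float32 samples and reports peak/RMS, so each stage of the pipeline can be sanity-checked. Function names are illustrative; adapt them to whatever your tap callback actually hands you.

```swift
import Foundation

// Decode raw little-endian Int16 PCM bytes into Float32 samples in -1...1.
// Assembling each sample byte-by-byte avoids alignment assumptions and makes
// the endianness explicit (a frequent source of "empty-sounding" buffers).
func int16SamplesToFloat(_ raw: [UInt8]) -> [Float] {
    stride(from: 0, to: raw.count - 1, by: 2).map { i in
        let lo = UInt16(raw[i])
        let hi = UInt16(raw[i + 1])
        let sample = Int16(bitPattern: hi << 8 | lo)
        // Divide by 32768, not 32767, so Int16.min maps exactly to -1.0.
        return Float(sample) / 32768.0
    }
}

// Peak and RMS of one buffer: print these at each pipeline stage and a
// stage that suddenly reports (0, 0) is where the audio is being lost.
func describeBuffer(_ samples: [Float]) -> (peak: Float, rms: Float) {
    guard !samples.isEmpty else { return (0, 0) }
    let peak = samples.map(abs).max() ?? 0
    let rms = (samples.reduce(0) { $0 + $1 * $1 } / Float(samples.count)).squareRoot()
    return (peak, rms)
}
```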

=============== update ===============
AI-generated code description:
What AI did (code):
3 new files are AI-generated:
1 modified file:
Key differences from original:
Structure changes:
What AI did (PR content revised):
Based on the function implementation, I asked AI (GPT) to revise the text-based content. The original description is my own.
AI tools: Cursor
What the human did:
Reflection on this PR and AI
Recently, I used the Cursor and Claude tools to help implement a new feature in my real-time live-caption app. Although these AI assistants aren’t as strong with Swift as they are with web development or other fields, I found this repository to be the best place to integrate the CoreAudioTap API. After successfully building and testing my real-time buffer processing separately, I decided to merge it into this repo. Impressively, the AI tools recognized the implementation from my other project and reproduced it here almost seamlessly, making this PR a great place to start.
============= original ==============
Thank you for your demo and repo. It's the greatest demo of the new API.
📘 Description
Adds real-time audio monitoring and voice activity detection (VAD) to AudioCap using CoreAudio APIs on macOS 14.4+, and enhances the UI and code structure for live audio analysis, since the Core Audio tap naturally supports real-time buffer processing.
✨ Features
✅ Contribution Checklist
- Updated README.md with examples
📸 Demo
RealtimeBufferDemo.mp4
Environment