🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital human cloning.
Updated Oct 16, 2025 - C
This GitHub repository contains the complete code for building Business-Ready Generative AI Systems (GenAISys) from scratch. It guides you through architecting and implementing advanced AI controllers, intelligent agents, and dynamic RAG frameworks. The projects demonstrate practical applications across various domains.
Hub for researchers exploring VLMs and multimodal learning :)
🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐
A web app that dynamically generates playable 'Spot the Difference' games from a single text prompt using a multimodal pipeline with Google's Gemini and Imagen models.
🎭 Real-time voice-controlled 3D avatar with multimodal AI: speak naturally and watch your AI companion respond with perfect lip-sync.
Learn how multimodal AI merges text, image, and audio for smarter models
Neocortex Unity SDK for Smart NPCs and Virtual Assistants
Enterprise-ready solution leveraging multimodal Generative AI (Gen AI) to enhance existing or new applications beyond text, implementing RAG, image classification, video analysis, and advanced image embeddings.
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection, and 9:16 resizing. Perfect for creators and smart automation.
ICML 2025 Papers: Dive into cutting-edge research from the premier machine learning conference. Stay current with breakthroughs in deep learning, generative AI, optimization, reinforcement learning, and beyond. Code implementations included. ⭐ Support the future of machine learning research!
A demo multimodal AI chat application built with Streamlit and Google's Gemini model. Features include secure Google OAuth, persistent data storage with Cloud SQL (PostgreSQL), and intelligent function calling. Includes a persona-based newsletter engine to deliver personalized insights.
⚡ Production-ready .NET Standard 2.1 RAG library with 🤖 multi-AI provider support, 🏢 enterprise vector storage, 📄 intelligent document processing, and 🗄️ multi-database query coordination. 🌍 Cross-platform compatible.
This is a fully autonomous, self-operating computer automation system designed to run tasks on Windows without any user interaction. It executes scheduled or trigger-based workflows using Python, system tools, and smart agents, making it ideal for repetitive tasks, bots, or self-executing pipelines.
Vision Foundation Models: SAM, ViT, CLIP, DINOv2, object detection, segmentation, and multimodal AI for computer vision.
AI framework for remote sensing image analysis using RAG: 88%+ accuracy, multimodal queries, and a ChatGPT-like interface.
The ultimate companion tool for translators and writers. Context-aware AI translation leveraging multiple sources: full document context, images, TM, termbases. Featuring: Prompt Library/Manager, PDF Rescue, TMX Editor, Supervoice (AI voice dictation), Superbench (LLM translation quality benchmarking), Universal Lookup, and CAT tool integration.
#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2025! An AI-powered meeting assistant that captures video, audio and textual context from Zoom calls using multimodal RAG.
VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.
Leveraging Bayesian Neural Networks for multimodal AUV data fusion, enabling precise and uncertainty-aware mapping of underwater environments.