RFC: Improve developer experience by anchoring on multimodal use-case #7093

@mergennachin

Description

🚀 The feature, motivation and pitch

Let's build a demo app, perhaps in pytorch-labs, that will serve as a forcing function for improving the developer experience from the user's perspective. A positive outcome of this demo app would be to define and build new higher-level abstractions (e.g., similar to Pipelines).

At a high level, here's the app we would like to build: an LLM with voice input and output. In terms of implementation, it's a three-step process:

  • Given a voice input, convert it to text (e.g., Whisper)
  • Run a text-based LLM (e.g., Llama 1B)
  • Convert the text output to voice (e.g., using T5)
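The three steps above can be sketched as a composable pipeline in which each stage is swappable. This is a minimal sketch under stated assumptions: `SpeechToText`, `TextLLM`, `TextToSpeech` and `VoiceChatPipeline` are hypothetical names (not existing ExecuTorch, torchchat or HuggingFace APIs), and the toy stages stand in for real Whisper, Llama and TTS wrappers.

```python
from dataclasses import dataclass
from typing import List, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface for step 1 (e.g., a Whisper wrapper)."""
    def transcribe(self, audio: List[float]) -> str: ...


class TextLLM(Protocol):
    """Hypothetical interface for step 2 (e.g., a Llama 1B wrapper)."""
    def generate(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical interface for step 3 (e.g., a TTS model wrapper)."""
    def synthesize(self, text: str) -> List[float]: ...


@dataclass
class VoiceChatPipeline:
    """Chains the three stages; each one can be replaced independently."""
    stt: SpeechToText
    llm: TextLLM
    tts: TextToSpeech

    def __call__(self, audio: List[float]) -> List[float]:
        text_in = self.stt.transcribe(audio)    # voice -> text
        text_out = self.llm.generate(text_in)   # text -> text
        return self.tts.synthesize(text_out)    # text -> voice


# Toy stand-ins (hypothetical) so the sketch runs without model weights.
class FakeSTT:
    def transcribe(self, audio: List[float]) -> str:
        return "hello"


class EchoLLM:
    def generate(self, prompt: str) -> str:
        return prompt.upper()


class FakeTTS:
    def synthesize(self, text: str) -> List[float]:
        # One fake "sample" per character, purely for illustration.
        return [float(ord(c)) for c in text]


pipeline = VoiceChatPipeline(stt=FakeSTT(), llm=EchoLLM(), tts=FakeTTS())
print(pipeline([0.0, 0.1, 0.2]))  # [72.0, 69.0, 76.0, 76.0, 79.0]
```

Under this design, swapping Whisper for Seamless or Llama 1B for Qwen would mean passing a different object that satisfies the same protocol, without touching the pipeline itself.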

Here are the requirements:

  • Be able to run in iOS, Android, and desktop apps.
  • Be able to prototype the end-to-end flow in Python and HuggingFace first.
  • Be able to easily deploy on a laptop without a Python runtime, for testing purposes.
  • Be able to swap underlying models easily (e.g., Whisper -> Seamless, Llama 1B -> Qwen).
  • Easy to swap Sampler/Tokenizer/KVCache implementations in the LLM (perhaps using this issue).
  • Easy deployment to mobile and desktop apps.
  • Everything in OSS.
  • Easy to optimize performance and debug (e.g., using mobile accelerators or quantization).
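The Sampler/Tokenizer/KVCache requirement above could look roughly like the following for the sampler. This is a minimal sketch assuming a hypothetical `Sampler` protocol and a toy `generate` loop; these are illustrative names, not existing ExecuTorch or torchchat APIs, and `toy_model` stands in for a real LLM forward pass. Tokenizer and KV-cache implementations could plausibly sit behind similar interfaces.

```python
import math
import random
from typing import Callable, List, Protocol

Logits = List[float]


class Sampler(Protocol):
    """Hypothetical pluggable sampler: picks the next token id from logits."""
    def sample(self, logits: Logits) -> int: ...


class GreedySampler:
    def sample(self, logits: Logits) -> int:
        # Argmax: index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])


class TopKSampler:
    def __init__(self, k: int, seed: int = 0) -> None:
        self.k = k
        self.rng = random.Random(seed)  # seeded for reproducibility

    def sample(self, logits: Logits) -> int:
        # Keep the k highest-scoring ids, then sample by softmax weight.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.k]
        m = max(logits[i] for i in top)
        weights = [math.exp(logits[i] - m) for i in top]  # stable softmax numerators
        return self.rng.choices(top, weights=weights, k=1)[0]


def generate(
    next_logits: Callable[[List[int]], Logits],
    prompt: List[int],
    sampler: Sampler,
    max_new_tokens: int,
) -> List[int]:
    """Toy decode loop: the model call and the sampler are independent."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(sampler.sample(next_logits(tokens)))
    return tokens


def toy_model(tokens: List[int]) -> Logits:
    # Hypothetical stand-in for an LLM forward pass: favors (last + 1) mod 5.
    logits = [0.0] * 5
    logits[(tokens[-1] + 1) % 5] = 3.0
    return logits


print(generate(toy_model, [0], GreedySampler(), max_new_tokens=4))  # [0, 1, 2, 3, 4]
print(generate(toy_model, [0], TopKSampler(k=2), max_new_tokens=4))
```

Because `generate` only depends on the `Sampler` protocol, switching from greedy to top-k decoding is a one-argument change at the call site.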

Here are the positive outcomes we expect from this demo app:

  • Define and build new higher-level abstractions to make these possible.
  • ExecuTorch and torchchat use these abstractions for text-based LLMs.
  • LLaVA and other image-based multimodal models use these abstractions.
  • The community can build completely new apps using these new abstractions.

Alternatives

No response

Additional context

There is already another RFC, but it is specific to LLMs.

RFC (Optional)

No response

cc @cccclai @helunwencser @dvorjackz

Metadata

Labels: module: llm, rfc, triaged

Status: Done
