Expand base model support to include Mistral, Phi, and Qwen families #35

bbkx226 · 2025-12-13T08:37:19Z

Resolves #6

This pull request introduces a major refactor to the model initialization system, adding a flexible model factory pattern to support multiple popular base models and simplifying configuration management. It also updates documentation, adds new model configuration files, and improves .gitignore coverage.

Key changes include:

Model initialization and configuration refactor

Introduced a new model_factory.py module implementing a factory pattern for dynamic model and tokenizer instantiation, with support for different model families (Qwen, LLaMA, Mistral, Phi, Code-LLaMA, Gemma). This centralizes and standardizes model configuration and initialization logic.
Refactored `model.py` to use the new model factory by default. This allows flexible model selection and improved configuration, and falls back to standard initialization if the factory is unavailable. Added support for additional arguments such as `model_id`, `use_flash_attention`, and `torch_dtype`.
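A minimal sketch of the factory-with-fallback idea, assuming the model family is detected by a case-insensitive substring match on the Hugging Face model ID. `ModelFactory`, `register_family`, `init_model_spec`, and the per-family `pad_token_from_eos` option are hypothetical names, not taken from the PR's `model_factory.py`. To run without `transformers`, it returns the keyword arguments that would be passed to `AutoModelForCausalLM.from_pretrained` instead of loading weights. `torch_dtype` and `attn_implementation="flash_attention_2"` are existing `from_pretrained` arguments; how the PR maps its own flags onto them is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# A family builder returns family-specific tokenizer options for a model ID.
Builder = Callable[[str], Dict[str, Any]]


@dataclass
class ModelFactory:
    # Ordered (substring, family) pairs: the first match wins, so specific
    # patterns such as "codellama" must be registered before "llama".
    patterns: List[Tuple[str, str]] = field(default_factory=list)
    builders: Dict[str, Builder] = field(default_factory=dict)

    def register_family(self, family: str, *patterns: str) -> Callable[[Builder], Builder]:
        """Decorator registering a builder plus the ID substrings that select it."""
        def decorator(builder: Builder) -> Builder:
            self.builders[family] = builder
            self.patterns.extend((p.lower(), family) for p in patterns)
            return builder
        return decorator

    def detect_family(self, model_id: str) -> str:
        name = model_id.lower()
        for pattern, family in self.patterns:
            if pattern in name:
                return family
        raise ValueError(f"unsupported model family for {model_id!r}")

    def create(self, model_id: str, load_kwargs: Dict[str, Any]) -> Dict[str, Any]:
        family = self.detect_family(model_id)
        return {
            "family": family,
            "load_kwargs": load_kwargs,  # would go to AutoModelForCausalLM.from_pretrained
            "tokenizer_options": self.builders[family](model_id),
        }


factory = ModelFactory()


# Illustrative subset of families (hypothetical option values).
@factory.register_family("code-llama", "codellama", "code-llama")
def _code_llama(model_id: str) -> Dict[str, Any]:
    return {"pad_token_from_eos": True}


@factory.register_family("llama", "llama")
def _llama(model_id: str) -> Dict[str, Any]:
    return {"pad_token_from_eos": True}


@factory.register_family("mistral", "mistral")
def _mistral(model_id: str) -> Dict[str, Any]:
    return {"pad_token_from_eos": True}


@factory.register_family("qwen", "qwen")
def _qwen(model_id: str) -> Dict[str, Any]:
    return {"pad_token_from_eos": False}


def init_model_spec(
    model_id: str,
    use_flash_attention: bool = False,
    torch_dtype: str = "auto",
    model_factory: Optional[ModelFactory] = factory,
) -> Dict[str, Any]:
    """Build the loading spec, using the factory when one is available."""
    load_kwargs: Dict[str, Any] = {
        "pretrained_model_name_or_path": model_id,
        "torch_dtype": torch_dtype,
    }
    if use_flash_attention:
        # Needs the flash-attn package and a supported GPU when the model loads.
        load_kwargs["attn_implementation"] = "flash_attention_2"
    if model_factory is None:
        # Fallback: standard initialization with no family-specific options.
        return {"family": None, "load_kwargs": load_kwargs, "tokenizer_options": {}}
    return model_factory.create(model_id, load_kwargs)
```

In `model.py`, the fallback would correspond to passing `model_factory=None` when importing the factory fails, for example inside a `try`/`except ImportError` block.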

Expanded model support and configuration

Added configuration JSON files for several supported models: llama2-7b.json, llama3-8b.json, mistral-7b.json, phi3-mini.json, code-llama-7b.json, and gemma-7b.json, enabling easy training setup for each. [1] [2] [3] [4] [5] [6]
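The PR does not show the schema of these JSON files, so the loader below assumes a flat object with `model_id`, `torch_dtype`, `use_flash_attention`, and `max_seq_length` keys. The field names, defaults, and the `parse_model_config`/`load_model_config` helpers are all hypothetical. The idea is to validate required keys and apply defaults in one place, so every per-model file (for example `mistral-7b.json`) is checked the same way.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any, Dict


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical schema; the PR's JSON files may use different keys.
    model_id: str
    torch_dtype: str = "auto"
    use_flash_attention: bool = False
    max_seq_length: int = 2048


def parse_model_config(text: str) -> ModelConfig:
    raw: Dict[str, Any] = json.loads(text)
    if "model_id" not in raw:
        raise ValueError("config is missing required key 'model_id'")
    known = {f.name for f in fields(ModelConfig)}
    unknown = set(raw) - known
    if unknown:
        # Fail loudly on typos instead of silently ignoring them.
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return ModelConfig(**raw)


def load_model_config(path: str) -> ModelConfig:
    # The location of the config files (e.g. a "configs/" directory) is assumed.
    return parse_model_config(Path(path).read_text(encoding="utf-8"))
```

Rejecting unknown keys is a design choice: a misspelled `torch_dtype` would otherwise fall back to the default without warning.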

Documentation and usability improvements

Updated README.md to document support for 13 base models across 6 families, provide quick-start examples for different models, and clarify new training and data tokenization steps.

Project hygiene

Updated .gitignore to exclude local training data, outputs, cache directories, and Python cache files, preventing accidental commits of large or sensitive files.

- Introduced new configuration file for Phi-3 Mini model.
- Refactored model initialization in `model.py` to support flexible configurations and model factory usage.
- Implemented a `ModelFactory` class to handle dynamic model instantiation and configuration management.
- Created a `ModelRegistry` class to maintain a centralized registry of supported models with detailed configurations.
- Developed a generic tokenizer module to support multiple model families and improve tokenization processes.
- Added validation utilities for testing model loading, tokenization, and embedding generation.
- Updated requirements to ensure compatibility with new features and dependencies.
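Two checks that a generic tokenizer module and its validation utilities might perform are sketched below: reusing the EOS token as the padding token when a tokenizer defines none (a common workaround for LLaMA- and Mistral-style tokenizers), and an encode/decode round-trip test. `ensure_pad_token`, `validate_round_trip`, and `ToyTokenizer` are hypothetical names. The functions only rely on the `encode(..., add_special_tokens=...)`, `decode(..., skip_special_tokens=...)`, `pad_token`, and `eos_token` members that Hugging Face tokenizers expose; the toy tokenizer stands in so the sketch runs offline.

```python
from typing import Any, Dict, List, Sequence


def ensure_pad_token(tokenizer: Any) -> bool:
    """Reuse EOS as PAD when the tokenizer has no pad token. Returns True if changed."""
    if getattr(tokenizer, "pad_token", None) is None:
        if getattr(tokenizer, "eos_token", None) is None:
            raise ValueError("tokenizer defines neither pad_token nor eos_token")
        tokenizer.pad_token = tokenizer.eos_token
        return True
    return False


def validate_round_trip(tokenizer: Any, samples: Sequence[str]) -> List[str]:
    """Encode then decode each sample; return the samples that do not survive."""
    failures: List[str] = []
    for text in samples:
        ids = tokenizer.encode(text, add_special_tokens=False)
        decoded = tokenizer.decode(ids, skip_special_tokens=True)
        # strip() tolerates the leading-space handling some tokenizers apply.
        if decoded.strip() != text.strip():
            failures.append(text)
    return failures


class ToyTokenizer:
    """Offline stand-in: whitespace vocabulary, EOS token, no pad token."""

    def __init__(self) -> None:
        self.pad_token = None
        self.eos_token = "</s>"
        self.vocab: Dict[str, int] = {}

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        ids = [self.vocab.setdefault(word, len(self.vocab)) for word in text.split()]
        if add_special_tokens:
            ids.append(self.vocab.setdefault(self.eos_token, len(self.vocab)))
        return ids

    def decode(self, ids: List[int], skip_special_tokens: bool = False) -> str:
        inverse = {i: w for w, i in self.vocab.items()}
        words = [inverse[i] for i in ids]
        if skip_special_tokens:
            words = [w for w in words if w != self.eos_token]
        return " ".join(words)
```

The toy tokenizer collapses repeated spaces, so a sample such as `"hello  world"` is reported as a round-trip failure. A real validation run would pass a tokenizer loaded with `AutoTokenizer.from_pretrained` instead.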

bbkx226 added 4 commits December 13, 2025 16:00

Update model registry and documentation to reflect removal of gated models and add support for Qwen3-4B

1d7a699

Add .gitignore files, update requirements, and enhance tokenization process

c395c7f

Fix validation execution in main block by initializing ModelValidation for full mode

572f67b
