Expand base model support to include Mistral, Phi, and Qwen families #35
Resolves #6
This pull request refactors the model initialization system around a model factory that supports multiple popular base models and simplifies configuration management. It also updates the documentation, adds new model configuration files, and extends .gitignore coverage.
Key changes include:
Model initialization and configuration refactor
- Added a model_factory.py module implementing a factory pattern for dynamic model and tokenizer instantiation, with support for different model families (Qwen, LLaMA, Mistral, Phi, Code-LLaMA, Gemma). This centralizes and standardizes model configuration and initialization logic.
- Updated model.py to use the new model factory by default, allowing flexible model selection, improved configuration, and fallback to standard initialization if the factory is unavailable. Added support for additional arguments such as model_id, use_flash_attention, and torch_dtype.
Expanded model support and configuration
- Added the configuration files llama2-7b.json, llama3-8b.json, mistral-7b.json, phi3-mini.json, code-llama-7b.json, and gemma-7b.json, enabling easy training setup for each.
Documentation and usability improvements
- Updated README.md to document support for 13 base models across 6 families, provide quick-start examples for different models, and clarify the new training and data tokenization steps.
Project hygiene
- Updated .gitignore to exclude local training data, outputs, cache directories, and Python cache files, preventing accidental commits of large or sensitive files.