@bbkx226 bbkx226 commented Dec 13, 2025

Resolves #6

This pull request introduces a major refactor of the model initialization system, adding a flexible model factory pattern to support multiple popular base models and simplifying configuration management. It also updates the documentation, adds new per-model configuration files, and improves .gitignore coverage.

Key changes include:

Model initialization and configuration refactor

  • Introduced a new model_factory.py module implementing a factory pattern for dynamic model and tokenizer instantiation, with support for different model families (Qwen, LLaMA, Mistral, Phi, Code-LLaMA, Gemma). This centralizes and standardizes model configuration and initialization logic.
  • Refactored model.py to use the new model factory by default, allowing flexible model selection, improved configuration, and a fallback to standard initialization when the factory is unavailable. Added support for additional arguments such as model_id, use_flash_attention, and torch_dtype. A rough sketch of the pattern follows this list.
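The actual implementation lives in `model_factory.py`; the snippet below is only a minimal sketch of the factory pattern described above, in which a registry keyed by a short `model_id` maps to per-family defaults and drives `transformers` loading. The names (`ModelSpec`, `_REGISTRY`, `create_model_and_tokenizer`) and the Hugging Face model IDs are illustrative assumptions, not the PR's API:

```python
from dataclasses import dataclass

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


@dataclass
class ModelSpec:
    """Per-family defaults; field names here are illustrative."""
    hf_id: str
    torch_dtype: torch.dtype = torch.bfloat16
    use_flash_attention: bool = True


# Hypothetical registry keyed by a short model_id; the real mapping
# lives in model_factory.py and the JSON configs added in this PR.
_REGISTRY = {
    "llama3-8b": ModelSpec("meta-llama/Meta-Llama-3-8B"),
    "mistral-7b": ModelSpec("mistralai/Mistral-7B-v0.1"),
    "qwen2-7b": ModelSpec("Qwen/Qwen2-7B"),
}


def create_model_and_tokenizer(model_id: str, **overrides):
    """Resolve model_id against the registry and build model + tokenizer."""
    spec = _REGISTRY[model_id]
    kwargs = {"torch_dtype": overrides.get("torch_dtype", spec.torch_dtype)}
    if overrides.get("use_flash_attention", spec.use_flash_attention):
        kwargs["attn_implementation"] = "flash_attention_2"
    model = AutoModelForCausalLM.from_pretrained(spec.hf_id, **kwargs)
    tokenizer = AutoTokenizer.from_pretrained(spec.hf_id)
    return model, tokenizer
```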

Expanded model support and configuration

  • Added configuration JSON files for the newly supported models (llama2-7b.json, llama3-8b.json, mistral-7b.json, phi3-mini.json, code-llama-7b.json, and gemma-7b.json), enabling easy training setup for each. An illustrative config shape is sketched below.
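The configs themselves are plain JSON; the sketch below shows the kind of fields such a file might carry, plus a trivial loader. The keys, values, and file path are assumptions for illustration, not the schema actually used in this PR:

```python
import json
from pathlib import Path

# Hypothetical shape of one per-model config (e.g. llama3-8b.json);
# the actual keys in this PR may differ.
llama3_config = {
    "model_id": "llama3-8b",
    "pretrained_name": "meta-llama/Meta-Llama-3-8B",
    "torch_dtype": "bfloat16",
    "use_flash_attention": True,
    "max_seq_length": 4096,
}


def load_model_config(path: str) -> dict:
    """Read a per-model JSON config from disk."""
    return json.loads(Path(path).read_text())


# Hypothetical path; the repository layout may place configs elsewhere.
config = load_model_config("configs/llama3-8b.json")
```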

Documentation and usability improvements

  • Updated README.md to document support for 13 base models across 6 families, provide quick-start examples for different models, and clarify new training and data tokenization steps.
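For the data tokenization step mentioned in the README, a generic flow typically looks like the following; the dataset path, column name, and model choice are placeholders for illustration, not the repository's documented commands:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative only: the repo's actual quick-start lives in README.md.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")


def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)


dataset = load_dataset("json", data_files="data/train.jsonl", split="train")
tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```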

Project hygiene

  • Updated .gitignore to exclude local training data, outputs, cache directories, and Python cache files, preventing accidental commits of large or sensitive files.

- Introduced new configuration file for Phi-3 Mini model.
- Refactored model initialization in `model.py` to support flexible configurations and model factory usage.
- Implemented a `ModelFactory` class to handle dynamic model instantiation and configuration management.
- Created a `ModelRegistry` class to maintain a centralized registry of supported models with detailed configurations.
- Developed a generic tokenizer module to support multiple model families and improve tokenization processes.
- Added validation utilities for testing model loading, tokenization, and embedding generation (an illustrative check is sketched after this list).
- Updated requirements to ensure compatibility with new features and dependencies.
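As a hedged illustration of what such a validation utility might check, the function below runs a smoke test over loading, tokenization, and a forward pass. The function name, prompt, and exact assertions are assumptions, not the code added in this PR:

```python
import torch


def validate_model(model, tokenizer, prompt: str = "Hello, world") -> bool:
    """Smoke-test tokenization and a forward pass, then sanity-check
    the shape of the last hidden state (batch 1, one row per token)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    hidden = outputs.hidden_states[-1]
    return hidden.shape[0] == 1 and hidden.shape[1] == inputs["input_ids"].shape[1]
```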