Conversation

@JyotinderSingh (Collaborator) commented Dec 4, 2025

Overview

This PR introduces two major improvements to the Keras quantization API:

  1. Selective Quantization: Adds a filters argument to model.quantize(), allowing users to specify exactly which layers should be quantized using regex patterns or callables.
  2. Explicit GPTQ Topology: Removes the brittle heuristic-based backbone detection in GPTQ. Instead, it requires explicit definition of the model structure (pre-block layers and sequential blocks) via GPTQConfig or a new model hook get_quantization_layer_structure.

Key Changes

Selective Quantization (filters)

  • Updated model.quantize(mode, config, filters=...).
  • filters argument: Accepts a regex string, a list of regex strings, or a callable.
  • Behavior: Only layers matching the filter criteria will be quantized.
  • New Utility: Added keras.src.quantizers.utils.should_quantize_layer to centralize the filtering logic (a sketch of the semantics follows this list).
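
For illustration, here is a minimal sketch of the filtering semantics, assuming a list of regexes must all match a layer's name (consistent with the regex example further below); the merged utility may differ in detail:

import re

def should_quantize_layer_sketch(layer, filters):
    # No filters: every layer is eligible for quantization.
    if filters is None:
        return True
    # A callable decides per layer.
    if callable(filters):
        return bool(filters(layer))
    # A single regex string is treated as a one-element list.
    if isinstance(filters, str):
        filters = [filters]
    # Assumed AND semantics: the layer name must match every pattern.
    return all(re.search(pattern, layer.name) for pattern in filters)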

Explicit GPTQ Structure

  • Removed Heuristics: Deleted _get_backbone_layers and _get_custom_layers from gptq_core.py. The previous logic attempted to guess where the embedding and transformer blocks were, which was fragile and tied to specific KerasHub naming conventions. It also inverted the dependency direction: an upstream library had to be aware of downstream implementation details.
  • New API Hook: Added model.get_quantization_layer_structure(mode) method. Model authors can override this to return the dictionary {'pre_block_layers': [...], 'sequential_blocks': [...]}.
  • Config Update: Added quantization_layer_structure to GPTQConfig.
  • Precedence: GPTQ now resolves the structure in this order (sketched after this list):
    1. config.quantization_layer_structure
    2. model.get_quantization_layer_structure(mode)
    3. If neither is found, a ValueError is raised.
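
A minimal sketch of that resolution order (names mirror the PR description; the actual logic in gptq_core.py may differ):

def resolve_structure_sketch(model, config, mode="gptq"):
    # 1. An explicit structure on the config takes precedence.
    structure = getattr(config, "quantization_layer_structure", None)
    # 2. Otherwise, fall back to the model-level hook.
    if structure is None:
        structure = model.get_quantization_layer_structure(mode)
    # 3. Neither source provided a topology: fail loudly.
    if structure is None:
        raise ValueError(
            "GPTQ requires an explicit layer structure. Set "
            "`quantization_layer_structure` on GPTQConfig or override "
            "`get_quantization_layer_structure` on the model."
        )
    return structure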

Usage Examples

1. Using Filters (Regex)

# Quantize only layers with "dense" in the name, but skip "output"
model.quantize("int8", filters=["dense", "^((?!output).)*$"])

2. Using Filters (Callable)

from keras import layers

def my_filter(layer):
    # Only quantize Dense layers that aren't the output head.
    return isinstance(layer, layers.Dense) and layer.name != "output_head"

model.quantize("int8", filters=my_filter)

3. GPTQ with Explicit Structure

# Option A: Define via Config
structure = {
    "pre_block_layers": [model.get_layer("token_embedding")],
    "sequential_blocks": [model.get_layer("transformer_block_0"), ...]
}
config = GPTQConfig(..., quantization_layer_structure=structure)
model.quantize("gptq", config=config)

# Option B: Override in Model Class
class MyLLM(keras.Model):
    def get_quantization_layer_structure(self, mode):
        if mode == 'gptq':
            return {
                "pre_block_layers": [self.embedding],
                "sequential_blocks": self.transformer_blocks
            }
        return None
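
With Option B in place, a config that omits the structure falls back to the hook. A hedged usage sketch (the constructor arguments are elided placeholders, not real signatures):

model = MyLLM(...)
config = GPTQConfig(...)  # quantization_layer_structure not set
model.quantize("gptq", config=config)  # structure comes from the model hook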

Testing

  • Added unit tests for regex, list, and callable filters in model_test.py (an illustrative sketch of such a test appears after this list).
  • Updated gptq_core_test.py and gptq_test.py to use explicit structure definitions.
  • Added specific tests ensuring filters work correctly within the GPTQ loop (excluding specific dense layers inside a transformer block).
  • Added unit tests for should_quantize_layer utility.
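
As an illustration of what those filter tests verify, a hypothetical sketch (build_model is a placeholder helper, and the quantization_mode assertion is an assumption, not the actual test code):

def test_callable_filter_skips_output_head():
    model = build_model()  # hypothetical: a model with an "output_head" Dense
    model.quantize("int8", filters=my_filter)  # callable from the example above
    for layer in model.layers:
        if layer.name == "output_head":
            # The skipped layer should keep a float (non-quantized) policy;
            # quantization_mode is assumed to be None in that case.
            assert layer.dtype_policy.quantization_mode is None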

Related Changes

Since heuristic-based auto-detection of layers is no longer supported at the Keras level, KerasHub models are now required to define their own get_quantization_layer_structure hooks. A companion PR has been opened at keras-team/keras-hub#2462.

@gemini-code-assist (Contributor) commented

Summary of Changes

This pull request significantly enhances flexibility and control over model quantization by introducing a layer-filtering mechanism and removing the inverted GPTQ dependency. Users can now precisely select which layers to quantize, and the GPTQ process no longer relies on internal heuristics for model structure, instead requiring an explicit definition. These changes make the quantization API more robust and adaptable to various model designs.

Highlights

  • Layer Filtering for Quantization: Introduced a new filters argument to the Model.quantize method, allowing users to specify which layers should be quantized using regex strings, lists of regex strings, or a callable function. This provides fine-grained control over the quantization process.
  • Improved GPTQ Dependency Inversion: Refactored the GPTQ quantization process to remove heuristic-based model structure detection. Instead, the model's quantization structure can now be explicitly provided via the GPTQConfig or by overriding a new Model.get_quantization_layer_structure method, enhancing flexibility and robustness for diverse model architectures.
  • New get_quantization_layer_structure Method: Added a new method get_quantization_layer_structure to the Model class, intended for model authors to define the topology required for structure-aware quantization modes like GPTQ.
  • New should_quantize_layer Utility: Introduced a new utility function should_quantize_layer to centralize the logic for determining if a layer should be quantized based on the provided filters.

@gemini-code-assist bot left a review

Code Review

This pull request introduces two significant improvements to the quantization functionality. First, it adds a flexible layer-filtering mechanism to the quantize method, allowing users to selectively quantize parts of a model using a regex, a list of regexes, or a callable. Second, it refactors the GPTQ post-training quantization implementation to decouple it from heuristic model-structure detection, a design improvement that enhances modularity and testability. The changes are well implemented and include corresponding tests. I have a few minor suggestions to improve code clarity and robustness.

@codecov-commenter commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 88.23529% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.36%. Comparing base (9fc8185) to head (ec7ac8f).

Files with missing lines            Patch %   Lines
keras/src/models/model.py           88.23%    1 Missing and 1 partial ⚠️
keras/src/quantizers/gptq_core.py   90.90%    1 Missing and 1 partial ⚠️
keras/src/quantizers/utils.py       81.81%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21894      +/-   ##
==========================================
- Coverage   82.36%   82.36%   -0.01%     
==========================================
  Files         578      579       +1     
  Lines       59816    59830      +14     
  Branches     9387     9394       +7     
==========================================
+ Hits        49270    49278       +8     
- Misses       8147     8150       +3     
- Partials     2399     2402       +3     
Flag               Coverage Δ
keras              82.18% <88.23%> (-0.01%) ⬇️
keras-jax          62.77% <88.23%> (+<0.01%) ⬆️
keras-numpy        57.44% <33.33%> (+<0.01%) ⬆️
keras-openvino     34.32% <19.60%> (-0.01%) ⬇️
keras-tensorflow   64.33% <88.23%> (-0.01%) ⬇️
keras-torch        63.36% <72.54%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


google-ml-butler bot added the kokoro:force-run and ready to pull labels on Dec 5, 2025.
@hertschuh merged commit 9d1d650 into keras-team:master on Dec 6, 2025; 14 of 16 checks passed.
google-ml-butler bot removed the awaiting review and ready to pull labels on Dec 6, 2025.
@JyotinderSingh deleted the quantization-filters branch on Dec 6, 2025.