Conversation

@JyotinderSingh (Collaborator) commented Dec 4, 2025

Overview

This PR introduces two major improvements to the Keras quantization API:

  1. Selective Quantization: Adds a filters argument to model.quantize(), allowing users to specify exactly which layers should be quantized using regex patterns or callables.
  2. Explicit GPTQ Topology: Removes the brittle heuristic-based backbone detection in GPTQ. Instead, it requires explicit definition of the model structure (pre-block layers and sequential blocks) via GPTQConfig or a new model hook get_quantization_layer_structure.

Key Changes

Selective Quantization (filters)

  • Updated model.quantize(mode, config, filters=...).
  • filters argument: Accepts a regex string, a list of regex strings, or a callable.
  • Behavior: Only layers matching the filter criteria will be quantized.
  • New Utility: Added keras.src.quantizers.utils.should_quantize_layer to centralize the filtering logic (a sketch of the semantics follows this list).
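
For illustration, here is a minimal sketch of the filtering semantics, assuming a list of regexes must all match a layer's name (consistent with the regex example further below); the merged utility may differ in detail:

import re

def should_quantize_layer_sketch(layer, filters):
    # No filters: every layer is eligible for quantization.
    if filters is None:
        return True
    # A callable decides per layer.
    if callable(filters):
        return bool(filters(layer))
    # A single regex string is treated as a one-element list.
    if isinstance(filters, str):
        filters = [filters]
    # Assumed AND semantics: the layer name must match every pattern.
    return all(re.search(pattern, layer.name) for pattern in filters)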

Explicit GPTQ Structure

  • Removed Heuristics: Deleted _get_backbone_layers and _get_custom_layers from gptq_core.py. The previous logic attempted to guess where the embedding and transformer blocks were, which was fragile and tied to specific KerasHub naming conventions. It also inverted the dependency direction: an upstream library had to be aware of downstream implementation details.
  • New API Hook: Added model.get_quantization_layer_structure(mode) method. Model authors can override this to return the dictionary {'pre_block_layers': [...], 'sequential_blocks': [...]}.
  • Config Update: Added quantization_layer_structure to GPTQConfig.
  • Precedence: GPTQ now resolves the structure in this order (sketched after this list):
    1. config.quantization_layer_structure
    2. model.get_quantization_layer_structure(mode)
    3. If neither is found, a ValueError is raised.
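
A minimal sketch of that resolution order (names mirror the PR description; the actual logic in gptq_core.py may differ):

def resolve_structure_sketch(model, config, mode="gptq"):
    # 1. An explicit structure on the config takes precedence.
    structure = getattr(config, "quantization_layer_structure", None)
    # 2. Otherwise, fall back to the model-level hook.
    if structure is None:
        structure = model.get_quantization_layer_structure(mode)
    # 3. Neither source provided a topology: fail loudly.
    if structure is None:
        raise ValueError(
            "GPTQ requires an explicit layer structure. Set "
            "`quantization_layer_structure` on GPTQConfig or override "
            "`get_quantization_layer_structure` on the model."
        )
    return structure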

Usage Examples

1. Using Filters (Regex)

# Quantize only layers with "dense" in the name, but skip "output"
model.quantize("int8", filters=["dense", "^((?!output).)*$"])

2. Using Filters (Callable)

from keras import layers

def my_filter(layer):
    # Only quantize Dense layers that aren't the output head.
    return isinstance(layer, layers.Dense) and layer.name != "output_head"

model.quantize("int8", filters=my_filter)

3. GPTQ with Explicit Structure

# Option A: Define via Config
structure = {
    "pre_block_layers": [model.get_layer("token_embedding")],
    "sequential_blocks": [model.get_layer("transformer_block_0"), ...]
}
config = GPTQConfig(..., quantization_layer_structure=structure)
model.quantize("gptq", config=config)

# Option B: Override in Model Class
class MyLLM(keras.Model):
    def get_quantization_layer_structure(self, mode):
        if mode == 'gptq':
            return {
                "pre_block_layers": [self.embedding],
                "sequential_blocks": self.transformer_blocks
            }
        return None
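
With Option B in place, a config that omits the structure falls back to the hook. A hedged usage sketch (the constructor arguments are elided placeholders, not real signatures):

model = MyLLM(...)
config = GPTQConfig(...)  # quantization_layer_structure not set
model.quantize("gptq", config=config)  # structure comes from the model hook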

Testing

  • Added unit tests for regex, list, and callable filters in model_test.py (an illustrative sketch of such a test appears after this list).
  • Updated gptq_core_test.py and gptq_test.py to use explicit structure definitions.
  • Added specific tests ensuring filters work correctly within the GPTQ loop (excluding specific dense layers inside a transformer block).
  • Added unit tests for should_quantize_layer utility.
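
As an illustration of what those filter tests verify, a hypothetical sketch (build_model is a placeholder helper, and the quantization_mode assertion is an assumption, not the actual test code):

def test_callable_filter_skips_output_head():
    model = build_model()  # hypothetical: a model with an "output_head" Dense
    model.quantize("int8", filters=my_filter)  # callable from the example above
    for layer in model.layers:
        if layer.name == "output_head":
            # The skipped layer should keep a float (non-quantized) policy;
            # quantization_mode is assumed to be None in that case.
            assert layer.dtype_policy.quantization_mode is None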

Related Changes

Since heuristic-based auto-detection of layers is no longer supported at the Keras level, KerasHub models are now required to define their own get_quantization_layer_structure hooks. A companion PR has been opened at keras-team/keras-hub#2462.

@gemini-code-assist (Contributor) commented

Summary of Changes

This pull request significantly enhances flexibility and control over model quantization by introducing a layer-filtering mechanism and removing the inverted GPTQ dependency. Users can now precisely select which layers to quantize, and the GPTQ process no longer relies on internal heuristics for model structure, instead requiring an explicit definition. These changes make the quantization API more robust and adaptable to various model designs.

Highlights

  • Layer Filtering for Quantization: Introduced a new filters argument to the Model.quantize method, allowing users to specify which layers should be quantized using regex strings, lists of regex strings, or a callable function. This provides fine-grained control over the quantization process.
  • Improved GPTQ Dependency Inversion: Refactored the GPTQ quantization process to remove heuristic-based model structure detection. Instead, the model's quantization structure can now be explicitly provided via the GPTQConfig or by overriding a new Model.get_quantization_layer_structure method, enhancing flexibility and robustness for diverse model architectures.
  • New get_quantization_layer_structure Method: Added a new method get_quantization_layer_structure to the Model class, intended for model authors to define the topology required for structure-aware quantization modes like GPTQ.
  • New should_quantize_layer Utility: Introduced a new utility function should_quantize_layer to centralize the logic for determining if a layer should be quantized based on the provided filters.

@gemini-code-assist bot left a review

Code Review

This pull request introduces two significant improvements to the quantization functionality. First, it adds a flexible layer-filtering mechanism to the quantize method, allowing users to selectively quantize parts of a model using a regex, a list of regexes, or a callable. Second, it refactors the GPTQ post-training quantization implementation to decouple it from heuristic model-structure detection, a design improvement that enhances modularity and testability. The changes are well implemented and include corresponding tests. I have a few minor suggestions to improve code clarity and robustness.

@codecov-commenter commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 88.23529% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.36%. Comparing base (9fc8185) to head (ec7ac8f).

Files with missing lines            Patch %   Lines
keras/src/models/model.py           88.23%    1 Missing and 1 partial ⚠️
keras/src/quantizers/gptq_core.py   90.90%    1 Missing and 1 partial ⚠️
keras/src/quantizers/utils.py       81.81%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #21894      +/-   ##
==========================================
- Coverage   82.36%   82.36%   -0.01%     
==========================================
  Files         578      579       +1     
  Lines       59816    59830      +14     
  Branches     9387     9394       +7     
==========================================
+ Hits        49270    49278       +8     
- Misses       8147     8150       +3     
- Partials     2399     2402       +3     
Flag               Coverage Δ
keras              82.18% <88.23%> (-0.01%) ⬇️
keras-jax          62.77% <88.23%> (+<0.01%) ⬆️
keras-numpy        57.44% <33.33%> (+<0.01%) ⬆️
keras-openvino     34.32% <19.60%> (-0.01%) ⬇️
keras-tensorflow   64.33% <88.23%> (-0.01%) ⬇️
keras-torch        63.36% <72.54%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


google-ml-butler bot added the kokoro:force-run and ready to pull labels on Dec 5, 2025.
@hertschuh merged commit 9d1d650 into keras-team:master on Dec 6, 2025; 14 of 16 checks passed.
google-ml-butler bot removed the awaiting review and ready to pull labels on Dec 6, 2025.
@JyotinderSingh deleted the quantization-filters branch on Dec 6, 2025.