From 439ff8750c391a2f94a16f64b935e43956bb99b8 Mon Sep 17 00:00:00 2001
From: Eduardo Patrocinio
Date: Sat, 29 Nov 2025 10:17:20 -0500
Subject: [PATCH] Improve tracer_model_split documentation in pipelining tutorial

- Added clear section headers for both splitting options
  - Option 1: Manual Model Splitting
  - Option 2: Tracer-based Model Splitting
- Fixed typo: 'before the before' -> 'before the'
- Added explanation of split_spec dictionary parameters
- Clarified that split_spec specifies module path and split point type
- Made the tracer_model_split function definition more prominent

The tracer_model_split code block was present but users were missing it
because it wasn't clearly labeled as a separate option.

Fixes issue #3530
---
 intermediate_source/pipelining_tutorial.rst | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/intermediate_source/pipelining_tutorial.rst b/intermediate_source/pipelining_tutorial.rst
index 63170e6064d..4442a62c7c5 100644
--- a/intermediate_source/pipelining_tutorial.rst
+++ b/intermediate_source/pipelining_tutorial.rst
@@ -108,6 +108,8 @@ Step 1: Partition the Transformer Model
 
 There are two different ways of partitioning the model:
 
+**Option 1: Manual Model Splitting**
+
 First is the manual mode in which we can manually create two instances of the model by deleting portions of
 attributes of the model. In this example for two stages (2 ranks), the model is cut in half.
 
@@ -139,10 +141,13 @@ As we can see the first stage does not have the layer norm or the output layer,
 The second stage does not have the input embedding layers, but includes the output layers and the final four transformer blocks.
 The function then returns the ``PipelineStage`` for the current rank.
 
+**Option 2: Tracer-based Model Splitting**
+
 The second method is the tracer-based mode which automatically splits the model based on a ``split_spec`` argument.
 Using the pipeline specification, we can instruct ``torch.distributed.pipelining`` where to split the model. In the following code block,
-we are splitting before the before 4th transformer decoder layer, mirroring the manual split described above. Similarly,
-we can retrieve a ``PipelineStage`` by calling ``build_stage`` after this splitting is done.
+we are splitting before the 4th transformer decoder layer, mirroring the manual split described above. The ``split_spec`` dictionary
+specifies where to split the model by providing the module path (``"layers.4"``) and the split point type (``SplitPoint.BEGINNING``).
+Similarly, we can retrieve a ``PipelineStage`` by calling ``build_stage`` after this splitting is done.
 
 .. code:: python