This repository contains the code and analysis scripts accompanying the following preprint:
Giebeler, L., Krishnaswamy, D., Clunie, D., Wasserthal, J., Sundar, L. K. S., Diaz-Pinto, A., Maier-Hein, K. H., Xu, M., Menze, B., Pieper, S., Kikinis, R. & Fedorov, A. In search of truth: Evaluating concordance of AI-based anatomy segmentation models. arXiv [eess.IV] (2025). doi:10.48550/arXiv.2512.15921
The scripts implement a practical framework for harmonizing, comparing, and visually inspecting anatomy segmentation results from multiple AI models in the absence of ground-truth annotations. They aim to support informed model selection and transparent evaluation of AI-based segmentation methods.
Preprint: https://doi.org/10.48550/arXiv.2512.15921
Interactive plots: https://imagingdatacommons.github.io/segmentation-comparison/
Public data: https://doi.org/10.5281/zenodo.17860591
3D Slicer CrossSegmentationExplorer Extension: https://github.com/ImagingDataCommons/CrossSegmentationExplorer
This repository is organized into several folders, each corresponding to a specific stage of the workflow:
- Segmentation Results Harmonization: Scripts for harmonizing model-specific segmentation outputs into a standardized representation.
- Quantitative Evaluation using Dice Score: Script to compute Dice scores and consensus segmentations.
- Convert Consensus to DICOM: Scripts to convert the consensus segmentations to DICOM using dcmqi.
- Quantitative Evaluation using Volume: Radiomics-based extraction of structure volumes.
- Visualization of Model Agreement: Scripts for generating interactive Dice and volume plots using Plotly and OHIF Viewer.
- docs/: Contains the static files used to deploy the interactive plots website. This folder exists due to GitHub Pages requirements and can be ignored for code reuse.
Once segmentation results from different models are available, they should first be harmonized into a standard representation.
Example scripts for the conversion to DICOM using the harmonized metadata are provided in the Segmentation Results Harmonization folder.
For details on the underlying conversion workflow and parameters, please refer to the dcmqi documentation: https://github.com/QIICR/dcmqi
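As a rough illustration of what harmonizing label maps can involve, the sketch below remaps model-specific integer label IDs onto one shared ID scheme. It is not the repository's implementation: the model names, label tables, and the `harmonize` helper are hypothetical, and the actual scripts may instead operate on segment metadata during DICOM conversion.

```python
import numpy as np

# Hypothetical label tables: each model uses its own integer IDs.
MODEL_LABELS = {
    "model_a": {1: "liver", 2: "spleen"},
    "model_b": {5: "spleen", 7: "liver"},
}
# Hypothetical shared scheme that every model is mapped onto.
SHARED_IDS = {"liver": 1, "spleen": 2}


def harmonize(labelmap, model_labels, shared_ids):
    """Remap a model-specific label map to the shared ID scheme.

    Labels without a counterpart in the shared scheme become background (0).
    """
    labelmap = np.asarray(labelmap)
    out = np.zeros_like(labelmap)
    for model_id, name in model_labels.items():
        if name in shared_ids:
            out[labelmap == model_id] = shared_ids[name]
    return out
```

After this step, the same integer refers to the same structure in every model's output, which is what makes voxel-wise comparison across models meaningful.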
The Dice score script computes pairwise Dice scores between segmentations and generates consensus segmentations, which are stored as NIfTI files.
This step should be performed before radiomics analysis, since the consensus segmentation is required as input.
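The two computations in this step can be sketched as follows for a single structure. This is a minimal sketch, not the repository's script: it assumes binary masks already resampled to a common grid, the model names are hypothetical, and the strict majority vote is an assumed consensus rule that may differ from the one used in the paper. File reading and NIfTI export are omitted.

```python
from itertools import combinations

import numpy as np


def dice(a, b):
    """Dice score 2|A∩B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # both masks empty: Dice is undefined
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def pairwise_dice(masks):
    """Dice score for every unordered pair of models in a name -> mask dict."""
    return {
        (m, n): dice(masks[m], masks[n])
        for m, n in combinations(sorted(masks), 2)
    }


def majority_consensus(masks):
    """Voxels labeled by strictly more than half of the models (assumed rule)."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks.values()])
    return stack.sum(axis=0) * 2 > stack.shape[0]
```

For example, with three hypothetical models:

```python
masks = {
    "model_a": [1, 1, 0, 0],
    "model_b": [1, 0, 0, 0],
    "model_c": [1, 1, 1, 0],
}
scores = pairwise_dice(masks)
consensus = majority_consensus(masks)
```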
After Dice computation and consensus generation:
- Convert the consensus segmentation to DICOM using the provided scripts in the Convert Consensus to DICOM folder
- Run the calculate_radiomics.py script, located in the Quantitative Evaluation using Volume folder, to extract radiomic features, including volume.
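As a sanity check on extracted volumes, a voxel-counting estimate can be computed directly from a binary mask and the voxel spacing. The sketch below is an assumption-labeled illustration, not what calculate_radiomics.py does: the `voxel_volume_ml` helper is hypothetical, and a radiomics library may report a mesh-based volume that differs slightly from this count-based value.

```python
import numpy as np


def voxel_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in millilitres, by counting voxels.

    spacing_mm holds the voxel size along each axis in millimetres.
    """
    n_voxels = int(np.count_nonzero(mask))
    voxel_mm3 = float(np.prod(spacing_mm))
    return n_voxels * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3
```

For instance, a mask of 8 voxels with spacing (1.0, 1.0, 2.5) mm covers 20 mm^3, which is 0.02 mL.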
Quantitative results can be visualized using the scripts in the Visualization of Model Agreement folder. There are two types of interactive plots:
- Interactive Dice score plots
- Interactive Volume plots
Clicking on a data point in the interactive plots opens an OHIF Viewer window with the corresponding CT series and all associated segmentations. Within OHIF, the layout can be adjusted using the toolbar at the top, and segmentations can be assigned to views via drag-and-drop.
Further details on OHIF usage can be found in the official documentation:
https://docs.ohif.org/
The interactive plots for the paper are publicly available at:
https://imagingdatacommons.github.io/segmentation-comparison/
For qualitative comparison, we developed a dedicated 3D Slicer extension that streamlines loading and inspection of harmonized segmentations across models.
The extension is available here: https://github.com/ImagingDataCommons/CrossSegmentationExplorer
All data used in this study is publicly available on Zenodo:
https://doi.org/10.5281/zenodo.17860591
Users can reuse the scripts in this repository with their own data by following the same harmonization, conversion, analysis, and visualization steps described above.