This repository accompanies the paper "Machine Learning-Enabled Large-Scale Capacity Expansion Planning under Uncertainty" and contains the official implementation of the models and experiments.
```
codes/
├── Data handler/
├── Datalog/
├── Experiments/
├── models/
├── scripts/
├── src/
├── requirements.txt
└── README.md
```
- `Data handler/` - Contains dataset files for the EMPIRE model, including input data and scenario files.
- `Datalog/` - Stores aggregated results used in the paper.
- `Experiments/` - Contains baseline models and solution validation scripts:
  - `Data handler/` - Dataset files for the EMPIRE model
  - `Datalog/` - Validation results and runtime logs
  - `parameter_convergence/` - Parameter selection algorithm testing code and result plots
  - `MLEMBEDSOLS_{adaptive, fixed}/` - AutoSCEP's solutions (adaptive) and fixed-parameter solutions
  - `sol_sets/` - Baseline solutions
  - `scripts/` - Shell scripts:
    - `main_ef.sh` - Extensive Form baseline
    - `main_bm.sh` - Benders Decomposition & Progressive Hedging baselines
    - `sol_valid_ML.sh` - Solution validation for the ML surrogate method
    - `sol_valid.sh` - Solution validation for the baselines
  - `src/` - Relevant Python code
- `models/` - Stores trained machine learning models and checkpoints.
- `scripts/` - Shell scripts for automated workflow execution:
  - `job.sh` - Main job submission script
  - `sampling_script.sh` - Data sampling script
  - `ml_train.sh` - Model training script
  - `embedding.sh` - Embedding execution script
  - `worker_script.sh` - Worker script for parallel processing
  - `wrapper.sh` - Wrapper script for coordination
- `src/` - Python source code and modules:
  - `config_run.yaml` - Configuration file
  - `data_preprocessing.py` - Data preprocessing module
  - `label_generation_adaptive.py` - Adaptive label generation
  - `label_generation_fixed.py` - Fixed label generation
  - `ml_embedding.py` - ML model embedding
  - `ml_train.py` - Model training
  - `NEUREMPIRE.py` - Main EMPIRE model implementation
  - `reader.py` - Data reader utilities
  - `run.py` - Main execution script
  - `sampling.py` - Sampling utilities
  - `scenario_random.py` - Random scenario generation
  - `second_stage_label.py` - Second stage labeling
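The automated workflow described below drives these modules through `scripts/job.sh`. For orientation only, a hypothetical manual entry point might look like the following sketch; whether `run.py` accepts the configuration file as a command-line argument is an assumption, so treat this purely as an illustration of how the pieces fit together.

```bash
# Hypothetical direct invocation; the supported route is the automated pipeline via scripts/job.sh.
# Passing config_run.yaml on the command line is an assumption about run.py's interface.
cd codes/src
python run.py config_run.yaml
```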
- Clone the repository:
```bash
git clone https://github.com/Subramanyam-Lab/NeurMHSP.git
cd NeurMHSP
cd codes
```

- Create a virtual environment:

```bash
conda create -n myenv python=3.11
conda activate myenv
conda install pip
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

All code is optimized for use on a High-Performance Computing (HPC) cluster. Due to the large size of the EMPIRE model and its dataset, significant computational resources are required. Therefore, running this code on a local laptop or desktop is not recommended.
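All job steps below are submitted through Slurm with `sbatch`. As a point of reference, a minimal batch script has roughly the shape sketched here; the job name, resource requests, and environment activation are illustrative placeholders, and the scripts under `scripts/` contain the actual directives used in our experiments.

```bash
#!/bin/bash
#SBATCH --job-name=neurmhsp      # illustrative; the provided scripts set their own names
#SBATCH --nodes=1                # placeholder; real runs may need substantially more resources
#SBATCH --time=24:00:00          # placeholder wall-clock limit

# Activate the environment created above (the exact activation command depends on the cluster's conda setup).
source activate myenv

# The provided scripts, e.g. scripts/job.sh, wrap the actual workflow commands.
```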
The entire workflow, from initial data generation to solving the final problem, is fully automated. Submitting a single job script will automatically execute the following sequence:
- Sampling: Generates the initial data samples.
- Labeling: Processes and labels the generated samples.
- Preprocessing: Cleans and prepares the data for model training.
- Model Training: Trains the machine learning surrogate model.
- Embedding & Solving: Embeds the trained model into the optimization problem and solves it.
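For orientation, the sketch below shows roughly how these stages correspond to the modules in `src/`. It is illustrative only: command-line arguments are omitted and the modules may not be intended to be run by hand; `scripts/job.sh` is the supported way to execute the sequence.

```bash
# Illustrative stage-to-module mapping; scripts/job.sh chains these steps automatically.
TOTAL_FILES=100                              # number of samples (the real value is set in scripts/job.sh)

python src/sampling.py                       # 1. Sampling
python src/label_generation_adaptive.py      # 2. Labeling (adaptive; label_generation_fixed.py is the fixed variant)
python src/data_preprocessing.py             # 3. Preprocessing
python src/ml_train.py                       # 4. Model training
python src/ml_embedding.py                   # 5. Embedding the surrogate and solving
```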
To run this complete pipeline, you only need to specify the desired number of samples and submit the main job script.
- Open `scripts/job.sh` and set the `TOTAL_FILES` variable to the number of samples you want.
- Submit the job by running the following command in your terminal:
```bash
cd scripts
sbatch job.sh
```

This section outlines how to validate the solution from the machine learning model and how to run the baseline optimization models for comparison.
To validate the feasibility and cost of the solution obtained from the ML-driven approach, follow these steps:
- Navigate to the `Experiments` directory.
- Submit the validation job script.
```bash
cd Experiments
sbatch sol_valid_ML.sh
```

The baseline models are implemented using the mpi-sppy library, which is also optimized for HPC environments. If you wish to use this library, please follow the instructions provided in its official documentation.
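If you need to set up mpi-sppy yourself, a typical source-based installation looks roughly like the following; this is a sketch only, and the official mpi-sppy documentation remains the authoritative installation guide (a working MPI library is assumed to be available on the cluster).

```bash
# Illustrative install-from-source pattern; defer to the official mpi-sppy documentation.
git clone https://github.com/Pyomo/mpi-sppy.git
cd mpi-sppy
pip install -e .       # install mpi-sppy into the active conda environment
pip install mpi4py     # Python MPI bindings, assuming an MPI library is available
```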
Once your environment is configured, you can run the baseline models as follows:
For the Extensive Form (EF) baseline, submit the job using this script:

```bash
sbatch main_ef.sh
```

For the Benders Decomposition (BD) and Progressive Hedging (PH) baselines, submit the job using this script:

```bash
sbatch main_bm.sh
```

Note: The BD and PH implementations use multiple nodes for parallel computing. Therefore, you must ensure that the number of nodes requested in the Slurm script (`--nodes`) matches the number of scenarios you are running.
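For example, a run over 10 scenarios would need a node request along these lines in the baseline Slurm script; the values are illustrative, and `main_bm.sh` contains the actual directives and launch command.

```bash
#SBATCH --nodes=10             # must match the number of scenarios being solved
#SBATCH --ntasks-per-node=1    # illustrative task layout; see main_bm.sh for the real one

# BD/PH runs built on mpi-sppy are launched through MPI, e.g. (driver name is a placeholder):
# mpiexec -np 10 python <baseline_driver>.py
```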
This project builds upon the EMPIRE model developed by Dr. Stian Backe at NTNU (Norwegian University of Science and Technology). We gratefully acknowledge their work, which forms the foundation of our capacity expansion planning framework.
EMPIRE Resources:
- Initial code release
- Reproducible experiments
- Complete documentation