Comparing scripts generated by OpenAI o3 and DeepSeek R1
R1's first output included two syntax errors: it omitted the closing ")" in two function declarations. It also had a data type issue: it treated the v1, v2 values as a list rather than a tensor, so the call to .to(device) failed. The first re-prompt fixed this.
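The list-versus-tensor failure can be sketched without the generated code itself. A Python list has no .to() method, so calling .to(device) on a list of views raises AttributeError. The helper below is a minimal illustration, assuming the two views arrived as a list or tuple; the name to_device is hypothetical and is not taken from either generated script, and this is not necessarily how the re-prompted R1 script fixed it.

```python
def to_device(batch, device):
    """Move a tensor, or a list/tuple of tensors, to `device`.

    Hypothetical helper: it relies only on each leaf object having a
    .to(device) method, as PyTorch tensors do.
    """
    if isinstance(batch, (list, tuple)):
        # Lists and tuples have no .to() method, so move each element instead.
        moved = [to_device(item, device) for item in batch]
        return tuple(moved) if isinstance(batch, tuple) else moved
    # Single tensor-like object: delegate to its own .to() method.
    return batch.to(device)
```

With PyTorch tensors, v1, v2 = to_device([v1, v2], device) would move both views, whereas [v1, v2].to(device) fails because the list itself is not a tensor.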
The o3 version worked out of the box with zero modifications.
The DeepSeek version appears to consume more power than the o3 version. Will evaluate total runtime for each, as well as the backbone model used.
byol_deepseek took 104 minutes; byol_o3 took 115 minutes.
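One way to collect total-runtime figures like these is a wall-clock timer around each run. The sketch below assumes the scripts are saved as byol_deepseek.py and byol_o3.py (the .py file names are an assumption) and that each can be launched with the current Python interpreter; the function names are illustrative only.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run `cmd` to completion and return its wall-clock duration in minutes."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the run fails
    return (time.perf_counter() - start) / 60.0


def time_scripts(scripts):
    """Time each script in turn and return a {script: minutes} mapping."""
    return {script: time_command([sys.executable, script]) for script in scripts}


def percent_slower(slow_minutes, fast_minutes):
    """Return how much longer the slower run took, as a percentage of the faster."""
    return (slow_minutes - fast_minutes) / fast_minutes * 100.0
```

Calling time_scripts(["byol_deepseek.py", "byol_o3.py"]) would run both scripts back to back. For the runtimes recorded above, percent_slower(115, 104) is about 10.6, so byol_o3 ran roughly 10.6% longer than byol_deepseek.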