Dear all,
We are pleased to invite you to our seminar in Statistics and Data Science on Thursday 7.11.
Speaker: Thomas Nagler
Title: The surprising effect of reshuffling the data during hyperparameter tuning
When? Thursday 07.11.2024, 15:15-16:15
Where? Erling Sverdrups plass and Zoom: https://uio.zoom.us/j/66903066148?pwd=tZseXpVvMvwbdURnGkYpLqBSWbyaGE.1
Abstract: Tuning parameter selection is crucial for optimizing the predictive power of statistical and machine learning models. The standard protocol evaluates various parameter configurations using a resampling estimate of the generalization error to guide optimization and select a final parameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model's generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment.
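For attendees who would like a concrete picture of the contrast the abstract describes, below is a minimal illustrative sketch in Python using scikit-learn. It is not the speaker's code: the dataset, model, and candidate grid are placeholder choices, and it only contrasts the two protocols (one fixed cross-validation scheme for all configurations versus a freshly reshuffled scheme per configuration).

# Illustrative sketch only; not from the paper. Placeholder data/model/grid.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
grid = [0.01, 0.1, 1.0, 10.0]  # candidate regularization strengths

# Standard protocol: the same fixed CV splits for every configuration.
fixed_cv = KFold(n_splits=5, shuffle=True, random_state=0)
fixed_scores = {
    C: cross_val_score(LogisticRegression(C=C), X, y, cv=fixed_cv).mean()
    for C in grid
}

# Reshuffled protocol: a fresh random CV scheme for each configuration.
rng = np.random.RandomState(1)
reshuffled_scores = {
    C: cross_val_score(
        LogisticRegression(C=C), X, y,
        cv=KFold(n_splits=5, shuffle=True, random_state=rng.randint(2**31)),
    ).mean()
    for C in grid
}

print("fixed splits pick C =", max(fixed_scores, key=fixed_scores.get))
print("reshuffled splits pick C =", max(reshuffled_scores, key=reshuffled_scores.get))

The talk examines when the second protocol, despite its noisier per-configuration estimates, yields a final model that generalizes better.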
Welcome!
Best regards,
Thordis Linda Thorarinsdottir & Aliaksandr Hubin