Speaker: Lara Lusa, University of Primorska (SLO)
Location: Erling Sverdrups plass, Niels Henrik Abels hus, 8th floor
Title: On biases of (penalized) logistic regression when predicting rare events
Abstract: Logistic regression is one of the most commonly used statistical methods for estimating prognostic models that relate a binary outcome (with levels event and non-event) to a number of explanatory variables. A low event proportion, frequently encountered in clinical and epidemiological studies, causes events and non-events to be treated unequally in terms of their respective predictive accuracies (rare events bias). It is well known that maximum likelihood estimates of the regression coefficients in the logistic regression model are biased (small-sample bias), and that this bias is amplified when the sample size and the proportion of events are smaller. We explain that the rare events bias is not a consequence of the small-sample bias, which explains why bias-corrected estimates, such as those obtained with Firth's bias correction, cannot remove the rare events bias. We provide an explanation of the rare events bias using simulated examples as well as some theoretical results. The rare events bias is explained for maximum likelihood estimation and for penalized likelihood estimation with some common penalty functions. We also explain why the intuitive solution of weighting the samples amplifies the rare events bias, while under-sampling the non-events is effective at removing it.
Contact Information:
Riccardo De Bin – debin@math.uio.no
Riccardo Parviero – riccarpa@math.uio.no