Seminar series in Statistics and Data Science: Ebad Fardzadeh Haghish

We are pleased to invite you to the next seminar in our traditional Seminar series in Statistics and Data Science.

Speaker: Ebad Fardzadeh Haghish, Postdoc, Department of Psychology, University of Oslo

Title: mlim: Multiple Imputation with Automated Machine Learning

When? Tuesday, 18.10.2022, 14:15-15:15

Where? Erling Svedrups plass and Zoom: https://uio.zoom.us/j/69891599197?pwd=dS9KczhRN1NtTXJ5YUFSODl4VmQ3dz09

Abstract

Supervised and unsupervised machine learning algorithms have commonly been used to perform single imputation, replacing missing data with the most plausible values. However, there has not yet been any attempt to use automated machine learning to fine-tune an imputation model and to address some of the practical challenges of working with factor data. In this presentation, I will introduce mlim, an R package that can use a handful of machine learning algorithms to perform single or bootstrapping-based multiple imputation. mlim supports Elastic Net, Random Forest, Gradient Boosting Machine, Extreme Gradient Boosting, Deep Learning, and Stacked Ensemble algorithms. I will discuss the pros and cons of using these state-of-the-art machine learning algorithms for missing data imputation. Moreover, I will compare the algorithm of mlim with other well-known R packages such as missForest to see whether 1) fine-tuning a model for imputing each variable (feature) and 2) automatically balancing factor variables that suffer from class imbalance can lead to lower imputation error and fairer imputations. mlim is already available on CRAN, but a much more recent version of the package is available on GitHub (https://github.com/haghish/mlim).

Welcome!

Best regards,

Sven Ove Samuelsen & Aliaksandr Hubin
