Nov. 19, 14:15

Seminar in Statistics and Data Science: Leiv Rønneberg

We are pleased to invite you to our next Tuesday seminar in Statistics and Data Science

Speaker:  Leiv Rønneberg

Title:  High-dimensional function-on-scalar regression via multi-output Gaussian Processes: Biomarker discovery in high-throughput cancer screens

When? Tuesday 19.11.2024, 14:15-15:15

Where?  Erling Sverdrups plass (lunch area 8th floor, NHA hus) and Zoom https://uio.zoom.us/j/65480900265?pwd=mKtbGGpPO6i0QzaSl0e8pUTD7yUP6i.1

Abstract
High-throughput drug sensitivity screens enable rapid in-vitro testing of compounds on cancer cell lines, in order to determine the efficacy of a certain treatment. Coupled with various “omics” characterisations of the cancer cell lines, these experiments provide the ingredients necessary to discover biomarkers of treatment effect. However, in-vitro cell viability measurements are frequently corrupted by measurement error due to technical error sources and natural biological variability, making the precise quantification of treatment efficacy difficult. Furthermore, the process of biomarker discovery is usually completely disentangled from the noisy reality of the raw viability measurements, utilising crude summary measures of treatment efficacy, estimated with no uncertainty quantification.

In this talk, I’ll present a methodology to jointly estimate non-linear dose-response functions, while at the same time performing biomarker discovery. The model is based on a multi-output Gaussian Process (MOGP), coupled with a horseshoe prior within the kernel construction for variable selection. I further establish a link between this MOGP model and a version of function-on-scalar (FoS) regression, highlight potential computational gains, and show modifications that incorporate patient-specific random effects and complex interactions between high-dimensional covariates.

Welcome!
Best regards,
Thordis Linda Thorarinsdottir & Aliaksandr Hubin


Nov. 7, 15:15

Seminar in Statistics and Data Science: Thomas Nagler

Dear all, 
We are pleased to invite you to our seminar in Statistics and Data Science on Thursday 7.11

Speaker:  Thomas Nagler 

Title:  The surprising effect of reshuffling the data during hyperparameter tuning

When? Thursday 07.11.2024, 15:15-16:15

Where?  Erling Sverdrups plass and Zoom: https://uio.zoom.us/j/66903066148?pwd=tZseXpVvMvwbdURnGkYpLqBSWbyaGE.1

Abstract: Tuning parameter selection is crucial for optimizing predictive power of statistical and machine learning models. The standard protocol evaluates various parameter configurations using a resampling estimate of the generalization error to guide optimization and select a final parameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model's generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment.

Welcome!
Best regards,
Thordis Linda Thorarinsdottir & Aliaksandr Hubin

Nov. 5, 14:15

Seminar in Statistics and Data Science: Håkon Andreas Hoel

Dear all, 
We are pleased to invite you to our next Tuesday seminar in Statistics and Data Science

Speaker:  Håkon Andreas Hoel

Title:  Overview of the Multilevel Monte Carlo method

When? Tuesday 05.11.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom https://uio.zoom.us/j/67248449383?pwd=pGfAYMKpMlMaxfaJFbTbSOU5qTJfDt.1

Abstract: The multilevel Monte Carlo (MLMC) method is a relatively recent method that offers a compelling approach to tackling computational challenges in approximations of quantities of interest. This talk presents the fundamentals of the method and describes its advantages over traditional Monte Carlo techniques in settings with complicated distributions, whose samples have to be approximated by numerical methods. We also present different applications of MLMC: solving stochastic differential equations, state estimation in data assimilation, solving partial differential equations with random input, and sampling a posterior measure in a high-dimensional state space by multilevel Markov chain Monte Carlo simulations.

Welcome!
Best regards,
Thordis Linda Thorarinsdottir & Aliaksandr Hubin


Oct. 29, 14:15

Seminar in Statistics and Data Science: Mikko Kuronen

We are pleased to invite you to our next Tuesday seminar in Statistics and Data Science

Speaker:  Mikko Kuronen, Research scientist/Postdoc, Natural Resources Institute Finland (Luke), Helsinki, Finland 

Title:  Point process models for sweat gland activation observed with noise

When? Tuesday 29.10.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom https://uio.zoom.us/j/65226718358?pwd=FoOHkzOz321yFbJSDxFztrRKgTBTaf.1

Abstract
The aim of this work is to construct spatial models for the activation of sweat glands for healthy subjects and subjects suffering from peripheral neuropathy by using videos of sweating recorded from the subjects. The sweat patterns are regarded as realizations of spatial point processes and two point process models for the sweat gland activation and two methods for inference are proposed. Several image analysis steps are needed to extract the point patterns from the videos and some incorrectly identified sweat gland locations may be present in the data. To take into account the errors, we either include an error term in the point process model or use an estimation procedure that is robust with respect to the errors.

Welcome!
Best regards,
Thordis Linda Thorarinsdottir & Aliaksandr Hubin


Oct. 8, 14:15

Seminar series in Statistics and Data Science: Camilla Lingjærde

We are extremely pleased to invite you to the first Tuesday seminar this semester in our Seminar series in Statistics and Data Science

Speaker:  Camilla Lingjærde, Postdoctoral Fellow at The Norwegian Centre for Knowledge-driven Machine Learning Integreat,  University of Oslo

Title:  Covariate-dependent network modelling with the horseshoe prior

When? Tuesday 08.10.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom https://uio.zoom.us/j/62402209920?pwd=YisOD0lynaNtSG1yotZmuVqwEa5vv5.1

Abstract: Network models are useful tools for modelling complex associations in a vast number of areas, including statistical omics. If a Gaussian graphical model with independent and identically distributed samples is assumed, an association network can be estimated by determining the non-zero entries of the inverse covariance (precision) matrix of the data. This independence assumption is however violated in many applications. For example, observations from multivariate time series data will not be independent due to temporal correlations. Similarly, spatial data will exhibit spatial correlations. Additionally, the identical distribution assumption will often not hold due to heterogeneity in the data. This is a common problem in applications such as cancer omics, where large tumour heterogeneity among patients with the same cancer type is common due to various genetic and genomic factors.

In this talk, I will present recent advances and challenges in our work with modelling covariate-dependent Gaussian graphical models through kernel smoothing and the graphical horseshoe prior. This work is a collaboration with Prof. Sylvia Richardson and Dr. Hélène Ruffieux.  

Welcome! 

Best regards,
Thordis Linda Thorarinsdottir & Aliaksandr Hubin


Sep. 12, 08:45

1st Oslo Invitational Workshop on Model-Agnostic Explainable AI

Dear all BigInsighters  

On Thursday, September 12th, Integreat and BigInsight are organizing the 1st Oslo Invitational Workshop on Model-Agnostic Explainable AI at Kristine Bonnevies hus, Blindern, Oslo.

Top national and international researchers in the field are on the speaker list, and we are looking forward to a day full of great presentations, fruitful discussions and knowledge sharing.

The workshop will have a methodology session aimed at researchers, and an application session aimed at practitioners. It is possible to attend both or just one of the two. 

The registration form is now closed, but you can send an email to Martin Jullum, jullum@nr.no, if you want to participate.

On behalf of the organizing team,
Martin Jullum
Senior Research Scientist
Norwegian Computing Center
jullum@nr.no 

June 27, 14:30

OCBE Biostatistics Seminar: Alan Hubbard

The Biostatistics seminar on Thursday, June 27th, at 14:30-15:30 is in Runde Auditorium (Domus Medica, Oslo).

Alan Hubbard, professor of biostatistics at the University of California, Berkeley, will talk about

An overview of Targeted Maximum Likelihood Estimation and its augmentations

Abstract: Targeted Maximum Likelihood Estimation (TMLE), proposed in 2006, is a general framework for developing efficient asymptotically linear estimators for general parameters in large (semi-parametric) statistical models. It originally provided an alternative to estimating equation (EE) approaches (such as inverse probability of treatment weighted, IPTW, and augmented IPTW estimators) with some finite sample benefits and provided estimation for parameters/models where EE estimators are not available or do not lead to single solutions. TMLE relies on initial fits of the relevant components of the data-generating distribution, and the emphasis has been on machine learners with explicit optimality properties (i.e., the SuperLearner). One clear advantage of TMLE is its likelihood-based model selection, which has led to ad hoc but very powerful finite-sample augmentations that lead to more robust “machine-like” performance, such as collaborative TMLE (C-TMLE). Another important augmentation is cross-validated TMLE (CV-TMLE), which provides additional protection against overfitting and, relatedly, fewer constraints on the underlying model and fitting algorithms (i.e., Donsker class constraints). Most of the specific algorithms available in the software (mainly R) are for estimating specific estimands inspired by causal inference (e.g., the SuperLearner/tmle packages on CRAN, the sl3/tmle3 packages on GitHub). Now, a wide variety of algorithms are available for many more complex parameters (longitudinal treatment effects, mediation effects, optimal treatment rules, etc.). TMLE and the general Targeted Learning framework are an active area of research, with new improvements to both the asymptotic and finite sample behavior being proposed regularly. In this talk, I will provide a general background and highlight some of these augmentations and future directions.

The OCBE biostatistics seminar series is organized by the Oslo Centre for Biostatistics and Epidemiology and is an arena for presenting new interesting work in biostatistics, both on the methodological and applied side, from researchers in Oslo, from Norway and abroad. The seminars are open to everyone.

See https://www.med.uio.no/imb/english/research/centres/ocbe/events/biostat-seminar for more details (including subscription to future announcements).

Welcome!

Best wishes,
Jon Michael

June 13, 08:45

Seminar on machine learning and collaboration on anti-money laundering in the banking sector

Hi all BigInsighters!

For a number of years, BigInsight has worked on money laundering detection in collaboration with DNB. Building on this work, we invite you to a full-day seminar on how banks can and should use data-driven solutions in their anti-money laundering work, on Thursday, June 13.

The seminar will also challenge the siloed approach in which each bank builds solutions based exclusively on its own data.

DNB, Sparebank1 SMN and BN Bank will share experiences from their data-driven anti-money laundering work.

Denmark has come considerably further than Norway when it comes to collaboration on anti-money laundering, and the Danish Financial Supervisory Authority will be present to share its experiences.

Finterai will also talk about what they learned about federated learning for anti-money laundering through the sandbox of the Norwegian Data Protection Authority (Datatilsynet).

We will also organize group discussions on how to collaborate and jointly shape the anti-money laundering system of the future.

The seminar will be held in Norwegian, is free and open to everyone, but is aimed especially at those who work with anti-money laundering in the banking sector on a daily basis.

More information, the full program and registration are available on the event website https://nr.no/arrangementer/seminar-om-maskinlaering-og-samarbeid-om-hvitvasking-i-banksektoren/

The seminar is supported by Finansmarkedsfondet.

Does this sound interesting? Sign up today!

Kind regards,
Martin Jullum and Kjersti Aas
Norsk Regnesentral

June 7, 14:15

Seminar Series in Statistics and Data Science: Sylvia Frühwirth-Schnatter

We are extremely pleased to invite you to the final seminar this semester in our Seminar series in Statistics and Data Science

Speaker:  Sylvia Frühwirth-Schnatter, Professor, Vienna University of Economics and Business

Title:  Sparse Finite Bayesian Factor Analysis when the Number of Factors is Unknown

When? Friday 07.06.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom https://uio.zoom.us/j/68591073814?pwd=TeP0Ew0rieJIV8iXebZXgjmjk5tD4y.1

Abstract
Factor analysis is a popular method to obtain a sparse representation of the covariance matrix of multivariate observations and to uncover the unobserved driving factors behind the observed correlation. However, it is challenging to estimate the unknown number of factors and to recover the factor loading matrix from the data. The present talk reviews recent research in the area of sparse Bayesian factor analysis (BFA) that successfully addresses these issues within a Bayesian framework:

(a) the approach relies on the choice of well-calibrated, highly structured priors. Finite and infinite cumulative shrinkage process (CUSP) priors play a crucial role in recovering the number of factors, while elementwise spike-and-slab priors make it possible to reveal the finer structure of the factor loading matrix (Frühwirth-Schnatter, 2023);

(b) to achieve full identification of the factor model, the approach operates in the class of generalized lower triangular (GLT) factor models, which generalizes the common way of resolving rotational invariance and addresses variance identification through a counting rule (Frühwirth-Schnatter, Hosszejni and Lopes, 2023);

(c) fitting models to data under these priors requires efficient algorithms to sample from the full posterior distribution and a reversible jump MCMC sampler is discussed that moves across models of different dimensions (Frühwirth-Schnatter, Hosszejni and Lopes, 2024).

Applications to financial time series will serve as an illustration.

References:
Sylvia Frühwirth-Schnatter (2023): Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis, Philosophical Transactions of the Royal Society A, 381: 20220148. DOI:10.1098/rsta.2022.0148.

Sylvia Frühwirth-Schnatter, Darjus Hosszejni and Hedibert F. Lopes (2023): When it counts - Econometric Identification of Factor Models Based on GLT Structures, Econometrics, 11 (4), 26. DOI: 10.3390/econometrics11040026.

Sylvia Frühwirth-Schnatter, Darjus Hosszejni and Hedibert F. Lopes (2024): Sparse finite Bayesian Factor Analysis when the Number of Factors is Unknown, Bayesian Analysis, accepted for publication. 

Welcome!

Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

May 30, 14:30

OCBE Biostatistics Seminar: Sonia Petrone

OCBE Biostatistics Seminar with Sonia Petrone, Professor, Department of Decision Sciences, Bocconi University of Milano, Italy, who will speak about:

Empirical Bayes in Bayesian learning: understanding a common practice

Time and place: May 30, 2024 14.30 – 15.30, Domus Medica, Auditorium 13

Abstract:
In applications of Bayesian procedures, even when the prior law is carefully specified, it may be delicate to elicit the prior hyperparameters, so it is often tempting to fix them from the data, usually by their maximum marginal likelihood estimates (MMLE), obtaining a so-called empirical Bayes posterior distribution. Although questionable, this is a common practice, often thought of as a computationally convenient approximation of a genuine Bayesian procedure. However, whether it actually is such an approximation, or what Bayesian inference it would approximate, is unclear, and most theoretical results seem to be available only on a case-by-case basis. In the talk we will discuss this empirical Bayes practice, suggesting a theoretical framework that allows us to give formal content to the above common beliefs and to prove general results for parametric models.

We first establish the limit behavior of the MMLE in quite general settings; we also conceptualize the frequentist context as an unexplored case of maximum likelihood estimation under model misspecification. Finally, we show that, in regular cases, the empirical Bayes posterior is a fast approximation to the "oracle" Bayesian posterior distribution, that corresponds to the prior law that, within the given class, expresses the most information about the true model's parameters. This is a faster approximation than classic Bernstein-von Mises results.

These results assume that the class of priors is given; choosing the class of priors is a wider problem, deeply studied in Bayesian statistics.

This is joint work with Judith Rousseau and Stefano Rizzelli.

Organizer: Oslo Centre for Biostatistics and Epidemiology (OCBE)




May 23, 14:30

OCBE Biostatistics Seminar: Ksenia Sokolova

The next OCBE Biostatistics seminar will be held on Thursday, May 23rd, at 14:30-15:30 in Auditorium 13 (Domus Medica, Oslo).

Ksenia Sokolova, Department of Computer Science, Princeton University (USA), will speak about

Deep learning for sequence based gene expression prediction

Abstract: Human biology is defined by specialized cell types driven by a common genome, 98% of which is outside of genes. This noncoding genetic space is linked to the majority of disease risk but remains poorly understood. In this talk, I will discuss how deep learning can be used to predict the effects of noncoding variants on gene expression in primary human cell types. I will introduce ExPectoSC, an atlas of deep-learning models that predict cell-type-specific gene expression from genomic sequences, covering 105 primary cell types across seven organ systems, and how it can be used in the disease context. Additionally, I will present a novel genomic-centered contrastive pre-training method, cGen, to improve training of the models from sequence alone in limited-data contexts. After pre-training with sequence augmentations, cGen generates unsupervised embeddings that highlight functional clusters and are informative of gene expression in the absence of any labeled information.

The OCBE Biostatistics seminar series is organized by the Oslo Centre for Biostatistics and Epidemiology and is an arena for presenting new interesting work in biostatistics, both on the methodological and applied side, from researchers in Oslo, from Norway and abroad.

See also: https://www.med.uio.no/imb/english/research/centres/ocbe/events/biostat-seminar/ for updates or for subscribing to the events. If you'd like to be added to the "biostat-ext" mailing list, just let us know.

Other upcoming seminars this semester:

May 30th, 2024: Sonia Petrone, Department of Decision Sciences, Bocconi University of Milano, Italy

June 27th, 2024: Alan Hubbard, Center for Targeted Machine Learning and Causal Inference, University of California, Berkeley, USA

Welcome!

Best wishes,
Valeria Vitelli
Associate Professor
Oslo Centre for Biostatistics and Epidemiology,
Department of Biostatistics, University of Oslo, Norway
mailto: valeria.vitelli@medisin.uio.no

May 22, 14:15

Seminar in statistics and data science: Alexis Akira Toda

We are very pleased to invite you to our next seminar in the Seminar series in Statistics and Data Science

Speaker: Alexis Akira Toda, Associate Professor at the University of California San Diego

Title: Recent Advances in the Theory of Power Law and Applications

When? Wednesday 22.05.2024, 14:15-15:30

Where?  Erling Sverdrups plass and Zoom
https://uio.zoom.us/j/66503159220?pwd=alhPVFpHNUxVUTNoeHhIcVFtUWx4UT09 

Abstract: In this talk I will cover the recent advances in the theory of power law distributions, in particular the role of Markov modulation and random stopping emphasized by Beare and Toda (2022), which builds on the Tauberian theorem of Nakagawa (2007). Applications include the emergence of Zipf’s law in Japanese cities and the spread of COVID-19. I will also present open mathematical problems.

• Beare, Brendan K., and Alexis Akira Toda. "Determination of Pareto exponents in economic models driven by Markov multiplicative processes." Econometrica 90.4 (2022): 1811-1833.

• Nakagawa, Kenji. "Application of Tauberian theorem to the exponential decay of the tail probability of a random variable." IEEE Transactions on Information Theory 53.9 (2007): 3239-3249.

• Beare, Brendan K., and Alexis Akira Toda. "On the emergence of a power law in the distribution of COVID-19 cases." Physica D: Nonlinear Phenomena 412 (2020): 132649.

Welcome!

Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

Apr. 23, 14:15

Seminar series in Statistics and Data Science: Knut Rand

We are very pleased to invite you to our next Tuesday seminar in the Seminar series in Statistics and Data Science

Speaker:  Knut Dagestad Rand, Researcher at HISP, Department of Informatics, University of Oslo

Title: Bayesian Time Series Modelling of Climate-Health Data

When? Tuesday 23.04.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom
https://uio.zoom.us/j/66618660796?pwd=bVVOenVLeDFPT25LZ3RLdnRQaC9udz09 

Abstract
Climate and weather can affect disease prevalence in different ways. For instance, humidity and temperature affect the life cycles of mosquitoes, which can greatly influence the prevalence of vector-borne diseases like malaria and dengue. Modelling this relationship is very important, both in the short term for outbreak preparedness, and in the long term, for health systems to adapt to the changing climate. However, this modelling is difficult because of the scarcity of high-quality health data, complexities in spatio-temporal modelling, and the many different domains involved (vector biology, climate, epidemiology).
In this talk I will present our work on building a framework both for developing modularized and adaptable climate-health models, and for rigorously evaluating the utility of these models.

Welcome!
Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

Apr. 9, 14:15

Seminar series in Statistics and Data Science: Steffen Grønneberg

We are extremely pleased to invite you to our postponed Tuesday seminar in the Seminar series in Statistics and Data Science

When? Tuesday 09.04.2024, 14:15-15:15 

Where?  Erling Sverdrups plass and Zoom https://uio.zoom.us/j/62093083083?pwd=MmxrWGEyU0NNMWVrdDJmakx0ckhYdz09

Speaker: Steffen Grønneberg, Professor at BI

Title: Non-parametric regression among factor scores: Motivation and diagnostics for nonlinear structural equation models

Abstract: Structural equation models are simultaneous equation regression models whose variables are latent and measured via a confirmatory factor model (that is, with measurement error and repeated measurements). When the functional form of the simultaneous equation system is unknown, it has previously been observed in simulations that factor scores inputted into non-parametric regression methods approximate the true functional form. Factor scores estimate the latent variables (per person), and several types exist. We provide a theoretical (though population-based) analysis of this procedure, and give assumptions under which it is theoretically justified when Bartlett factor scores, which are simple linear transformations of the data, are used. In simulations, we compare this suggestion to an already available though understudied non-linear and computationally heavy procedure, and observe that the simple Bartlett approach appears to work better.

Welcome!
Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

Steffen Grønneberg is Professor at the Department of Economics at BI (Norwegian Business School), Oslo.

Apr. 4, 14:15

Seminar series in Statistics and Data Science: Ian McKeague

We are very pleased to invite you to our next seminar in the Seminar series in Statistics and Data Science, which will be held on a Thursday this time

Speaker:  Ian McKeague, Professor of biostatistics at Mailman School of Public Health, Columbia University

Title: Empirical likelihood based inference for concurrent functional linear regression with applications to wearable device data

When? Thursday 04.04.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom https://uio.zoom.us/j/67486742080?pwd=VHJPam5BSEc2Wkl6SjlhN2p5WjA2Zz09

Abstract
This talk discusses a nonparametric inference framework for occupation time curves derived from wearable device data. Such curves provide the total time a subject maintains activity above a given level as a function of that level. Taking advantage of the monotonicity and smoothness properties of these curves, we develop a likelihood ratio approach to construct confidence bands for mean occupation time curves.  An extension to fitting concurrent functional regression models is also developed. Application to wearable device data from an ongoing study of an experimental gene therapy for mitochondrial DNA depletion syndrome will be discussed. Based on joint work with Hsin-Wen Chang (Academia Sinica).

Welcome!
Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

Mar. 12, 14:15

Seminar series in Statistics and Data Science: Guilherme Clarindo Marcos

We are very pleased to invite you to our next Tuesday seminar in the Seminar series in Statistics and Data Science

Speaker:  Guilherme Clarindo Marcos, Researcher and Ph.D. candidate in the Marine Environment Group at the Centre for Marine Technology and Ocean Engineering (CENTEC), University of Lisbon, Portugal

Title: Robust estimation and representation of climatic wave spectrum

When? Tuesday 12.03.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom

https://uio.zoom.us/j/68412228703?pwd=Y2FFZDlCSzBZbDZ4Rkw0S2NQWHpTQT09  

Abstract

The climatic ocean wave spectrum serves as a pivotal tool in comprehending the long-term characteristics and variations of wave patterns across different regions of the world's oceans. The presentation explores the methodologies employed to derive wave spectra from observational data. These amount to a statistical approach that provides a quantitative understanding of the variability and extremes of wave conditions. In essence, an ocean wave spectrum is a representation of the distribution of energy among different wave frequencies and wavelengths. Engineers rely on this valuable information to mitigate risks and design solutions that can withstand the dynamic forces of ocean waves. However, such information must be presented in a robust and practical form to make the variations easier to comprehend. To this end, a robust and resistant approach will be presented to characterize this variability, thus reducing uncertainties and representing the climatic wave spectrum in a compact and informative way.

Welcome!
Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin


Feb. 13, 14:15

Seminar series in Statistics and Data Science: Michael Scheuerer

After a short festive break, we are extremely pleased to invite you again to our Tuesday seminar in the Seminar series in Statistics and Data Science

Speaker:  Michael Scheuerer, Senior Research Scientist, Norwegian Computing Center

Title: Decadal inflow projections for catchments in Brazil

When? Tuesday 13.02.2024, 14:15-15:15

Where?  Erling Sverdrups plass and Zoom 

https://uio.zoom.us/j/69724784636?pwd=Z2cyakFuMjJBVFQwcGtZaEJGdHFTQT09  

Abstract
Our project partner Statkraft owns and operates several hydropower plants in Brazil and requires information about the future potential for hydropower production in this region. To provide inflow projections for the next several decades, we use climate model output in combination with a regression model that links meteorological variables such as precipitation and temperature to inflow over various catchments in the region. The relatively short time period for which observation data are available raises concerns about overfitting. We therefore explore an alternative model fitting approach that retains the original, easily interpretable regression model but estimates the regression coefficients within an artificial neural network (ANN) framework which permits spatial and temporal regularization and thus prevents overfitting. We show some examples of the inflow projections obtained with that methodology and discuss some caveats and limitations.

Welcome!
Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

Feb. 8, 14:30

Biostatistics Seminar: Martin Bladt

The next OCBE Biostatistics Seminar will be held on Thursday, February 8th, at 14:30-15:30 in Auditorium 13 (Domus Medica, Oslo).

Martin Bladt, Associate Professor, Department of Mathematical Sciences, University of Copenhagen, will talk about

Conditional Aalen–Johansen estimation

Abstract

Aalen–Johansen estimation targets transition probabilities in multi-state Markov models subject to right-censoring. In particular, it belongs to the standard toolkit of statisticians specializing in health and disability. We introduce for the first time the conditional Aalen–Johansen estimator, a kernel-based estimator that allows for the inclusion of covariates and, importantly, is also applicable in non-Markov models. We establish uniform strong consistency and asymptotic normality under lax regularity conditions; here, the theory of empirical processes plays a central role and leads to a transparent treatment. We also illustrate the practical implications and strength of the estimation methodology.

See also: https://www.med.uio.no/imb/english/research/centres/ocbe/events/biostat-seminar/ for updates or for subscribing to the events. If you'd like to be added to the "biostat-ext" mailing list, just let us know.

Best wishes,
Jon Michael Gran
OCBE

Feb. 1, 14:30

OCBE Biostatistics Seminar: Veronica Vinciotti

The next OCBE Biostatistics Seminar will be held on Thursday, February 1st, at 14:30-15:30 in Auditorium 13 (Domus Medica, Oslo).

Veronica Vinciotti, Associate Professor at the Department of Mathematics, University of Trento (Italy), will speak about

Random graphical model of microbiome interactions in related environments

Abstract: The microbiome constitutes a complex microbial ecology of interacting components that regulates important pathways in the host. Measurements of microbial abundances are key to learning the intricate network of interactions amongst microbes. Microbial communities at various body sites tend to share some overall common structure, while also showing diversity related to the needs of the local environment. In this talk, I will describe a computational approach for the joint inference of microbiota systems from (count) metagenomic data for a number of body sites. The random graphical model (RGM) allows for heterogeneity across the different body sites via environment-specific copula graphical models, while quantifying their relatedness at the structural level via a joint generative model of the graphs. In addition, the model allows for the inclusion of external covariates at both the microbial and interaction levels, further adapting to the richness and complexity of microbiome data. In the last part of the talk, I will show how a similar methodology has been used to study cross-country cultural heterogeneity from (ordinal) survey data.

[Reference: V. Vinciotti, E. Wit, F. Richter. Random graphical model of microbiome interactions in related environments, 2023, https://arxiv.org/abs/2304.01956 ]

The OCBE biostatistics seminar series is organized by the Oslo Centre for Biostatistics and Epidemiology and is an arena for presenting new interesting work in biostatistics, both on the methodological and applied side, from researchers in Oslo, from Norway and abroad.

See also: https://www.med.uio.no/imb/english/research/centres/ocbe/events/biostat-seminar/ for updates or to subscribe to the events. If you'd like to be added to the "biostat-ext" mailing list, just let us know.

Other upcoming seminars this semester:
February 8th, 2024: Martin Bladt, University of Copenhagen, Denmark

Welcome!

Best wishes,
Valeria Vitelli
Associate Professor
Oslo Centre for Biostatistics and Epidemiology,
Department of Biostatistics, University of Oslo, Norway
Email: valeria.vitelli@medisin.uio.no

webpage: http://www.med.uio.no/imb/english/people/aca/valeriv/

View event →
jan.
10.
2:00 p.m. (14:00)

Explaining AI seminar

Speaker: Jiachen (Tianhao) Wang (Princeton University)

Location: Microsoft Teams

Title: Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

Abstract: Data valuation aims to quantify the usefulness of individual data sources in training machine learning (ML) models, and is a critical aspect of data-centric ML research. Despite its importance, data valuation faces significant yet frequently overlooked privacy challenges. This paper studies these challenges with a focus on KNN-Shapley, one of the most practical data valuation methods in current use. We first emphasize the inherent privacy risks of KNN-Shapley, and demonstrate the significant technical difficulties in adapting KNN-Shapley to accommodate differential privacy (DP). To overcome these challenges, we introduce TKNN-Shapley, a refined variant of KNN-Shapley that is privacy-friendly, allowing for straightforward modifications to incorporate a DP guarantee (DP-TKNN-Shapley). We show that DP-TKNN-Shapley has several advantages and offers a superior privacy-utility tradeoff compared to naively privatized KNN-Shapley in discerning data quality. Moreover, even non-private TKNN-Shapley achieves performance comparable to KNN-Shapley. Overall, our findings suggest that TKNN-Shapley is a promising alternative to KNN-Shapley, particularly for real-world applications involving sensitive data.
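
For readers new to the topic, the following is a minimal sketch of the baseline the abstract starts from: the closed-form recursion for KNN-Shapley with an unweighted KNN classifier and a single test point. It is not the TKNN-Shapley variant presented in the talk, and it assumes a utility equal to the share of matching labels among the K nearest neighbours (divided by K). The function name and the toy data are illustrative only.

```python
import numpy as np

def knn_shapley_single(x_train, y_train, x_test, y_test, k):
    """Shapley value of each training point for one test point (unweighted KNN)."""
    n = len(y_train)
    # Sort training points from nearest to farthest from the test point.
    dist = np.linalg.norm(x_train - x_test, axis=1)
    order = np.argsort(dist, kind="stable")
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(n)
    # Start at the farthest point, then recurse towards the nearest one.
    s[n - 1] = match[n - 1] / n
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-based rank of the point at sorted position i
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / k * min(k, rank) / rank

    values = np.zeros(n)
    values[order] = s  # map back to the original training order
    return values

# Toy data (hypothetical): three 1-D training points, one test point.
x_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([1, 0, 1])
values = knn_shapley_single(x_train, y_train, np.array([0.1]), 1, k=2)
print(values)  # approximately [0.333, -0.167, 0.333]
```

The values sum to 0.5, the utility of the full training set here (one of the two nearest neighbours has the test label). Because the recursion only needs a sort, a single test point costs O(N log N).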

View event →
dec.
7.
2:30 p.m. (14:30)

OCBE Biostatistics Seminar: Geir Kjetil Sandve

Dear all, 

The next OCBE biostatistics seminar will be held on Thursday, December 7th, at 14:30-15:30 in Lille Auditorium (Domus Medica, Oslo).
Geir Kjetil Sandve, professor at the Biomedical Informatics Research Group (BMI), Section of Machine learning, Department of Informatics (UiO), will talk about
Approaching intriguing problems with machine learning:
the full picture from data availability to methodology development and assessment, with the adaptive immune system as case.

Abstract: As statisticians and machine learners, we often talk quite exclusively about the methodology we develop, although we typically agree that appropriate data and method assessment is equally important. 

I will here present how, through a broad interdisciplinary collaboration, we have tried to address the full spectrum of aspects that influence our success in approaching an intriguing problem with machine learning. At the core is the development of a novel deep learning architecture whose components are motivated by (tailored according to) insights from the application domain. Looking ahead, we have also critically analysed how large language models might improve on the more classic deep learning approaches to the problem. To support the methodology development, we have initiated separate projects to generate both experimental and synthetic data. And to support interoperability, reproducibility and rigorous assessment of the developed methodology, we have developed a software platform for machine learning in the domain and initiated an international competition to benchmark competing methods in the field.
The case (the machine learning problem) that is underlying the above developments is the question of how the adaptive immune system recognises foreign threats - e.g. viruses, bacteria or cancer. This is essentially a DNA sequence classification problem, known to be driven by complex, higher-order interactions. There is a strong interest in better solutions to this computational problem, as it could accelerate drug development and allow early diagnosis of disease.

The OCBE biostatistics seminar series is organized by the Oslo Centre for Biostatistics and Epidemiology and is an arena for presenting new interesting work in biostatistics, both on the methodological and applied side, from researchers in Oslo, from Norway and abroad.

See also: https://www.med.uio.no/imb/english/research/centres/ocbe/events/biostat-seminar/ for updates or to subscribe to the events. If you'd like to be added to the "biostat-ext" mailing list, just let us know.

Other upcoming seminars in the Spring semester:

February 1st, 2024: Veronica Vinciotti, Department of Mathematics, University of Trento (Italy)

February 8th, 2024: Martin Bladt, Department of Mathematical Sciences, University of Copenhagen (Denmark)

Welcome!
Best wishes,
Valeria Vitelli
Associate Professor
Oslo Centre for Biostatistics and Epidemiology,
Department of Biostatistics, University of Oslo, Norway

View event →
nov.
21.
9:15 a.m. (09:15)

Seminar in Statistics and Data Science: Adam Lee

Dear all, 

We are extremely pleased to invite you to the next seminar in our Seminar series in Statistics and Data Science.

 Speaker: Adam Lee, Assistant Professor, Department of Data Science & Analytics, BI Norwegian Business School

 Title: Locally robust and efficient tests for non-regular semiparametric models

 When? Tuesday 21.11.2023, 09:15-10:15

 Where?  Erling Svedrups plass and Zoom https://uio.zoom.us/j/64061448393?pwd=L21KWjZxbFpWZDY4N3RBTUZHLzVvQT09 

Abstract

This paper considers hypothesis testing in semiparametric models which may be non-regular for certain values of a (potentially infinite-dimensional) nuisance parameter. In such models no (locally) regular estimator of the parameter of interest exists. The situation for testing is somewhat different: I establish that $C(\alpha)$-style test statistics achieve their limiting distributions in a (locally) regular manner under mild conditions, leading to tests with correct size in situations where standard tests fail to control size. Additionally, I characterise the appropriate limit experiment in which to study local (asymptotic) optimality of tests in the case where the efficient information matrix is singular. This permits the generalisation of classical power bounds to the non-regular case. I provide appropriate statements of these bounds and give conditions under which they are attained by the proposed $C(\alpha)$-style tests.

Welcome!

Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

View event →
nov.
9.
2:30 p.m. (14:30)

OCBE Biostatistics Seminar: Antonio Canale

Biostatistical seminar with Antonio Canale, Assoc. Professor, Department of Statistical Sciences, University of Padova, Italy.

Time and place: Nov. 9, 2023 2:30 PM – 3:30 PM, Domus Medica, Auditorium 13

Title: Escaping The Curse of Dimensionality in Bayesian Model-Based Clustering

Abstract: Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality. We provide conditions under which the finite sample posterior tends to either assign every observation to a different cluster or all observations to the same cluster as the dimension grows. Interestingly, the conditions do not depend on the choice of clustering prior, as long as all possible partitions of observations into clusters have positive prior probabilities, and hold irrespective of the true data-generating model. We then propose a class of latent mixtures for Bayesian clustering (Lamb) on a set of low-dimensional latent variables inducing a partition on the observed data. The model is amenable to scalable posterior inference and we show that it can avoid the pitfalls of high-dimensionality under mild assumptions. The proposed approach is shown to have good performance in simulation studies and an application to inferring cell types based on scRNAseq.

Joint work with Noirrit Kiran Chandra & David B. Dunson

View event →
oct.
31.
2:15 p.m. (14:15)

Seminar Series in Statistics and Data Science: Valeria Vitelli

After a break, we are extremely pleased to invite you again to our Tuesday seminar of

Seminar series in Statistics and Data Science

Speaker:  Valeria Vitelli, Associate Professor, Department of Biostatistics, University of Oslo

Title: Rank-based covariate-informed clustering of high-dimensional data with variable selection

When? Tuesday 31.10.2023, 14:15-15:15

Where?  Erling Svedrups plass and Zoom 

https://uio.zoom.us/j/66849037258?pwd=NkNiR0lkbm5VK0VyMytVZW4vV0hNQT09

Abstract

Rank-based models can be used to estimate individual behaviours and preferences in several areas, such as marketing and politics. Often, combining the expressed preferences with additional user-related information (covariates) can lead to better accuracy in individual predictions, by enhancing the understanding of the users’ personal profiles. The Mallows model is a popular model for rankings, as it flexibly adapts to different types of preference data, and the previously proposed Bayesian Mallows Model (BMM) offers a computationally efficient framework for Bayesian inference that also captures the users’ heterogeneity via a finite mixture. However, the Mallows model does not seem realistic when the pool of items is large, and furthermore BMM does not currently allow the use of covariates. In this talk, I will introduce a recent extension of BMM that embeds covariate information in a joint rank-based clustering framework. The proposed method is based on a similarity function that a priori favours the aggregation of people into a cluster when their covariates are similar. A lower-dimensional version of BMM (lowBMM) that scales to large datasets has also been proposed and used in the context of cancer genomics; however, lowBMM does not perform clustering. We now propose to combine the Bayesian mixture of Mallows models with item selection, to jointly perform variable selection and clustering. Performance of both methods is investigated via simulation studies, and real-data examples in genomics and preference learning are also shown. This is joint work with Emilie Eliseussen, Arnoldo Frigessi, Haakon Muggerud, Ida Scheel.

 Welcome!

 Best regards,

Sven Ove Samuelsen & Aliaksandr Hubin

View event →
oct.
24.
10:00 a.m. (10:00)

Explaining AI-seminar

Dear all BigInsighters,

You are hereby invited to a new Explaining AI seminar. The seminar will be a webinar, since the speaker is joining from Germany.

Speaker: Julia Herbinger (Ludwig-Maximilians-Universität München)

Location: Microsoft Teams

Title: Decomposing Global Feature Effects Based on Feature Interactions

Abstract: Global feature effect methods, such as partial dependence (PD) plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. In this talk, I will introduce a new framework called generalized additive decomposition of global effects (GADGET), which is based on recursive partitioning to find interpretable regions in the feature space such that the interaction-related heterogeneity of local feature effects is minimized. I will demonstrate its applicability to the most popular methods to visualize marginal feature effects, namely PD, accumulated local effects (ALE), and Shapley additive explanations (SHAP) dependence. Additionally, I will show that different measures to quantify and analyze feature interactions can be derived when GADGET is applied. To define the interacting feature subset for GADGET, I will introduce PINT, a novel permutation-based significance test to detect global feature interactions that is applicable to any feature effect method used within GADGET. I will demonstrate the applicability of the proposed methods based on simulation and real-world examples.
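
To illustrate why a global effect can mislead when interactions are present, here is a minimal sketch of a brute-force partial dependence computation on a toy model. It is not GADGET itself; the model, the data and the function names are assumptions made for illustration.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    pd_values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # overwrite the feature for every observation
        pd_values.append(predict(X_mod).mean())
    return np.array(pd_values)

def toy_model(X):
    # Hypothetical model with a pure interaction between features 0 and 1.
    return X[:, 0] * X[:, 1]

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, 200)
x1 = np.tile([-1.0, 1.0], 100)  # balanced binary feature, mean exactly 0
X = np.column_stack([x0, x1])
grid = np.linspace(-1.0, 1.0, 5)

# Global PD of feature 0 is flat, because the two groups cancel out.
pd_all = partial_dependence(toy_model, X, 0, grid)
# Within the region x1 > 0, the effect of feature 0 is clearly linear.
pd_pos = partial_dependence(toy_model, X[X[:, 1] > 0], 0, grid)
```

Splitting the feature space on the interacting feature recovers an effect that the global curve hides. Finding such splits automatically is the kind of problem the abstract describes.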

Welcome!

View event →
oct.
19.
2:30 p.m. (14:30)

OCBE Biostatistics Seminar: Thomas Matcham

Biostatistical seminar with Thomas Matcham, Imperial College London, UK.

Time and place: Oct. 19, 2023 2:30 PM – 3:30 PM, Domus Medica, Auditorium 13

Title: A Proper Concordance Index for Time-Varying Relative Risk

Abstract: Harrell's concordance index is a commonly used discrimination metric for survival models, particularly for models where the relative ordering of the risk of individuals is time-independent, such as the proportional hazards model. There are several suggestions, but no consensus, on how it could be extended to models where relative risk can vary over time, e.g. in case of crossing hazard rates. We show that these concordance indices are not proper, where a proper index is one that is maximised in the limit by the true data-generating model. Furthermore, we show that a concordance index is proper if and only if the risk score used is concordant with the hazard rate at the first event time for each comparable pair of events. Thus, we suggest using the hazard rate as the time-varying risk score when calculating concordance. Through simulations, we demonstrate situations in which other concordance indices can lead to incorrect models being selected over a true model, justifying the use of our suggested risk prediction in both model selection and in loss functions in, e.g., deep learning models.
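
As background, the classical index for a time-independent risk score can be sketched as follows. This is the standard pairwise definition, not the time-varying extension proposed in the talk; the function name, the handling of tied risk scores (counted as one half) and the toy data are assumptions for illustration, and tied event times are simply treated as non-comparable.

```python
def harrell_c_index(time, event, risk):
    """Harrell's C for right-censored data with a time-independent risk score."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # higher risk failed first
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): the third subject is censored at time 3.
time = [1, 2, 3, 4]
event = [1, 1, 0, 1]
c_perfect = harrell_c_index(time, event, [4, 3, 2, 1])
c_mixed = harrell_c_index(time, event, [4, 1, 2, 3])
```

In this example there are five comparable pairs; the first score ranks all of them correctly and the second ranks three of them correctly.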

View event →
oct.
12.
2:30 p.m. (14:30)

OCBE Biostatistics Seminar: Jesse Hemerik

Biostatistical seminar with Jesse Hemerik, Assistant Professor, Department of Econometrics, Erasmus University Rotterdam.

Time and place: Oct. 12, 2023 2:30 PM – 3:30 PM, Auditorium 13, Domus Medica

Title: Robust testing in generalized linear models with many responses

Abstract: Generalized linear models (GLMs) are widely used in biostatistics, e.g. to model binary responses or counts. For example, when analyzing RNA-Seq data, it is common to fit many GLMs simultaneously. GLMs are often misspecified due to overdispersion and heteroscedasticity. Existing quasi-likelihood methods for testing in misspecified GLMs often do not provide satisfactory type I error rate control. We provide a novel semi-parametric test, based on a permutation-type approach. Our test often provides better type I error control than its competitors. Further, we consider the common scenario that there are multiple response variables. Think for example about RNA-Seq or neuroimaging data. For each of the responses, association with the predictor of interest is tested. The challenge is then to deal with the multiple testing problem in a powerful and reliable way. To achieve this, we combine our approach with powerful permutation-based multiple testing methods.

View event →
sep.
14.
2:30 p.m. (14:30)

OCBE Biostatistics Seminar: Stein Emil Vollset

Our next OCBE biostatistics seminar takes place Thursday next week, September 14, at 14:30-15:30, in Auditorium 13 (Domus Medica, Oslo).

Stein Emil Vollset, Professor, Department of Health Metrics Sciences, University of Washington, USA, will talk about

Forecasting the Global Burden of Disease Study
Abstract: I will present the Global Burden of Disease Study (GBD), which estimates disease burden for 204 countries and territories, and at the first administrative level for a subset of 22 countries. GBD also produces estimates of disease burden attributable to close to 70 risk factors. The main measures of disease burden are deaths, years of life lost, years lived with disability, DALYs (disability-adjusted life years), prevalence and incidence for more than 350 diseases and injuries, in 23 age groups and for males and females back to 1990. The main focus of the talk will be to present the ongoing effort to produce forecasts of disease burden and population to 2050 or 2100. I will also present some computational and methodological challenges encountered by the project.

See also: https://www.med.uio.no/imb/english/research/centres/ocbe/events/biostat-seminar/ for updates or to subscribe to the events. If you'd like to be added to the "biostat-ext" mailing list, let us know.

Welcome!

Jon Michael Gran
Oslo Centre for Biostatistics and Epidemiology

View event →
june
15.
2:30 p.m. (14:30)

OCBE Biostatistics Seminar: Susanne Strohmaier

Remember our OCBE biostatistics seminar tomorrow, Thursday June 15, at 14:30-15:30, in Auditorium 13 (Domus Medica, Oslo).

Susanne Strohmaier, Medical University of Vienna, Austria, will talk about

Survival benefit of kidney transplantation compared to remaining waitlisted on dialysis - Results from an Austrian nation registry using target trial emulation

Abstract: For certain medical research questions, randomization is unethical or infeasible so causal effects have to be estimated from observational data. If confounding and exposure status are time-dependent, this requires sophisticated methodology such as target trial emulation involving longitudinal matching methods. Here we present an example from nephrology aiming to quantify the survival benefit of first kidney transplantation compared to remaining on dialysis and never receiving an organ, across ages and across times since waitlisting.

We analyzed data from the Austrian Dialysis and Transplant Registry comprising patients on dialysis and waitlisted for a kidney transplant with repeated updates on patient characteristics and waitlisting status. As often with registry data, a tricky task was data management, i.e. dealing with inconsistencies and incompleteness. Data availability also had to be taken into consideration when deciding on the most relevant causal effect that could be identified and estimated. We adapted the approaches of Gran et al. (2010) and Schaubel et al. (2006) by constructing a series of auxiliary trials, where each trial was initiated at the time of a transplantation (relative to time of first/second waitlisting). Transplanted patients contributed to the treatment group while patients with current active waitlisting status were assigned to the control group. Controls were artificially censored if they were transplanted at a later time, and their transplantation then initiated a further trial of the series. We applied pooled logistic regression adjusted for time-varying patient characteristics to estimate inverse probability of treatment weights (IPTWs) to achieve exchangeability, and trial-specific Cox proportional hazards models to compute yearly updated inverse probability of censoring weights (IPCWs) to account for non-adherence to the assigned treatment. The marginal effect and the effect conditional on age and duration of waitlisting, expressed on different scales (hazard ratios, survival probabilities and restricted mean survival times), were obtained from Cox models weighted by the product of IPTWs and IPCWs, fitted to the stacked data set of all trials. A bootstrap approach was used to obtain confidence intervals.

See also: https://www.med.uio.no/imb/english/research/centres/ocbe/events/biostat-seminar/ for updates or to subscribe to the events. If you'd like to be added to the "biostat-ext" mailing list, let us know.

Welcome!
Best wishes,
Valeria Vitelli, Manuela Zucknick and Jon Michael Gran

View event →
june
6.
2:15 p.m. (14:15)

Seminar Series in Statistics and Data Science: Edward Austin

We are very pleased to invite you to our next seminar of

Seminar series in Statistics and Data Science

Speaker:  Edward Austin, Senior research associate, Lancaster University

Title: Detecting Emergent Anomalies in Functional Data 

When? Tuesday 06.06.2023, 14:15-15:15 

Where?  Erling Svedrups plass and Zoom https://uio.zoom.us/j/61696001256?pwd=ZTVSUXJSUjNLYW5VR3AxRnI1MU5Edz09 

Abstract
This talk will focus on recent work about the sequential detection of anomalies within partially observed functional data, motivated by a problem encountered by an industrial collaborator. Classical sequential changepoint detection approaches look for changes in the parameters, or structure, of a data sequence and are not equipped to handle the complex non-stationarity and dependency structure of functional data. Conversely, existing functional data approaches require the full observation of the curve before anomaly detection can take place. We propose a new method, FAST, that performs sequential detection of anomalies in partially observed functional data. This talk will introduce the approach, and some associated theoretical results, and highlight its application on telecommunications data.

Welcome!

Best regards,
Sven Ove Samuelsen & Aliaksandr Hubin

View event →