Recent Posts


Lasso & glinternet

Every data scientist and her dog know linear and logistic regression. Most will probably also know that these models have regularized versions, which improve predictive performance by reducing variance (at the cost of a small increase in bias). Choosing L1-regularization (Lasso) even gets you variable selection for free. The theory behind these models is covered expertly in The Elements of Statistical Learning (for an easier version, see An Introduction to Statistical Learning), and they are implemented nicely in the packages glmnet for R and scikit-learn for Python.


See also the MJA podcast episode accompanying this article. Our joint work (UNSW CBDRH and Statistics), which analyses Australian patient claims data using big network algorithms, is now available on the MJA website. We have processed MBS claims data of 10% of Australians over the years 1994-2014, trying to shed light on the following research questions: What is the patient-sharing behaviour of general practitioners (GPs)? Are there any meaningful clusters (called “Provider Practice Communities”, PPCs) of GPs who collaborate and share patients?


Last week I had the privilege of participating in the NUS-NUH-MIT DATATHON and Workshop on applications of AI in healthcare with the UNSW Centre for Big Data Research in Health (CBDRH) team (Tim Churches, Mark Hanly, Oisin Fitzgerald and Oluwadamisola Sotade).

Thu & Fri: Workshop & Talks

In the workshop “Deploying AI Solutions in Real Clinical Practices” by Dr Ngiam Kee Yuan (CTO, NUHS) we discussed the large NUHS (National University Health System) databases and their storage structure; data security and ownership; applications for access to data; and the ever-changing standards of diagnosis codes (ICD9, ICD10, SNOMED, …), together with the problem of matching doctors' diagnoses to these codes.


In the internship, you’ll explore the latest machine learning methods such as tree ensemble methods, graphical models and neural networks and compare their performance to what is the current industry standard. A great opportunity to improve your machine learning skills and create valuable insights for the credit risk industry.

Applications close 20 June, 2018.

Apply here:


The R package CTRE is now available on CRAN. It models extremes of ‘bursty’ time series via Continuous Time Random Exceedances (CTRE). (See companion paper.)


Recent Publications


Identification of pollutant source for super-diffusion in aquifers and rivers with bounded domains. Water Resources Research, 2018.


Peaks Over Threshold for Bursty Time Series. arXiv:1802.05218, 2018.


A semi-Markov algorithm for continuous time random walk limit distributions. Mathematical Modelling of Natural Phenomena, 2017.


Fokker–Planck and Kolmogorov backward equations for continuous time random walk scaling limits. Proceedings of the American Mathematical Society, 2017.


Recent & Upcoming Talks

The Network Structure of General Practice in Australia
Nov 28, 2018 2:00 PM
Inference for Bursty Extremes
Jun 21, 2018 10:00 AM


Networks of Health Providers

What effect does the social structure of health systems have on health outcomes?

R package CTRE

Models extremes of bursty time series.

R package MittagLeffleR

Computes Mittag-Leffler probability densities.

Continuous Time Random Walks

Limit theorems and statistical physics

Extremes of Bursty Time Series

Heavy tails, scaling limits, R packages

Identifying antidepressant users with depression

Only 1223 of antidepressant users have depression. How do we use statistical learning to predict depression given antidepressant use?