Machine Learning

Add Interactions to Regularized Regression

Lasso & glinternet Every Data Scientist and her dog know linear and logistic regression. The majority will probably also know that these models have regularized versions, which increase predictive performance by reducing variance (at the cost of a small increase in bias). Choosing L1-regularization (Lasso) even gets you variable selection for free. The theory behind these models is covered expertly in The Elements of Statistical Learning (for an easier version, see An Introduction to Statistical Learning), and implemented nicely in the packages glmnet for R and scikitlearn for Python.

Big Networks in Healthcare

See also the MJA podcast episode accompanying this article. Our joint work (UNSW CBDRH and Statistics) which analyses Australian patient claim data using big network algorithms is now available on the MJA website. We have processed MBS claims data of 10% of Australians over the years 1994-2014, trying to shed light on the following research questions: What is the patient sharing behaviour of general practitioners (GPs): are there any meaningful clusters (called “Provider Practice Communities, PPC”) of GPs which collaborate and share patients?

NUS-MIT Datathon

Last week I had the privilege to participate in the NUS-NUH-MIT DATATHON and Workshop on applications of AI in healthcare with the UNSW Centre for Big Data Research in Health (CBDRH) team (Tim Churches, Mark Hanly, Oisin Fitzgerald and Oluwadamisola Sotade). Thu & Fri: Workshop & Talks In the workshop “Deploying AI Solutions in Real Clinical Practices” by Dr Ngiam Kee Yuan (CTO, NUHS) we discussed The large NUHS (National University Health System) databases and their storage structure Data security and ownership Applications for access to data The always changing standards of diagnosis codes (ICD9, ICD10, SNOMED, …) and the problem of matching doctors diagnoses to these codes.

PhD internship in Machine Learning