A Comparison of Dimensionality Reduction Techniques for Unstructured Clinical Text

Yoni Halpern, Steven Horng, Larry A. Nathanson, Nathan I. Shapiro, David Sontag

2012

Abstract

Much clinical data is free text, which is challenging to use together with machine learning, visualization tools, and clinical decision rules. In this paper, we compare supervised and unsupervised dimensionality reduction techniques, including the recently proposed sLDA and MedLDA algorithms, on clinical texts. We evaluate each dimensionality reduction method by using its output as features for two important prediction problems that arise in emergency departments: predicting whether a patient has an infection, which can progress to sepsis, and predicting the likelihood of a patient being admitted to the Intensive Care Unit (used for risk stratification). We find that, on this data, existing supervised dimensionality reduction techniques perform better than unsupervised techniques only for very low-dimensional representations.

Type

Conference paper

Publication

ICML 2012 Workshop on Clinical Data Analysis

Health care

Yoni Halpern

PhD student

David Sontag

Professor of EECS
