The intelligent use of electronic health record data opens up new opportunities to improve clinical care. Such data have the potential to uncover new subtypes of a disease, approximate the effect of a drug on a patient, and support tools that find patients with similar phenotypic profiles. Motivated by these questions, this thesis develops new algorithms for unsupervised and semi-supervised learning of latent-variable deep generative models: Bayesian networks parameterized by neural networks.

To model static, high-dimensional data, we derive a new algorithm for inference in deep generative models. The algorithm, a hybrid between stochastic variational inference and amortized variational inference, improves the generalization of deep generative models on data with long-tailed distributions. We develop gradient-based approaches to interpreting the parameters of deep generative models, and we fine-tune such models with supervision to tackle problems that arise in few-shot learning.

To model longitudinal patient biomarkers as they vary under treatment, we propose Deep Markov Models (DMMs). We design structured inference networks for variational learning in DMMs; the inference network parameterizes a variational approximation that mimics the factorization of the true posterior distribution. We leverage insights from pharmacology to design neural architectures that improve the generalization of DMMs on clinical problems in the low-data regime. Finally, we show how capturing structure in longitudinal data with deep generative models reduces the sample complexity of nonlinear classifiers, yielding a powerful tool for building risk-stratification models from complex data.
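As a minimal illustration of the generative process behind a DMM, the sketch below performs ancestral sampling from a state-space model whose transition and emission distributions are Gaussians with neural-network means, i.e. z_t ~ N(mu_theta(z_{t-1}), sigma^2 I) and x_t ~ N(nu_phi(z_t), sigma^2 I). The network sizes, noise scales, and helper names here are illustrative assumptions, not the architecture developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    # One hidden layer with tanh nonlinearity; illustrative only.
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init_mlp(d_in, d_hid, d_out, rng):
    # Small random weights; stands in for learned parameters.
    return (rng.normal(0, 0.1, (d_in, d_hid)), np.zeros(d_hid),
            rng.normal(0, 0.1, (d_hid, d_out)), np.zeros(d_out))

def sample_dmm(T, d_z, d_x, rng, noise=0.1):
    """Ancestral sampling: z_t depends on z_{t-1} through the
    transition network; x_t depends on z_t through the emission network."""
    trans = init_mlp(d_z, 16, d_z, rng)  # transition mean network
    emis = init_mlp(d_z, 16, d_x, rng)   # emission mean network
    z = np.zeros(d_z)                    # fixed initial state for simplicity
    zs, xs = [], []
    for _ in range(T):
        z = mlp(trans, z) + noise * rng.normal(size=d_z)
        x = mlp(emis, z) + noise * rng.normal(size=d_x)
        zs.append(z)
        xs.append(x)
    return np.stack(zs), np.stack(xs)

zs, xs = sample_dmm(T=10, d_z=2, d_x=5, rng=rng)
print(zs.shape, xs.shape)  # (10, 2) (10, 5)
```

In a learned DMM, the transition and emission networks are trained by maximizing a variational lower bound, with a structured inference network producing the approximate posterior over the latent trajectory.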