Rising antibiotic resistance rates pose a serious public health threat and are largely driven by the overuse and inappropriate use of antibiotics. Antibiotic stewardship efforts have been established around the world to improve prescribing practices, but antibiotic usage still needs further optimization. Improvement is particularly necessary in the empiric treatment setting: the period immediately after a patient presents with an infection, during which clinicians must select a treatment before microbiological testing results are available. In this thesis, we develop methods to learn treatment policies for empiric antibiotic prescription that are tailored to individual patient characteristics. We present three policy learning approaches and evaluate them in the setting of uncomplicated urinary tract infections (UTIs) using data from two Boston-area hospitals. All three approaches learn policies that significantly improve over clinicians and practice guidelines with respect to rates of inappropriate antibiotic therapy (IAT) and broad-spectrum antibiotic usage, and they can trade off between these two outcomes as desired.

We then address considerations important for deploying such learned policies as clinical decision support tools in real-world medical settings. We present techniques for learning treatment policies that can defer to clinician decisions, as well as strategies for improving the interpretability and transparency of the learned policies. We successfully derive an effective, clinically intuitive treatment policy that uses fewer than 20 features. Even after accounting for several real-world treatment considerations, this policy reduces rates of IAT by 20% and broad-spectrum usage by nearly 50% relative to clinicians. We hope that the work presented in this thesis provides a meaningful step towards using machine learning to improve antibiotic stewardship practices in the future.