Student Capstone Projects

The capstone is the culminating project for each student in a SACS Master of Science program. These comprehensive, real-world, industry-style projects are oriented toward each student’s domain of interest.

Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Aysia Veal

M.S. Data Science

Predicting Maternal Mortality Rates Using Socioeconomic, Demographic, and Health Infrastructure Indicators

Abstract

Maternal mortality rates (MMR) remain a critical public health concern in the United States, particularly due to persistent racial and socioeconomic disparities. This project explores patterns and trends in maternal mortality to better understand how structural, economic, and environmental factors may contribute to these outcomes. By examining national data alongside indicators such as income levels, gross domestic product (GDP), and climate-related variables, the research provides a broader context for understanding inequities in maternal health. Through detailed trend analysis and data visualization, the project highlights differences across geographic regions and population groups, emphasizing the disproportionate burden experienced by marginalized communities. The goal is to translate complex health data into clear, meaningful insights that can inform public health awareness and policy discussions. By centering health equity and data-driven inquiry, this work contributes to ongoing efforts to reduce maternal mortality and improve outcomes for mothers nationwide.

Advisor: Lei Qian, Ph.D.

James Walton

M.S. Biomedical Data Science

SportRx-T1D: Precision Exercise Decision Support for Athletes with Type 1 Diabetes: A Data-Driven Approach to Personalized Glucose and Performance Management

Athletes with Type 1 Diabetes (T1D) often experience unpredictable glucose fluctuations during exercise, yet current technologies rely on static, population-level exercise presets. SportRx-T1D is a simulation-based framework designed to evaluate whether personalized modeling can improve exercise safety. Using retrospective CGM, insulin, carbohydrate, and heart-rate–derived intensity data, an individualized sequence model forecasts glucose 15–90 minutes ahead and produces calibrated hypoglycemia-risk estimates. The FDA-validated UVA/Padova simulator is then used to test guideline-consistent insulin or carbohydrate adjustments informed by these risk predictions. Findings illustrate how personalized forecasting and virtual experimentation can reduce projected hypoglycemia and support more adaptive exercise decision-support tools.

Ebony Weems, Ph.D.

M.S. Biomedical Data Science

Exploring Health Disparities Among Older Americans (65+) Residing in Food Deserts: A Multifaceted Analysis

The Administration on Aging reports that 1 in 6 people living in the United States is 65 years old or older. This represents 55.7 million people, a 38 percent increase in this population since 2010. Older adults are a vulnerable population because of age-related health concerns and potential limitations in mobility, income, and access to resources. Understanding and addressing health disparities among older Americans is crucial to ensuring their well-being and quality of life. This research study will examine the relationship between food insecurity and health outcomes among older adults in food deserts, including the prevalence of chronic conditions such as obesity, diabetes, hypertension, and cardiovascular disease. A comprehensive content analysis and a quantitative analysis will be conducted to examine the impact of food access on health outcomes, explore socioeconomic factors, and propose interventions. The National Health and Nutrition Examination Survey (NHANES) 2017-2018 will be used to analyze data on various health indicators, dietary habits, and nutritional status in older adults. The Food Access Research Atlas (FARA) will also be used to map food access and proximity to grocery stores, farmers’ markets, and other food retail outlets. Results from this research will contribute to existing knowledge, raise awareness, inform policymakers, and provide insights to improve the health outcomes of older Americans residing in food deserts.

Clarence White, Ph.D.

M.S. Data Science

Evaluating Factors That Contribute to Substance Use and Co-occurring Mental Health Disorders

Substance abuse continues to be a heavy social and medical burden. Many misused drugs can alter a person’s thinking and judgment, leading to health risks that include addiction, impaired driving, and infectious diseases. Substance use disorder (SUD) affects more than 8% of people in the United States at some point in their lives. Prescription opioids, marijuana, alcohol, and psychostimulants such as cocaine and methamphetamine are the most commonly abused substances in the United States. As active addiction grows more serious, its social impact on the community expands exponentially in a multitude of ways. Abused drugs act to increase dopamine in the reward regions of the brain. A protein called the dopamine transporter helps clear the released dopamine to restore dopamine homeostasis. Additionally, individuals who experience a substance use disorder during their lives may also experience a co-occurring mental health disorder, or vice versa.

Noah Whittenbarger

M.S. Biomedical Data Science

Vowel-Based Estimation of Upper Airway Area

Accurate assessment of upper airway (UA) dimensions is critical for understanding its functional dynamics and addressing clinical challenges such as surgical planning and airway management. This study explores vowel articulation as a novel approach to UA evaluation, using MRI and acoustic analysis to address limitations of existing tools. The objectives were to (1) examine how obesity affects UA area changes during vowel articulation and (2) predict MRI-based UA dimensions using acoustic features.

Results revealed significant differences in UA area variation between high and low BMI groups during vowel articulation, suggesting obesity-related limitations in tongue movement.  

Leveraging acoustic features, this study developed machine learning models to estimate MRI-based UA dimensions, demonstrating the feasibility of using vowel articulation as a non-invasive assessment technique. These advancements pave the way for integrating vowel-based UA evaluation into clinical workflows, offering a cost-effective and scalable alternative to traditional imaging in diverse healthcare settings. 

Lorayya Williams

M.S. Data Science

A Novel Pipeline for Virus Integration Sites Detection in Tumor Genomes Using Deep Learning

Cancer is one of the leading causes of death worldwide. Pathogenic viruses are estimated to be responsible for 15% of all human cancers globally and pose significant threats to public health. Viruses integrate their genetic material into the host genome, increasing the risk of cancer-promoting changes in it. To understand the molecular mechanisms of virus-mediated cancers, it is crucial to identify viral insertion sites in cancer genomes. However, this effort is hindered by the rapidly increasing volume of tumor sequencing data, along with the challenges of accurate data analysis caused by high viral mutation rates and the difficulty of aligning short reads to the reference genome. Thus, it is crucial to develop an efficient method for virus integration site detection in tumor genomes. This paper proposes a novel pipeline that identifies viral integration sites using deep convolutional neural networks (CNNs). Our contributions are twofold: (i) we propose and integrate into the pipeline two novel matrix generation methods, developed after aligning the host and viral genomes with their respective reference genomes; and (ii) we employ one-hot encoded images with reduced computational complexity to represent viral integration sites and harness the capabilities of deep CNNs for detection. The paper illustrates our proposed approach and presents experiments conducted using both synthetic and real sequencing data. Our experimental results are promising, showcasing the effectiveness of the proposed methods in detecting viral integration sites.
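
As a rough illustration of the one-hot representation mentioned in the abstract above, the following minimal sketch encodes a short DNA read as a four-row binary matrix. The function name `one_hot_encode`, the A/C/G/T row order, and the all-zero treatment of ambiguous bases such as N are assumptions made for illustration; the abstract does not describe the pipeline's actual matrix generation methods.

```python
# Illustrative only: the row order and the handling of ambiguous bases are
# assumptions, not the encoding used by the pipeline described above.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return a 4 x len(sequence) matrix (a list of rows) of 0/1 values.

    Row i holds a 1 at column j when sequence[j] equals BASES[i].
    Characters outside A/C/G/T (for example N) produce an all-zero column.
    """
    seq = sequence.upper()
    return [[1 if base == row_base else 0 for base in seq] for row_base in BASES]


if __name__ == "__main__":
    # Print one row per base for a short hypothetical read.
    for row_base, row in zip(BASES, one_hot_encode("ACGTN")):
        print(row_base, row)
```

A matrix of this shape can be treated as a single-channel image, which is one plausible way such an encoding could be passed to a CNN.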

Sheronda Wilson

M.S. Biomedical Data Science

Healthcare Intelligence for Risk, Utilization, and Quality Analysis of U.S. Nursing Care Facilities

Nursing care facilities in the United States provide essential post-acute and long-term care to clinically complex populations. These facilities are evaluated through metrics for payment, utilization, and quality, making them critical for data-driven performance and risk assessment. This study develops an integrated intelligence framework that analyzes facility-level risk, utilization intensity, and quality variation using Medicare provider data from the Centers for Medicare & Medicaid Services (CMS). CMS public datasets, accessed via BigQuery, were used to construct a facility-level analytic dataset incorporating nursing facility and home health agency data. The key measures, which include episode counts, beneficiary volume, Medicare payments, outlier share, and service intensity, were first transformed into normalized features. Geographic information was derived from ZIP codes to enable spatial comparisons. The analytical framework combined principal component analysis (PCA), K-means clustering, autoencoder-based anomaly detection, and supervised machine learning models (logistic regression, random forest, and XGBoost), with SHAP, a method for interpreting feature impacts, used for model interpretability. This integrated approach identified distinct facility clusters, uncovered patterns of utilization intensity, and detected anomalous providers with potentially elevated operational risk. Machine learning models demonstrated the ability to differentiate high- and low-risk facility profiles, while SHAP analysis highlighted key drivers of variation in utilization and payments. The framework provided a multidimensional understanding of facility behavior beyond traditional descriptive metrics. This study demonstrates that combining public CMS data with supervised and unsupervised machine learning methods can enhance healthcare intelligence and oversight. By identifying hidden facility segments and flagging outlier behavior, the proposed framework enables more targeted quality monitoring and informs policy decision-making. As a result, it offers a scalable, data-driven approach for improving transparency and performance evaluation in nursing care systems.
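
The normalization step described in the abstract above could look something like the following minimal sketch, which standardizes each measure to zero mean and unit variance before methods such as PCA or K-means are applied. The abstract does not say which normalization was used, so z-scoring is an assumption here, as are the function name `zscore_columns` and the toy values.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of a list of equal-length numeric rows.

    Each value becomes (value - column mean) / column population std.
    Constant columns (std of 0) are mapped to 0.0 to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


if __name__ == "__main__":
    # Toy values invented for illustration: (episode count, Medicare payment).
    facilities = [[10, 1000.0], [20, 3000.0], [30, 5000.0]]
    for row in zscore_columns(facilities):
        print([round(v, 3) for v in row])
```

Putting measures on a common scale matters for distance-based methods such as K-means, where a large-valued column like payments would otherwise dominate small-valued ones like episode counts.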

Fuxue Xin

M.S. Biomedical Data Science

The Impact of Housing Condition on AMA among Pregnant and Postpartum Women with SUDs

Leaving treatment against medical advice (AMA) among pregnant and postpartum women with substance use disorder (SUD) is influenced by various factors, including housing and other social determinants of health (age, insurance, SUDs and mental state). Housing instability can be a significant barrier for pregnant and postpartum women with SUD seeking treatment. Lack of stable housing can lead to difficulties in accessing care, completing treatment programs, and maintaining recovery.

This study aims to explore the feasibility and effectiveness of using natural language processing to extract housing information from clinical notes and to validate model performance. The dataset is drawn from clinical charts for patients in the Rainbow/Mending Rainbow program at the Elam Mental Health Center (EMHC) at Meharry.