Student Capstone Projects

The capstone is the culminating project for each student in the M.S. Data Science and M.S. Biomedical Data Science programs. The comprehensive, real-life industry-type projects are oriented toward the student’s domain of interest.

Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Weems, Ph.D., Ebony

M.S. Biomedical Data Science

Exploring Health Disparities Among Older Americans (65+) Residing in Food Deserts: A Multifaceted Analysis

The Administration on Aging reports that 1 in 6 people living in the United States is 65 years old or older. This represents 55.7 million people, a 38 percent increase in this population since 2010. Older adults are a vulnerable population due to age-related health concerns and potential limitations in mobility, income, and access to resources. Understanding and addressing health disparities among older Americans is crucial to ensuring their well-being and quality of life. This research study will examine the relationship between food insecurity and health outcomes among older adults in food deserts, including the prevalence of chronic conditions such as obesity, diabetes, hypertension, and cardiovascular disease. A comprehensive content analysis and quantitative analysis will be conducted to examine the impact of food access on health outcomes, explore socioeconomic factors, and propose interventions. The National Health and Nutrition Examination Survey (NHANES) 2017-2018 will be used to analyze data on various health indicators, dietary habits, and nutritional status in older adults. The Food Access Research Atlas (FARA) will also be used to map food access and proximity to grocery stores, farmers’ markets, and other food retail outlets. Results from this research will contribute to the existing knowledge, raise awareness, inform policymakers, and provide insights to improve the health outcomes of older Americans residing in food deserts.

Jena, Caroline

M.S. Data Science

Integrating Clinical Characteristics to Predict ICU Mortality in Sepsis Patients: A Comprehensive Approach

The increasing sepsis mortality rate remains a critical global health concern, particularly in intensive care unit (ICU) patients. Early and accurate prediction of sepsis outcomes is therefore crucial for guiding timely clinical interventions and reducing preventable deaths. This study aimed to investigate a comprehensive machine learning (ML) approach to accurately predict mortality risk using clinical characteristics, such as laboratory results and underlying comorbidities, drawn from Medical Information Mart for Intensive Care-IV (MIMIC-IV) data collected during the first 48 hours of ICU admission. To predict outcomes, the study used several ML methods: Logistic Regression, Random Forest, and XGBoost. We analyzed data from 16,994 septic patients who were admitted to the ICU. Model performance was assessed using a combination of cross-validation and test set metrics, including ROC AUC, average precision, and accuracy. The XGBoost model demonstrated the best overall performance, achieving a test ROC AUC of 0.86, average precision of 0.76, and accuracy of 80%. Logistic Regression and Decision Tree models showed moderate predictive capability with lower AUC and precision scores. The results confirmed that ensemble methods, particularly XGBoost, offer superior performance in predicting ICU mortality among sepsis patients compared with commonly used traditional scoring tools such as SOFA. These findings highlight the potential of machine learning to aid in early risk stratification and informed clinical decision-making in the management of sepsis.
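The three evaluation metrics named in this abstract can be illustrated with a minimal sketch in plain Python. This is not the study's code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the average precision function assumes no tied scores.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the chance a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # Count pairwise wins; a tied pair counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision values at each rank where a positive appears.

    Assumes no tied scores (illustrative simplification).
    """
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def accuracy(y_true, y_score, threshold=0.5):
    """Share of correct predictions at a fixed (assumed) threshold."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical toy data: 1 = died in ICU, 0 = survived.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.35, 0.6, 0.2, 0.1]

print(round(roc_auc(y_true, y_score), 3))
print(round(average_precision(y_true, y_score), 3))
print(accuracy(y_true, y_score))
```

ROC AUC and average precision depend only on how the scores rank patients, while accuracy depends on the chosen threshold, which is one reason the three are often reported together.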

Gentile, Ellen

M.S. Data Science

Effects of Different Types of State Abortion Laws on Maternal & Infant Mortality Rates

The United States has the highest infant and maternal mortality rates of any comparable developed nation. Among Black women and infants, these rates are even higher. States with restrictive abortion laws have higher mortality rates. There are many types of abortion laws. However, studies on the association of abortion laws with infant and maternal mortality have typically been based on an index of state restrictiveness rather than on specific laws. In this study we examine the association of specific types of abortion laws with maternal, infant, and combined (maternal + infant) mortality rates, as measured per 1,000 births. The data for the study were gathered and joined from 18 independent sources, including various datasets within the CDC WONDER Database, U.S. Census, and LawAtlas.com. Mann-Whitney U tests were conducted for each law and outcome to assess univariate association. For multivariate studies, several regression models, including Linear Regression, Lasso, Ridge, Random Forest Regression, and Linear Mixed Effects models, were fit using an 80/20 train-test split with 5-fold cross-validation and feature selection methods integrated into the pipeline. Model fit was assessed based on r-squared values and root mean squared error. Finally, the Mixed Effects Linear Model was used to determine the significance and effect size of predictors and confounders based on p-values and coefficients. In the univariate analysis, all types of laws were associated with significantly higher maternal, infant, and combined mortality, except for bans 6 weeks after a woman’s last menstrual period (LMP), which were only associated with a significant increase in maternal mortality, and bans 7-14 weeks LMP, which had no association with any of the outcomes. The multivariate analyses all fit well (r-squared >0.7) to the combined and infant mortality rates, but poorly to the maternal mortality rates. This indicated that the combined mortality fit was likely driven by infant mortality.
The mixed effects linear model indicated that the only law significantly associated with increased infant and combined mortality rates after accounting for covariates was a ban on abortions between 15-20 weeks LMP (infant: B=0.435, p=0.026; combined: B=0.494, p=0.012). Significant covariates included the natural log of the percent of births paid for by private insurance (infant: B=2.157, p=0.015; combined: B=2.061, p=0.019), the natural log of the percent of births to mothers less than 19 years old (infant: B=1.499, p=0.044; combined: B=1.466, p=0.047), the natural log of the percent of births paid by self-pay (infant: B=0.380, p=0.007; combined: B=0.380, p=0.007), the percent of births to minority mothers (infant: B=0.041, p<0.001; combined: B=0.040, p<0.001), and the average interval since last other pregnancy outcome (infant: B=-0.066, p=0.019; combined: B=-0.065, p=0.02). It is possible that bans in the 15th-20th week LMP are associated with higher infant mortality because this is approximately the timeframe when a mother can determine if her growing baby has a fetal abnormality. The inability to terminate pregnancies that are ultimately not viable may be resulting in higher infant mortality rates. Further study should be conducted to examine if this is, in fact, occurring.
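The univariate step in this abstract, a Mann-Whitney U test comparing an outcome between states with and without a given law, can be sketched in plain Python as follows. This is an illustration only, not the study's code: the function name and the mortality values are hypothetical, and the p-value uses a normal approximation with no tie or continuity correction.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    # Pool both samples, tagging each value with its group (0 = x, 1 = y).
    combined = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Normal approximation (assumption: no tie or continuity correction).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical mortality rates per 1,000 births (not data from the study).
with_law = [6.1, 7.4, 6.8, 7.9, 6.5]
without_law = [5.2, 5.9, 6.0, 5.5, 6.3]

u, p = mann_whitney_u(with_law, without_law)
print(u, round(p, 4))
```

Because the test works on ranks rather than raw values, it does not assume normally distributed mortality rates. With samples this small, an exact p-value would normally be preferred over the approximation shown here.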

Whittenbarger, Noah

M.S. Biomedical Data Science

Vowel-Based Estimation of Upper Airway Area

Accurate assessment of upper airway (UA) dimensions is critical for understanding its functional dynamics and addressing clinical challenges such as surgical planning and airway management. This study explores vowel articulation as a novel approach to UA evaluation, using MRI and acoustic analysis to address limitations of existing tools. The objectives were to (1) examine how obesity affects UA area changes during vowel articulation and (2) predict MRI-based UA dimensions using acoustic features. Results revealed significant differences in UA area variation between high and low BMI groups during vowel articulation, suggesting obesity-related limitations in tongue movement. Leveraging acoustic features, this study developed machine learning models to estimate MRI-based UA dimensions, demonstrating the feasibility of using vowel articulation as a non-invasive assessment technique. These advancements pave the way for integrating vowel-based UA evaluation into clinical workflows, offering a cost-effective and scalable alternative to traditional imaging in diverse healthcare settings.
