Student Capstone Projects

The capstone is the culminating project for each student in a SACS Master of Science program. The comprehensive, real-life industry-type projects are oriented toward the student’s domain of interest.

Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Ange Rukundo

M.S. Data Science

Diesel Prices Trends Analysis and Forecasting with Machine Learning on State and National Levels in U.S. (2020 – 2024)

Diesel fuel prices in the U.S. saw dramatic changes between 2020 and 2024, driven by a mix of global events, shifting regulations, and supply chain disruptions. These price swings had wide-reaching effects: they raised transportation costs, strained businesses, and put extra pressure on consumers, especially in fuel-dependent industries. This project examines the trends behind these changes, looking at how factors like crude oil prices, geopolitical tensions, and environmental policies have shaped regional diesel price patterns. Using machine learning models such as Linear Regression and Gradient Boosting Regressors, the study forecasts diesel prices into 2025, offering a data-driven approach to help businesses and other stakeholders plan ahead. The results point to continued price volatility, with significant differences across U.S. regions. By blending historical analysis with predictive tools, this research aims to support smarter, more resilient decision-making in an uncertain energy landscape.
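
As a rough illustration of the simplest of the model families named above, the sketch below fits an ordinary least squares trend line to a short price series and extrapolates it forward. It is only a minimal example under stated assumptions: the function names, the use of a plain time index as the sole predictor, and the price values are all hypothetical, and it does not reproduce the project's actual features, data, or Gradient Boosting models.

```python
from statistics import mean


def fit_linear_trend(prices):
    """Fit price = intercept + slope * t by ordinary least squares, t = 0..n-1."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(prices)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, prices))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast(prices, steps):
    """Extrapolate the fitted trend for the next `steps` periods."""
    slope, intercept = fit_linear_trend(prices)
    n = len(prices)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly prices in USD per gallon (illustrative, not real data).
monthly_prices = [3.00, 3.10, 3.20, 3.30, 3.40]
next_three = forecast(monthly_prices, 3)
```

A straight trend line cannot capture the volatility the abstract describes, which is presumably one reason a nonlinear model such as a Gradient Boosting Regressor was used alongside it.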

Wajehah Sanders

M.S. Biomedical Data Science

Predictors of Maternal Mental Disease: An Analysis of Perinatal Depression

The World Health Organization (WHO) defines maternal mental health as “a state of well-being in which a mother realizes her own abilities, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to her community” [1]. In the United States, mental health conditions are the most common complications of pregnancy and childbirth, and suicide and overdose combined are the leading cause of death for new mothers [2]. Depression is one of the most common serious complications of pregnancy [3]. According to a 2020 report by the Centers for Disease Control and Prevention (CDC), 1 in 8 women report symptoms of depression after giving birth. Twenty percent of women were not asked about depression during a prenatal visit, and over half of pregnant women with depression were not treated [4]. Prevalence estimates for depression in the perinatal period vary widely across studies. One meta-analysis of 59 studies reported a prevalence of 13 percent for postpartum depression [5]. Another review reported that the prevalence of major and minor depression in the United States is 8.5 percent to 11 percent during pregnancy and 6.5 percent to 12.9 percent during the first postpartum year. The intent of this research is to identify predictors associated with Maternal Mental Health (MMH), in particular factors associated with maternal depression and anxiety. Data used in this study were sourced from the National Institutes of Health All of Us Research Program.

Uma Sarder

M.S. Data Science

Impact of Socioeconomic Status on Mental Health Disorders among Pregnant Women Using NIH All of Us Data

Maternal mental health (MMH) is crucial to the well-being of pregnant women, and MMH conditions, particularly depression and anxiety, affect 1 in 5 pregnant individuals every year in the U.S. Socioeconomic status (SES) is increasingly recognized as a key factor influencing MMH outcomes. Understanding the complex interplay between SES and MMH is crucial for developing effective preventive measures and treatment strategies aimed at reducing maternal and fetal mortality. Our study used data from the All of Us Research Program to examine the relationship between SES factors and MMH, especially depression and anxiety. We further developed predictive models for early prediction of MMH conditions based on socioeconomic factors. Our findings demonstrate the potential of statistical and machine learning approaches to uncover risk factors and enhance early detection strategies, which could help improve maternal and fetal health outcomes.

Jacquese Starling

M.S. Data Science

Unraveling the Complexities of Employee Retention: A Comparative Analysis of Logistic Regression and Random Forest Models

Employee attrition is a critical challenge facing organizations, with significant impacts on productivity, morale, and the bottom line. This study employed a rigorous, data-driven approach to uncover the key drivers of employee turnover within a specific organizational context.

Using a combination of Logistic Regression and Random Forest models, we analyzed a comprehensive dataset of employee records and demographic information. The findings revealed a multifaceted set of factors contributing to attrition, including work-life balance, compensation, career development opportunities, and manager-employee relationships.

By leveraging these data-driven insights, the study provides a roadmap for targeted interventions and talent management strategies. The results underscore the power of integrating people analytics and business acumen to foster work environments that prioritize employee engagement, well-being, and long-term retention.

This research offers valuable guidance for organizational leaders seeking to transform their approach to talent management, ultimately driving sustainable success and positively impacting the lives of their workforce. The study’s methodology and findings contribute to the growing body of knowledge on evidence-based human capital management practices.

Brian Strong

M.S. Data Science

Privacy-Aware Academic Risk Intelligence System (ARIS)

Bio

Brian C. Strong is a master’s student in the Master of Data Science program at Meharry Medical College School of Applied Computational Sciences and a graduate of Morehouse College. His academic and professional interests center on ethical, fair, and privacy-preserving data science systems, with a focus on educational equity and high-stakes decision-making. Brian’s work integrates machine learning, federated learning, and explainable AI to design systems that generate insight while protecting individual privacy. He plans to pursue a PhD in Data Science and is committed to building data-driven solutions that create meaningful societal impact.

Abstract

This project focuses on developing a privacy-aware, federated learning system to support personalized student support and academic decision-making. The project employs Federated XGBoost, a tree-based ensemble learning approach, to train decision tree models across distributed datasets without centralizing sensitive student information. By integrating explainable AI techniques, the system translates model outputs into actionable, human-centered recommendations that account for real-world circumstances affecting student performance. This work demonstrates how ethical, secure, and non-manipulable data science systems can improve educational outcomes while preserving data privacy, fairness, and transparency in high-stakes academic environments.

Advisor: Asmah Muallem, Ph.D.

Shara Taylor

M.S. Data Science

Mic Check: The Evolution of Lyricism in Hip-Hop

Since its inception in the South Bronx in 1973, Hip-Hop has grown into one of the most widespread cultures around the world. The four original elements of Hip-Hop, namely DJ’ing, MC’ing (rapping), breakdancing, and graffiti, form the foundation for what the Hip-Hop lifestyle entails. Rapping quickly became the predominant expressive form of the culture. Over its first 50 years, rap has introduced the world to some of the most gifted songwriters in history. Rappers are judged based on their lyrical prowess, which includes flow, cadence, vocabulary, rhyme schemes, similes, metaphors, and other literary devices. This study will examine the lyrical complexities and content presented by selected rappers over the first 50 years of the genre’s existence. How have lyrics changed over time? What have been the most common topics among these selected rappers? How do topics differ by gender, region, and decade?

Jean-Hus Theodore

M.S. Biomedical Data Science

Computational methods for novel antibiotic drug discovery for Caseinolytic peptidase P (ClpP)

Caseinolytic peptidase P (ClpP) is a multimeric serine protease found in many prokaryotes, as well as in the mitochondria of eukaryotic cells and in chloroplasts. In prokaryotes, ClpP is essential for maintaining protein homeostasis, and its disruption can affect the virulence and infectivity of various pathogens [1]. In eukaryotes, particularly in humans, ClpP plays a crucial role in protein quality control by degrading denatured and misfolded proteins, thereby preserving the integrity of the respiratory chain and sustaining oxidative phosphorylation [2]. Due to these multifaceted functions, ClpP is an attractive target for both anticancer and antibiotic drug development. Although several compounds, such as acyldepsipeptides (ADEP), have been developed to target ClpP, bacterial resistance continues to pose a significant challenge to effective antibiotic therapy [3]. In this project, several computational drug discovery approaches will be implemented to discover novel antibiotics targeting ClpP. More specifically, ligand-based and structure-based drug discovery approaches will be used to identify candidate drugs for ClpP. First, chemical bioassays will be used to curate the biochemical data available for ClpP genes from different species, including prokaryotes and eukaryotes. These datasets will be used to train machine learning models to predict the bioactivity of unknown compounds. Finally, these models will be tested against chemical databases, including Enamine and Mcule [4-5], to predict the activity of unknown compounds against ClpP. Our modeling will focus on two tasks: identifying bioactive molecules for ClpP and predicting the IC50 (a biochemical descriptor) to determine the activity of a molecule. Overall, these approaches will allow screening for chemical compounds active against ClpP.
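
To make the two modeling targets concrete, the sketch below shows one common way raw IC50 measurements could be turned into a regression target and a binary activity label. The conversion pIC50 = -log10(IC50 in mol/L) is standard; everything else is an assumption for illustration, including the function names, the nanomolar input unit, and especially the 1000 nM activity cutoff, which is a hypothetical threshold and not one stated by the project.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)


def is_active(ic50_nm, threshold_nm=1000.0):
    """Label a compound active if its IC50 is at or below the cutoff.

    The default cutoff of 1000 nM is an assumed, illustrative value.
    """
    return ic50_nm <= threshold_nm


# Hypothetical measurements (compound name -> IC50 in nM), not real assay data.
assay = {"compound_a": 50.0, "compound_b": 1000.0, "compound_c": 25000.0}
targets = {
    name: (pic50_from_ic50_nm(value), is_active(value))
    for name, value in assay.items()
}
```

Working on the logarithmic pIC50 scale is a common choice because IC50 values span several orders of magnitude, and a lower IC50 (more potent compound) maps to a higher pIC50.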

Cyruss Tsurgeon

M.S. Biomedical Data Science

Cell Census: Unlocking the Power of Artificial Intelligence for Accurate Cell Quantification

Cell counting is a fundamental task in various biological and medical research fields, providing crucial information about cellular populations in a variety of contexts. The addition of fluorescence microscopy has revolutionized cell imaging by enabling visualization of specific cell types and components with high precision and sensitivity, thus providing advanced techniques for distinguishing individual cells or cellular features, segregating clusters of cells by type, and even labeling distinct cells in culture or tissue sections. However, the manual counting of cells in situ is a time-consuming and subjective process prone to human error and cannot be performed from microscope images themselves. To overcome these limitations, researchers have turned to deep learning techniques, leveraging their ability to learn intricate patterns and relationships in large datasets. In this paper, we present a comprehensive approach for automated cell counting using deep learning algorithms applied to fluorescence microscopy images. We propose a novel framework that combines convolutional neural networks (CNNs) with advanced image processing techniques and statistical methods, enabling accurate and efficient cell quantification. Our method uses annotated training data to train the network and subsequently employs it for automated cell counting in unseen microscopy images. We demonstrate the effectiveness and robustness of our approach through extensive experiments on diverse datasets, showcasing improved performance compared to existing methods. The proposed deep learning-based automated cell counting technique holds immense potential for accelerating research and advancing our understanding of various biological processes, while also serving as a valuable tool for diagnostic and therapeutic applications in clinical settings.
In addition, we demonstrate the application of our model in various contexts including medical diagnosis, drug discovery, biological research, and environmental monitoring. With this research, we provide a foundation for future investigations in biomedical image analysis, offering new insights into the applications of deep learning in computer vision for medicine and healthcare.
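
For context on what automated counting replaces, the sketch below shows a classical baseline rather than the CNN framework described above: threshold the image, then count connected bright regions. It is a deliberately simple stand-in, and the function name, the 4-neighbour connectivity, the threshold value, and the toy image are all assumptions made for illustration.

```python
from collections import deque


def count_blobs(image, threshold):
    """Count 4-connected regions of pixels with intensity >= threshold."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Found an unvisited bright pixel: flood-fill its whole region.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not seen[ny][nx]
                        and image[ny][nx] >= threshold
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Hypothetical 4x5 grayscale image with three separate bright regions.
toy_image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 8],
    [0, 0, 0, 0, 8],
    [7, 0, 0, 0, 0],
]
cell_count = count_blobs(toy_image, threshold=5)  # 3
```

A baseline like this merges touching cells into a single region and is sensitive to the chosen threshold, which is the kind of limitation that motivates learned approaches.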