Student Capstone Projects

The capstone is the culminating project for each student in a SACS Master of Science program. The comprehensive, real-life industry-type projects are oriented toward the student’s domain of interest.

Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Megan Leak

M.S. Biomedical Data Science

Spatiotemporal Analysis of Malaria Incidence and Mortality: Integrating Environmental, Climate, and Socioeconomic Drivers

This study examines the spatiotemporal patterns and key determinants of malaria burden across African countries using a comprehensive country-year panel dataset covering the period 2000–2024. Given the near elimination of malaria in other regions, the analysis focuses on 45 countries within the World Health Organization (WHO) African Region, where malaria remains a major public health concern. The dataset integrates malaria incidence and mortality with a range of climatic, socioeconomic, and health system indicators, providing a robust basis for long-term analysis.

To address missing data, a structured imputation strategy combining forward fill, backward fill, and mean imputation was implemented to keep the dataset temporally consistent and complete. A forward-chaining time-series framework was adopted to preserve temporal integrity, with models trained on data from 2000–2019 and evaluated on 2020–2024 observations. Both traditional panel regression models and advanced machine learning methods, including Extreme Gradient Boosting (XGBoost), LightGBM, and CatBoost, were applied to capture linear and nonlinear relationships.
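As an illustration only, the imputation order and the year-based split described above might be sketched as follows. The function names, the `fallback` argument (standing in for a mean computed elsewhere), and the toy values are assumptions made for this sketch, not the study's actual code or data.

```python
def impute_series(values, fallback):
    """Fill None gaps in one country's yearly series.

    Order: forward fill, then backward fill, then a fallback value
    (for example a cross-country mean) if the series has no observations.
    """
    filled = list(values)

    # Forward fill: carry the last observed value into later gaps.
    last = None
    for i, v in enumerate(filled):
        if v is None:
            filled[i] = last
        else:
            last = v

    # Backward fill: leading gaps take the first observed value.
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]

    # Mean imputation: only reached when every value was missing.
    return [fallback if v is None else v for v in filled]


def split_by_year(rows, last_train_year=2019):
    """Train on years up to last_train_year, test on later years."""
    train = [r for r in rows if r["year"] <= last_train_year]
    test = [r for r in rows if r["year"] > last_train_year]
    return train, test


# Toy, made-up series for a hypothetical country (not study data).
years = [2017, 2018, 2019, 2020, 2021]
incidence = impute_series([None, 210.0, None, 190.0, None], fallback=200.0)
rows = [{"year": y, "incidence": v} for y, v in zip(years, incidence)]
train_rows, test_rows = split_by_year(rows)
```

Splitting on the year rather than at random keeps every test observation later than every training observation, which is the point of a forward-chaining design.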

Model performance was evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R², while SHapley Additive exPlanations (SHAP) were employed to interpret model outputs and identify the most influential predictors. Results reveal pronounced geographic disparities, with countries in West and Central Africa consistently exhibiting the highest malaria burden, while several countries maintain near-zero incidence and mortality.
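For readers unfamiliar with the three evaluation metrics named above, a minimal sketch of how they are computed is shown here. The function name and the toy numbers are assumptions for illustration and do not come from the study.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for paired observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # RMSE penalizes large errors more heavily than MAE does.
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # R^2 compares the model's squared error with that of always
    # predicting the mean of the observed values.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2}


# Toy values only: a model that always predicts the mean scores R^2 = 0.
metrics = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```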

Despite ongoing global control efforts, malaria burden patterns have remained relatively stable over time, with some countries experiencing increasing trends, indicating emerging challenges. Overall, this study highlights the effectiveness of integrating machine learning with interpretable AI to enhance malaria forecasting and provides valuable insights to inform targeted and context-specific public health interventions across Africa.

Charleston Lee

M.S. Data Science

Investigating Criminal References in Rap Lyrics: A Data Science Approach

Rap music has long served as a platform for artists to express their realities, often exploring themes of crime and street life. This study employs data science methodologies to identify and analyze potential references to criminal activities within rap lyrics. Using natural language processing techniques, it analyzes a curated dataset of rap lyrics from 2000 to 2023, drawn from a diverse range of music labels, certifications, and artists. The dataset captures modern slang associated with criminal behavior and was assembled following ethical data collection practices.

Through sentiment analysis, topic modeling, and named entity recognition, the aim of this study is to quantify and contextualize the prevalence of criminal references in rap songs during this time period. Additionally, this study investigates the relationship between lyrical content and commercial success by examining the impact of identified themes on the performance of songs, measured through RIAA certifications from 2000-2023.

Furthermore, based on these linguistic insights, a chatbot is developed that is equipped with a comprehensive understanding of contemporary slang terminologies related to crime. This chatbot enables interactive engagement and discourse on pertinent subjects within the rap genre, facilitating broader discussions about cultural representations in music and industry influences.

This interdisciplinary approach not only advances data science methodologies but also provides valuable insights into the portrayal of societal realities within artistic expression and its reception in the music industry across different labels and time periods.

Collin Lindsay

M.S. Biomedical Data Science

Predicting Global and Regional Tuberculosis Trend: An Artificial Intelligence-Based Analysis of TB Risk Factors

Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a leading global cause of infectious disease mortality, with 1.25 million deaths reported in 2023, including 161,000 among people living with HIV (PLHIV) (1,4). TB disproportionately affects vulnerable populations, and individuals with HIV face a 16-fold increased risk of developing active TB due to immunosuppression (4,5). The dual burden of TB and HIV continues to pose a major global health challenge.

This study investigates the global determinants of TB and TB-HIV burden by integrating epidemiologic and socioeconomic data from the World Health Organization and World Bank across 194 countries from 2015 to 2023. Exploratory data analysis revealed highly skewed distributions and significant geographic disparities, with the highest burden concentrated in Sub-Saharan Africa. Supervised machine learning regression models were developed to predict TB incidence, TB mortality, TB-HIV incidence, and TB-HIV mortality. A time-based train–validation–test split was used to preserve temporal structure. Tree-based ensemble models achieved strong predictive performance, with Extra Trees yielding test R² values of 0.8805 for TB incidence, 0.9067 for TB mortality, and 0.9545 for TB-HIV incidence, while Random Forest performed best for TB-HIV mortality (R² = 0.8719).

Feature importance analysis showed that TB burden is driven by socioeconomic conditions, healthcare access, and HIV prevalence, while TB-HIV outcomes are primarily influenced by HIV epidemiology. These findings emphasize the need for integrated strategies addressing both structural determinants and HIV-related risk factors.

Tara Linney

M.S. Data Science

Assessing Water Quality in Schools Around the World

Water quality is an important issue to address in schools around the world. Access to clean and safe water for drinking, hygiene, and waste disposal is essential for the health and well-being of students. Poor water quality can pose serious health risks, potentially leading to illnesses that hinder a student’s educational outcomes. A body of research on water quality in schools already exists, but no study has yet compared water quality across countries located on different continents. This research proposal seeks to address this global issue by conducting a systematic assessment of water quality in schools located within a variety of specific regions around the world. Data from the WHO/UNICEF Joint Monitoring Programme for Water Supply, Sanitation and Hygiene (JMP) will be used to identify and assess water quality over time.

Lexius Lynch

M.S. Data Science

Exploratory Analysis of Alzheimer’s Disease: Unraveling the complexities of single cell RNA sequence Data

Single-cell biology is a field that focuses on understanding human health and diseases at the cellular level, with a particular emphasis on precision medicine. Identifying specific cell types in major brain disorders is a critical area of research. However, the complex cellular architecture of the brain, which consists of a diverse set of cell types, makes it challenging to determine the primary pathological cell type for a particular disease.

Recent studies have used single-cell RNA sequencing (scRNA-seq) and expression-weighted cell type enrichment to identify specific neuronal cell types associated with brain disorders, such as Alzheimer’s disease. scRNA-seq is a powerful technology that allows the analysis of a large number of individual cells. These studies have revealed statistically significant enrichment of certain neuronal cell types in the context of these disorders, providing valuable insights into the differentially expressed genes and the cell signaling pathways that are critical to understanding variants associated with brain diseases.

Michella Maddox-McGhee

M.S. Data Science

Predictive Modeling of Angular Insertion Depth in Lateral Wall Cochlear Implant Electrodes

Bio

Michella Maddox-McGhee is a Master’s student in Data Science at Meharry Medical College, expected to graduate in Spring 2026. She earned her Bachelor’s degree from Fayetteville State University and has a strong foundation in data analysis, machine learning, and research. Her work focuses on applying data-driven methods to real-world healthcare challenges, including her capstone on improving surgical outcomes for cochlear implant patients.

Abstract

Cochlear implant (CI) surgery is considered an effective procedure for patients with severe hearing loss. Accurate prediction of angular insertion depth (AID) can optimize cochlear implant outcomes and preserve residual hearing. This study presents a machine learning approach using computed tomography (CT) data from 86 patients, incorporating features such as Base Insertion Depth (BID), cochlear scale, electrode array length, and diameter. A Support Vector Machine (SVM)-based model was developed that achieved a mean absolute error of 27° and a standard deviation of error of 36°. Compared to the previously developed linear regression method (mean absolute error: 32°; standard deviation of error: 41°), the proposed model demonstrates improved predictive performance. The results indicate that the proposed model could be used as a tool for patient-customized preoperative surgical planning.

Advisor: Mohammad M. R. Khan, Ph.D.

Aleesa Mann

M.S. Data Science

The Past and Future of Global Human Rights Discourse: Analysis and Predictive Modeling Using UN Roll Call Data

The United Nations (UN) is a global intergovernmental organization that convenes member states on issues of international peace and security. While its declarations and activities are non-binding, one of its important actions is to adopt, by vote or by consensus, resolutions that reflect the opinion of a majority of member states in the UN General Assembly or its subsidiary bodies. In this way, the UN plays an important and highly visible role in setting the tone for global policy discourse. In this paper, we examine the UN’s historical position on human rights issues using roll call votes and archival data from the UN Digital Library. Through this analysis, we provide an overview of thematic trends and voting patterns regarding human rights resolutions put before the UN. This information is then used to develop a predictive model for voting outcomes on future resolutions. Understanding these patterns and approximating future voting outcomes can provide critical insights to inform diplomatic and international policy strategies by political actors across the world.

Micaiah McDonald

M.S. Biomedical Data Science

Exploring the Impact of Neighborhood Environment, Food Insecurity, Discrimination, and Social Support on Mental Health Among People Who Use Marijuana

This study examines the impact of Social Determinants of Health (SDoH), including neighborhood environment, food insecurity, discrimination, and social support, on mental health outcomes, specifically depression and anxiety, among individuals who use marijuana. The study uses data from the NIH All of Us Research Program, which works to improve health care through research and is building a diverse database that can inform thousands of studies on a variety of health conditions.

The research focused on participants who completed the SDoH and lifestyle surveys, where marijuana use was self-reported. Electronic Health Records (EHR) were used to identify participants diagnosed with mental health conditions, including depression and anxiety, using ICD-10 codes. Variables such as neighborhood conditions (cleanliness, noise, graffiti), food insecurity (binary indicator), discrimination (experiences of inequitable treatment) and perceived social support were extracted from the surveys. This analysis also took into account demographic factors such as age, race, gender, education, marital status, and income. To explore how these factors are related to mental health outcomes, logistic regression models were used for statistical analysis.
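To show how a logistic regression result is typically read, the sketch below computes an odds ratio with a 95% Wald confidence interval from a 2x2 table. For a single binary predictor such as the food insecurity indicator, this unadjusted odds ratio equals the exponentiated coefficient of a one-predictor logistic regression. The function name and the counts are hypothetical and are not taken from the study, whose models also accounted for demographic factors.

```python
from math import exp, log, sqrt


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table."""
    # Cross-product ratio: odds of diagnosis among the exposed divided by
    # odds of diagnosis among the unexposed.
    ratio = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases
    )

    # Standard error of the log odds ratio.
    se = sqrt(
        1 / exposed_cases + 1 / exposed_noncases
        + 1 / unexposed_cases + 1 / unexposed_noncases
    )
    lower = exp(log(ratio) - z * se)
    upper = exp(log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts for illustration only (not study data).
ratio, lower, upper = odds_ratio(30, 70, 10, 90)
```

An odds ratio above 1 with a confidence interval that excludes 1 would indicate a statistically significant positive association in this unadjusted setting.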

The study included 7,519 participants, with 51% reporting a prior diagnosis of depression and 54% reporting anxiety before completing the survey. The findings showed that both food insecurity and discrimination were significantly associated with depression and anxiety. Social support was a protective factor: greater social support was associated with a lower likelihood of both depression and anxiety diagnoses. In addition, people with lower education levels were at increased risk of being diagnosed with anxiety, and people with lower income had a higher likelihood of a depression diagnosis.

Overall, this study highlights the role that social determinants play in shaping mental health outcomes such as depression and anxiety. These results underscore the importance of addressing health disparities in social support and income through targeted interventions to help reduce mental health burdens in diverse populations.

The study also offers valuable information on how these social factors influence mental health and points to a key area for future research and intervention, in particular the development of public health strategies that address both individual and broader systemic causes of mental health challenges.
