Student Capstone Projects
The capstone is the culminating project for each student in the M.S. Data Science and M.S. Biomedical Data Science programs. The comprehensive, real-life industry-type projects are oriented toward the student’s domain of interest.
Each project includes: formulation of a question to be answered by the data; collection, cleaning and processing of data; choosing and applying a suitable model and/or analytic method to the problem; and communicating the results to a non-technical audience.

Robinson, Gina
M.S. Data Science
DEFIN’D: Examining the Efficacy of Data-Driven Digital Recruitment Strategies for Clinical Trials in Attracting Candidates from Diverse Backgrounds
Clinical trials play a vital role in advancing medical research and enhancing patient outcomes. Nevertheless, the recruitment of diverse participants for these trials remains a significant challenge. The objective of this study is to assess the efficacy of digital recruitment strategies in attracting candidates from diverse backgrounds for clinical trials. To achieve this, the study will conduct a comprehensive review of existing literature on digital recruitment efforts in clinical trials, with a particular emphasis on diversity considerations. Additionally, a pre-collected dataset consisting of diverse digital recruitment campaigns and their outcomes will be utilized, supplemented with data from the US census. The analysis will primarily focus on key metrics such as the number of recruited participants, demographic information, recruitment channels and outreach, participant engagement, and participant retention rates. By examining these data points, the study aims to identify trends and patterns pertaining to the effectiveness of digital recruitment strategies in enrolling participants from diverse backgrounds. Preliminary findings indicate that digital recruitment strategies have the potential to reach a broader audience and attract participants from diverse backgrounds compared to traditional recruitment methods. However, several factors were found to influence the effectiveness of these strategies, including the selection of appropriate digital platforms, targeted messaging, and cultural sensitivities. By identifying the strengths and limitations of digital recruitment strategies, this study aims to provide valuable insights and recommendations for optimizing future clinical trial recruitment efforts. The findings will inform researchers, pharmaceutical companies, and clinical trial coordinators on the best practices for designing inclusive digital recruitment campaigns that effectively engage candidates from diverse backgrounds.

Mann, Aleesa
M.S. Data Science
The Past and Future of Global Human Rights Discourse: Analysis and Predictive Modeling Using UN Roll Call Data
The United Nations (UN) is a global intergovernmental organization that convenes member states on issues of international peace and security. While its declarations and activities are non-binding, one of its important actions is to adopt, by vote or by consensus, resolutions that reflect the opinion of a majority of member states in the UN’s General Assembly or its subsidiary bodies. In this way, the UN plays an important and highly visible role in setting the tone for global policy discourse. In this paper, we examine the UN’s historical position on human rights issues using roll call votes and archival data from the UN Digital Library. Through this analysis, we will provide an overview of thematic trends and voting patterns in human rights resolutions put before the UN. This information will then be used to develop a predictive model of voting outcomes on future resolutions. Understanding these patterns and approximating future voting outcomes can provide critical insights to inform the diplomatic and international policy strategies of political actors across the world.

Linney, Tara
M.S. Data Science
Assessing Water Quality in Schools Around the World
Water quality is an important issue to address in schools around the world. Access to clean and safe water for drinking, hygiene, and waste disposal is essential for the health and well-being of students. Poor water quality can pose serious health risks, potentially leading to illnesses that hinder students’ educational outcomes. A substantial body of research exists on water quality in schools, but no study has yet compared water quality across countries on different continents. This research proposal seeks to address this gap by conducting a systematic assessment of water quality in schools within a variety of specific regions around the world. Data from the WHO/UNICEF Joint Monitoring Programme for Water Supply, Sanitation and Hygiene (JMP) will be used to assess water quality over time.

Hannah, Andrea
M.S. Biomedical Data Science
Applying Machine Learning to Ovarian Cancer Predicting Biomarkers
Identifying biomarkers that predict a patient’s risk for Ovarian Cancer is a key factor in the fight to improve survival rates. Ovarian Cancer is a group of diseases that originate in the ovaries, fallopian tubes, or peritoneum. It is most treatable at its earliest stages, so early screening and diagnosis are key to successfully treating or curing the disease. This study will use heatmap visualization, the Pearson correlation coefficient, scatterplot visualizations, logistic regression, and existing literature to determine the most important biomarkers in comparison with elevated CA125 levels. Variables of interest identified include Age, Menopause, Human Epididymis Protein 4 (HE4), Alkaline Phosphatase (ALP), and Calcium. Preliminary analysis shows that the variables of interest, except HE4, correspond with elevated CA125 levels and would be biomarkers to pay closer attention to when predicting ovarian cancer with machine learning models. To optimize the performance of the prediction model, removal of the non-biomarkers, Age and Menopause, is necessary. Menopause is a nominal category that could still decrease performance even if it is cleaned and converted to numeric form.

City, Brittany
M.S. Data Science
A Technology Career Recommendation System Based on Personality, Skills, and Interests
With the rapid expansion of technology, there is an increased demand for individuals with technology skills, leading to growing interest in technology careers. Despite the abundance of technology job opportunities, many individuals struggle to identify which technology career field is best suited to their skills, interests, and personality. This lack of clarity can lead to high job turnover, low job satisfaction, and reduced productivity in the workplace. This research proposal aims to develop a career recommendation system that uses machine learning algorithms to suggest viable technology career paths based on key personality traits, skills, and interests. The research will develop and evaluate a predictive model and recommendation system using personality, technical skill, and interest data collected through a survey.

Taylor, Shara D.
M.S. Data Science
Mic Check: The Evolution of Lyricism in Hip-Hop
Since its inception in the South Bronx in 1973, Hip-Hop has grown into one of the most widespread cultures around the world. Its four original elements, DJ’ing, MC’ing (rapping), breakdancing, and graffiti, form the foundation of the Hip-Hop lifestyle. Rapping quickly became the predominant expressive form of the culture. Over its first 50 years, rap has introduced the world to some of the most gifted songwriters in history. Rappers are judged on their lyrical prowess, which includes flow, cadence, vocabulary, rhyme schemes, similes, metaphors, and other literary devices. This study will examine the lyrical complexity and content presented by selected rappers over the first 50 years of the genre’s existence. How have lyrics changed over time? What have been the most common topics among these selected rappers? How do topics differ by gender, region, and decade?

Sanders, Wajehah
M.S. Biomedical Data Science
Predictors of Maternal Mental Disease: An Analysis of Perinatal Depression
The World Health Organization (WHO) defines maternal mental health as “a state of well-being in which a mother realizes her own abilities, can cope with the normal stresses of life, can work productively and fruitfully, and is able to make a contribution to her community” [1]. In the United States, mental health conditions are the most common complications of pregnancy and childbirth, and suicide and overdose combined are the leading cause of death for new mothers [2]. Depression is one of the most common serious complications of pregnancy [3]. According to a 2020 report by the Centers for Disease Control and Prevention (CDC), 1 in 8 women report symptoms of depression after giving birth. Twenty percent of women were not asked about depression during a prenatal visit, and over half of pregnant women with depression were not treated [4]. Prevalence estimates for depression in the perinatal period vary widely across studies. One meta-analysis of 59 studies reported a prevalence of 13 percent for postpartum depression [5]. Another review reported that the prevalence of major and minor depression in the United States is 8.5 to 11 percent during pregnancy and 6.5 to 12.9 percent during the first postpartum year. The intent of this research is to identify predictors associated with Maternal Mental Health (MMH), in particular factors associated with maternal depression and anxiety. Data used in this study were sourced from the National Institutes of Health All of Us Research Program.

White, Ph.D., Clarence
M.S. Data Science
Evaluating Factors That Contribute to Substance Use and Co-occurring Mental Health Disorders
Substance abuse continues to be a heavy social and medical burden. Many misused drugs can alter a person’s thinking and judgment, leading to health risks including addiction, impaired driving, and infectious diseases. Substance use disorder (SUD) affects more than 8% of people in the United States at some point in their lives. Prescription opioids, marijuana, psychostimulants such as cocaine and methamphetamine, and alcohol are the most commonly abused substances in the United States. As active addiction grows more serious, its social impact on the community expands in a multitude of ways. Abused drugs act to increase dopamine in the reward regions of the brain. A protein called the dopamine transporter helps clear the released dopamine to restore dopamine homeostasis. Additionally, individuals who experience a substance use disorder during their lives may also experience a co-occurring mental health disorder, or vice versa.