With two peers, I worked on a project covering data wrangling, exploration, analysis, and machine learning in PySpark. Our objective was to build predictive models for patient ICU admission and immunosuppression risk using a large COVID-19 dataset published by the Mexican government. My role was to apply and fine-tune regression techniques in PySpark, which covered model development, training, and testing.
My logistic regression model achieved 91.3% accuracy for ICU admission predictions and 95.8% accuracy for identifying immunosuppressed individuals. Our presentation included an analysis of the predictions, actionable recommendations, and explanations of key metrics such as precision, recall, and area under the ROC curve. We also had to adapt along the way: we originally planned to predict intubation, but shifted our focus after finding that it had a low correlation with immunosuppressive disorders.
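To illustrate the metrics named above, here is a minimal sketch in plain Python rather than PySpark. The labels and scores are toy values made up for illustration, not project data, and the helper names `classification_metrics` and `roc_auc` are hypothetical. It shows why accuracy alone can look good when the positive class (for example, ICU admission) is rare, while precision and recall tell a more cautious story.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only): 2 positives out of 10 cases.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.1, 0.2, 0.3, 0.1, 0.2, 0.4, 0.3, 0.45, 0.6]

# Turn predicted probabilities into class labels with a 0.5 threshold.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

In this toy case accuracy is noticeably higher than precision and recall, which is the kind of gap that makes the additional metrics worth reporting alongside accuracy.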
In a group project, two other students and I cleaned three datasets (approximately 27,000 values) in Python, collected from a United Kingdom open-source database, and then conducted extensive exploratory analysis. Using matplotlib and seaborn to create visualizations, we identified significant patterns in gender pay disparities across a wide range of UK companies and industries from 2018 to 2021.
After research and analysis, we produced a comprehensive report that meticulously documented the code, analysis methodologies, and actionable conclusions, providing a solid foundation for further research and policy-making initiatives.
See Final Report with Code and Visualizations.
I independently performed data exploration and visualization in R to tell a visually compelling story about the gender pay disparity in the United Kingdom for 2021, using ggplot2, dplyr, and a geocoding API.
For the end-of-the-year Python project, students were tasked with producing a novel program in Jupyter Notebook using the programming skills demonstrated since the beginning of the semester. Requirements included novelty, a substantial amount of code (around 300 lines), and techniques such as APIs, functions, for loops, and widgets, all combined into a program with a clear purpose and functionality. To showcase the skills I learned with Python and Jupyter Notebook, I decided to create a program on a subject I was interested in: music. To add more novelty, I narrowed my focus to Korean pop, with Korean pop fans as the target audience of the project.