Welcome to my data portfolio! Here, I provide a summary of my projects in the data field.
Project Link | Completion Date | Tools | Project Description |
---|---|---|---|
-- | -- | -- | -- |
[--] | -- | -- |
Project Link | Tools/Strategies Used | Project Description |
---|---|---|
Allocating Shipping Data Columns | Conditional Formatting, Custom Formulae, VLOOKUP, Cell Concatenation, Data Transposing | Filled in columns of missing data across workbooks. I transformed and concatenated the column data across workbooks using VLOOKUP, then combined all the tables into one new sheet for reference. |
Project Link | Area of Analysis | Project Description |
---|---|---|
Basketball Game Ticket Data | Data Query Language, Data Manipulation Language | A simple exercise on a ticketing dataset that showcases my SQL query writing and problem-solving skills across various SQL challenges. |
⚾ MLB SQL Queries | Data Retrieval/Manipulation | Here I write, analyze, and optimize queries for complex datasets, such as the MLB (Major League Baseball) Batting and People data from Sean Lahman's website. Queries include subqueries, window functions, and joins. |
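
As a rough illustration of the query style mentioned in the MLB row (a join, a subquery, and a window function together), here is a minimal sketch using Python's built-in `sqlite3` module. It is not taken from the project itself: the rows and player names are invented, and the two tables carry only a small assumed subset of columns (`playerID`, `yearID`, `HR`, `nameFirst`, `nameLast`). Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with two tiny tables. All rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (playerID TEXT PRIMARY KEY, nameFirst TEXT, nameLast TEXT);
CREATE TABLE Batting (playerID TEXT, yearID INTEGER, HR INTEGER);
INSERT INTO People VALUES
    ('p1', 'Alex', 'Example'),
    ('p2', 'Sam', 'Sample'),
    ('p3', 'Pat', 'Placeholder');
INSERT INTO Batting VALUES
    ('p1', 2001, 30), ('p2', 2001, 45), ('p3', 2001, 12),
    ('p1', 2002, 25), ('p2', 2002, 20), ('p3', 2002, 33);
""")

# The subquery joins Batting to People and ranks players by home runs
# within each season; the outer query keeps only the top-ranked player.
QUERY = """
SELECT yearID, nameFirst || ' ' || nameLast AS player, HR
FROM (
    SELECT b.yearID, p.nameFirst, p.nameLast, b.HR,
           RANK() OVER (PARTITION BY b.yearID ORDER BY b.HR DESC) AS hr_rank
    FROM Batting AS b
    JOIN People AS p ON p.playerID = b.playerID
) AS ranked
WHERE hr_rank = 1
ORDER BY yearID;
"""

leaders = conn.execute(QUERY).fetchall()
for row in leaders:
    print(row)
```

`RANK()` keeps ties, so a season with two co-leaders would return both rows; `ROW_NUMBER()` would be the choice if exactly one row per season were required.
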
Project Link | Area | Project Description | Libraries |
---|---|---|---|
Can we predict if a mushroom is poisonous? | EDA, Predictive Analysis, Classification | Here, my team and I used the UCI Mushroom Data Set to prepare and analyze the data, then predict which mushroom features make a mushroom more likely to be inedible/poisonous. | sklearn, pandas, NumPy, matplotlib, seaborn |
🎥 Sentiment Analysis on Movie Reviews | EDA, Naive Bayes | My team and I used a Multinomial Naive Bayes classifier to determine whether a movie review had negative or positive sentiment. | pandas, BeautifulSoup, sklearn, matplotlib, re (regex) |
⛽️ Predicting Vehicle Weight | EDA and Linear Regression | Analyzed a vehicle dataset and built linear regression models that predict the curb weight of a vehicle. | pandas, matplotlib, seaborn |
🍷 Cleaning a Wine Dataset | EDA and Imputing Data | An exercise where a partner and I studied a wine dataset and built up domain knowledge about wine. We examined each numerical and categorical variable, adjusted skew, normalized the data, and imputed missing values for several variables. | pandas, matplotlib |
Decision Tree Vs. Random Forest on NY State Graduation Data | EDA, Supervised Machine Learning, Decision Trees/Random Forest | In this analysis, we built three different kinds of decision tree and random forest models, guided by a feature importance analysis that used logistic regression on our Boolean variables. We trained them on subsets of our data, evaluated their performance with confusion matrices, and chose the best one for prediction. | pandas, matplotlib, sklearn, Yellowbrick |
K-Nearest Neighbors and Support Vector Machines to Predict Online Purchases | EDA, KNN, SVM | We used supervised learning methods, namely K-nearest neighbors and support vector machines, in Python to predict whether online shoppers would make a purchase. | pandas, matplotlib, seaborn |
🎮 Sentiment Analysis - A Machine Learning Approach into Hideo Kojima's Divisive Platformer | EDA, Naive Bayes, Feature Engineering, Natural Language Processing | Our team performed sentiment analysis on tweets posted in anticipation of Hideo Kojima's 2019 video game release, Death Stranding. We sourced the tweets from two libraries, preprocessed them, stored them in MongoDB, and then performed sentiment analysis. | pandas, matplotlib, pymongo, NLTK, json |
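
Two of the projects above rely on Naive Bayes for sentiment classification. The sketch below shows the idea behind a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch with only the standard library. It is an illustration under assumptions, not the projects' code (which used sklearn and NLTK): the function names `tokenize`, `train`, and `predict` are hypothetical, and the four training reviews are invented.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(docs):
    """Count documents per class and word occurrences per class.

    docs is a list of (text, label) pairs.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior: share of training documents with this label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # Words never seen in training are ignored.
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training reviews, for illustration only.
reviews = [
    ("a great and moving film", "positive"),
    ("great acting great story", "positive"),
    ("a dull and boring film", "negative"),
    ("boring plot terrible acting", "negative"),
]
model = train(reviews)
print(predict(model, "great story"))
print(predict(model, "boring film"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together, which matters once reviews get longer than a few words.
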
Project Link | Project Description | Dashboard Link |
---|---|---|
⚾ MLB The Show 24 Dashboard | Cleansed and transformed data on the PlayStation baseball game MLB The Show 24. I purchased the data from a third-party website and loaded it into Tableau, then built three workbooks with interesting insights on the game's baseball players. The findings can help people choose the best players in the game. | DASHBOARD LINK 1 |
Project Link | Project Description | _ |
---|---|---|
Global Sales Dashboard | Uploaded CSV sales data and modeled it as a star schema, then created a dashboard highlighting sales information in Power BI. |