Predicting-Asthma-Rates-in-California

Author: Pranjal Srivastava

Date: 5/8/2022

This project aims to predict asthma rates in California from environmental factors: pesticides, toxic releases, pollution burden score, PM2.5, traffic, and ozone levels. The data come from the Clean_Cal_Environtmentdata.csv file, which contains environmental measurements for regions across California.

Stages in the Project

1. Data Cleaning

The project begins by cleaning the dataset and removing rows with missing values, so that every model is fit on the same complete set of observations. This is done with R's na.omit() function, and the cleaned data are written back to the CSV file.
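A minimal sketch of this cleaning step, assuming a small hypothetical data frame in place of the real CSV (the column names `asthma`, `ozone`, and `traffic` are illustrative, not the file's actual headers). It writes to a temporary file rather than overwriting the real data.

```r
# Hypothetical raw data with a few missing values (column names are assumptions)
raw <- data.frame(
  asthma  = c(45.2, NA, 61.8, 38.9, 52.3),
  ozone   = c(0.05, 0.06, NA, 0.04, 0.05),
  traffic = c(1200, 850, 640, NA, 990)
)

# Drop every row that contains at least one NA
clean <- na.omit(raw)

# Write the cleaned data back out; a temp file is used so this sketch
# does not overwrite a real Clean_Cal_Environtmentdata.csv
out_file <- tempfile(fileext = ".csv")
write.csv(clean, out_file, row.names = FALSE)

# Re-read to confirm that no missing values remain
reloaded <- read.csv(out_file)
print(nrow(reloaded))  # 2 of the 5 rows are complete
```

In the actual project the output path would be the CSV in the working directory rather than `tempfile()`.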

2. Univariate Analysis & Standard Deviation

Univariate analysis is carried out to summarize individual variables in the dataset. Standard deviations for the variables are also computed.
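One way to produce these summaries in base R is sketched below. It uses simulated, hypothetical columns (`asthma`, `ozone`, `pm25`, `traffic`) instead of the real data, so the numbers it prints carry no meaning for California.

```r
set.seed(42)
n <- 100
# Hypothetical stand-in for the cleaned data set (column names are assumptions)
env <- data.frame(
  asthma  = rnorm(n, mean = 50, sd = 15),
  ozone   = runif(n, 0.03, 0.07),
  pm25    = runif(n, 5, 15),
  traffic = rexp(n, rate = 1 / 1000)
)

# Min, quartiles, median, and mean for each column
print(summary(env))

# Standard deviation of every numeric column, returned as a named vector
sds <- sapply(env, sd)
print(round(sds, 3))
```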

3. Exploratory Plots

Histograms and scatter plots of the variables are drawn to observe the distribution of data and potential relationships between variables.
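A minimal sketch of these plots with base R graphics, assuming simulated, hypothetical columns. Plots are sent to a null PDF device so the sketch runs without a display; remove the `pdf()` and `dev.off()` lines to view them interactively.

```r
set.seed(7)
n <- 150
# Hypothetical data (column names and relationships are assumptions)
env <- data.frame(
  ozone            = runif(n, 0.03, 0.07),
  traffic          = rexp(n, rate = 1 / 1000),
  pollution_burden = runif(n, 10, 80)
)
env$asthma <- 20 + 300 * env$ozone + 0.5 * env$pollution_burden + rnorm(n, sd = 5)

pdf(file = NULL)  # headless device: nothing is written to disk

# One histogram per variable to inspect each distribution
op <- par(mfrow = c(2, 2))
for (v in names(env)) {
  hist(env[[v]], main = v, xlab = v)
}
par(op)

# Scatter-plot matrix of every pair of variables
pairs(env)

invisible(dev.off())
```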

4. Fitting a Model

A multiple linear regression model is fitted on the data using all the variables to predict asthma rates.
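A sketch of fitting such a model with `lm()`, assuming simulated data whose six hypothetical columns stand in for the environmental variables. The formula `asthma ~ .` regresses the response on every other column.

```r
set.seed(1)
n <- 200
# Hypothetical predictors (names are assumptions, not the CSV's headers)
env <- data.frame(
  ozone            = runif(n, 0.03, 0.07),
  pm25             = runif(n, 5, 15),
  traffic          = rexp(n, rate = 1 / 1000),
  pesticides       = rexp(n, rate = 1 / 50),
  tox_release      = rexp(n, rate = 1 / 500),
  pollution_burden = runif(n, 10, 80)
)
# Simulated response, for illustration only
env$asthma <- 20 + 300 * env$ozone + 0.01 * env$traffic +
  0.5 * env$pollution_burden + rnorm(n, sd = 5)

# "." expands to all columns other than the response
fit <- lm(asthma ~ ., data = env)
print(summary(fit))
```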

5. Diagnostics

The residuals of the model are plotted against the fitted values to look for non-linearity or heteroscedasticity. A Quantile-Quantile (Q-Q) plot of the residuals is used to check the normality assumption. The data are also checked for outliers and multicollinearity.
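The checks above could be run roughly as follows. This is a sketch on simulated, hypothetical data. Variance inflation factors are computed by hand as VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the others; the |standardized residual| > 3 outlier cutoff is an assumed rule of thumb, not necessarily the one used in the project.

```r
set.seed(1)
n <- 200
# Hypothetical data (names and coefficients are assumptions)
env <- data.frame(
  ozone            = runif(n, 0.03, 0.07),
  pm25             = runif(n, 5, 15),
  traffic          = rexp(n, rate = 1 / 1000),
  pesticides       = rexp(n, rate = 1 / 50),
  tox_release      = rexp(n, rate = 1 / 500),
  pollution_burden = runif(n, 10, 80)
)
env$asthma <- 20 + 300 * env$ozone + 0.01 * env$traffic +
  0.5 * env$pollution_burden + rnorm(n, sd = 5)
fit <- lm(asthma ~ ., data = env)

pdf(file = NULL)  # headless device; remove to view the plots
# Residuals vs fitted: curvature hints at non-linearity, a funnel at heteroscedasticity
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Normal Q-Q plot of the residuals
qqnorm(resid(fit))
qqline(resid(fit))
invisible(dev.off())

# Variance inflation factor for each predictor, computed by hand
predictors <- setdiff(names(env), "asthma")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = env))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))

# Flag observations with large standardized residuals as potential outliers
std_res <- rstandard(fit)
print(which(abs(std_res) > 3))
```

The `car` package's `vif()` gives the same quantity for a model without factor terms; it is computed manually here to keep the sketch dependency-free.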

6. Modified Model

A Box-Cox transformation is applied to the response (asthma rate) to address the non-normal residuals found in the diagnostics, and a new model is fit on the transformed response.
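A minimal sketch using `boxcox()` from the MASS package (bundled with standard R installations), assuming a simulated, right-skewed, strictly positive response; Box-Cox requires y > 0. The λ that maximizes the profile log-likelihood is chosen, and the response becomes (y^λ - 1) / λ, or log(y) when λ = 0. Column names are hypothetical.

```r
library(MASS)  # provides boxcox()

set.seed(3)
n <- 200
# Hypothetical predictors (names are assumptions)
env <- data.frame(
  ozone            = runif(n, 0.03, 0.07),
  traffic          = rexp(n, rate = 1 / 1000),
  pollution_burden = runif(n, 10, 80)
)
# Skewed, strictly positive simulated response
env$asthma <- exp(2 + 10 * env$ozone + 0.0002 * env$traffic +
  0.01 * env$pollution_burden + rnorm(n, sd = 0.3))

# Profile log-likelihood over a grid of lambda values; take the maximizer
bc <- boxcox(asthma ~ ozone + traffic + pollution_burden, data = env,
             lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]

# Transform the response (log when lambda is effectively zero)
env$asthma_bc <- if (abs(lambda) < 1e-8) log(env$asthma) else
  (env$asthma^lambda - 1) / lambda

# Refit on the transformed response
fit_bc <- lm(asthma_bc ~ ozone + traffic + pollution_burden, data = env)
print(lambda)
print(summary(fit_bc))
```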

7. Removing Outliers

Cook's distance is calculated for each observation in the transformed model to identify influential observations. Any observation with a Cook's distance greater than 4/n, where n is the number of observations, is treated as an outlier and removed from the dataset.
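The 4/n rule can be applied with `cooks.distance()` from base R's stats package, as in this sketch. It uses a plain model on simulated, hypothetical data; in the project the same steps would be applied to the Box-Cox model.

```r
set.seed(5)
n <- 200
# Hypothetical data (names and coefficients are assumptions)
env <- data.frame(
  ozone            = runif(n, 0.03, 0.07),
  traffic          = rexp(n, rate = 1 / 1000),
  pollution_burden = runif(n, 10, 80)
)
env$asthma <- 20 + 300 * env$ozone + 0.01 * env$traffic +
  0.5 * env$pollution_burden + rnorm(n, sd = 5)
fit <- lm(asthma ~ ., data = env)

cd <- cooks.distance(fit)
cutoff <- 4 / nrow(env)  # 4/n rule of thumb

influential <- which(cd > cutoff)
cat("Flagged", length(influential), "of", nrow(env), "observations\n")

# Keep only observations at or below the cutoff and refit
env_trim <- env[cd <= cutoff, ]
fit_trim <- lm(asthma ~ ., data = env_trim)
print(summary(fit_trim))
```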

8. Final Model Selection

Models are compared based on their AIC scores and the model with the minimum AIC score is selected. This model includes Ozone, Traffic, Pollution Burden Score, and Toxic Release as predictors.
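The source does not say how the candidate models were generated. One common approach, sketched here as an assumption, is backward elimination with `step()`: it repeatedly drops the term whose removal lowers AIC the most and stops when no removal helps. The data and column names are simulated and hypothetical, so the selected terms will not match the project's result.

```r
set.seed(11)
n <- 300
# Hypothetical predictors (names are assumptions)
env <- data.frame(
  ozone            = runif(n, 0.03, 0.07),
  pm25             = runif(n, 5, 15),
  traffic          = rexp(n, rate = 1 / 1000),
  pesticides       = rexp(n, rate = 1 / 50),
  tox_release      = rexp(n, rate = 1 / 500),
  pollution_burden = runif(n, 10, 80)
)
env$asthma <- 20 + 300 * env$ozone + 0.01 * env$traffic +
  0.5 * env$pollution_burden + 0.005 * env$tox_release + rnorm(n, sd = 5)

full <- lm(asthma ~ ., data = env)

# Backward elimination on AIC; trace = 0 suppresses step-by-step output
best <- step(full, direction = "backward", trace = 0)

print(formula(best))
print(c(full = AIC(full), selected = AIC(best)))
```

`step()` ranks models with `extractAIC()`, which for linear models differs from `AIC()` only by a constant on the same data, so both give the same ordering.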

Usage

To replicate this analysis, run the R code in AsthmaRatesFinal - Group 1.Rmd in an R environment, with the Clean_Cal_Environtmentdata.csv file in your working directory. The code is commented to make it easy to follow and modify.
