minadreamer/California-Property-Insights

Zillow Prize: Zillow’s Home Value Prediction (Zestimate)

Introduction:

“Zestimates” are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. And, by continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has since become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning.

I am aiming to answer the following questions:

Question 1: Have houses got bigger over the years in California?

Question 2: Is there seasonality in the transactions?

Question 3: Where are the underestimated/overestimated houses located?

Question 4: Where are the historic houses located?
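As a rough illustration of how Question 2 could be approached, the sketch below counts transactions per calendar month. It assumes transaction dates are available as ISO-formatted strings (YYYY-MM-DD); the function name and the sample dates are made up for illustration and are not taken from the repository or the Zillow data.

```python
from collections import Counter
from datetime import date


def monthly_transaction_counts(transaction_dates):
    """Count transactions per calendar month (1-12) from ISO date strings."""
    counts = Counter(date.fromisoformat(d).month for d in transaction_dates)
    # Return all twelve months so that quiet months show up as zero.
    return {month: counts.get(month, 0) for month in range(1, 13)}


# Made-up dates for illustration only; not taken from the Zillow data.
sample_dates = ["2016-01-15", "2016-06-03", "2016-06-21", "2016-07-09", "2016-12-30"]
counts = monthly_transaction_counts(sample_dates)
peak_month = max(counts, key=counts.get)
print(counts)
print("Busiest month:", peak_month)
```

A clear, repeated peak in the same months across several years would suggest seasonality; a single year of counts on its own is only a hint.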

Finally, using linear regression, I aim to predict the log error (logerror).
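For context, the Zillow Prize competition defines the log error as log(Zestimate) minus log(SalePrice). The sketch below computes it and labels an estimate by the sign of the error, which is one possible way to frame Question 3. It assumes the natural logarithm; the function names, the tolerance parameter and the prices are hypothetical and used only for illustration.

```python
import math


def log_error(zestimate, sale_price):
    """Log error: log(Zestimate) - log(SalePrice), natural log assumed."""
    return math.log(zestimate) - math.log(sale_price)


def estimate_direction(logerror, tolerance=0.0):
    """Label an estimate by the sign of its log error.

    `tolerance` is a hypothetical knob for treating small errors as on target.
    """
    if logerror > tolerance:
        return "overestimated"
    if logerror < -tolerance:
        return "underestimated"
    return "on target"


# Made-up prices for illustration only.
err = log_error(zestimate=550_000, sale_price=500_000)
print(round(err, 4), estimate_direction(err))
```

A positive log error means the Zestimate was above the sale price, and a negative one means it was below.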

Methodology:

Step 1: Understand the data provided by looking at summary statistics alongside the given data definitions, to make sense of what each data series (column) represents: whether it is continuous, binary or categorical, what values it takes, and whether it is affected by duplicates and nulls.

Step 2: Based on the findings from the previous step, we will clean the data, replacing, converting or deleting nulls and duplicates as appropriate.

Step 3: Based on the given data, and with the ultimate aim of predicting house price in mind, we ask a few questions that may reveal interesting trends or useful insights. We plan to use visualisation to help answer the questions.

Step 4: We use visualisation to further study the distribution of potential key independent variables, as indicated by the correlation matrix. We will also check for a linear relationship between the independent variables and the dependent variable via scatter plots, and look for any obvious outliers in the graphs.

Step 5: Predictive model fitting and evaluation, including checking of the model assumptions.

Step 6: Interpret the model, giving examples of how the x variables relate to the y variable and how we can use the model to predict y.

Step 7: Draw further insight from the data, keeping the limitations of the model in mind.
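The steps above can be sketched in miniature as follows. This is a minimal, dependency-free illustration of Steps 1, 2, 4 and 5 (null counts, de-duplication with median fill, Pearson correlation, and a one-variable least-squares fit), not the analysis actually used in this repository. The rows are made up, and the field names (in particular area_sqft) and all function names are assumptions chosen for the example.

```python
from statistics import fmean, median

# Made-up rows for illustration; field names are assumptions.
rows = [
    {"parcelid": 1, "area_sqft": 1000.0, "logerror": 0.010},
    {"parcelid": 2, "area_sqft": 1500.0, "logerror": 0.020},
    {"parcelid": 2, "area_sqft": 1500.0, "logerror": 0.020},  # duplicate
    {"parcelid": 3, "area_sqft": None, "logerror": 0.025},    # null
    {"parcelid": 4, "area_sqft": 2500.0, "logerror": 0.040},
]


def null_counts(rows):
    """Step 1: count missing (None) values per column."""
    return {key: sum(r[key] is None for r in rows) for key in rows[0]}


def clean(rows, key_field, fill_field):
    """Step 2: keep the first row per key, then fill nulls with the median."""
    seen, unique = set(), []
    for r in rows:
        if r[key_field] not in seen:
            seen.add(r[key_field])
            unique.append(dict(r))  # copy so the input is not mutated
    fill = median(r[fill_field] for r in unique if r[fill_field] is not None)
    for r in unique:
        if r[fill_field] is None:
            r[fill_field] = fill
    return unique


def pearson(xs, ys):
    """Step 4: Pearson correlation coefficient between two series."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Step 5: ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


print(null_counts(rows))
cleaned = clean(rows, "parcelid", "area_sqft")
xs = [r["area_sqft"] for r in cleaned]
ys = [r["logerror"] for r in cleaned]
r_value = pearson(xs, ys)
intercept, slope = fit_line(xs, ys)
# Residuals are the starting point for checking the model assumptions,
# for example by plotting them against the fitted values.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(round(r_value, 3), slope, residuals)
```

Median fill is only one of the options Step 2 mentions; dropping or converting nulls may be more appropriate for some columns.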
