3 Project Proposal
souribe edited this page Nov 5, 2017
·
20 revisions
Table of Contents: Project Description | Technical Description | Logistics
- The goal of our project is to use data science to find out whether restaurants in affluent areas get better reviews and ratings. We will define “affluent” using census data on median household income, which lets us identify areas where income could play a role in restaurant prices and ratings. We will focus on Seattle and its surroundings because we know the area contains a mix of socio-economic zones. Based on our prior knowledge, we assume that more affluent areas tend to have more high-end restaurants and that less wealthy areas have more restaurants in an affordable price range. Although we will have price information for these restaurants, our research will mainly focus on differences in the reviews and ratings of restaurants across communities. The income data comes from the US Census, accessed through the Seattle government website. It covers all census tracts, from which we will extract city borders and calculate median household incomes. Using this income data, we can draw a clear distinction between affluent and non-affluent areas. To obtain the restaurant data, we will use the Yelp API found on Yelp’s website. With this API, we can extract information on Seattle restaurants in different areas, along with their ratings and reviews, and then try to connect each restaurant’s location, its current reviews and ratings, and the economic status of its surroundings.
- We believe our key stakeholders are restaurant-goers and foodies, for several reasons. First, compared with other people, they care more about food quality and popularity. Besides asking friends for recommendations, they turn to online platforms such as Yelp, which are among the most popular places to proactively search for this information. However, if our assumption holds and restaurants in affluent areas do get better reviews and ratings, these users may not be getting the most accurate picture of restaurant quality. Our data science project can give them better-evaluated data on restaurant ratings by extracting restaurant information and checking whether there is a correlation between a restaurant’s reviews, its location, and the socio-economic status of its area.
- The main goal of our project is to help our audience make better decisions when they use any business-review platform. Specifically, if our assumption proves correct, our project will show users that ratings differ systematically between restaurants in different areas. Foodies and restaurant-goers who judge restaurant quality from online platforms such as Yelp and Google Reviews can then weigh those ratings more carefully when making their decisions.
- We will answer this question in three steps. Our first sub-goal is to get a dataset of median household income for different locations from Census.gov. With this information, we can distinguish affluent from non-affluent areas, which will help us locate restaurants in those areas in the next sub-goal. Second, we will extract ratings, prices, and reviews from Yelp, the online platform we are using. We will then filter the Yelp data down to restaurants in the areas covered by our Census.gov research, namely the Seattle region. Our last sub-goal is to connect the location information with the restaurant data in order to analyze whether there is a correlation between restaurants’ ratings and their locations. For example, we have assumed that restaurants in affluent areas get better reviews and ratings. This last step will show whether that pattern exists, supporting or refuting our assumption, and we will develop our data science project further based on what we find.
- The final project will be a Shiny app. Since Shiny is a web application framework, we can create a more interactive, dynamic display of our findings than a static HTML page allows. This will improve the user experience and give the audience more opportunity to view the data from multiple perspectives. In addition, Shiny allows us to publish our app online, which will make displaying our final product easier. Our project has two main types of data: (1) restaurants’ ratings and locations, and (2) the socio-economics of communities. We have already faced some challenges in retrieving restaurants’ ratings and locations, since the main Yelp and TripAdvisor APIs were closed to the public. However, we found a workaround and are now able to continue using Yelp as a source of information. The next challenge is obtaining census data on the socio-economics of communities. We will have to define not only what “socio-economic” means but also the borders of communities. This leads to our last challenge: using a restaurant’s location to determine which community it belongs to.
- We will all need to learn how to gather the appropriate data within the scope of Seattle, WA. There are different ways to split neighborhoods into socio-economic groups, and the choice will influence the results we get. To split them, we will need to find the most reasonable and consistent way to divide the area. The hardest challenge will be determining which neighborhood a restaurant is in based on its latitude and longitude. We will build our own metric for a restaurant’s rating from its current rating and its number of reviews. We want to do this because such a formula accounts for several factors that contribute to a restaurant’s overall standing, rather than comparing on a single factor. This is how we are going to model our approach.
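The modeling steps described above (locating a restaurant by latitude and longitude, scoring it from its rating and review count, and relating that score to area income) could be sketched roughly as follows. This is only an illustration under stated assumptions, written in Python for brevity even though the final product is a Shiny app. The proposal does not specify the rating formula, so the review-count-weighted average used here is a stand-in. The function names (`point_in_polygon`, `find_area`, `adjusted_rating`, `pearson`), the `prior_weight` parameter, and all sample areas and restaurants are hypothetical, and the coordinates are toy values rather than real Seattle data.

```python
from math import sqrt

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon,
    given as a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def find_area(lon, lat, areas):
    """Return the name of the first area whose polygon contains the point."""
    for name, area in areas.items():
        if point_in_polygon(lon, lat, area["polygon"]):
            return name
    return None

def adjusted_rating(rating, n_reviews, prior_mean, prior_weight=10):
    """Assumed metric: pull ratings backed by few reviews toward prior_mean."""
    return (rating * n_reviews + prior_mean * prior_weight) / (n_reviews + prior_weight)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: two square "areas" with made-up median incomes.
areas = {
    "area_a": {"polygon": [(0, 0), (2, 0), (2, 2), (0, 2)], "median_income": 90000},
    "area_b": {"polygon": [(2, 0), (4, 0), (4, 2), (2, 2)], "median_income": 50000},
}
restaurants = [
    {"name": "r1", "lon": 1.0, "lat": 1.0, "rating": 4.5, "reviews": 200},
    {"name": "r2", "lon": 0.5, "lat": 1.5, "rating": 5.0, "reviews": 3},
    {"name": "r3", "lon": 3.0, "lat": 1.0, "rating": 3.5, "reviews": 80},
    {"name": "r4", "lon": 3.5, "lat": 0.5, "rating": 4.0, "reviews": 12},
    {"name": "r5", "lon": 9.0, "lat": 9.0, "rating": 4.0, "reviews": 50},  # outside both areas
]

# Use the mean raw rating as the prior for the adjusted metric.
prior = sum(r["rating"] for r in restaurants) / len(restaurants)

rows = []  # (median income of the restaurant's area, adjusted rating)
for r in restaurants:
    area = find_area(r["lon"], r["lat"], areas)
    if area is None:
        continue  # skip restaurants outside every known area
    score = adjusted_rating(r["rating"], r["reviews"], prior)
    rows.append((areas[area]["median_income"], score))

incomes = [income for income, _ in rows]
scores = [score for _, score in rows]
r_value = pearson(incomes, scores)
print(f"matched {len(rows)} restaurants, correlation = {r_value:.3f}")
```

With real data, the polygons would come from census tract or neighborhood boundaries and the restaurant records from Yelp. A correlation near zero would count against the assumption, while a clearly positive value would support it.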